myPdf3
 

myPdf3 is an application that converts Acrobat® (PDF) files to XML. It extracts the text of every page, along with the document, page, and image metadata.

myPdf3 supports:

  • PDF versions up to 1.6 (Acrobat 7.0)
  • Type 1, Multiple Master, TrueType, and OpenType fonts
  • Annotations (links)
  • Win/Mac/Adobe/custom encodings, UCS, Unicode, Adobe Glyph List
  • Adobe® XMP
  • No dependency on Adobe Acrobat or external libraries
  • Batch processing, automation, and IAC
  • Windows® 2000/XP and Linux
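
The batch feature listed above could be driven from a small script along the following lines. This is only a sketch: the actual myPdf3 command line is not documented here, so the converter invocation is passed in by the caller as `build_command`, and the names `plan_batch` and `convert_all` are hypothetical helpers invented for this illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence, Tuple


def plan_batch(src_dir: str, out_dir: str) -> List[Tuple[Path, Path]]:
    """Pair every PDF in src_dir with an XML output path in out_dir."""
    src, out = Path(src_dir), Path(out_dir)
    pdfs = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [(p, out / (p.stem + ".xml")) for p in pdfs]


def convert_all(
    src_dir: str,
    out_dir: str,
    build_command: Callable[[Path, Path], Sequence[str]],
    run: Callable[..., object] = subprocess.run,
) -> List[Path]:
    """Run the converter once per PDF and return the XML paths requested."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    requested: List[Path] = []
    for pdf, xml in plan_batch(src_dir, out_dir):
        # build_command is supplied by the caller because the real
        # converter command line is an assumption, not a documented fact.
        run(list(build_command(pdf, xml)), check=True)
        requested.append(xml)
    return requested
```

A caller would supply the real invocation, for example `convert_all("in", "out", lambda pdf, xml: ["your-converter-command", str(pdf), str(xml)])`, where the executable name and argument order are placeholders to be replaced with whatever the installed version expects.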

Example: 

original file (391 KB) / processed file (22 KB)

myPdf3 is open source: http://sourceforge.net/projects/mypdf3


Windows demo / Linux demo (the demo version truncates the extracted text)