The Apache PDFBox library is an open-source Java tool for working with PDF documents.


PDFlib is a library for generating and manipulating files in Adobe's well-known Portable Document Format (PDF).


Quality Example
TET slightly better

"Edit 31 March 2014: For what it's worth, I have found that PDFBox is much better at text extraction than iTextSharp, notwithstanding a bespoke strategy implementation, and PDFlib TET is slightly better than PDFBox, but it's quite expensive."

From the question "If identifying text structure in PDF documents is so difficult, how do PDF readers do it so well?"

Data comes from Stack Exchange with CC-BY-SA-3.0