The Apache PDFBox library is an open-source Java tool for working with PDF documents.
PDFlib is a library for generating and manipulating files in Adobe’s well-known Portable Document Format (PDF).
TET slightly better
"Edit (31 March 2014): For what it's worth, I have found that PDFBox is much better at text extraction than iTextSharp (notwithstanding a bespoke strategy implementation), and PDFlib TET is slightly better than PDFBox, but it's quite expensive."
From the question "If identifying text structure in PDF documents is so difficult, how do PDF readers do it so well?"