Library to create and manipulate PDF documents in Java and C#


The Apache PDFBox library is an open-source Java tool for working with PDF documents



I have noticed that content extraction is faster in iText, but searching for words with a regex in the content extracted by iText takes longer than in the content extracted by PDFBox
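For context, a minimal sketch of the kind of regex word search described above. It assumes the text has already been extracted into a String by either library; the class name ExtractedTextSearch and the method findWord are hypothetical and come from neither iText nor PDFBox. It only uses java.util.regex and says nothing about why timings would differ between the two libraries' output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractedTextSearch {

    // Hypothetical helper: returns the start offset of every whole-word,
    // case-insensitive occurrence of `word` in already-extracted `text`.
    public static List<Integer> findWord(String text, String word) {
        // Pattern.quote treats the word literally; \b restricts matches to whole words.
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        List<Integer> offsets = new ArrayList<>();
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Stand-in for text produced by a PDF library (assumption).
        String text = "PDF text. A pdf file, not pdfbox.";
        System.out.println(findWord(text, "pdf"));
    }
}
```

When comparing the two libraries, timing the extraction step and the search step separately on the same document makes it easier to see which one dominates.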

PDFBox contains tools for text extraction; iText has more low-level support for text manipulation, but you'd have to write a considerable amount of code to get text extraction

On the downside, PDFBox is less mature than iText, so it has fewer features and less documentation available

PDFBox is a lot slower than iText when it comes to this

Start with PDFBox, as its text extraction abilities are better than iText's

Data comes from Stack Exchange with CC-BY-SA-4.0