Docling, an open-source parser from IBM Research, extracts table cells, OCR text, and captions from PDFs entirely on a local machine. No API key or cloud upload is required. It outputs relational tables compatible with downstream RAG pipelines, making it suitable for enterprise documents that cannot leave the building.
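As a rough illustration of that local workflow, the sketch below converts a PDF with Docling and flattens each detected table into row records that a RAG indexer could ingest. It assumes the `docling` and `pandas` packages are installed and relies on `DocumentConverter.convert` and `export_to_dataframe` as found in recent Docling releases; the helper names `grid_to_records` and `extract_table_records` are hypothetical and not part of the library.

```python
from typing import Dict, List


def grid_to_records(grid: List[List[str]]) -> List[Dict[str, str]]:
    """Turn a header row plus data rows into one dict per data row."""
    if not grid:
        return []
    header = [cell.strip() for cell in grid[0]]
    return [
        {name: value.strip() for name, value in zip(header, row)}
        for row in grid[1:]
    ]


def extract_table_records(pdf_path: str) -> List[List[Dict[str, str]]]:
    """Parse a PDF locally with Docling and return records for each table."""
    # Imported lazily so grid_to_records works without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    all_tables = []
    for table in result.document.tables:
        # Assumption: each table item exposes export_to_dataframe().
        df = table.export_to_dataframe()
        grid = [[str(c) for c in df.columns]] + df.astype(str).values.tolist()
        all_tables.append(grid_to_records(grid))
    return all_tables
```

Calling `extract_table_records("report.pdf")` (a placeholder path) would return one list of row dictionaries per table found in the file.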
Summary by ByteBrief
Azure Layout Parses PDF Tables PyMuPDF Misses