PyMuPDF (fitz) concatenates table cells, returns empty strings on scanned pages, and drops text inside figures. Azure Document Intelligence's prebuilt-layout model recovers all three: native table cells, OCR for every page, and figure text. The downstream RAG pipeline reads the same relational tables regardless of engine.
Tap to vote and see what everyone thinks.