PDF Table Extraction API

Free tier: 25 requests/month (not 100). Paid tiers charge per-request overage; see pricing.

PDF table extractor, parser, and converter API. Pull structured table data from PDFs into JSON, Excel, or CSV. Multipart or base64 upload. Page selection, lattice/stream detection. Text-based PDFs only, no OCR.

Why use this API?

Tables in PDFs are hard to extract programmatically. This API detects and extracts tables for spreadsheets, data pipelines, or automation.

What the API does

Send POST /extract with a PDF file, either as a multipart upload (field file) or as base64 in a JSON body. The response contains the extracted tables as JSON (default), Excel (.xlsx), or a ZIP of CSV files. Optional query params: outputFormat, pages, strategy, mergeTablesAcrossPages, confidenceScores.
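
As a rough sketch of how those query parameters could be assembled, the helper below builds the /extract URL with only the options that were set. The parameter names come from the list above and the host is the one given in the Get code section; the function name `build_extract_url` and the lowercase `true`/`false` encoding of the boolean options are assumptions, not confirmed behaviour.

```python
from urllib.parse import urlencode

BASE_URL = "https://pdf-table-extraction-api.p.rapidapi.com/extract"


def build_extract_url(output_format="json", pages=None, strategy=None,
                      merge_tables_across_pages=None, confidence_scores=None):
    """Return the /extract URL carrying only the options that were set."""
    if output_format not in ("json", "xlsx", "csv"):
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
    if strategy is not None and strategy not in ("lattice", "stream", "auto"):
        raise ValueError(f"unsupported strategy: {strategy!r}")

    params = {"outputFormat": output_format}
    if pages is not None:
        params["pages"] = pages
    if strategy is not None:
        params["strategy"] = strategy
    # Assumption: boolean options are sent as lowercase "true" / "false".
    if merge_tables_across_pages is not None:
        params["mergeTablesAcrossPages"] = str(bool(merge_tables_across_pages)).lower()
    if confidence_scores is not None:
        params["confidenceScores"] = str(bool(confidence_scores)).lower()
    return f"{BASE_URL}?{urlencode(params)}"


print(build_extract_url(pages="1,3-5", strategy="lattice", confidence_scores=True))
```

Note that `urlencode` percent-encodes the comma in the pages value, which is the standard form for a query string.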

Optional parameters are sent in the query string. For pages, pass all or a list of pages and ranges, e.g. 1,3-5.
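
To validate a pages value on the client before sending it, something like the following could be used. It is only a sketch: it assumes page numbers are 1-based and that a range such as 3-5 is inclusive, which is the natural reading of the example but is not stated by the API. The name `parse_pages` is made up for illustration.

```python
def parse_pages(spec):
    """Expand a pages value such as "1,3-5" into a sorted list of page numbers.

    Returns None for "all", meaning no page restriction.
    Assumes 1-based page numbers and inclusive ranges.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return None

    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    # Drop duplicates from overlapping ranges and keep pages in order.
    return sorted(set(pages))
```

Malformed input (an empty item, a non-numeric page, or a reversed range) raises `ValueError`, so a bad value can be rejected without spending a request from the monthly quota.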

Get code

Host: pdf-table-extraction-api.p.rapidapi.com. Endpoint: POST /extract. Multipart field: file. JSON is returned by default; xlsx and csv return binary (an Excel file or a ZIP archive).

Replace YOUR_RAPIDAPI_KEY and the file path. Java example requires OkHttp on the classpath.
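
For reference, here is a minimal Python sketch of the multipart request using only the standard library. The host, path, and the `file` field name are taken from the line above; `X-RapidAPI-Key` and `X-RapidAPI-Host` are the usual RapidAPI authentication headers. The helper name `build_multipart_request` is illustrative, and the request is only constructed here, not sent.

```python
import urllib.request
import uuid

API_HOST = "pdf-table-extraction-api.p.rapidapi.com"


def build_multipart_request(pdf_bytes, filename, api_key, query=""):
    """Build (but do not send) a multipart POST /extract request."""
    boundary = uuid.uuid4().hex
    # A single form part named "file" carrying the raw PDF bytes.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"https://{API_HOST}/extract"
    if query:
        url += "?" + query
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": API_HOST,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder bytes stand in for a real file read with open(path, "rb").
request = build_multipart_request(
    b"%PDF-1.4 placeholder", "report.pdf", "YOUR_RAPIDAPI_KEY", "outputFormat=json"
)
# To send it: urllib.request.urlopen(request, timeout=60)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any call counts against the quota.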

What to expect

With outputFormat=json you get a JSON object containing tables and summary. With xlsx or csv you get a binary file (an Excel workbook or a ZIP of CSVs). Send your RapidAPI key in the request headers. The API is stateless and stores no data. See the RapidAPI docs for the full response schema.
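
One way to handle the three response shapes on the client is sketched below. It relies only on what is stated above: JSON responses carry `tables` and `summary`, while xlsx and csv responses are binary. The inner structure of `tables` and `summary` is not assumed, the function name `handle_response` is illustrative, and decoding the CSV files as UTF-8 is an assumption.

```python
import io
import json
import zipfile


def handle_response(output_format, body):
    """Turn a raw /extract response body into something easy to work with."""
    if output_format == "json":
        payload = json.loads(body)
        # "tables" and "summary" are the two keys named in the text; their
        # inner structure is left untouched here.
        return payload.get("tables"), payload.get("summary")
    if output_format == "xlsx":
        # Binary workbook: hand the bytes back for the caller to write to disk.
        return body
    if output_format == "csv":
        # ZIP archive: map each member name to its CSV text (UTF-8 assumed).
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            return {
                name: archive.read(name).decode("utf-8")
                for name in archive.namelist()
            }
    raise ValueError(f"unknown output format: {output_format!r}")
```

The `output_format` argument should be the same value that was sent as outputFormat, since that determines how the body needs to be read.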

Pricing & tiers (RapidAPI)

Quotas and overage differ from our typical 100 / 10k / 100k / 250k listings. Pro, Ultra, and Mega include per-request overage after the included monthly calls.

Basic

$0/mo

25 requests/month included

No overage shown (hard cap unless RapidAPI lists otherwise).

Pro

$9.99/mo

3,500 requests/month included

Overage: ~$0.005 per extra request.

Ultra

$30/mo

15,000 requests/month included

Overage: ~$0.003 per extra request.

Mega

$99/mo

100,000 requests/month included

Overage: ~$0.002 per extra request.

Confirm current prices on the live PDF Table Extraction API listing.
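
To compare tiers for a given monthly volume, the arithmetic can be sketched as follows. The figures are the ones quoted on this page (the Ultra numbers come from the FAQ entry), the overage rates are approximate, and Basic is treated as a hard cap as noted above. Treat the output as an estimate and check the live listing before relying on it.

```python
# tier: (monthly price in USD, included requests, overage per extra request)
# An overage of None means the tier is treated as a hard cap.
TIERS = {
    "Basic": (0.00, 25, None),
    "Pro": (9.99, 3_500, 0.005),
    "Ultra": (30.00, 15_000, 0.003),
    "Mega": (99.00, 100_000, 0.002),
}


def monthly_cost(tier, requests_per_month):
    """Estimated monthly bill in USD, or None if the tier cannot serve the volume."""
    price, included, overage = TIERS[tier]
    extra = max(0, requests_per_month - included)
    if extra and overage is None:
        return None  # over the hard cap
    return round(price + extra * (overage or 0.0), 2)


def cheapest_tier(requests_per_month):
    """Name of the tier with the lowest estimated bill for this volume."""
    costs = {name: monthly_cost(name, requests_per_month) for name in TIERS}
    usable = {name: cost for name, cost in costs.items() if cost is not None}
    return min(usable, key=usable.get)
```

For example, 5,000 requests on Pro come to 9.99 + 1,500 × 0.005 = 17.49, which is still cheaper than Ultra's flat 30; at 10,000 requests Pro rises to 42.49 and Ultra becomes the cheaper option.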

About this API

Who Should Use This API

Data teams and apps extracting tables from PDF reports and documents.

Also Known As

PDF table extractor API, PDF to table, table extraction API.

PDF Table Extraction API

Extract structured table data from PDF documents. The API makes a best-effort attempt to detect and extract tables within PDFs, returning the data in JSON (default), Excel (.xlsx), or CSV (ZIP) formats.

What This API Does

  • **Table detection** — Identifies table structures within PDFs
  • **Multi-table support** — Extracts all tables found, preserving order
  • **Multiple output formats** — JSON (default), Excel (.xlsx), or CSV (ZIP archive)
  • **Page selection** — Process all pages or specific page ranges
  • **Confidence scores** — Optional confidence metrics per extracted table
  • **Flexible input** — Multipart file upload or JSON body with base64-encoded PDF

**Important:** Best-effort detection. Non-table content (titles, paragraphs, headers) is filtered out. Results depend on PDF structure and quality. Text-based PDFs only; no OCR for scanned documents.
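
For the base64 input option listed above, a JSON request body could be prepared along these lines. The JSON field name is not given on this page: `file` is used here purely as a placeholder mirroring the multipart field name, so check the RapidAPI schema for the real key before using this.

```python
import base64
import json


def build_base64_body(pdf_bytes, field_name="file"):
    """Encode a PDF as base64 inside a JSON body.

    field_name is a hypothetical key; the real one must come from the
    API's published request schema.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({field_name: encoded}).encode("utf-8")


body = build_base64_body(b"%PDF-1.4 placeholder")
# Send with the header Content-Type: application/json instead of multipart.
```

Base64 inflates the payload by roughly a third, so multipart upload is the lighter choice for files near the size limit.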

Key Features

  • **Stateless** — No data stored; 25MB max, 60s timeout
  • **Strategies** — `lattice` (grid lines), `stream` (most tables), `auto`
  • **Metadata** — Page ranges, row/column counts, warnings
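
Given the limits above, a simple pre-flight check can catch obviously unsuitable files before a request is spent. This is a sketch under two assumptions: that 25MB means 25 × 1024 × 1024 bytes, and that the limit applies to the raw PDF size. It checks for the standard `%PDF-` file header but cannot tell whether a PDF is text-based or scanned.

```python
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB
TIMEOUT_SECONDS = 60          # server-side limit quoted above; use as client timeout


def preflight(pdf_bytes):
    """Return a list of problems found; an empty list means the file looks sendable."""
    problems = []
    if not pdf_bytes.startswith(b"%PDF-"):
        problems.append("file does not start with the %PDF- header")
    if len(pdf_bytes) > MAX_BYTES:
        problems.append(f"file is {len(pdf_bytes)} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Calling `preflight(open(path, "rb").read())` and skipping the upload when the list is non-empty avoids wasting quota on requests that are likely to be rejected.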

Use Cases

  • **Data extraction** — Pull tables from reports, invoices, exports
  • **Automation** — Integrate table extraction into pipelines
  • **Spreadsheet import** — Get Excel/CSV for downstream tools

Frequently Asked Questions

  • **Pricing:** Basic: 25 requests/mo free. Pro: $9.99 for 3,500/mo + ~$0.005 overage. Ultra: $30 for 15,000/mo + ~$0.003 overage. Mega: $99 for 100,000/mo + ~$0.002 overage. Verify on RapidAPI.
  • **Data storage:** None. The API is fully stateless.
  • **Output:** Structured table data, as JSON by default or as Excel (.xlsx) or CSV (ZIP).