Incubrain

Best-in-Class Marathi OCR

Benchmark Results

Independently benchmarked on 2,500 Marathi document line images. Our purpose-built model outperforms the three open-source OCR systems we tested (PaddleOCR, EasyOCR, and Tesseract) on character accuracy, word accuracy, exact line matching, and speed.

6.23% Character Error Rate

Lowest character error rate of any tested system on the MarathiLine benchmark. PaddleOCR, the next best, scores 8.34%, which is about 34% more character errors.

21.91% Word Error Rate

Less than half the word error rate of PaddleOCR (44.69%), the next-best system on CER. We attribute the gap to more reliable word-boundary prediction, where the other tested systems make far more errors.

35.8% Exact Line Match

Nearly 5x the exact match rate of the closest competitor (PaddleOCR 7.4%; EasyOCR and Tesseract 5.6% each). Over a third of all lines are recognised perfectly.

124.5 Lines Per Second

13x to 20x faster than each tested competitor (EasyOCR 9.5, Tesseract 7.7, PaddleOCR 6.2 lines per second). Enables processing of large government document archives in hours, not weeks.

MarathiLine 2.5K Benchmark

How We Compare

All models were tested on the same 2,500-line dataset of Marathi text, balanced across four source types: clean printed, degraded, synthetic multi-font, and mixed real sources. Every model ran on the same hardware with the same preprocessing and was scored at both character and word level.

Character Error Rate (CER)

Lower is better: character-level edit distance between output and ground truth, as a percentage of ground-truth characters

Incubrain

6.23%

PaddleOCR

8.34%

EasyOCR

15.31%

Tesseract

16.72%

Word Error Rate (WER)

Lower is better: word-level edit distance between output and ground truth, as a percentage of ground-truth words

Incubrain

21.91%

PaddleOCR

44.69%

EasyOCR

52.75%

Tesseract

48.75%

Exact Line Match

Higher is better — percentage of lines recognised perfectly

Incubrain

35.8%

PaddleOCR

7.4%

EasyOCR

5.6%

Tesseract

5.6%

Throughput (lines/second)

Higher is better — processing speed on the same hardware

Incubrain

124.5

PaddleOCR

6.2

EasyOCR

9.5

Tesseract

7.7
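The headline multipliers can be rechecked from the chart figures alone. The sketch below is only arithmetic on the numbers printed above; the helper name `vs_incubrain` is our own illustration and not part of any released tooling.

```python
# Figures copied from the charts above.
CER = {"Incubrain": 6.23, "PaddleOCR": 8.34, "EasyOCR": 15.31, "Tesseract": 16.72}
WER = {"Incubrain": 21.91, "PaddleOCR": 44.69, "EasyOCR": 52.75, "Tesseract": 48.75}
EXACT_MATCH = {"Incubrain": 35.8, "PaddleOCR": 7.4, "EasyOCR": 5.6, "Tesseract": 5.6}
THROUGHPUT = {"Incubrain": 124.5, "PaddleOCR": 6.2, "EasyOCR": 9.5, "Tesseract": 7.7}


def vs_incubrain(metric: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Return, per competitor, the factor by which Incubrain is ahead.

    For error rates (lower is better) the factor is competitor / Incubrain;
    for match rate and throughput (higher is better) it is Incubrain / competitor.
    """
    ours = metric["Incubrain"]
    factors = {}
    for name, value in metric.items():
        if name == "Incubrain":
            continue
        factor = ours / value if higher_is_better else value / ours
        factors[name] = round(factor, 2)
    return factors


if __name__ == "__main__":
    print("CER        ", vs_incubrain(CER, higher_is_better=False))
    print("WER        ", vs_incubrain(WER, higher_is_better=False))
    print("Exact match", vs_incubrain(EXACT_MATCH, higher_is_better=True))
    print("Throughput ", vs_incubrain(THROUGHPUT, higher_is_better=True))
```

On these figures the factors come out at roughly 1.3x on CER and about 4.8x on exact line match against PaddleOCR, and between about 13x and 20x on throughput across the three competitors.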

Methodology

How the MarathiLine benchmark works — designed to be a robust, independently verifiable Marathi text recognition benchmark.

Dataset — MarathiLine 2.5K

2,500 line images drawn from four source categories: clean printed documents, degraded archival material, synthetic multi-font renderings, and mixed real sources. Balanced across difficulty levels.

Controlled Conditions

All models run on identical hardware with the same image preprocessing. No cherry-picking — every line in the dataset is evaluated.
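One way to keep throughput comparable is to time every model with the same loop over the same images. The following is a minimal sketch of such a harness, not the benchmark's actual script: `recognise` stands in for whatever callable wraps a given OCR system, and the warm-up count is an assumption of ours.

```python
import time
from typing import Callable, Sequence


def lines_per_second(
    recognise: Callable[[object], str],  # hypothetical wrapper around one OCR model
    images: Sequence[object],
    warmup: int = 1,
) -> float:
    """Time one model over every image and return lines processed per second."""
    if not images:
        raise ValueError("images must not be empty")

    # Warm-up calls are excluded from timing so that one-off costs
    # (model loading, cache fills) do not distort the measurement.
    for image in images[:warmup]:
        recognise(image)

    start = time.perf_counter()
    for image in images:  # every line is timed, none are skipped
        recognise(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed
```

Passing each model's wrapper to the same function, with the same image list, keeps the timing loop identical across systems.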

Four Evaluation Metrics

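Character Error Rate, Word Error Rate, Exact Line Match, and Throughput. CER and WER are industry-standard metrics computed using standard edit distance at the character and word levels. Exact Line Match is the percentage of lines recognised with no errors, and Throughput is the number of lines processed per second.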
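For readers who want to see the accuracy metrics concretely, here is a minimal sketch of CER, WER, and exact line match built on Levenshtein edit distance. It is an illustration under stated assumptions rather than the published evaluation script: we assume a "character" is a Unicode code point after NFC normalisation, that words are split on whitespace, and that rates are aggregated over the whole corpus. The function names are our own.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score(refs: list[str], hyps: list[str]) -> dict[str, float]:
    """Corpus-level CER, WER and exact line match, all as percentages."""
    char_errs = char_total = word_errs = word_total = exact = 0
    for ref, hyp in zip(refs, hyps):
        # Assumption: compare NFC-normalised code points, ignoring outer whitespace.
        ref = unicodedata.normalize("NFC", ref.strip())
        hyp = unicodedata.normalize("NFC", hyp.strip())
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        exact += ref == hyp
    return {
        "cer": 100 * char_errs / char_total,
        "wer": 100 * word_errs / word_total,
        "exact_match": 100 * exact / len(refs),
    }
```

Because the error counts are divided by the ground-truth length, a system that inserts many spurious characters can score above 100% on CER or WER. A single wrong character also makes its whole word wrong, which is why WER is typically much higher than CER for the same output.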

Reproducibility

Dataset, evaluation scripts, and model weights will be published openly. Once they are released, any researcher or institution will be able to verify the results independently.

Open Source

Coming Soon

Our model, benchmark dataset, and evaluation tools will be published openly — enabling independent verification and further research.

Model on Hugging Face

Pre-trained Marathi OCR model weights — download, fine-tune, or deploy. Coming soon.

MarathiLine Benchmark

The full 2,500-line evaluation dataset with ground truth — for independent verification. Coming soon.

Review Pipeline

See how documents flow from raw scans to verified, AI-ready text through our quality control system.