Job Type: Full-Time
ABOUT THE ROLE
Build OCR systems for printed and handwritten text in low-resource languages and complex scripts, enabling digitization of manuscripts, archives, and contemporary documents.
KEY RESPONSIBILITIES
- Develop end-to-end OCR pipelines: detection, segmentation, and recognition.
- Train and fine-tune models for handwritten text recognition (HTR).
- Adapt OCR frameworks (Tesseract, TrOCR, PaddleOCR, EasyOCR) to target languages.
- Generate synthetic data and curate annotated datasets.
- Implement preprocessing (deskewing, denoising, layout analysis) and optimize for production.
REQUIRED SKILLS & EXPERIENCE
- 2–4 years of experience in computer vision or OCR.
- Strong Python skills, with experience in PyTorch or TensorFlow.
- Hands-on experience with at least one OCR framework.
- Familiarity with sequence modeling for OCR (CRNN, CTC, attention/seq2seq).
- Experience with non-Latin or complex scripts (conjuncts, diacritics, ligatures).
NICE TO HAVE
- Previous experience with Nepali-language data and the Devanagari script.
- Direct HTR experience.
- Document layout analysis and historical document processing.
- Synthetic data generation pipelines for OCR.
- Open-source contributions to OCR projects.
WHAT WE OFFER
- Collaborative and learning-driven work culture.
- Career growth and professional development.
- Competitive salary and benefits.
Have any questions?
Get in touch with us
Ruby Shakya
Associate Director of HR and Operations