Job Type: Full-Time

ABOUT THE ROLE
Build OCR systems for printed and handwritten text in low-resource languages and complex scripts, enabling digitization of manuscripts, archives, and contemporary documents.

KEY RESPONSIBILITIES

  • Develop end-to-end OCR pipelines: detection, segmentation, and recognition.
  • Train and fine-tune models for handwritten text recognition (HTR).
  • Adapt OCR frameworks (Tesseract, TrOCR, PaddleOCR, EasyOCR) to target languages.
  • Generate synthetic data and curate annotated datasets.
  • Implement preprocessing (deskewing, denoising, layout analysis) and optimize pipelines for production use.

REQUIRED SKILLS & EXPERIENCE

  • 2–4 years of experience in computer vision or OCR.
  • Strong Python skills, with experience in PyTorch or TensorFlow.
  • Hands-on experience with at least one OCR framework.
  • Familiarity with sequence modeling for OCR (CRNN, CTC, attention/seq2seq).
  • Experience with non-Latin or complex scripts (conjuncts, diacritics, ligatures).

NICE TO HAVE

  • Previous experience with Nepali language data and Devanagari script.
  • Direct HTR experience.
  • Document layout analysis and historical document processing.
  • Synthetic data generation pipelines for OCR.
  • Open-source contributions to OCR projects.

WHAT WE OFFER

  • Collaborative and learning-driven work culture
  • Career growth and professional development
  • Competitive salary and benefits

Have any questions? Get in touch with us:

Ruby Shakya
Associate Director of HR and Operations