Portfolio Details

Traditional Nepali typing methods can be time-consuming, tedious, and error-prone, especially in professional and official contexts. Vox addresses this by converting spoken Nepali into written text, giving professionals, students, and officials a faster alternative to manual typing.

Key Features:

High Transcription Accuracy: Achieves 95% accuracy in Nepali speech-to-text conversion on clear speech.
Multi-Speaker Recognition: Uses speaker diarization to distinguish between multiple speakers, which suits recordings of meetings and discussions.
Localized for the Nepali Language: Tailored specifically to the Nepali language and its dialects, giving native users better transcription accuracy and usability.
Optimized for Professional Use: Combines a 15% character error rate (CER) with noise reduction for dependable performance in government and professional applications.
Continuous Learning and Improvement: Uses adaptive learning to keep improving transcription accuracy and reliability over time.
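
For readers unfamiliar with the CER figure quoted above, the following is a minimal sketch of how a character error rate is commonly computed: the edit distance between the reference text and the transcription, divided by the reference length. It is an illustration only, not Vox's evaluation code; the function names are hypothetical, and counting by Unicode code point (so each Devanagari vowel sign or virama counts as one character) is an assumption.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (character missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra character in hypothesis)
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the transcription drops the final vowel sign.
    print(cer("नमस्ते", "नमस्त"))
```

Under this definition, a CER of 15% corresponds to roughly 15 character edits per 100 reference characters.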

Architecture

Vox uses the Wav2Vec 2.0 model for Nepali speech-to-text conversion, speaker diarization to tell speakers apart, and acoustic models to capture contextual nuances. The system is fine-tuned to minimize transcription errors and includes built-in noise reduction algorithms for better performance in challenging environments.
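
Wav2Vec 2.0 models fine-tuned for transcription are commonly trained with a CTC (connectionist temporal classification) output layer, which emits one token per audio frame. Assuming Vox follows that pattern (the description above does not say so), the sketch below shows the greedy decoding step that turns per-frame token IDs into text: collapse consecutive repeats, then drop the blank token. The vocabulary, blank index, and frame IDs are hypothetical.

```python
from itertools import groupby
from typing import Sequence

BLANK_ID = 0  # assumed index of the CTC blank token


def ctc_greedy_decode(
    frame_ids: Sequence[int],
    vocab: Sequence[str],
    blank_id: int = BLANK_ID,
) -> str:
    """Collapse repeated frame predictions, then remove blanks."""
    # groupby merges runs of identical consecutive IDs into a single ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


if __name__ == "__main__":
    # Hypothetical character vocabulary and per-frame argmax output.
    vocab = ["<blank>", "न", "म", "स", "्", "त", "े"]
    frame_ids = [1, 1, 0, 2, 0, 3, 3, 4, 5, 5, 0, 6]
    print(ctc_greedy_decode(frame_ids, vocab))
```

The blank token is what lets the decoder keep genuinely doubled characters: two identical IDs separated by a blank survive the collapse, while an unbroken run is merged into one.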
Vox continues to adapt through ongoing learning and model refinement, so that transcription of spoken Nepali stays reliable and efficient across varied contexts.
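
One common way to combine speaker diarization with transcription is to label each transcribed segment with the speaker whose turn overlaps it most in time. The sketch below illustrates that idea only; the `Turn` and `Segment` types, the function names, and the overlap rule are assumptions for illustration and are not taken from Vox's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcribed piece of audio with its time span (seconds)."""
    text: str
    start: float
    end: float


def overlap(seg: Segment, turn: Turn) -> float:
    """Length in seconds of the time shared by a segment and a turn."""
    return max(0.0, min(seg.end, turn.end) - max(seg.start, turn.start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Pair each segment's text with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = "unknown"  # no turn covers this segment
        else:
            speaker = best.speaker
        labeled.append((speaker, seg.text))
    return labeled


if __name__ == "__main__":
    turns = [Turn("Speaker 1", 0.0, 5.0), Turn("Speaker 2", 5.0, 10.0)]
    segments = [Segment("नमस्ते", 0.5, 4.0), Segment("धन्यवाद", 4.5, 8.0)]
    for speaker, text in label_segments(segments, turns):
        print(f"{speaker}: {text}")
```

Choosing the turn with the largest overlap keeps a segment that straddles a speaker change attached to whoever spoke for most of it; a production system might instead split such segments at the turn boundary.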