OCR in Indian languages

Optical character recognition (OCR) is the process of converting images of printed or handwritten text into machine-encoded text. OCR for English and other European languages achieves a high level of accuracy. OCR for Indian languages, however, has not reached comparable accuracy. This is mostly due to the complexity of Indian scripts, the lack of a standard representation and encoding, and limited operating-system and keyboard support. The Centre for Development of Advanced Computing (C-DAC), the premier R&D organisation of India's Ministry of Electronics and Information Technology (MeitY), and Technology Development for Indian Languages (TDIL) have carried out many OCR projects, including OCR for Malayalam, Odia, Punjabi, Telugu and the Devanagari script.
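Part of the encoding difficulty is visible in how Unicode stores Devanagari. The short Python sketch below uses only the standard library's unicodedata module. The two example syllables are illustrative choices, not taken from this article. It lists the code points behind a single conjunct (one visual shape, three code points) and behind a syllable whose vowel sign is drawn to the left of its consonant but stored after it. An OCR engine therefore has to map one shape to several code points and sometimes reorder what it sees. This is only an illustration of the problem, not a description of how any of the engines listed here work.

```python
import unicodedata

# Illustrative examples (not from the article):
# "क्ष" (kṣa) renders as one conjunct but is stored as KA + VIRAMA + SSA.
conjunct = "\u0915\u094D\u0937"
# "कि" (ki): the vowel sign I is drawn to the LEFT of KA but stored AFTER it.
syllable_ki = "\u0915\u093F"

def code_points(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode character name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, text in (("conjunct", conjunct), ("ki", syllable_ki)):
    print(label, len(text), "code points")
    for cp, name in code_points(text):
        print("  ", cp, name)
```

Running it shows that the conjunct consists of DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA and DEVANAGARI LETTER SSA, and that in "कि" the vowel sign follows the consonant in storage even though it precedes it on the page.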

Examples

  1. SanskritOCR - OCR software for Sanskrit, Hindi and other languages of India written in the Devanagari script.
  2. E-aksharayan - An optical character recognition engine for Indian languages.
  3. Chitrankan - Developed by C-DAC, it processes printed Hindi text either directly from a scanner or from an image.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.