Template:Tamil Optical Character Recognition Support Project

From Noolaham Foundation

Project Number: NF/PG/2013/0006
Grant Agency / Donors: Noolaham Foundation
Project Owner: Shaseevan Ganeshananthan
Project Mentor: Shaseevan Ganeshananthan
Project Locations: Colombo and Jaffna
Project Period: 2013 July – 2013 December
Stakeholders: Department of Computer Science, University of Jaffna; Library, University of Jaffna (UOJ)

The Tamil Optical Character Recognition Support Project aimed to provide scanned raw images of rare Tamil documents in support of the Tamil OCR development project. Noolaham Foundation collaborated with the Department of Computer Science, University of Jaffna, to implement this project. The Tamil digitization project is a joint venture of Theekshana (School of Computing, University of Colombo) and the Department of Computer Science, University of Jaffna, and is funded by ICTA. Its main goal is to develop the tools needed to automatically recognize the most common printed Tamil fonts in scanned images of books and documents, so that such content can be digitized.

Through this project, Noolaham Foundation provided scanned images of 51 rare documents to the Department of Computer Science, University of Jaffna, for training and testing the Tamil OCR system. Of these, 15 books (6,450 raw images) were already available in the Noolaham Digital Archive. The other 36 documents were digitized specifically for this project, for use in training and testing the Tamil OCR system under development. All 51 documents are available online through Noolaham Foundation’s Digital Library, www.noolaham.org. The project was a successful initiative and received special appreciation from the research community.