
 
An appraisal conference organized by the Ministry of Education was held at Tsinghua University on January 29, 2007 to evaluate two scientific achievements: the multi-font printed Mongolian (mixed with Chinese and English) document recognition system and the unified-platform ethnic-language document recognition system.
The multi-font printed Mongolian document recognition system was developed by Professor Xiaoqing Ding’s research group, which includes experts and researchers from Tsinghua University, Inner Mongolia University and Inner Mongolia Normal University. The system integrates kernel modules for document layout analysis, text line and character segmentation, character recognition, vertical text editing (revising, deleting, inserting, etc.) and special character display. It solves the difficult problem of automatically entering large volumes of printed Mongolian documents into a computer: document images captured with an ordinary scanner are converted into plain text files or other electronic document formats (Word, PDF, XML, etc.) that computers can read and search conveniently. Combined with the previously developed English and Chinese OCR (Optical Character Recognition) engines, the system robustly processes mixed Mongolian/English/Chinese documents. Thanks to effective multi-font, multi-size Mongolian character recognition techniques and character segmentation strategies based on multi-level information, the system achieved a high recognition accuracy of 96.89% on real Mongolian (mixed with Chinese and English) document images collected from published books, magazines and newspapers, which indicates promising application prospects.

The unified-platform ethnic-language document recognition system was built on the Mongolian system described above and on three counterpart systems developed in 2002, 2003 and 2004, respectively: the multi-font printed Korean (mixed with Chinese and English) document recognition system, the multi-font printed Tibetan (mixed with Chinese and English) document recognition system and the multi-font printed Uighur/Kazakh/Kirghiz (mixed with Chinese and English) document recognition system. These were successfully integrated into a single system that brings the recognition procedures for the character sets of six of China’s main minority nationalities, namely Mongolian, Tibetan, Uighur, Kazakh, Korean and Kirghiz, into one uniform framework. By virtue of its modular structure, universal internal-code output, consistent human-computer interface and automatic layout analysis, the system recognizes all of these character sets in the same way and switches easily from document images in one language to those in another. It is also readily extensible, so it can be applied to other minority nationality scripts at the cost of only modest revision work.

It is well known that written characters are the foundation of informatization. However, entering characters into computers automatically has become both a bottleneck and a key step in the modern informatization process. Tsinghua University’s development of a unified-platform ethnic-language document recognition system is therefore an achievement worth celebrating. The system is expected to accelerate informatization across China’s ethnic regions and to strengthen cultural communication and cooperation among China’s nationalities.
At the conference, the experts gave the system high praise. Their final appraisal was: “The system solved the difficult problem of practical Mongolian, Tibetan, Uighur, Kazakh, Korean and Kirghiz document recognition on a uniform platform for the first time in the world and improved the recognition performance for China’s minority nationality documents. The main system performance indices reached the international leading level.”
At the conference, Tsinghua University presented the TH-OCR® software, the unified-platform ethnic-language document recognition system, as a gift to National Press, Inner Mongolian Library, Xinjiang Daily Agency and Xinhua News Agency, among others. (From Department of Electronic Engineering)
(Photo by Guo Haijun)