
How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach

  • Monojit Choudhury
  • Markose Thomas
  • Animesh Mukherjee
  • Niloy Ganguly
  • Anupam Basu
Textgraphs 2 Workshop, at HLT/NAACL, ACL (2007), pp. 8

Abstract

The difficulties involved in spelling error detection and correction in a language have been investigated in this work through the conceptualization of SpellNet, a weighted network of words in which edges indicate orthographic proximity between two words. We construct SpellNets for three languages: Bengali, English and Hindi. Through appropriate mathematical analysis and/or intuitive justification, we interpret the different topological metrics of SpellNet from the perspective of the issues related to spell-checking. We make many interesting observations, the most significant being that the probability of making a real-word error in a language is proportional to the average weighted degree of SpellNet, which is found to be highest for Hindi, followed by Bengali and then English.
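To make the idea of a SpellNet and its average weighted degree concrete, here is a minimal sketch. The abstract only says that edges encode orthographic proximity, so several choices below are assumptions for illustration and not the authors' definitions: proximity is measured by Levenshtein edit distance, two words are linked only when that distance is at most a hypothetical threshold `max_dist`, and each edge is weighted by the reciprocal of the distance so that closer word pairs count more. The helper names `build_spellnet` and `average_weighted_degree` are likewise hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def build_spellnet(words, max_dist=1):
    """Hypothetical SpellNet sketch as an adjacency dict.

    Assumption: link two distinct words when their edit distance is
    between 1 and max_dist, and weight the edge by 1 / distance.
    """
    graph = {w: {} for w in words}
    for u, v in combinations(graph, 2):
        d = levenshtein(u, v)
        if 0 < d <= max_dist:
            weight = 1.0 / d
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def average_weighted_degree(graph):
    """Mean over all words of the summed weights of their incident edges."""
    if not graph:
        return 0.0
    return sum(sum(nbrs.values()) for nbrs in graph.values()) / len(graph)


if __name__ == "__main__":
    net = build_spellnet(["cat", "bat", "hat", "cart", "dog"])
    print(net["cat"])                    # {'bat': 1.0, 'hat': 1.0, 'cart': 1.0}
    print(average_weighted_degree(net))  # 1.6
```

In this toy lexicon, "cat" sits one edit away from three other valid words, while "dog" is isolated. Intuitively, a lexicon whose words have many close neighbours (a high average weighted degree) makes it more likely that a single slip of the pen turns one valid word into another, which is the kind of real-word error the abstract relates to this metric.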
