Letter perception emerges from unsupervised deep learning and recycling of natural image features

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem 1,2. Here, we present a large-scale computational model of letter recognition based on deep neural networks 3,4, which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input 5,6. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition 7, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments 8.

References

  1. Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
  2. Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
  3. LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
  4. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
  5. Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
  6. Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
  7. Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
  8. Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
  9. Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
  10. Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
  11. Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
  12. Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
  13. Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
  14. Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
  15. Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
  16. DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
  17. Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
  18. Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
  19. Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
  20. Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
  21. Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
  22. Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
  23. Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
  24. Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
  25. Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
  26. McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
  27. Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
  28. Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
  29. Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
  30. Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
  31. Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
  32. Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
  33. Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
  34. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
  35. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
  36. Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
  37. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
  38. Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
  39. Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
  40. Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
  41. Olshausen, B. A. Highly Overcomplete Sparse Coding in Proceedings of SPIE Electronic Imaging 8651 (2013).
  42. Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Springer, London, 2009).
  43. Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
  44. Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
  45. Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
  46. Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
  47. Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
  48. Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
  49. Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
  50. Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
  51. Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
  52. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning in Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
  53. Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
  54. Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
  55. Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
  56. Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
  57. Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
  58. Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
  59. Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
  60. Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
  61. Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
  62. Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
  63. Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
  64. Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
  65. Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
  66. Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
  67. LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
  68. Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
  69. Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
  70. Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
  71. Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
  72. Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
  73. Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model in Vision Sciences Society Annual Meeting 2014 (2014).
  74. Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).

Acknowledgements

This work was supported by grants from the European Research Council (no. 210922) and the University of Padova (Strategic Grant NEURAT) to M.Z. I.S. was supported by a Marie Curie Intra European Fellowship (PIEF-GA-2013-622882) within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

  1. Department of General Psychology and Padova Neuroscience Center, University of Padova, via Venezia 8, Padova, 35131, Italy (Alberto Testolin & Marco Zorzi)
  2. Laboratoire de Psychologie Cognitive - UMR7290, Centre National de la Recherche Scientifique, Aix-Marseille Université, 3, place Victor Hugo, Marseille, 13331, CEDEX 3, France (Ivilin Stoianov)
  3. Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Martiri della Libertà 2, Padova, 35137, Italy (Ivilin Stoianov)
  4. IRCCS San Camillo Hospital Foundation, via Alberoni 70, Venice-Lido, 30126, Italy (Marco Zorzi)
  1. Alberto Testolin