Letter perception emerges from unsupervised deep learning and recycling of natural image features

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem 1,2. Here, we present a large-scale computational model of letter recognition based on deep neural networks 3,4, which develops a hierarchy of increasingly complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input 5,6. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition 7, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that, by reusing natural visual primitives, learning written symbols requires only limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments 8.
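The training scheme summarised above can be illustrated with a small sketch. The abstract does not specify the architecture or learning rule, so this example assumes a stack of two restricted Boltzmann machines trained with one-step contrastive divergence (in the spirit of the Hinton references listed below), followed by a least-squares linear read-out. The data are random toy patterns standing in for natural image patches and printed letters, and every name, layer size and learning parameter here is hypothetical rather than taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one step of reconstruction from sampled hidden states.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error

def train(rbm, data, epochs=30, batch_size=20):
    """Run CD-1 over shuffled mini-batches; return mean error per epoch."""
    errors = []
    for _ in range(epochs):
        perm = rbm.rng.permutation(len(data))
        batch_errors = [rbm.cd1_step(data[perm[i:i + batch_size]])
                        for i in range(0, len(data), batch_size)]
        errors.append(float(np.mean(batch_errors)))
    return errors

def noisy_copies(prototypes, n_samples, flip_prob, rng):
    """Toy stand-in for an image set: random prototypes with some pixels flipped."""
    idx = rng.integers(0, len(prototypes), size=n_samples)
    flips = rng.random((n_samples, prototypes.shape[1])) < flip_prob
    return np.abs(prototypes[idx] - flips.astype(float)), idx

rng = np.random.default_rng(0)
n_pixels = 64
# Assumption: random binary patterns replace real natural patches and letters.
scene_protos = (rng.random((6, n_pixels)) < 0.5).astype(float)
letter_protos = (rng.random((4, n_pixels)) < 0.5).astype(float)
scenes, _ = noisy_copies(scene_protos, 200, 0.05, rng)
letters, letter_ids = noisy_copies(letter_protos, 200, 0.05, rng)

# Layer 1 learns domain-general features from the "natural" set only.
layer1 = RBM(n_pixels, 32, rng)
errors1 = train(layer1, scenes)

# Layer 1 is then frozen and reused; layer 2 is tuned on letters alone.
letter_feats = layer1.hidden_probs(letters)
layer2 = RBM(32, 16, rng)
errors2 = train(layer2, letter_feats)

# Linear read-out from top-level features to one-hot letter identities.
top = layer2.hidden_probs(letter_feats)
X = np.hstack([top, np.ones((len(top), 1))])
targets = np.eye(len(letter_protos))[letter_ids]
readout, *_ = np.linalg.lstsq(X, targets, rcond=None)
accuracy = float(np.mean((X @ readout).argmax(axis=1) == letter_ids))
print(f"layer-1 error {errors1[0]:.3f} -> {errors1[-1]:.3f}, read-out accuracy {accuracy:.2f}")
```

Only the second layer and the read-out ever see the letter set, which is the sense in which the first layer's features are recycled; the numbers printed by this toy example say nothing about the results reported in the paper.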

References

Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
Olshausen, B. A. Highly Overcomplete Sparse Coding in Proceedings of SPIE Electronic Imaging 8651 (2013).
Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Springer, London, 2009).
Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning in Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model in Vision Sciences Society Annual Meeting 2014 (2014).
Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).

Acknowledgements

This work was supported by grants from the European Research Council (no. 210922) and the University of Padova (Strategic Grant NEURAT) to M.Z. I.S. was supported by a Marie Curie Intra European Fellowship (PIEF-GA-2013-622882) within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of General Psychology and Padova Neuroscience Center, University of Padova, via Venezia 8, Padova, 35131, Italy (Alberto Testolin & Marco Zorzi)
Laboratoire de Psychologie Cognitive - UMR7290, Centre National de la Recherche Scientifique, Aix-Marseille Université, 3, place Victor Hugo, Marseille, 13331, CEDEX 3, France (Ivilin Stoianov)
Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Martiri della Libertà 2, Padova, 35137, Italy (Ivilin Stoianov)
IRCCS San Camillo Hospital Foundation, via Alberoni 70, Venice-Lido, 30126, Italy (Marco Zorzi)

Alberto Testolin