Letter perception emerges from unsupervised deep learning and recycling of natural image features
The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem [1,2]. Here, we present a large-scale computational model of letter recognition based on deep neural networks [3,4], which develops a hierarchy of increasingly complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input [5,6]. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition [7], earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments [8].
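The abstract does not name the architecture or the learning rule, so the following is only a minimal sketch of the general idea under stated assumptions: a stack of restricted Boltzmann machines trained with one-step contrastive divergence, followed by a least-squares linear read-out. All names (`RBM`, `train_rbm`, `fit_linear_readout`, `predict`), the layer sizes, the learning rate and the random toy arrays standing in for natural-image patches and letter images are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Hypothetical minimal RBM; not the authors' implementation."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        """One full-batch update with one-step contrastive divergence."""
        h0 = self.hidden_probs(v0)  # positive phase
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h0_sample)  # one Gibbs step back to the input
        h1 = self.hidden_probs(v1)  # negative phase
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error


def train_rbm(data, n_hidden, epochs, rng):
    """Fit one layer without labels (unsupervised)."""
    rbm = RBM(data.shape[1], n_hidden, rng)
    for _ in range(epochs):
        rbm.cd1_step(data)
    return rbm


def fit_linear_readout(features, labels, n_classes):
    """Least-squares map from top-layer features to one-hot identities."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    Y = np.eye(n_classes)[labels]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def predict(features, W):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argmax(X @ W, axis=1)


# Toy stand-ins (assumptions): random binary arrays, not real images.
rng = np.random.default_rng(0)
natural_patches = (rng.random((30, 16)) > 0.5).astype(float)
letter_images = (rng.random((20, 16)) > 0.5).astype(float)
letter_labels = np.arange(20) % 4

# Layer 1 learns from "natural" input; layer 2 reuses it on "letters".
layer1 = train_rbm(natural_patches, n_hidden=8, epochs=5, rng=rng)
letter_features1 = layer1.hidden_probs(letter_images)
layer2 = train_rbm(letter_features1, n_hidden=6, epochs=5, rng=rng)
letter_features2 = layer2.hidden_probs(letter_features1)

readout = fit_linear_readout(letter_features2, letter_labels, n_classes=4)
predicted = predict(letter_features2, readout)
```

The point of the sketch is the ordering of the stages: the first layer is fitted on generic input and then frozen, only the second layer is fitted on letter input, and labels enter only at the final read-out. With random toy data the predictions carry no meaning.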
References
1. Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
2. Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
3. LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
4. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
5. Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
6. Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
7. Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
8. Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
9. Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
10. Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
11. Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
12. Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
13. Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
14. Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
15. Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
16. DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
17. Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
18. Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
19. Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
20. Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
21. Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
22. Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
23. Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
24. Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
25. Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
26. McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
27. Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
28. Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
29. Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
30. Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
31. Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
32. Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
33. Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
34. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
35. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
36. Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
37. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
38. Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
39. Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
40. Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
41. Olshausen, B. A. Highly Overcomplete Sparse Coding. In Proceedings of SPIE Electronic Imaging 8651 (2013).
42. Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Springer, London, 2009).
43. Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
44. Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
45. Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
46. Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
47. Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
48. Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
49. Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
50. Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
51. Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
52. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning. In Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
53. Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
54. Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
55. Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
56. Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
57. Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
58. Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
59. Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
60. Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
61. Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
62. Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
63. Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
64. Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
65. Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
66. Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
67. LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
68. Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
69. Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
70. Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
71. Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
72. Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
73. Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model. In Vision Sciences Society Annual Meeting 2014 (2014).
74. Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).
Acknowledgements
This work was supported by grants from the European Research Council (no. 210922) and the University of Padova (Strategic Grant NEURAT) to M.Z. I.S. was supported by a Marie Curie Intra European Fellowship (PIEF-GA-2013-622882) within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Authors and Affiliations
- Department of General Psychology and Padova Neuroscience Center, University of Padova, via Venezia 8, Padova, 35131, Italy (Alberto Testolin & Marco Zorzi)
- Laboratoire de Psychologie Cognitive - UMR7290, Centre National de la Recherche Scientifique, Aix-Marseille Université, 3, place Victor Hugo, Marseille, 13331, CEDEX 3, France (Ivilin Stoianov)
- Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Martiri della Libertà 2, Padova, 35137, Italy (Ivilin Stoianov)
- IRCCS San Camillo Hospital Foundation, via Alberoni 70, Venice-Lido, 30126, Italy (Marco Zorzi)