(by Michael Lavine)
I recently came across two articles about linguistics that make their points with nothing more than basic statistics: simple Markov chains and simple linear regression.
The first, published in 2008, is Theoretical and empirical evidence for the impact of inductive biases on cultural evolution, by Griffiths, Kalish, and Lewandowsky, in Philosophical Transactions of the Royal Society B, available at http://rstb.royalsocietypublishing.org/content/363/1509/3503.full.pdf.
The authors compare two hypotheses about language evolution. As they explain, “Research on language evolution also explores the relationship between inductive biases and cultural transmission, examining how constraints on language learning influence the languages that a population of learners comes to speak. Human languages form a subset of all logically possible communication schemes, with some properties being shared by all languages (Greenberg 1963; Comrie 1981; Hawkins 1988). Traditionally, these ‘linguistic universals’ are explained by appealing to the constraints of an innate system specific to the acquisition of language (e.g. Chomsky 1965). A popular alternative explanation is that the universal properties of human languages have arisen as a consequence of languages being learned anew by each generation, with each learner having only weak, domain-general inductive biases (e.g. Kirby 2001). This alternative explanation relies upon the possibility that cultural transmission can emphasize the inductive biases of language learners, allowing such weak biases to be translated into strong and systematic universals of the kind seen in human languages.
“… we explore … the effects of inductive biases on one simple form of knowledge transmission: the case where information is passed from one person to another …” Their conclusion is that weak biases of language learners are sufficient to explain the common features of modern languages, and that we need not posit strong constraints of an innate system specific to language acquisition.
The tools are laboratory experiments and simple Markov chain models.
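To give a feel for how a simple Markov chain can turn a weak bias into a strong regularity, here is a minimal sketch in Python. It is not the authors' model: the two language variants A and B and both switching probabilities are made up for illustration. Each generation learns from the previous one, and a learner is only slightly more likely to drift from B to A than from A to B.

```python
# Hypothetical two-variant transmission chain. Both probabilities are
# illustrative assumptions, not values from Griffiths, Kalish, and Lewandowsky.
P_A_TO_B = 0.02  # chance a learner exposed to variant A ends up using B
P_B_TO_A = 0.10  # chance a learner exposed to variant B ends up using A

# Row-stochastic transition matrix over the states (A, B):
# row i gives the next-generation probabilities given current state i.
TRANSITION = [
    [1.0 - P_A_TO_B, P_A_TO_B],
    [P_B_TO_A, 1.0 - P_B_TO_A],
]


def step(dist, matrix):
    """Advance a probability distribution over states by one generation."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def long_run(dist, matrix, generations):
    """Apply the transition matrix repeatedly, once per generation."""
    for _ in range(generations):
        dist = step(dist, matrix)
    return dist


start = [0.0, 1.0]  # the first generation uses variant B only
final = long_run(start, TRANSITION, 500)

# For a two-state chain the stationary share of A has a closed form.
closed_form_a = P_B_TO_A / (P_A_TO_B + P_B_TO_A)

print("share of A after 500 generations:", final[0])
print("closed-form stationary share of A:", closed_form_a)
```

Under these assumed numbers, no single learner changes variant with probability greater than 0.10, yet the chain settles at about 83% A (the closed form is 0.10 / 0.12), even though it started with B alone. The starting point washes out; only the asymmetry in the transition probabilities determines the long-run mix.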
The second article, published by Atkinson in Science in 2011, is Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa.
The article shows “that the number of phonemes [perceptually distinct units of sound] used in a global sample of 504 languages … fits a serial founder–effect model of expansion from an inferred origin in Africa. This result, which is not explained by more recent demographic history, local language diversity, or statistical non-independence within language families, … supports an African origin of modern human languages.”
The primary data are presented in a scatterplot. Each point in the plot represents a language. (There are about 500 points.) The x-axis is the language’s distance from Africa; the y-axis is the language’s number of phonemes. The relationship is roughly linear, or at least monotonically decreasing, which the author interprets as showing that language arose only once in the world — in Africa — and dispersed from there, rather than arising multiple times in multiple locations. He makes an analogy to the relationship between genetic diversity and distance from Africa, which is widely interpreted to mean that human beings arose only once — in Africa — and dispersed from there.
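The kind of fit behind such a scatterplot can be sketched with ordinary least squares. The data below are synthetic, generated from an assumed line plus noise; they are not Atkinson's data, and the intercept, slope, noise level, and distance units are all invented for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in for the scatterplot: every number here is an assumption.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
TRUE_INTERCEPT = 30.0   # hypothetical phoneme count at distance zero
TRUE_SLOPE = -0.5       # hypothetical loss of phonemes per unit of distance
NOISE_SD = 3.0          # hypothetical scatter around the line

distances = [rng.uniform(0.0, 30.0) for _ in range(500)]
phonemes = [
    TRUE_INTERCEPT + TRUE_SLOPE * d + rng.gauss(0.0, NOISE_SD)
    for d in distances
]

intercept, slope = fit_line(distances, phonemes)
print("estimated intercept:", intercept)
print("estimated slope:", slope)
```

With about 500 points, the estimated slope should land close to the assumed value of -0.5, which is the sense in which a plot like this can reveal a decreasing trend despite considerable scatter. A negative slope by itself describes the pattern; the interpretation as a single origin is the author's argument, not something the regression establishes.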