The Speech Learning Model

Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. Speech perception and linguistic experience: Issues in cross-language research, 233-277.

I promised a few weeks ago that I’d write about Jim Flege’s Speech Learning Model next. Dutifully, I read his chapter in Winifred Strange’s Speech Perception and Linguistic Experience, made notes in the margin, carried the paper around with me and looked at it again and again over the course of a week. And then I had to renew my visa, move to a new apartment, apply to the ethics board at Sophia for my next study, work on a couple of programming projects, make some figures for a paper on Chinese schools in Japan… let’s just say I haven’t exactly had a surplus of free time recently. The really horrible thing about doing what you love is that you never have a shortage of things to do.

But! My task scheduler is making me feel increasingly guilty for not writing anything in an entire month, when I had scheduled no less than two updates per week. In order to appease my digital deity and the high priests of GTD, here’s the homework I assigned myself, three weeks late. Mea culpa.

A model of adult acquisition

By now the whole world knows that kids learn languages faster, or better, or more fully than adults do. Or at least they think they know that. I’m still not entirely convinced, myself. It’s pretty hard to test the hypothesis in a principled way. Childhood and adult acquisition of second languages follow totally different processes, and I don’t mean that in a high-minded, theoretical sense. I mean that the specific life circumstances are so wildly different that the comparison seems almost silly. If you want to see if kids are really better at learning languages than adults, you’d need to put a 30-year-old man in kindergarten and have him hang around with 6-year-olds and loving caregivers all day; then you’d need to saddle a six-year-old with a mortgage, two kids, a spouse, and a joyless salaried position working unpaid overtime and washing his brain with barley and hops every evening to unwind.

So just from the outset, let’s say that I’m a bit skeptical about a model that sets out to “account for age-related limits on the ability to produce L2 vowels and consonants in a native-like fashion” (p.237). I don’t think Flege is wrong, of course; that’d be a mighty bold claim coming from a first-year doctoral student. I just don’t think we’re at the point where we can really make a good working model of age differences in acquisition, because we simply can’t factor out all the innumerable other factors, like learning environment, motivations, time constraints, and so on. It would be an ambitious project even now, but this paper came out twenty years ago. As a guiding framework, I think SLM does a lot of good things. Even if none of its predictions turns out to be completely accurate, it still charts out the territory a bit, ties the threads of disparate research together, and motivates new research.


Let’s dig into the real nuts and bolts of the model. The table on p. 239 gives a quick rundown of SLM, enumerating four postulates and seven hypotheses that follow from them. The rest of the paper goes about backing up those statements, pretty meticulously, with research on all kinds of topics in second language phonology. I don’t really want to test the mettle of York Press’ legal team, so I won’t reprint the table here, but you can get the paper from Jim Flege’s web page.

The Speech Learning Model starts from four base assumptions. First, that the mental processes used in learning the sounds of one’s first language don’t just dry up at puberty; they’re available throughout life, and we can use them to learn new sounds. Second, that we make mental categories for sounds (phonemes), specific to each language*. Third, those phonemes evolve over the lifetime of a speaker to reflect all of the different sounds (phones) that get identified as belonging to each category. And finally, bilinguals** store all of these phonemes in the same mental space, and try to keep the categories separated between their various languages. In short, we have a system for categorizing sounds that doesn’t (completely) break apart over time, and it develops throughout life, maintaining contrasts that are important to the languages we use.

That seems to map pretty well to my own intuitions. I have a very different perspective on the sounds of English after having learned a bunch of Japanese, and (mostly because of the line of work I’ve chosen for myself) I’m constantly re-evaluating and changing how I think about the sounds of English. If it were really true that the phonological system is totally set around the first year of life, then I shouldn’t be able to produce and distinguish French front rounded vowels. Even my English sound system is clearly changing, since just last week I figured out what exactly is meant by ‘dark L’ and how my own native dialect uses it differently from most other English speakers. But I want to be cautious here: mapping well to my intuitions doesn’t mean the model is accurate. I want a model that reflects reality, and appealing to my intuitions just means that it’s convincing, not necessarily right.


The hypotheses in SLM are drawn pretty directly from observations in the research, and Flege backs them up in detail. I get the sense that they’re not meant to directly follow from the postulates of the SLM; rather, Flege looked at decades’ worth of data on second language phonology, generated possible explanations for the phenomena he saw, and developed the SLM from there. That seems like a more honest way to go about it, in my opinion.

Flege writes that ‘sounds in the L1 and L2 are related perceptually to one another at a position-sensitive allophonic level’ (p. 239). Let’s flesh that out using his example, Japanese listeners’ perception of the /l/ and /r/ phonemes of English. Contrary to popular belief, it’s actually not the case that Japanese people are always bad at distinguishing /l/ and /r/. Rather, specific phonetic implementations of those sounds (allophones) are more difficult than others. Japanese listeners are pretty good at hearing /l/ and /r/ differences at the beginning of a word or between vowels, and they’re really good at differentiating them at the end of a word. But most Japanese listeners are down under 60% accuracy—almost chance level—when the /l/ or /r/ follows another consonant, like in ‘blue’ vs. ‘brew’. I happen to think (and will probably actually prove in my next experiment) that devoiced /l/ and /r/ are the hardest sounds for Japanese listeners, so they should have the biggest problems differentiating words like ‘clown’ and ‘crown’, or ‘play’ and ‘pray’. So when Flege says that the L1 and L2 sound systems are connected at the level of specific phonetic sounds rather than phonemic categories, I’m right on board with him. I’ll go a step further and say that I think you can explain a pretty wide range of L2 phonology just by looking at the acoustic differences between specific implementations of sounds—the acoustic differences between devoiced /l/ and /r/ are mighty subtle.
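To put a number on ‘almost chance level’, here’s a rough sketch of an exact binomial test. It assumes a two-choice task (hear a word, pick ‘blue’ or ‘brew’), so pure guessing gives 50%. The function name and the trial counts are made up for illustration; none of them come from Flege’s chapter or from any real data set.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value under the 'pure guessing' hypothesis.

    Adds up the probability of every outcome that is no more likely
    than the observed number of correct answers.
    """
    probs = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small tolerance so outcomes tied with the observed one are included.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical listener scoring 60% correct, with different test lengths.
    for trials in (20, 100, 500):
        correct = round(0.6 * trials)
        p = binomial_two_sided_p(correct, trials)
        print(f"{correct}/{trials} correct: p = {p:.4f}")
```

The point of the sketch is that 60% on a short test can’t be told apart from guessing, while 60% over several hundred trials can, so how close to chance a score really is depends on how many items each listener heard.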

From here we look at how new categories form. Flege writes that a learner needs to hear at least some of the ‘phonetic differences’ (I’d prefer to say acoustic differences) between an L2 sound and the closest L1 sound before a new category can form. From there it follows that the greater the differences, the easier it will be to differentiate sounds, so English speakers are unlikely to mistake the /x/ sound (like ‘Loch’ in Scots or ‘Bach’ in German) for an /h/ sound, even though we don’t usually use /x/. It’s the really similar sounds, like Japanese し /ɕi/ versus the English ‘shi’ /ʃi/ that trip people up. That gets at the fifth hypothesis, that a new sound will be sort of ‘slotted into’ an existing phonemic category if the sounds are similar enough. This also accounts for Japanese listeners’ trouble with /l/ and /r/ from a different perspective; since they don’t usually need to attend to the part of the acoustic signal that differentiates the two sounds (the 3rd formant frequency), the sounds are very similar from their perspective.
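Here’s a toy sketch of that last idea: two sounds get ‘slotted into’ one category when the listener’s weighted acoustic distance between them falls under some threshold. This is my own illustration, not Flege’s formalism. The formant values are round-number assumptions rather than measurements, and the cue weights and the threshold are entirely hypothetical.

```python
from math import sqrt

# Illustrative formant values in Hz (assumed round numbers, not measured data).
ENGLISH_R = {"F2": 1100.0, "F3": 1600.0}
ENGLISH_L = {"F2": 1200.0, "F3": 2800.0}

# Hypothetical cue weights: how much each listener attends to each formant.
ENGLISH_LISTENER = {"F2": 1.0, "F3": 1.0}
JAPANESE_LISTENER = {"F2": 1.0, "F3": 0.01}  # assumed to barely attend to F3


def perceptual_distance(a: dict, b: dict, weights: dict) -> float:
    """Weighted Euclidean distance between two sounds over the listed cues."""
    return sqrt(sum(w * (a[cue] - b[cue]) ** 2 for cue, w in weights.items()))


def same_category(a: dict, b: dict, weights: dict, threshold: float = 300.0) -> bool:
    """Treat two sounds as one category if they are perceptually close enough.

    The threshold is an arbitrary illustrative value.
    """
    return perceptual_distance(a, b, weights) < threshold


if __name__ == "__main__":
    for name, weights in (("English", ENGLISH_LISTENER), ("Japanese", JAPANESE_LISTENER)):
        d = perceptual_distance(ENGLISH_R, ENGLISH_L, weights)
        merged = same_category(ENGLISH_R, ENGLISH_L, weights)
        print(f"{name} listener: distance = {d:.1f}, same category = {merged}")
```

The acoustic signal is identical for both listeners; only the weight on F3 changes, and that alone is enough to collapse the two sounds into one category in this sketch.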

H4 is where I part ways a bit. Flege writes that “The likelihood of phonetic differences between L1 and L2 sounds, and between L2 sounds that are noncontrastive in the L1, being discerned decreases as AOL [Age of Learning] increases.” On the face of it, yes, adults are less likely to discern the differences, so I agree. I just don’t think it’s necessarily a matter of neural plasticity or maturation, or at least not entirely. Little kids can get away with just shouting random sounds and repeating phrases they like until their parents go crazy, and they get absolutely gobs and gobs of input from speakers who are trying to adapt and make their speech easy to hear. Probably neural plasticity plays a part, and probably hormonal differences play a part (see Michael T. Ullman’s D/P Model), but I’m not convinced that all adults are inherently less capable of differentiating new sounds. I’ve taught Japanese retirees to differentiate /b/ and /v/, /f/ and /h/, /r/ and /l/, and all the rest, and they don’t seem to be especially handicapped, given that those students studied for perhaps two or three hours a week max. Little kids get a good 10 or 12 hours of practice a day, so no wonder they’re good at it.

The sixth hypothesis seems like the most intriguing seed for a research proposal, though. Bilinguals, Flege says, will tend to differ from monolinguals, either because their systems try to maintain contrasts more strongly or because their categories are based on attending to different parts of the acoustic signal. Flege uses the example of voice onset time in Spanish-English bilinguals, who pre-voice sounds like /b/ in Spanish more strongly than monolingual Spanish speakers do (p. 259). That makes for a stronger contrast with the English /b/ sound, which is really more like a /p/ that lacks aspiration (a puff of air). Listen to French and Spanish speakers pronounce /b/, /d/, and /g/ sounds versus /p/, /t/ and /k/, and you’ll hear what I mean. This makes me wonder if bilingual Spanish speakers dentalize their /t/ and /d/ sounds more than monolinguals, in order to make a bigger contrast between English and Spanish consonants.

My take on it

Although I’m not quite on board with the mainstream view on age differences in acquisition, I think the SLM does a good job of honestly assessing and synthesizing the research up to that time. Probably 80% of the paper is given over to summarizing specific research that leads to the SLM’s conclusions, so it’s a really ‘crunchy’ paper. Although I don’t agree with every single interpretation of the data, the overall picture is helpful and worth the read. In the back of my head, I was constantly thinking of ‘yeah-but-what-abouts’ and ‘I-wonder-what-the-result-would-be-ifs’. If you take the position that a good model should generate research questions, then SLM deserves its reputation as a big contribution to second language phonology.

*Fellow Sophia doctoral student Tomohiko Ooigawa pointed out to me that Flege, coming from a psychology background, uses the term phonetic category, but those of us who come at it from a more traditional, Chomsky-and-Halle linguistics perspective say phoneme. I’ma use phoneme, because that’s just how I roll. It’s also a lot shorter.

**Bilingual in the literature almost never means ‘two languages’; it’s just shorthand for anyone who speaks more than one. I think it’s kind of indicative of the monolingual bias in the English-speaking world that we use a term that includes most people in the world as if bilinguals were a special case. That’s not a criticism of Flege, mind you; it’s just how the discussion has been framed by the field.

I haven’t decided which paper I’m going to look at next, but I’m thinking I’ll step off the heavy theory track for a bit and look at something a little bit less abstract. Maybe something to do with lip-reading, since it’s related to the video research that I’m working on.

Thanks for reading!


