Massive Study Shows How Languages Change

(Inside Science) -- More than 100 years ago, the playwright Oscar Wilde had one of his British characters say that England and America "have everything in common nowadays except, of course, language." It turns out, according to linguists, he was almost right. But lately, the two languages are getting closer.
Languages change over time -- some faster than others. Some reflect changes in the world around them, according to a new paper published by The Royal Society in London. There are universal and historical factors at work, and languages change at varying rates, the scientists found.
The researchers used the Google Books Ngram corpus to monitor word and phrase usage in the past five centuries in eight languages. They drew from 8 million books – roughly 6 percent of all the books ever published, according to Google's own estimates. The books were scanned into a database by Google.
While linguists have always known that rates of change vary, this study, which draws on the gigantic Google database, is by far the largest of its kind.
The researchers were an international group that ironically had its own language difficulties.
The lead author was Søren Wichmann, a Dane working at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. His coauthors were Valery Solovyev, a linguist at Kazan Federal University in the Republic of Tatarstan in Russia, and astrophysicist Vladimir Bochkarev, also at Kazan, who was interested in languages. The work was done at the Kazan linguistics lab.
Research was hampered by the fact that Wichmann did not speak Russian and Bochkarev didn’t speak English.
Wichmann’s wife translated part of the time. Otherwise they used Google’s translator, which was not always useful.
For this study, they delved into written languages, which are more conservative in their expressions, rather than tackle spoken languages for which there is no good record. They looked specifically at how frequently words were used. Each word form counted as one word; for instance "park" and "parked" were counted as two different words.
The process they used is called "glottochronology" by linguists.
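The counting rule described above, in which every word form is its own entry, can be illustrated with a minimal Python sketch. This is not the researchers' actual code: the function name `count_word_forms`, the lowercasing step, and the letters-only tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_word_forms(text):
    """Count every distinct word form; no stemming or lemmatization is applied.

    Assumption: a "word" is a run of letters, and case is ignored.
    """
    # [^\W\d_]+ matches one or more Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(tokens)


sample = "We parked near the park. The park was full, so we parked again."
counts = count_word_forms(sample)

# "park" and "parked" are tallied as two different words.
print(counts["park"], counts["parked"])  # prints: 2 2
```

Because no lemmatization is applied, a heavily inflected language produces many more distinct entries per dictionary word than English does under this kind of count.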
Language Shaped by Culture
“One word which was earlier specialized might take on a broader meaning and can replace the word that had a broader meaning before,” Wichmann said.
Sometimes it is just a matter of fashion; sometimes it is outside events. For instance, the early English word for “dog” was “hound.” Now “hound” is a specific kind of dog. The same thing may be happening in reverse to the word “vodka,” which in some places is replacing “liquor.”
“Any major change in society will change the frequency of words,” Wichmann said.
For the most part, the researchers found, languages change at a similar rate, and that change is usually measured over spans of about half a century unless something intervenes, like a war. When wars come, Wichmann said, vocabulary changes more rapidly as new words like “Nazis” come into the language and people start thinking about things they did not contemplate before hostilities.
During the Victorian era, the height of the British Empire and a very stable time in Britain, the language was fairly steady. With the tumult and chaos of the 20th century, vocabulary changes came more rapidly.
From about 1850 on, British English and American English drifted apart. For the first half of the 19th century the Queen’s English and American English were the same except that the British English lagged behind about 20 years. New words came into the American English lexicon, but only appeared in Britain about 20 years later.
Then, the influence of the mass media began to bring the two languages together starting in 1950. Now, the two languages are far more similar than they were before, Wichmann said.
Challenges in Learning Languages
Ever wonder why some languages are harder for adults to learn than others? The researchers point out that languages contain what linguists call a “kernel lexicon,” meaning a list of words that constitute 75 percent of the written language. If you know those words, you can make out much of the literature. These also are the words least likely to change even as the language morphs.
The kernel lexicon for English is fewer than 2,400 words. If you know them, you can read 75 percent of a typical text. The kernel lexicon for Russian is about 24,000 words. So, even though the whole of the English language has about 600,000 words and Russian has only about a sixth of that, without those roughly 24,000 kernel words, most Russian writing would be largely incomprehensible.
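One way to read the "kernel lexicon" idea is as the smallest set of most frequent word forms whose combined counts reach 75 percent of all the words in a corpus. The sketch below follows that reading; it is an assumption about the definition rather than the paper's exact procedure, and the function name `kernel_lexicon` and the toy counts are invented for illustration.

```python
from collections import Counter


def kernel_lexicon(counts, coverage=0.75):
    """Return the most frequent words that together reach `coverage` of all tokens.

    Assumption: words are added in descending frequency order until the
    running total reaches the coverage threshold.
    """
    total = sum(counts.values())
    kernel = []
    covered = 0
    for word, n in counts.most_common():
        if covered >= coverage * total:
            break
        kernel.append(word)
        covered += n
    return kernel


# Toy frequency table, not real corpus data.
toy_counts = Counter({"the": 50, "of": 25, "cat": 15, "sat": 6, "mat": 4})
kernel = kernel_lexicon(toy_counts)
print(kernel)  # prints: ['the', 'of']
```

Under this reading, a language whose words appear in many inflected forms spreads its usage across more distinct entries, so more of them are needed to reach the same 75 percent.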
"The fact a given word might be used a lot in one period doesn't necessarily mean the word is new," said Brian Joseph, distinguished university professor of linguistics at the Ohio State University in Columbus. For instance, one word now trending in English is "cupcake."
Sometimes words combine, like "labradoodles," he said.
Definitions change too. Some words meant one thing to Shakespeare but mean something else to us, said David Lightfoot, a professor of linguistics at Georgetown University in Washington, D.C. "Scientist" is in the current lexicon, but before the 19th century scientists were called "natural philosophers."
Sometimes the change in wording tells us more than we think it would. In recent years, the use of the word “divorce” has become more frequent than “marry,” Wichmann said.
Perhaps more telling, “information” is replacing “wisdom.”
Joel Shurkin is a freelance writer based in Baltimore. He is the author of nine books on science and the history of science, and has taught science journalism at Stanford University, UC Santa Cruz and the University of Alaska Fairbanks. He tweets at @shurkin.
http://www.insidescience.org/content/massive-study-shows-how-languages-change/2096
Date: 2014-10-12 05:10 pm (UTC)
1. The Oxford dictionary has 600 thousand words; Dahl's dictionary has 200 thousand. Newer ones have on the order of 130-150 thousand. Either way, the ratio is not 6:1. But this is still a very lousy source for statistics, because it is the number of entries in a dictionary, not in the speech of native speakers. The methods for including vocabulary in dictionaries can differ greatly.
2. And what does this have to do with anything, if the analysis was done on text corpora and not on dictionaries?
3. Glottochronology is the selection of basic vocabulary from potentially related languages to estimate the percentage of matches, in order to determine how long ago the languages diverged from a common ancestor. If the authors wrote anything about that, it does not show in the article.
4. Glottochronology works only when a number of strict requirements for selecting vocabulary are met. Counting every word form as a separate lexeme does not meet those requirements at all.
5. On the other hand, it is clear why they counted 24 thousand words needed to communicate in Russian. After all, Pushkin had only 20 thousand in his active vocabulary.
The links may be worth following. I have not yet looked at what is behind them.
But reading what this journalist writes about linguistics is definitely pointless.
Date: 2014-10-12 09:40 pm (UTC)
As I understand it, a word is taken to be a chunk of text from one space to the next space or punctuation mark.
Date: 2014-10-12 06:17 pm (UTC)
the cases, genders and suffixes alone are bad enough
not to mention the pronunciation
Date: 2014-10-12 09:43 pm (UTC)
- a very large number of consonant sounds
- which occur in rather brutal clusters that are atypical of other languages
- and often do not match what is written, typically in voicing and in hardness/softness. "молодежь" is read as мʌлəд'ош. There are other quirks too: the endings -ого and -его are not pronounced as written, nor are words like то, мякий, шибли, сонце, лесница, etc.
- subtleties of assimilation in voicing and softness ("мо горит", "и шести")
- vowel reduction (in the word "молоко" each letter "о" stands for a different sound)
- subtleties of prosody: one and the same sentence can be pronounced in a whole lot of ways, and that affects the meaning
"ТЫ меня предал!" = "YOU betrayed me!" (whereas Vasya did not)
"Ты МЕНЯ предал!" = "You betrayed ME!" (me personally, not our company)
"Ты меня ПРЕДАЛ!" = "You BETRAYED me!" (what a bastard)
"Ты меня предал." = "You betrayed me." (what a pity)
"Ты меня предал?" = "You betrayed me?" (or did you not?)
- the missing dots over the letter ё, which make the reading of "е" unpredictable (полетный vs. заметный, not to mention все/всё)
- mobile stress (ногА, but one lifts нОгу, and one drops something нА ногу)
More could probably be added; this is just off the top of my head.
Date: 2014-10-12 07:13 pm (UTC)
>they used Google's translator
This whole story is very dubious, starting with the use of Google Books Ngram to assess the dynamics of word usage and ending with the involvement in scientific research of people who cannot string two words together in English.