What an incredible article by Eugene Sadko, unmissable if you’re even remotely curious about the Cyrillic alphabet and Slavic languages. It’s a fascinating text analysis of how they work and how much they differ from English, with plenty of visuals to show just how much.
As Eugene mentions, he paints a good portrait of the Russian language. If you’re not yet convinced to dive deeper into the topic, here’s why you should consider learning more about Cyrillic: it’s probably bigger than you think. Here are its top languages, and how many speakers they have:
- Russian (260 million)
- Ukrainian (50 million)
- Serbo-Croatian (21 million)
- Bulgarian (9 million)
- Belarusian (9 million)
And that’s barely scratching the surface of Cyrillic. The alphabet as a whole is used by more than 100 languages, with more than 300 million speakers combined. So there’s real demand for high-quality Cyrillic typefaces, and for very good reason.
Eugene’s thorough analysis yields a lot of interesting stats on the overall composition of the Russian language:
The letters о, е, а, и, н make up almost half of all texts in Russian
Russian-language texts have 40 times less uppercase than lowercase letters and they are used in special cases: At the beginning of sentences, in proper names, abbreviations, and all caps headings (there is no Title Case in Russian). Therefore, their distribution in texts is different.
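For anyone who wants to check numbers like these on their own texts, here is a minimal sketch of a letter-frequency count. It is not Eugene’s method (the article’s tooling isn’t described here); the function name `letter_stats` and the sample sentence are my own assumptions, and a one-line sample says nothing about Russian as a whole.

```python
from collections import Counter

def letter_stats(text: str):
    """Return (share of each letter, share of uppercase letters) for `text`.

    Hypothetical helper: letters are compared case-insensitively for the
    per-letter shares, and anything that is not a letter is ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}, 0.0
    counts = Counter(ch.lower() for ch in letters)
    # most_common() sorts from the most to the least frequent letter
    shares = {ch: n / total for ch, n in counts.most_common()}
    upper_share = sum(ch.isupper() for ch in letters) / total
    return shares, upper_share

# Toy sample only; swap in a large Russian text for meaningful numbers.
sample = "Он сказал, что завтра не придёт. Она не ответила."
shares, upper_share = letter_stats(sample)
top_five = list(shares.items())[:5]
print(top_five, round(upper_share, 3))
```

Summing the shares of о, е, а, и, н from a large enough text is the way to compare against the "almost half" figure quoted above.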
But it gets even more fascinating! Eugene ran a frequency analysis on Russian texts and got a whopping number: just 100 words make up 37% of the language. And eight single-letter words (и, в, с, я, а, к, у, о) cover 13% of the National Corpus.
One fun fact I didn’t know is how different Russian is when it comes to title case: capital letters account for only 2.6% of the language, and capitalization follows different rules than the ones we tend to use in most Latin-script languages.
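The word-level figures can be approximated the same way. The sketch below assumes a plain-text corpus and a simple regex tokenizer; both function names are hypothetical, and the tokenization will not match whatever the National Corpus uses, so expect different numbers.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # [^\W\d_]+ matches runs of Unicode letters, Cyrillic included
    return re.findall(r"[^\W\d_]+", text.lower())

def top_n_coverage(text: str, n: int) -> float:
    """Share of all word tokens taken up by the n most frequent words."""
    words = tokenize(text)
    if not words:
        return 0.0
    covered = sum(count for _, count in Counter(words).most_common(n))
    return covered / len(words)

def single_letter_share(text: str) -> float:
    """Share of all word tokens that are one letter long."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(len(w) == 1 for w in words) / len(words)
```

On a real corpus, `top_n_coverage(corpus, 100)` is the number to compare with the 37% above, and `single_letter_share(corpus)` with the 13%.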
Test words for early typeface drafts may also be improved if one takes into account the prevalence of letters. For example, a ten-letter «Нобельфайк» represents 47% of Russian-language texts and nearly all types of graphemes in the Russian alphabet. Not bad for a single word. But «Нобервудлайк» covers 60% of the language, all forms, ascenders and descenders, being just two letters longer.
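If you want to score your own test words this way, one reading of "represents 47% of Russian-language texts" is: the share of all letters in a corpus that also appear in the word. That interpretation, the function name `letter_coverage`, and the toy corpus are my assumptions, not something the article spells out.

```python
def letter_coverage(word: str, corpus: str) -> float:
    """Share of the letters in `corpus` that belong to the letter set of `word`.

    Hypothetical helper: comparison is case-insensitive, and ё is treated
    as its own letter rather than folded into е.
    """
    word_letters = {ch for ch in word.lower() if ch.isalpha()}
    corpus_letters = [ch for ch in corpus.lower() if ch.isalpha()]
    if not corpus_letters:
        return 0.0
    hits = sum(ch in word_letters for ch in corpus_letters)
    return hits / len(corpus_letters)

# Toy corpus only; a large Russian text is needed for a fair comparison.
corpus = "Он сказал, что завтра не придёт. Она не ответила."
for candidate in ("Нобельфайк", "Нобервудлайк"):
    print(candidate, round(letter_coverage(candidate, corpus), 2))
```

Coverage alone doesn’t capture the second half of the claim (forms, ascenders and descenders), which still needs a designer’s eye.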
More alphabet and language dissections like this are always welcome ✍️