“Kyiv” vs “Kiev” — using AI to trace propaganda within Reddit communities, part 2

LEXYR
5 min read · May 17, 2022


In our last article, we talked about memes — atomic units of knowledge and information that can be traced through a population not unlike an infection or a virus. We talked about how tracking certain memes could give us an insight into spheres of geopolitical influence — in particular, the implicit allegiance that is signaled by spelling the name of the Ukrainian capital currently embroiled in the center of an international conflict.

To recap, the key points were:

  1. “Kiev”/“Kyiv” has become a political shibboleth, a meme that signals your allegiances. This fact is recognized by Ukrainian officials and many others.
  2. On the social networking site Reddit, posts submitted to a community lead public opinion there: the community adapts to the content submitted to it, not vice versa. Using a keyword tracking tool like SocialGrep, one can trace the introduction of the “Kyiv” spelling in posts submitted to the site.

While the word “redditor” is widely used to describe a specific culture and a specific group of people, the site itself is anything but united. Inter-community scuffles happen daily, and some subreddits even make it their goal to document “drama”, the conflicts between groups of the site’s users. We highlighted sitewide trends in our previous article, but stopping short of digging into the individual subreddits would be doing ourselves a disservice.

Quantifying dialogue is still an unsolved problem. However, metrics can help us see what simple reading wouldn’t show. What were the accepted spellings among different subreddits? How did they change since February the 24th of 2022, the beginning of military action?

The laws of the English language did not change overnight, nor did most Redditors’ immediate surroundings. The impulse came from online news sources and social media. The spread of “Kyiv” therefore quantifies the public’s response to news: a baseline group of people with no access to mass media would not change their preferred spelling of a foreign city’s name on a whim.

One of the easiest, most straightforward ways to see the effect of an event on a community is to compare the before and after. Take a reference period of two months before the date and compare it to the two months after. How did the proportion of people using “Kyiv” change across the different subreddits?
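The before-and-after comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the article: the function name `kyiv_share_shift`, the 60-day window standing in for "two months", the choice to count comments rather than distinct users, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
import re

EVENT = date(2022, 2, 24)    # start of military action, as stated in the article
WINDOW = timedelta(days=60)  # assumption: "two months" approximated as 60 days

KYIV = re.compile(r"\bkyiv\b", re.IGNORECASE)
KIEV = re.compile(r"\bkiev\b", re.IGNORECASE)


def kyiv_share_shift(comments, event=EVENT, window=WINDOW):
    """comments: iterable of (subreddit, date, text) tuples.

    Returns {subreddit: (share_before, share_after, delta)}, where a share is
    the fraction of spelling-bearing comments that use "Kyiv".
    """
    # Per subreddit and period: [comments using "Kyiv", comments using either spelling]
    counts = defaultdict(lambda: {"before": [0, 0], "after": [0, 0]})
    for subreddit, day, text in comments:
        if event - window <= day < event:
            period = "before"
        elif event <= day < event + window:
            period = "after"
        else:
            continue  # outside both reference periods
        uses_kyiv = bool(KYIV.search(text))
        uses_kiev = bool(KIEV.search(text))
        if uses_kyiv == uses_kiev:
            continue  # neither spelling, or both: no clear preference
        bucket = counts[subreddit][period]
        bucket[0] += uses_kyiv
        bucket[1] += 1

    result = {}
    for subreddit, periods in counts.items():
        (kyiv_before, n_before), (kyiv_after, n_after) = periods["before"], periods["after"]
        if n_before == 0 or n_after == 0:
            continue  # a shift needs data on both sides of the event
        before, after = kyiv_before / n_before, kyiv_after / n_after
        result[subreddit] = (before, after, after - before)
    return result


# Invented sample rows, for illustration only.
sample = [
    ("r/example", date(2022, 1, 10), "Flights to Kiev were cheap"),
    ("r/example", date(2022, 2, 1), "Kyiv is lovely in spring"),
    ("r/example", date(2022, 3, 5), "Thinking of everyone in Kyiv"),
    ("r/example", date(2022, 3, 20), "News from Kyiv today"),
]
shift = kyiv_share_shift(sample)
```

Comments that contain both spellings are skipped here because they do not signal a clear preference; a real analysis might prefer to treat them separately.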

Note how the most extreme shifts are concentrated in Ukrainian and other regional subreddits. Samuel Woolley, a writer for the Brookings Institution, has the following to say on the topic of location’s role in propaganda efforts:

In the last five years there has been a dramatic rise in global concern over illicit socio-political uses of digital technology and citizens’ personal data, and what we are calling “geo-propaganda” is the latest of these tactics to combine personal data with pernicious online advertising tactics in order to spread propaganda. Geo-propaganda refers to the novel use of geofencing and other mechanisms to gather digital location-tracking data and to then use that data in political messaging and advertising across a variety of platforms.

The chart above allows us to divide most subreddits participating in the discussion into three rough groups — based on whether the proportions seen above increased, decreased, or stayed the same.
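As a rough sketch, the three-way split can be expressed as a threshold on the change in the “Kyiv” proportion. The function name `classify_shift` and the 5-percentage-point tolerance are hypothetical; the article does not state where the line between "changed" and "stayed the same" was drawn.

```python
def classify_shift(delta, tolerance=0.05):
    """Label a change in the "Kyiv" share (after minus before, as a fraction).

    `tolerance` is an assumed cutoff: changes smaller than it in either
    direction are treated as "stayed the same".
    """
    if delta > tolerance:
        return "increased"
    if delta < -tolerance:
        return "decreased"
    return "unchanged"


# Invented deltas, for illustration only.
labels = [classify_shift(d) for d in (0.40, -0.12, 0.01)]
```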

The informed group stays up to date with the latest news. Their reaction is based on the information provided to them by governments and verified sources. /r/teenagers is a prime example of such a community — an illustration of the common wisdom of “net-savvy” youth.

The uninformed, or steadfast, participate in discussion of the topic as well. However, their conservative attitude towards most news sources does not allow them to evaluate the gravity of the situation as appropriately and rapidly as the first group. That being said, misinformation is not prevalent among this group either.

The misinformed consume information on Ukraine as well — however, the sources vary from journalism of dubious quality to malicious government propaganda. Subreddits of this group are most likely to engage in behavior such as war crime denial, spreading information confirmed to be false, or amplifying dangerous controversial narratives. Thankfully, the misinformed are the smallest group.

The dominant narratives in the information conflict can be found by further inspecting the kinds of comments made. Below we compare how likely, in percentage terms, comment keywords are to co-occur with mentions of “Kiev” and “Kyiv” respectively.

Left — top words occurring with “Kiev”, compared to “Kyiv”. Right — vice versa.
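One way such a two-column comparison could be produced is to measure, for every word, the share of “Kiev” comments and the share of “Kyiv” comments that contain it, then rank words by the difference. The sketch below makes several assumptions: the function names, the letters-only tokenizer (the article's table evidently kept punctuation and bot names, so its tokenizer was different), and the sample comments are all invented for the example.

```python
from collections import Counter
import re

TOKEN = re.compile(r"[^\W\d_]+")     # assumption: runs of letters only
SPELLINGS = {"kiev", "kyiv"}         # excluded from the ranking itself


def document_frequency(comments):
    """Fraction of comments in which each word appears at least once."""
    counts = Counter()
    for text in comments:
        counts.update(set(TOKEN.findall(text.lower())) - SPELLINGS)
    total = len(comments)
    return {word: n / total for word, n in counts.items()} if total else {}


def distinctive_words(kiev_comments, kyiv_comments, top=5):
    """Return (left, right): words leaning towards "Kiev" and towards "Kyiv".

    Each entry is (word, difference in percentage points).
    """
    kiev_freq = document_frequency(kiev_comments)
    kyiv_freq = document_frequency(kyiv_comments)
    diff = {
        word: kiev_freq.get(word, 0.0) - kyiv_freq.get(word, 0.0)
        for word in set(kiev_freq) | set(kyiv_freq)
    }
    left = sorted((w for w in diff if diff[w] > 0), key=lambda w: (-diff[w], w))[:top]
    right = sorted((w for w in diff if diff[w] < 0), key=lambda w: (diff[w], w))[:top]
    return (
        [(w, round(100 * diff[w], 1)) for w in left],
        [(w, round(-100 * diff[w], 1)) for w in right],
    )


# Invented comments, for illustration only.
kiev_sample = ["kiev regime falls", "the kiev regime"]
kyiv_sample = ["kyiv stands", "kyiv stands strong"]
left, right = distinctive_words(kiev_sample, kyiv_sample, top=1)
```

Using document frequency rather than raw counts keeps one long, repetitive comment from dominating the ranking.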

It is not surprising to see familiar narratives take the lead. The extent of civilian casualties during the conflict has been an active front in the Reddit information battleground, and the prevalence of foreign words suggests propaganda aimed at non-English-language audiences. “Nostupidquestions” in the right column refers to a subreddit seen earlier among the strongest “Kyiv” adopters: an educational community for people who feel out of the loop on settled issues.

An interesting takeaway from the table above is the abundance of formatting symbols and technical terms in the right column. These include “autotldr”, the name of a helper utility designed to process news articles, and combinations of punctuation symbols. This signals that “Kyiv” is mentioned more in comments rich in formatting and in ones made by helper tools like automoderation bots, which suggests institutional support within Reddit for “Kyiv” as an idea.

Another vector of propaganda investigation is so-called topic modelling: machine-powered selection of the chief topics of discussion within a community. By analyzing dominant phrases in the Ukraine discussion, we can see which topics are colored more or less by malicious propaganda efforts.

Visualized topic distributions. The difference in circles’ sizes represents preferences towards spelling.
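Once a topic model has assigned each comment to a topic, the per-topic spelling preference behind a chart like this can be computed directly. The sketch below assumes the topic assignment already exists; it does not implement the topic model itself, and the function name `kiev_share_by_topic`, the topic ids, and the sample comments are hypothetical.

```python
from collections import defaultdict
import re

SPELLING = re.compile(r"\b(kiev|kyiv)\b", re.IGNORECASE)


def kiev_share_by_topic(labelled_comments):
    """labelled_comments: iterable of (topic_id, text) pairs.

    Returns {topic_id: share of spelling mentions that are "Kiev"}.
    Topics with no mention of either spelling are left out.
    """
    tallies = defaultdict(lambda: {"kiev": 0, "kyiv": 0})
    for topic_id, text in labelled_comments:
        for spelling in SPELLING.findall(text):
            tallies[topic_id][spelling.lower()] += 1
    return {
        topic_id: tally["kiev"] / (tally["kiev"] + tally["kyiv"])
        for topic_id, tally in tallies.items()
    }


# Invented topic assignments and comments, for illustration only.
labelled = [
    (1, "Kiev in the headlines"),
    (1, "more reports from Kiev"),
    (2, "Kyiv update"),
    (2, "Kiev or Kyiv?"),
]
shares = kiev_share_by_topic(labelled)
```

A share near 1.0 marks a topic where “Kiev” dominates, and a share near 0.0 one where “Kyiv” does.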

We can see that noticeable preference towards “Kiev” is observed in topics #11 and #34. What are the keyword compositions of these topics?

Keyword compositions of detected topics #11 and #34.

As we can see, they are not English. Spanish, popular in /r/Argentina, dominates the left column — while French places itself on the right, further cementing the geopropaganda angle.

In conclusion, we can clearly see that propaganda has evolved far from its pre-Internet roots. Modern technology offers new, more immersive methods to deliver propaganda to people — and statistical methods can often be used to identify and stop foreign influence on social media.

— Aleph

If Internet propaganda interests you, you can use a good tool to sift through it. SocialGrep is a reliable Reddit search engine with a high level of precision and plenty of useful features, including dataset export and real-time alerts. Sign up now to start your investigations, whether they are amateur OSINT ventures or professional data projects.
