A very interesting blog post from the people at socialflow inspired this little study. The socialflow study analyzed the Twitter outlets of major news providers such as CNN and the NYT to find out whether they share a common audience, how they compare when it comes to being retweeted, and so on. So I thought it would be a good idea to try something similar for German newspapers. Another motivation is the simple fact that media analysis of TV, radio, or newspapers focuses strongly on the demographics of the audience (see screenshot below) but totally neglects the following issues:
I’ve decided to focus on a couple of very general research questions:
The German newspaper ecosystem is quite fragmented: there are many different publishers and thousands of different (daily, weekly) newspapers and magazines. I’ve decided to focus on the following ones:
What we see from the general overview is that news outlets differ quite a lot in the number of active accounts. SPIEGEL has 24 Twitter accounts with a total of almost 500,000 followers. The leading tabloid BILD, despite a huge offline reach of 12 million users, accumulates only 170,000 followers on Twitter.
When we look at how these 118 Twitter accounts are linked with each other, a pattern appears (see figure below). The general norm seems to be a main Twitter outlet (e.g. BILD_News or zeitonline) that is connected to the remaining topic-specific accounts, which themselves are all connected to the other Twitter outlets of the same publishing house. Twitter outlets of different publishers are not connected with each other.

Comparison of Followers, Messages and Friends

Looking at the distribution of followers, I found that more than 70% of the analyzed accounts have fewer than 10,000 followers. Among the top 20 outlets by followers, it is striking that more than 8 belong to SPIEGEL. This publisher seems to dominate the field.
If we look at the number of tweets produced, we again see that around 70% of all accounts have generated fewer than 10,000 tweets during their existence. In the top 20 we find extreme examples like focussport or focusonline, which produce up to 90 tweets per day. At such a frequency I ask myself how the followers of these accounts cope with the flood of tweets.
Looking at the number of followees, we find the most surprising result. It seems that only the account of TAZ (tazgezwitscher) follows its readers back and thus at least has the potential to read what readers have to say. This brings us to the question: if interaction with the readership is a given in social media, how do these outlets actually interact with their readers?
To measure how much these outlets interact with their readers, I collected all tweets of each account and counted how often they refer to somebody using the @ sign. I distinguished between how often they refer to their own accounts and how often they refer to actual readers. The results are rather surprising: out of 270,000 tweets, only 13,000 actually interact with somebody. Of these, almost 10,000 refer to the publisher's own accounts (e.g. when BILD_NEWS refers to BILD_Sport). So only 3,000 tweets actually interact with readers, which is a meager 1%. Interaction with readers is taking place at a shockingly low level.
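The counting described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the study: the account list, the function names, and the rule that a tweet mentioning both an own account and a reader counts as reader interaction are all assumptions of mine.

```python
import re

# Hypothetical set of outlet screen names, lowercased (the study tracked 118).
OWN_ACCOUNTS = {"bild_news", "bild_sport", "zeitonline"}

# Crude pattern for @mentions; it would also match the domain part of an
# e-mail address, which a real analysis should filter out.
MENTION = re.compile(r"@(\w+)")


def classify_tweet(text, own_accounts=OWN_ACCOUNTS):
    """Return 'none', 'own', or 'reader' depending on whom the tweet mentions."""
    mentions = {name.lower() for name in MENTION.findall(text)}
    if not mentions:
        return "none"
    # Assumption: any mention outside the outlet list counts as a reader.
    if mentions - own_accounts:
        return "reader"
    return "own"


def interaction_counts(tweets):
    """Count tweets per category."""
    counts = {"none": 0, "own": 0, "reader": 0}
    for text in tweets:
        counts[classify_tweet(text)] += 1
    return counts


# Made-up sample tweets, for illustration only.
sample = [
    "Bundesliga results are in, more at @BILD_Sport",
    "@some_reader thanks for the hint, we have corrected the article",
    "New article on the euro crisis",
]
counts = interaction_counts(sample)
reader_share = counts["reader"] / len(sample)
```

With the figures quoted above, the same share works out to 3,000 / 270,000, or roughly 1.1%.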
There are theories about a connected online readership, speculating that the social media readers of an account are connected to each other and exchange and discuss content online. To find out whether such a structure is emerging, I analyzed the account of fr_online as an example and collected all of its ~9,000 readers. Below you see a spring layout of 9,000 nodes in Gephi. You find the typical core-periphery structure: 10% of readers do not have any connections to other readers, 50% of readers have fewer than 15 links, and finally there is a core of highly connected readers. Among these highly connected readers we actually find commercial or celebrity accounts such as: ntvde, derfreitag, Calmund, tagesspiegel_de, Piratenpartei, handelsblatt, hronline, spdde, …
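The shares quoted above come down to a degree count over the reader network. Here is a small sketch of that count, assuming the follow relationships are available as a list of reader pairs; treating each pair as one undirected link, and counting isolated readers as part of the "fewer than 15 links" group, are my assumptions, as are all names.

```python
from collections import Counter


def degree_summary(nodes, edges, threshold=15):
    """Share of isolated nodes and of nodes with fewer than `threshold` links.

    `edges` is assumed to hold each connected pair of readers once.
    """
    degree = Counter()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    isolated = sum(1 for v in nodes if degree[v] == 0)
    # Note: this group includes the isolated nodes.
    below = sum(1 for v in nodes if degree[v] < threshold)
    return {"isolated_share": isolated / n, "below_threshold_share": below / n}


# Toy network with four readers; reader "d" has no links at all.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("a", "c")]
summary = degree_summary(toy_nodes, toy_edges, threshold=2)
```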
To measure how the accounts differ in reader engagement, I collected all retweets for all tweets of all accounts and created two ratios: retweets per tweet and retweets per follower.
Looking at all accounts, I found that almost 90% got less than one retweet per tweet on average. This is still a respectable result if we think of the findings of Romero et al., who found that users retweet only one in 318 links. If we look at the top 20 accounts with the highest retweets-per-tweet ratio, we see that the breaking-news account of SPIEGEL comes out on top with 10 retweets per tweet on average. Similar results are only achieved by the main accounts of ZEIT and TAZ. On the other end of the spectrum we find accounts like focuspanorama (11,000 messages / 14 retweets) or focussport (95,000 messages / 3 retweets).
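Both engagement ratios are plain divisions. The sketch below only spells them out with the rounded figures quoted in this post; the function names are mine and the zero guards are an assumption about how empty accounts should be handled.

```python
def retweets_per_tweet(retweets, tweets):
    """Average number of retweets each tweet of an account received."""
    return retweets / tweets if tweets else 0.0


def retweets_per_follower(retweets, followers):
    """Total retweets relative to the size of the direct audience."""
    return retweets / followers if followers else 0.0


# Rounded figures quoted in the post.
focuspanorama = retweets_per_tweet(14, 11_000)         # about 0.0013
focussport = retweets_per_tweet(3, 95_000)             # about 0.00003
bild_bundesliga = retweets_per_follower(1_000, 40_000)  # 0.025
```

Seen this way, focuspanorama received roughly one retweet for every 800 tweets it sent.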
Regarding retweets per follower, 79 accounts had a ratio of less than 0.1, which means they got less than one retweet for every 10 followers. Among the top 20, the highest ratio, 1 retweet for every 3 followers, was achieved by tazgezwitscher. It seems that this account has the most engaged readership, which helps it spread its news well beyond its direct readership. Among the accounts with the lowest audience engagement we find BILD_Bundesliga (with 40,000 followers and 1,000 retweets) and SPIEGEL_Rezens (with 30,000 followers and 300 retweets). We can speculate that sports-related content in particular is not retweeted that often because soccer results are simply consumed and not shared.

Exemplary analysis of the engaged readership of one account

To see the structure of the readers who have retweeted at least one tweet from an account, I collected such users for the account of fr_online, laid them out with Gephi, and applied its modularity-based community detection algorithm. The results below show that readers actually cluster into different communities, which differ in their political orientation or interests.
Knowing that retweets yield an extended readership (see below), one goal was to get a glimpse of what such an extended readership might mean for the reach of an account.
To get an idea of how the extended readership helps to boost an account's reach, I collected all tweets and the respective retweets of these accounts. For each retweet I looked up how many followers the retweeting reader had. By simply adding up the followers of every reader who retweeted this account, you get the potential maximum extended audience that might have been reached through these retweets. I say potential maximum because I am not taking into account that the people who retweeted might have a shared audience (e.g. imagine reader5 and reader6 being the same person in the figure above).
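A short sketch of this calculation, with made-up readers. I sum follower counts once per retweet here, which is one possible reading of the description; the second function shows what a deduplicated count would look like if the follower lists themselves were available. All names and data are hypothetical.

```python
def potential_extended_audience(retweeters, follower_counts):
    """Upper bound: add up the follower count of the reader behind each retweet."""
    return sum(follower_counts[user] for user in retweeters)


def unique_extended_audience(retweeters, follower_sets):
    """Deduplicated reach: every follower is counted once, however often reached."""
    reached = set()
    for user in set(retweeters):
        reached |= follower_sets[user]
    return len(reached)


# Toy data: reader1 retweeted twice, and both readers share follower "u3".
follower_sets = {
    "reader1": {"u1", "u2", "u3"},
    "reader2": {"u3", "u4"},
}
follower_counts = {user: len(f) for user, f in follower_sets.items()}
retweeters = ["reader1", "reader2", "reader1"]  # one entry per retweet

upper_bound = potential_extended_audience(retweeters, follower_counts)
unique_reach = unique_extended_audience(retweeters, follower_sets)
```

In the toy data the upper bound is 8 while only 4 distinct people are reached, which is exactly why the summed figure should be read as a maximum.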
We notice that the 27,000 retweets of zeitonline have generated an extended audience of 4.2 million readers, and in the case of tazgezwitscher 15,000 retweets resulted in more than 2 million additional readers. What we can take away from this calculation is that retweets really change the distribution game: while zeitonline has approximately 80,000 followers, it has managed to get some of its news seen by up to 4.2 million people, which is a multiplication of ~50x. I think this shows the true power of social media.
When drilling down into the data, we found readers who are especially valuable for an account because they have a high number of followers themselves and thus serve as huge multipliers for the audience. We find that three cases emerge quite often:
Regarding these diffusion patterns, I asked myself whether we can compute something like a two-step flow of information, that is, the percentage of retweets made by people who did not see the content on the original account itself but received it through an intermediary. We defined the two-step-flow ratio as:
The number of people who have retweeted an account and follow it directly / the total number of people who have retweeted the account.
The ratio can be as high as 1 if everybody who retweeted the account follows it directly, and as low as 0 if nobody who retweeted the account follows it directly. We ordered the accounts by the lowest ratio first, and we see that some accounts like zeitonline_wir achieve a ratio of less than 0.5, which means that more than half of the people who retweeted them were not directly following the account. There can be two explanations for such a low ratio: a) people received the retweet from a broker or middleman and then retweeted it (which is in favor of the two-step-flow hypothesis), or b) people simply saw the article on the website and decided to tweet it from there. Since we didn’t analyse this in detail, we can only guess about the percentages, but it would definitely be worth its own analysis.
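The ratio defined above is a set intersection divided by a set size. A minimal sketch, assuming the retweeters and the followers of an account are available as collections of user IDs (the names and the toy data are mine):

```python
def two_step_flow_ratio(retweeters, followers):
    """Share of distinct retweeters who follow the account directly.

    Returns None when nobody retweeted the account, since the ratio is
    undefined in that case.
    """
    retweeters = set(retweeters)
    if not retweeters:
        return None
    direct = retweeters & set(followers)
    return len(direct) / len(retweeters)


# Toy example: four people retweeted, two of them follow the account.
ratio = two_step_flow_ratio(
    retweeters=["a", "b", "c", "d"],
    followers=["a", "b", "x"],
)
```

A value of 0.5, as in this toy example, means every second retweeter reached the content by some route other than following the account.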
The final step of this analysis was to find out how many readers the outlets had in common (see orange people in the graphic below). The common-readers measure has a maximum value of 0.5, e.g. when each account has 100 followers and all of them follow both accounts (100/200), and a minimum of 0 when the accounts have no followers in common.
We computed this ratio for each combination of accounts and displayed the results in a symmetric matrix (see image below). We additionally grouped the accounts in the matrix by publisher (see blue boxes). The higher the ratio, the greener the cell; red cells indicate lower values.
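The pairwise computation can be sketched like this. I am assuming the measure is the number of shared followers divided by the sum of both follower counts, which is what the 100/200 example suggests; the outlet names and follower sets are invented.

```python
from itertools import combinations


def common_readers(a, b):
    """Shared followers divided by the summed follower counts (range 0 to 0.5)."""
    total = len(a) + len(b)
    return len(a & b) / total if total else 0.0


def shared_audience_matrix(followers):
    """Symmetric nested dict of the measure for every pair of accounts."""
    names = sorted(followers)
    # The diagonal stays None: comparing an account with itself is not meaningful.
    matrix = {n: {m: None for m in names} for n in names}
    for x, y in combinations(names, 2):
        value = common_readers(followers[x], followers[y])
        matrix[x][y] = matrix[y][x] = value
    return matrix


# Toy follower sets for four hypothetical outlets.
followers = {
    "outlet_a": set(range(100)),
    "outlet_b": set(range(100)),         # identical audience to outlet_a
    "outlet_c": set(range(50, 150)),     # half of its audience overlaps with outlet_a
    "outlet_d": set(range(1000, 1100)),  # no overlap with anyone
}
matrix = shared_audience_matrix(followers)
```

Here `matrix["outlet_a"]["outlet_b"]` reaches the maximum of 0.5, while `outlet_d` scores 0 against every other outlet.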
What we see in this visualization is that a common readership emerges especially among accounts of the same publisher (e.g. Spiegel_eil, Spiegel_news, Spiegel_reise…). People who like SPIEGEL thus very often follow its other accounts as well. This pattern emerges even more clearly when we group the shared audience by publisher (below). What really stands out is that the tabloid BILD has an audience that is very different from the other audiences. On the other hand, “intellectual” newspapers that are established in social media, such as ZEIT or SPIEGEL, seem to share a relatively large audience (~8%).
If we highlight the shared audience that is three standard deviations higher than the average value (0.03), we also note that there are certain accounts that do not belong to the same publisher but have a very large shared audience (green cells in the matrix below).
Since the matrix above is not really good at showing the structure that emerges in the data, we simply visualized the data as a network, connecting the accounts that share an audience. The line thickness was chosen according to the percentage of shared audience (see below).
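The highlighting rule can be sketched as a simple cutoff. I am assuming "three deviations" means three standard deviations above the mean over all account pairs, and I use the population standard deviation; the pair names and values below are invented for illustration.

```python
from statistics import mean, pstdev


def highlight_pairs(pair_values, k=3.0):
    """Return the cutoff (mean + k standard deviations) and the pairs above it."""
    values = list(pair_values.values())
    cutoff = mean(values) + k * pstdev(values)
    highlighted = {pair: v for pair, v in pair_values.items() if v > cutoff}
    return cutoff, highlighted


# Toy data: 19 pairs with a small shared audience and one clear outlier.
pair_values = {("outlet_a", f"outlet_{i}"): 0.01 for i in range(19)}
pair_values[("outlet_x", "outlet_y")] = 0.30

cutoff, highlighted = highlight_pairs(pair_values)
```

Note that with very few pairs no value can exceed such a cutoff at all, so the rule only makes sense on the full matrix of account combinations.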
In this visualization a number of interesting observations emerge:
We have arrived at the end of our little exploratory analysis. A couple of takeaways are:
That is it for today. I am excited to hear your comments.
I am presenting this small analysis tomorrow at the SGKM conference (on journalism, social media and communication) and am excited to hear what the audience has to say.