
Audience analysis of major Twitter news outlets


A very interesting blog post from the people at SocialFlow was the inspiration for this little study. The SocialFlow study analyzed the Twitter outlets of major news providers such as CNN and the NYT to find out whether they have a common audience, how they compare when it comes to being retweeted, and so on. So I thought it would be a good idea to try something similar for German newspapers. Another motivation is the simple fact that media analysis of TV, radio and newspapers focuses strongly on the demographics of the audience (see screenshot below) but totally neglects the following issues:

  • Social media as a medium (incl. Twitter, Facebook etc.) is not analyzed at all (how do accounts compare on their followers, friends, tweet content and frequency …)
  • The readers’ relationships with each other (is there a connected audience?)
  • How the readership is extended by sharing functions such as retweets (how do stories get passed along, which ones are the most popular …)

A screenshot from ma-reichweite.de

Research Questions

I’ve decided to focus on a couple of very general research questions:

  • How many outlets does each publisher have and how are they connected with each other?
  • How do accounts compare regarding their Followers, Friends and Messages?
  • How does user engagement in terms of retweets differ between the outlets?
  • How do retweets help to reach a wider audience?
  • Is there a shared audience between those accounts and publishers?


Since the German newspaper ecosystem is quite fragmented, there are quite a few publishers and thousands of different (daily or weekly) newspapers and magazines. I’ve decided to focus on the following ones:


What we see from the general overview is that news publishers differ quite a lot in the number of active accounts. SPIEGEL has 24 Twitter accounts with a total of almost 500,000 followers. The leading tabloid BILD, despite having a huge offline reach of 12 million people, accumulates only 170,000 followers on Twitter.

Structure of the Twitter Outlets among each other

When we look at how these 118 Twitter accounts are linked with each other, a pattern appears (see figure below). The general norm seems to be a main Twitter outlet (e.g. BILD_News or zeitonline) that is connected with the remaining topic-specific accounts, which in turn are all connected to the other Twitter outlets of the same publishing house. Twitter outlets of different publishers are not connected with each other.

Comparison of Followers, Messages and Friends

Looking at the distribution of followers, I found that more than 70% of the analyzed accounts have fewer than 10,000 followers. Among the top 20 outlets by followers it is striking that more than 8 belong to SPIEGEL. This publisher seems to dominate the field.

Overview of Tweets

If we look at the number of tweets produced, we see that again around 70% of all accounts have generated fewer than 10,000 tweets during their existence. In the top 20 we find extreme examples such as focussport or focusonline, which produce up to 90 tweets per day. At such a frequency I wonder how the followers of these accounts cope with the flood of tweets.

Followees distribution

Looking at the figure of followees we find the most surprising result. It seems that only the account of TAZ (tazgezwitscher) follows its readers back and thus at least has the potential to read what readers have to say. This brings us to the question: if interaction with the readership is a given in social media, how do these outlets actually interact with their readers?

 Interaction with readers

To measure how much these outlets interact with their readers, I collected all tweets of each account and counted how often they refer to somebody using the @ sign. I distinguished between references to the publisher’s own accounts and references to actual readers. The results are rather surprising: out of 270,000 tweets only 13,000 actually interact with somebody. Of these, almost 10,000 refer to the publisher’s own accounts (e.g. when BILD_NEWS refers to BILD_Sport). So only about 3,000 tweets actually interact with readers, which is a meager 1%. Interaction with readers is taking place at a shockingly low level.
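The counting step could look roughly like the following sketch. It is not the script used for this study: the function name, the regular expression and the three categories are my own assumptions about how such a count might be done in Python.

```python
import re

# Assumption: a mention is "@" followed by up to 15 letters, digits or
# underscores, and not preceded by a word character (to skip e-mail addresses).
MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9_]{1,15})")

def classify_mentions(tweets, own_accounts):
    """Count tweets with no mention, only own-account mentions, or reader mentions."""
    own = {name.lower() for name in own_accounts}
    counts = {"no_mention": 0, "own_only": 0, "reader": 0}
    for text in tweets:
        handles = {h.lower() for h in MENTION.findall(text)}
        if not handles:
            counts["no_mention"] += 1
        elif handles <= own:
            # every mentioned handle belongs to the same publisher
            counts["own_only"] += 1
        else:
            # at least one mentioned handle is an outside account
            counts["reader"] += 1
    return counts

# Hypothetical sample tweets, for illustration only
sample = [
    "More on this at @BILD_Sport",
    "@some_reader thanks for the hint",
    "Headline without any mention",
]
print(classify_mentions(sample, ["BILD_News", "BILD_Sport"]))
# {'no_mention': 1, 'own_only': 1, 'reader': 1}
```

Dividing the "reader" count by the total number of tweets then gives the share of tweets that actually address readers.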

What does the readership of an account look like

There are theories about a connected readership online, speculating that the social media readers of an account are connected to each other and exchange and discuss content online. To find out whether such a structure emerges, I analyzed the account of fr_online as an example and collected all of its ~9,000 readers. Below you see a spring layout of the 9,000 nodes in Gephi. You find the typical core-periphery structure: 10% of readers do not have any connections to other readers, 50% of readers have fewer than 15 links, and there is a core of highly connected readers. Among these highly connected readers we actually find commercial or celebrity accounts such as ntvde, derfreitag, Calmund, tagesspiegel_de, Piratenpartei, handelsblatt, hronline, spdde, …
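Shares like "10% without connections" and "50% with fewer than 15 links" can be read off a simple degree count. The sketch below assumes that links between readers are treated as undirected and counted once per pair. That is my simplification, not necessarily how Gephi computed the degrees here, and the names `degree_summary`, `readers` and `links` are made up for illustration.

```python
from collections import Counter

def degree_summary(readers, links, threshold=15):
    """Share of isolated readers and of readers with fewer than `threshold` links."""
    reader_set = set(readers)
    if not reader_set:
        raise ValueError("need at least one reader")
    # Keep only links between two different readers; count each pair once.
    unique_links = {
        frozenset((a, b))
        for a, b in links
        if a != b and a in reader_set and b in reader_set
    }
    degree = Counter()
    for link in unique_links:
        for node in link:
            degree[node] += 1
    n = len(reader_set)
    isolated = sum(1 for r in reader_set if degree[r] == 0)
    below = sum(1 for r in reader_set if degree[r] < threshold)
    return {"isolated_share": isolated / n, "below_threshold_share": below / n}

# Hypothetical toy network: d has no links to other readers
print(degree_summary(["a", "b", "c", "d"], [("a", "b"), ("b", "a"), ("b", "c")], threshold=2))
```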

Network layout of 9000 readers of fr_online

Engagement of Readers

In order to measure how the accounts differ in reader engagement, I have collected all retweets for all tweets of all accounts and created two ratios:

  • Retweets / Message
  • Retweets / Follower
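As a minimal sketch, the two ratios can be computed per account like this. The function name and the sample figures are hypothetical, not values from the dataset.

```python
def engagement_ratios(retweets, messages, followers):
    """Return retweets per message and retweets per follower for one account."""
    return {
        "retweets_per_message": retweets / messages if messages else 0.0,
        "retweets_per_follower": retweets / followers if followers else 0.0,
    }

# Hypothetical account: 1,000 retweets on 500 messages with 40,000 followers
ratios = engagement_ratios(retweets=1000, messages=500, followers=40000)
print(ratios["retweets_per_message"], ratios["retweets_per_follower"])
```

The first ratio normalizes by how much an account publishes, the second by how large its audience is, so an account can score high on one and low on the other.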

Retweets / Message

Looking at all accounts, I found that almost 90% got less than one retweet per tweet on average. This is still a respectable result if we think of the findings of Romero et al., who found that users retweet only one in 318 links. If we look at the top 20 accounts with the highest retweets/message ratio, we see that the breaking-news account of SPIEGEL leads with 10 retweets per tweet on average. Similar results are achieved only by the main accounts of ZEIT and TAZ. At the other end of the spectrum we find accounts like focuspanorama (11,000 messages / 14 retweets) or focussport (95,000 messages / 3 retweets).

Retweets / Follower

Regarding retweets per follower, 79 accounts had a ratio of less than 0.1, which means they got fewer than one retweet for every 10 followers. Among the top 20, the highest ratio of 1 retweet per 3 followers was achieved by tazgezwitscher. It seems that this account has the most engaged readership, which helps it spread its news well beyond its direct readership. Among the accounts with the lowest audience engagement we find BILD_Bundesliga (40,000 followers and 1,000 retweets) and SPIEGEL_Rezens (30,000 followers and 300 retweets). We can speculate that sports-related content in particular is not retweeted that often because soccer results are simply consumed and not shared.

Exemplary analysis of the engaged readership of one account

In order to see the structure of the readers who have retweeted at least one tweet from an account, I collected these users for the account of fr_online, laid them out with Gephi, and applied the modularity-based community detection algorithm. The results below show that readers actually cluster in different communities, which differ in their political orientation or interests.

Structural overview of readers of fr_online that retweeted at least one of its messages

 Extended Readership

Knowing that retweets yield an extended readership (see below), one goal was to get a glimpse of what such an extended readership might mean for the reach of an account.

Extended Readership through retweets

To get an idea of how the extended readership boosts an account’s reach, I collected all tweets and the respective retweets of these accounts. For each retweet I looked up how many followers the retweeting reader had. Simply adding up the followers of every reader who retweeted an account gives the potential maximum extended audience that might have been reached through these retweets. I say potential maximum because I do not take into account that the people who retweeted might have a shared audience (e.g. imagine reader5 and reader6 being the same person in the figure above).
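The summation can be sketched as follows. I assume here that each retweeting reader is counted once, however often they retweeted. The second function is not part of the analysis above: it only illustrates how the overlap could be removed if the actual follower IDs of each retweeter were available. All names are hypothetical.

```python
def potential_extended_audience(retweeters, follower_counts):
    """Upper bound: sum the follower counts of all unique retweeters."""
    return sum(follower_counts.get(user, 0) for user in set(retweeters))

def deduplicated_audience(retweeters, follower_ids):
    """Overlap-free variant: size of the union of the retweeters' follower sets."""
    audience = set()
    for user in set(retweeters):
        audience |= set(follower_ids.get(user, ()))
    return len(audience)

# Hypothetical data: r1 retweeted twice, follower 3 follows both retweeters
retweeters = ["r1", "r2", "r1"]
print(potential_extended_audience(retweeters, {"r1": 3, "r2": 2}))
print(deduplicated_audience(retweeters, {"r1": {1, 2, 3}, "r2": {3, 4}}))
```

The gap between the two numbers is exactly the shared audience that the simple sum cannot see.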

Extended audience through retweets

We notice that the 27,000 retweets of zeitonline generated an extended audience of 4.2 million readers, and that in the case of tazgezwitscher 15,000 retweets resulted in more than 2 million additional readers. What we can take away from this calculation is that retweets really change the distribution game: while zeitonline has approximately 80,000 followers, some of its news may have been seen by up to 4.2 million people, a multiplication of ~50x. I think this shows the true power of social media.

Potential multipliers

When drilling down into the data we found readers who are especially valuable for an account because they have a high number of followers themselves and thus serve as huge multipliers for the audience. Three cases emerge quite often:

  • Publishers use their own main accounts to boost the readership of smaller thematic accounts (e.g. when bild_sport (10,000 followers) is retweeted by bild_news (80,000 followers), or zeitonline_wir (3,000 followers) is retweeted by zeitonline (80,000 followers))
  • Influential users retweet the content (e.g. tweets from BILD_Digital (4,600 followers), SPIEGEL_Reise (14,000 followers) and SPIEGEL_Netz (22,000 followers) are retweeted by rather unknown readers who have a high number of followers: einerHaupka (170,000 followers), AxelKoster (120,000 followers), haukepetersen (70,000 followers))
  • The subject of the content retweets it (e.g. a tweet about the band “jetward” from bild_aktuell (35,000 followers) is retweeted by the fan account planetjetward (300,000 followers), or jeffjarvis (80,000 followers) retweets the focuslive account (10,000 followers), which had interviewed him)

“Two-Step-Flow” of information

Regarding these diffusion patterns, I asked myself whether we can compute something like a two-step flow of information, i.e. the percentage of retweets that happened because the material was seen not on the original account itself but reached the reader through an intermediary. We defined the two-step-flow ratio as:

The number of people who have retweeted an account and follow it directly / the total number of people who have retweeted the account.

Readers following an account and retweeting it (green), readers NOT following an account and retweeting it (orange). Potential Two-Step-Flow dashed line.

The ratio can be as high as 1 if everybody who retweeted the account follows it directly, and as low as 0 if nobody who retweeted the account follows it directly. We ordered the accounts by the lowest ratio first, and we see that some accounts like zeitonline_wir achieve a ratio of less than 0.5, which means that more than half of the people who retweeted them were not directly following the account. Now there can be two explanations for such a low ratio: a) people received the retweet from a broker or middleman and then retweeted it (which supports the two-step-flow hypothesis), or b) people simply saw the article on the website and decided to tweet it. Since we didn’t analyse this in detail we can only guess at the respective shares, but it would definitely be worth a separate analysis.
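A minimal sketch of the ratio as defined above, computed over unique retweeters. The function and variable names are hypothetical.

```python
def direct_follow_ratio(retweeters, followers):
    """Share of unique retweeters who follow the account directly (None if no retweeters)."""
    unique = set(retweeters)
    if not unique:
        return None
    return len(unique & set(followers)) / len(unique)

# Hypothetical account: 4 people retweeted it, 2 of them are followers
print(direct_follow_ratio(["a", "b", "c", "d", "a"], ["a", "b", "x"]))  # 0.5
```

A value of 1 means every retweeter follows the account; the lower the value, the more retweets must have reached people by some route other than the account’s own timeline.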

(in red) Ratio of people who retweeted an article and were directly following the account / all people who retweeted an article

Shared Readers

The final step of this analysis was to find out how many readers the outlets have in common (see orange people in the graphic below). The common readers measure has a maximum value of 0.5, e.g. when each account has 100 followers and all of them follow both accounts (100/200), and a minimum of 0 when the accounts have no followers in common.
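Judging from the 100/200 example, the measure is the number of common followers divided by the sum of both follower counts. The sketch below implements that reading, which is my interpretation of the example and not a formula spelled out in the analysis, and adds a hypothetical helper that fills a symmetric matrix for every pair of accounts.

```python
from itertools import combinations

def shared_ratio(followers_a, followers_b):
    """Common followers divided by the sum of both follower counts (range 0 to 0.5)."""
    a, b = set(followers_a), set(followers_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return len(a & b) / total

def shared_matrix(followers_by_account):
    """Symmetric nested dict of shared ratios for every pair of accounts."""
    names = sorted(followers_by_account)
    matrix = {name: {} for name in names}
    for x, y in combinations(names, 2):
        ratio = shared_ratio(followers_by_account[x], followers_by_account[y])
        matrix[x][y] = ratio
        matrix[y][x] = ratio
    return matrix

# The example from the text: 100 followers each, all of them shared
print(shared_ratio(range(100), range(100)))  # 0.5
```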

Shared Readers

We computed this ratio for each pair of accounts and displayed the results in a symmetric matrix (see image below). We additionally grouped the accounts in the matrix by publisher (see blue boxes). The higher the ratio, the greener the cell; red means a lower ratio.

Shared audience by publisher

Symmetric matrix of shared audience

What we see in this visualization is that a common readership emerges especially among accounts of the same publisher (e.g. Spiegel_eil, Spiegel_news, Spiegel_reise…). Thus people who like SPIEGEL very often follow its other accounts too. This pattern emerges even more clearly when we group the shared audience by publisher (below). What really stands out is that the tabloid BILD has an audience that is very different from the other audiences. On the other hand, “intellectual” newspapers that are well established in social media, such as ZEIT or SPIEGEL, seem to share a relatively large audience (~8%).

View of shared audience grouped by publisher

Shared audience by account

If we highlight the shared audience values that are three standard deviations above the average value (0.03), we also note that certain accounts do not belong to the same publisher but still have a very large shared audience (green cells in the matrix below).
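The highlighting rule can be sketched as a simple threshold of mean plus three standard deviations. Whether the population or the sample standard deviation was used here is not stated, so the choice of `pstdev` below is an assumption, as are the function name and the toy data.

```python
from statistics import mean, pstdev

def strong_pairs(ratios, k=3.0):
    """Return the pairs whose shared ratio lies more than k SDs above the mean."""
    values = list(ratios.values())
    if len(values) < 2:
        return {}
    # Assumption: population standard deviation over all account pairs
    threshold = mean(values) + k * pstdev(values)
    return {pair: ratio for pair, ratio in ratios.items() if ratio > threshold}

# Hypothetical ratios: twenty ordinary pairs and one that clearly sticks out
ratios = {("account%d" % i, "other"): 0.01 for i in range(20)}
ratios[("travel_a", "travel_b")] = 0.5
print(strong_pairs(ratios))
```

Note that with only a handful of pairs no value can lie three standard deviations above the mean, so the rule only makes sense for a full matrix like the one used here.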

Shared audience between accounts (green = three SD higher than average)

Since the matrix above is not really good at showing the structure that emerges in the data, we visualized the data as a network, connecting the accounts that share an audience. The line strength was chosen according to the percentage of shared audience (see below).

Shared audience network visualization

In this visualization a number of interesting observations emerge:

  • Accounts focusing on the spread of top news (red, e.g. Spiegel_EIL, BILD_NEWS, BILD_AKTUELL, Spiegel_TOP, tazgezwitscher) have a shared audience.
  • We see the same pattern of readers following accounts of the same publisher (e.g. zeitonline_wir, zeitonline_kul, zeitonline_wis and zeitonline_pol, or Spiegel_wirtsch, Spiegel_politik, Spiegel_pano, Spiegel_seite2, Spiegelzwischen, Spiegel_SPAM)
  • Accounts that have a thematic focus seem to generate a shared audience. See travel: Stern_reise, Welt_reise, Faz_reise, Focusreise. Or cars: Focusauto, FAZauto, SZ_Auto


We have arrived at the end of our little exploratory analysis. A couple of takeaways:

  • Some publishers use Twitter quite successfully as a channel to enhance their reach and the interaction with their readers (as in the examples of spiegel, zeit or taz)
  • Despite the enthusiasm, the image of an interconnected audience does not emerge that strongly, as readers do not interact with the outlets very much and a large share of readers is only weakly connected to each other
  • Engagement of readers can be measured quite nicely with retweets/message and retweets/follower, which capture different aspects.
  • A simple modularity analysis of the retweet network of an account can bring interesting insights into how the audience of an account is clustered (as in the case of fr_online)
  • Retweets in general and the resulting two-step flow of information can boost the potential reach of an account by a factor of ~10-50x
  • Some very influential readers emerge, as their audience is often bigger than the audience of the outlet itself
  • A shared audience emerges between accounts of the same publisher, but it also emerges between accounts of different publishers when they share a common topic (e.g. travel)

That is it for today. I am excited to hear your comments.




I am presenting this small analysis tomorrow at the SGKM conference (on journalism, social media and communication) and am excited to hear what the audience has to say.


About plotti2k1

Thomas Plotkowiak works at the MCM Institute in the Social Media and Mobile Communication group, which belongs to the University of St. Gallen. His PhD research in social media examines how the structure of social networks like Facebook and Twitter influences the diffusion of information. His main focus is Twitter, since it allows public access (and has a nice API). Make sure to also have a look at his recent publications. Thomas majored in Computer Science and Economics at the University of Mannheim in 2008 and was involved at the computer science institutes for software development and multimedia technology: SWT and PI4. During his studies he focused on Artificial Intelligence, Multimedia Technology, Logistics and Business Informatics. In his diploma/master thesis he developed an ad hoc P2P audio engine for 3D games. Thomas was also a researcher for a year at the University of Waterloo in Canada and at Macquarie University in Sydney. He was part of the CSIRO ICT research group. In his free time Thomas likes to swim in his local lake (drei weiher), run, and hike in the Appenzell region. Otherwise you will find him coding ideas he recently had or enjoying a beer with colleagues in the MeetingPoint or Schwarzer Engel.

