
The role of opinion leaders in interest-based communities


We have all heard about the importance of so-called opinion leaders, mavens, influencers, or simply central people. I covered this topic in a recent article on my blog.

Using the same approach, in this blog article I will try to find out how much such opinion leadership differs across communities and which factors predict it best. For this task I collected 100 interest-based communities. An overview of the communities and the approach can be found in my recent blog post and is also shown below. The table gives an overview of the dataset of interest-based communities. Each community contains 100 people, who were chosen because they were highly listed for the respective topic. As we can see in the table, the number of tweets and of exchanged retweets differs slightly across the communities.

Open the table in Google Docs

Research Question:

Given this data, I can now ask the question:

How much does a structural opinion-leader position in such a community affect the number of retweets you receive in this community?

This question is interesting because it has been examined as a hypothesis in a myriad of studies, not only in the social media context but also in medicine and advertising. It corresponds to the famous saying “the messenger is the message” (as Thomas Valente likes to put it).

In this context I recently stumbled across a survey of Twitter users in which they were asked why they retweet information (see below). As you can see, 92% of users state that it depends on the content, but a striking 84% say it is about their personal connection to the person. From my point of view, this is nothing other than the person's position in the network. I will surely retweet somebody who is highly respected and embedded in my interest-based community, but I won’t see much value in retweeting people on the periphery of my community. Therefore, central opinion leaders should be able to generate more retweets. We will check this assumption across the 100 interest-based communities.

(P.S. Whether the message itself influences if a person gets retweeted will be covered in another blog post. This task is somewhat tricky because we have to come up with a way to measure whether content was “interesting”. Ideas are welcome :))


To check which centrality metrics are good at predicting retweets in our network, I chose the standard ones and computed the Pearson correlation between each of them and the number of retweets a person received from the community.
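For reference, this is the standard Pearson coefficient for one community. Here x_i is a centrality value of person i, y_i is that person's retweet in-degree, n is the number of people compared, and the barred symbols are the means:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

A value near 1 means that more central people tend to receive more retweets. A value near 0 means there is no linear relationship.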

For each of these communities I generated three types of networks:

- a friend-and-follower network, basically capturing the attention people pay to each other;
- an interaction network, computed from the @replies people exchange with each other;
- an information diffusion network, computed from the retweets people exchange with each other.

To read them into NetworkX I used this code:

import networkx as nx
# 'project' holds the community name used as the edgelist file prefix
FF = nx.read_edgelist('%s_FF.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())  # friend-and-follower network
AT = nx.read_edgelist('%s_AT.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())  # @reply network
RT = nx.read_edgelist('%s_RT.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())  # retweet network
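read_edgelist expects one edge per line: the source node, the target node, and then the values listed in data (here a single float weight). You can check this format without the real files, because nx.parse_edgelist accepts the same arguments on a list of strings. In the sketch below the user names are made up. Reading an edge "u v w" as "u retweeted v w times" is an assumption, chosen to match how the post treats retweet in-degree as retweets received:

```python
import networkx as nx

# Hypothetical edgelist lines: "source target weight".
# Assumed meaning: "alice bob 3" = alice retweeted bob 3 times.
lines = [
    "alice bob 3",
    "carol bob 1",
    "bob alice 2",
]

RT = nx.parse_edgelist(
    lines,
    nodetype=str,
    data=(("weight", float),),  # third column is parsed as a float weight
    create_using=nx.DiGraph(),
)

print(RT["alice"]["bob"]["weight"])  # 3.0
print(RT.number_of_edges())          # 3
print(RT.in_degree("bob"))           # 2 (retweeted by alice and carol)
```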

To determine central people, I used the standard network measures already implemented in NetworkX:

# AT network (@replies)
dAT = nx.degree_centrality(AT)
dAT_in = nx.in_degree_centrality(AT)
dAT_out = nx.out_degree_centrality(AT)
dAT_closeness = nx.closeness_centrality(AT)
dAT_pagerank = nx.pagerank(AT)

# FF network (friends and followers)
dFF = nx.degree_centrality(FF)
dFF_in = nx.in_degree_centrality(FF)
dFF_out = nx.out_degree_centrality(FF)
dFF_closeness = nx.closeness_centrality(FF)
dFF_pagerank = nx.pagerank(FF)  # needed for combination 8 (FF Pagerank)

# RT network (retweets); dRT_in is the target variable
dRT = nx.degree_centrality(RT)
dRT_in = nx.in_degree_centrality(RT)
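A detail worth knowing when reading the results: nx.in_degree_centrality ignores edge weights and divides by n - 1, where n is the number of nodes in the retweet network. So dRT_in measures the fraction of other network members who retweeted a person at least once, rather than the raw number of retweets. The weighted in-degree gives the raw count. Here is a small sketch on a made-up graph; the names, weights and edge direction are assumptions for illustration:

```python
import networkx as nx

# Toy retweet network; edge u -> v with weight w is assumed to mean
# "u retweeted v w times" (illustrative data, not from the post).
RT = nx.DiGraph()
RT.add_weighted_edges_from([
    ("alice", "bob", 3.0),
    ("carol", "bob", 1.0),
    ("bob", "alice", 2.0),
    ("dave", "alice", 1.0),
])

# Unweighted: 2 distinct retweeters of bob, divided by (4 nodes - 1)
print(nx.in_degree_centrality(RT)["bob"])  # 0.666...

# Weighted in-degree: sum of edge weights = total retweets bob received
print(RT.in_degree("bob", weight="weight"))  # 4.0
```

Which of the two is the better target depends on whether "reached many people once" or "was retweeted often" is the notion of influence you care about.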


To see how well the centrality measures in the FF and AT networks correlate with the number of retweets received (this is the dRT_in value in our retweet network), I computed the Pearson correlations for each of the thematic communities. With 4 centrality metrics for the AT network and 4 for the FF network, we get 8 different combinations:

1. AT Indegree vs. Retweet Indegree - The more I am mentioned ...
2. AT Outdegree vs. Retweet Indegree - The more I mention others ...
3. AT Closeness vs. Retweet Indegree - The closer I am in the network to others ...
4. AT Pagerank vs. Retweet Indegree - The more authority I possess ...
5. FF Indegree vs. Retweet Indegree - The more people follow me ...
6. FF Outdegree vs. Retweet Indegree - The more people I follow ...
7. FF Closeness vs. Retweet Indegree - Roughly, the more information I consume ...
8. FF Pagerank vs. Retweet Indegree - The more authority I possess ...
... the more my tweets are retweeted by others in the community.
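The eight combinations can be computed in one loop per community. This is a sketch under stated assumptions, not the post's original script. It uses the metric names from the list above and takes the unweighted in-degree centrality of the retweet network as the target, as dRT_in does. It also keeps only people present in both dictionaries, which is the role match_values plays in the post. The function names four_metrics and correlate_with_retweets are hypothetical:

```python
import networkx as nx
from scipy import stats


def four_metrics(G):
    """The four per-person metrics compared for one network."""
    return {
        "Indegree": nx.in_degree_centrality(G),
        "Outdegree": nx.out_degree_centrality(G),
        "Closeness": nx.closeness_centrality(G),
        "Pagerank": nx.pagerank(G),
    }


def correlate_with_retweets(FF, AT, RT):
    """Pearson (r, p) for each of the 8 combinations in one community."""
    rt_in = nx.in_degree_centrality(RT)  # target, like dRT_in in the post
    results = {}
    for net_name, G in (("AT", AT), ("FF", FF)):
        for metric_name, values in four_metrics(G).items():
            # Compare only people who appear in both dictionaries
            common = sorted(set(values) & set(rt_in))
            x = [values[node] for node in common]
            y = [rt_in[node] for node in common]
            r, p = stats.pearsonr(x, y)
            results[f"{net_name} {metric_name}"] = (r, p)
    return results


# Usage with random stand-in graphs; real data would come from the edgelists
FF = nx.gnp_random_graph(20, 0.3, seed=1, directed=True)
AT = nx.gnp_random_graph(20, 0.2, seed=2, directed=True)
RT = nx.gnp_random_graph(20, 0.15, seed=3, directed=True)
for combo, (r, p) in correlate_with_retweets(FF, AT, RT).items():
    print(f"{combo} vs. Retweet Indegree: r={r:.2f}, p={p:.3f}")
```

Running the same function once per community yields one row of the correlation table per topic.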

Correlation of measures

To compute the correlations I used the scipy.stats module. For example, to compute the correlation between the FF in-degree and the RT in-degree I used this code:

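import scipy.stats as sp  # 'sp' is assumed to refer to scipy.stats
values = match_values(dFF_in, dRT_in)  # align both dicts (see below)
output = sp.pearsonr(values[0], values[1])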

The output contains r and the two-sided p-value as a pair.

As you can see above, I also used a function called match_values. It makes sure that the two vectors have the same size by keeping only the people who appear in both networks. For example, if a person was not retweeted even once, this person won’t show up in the retweet network, and therefore I can't determine how many retweets this person received. (I could set the value to zero, but I preferred to skip these cases.)
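The post does not show match_values itself. Here is a minimal sketch of what it could look like, assuming it takes two {person: value} dictionaries and returns two aligned lists. The optional default parameter is a hypothetical addition covering the set-it-to-zero alternative mentioned above:

```python
import scipy.stats as sp


def match_values(d1, d2, default=None):
    """Hypothetical reconstruction: align two {person: value} dicts.

    With default=None, people missing from either dict are skipped.
    Otherwise every person in d1 is kept and missing d2 values are
    replaced by `default` (e.g. 0 for "never retweeted").
    """
    if default is None:
        people = sorted(set(d1) & set(d2))
        return [d1[k] for k in people], [d2[k] for k in people]
    people = sorted(d1)
    return [d1[k] for k in people], [d2.get(k, default) for k in people]


# Toy values: "erin" was never retweeted, so she is missing from dRT_in
dFF_in = {"alice": 0.5, "bob": 0.9, "carol": 0.1, "erin": 0.3}
dRT_in = {"alice": 0.4, "bob": 0.8, "carol": 0.2}

values = match_values(dFF_in, dRT_in)
print(values)  # ([0.5, 0.9, 0.1], [0.4, 0.8, 0.2])
output = sp.pearsonr(values[0], values[1])
print(output[0], output[1])  # r and p
```

Sorting the shared keys guarantees that position i in both lists refers to the same person, which the correlation requires.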


Open the correlations in Google Docs

As you can see in the table above, four centrality metrics in particular yielded the most significant correlations (lowest p-values):

1. AT Indegree vs. Retweet Indegree - The more I am mentioned
5. FF Indegree vs. Retweet Indegree - The more people follow me
8. FF Pagerank vs. Retweet Indegree - The more authority I possess
2. AT Outdegree vs. Retweet Indegree - The more I mention others

The closeness values and the PageRank in the AT network did not do so well when correlated with the number of retweets a person received. (There might be a problem here: the significance test of the Pearson correlation assumes normally distributed data, but our centrality values are highly skewed. I will have to investigate this.)
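One common way to sidestep the skew concern is Spearman's rank correlation; the post itself uses Pearson. Spearman correlates the ranks instead of the raw values, so a few extremely central users cannot dominate the result. The following sketch compares both on made-up, heavily skewed numbers:

```python
from scipy import stats

# Made-up, skewed values for nine people: a couple of "stars"
# have far more followers and retweets than everybody else.
followers_in = [1, 2, 2, 3, 4, 5, 8, 40, 300]
retweets_in = [0, 1, 3, 2, 5, 4, 9, 20, 500]

r, p_pearson = stats.pearsonr(followers_in, retweets_in)
rho, p_spearman = stats.spearmanr(followers_in, retweets_in)

# Pearson uses the raw values, so the top user can largely drive r;
# Spearman uses ranks, so every person contributes on the same scale.
print(f"Pearson r = {r:.2f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_spearman:.3f})")
```

Swapping stats.pearsonr for stats.spearmanr in the correlation step would be enough to rerun the comparison on ranks.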


So what did we learn from this? When trying to capture opinion leadership in a community, it seems to matter

  • How often I am mentioned by others
  • How many people follow me
  • What my PageRank in the friend-and-follower network is, and
  • How often I mention others (which I think is a bit surprising)

If we were to create a “how-to-be-retweeted” guide, I would recommend interacting with others in your community (and hoping that they mention you sometimes, too) and following interesting people from the community (and hoping that they follow you back). That way you can hope to reach a somewhat central position in the community. Of course, this is easier said than done, since in the end it is also about what you write. This dualism of content and structure is an interesting one. We can speculate that these people became central in the community partly because of their interesting content, or because of an authority that goes beyond what we can measure on Twitter.


In the next blog post I will try to use what we found, namely the most promising independent variables, and see whether we can build a linear model that predicts the number of retweets I receive. It could turn out that the factors I found are highly correlated and load onto the same factor, thus measuring the same thing.
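Before building such a model, a quick check for the concern in the last sentence is the correlation matrix of the predictors. An ordinary least squares fit can then be run with NumPy. The sketch below uses synthetic numbers, not the post's data, and its FF PageRank stand-in is deliberately built to overlap with FF in-degree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # people in one community

# Synthetic stand-ins for the four promising predictors (NOT real data)
at_in = rng.random(n)
ff_in = rng.random(n)
ff_pagerank = 0.8 * ff_in + 0.2 * rng.random(n)  # built to overlap with ff_in
at_out = rng.random(n)
X = np.column_stack([at_in, ff_in, ff_pagerank, at_out])

# Synthetic outcome standing in for retweets received
rt_in = 0.5 * at_in + 0.3 * ff_in + 0.1 * rng.random(n)

# Pairwise correlations between predictors (columns = variables);
# values close to +-1 hint that two metrics capture much the same thing.
C = np.corrcoef(X, rowvar=False)
print(C.round(2))

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), X])
coef, residuals, rank, sing_vals = np.linalg.lstsq(A, rt_in, rcond=None)
print(coef.round(2))  # intercept, then one weight per predictor
```

When two predictors are this strongly correlated, their individual weights in the fit become unstable, even if the model as a whole predicts well.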



About plotti2k1

Thomas Plotkowiak works at the MCM Institute in the Social Media and Mobile Communication group of the University of St. Gallen. His PhD research in social media investigates how the structure of social networks like Facebook and Twitter influences the diffusion of information. His main focus is Twitter, since it allows public access (and has a nice API). Make sure to also have a look at his recent publications. Thomas graduated in 2008 with a major in Computer Science and Economics from the University of Mannheim. There he was involved with the computer science institutes for software development and multimedia technology: SWT and PI4. During his studies he focused on Artificial Intelligence, Multimedia Technology, Logistics and Business Informatics. In his diploma/master thesis he developed an ad hoc P2P audio engine for 3D games. Thomas was also a researcher for a year at the University of Waterloo in Canada and at Macquarie University in Sydney. He was part of the CSIRO ICT researcher group. In his free time Thomas likes to swim in his home lake (Drei Weiher), run, and go hiking in the Appenzell region. Otherwise you will find him coding ideas he recently had or enjoying a beer with colleagues at the MeetingPoint or Schwarzer Engel.

