We have all heard about the importance of so-called opinion leaders, mavens, influencers or simply central people. I covered this topic in a recent article on my blog.
Using the same approach, in this blog article I will try to find out how much such opinion leadership differs across communities and which factors predict it best. For this task I collected 100 interest-based communities. An overview of the communities and the approach can be found in my recent blog post and is also shown below. The table gives an overview of the dataset of interest-based communities, each of which contains 100 people. These people were chosen because they are frequently listed for the respective topic. As the table shows, the numbers of tweets and exchanged retweets differ slightly across the communities.
Given this data I can now ask the question:
How much does a structural opinion-leader position in such a community affect the number of retweets you receive in this community?
This question is quite interesting since it has been tested as a hypothesis in a myriad of studies, not only in the social media context but also in medicine and advertising. It corresponds to the famous saying “the messenger is the message” (as Thomas Valente likes to put it).
In this context I recently stumbled across a survey of Twitter users who were asked why they retweet information (see below). As you can see, 92% of users state that it depends on the content, but a striking 84% say it is about their personal connection to the person. From my point of view this means nothing other than the person's position in the network. I will surely retweet somebody who is highly respected and embedded in my interest-based community, but I won't see much value in retweeting people on the periphery of my community. Therefore, central opinion leaders should be able to generate more retweets. We will check this assumption across the 100 interest-based topic communities.
(P.S. Whether the message itself influences whether a person gets retweeted will be covered in another blog post. This task is somewhat tricky because we have to come up with a way to measure whether content was “interesting”. Ideas are welcome :))
To check which centrality metrics are good at predicting retweets in our network, I chose the standard ones and computed the Pearson correlation between each of them and the number of retweets a person received from the community.
I generated three types of networks for each of these communities: a friend-and-follower network, which basically captures the attention people pay to each other; an interaction network, computed from the @replies people exchange with each other; and an information diffusion network, computed from the retweets people exchange with each other. To read these into NetworkX I used this code:
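import networkx as nx

# project is the file prefix of the community being analysed
# Friend-and-follower network
FF = nx.read_edgelist('%s_FF.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())
# Interaction network built from @replies
AT = nx.read_edgelist('%s_AT.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())
# Information diffusion network built from retweets
RT = nx.read_edgelist('%s_RT.edgelist' % project, nodetype=str, data=(('weight', float),), create_using=nx.DiGraph())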
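To make the expected file format concrete, here is a minimal sketch that parses a few in-memory lines with the same options via nx.parse_edgelist. The "source target weight" layout and the reading of an edge u -> v as "u retweeted v" are assumptions for illustration; the actual edgelist files may encode direction differently.

```python
import networkx as nx

# Hypothetical sample lines in the assumed "source target weight" format;
# in practice these would come from a file such as '<project>_RT.edgelist'.
sample_lines = [
    "alice bob 3",
    "carol bob 1",
    "bob alice 2",
]

# Same options as read_edgelist above: string node ids, a float 'weight'
# attribute per edge, and a directed graph so A->B and B->A stay separate.
RT_demo = nx.parse_edgelist(
    sample_lines,
    nodetype=str,
    data=(("weight", float),),
    create_using=nx.DiGraph(),
)

for u, v, w in RT_demo.edges(data="weight"):
    print(u, "->", v, w)
```

Using a DiGraph matters here: with an undirected Graph, "alice bob" and "bob alice" would collapse into one edge and the distinction between giving and receiving retweets would be lost.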
To determine central people I used the standard network measures already implemented in NetworkX:
# AT network
dAT = nx.degree_centrality(AT)
dAT_in = nx.in_degree_centrality(AT)
dAT_out = nx.out_degree_centrality(AT)
dAT_closeness = nx.closeness_centrality(AT)
dAT_pagerank = nx.pagerank(AT)
# FF network
dFF = nx.degree_centrality(FF)
dFF_in = nx.in_degree_centrality(FF)
dFF_out = nx.out_degree_centrality(FF)
dFF_closeness = nx.closeness_centrality(FF)
dFF_pagerank = nx.pagerank(FF)
# RT network: dRT_in is the retweet in-degree used as the outcome
dRT = nx.degree_centrality(RT)
dRT_in = nx.in_degree_centrality(RT)
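One detail worth checking is what the retweet in-degree actually counts. The sketch below uses a made-up toy network and assumes that an edge u -> v with weight w means "u retweeted v w times". Under that assumption, nx.in_degree_centrality counts distinct retweeters divided by (n - 1), while the weighted in-degree gives the total number of retweets.

```python
import networkx as nx

# Toy retweet network (hypothetical data); an edge u -> v with weight w is
# assumed to mean "u retweeted v w times".
RT_toy = nx.DiGraph()
RT_toy.add_weighted_edges_from([
    ("alice", "bob", 3),
    ("carol", "bob", 1),
    ("dave", "bob", 2),
    ("bob", "alice", 1),
])

# in_degree_centrality = number of distinct retweeters / (n - 1)
dRT_in = nx.in_degree_centrality(RT_toy)

# Raw counts for comparison: distinct retweeters vs. total retweets
distinct = dict(RT_toy.in_degree())
total = dict(RT_toy.in_degree(weight="weight"))

for person in RT_toy:
    print(person, round(dRT_in[person], 3), distinct[person], total[person])
```

Within one community the division by (n - 1) is a constant factor, so it does not change a Pearson correlation. Switching from distinct retweeters to weighted totals, however, can change it.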
To see how well the centrality measures in the FF and AT networks correlate with the number of retweets received (i.e. the dRT_in value in our retweet network), I computed the Pearson correlations for each of those thematic communities. Using 4 centrality metrics for the AT network and 4 for the FF network gives a total of 8 combinations:
1. AT Indegree vs. Retweet Indegree - The more I am mentioned ...
2. AT Outdegree vs. Retweet Indegree - The more I mention others ...
3. AT Closeness vs. Retweet Indegree - The closer I am in the network to others ...
4. AT Pagerank vs. Retweet Indegree - The more authority I possess ...
5. FF Indegree vs. Retweet Indegree - The more people follow me ...
6. FF Outdegree vs. Retweet Indegree - The more people I follow ...
7. FF Closeness vs. Retweet Indegree - Roughly, the more information I consume ...
8. FF Pagerank vs. Retweet Indegree - The more authority I possess ...
... the more my tweets are retweeted by others in the community.
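For one community, the eight tests can be run in a single loop over a dict of predictors. This is a minimal sketch: aligned is a hypothetical stand-in for match_values that keeps only people present in both dicts, and the toy numbers are made up.

```python
from scipy.stats import pearsonr


def aligned(predictor, retweets):
    """Hypothetical stand-in for match_values: keep only people present in
    both dicts, in the same order (people never retweeted are absent)."""
    people = sorted(set(predictor) & set(retweets))
    return [predictor[p] for p in people], [retweets[p] for p in people]


def correlate_all(measures, dRT_in):
    """Return {name: (r, p)} for each predictor against retweet in-degree."""
    results = {}
    for name, values in measures.items():
        x, y = aligned(values, dRT_in)
        r, p = pearsonr(x, y)
        results[name] = (r, p)
    return results


# Toy data: "e" has no retweet value and is skipped.
dRT_in = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.8}
measures = {
    "AT Indegree": {"a": 1, "b": 4, "c": 2, "d": 8, "e": 5},
    "FF Outdegree": {"a": 5, "b": 1, "c": 4, "d": 2},
}
res = correlate_all(measures, dRT_in)
for name, (r, p) in res.items():
    print(f"{name}: r = {r:.2f}, p = {p:.3f}")
```

With the real data, measures would map the eight labels above to dAT_in, dAT_out, dAT_closeness, dAT_pagerank, dFF_in, dFF_out, dFF_closeness and dFF_pagerank (the last one being nx.pagerank(FF)). The loop would then be repeated for each of the 100 communities.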
Correlation of measures
To compute the correlations I used the pearsonr function from scipy.stats. For example, to compute the correlation between FF in-degree and RT in-degree I used this code:
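import scipy.stats as sp

# match_values aligns both dicts on the people present in both networks
# and returns the two matched vectors
x, y = match_values(dFF_in, dRT_in)
output = sp.pearsonr(x, y)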
The output contains r and the p-value; it can be unpacked directly as r, p = output.
As you can see above, I also used a function called match_values. This function makes sure that the two vectors have the same size and line up person by person. For example, if a person was not retweeted even once, this person won't show up in the retweet network, and therefore I can't look up how many retweets this person received. (I could have set the value to zero, but I preferred to skip these cases.)
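The choice between skipping and zero-filling is not neutral. This minimal sketch with made-up numbers compares the two policies. match_skip mirrors the skipping behaviour described above, and match_zero is the set-to-zero alternative. Both helpers are hypothetical and are not the actual match_values.

```python
from scipy.stats import pearsonr

# Hypothetical toy community: FF in-degree for five people; "eve" was never
# retweeted, so she has no entry in the retweet dict.
ff_in = {"ann": 10, "ben": 40, "cat": 20, "dan": 80, "eve": 60}
rt_in = {"ann": 1, "ben": 3, "cat": 2, "dan": 7}


def match_skip(x, y):
    """Keep only people present in both dicts (the policy used in the text)."""
    people = sorted(set(x) & set(y))
    return [x[p] for p in people], [y[p] for p in people]


def match_zero(x, y):
    """Alternative policy: treat people missing from y as 0 retweets."""
    people = sorted(x)
    return [x[p] for p in people], [y.get(p, 0) for p in people]


r_skip, _ = pearsonr(*match_skip(ff_in, rt_in))
r_zero, _ = pearsonr(*match_zero(ff_in, rt_in))
print(round(r_skip, 3), round(r_zero, 3))
```

Here a single well-followed person who was never retweeted lowers r noticeably under zero-filling. Skipping such people therefore tends to favour the hypothesis, which is worth keeping in mind when reading the correlations.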
As the table above shows, four centrality metrics in particular yielded the most significant correlations (lowest p-values):
1. AT Indegree vs. Retweet Indegree - The more I am mentioned
5. FF Indegree vs. Retweet Indegree - The more people follow me
8. FF Pagerank vs. Retweet Indegree - The more authority I possess
2. AT Outdegree vs. Retweet Indegree - The more I mention others
The closeness values and the PageRank in the AT network did not do as well when correlated with the number of retweets a person received. (There might be a problem here: Pearson's r is sensitive to extreme values, and its significance test assumes roughly normally distributed data, but our centrality values are highly skewed. I will have to investigate this.)
So what did we learn from this? When trying to capture opinion leadership in a community, it seems to matter
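One common way to address the skewness concern is a rank-based coefficient such as Spearman's rho (scipy.stats.spearmanr). It only assumes a monotonic relationship. The following minimal sketch uses made-up numbers in which one person has an extreme predictor value, which is typical of follower counts.

```python
from scipy.stats import pearsonr, spearmanr

# Made-up, skewed predictor: nine people with modest values and one
# "celebrity" with a very large one (e.g. follower in-degree).
predictor = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
retweets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # same person order as predictor

r, p_r = pearsonr(predictor, retweets)
rho, p_rho = spearmanr(predictor, retweets)

# Both lists rank people identically, so Spearman's rho is 1, while
# Pearson's r is pulled down by the single extreme value.
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

If this effect shows up in the real data, the smallest change would be to swap pearsonr for spearmanr in the correlation step. Another option is log-transforming the skewed centralities before using Pearson.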
- How often I am mentioned by others
- How many people follow me
- What my pagerank in the friend and follower network is and
- How often I mention others (which I think is a bit surprising)
If we were to create a “how-to-get-retweeted” guide, I would recommend interacting with others in your community (and hoping that they mention you sometimes, too), following interesting people from the community (and hoping that they follow you back), and thereby working toward a somewhat central position in the community. Of course this is easier said than done, since in the end it also matters what you write. This dualism of content and structure is an interesting one, since we can speculate that the central position these people hold is itself partly a result of their interesting content, or of an authority that goes beyond what we can measure on Twitter.
In the next blog post I will try to take the most promising independent variables we found and see whether we can build a linear model that predicts the number of retweets a person receives. It could turn out that these variables are highly correlated and load onto the same factor, thus measuring the same thing.
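A quick first check for that overlap is the correlation matrix of the predictors, followed by an ordinary least-squares fit. This is a minimal NumPy sketch on made-up values for six people. It is not the model from the upcoming post, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical, made-up values for six people of one community; in practice
# these would be the matched predictor vectors from the steps above.
at_in = np.array([1, 4, 2, 8, 5, 3], dtype=float)
ff_in = np.array([2, 9, 4, 15, 11, 5], dtype=float)
at_out = np.array([3, 1, 4, 2, 5, 2], dtype=float)
rt_in = np.array([1, 5, 2, 9, 6, 3], dtype=float)

X = np.column_stack([at_in, ff_in, at_out])

# Pairwise correlations between predictors (columns); values near +/-1
# hint that two variables largely measure the same thing.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(len(rt_in)), X])
coef, residuals, rank, _ = np.linalg.lstsq(design, rt_in, rcond=None)
print("intercept and slopes:", np.round(coef, 3))
```

With strongly correlated predictors like at_in and ff_in here, the individual slopes become unstable. Combining the predictors (for example via a factor analysis) or keeping only one of them would then be the natural next step.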