We have all heard about the importance of so called opinion leaders, mavens, influencers or simply central people. I have covered in a recent article in my blog.
Using the same approach in this blog article I will try to find out how much such opinion-leadership differs across different communities and which factors predict it best. For this task I collected 100 interest based communites. An overview over the communities and attempts can be found here in my recent blogpost and is also shown below. The table shows an overview over the dataset of interest based commmunities. These contain 100 people each. They have been chosen because they have been highly listed for this certain topic. As we can see in the table the number of tweets, and exchanged retweets differs slightly across the communities.
Given this data I can now ask the question:
How much does a structural opinion leader position in such a community affects the number of retwets you receive in this community.
This question is quite interesting since it has been covered as a hypothesis for a myriad of studies not only in the social media context, but also in medicine or advertising. It corresponds to the famous saying “the messenger is the message” (as Thomas Valente likes to put it).
In this context I have recently stumbled across a survey of Twitter users, where they were asked why they retweet information (see below). As you can see 92% of users state that it depends on the content, but a striking 84% say it is about the personal connection towards the person. From my point of view this also means nothing else than the persons position in the network. So I will surely retweet somebody who is very much respected and embedded in my interest based community, but I won’t see much value of retweeting people that are in the periphery of my community. Therfore central opinion leaders should be able to generate more retweets. We will check this assumption across the 100 interest based topic communities.
(P.S.The question of the message has an influence on whether or not a person will be retweeted will be covered in another blogpost. This task is somehow tricky because we have to come up with an idea how to measure if content was “interesting”. Ideas are welcome :))
To check which centrality metrics are good at predicting retweets in our network I have chosen the standard ones and then computed a pearson correlation between those and the number of retweets a person received from the community.
I have generated three types of networks for each of these communities. A friend-and-follower network, basically capturing the attion people towards each other, An interaction network computed by the @replies that people exchange with each other, and a information diffusion network, computed by the retweets people exchange with eachother. To read those in in networkX I used this code:
FF = nx.read_edgelist('%s_AT.edgelist' % project_name1, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph()) AT = nx.read_edgelist('%s_AT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph()) RT = nx.read_edgelist('%s_RT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph())
To determine central people I have used the standard network measures already implemented in networkX:
#AT Network dAT = nx.degree_centrality(AT) dAT_in = nx.in_degree_centrality(AT) dAT_out = nx.out_degree_centrality(AT) dAT_closeness = nx.closeness_centrality(AT) dAT_pagerank = nx.pagerank(AT) #FF Network dFF = nx.degree_centrality(FF) dFF_in = nx.in_degree_centrality(FF) dFF_out = nx.out_degree_centrality(FF) dFF_closeness = nx.closeness_centrality(FF) #RT dRT = nx.degree_centrality(RT) dRT_in = nx.in_degree_centrality(RT)
To see how well the centrality measures in the FF and AT networks correlate with the number of Retweets received (–> This is the dRT_in value in our retweet network) I computed the pearson correlations for each of those thematic communities. Using the 4 centrality metrics for the AT network and 4 centrality metrics for the FF network we have a sum of 8 different combinations:
1. AT Indegree vs. Retweet Indegree - The more I am mentioned ...
2. AT Outdegree vs. Retweet Indegree - The more I mention others ...
3. AT Closeness vs. Retweet Indegree - The closer I am in the network to others ...
4. AT Pagerank vs. Retweet Indegree - The more authority I posses ...
5. FF Indegree vs. Retweet Indegree - The more people follow me ...
6. FF Outdegree vs. Retweet Indegree - The more people I follow ...
7. FF Closeness vs. Retweet Indegree - ~ The more information I consume ...
8. FF Pagerank vs. Retweet Indegree - The more authority I posses ...
... the more my tweets are retweeted by others in the community.
To compute the correlation I used the scipy stats feaure. For example to compute the correlation between the FF_in network and the RT_in network I used this code:
values = match_values(dFF_in,dRT_in) output = sp.pearsonr(values,values)
The output contains the r and p in a simple array.
As you can see above I also used a function called match_values. This function makes sure that the two vectors have the same size. So for example if a person was not retweeted even once this person won’t show up in the retweet network, and therfore I won’t be able to compute how many retweets this person has received. (I could set it to zero but I preferred to rather skip these cases)
As you can see in the table above the results show that especially four types of centralities metrics yielded the most significant correlations (p)
1. AT Indegree vs. Retweet Indegree - The more I am mentioned
5. FF Indegree vs. Retweet Indegree - The more people follow me
8. FF Pagerank vs. Retweet Indegree - The more authority I posses
2. AT Outdegree vs. Retweet Indegree - The more I mention others
The closeness and pagerank values did not do so well when correlating them to the number of retweets that the person received. (There might be a problem because the pearson correlation assumes that we have normally distributed data but our centrality values are highly skewed. I will have to investigate this).
So what did we learn from this? It seems that when trying to capture the opinion leadership in a community it seems to matter
If we were to create a “how-to-be-retweeted” document I would recommend others to intereact with others in their community (and hope that they mention me sometimes, too), try to follow interesting people from the community (and hope that they follow me back) and so hope to achieve a somewhat central position in this community. Of course somehow this is easier said then done, since at the end it is also about what I write. This dualism of content and structure is indeed an interesting one since we can speculate that the outcome where those people have become central in the community is also a result of their interesting contents or an authrity that goes beyond what we can measure on Twitter.
In the next blogpost I will try to use what we found, namely the most promising independent variables and see if we can build a linear model that predicts the amount of retweets I receive. It could turn out that the factors that I found are highly correlated and load onto the same factor, thus measure the same thing.
I was recently analyzing Twitter networks and started to create groups according to a certain topic. An example would be the 100 most relevant people for the keyword “snowboarding”. Based on using the Twitter list feature where you can tag people with the word snowboarding the more often people are tagged the more they are relevant for this keyword.
Sharing knowledge in a public way has a long tradition. Analogously there is also a long tradition of the study of communities of practice evolving around the sharing of knowledge and the underlying computer mediated communication (Hildreth, Kimble, & Wright, 1998). Wasko et al (Wasko & Faraj, 2000) find that knowledge exchange is motivated by moral obligation and community interest rather than by narrow self-interest. Focusing on the behavioral rather the structural variables affecting knowledge exchange they find that people participate primarily out of community interest, generalized reciprocity and prosocial behavior. Millen et al (Millen, Fontaine, & Muller, 2002) emphasize the role of communities of practice on the improvement of organizational performance, which is also valid beyond the organizational boundaries. In the software development field such communities also hold a disciplinary background, similar work activities and tools, shared stories, contexts and values. Twitter is only a new way of social interaction between such communities beyond the existing ones (e.g., hallway exchanges and water-cooler conversations, meetings and conferences, brown bag lunches, newsletters, teleconferences, online environments, shared web spaces, email lists, discussion forums, and synchronous chat.). In the form of communities of interests, Henri and Pudelko (Henri & Pudelko, 2003) note a very similar phenomenon of people who are ” assembled around a topic of common interest. Its members take part in the community to exchange information, to obtain answers to personal questions or problems, to improve their understanding of a subject, to share common passions or to play.” Hence analyzing programming communities on Twitter is an ideal laboratory to explore how this new medium affects the flow of information in communities of practice or interest. For this task I have selected in a set of five keywords that represent programming communities, which represent five modern programming languages, which are widely used among software developers:
Those keywords are particularly useful because they are seldomly used outside of context of programming languages and do not stand for other communities that might also be represented in Twitter. Finally investigating how the informal connections and the sharing of information in such communities leads to acquiring social capital which then leads to a better communication relates directly to the research questions. The exploration of communities of practice which are considered to realize social capital especially for members who show, and share expertise and experience provides an interesting field both from a practical but also theoretical view.
For each of those keywords I assembled the 100 most relevant Twitter users according to the above mentioned list feature. The result list is somewhat puzzling (see figure 1) I have visualized it in a so called spring-embedding social network layout. Each node corresponds to a person and the color corresponds to a programming language. The more ties nodes have together the closer they are.
Of course the image is only a small part of the Twitter network, meaning that each of those 100 people are embedded in a bigger network, that is not shown here. To cover that I have calculated the number of ties that the those people have with the “outside” word, which is everything that beyond those 500 people. An overview of the communication has been captured in table 1, which shows the amount of Follower ties, @-replies and retweets for those groups.
Studying the communication pattern I was thinking: So what exactly does this mean? Is it that despite the fact that we are all developers and programmers somewhow we stick to our own turf, and rather neglect what is happening in other domains? Regarding the little theoretical overview towards communities of practice it seems like although there is exchange going on programmers prefer to connect with their own people, simply because the tweets are more relevant, if you do not program in a foreign language.
P.S. This perfectly fits into the homophily or assortative mixing considerations found in sociology. It is still puzzling to find it so emerging for the field of programming.
P.P.S. Normally to analyze such tendencies there is a SNA Metric called External-Internal-Index that calculates a significance how much those ties that we see differ from random distribution. This index also varifies this tendency, yet it is hard to interpret due to the lack of comparable data. I guess I will compare this tendencies with other groups on Twitter.
Research on opinion leaders dates back quite a while now. It was in 1944 when Lazarsfeld & Katz (The peoples choice) and his team researched public communication and found out that communication does not directly flow to the mass but is actually interpreted first by opinion leaders and then forwarded to the rest of the people. They have described this process as the “Two-Step-Flow of Communication”.
The Two-Step-Flow of Communicaiton asserts the following points:
Generally it is assumed that opinion leaders have certain characteristics which make them special.One definition of opinion leaders is the one of Kotler (in his book Marketing Management) where he defines them as “people who can influence members in the social community because of special techniques, knowledge, personalities and other uniqueness”
Rogers (in his book Diffusion of Innovations) describes opinion leaders as people with
Robertson (in his book Innovative Behavior and Communication) mentions that they are:
According to Rogers, who describes the process of innovation, the decision process has five stages
Especially in the recognition and evaluation of innovations people are under the influences of their interpersonal network. Here opinion leaders can play an important role. In the recognition and understanding of an innovation opinion leaders can provide valuable information for members of their communication network. In the evaluation stage opinion leaders can serve as a norming element publicly deciding for a group
which innovations are good and which are bad.
Having a central role in their community opinion leaders can help to spread the word about an
innovation faster because of their many ties they have with the members.
Rogers provides four ways of identifying opinion leaders:
To identify opinion leaders in e.g. Twitter networks we can use the first suggested method since people in twitter leave tons of traces and their networks are public. A first attempt of doing this could be:
In the next article I will show you how to achieve this and which problems and discussions we encounter on the way.