In Twitter we have the situation that the network between users is multiplex (people can hold numerous ties with each other): Users can either a) follow each other b) interact with each other or c) retweet each other. The three types of ties, manifest themselves in three different networks that can be sort of laid on top of each other. This idea got me thinking. I stumbled upon a very interesting chapter for a book from Stephen Borgatti, who introduced network flow model that in my eyes seems to fit perfectly for the Twitter network. The network model from his paper is depicted below:
In his model Borgatti describes the model as two kinds of phenomena, which are called backcloth and traffic in the original work of Atkin. By adapting this model for Twitter we can explain how and why the three types of ties that we have in Twitter can be laid on top of each other and how they influence information diffusion in Twitter. I have therefore made a version that shows how the concepts map onto Twitter, that is depicted below.
The backcloth is the infrastructure that enables the traffic and the traffic consists of information flowing through the network. In the case of Twitter the backcloth corresponds to the cognitive similarities among Twitter users (see below), and to their friend and follower connections. The traffic layer consists of the interactions and flow of information that takes place on top of these phenomena.
Borgatti describes the four categories as following: “The similarities category refers to physical proximity, co-membership in social categories and sharing of behaviors, attitudes and beliefs. Generally we do not see these items as social ties, but we do often see them as increasing the probabilities of certain relations and dyadic events.” This definition corresponds to the notion of implicit ties (ties we cannot directly see) of Twitter users: These implicit ties can basically be a shared interest, a shared location, a shared demographic, a shared audience and so on. Basically every type of attribute that makes two Twitter users similar to each other. The idea that similar people with the same attributes tend to flock together is known as homophily and the general process of people forming ties with similar people is called selection mechanism (e.g. think of people that smoke becoming friends with other smokers)
The next three types of phenomena take place on so called explicit ties because these type of ties can actually be seen or measured explicitly on Twitter. Borgatti defines the social relations category as “ the classic kinds of social ties that are ubiquitous [(which friend and follower ties are in Twitter)] and serve either as role-based or cognitive/affective ties. Role-based includes kinships and role-relations such as boss of, teacher of and friend of.[(In Twitter: follower of)] They can easily be non-symmetric [(which friend and follower ties are)].” It is apparent that, these type of ties exactly relate to the explicit follower ties, which share the same attributes and characteristics. When we think about the other reasons why people become friends other than being similar, we stumble upon all the network effects that are a core part of network literature. Therefore I have indicated those with the back and forth arrow above social relations. In Twitter these type of processes take place everyday: People follow prominent outlets e.g. CNN (preferential attachment), become friends with friends of friends (triadic closure) or simply follow back a person that just followed them (reciprocity). There are much more of such effects, but we don’t want to go into detail here, but instead look at the next type of ties.
Borgatti describes the interactions category as “discrete and separate events that may occur frequently but then stop, such as talking with, fighting with, or having lunch with”. This category translates into the interactional (@mention) ties in Twitter, which have exactly these behavioral traits: People do intentionally mention each other in Tweets, but also might stop doing so for certain reasons. Depending on when one looks at two users in Twitter, this interactional connection might be exist at this point in time or not. The first reason according to the network flow model, why I would interact with someone is because I follow them, which makes perfectly sense for Twitter. Now are there are more reasons why people might interact with each other and a number of those reasons is already covered in various information diffusion theories: One example is that people like to interact with others who they perceive as opinion leaders for a topic. Another example is the brokerage theory that says that such brokers tend profit from interaction with two different groups. The third type of families are the threshold models, where people believed to are lured into interaction or adoption once a certain threshold of their friends talks about a certain topic. Processes like this could easily be taking place on Twitter too.
Finally the flows category is described by Borgatti as “things such as resources, information and diseases that move from node to node. They may transfer (being only at one place at a time) and duplicate (as in information).”. This definition translates directly into the explicit retweet ties that always exist when information is transferred from one actor to another. The final network layer follows the same reasoning as the one before: The first reason why I would retweet someone is because I follow that person and I have already interacted with that person. The reasoning about information diffusion theories applies here too.
Finally I thought it would be nice to add the influence mechanism in this model, which is basically people becoming more similar to each other because of the networks that people already have. All three types of networks (friend and follower ties, @interactions and retweets) might have that effect. The classic influence example is non-smokers being friends with smokers, and then starting to smoke, might be imaginable in Twitter too. Yet there are strong indications that this effect is much smaller than people believe it to be.
Using the network flow model, we came up with a nice ordering of the different concepts that surround network science and sociology and could somehow connect this pieces to the Twitter network. I hope this extended network flow model was useful for you and hope to hear some comments on it.
A few months ago I’ve made a blog post (https://twitterresearcher.wordpress.com/2012/01/17/the-strength-of-ties-revisited/) investigating tie strenghts on Twitter and their influence on retweets. Well it turns out that my analysis was lacking a lot of detail, so I re-did it again considering more aspects than before. So lets get started.
The data that I am using for this analysis is the following: Each group of people consists of 100 people that have been highly listed for a given topic in Twitter e.g. snowboarding or comedy or any other topical interest that people have on Twitter. There are 170 of such groups, each consisting of exactly 100 members (You can read how I created such groups in my recent blog posts here https://twitterresearcher.wordpress.com/2012/06/08/how-to-generate-interest-based-communities-part-1/ and here https://twitterresearcher.wordpress.com/2012/06/12/how-to-generate-interest-based-communities-part-2/). In an abstract way you can imagine the structure of the network to looks something like this:
The graphic above indicates that we only have the friend-follower ties on Twitter between those people. But indeed there are quite a few more ties between people, resulting in a multiplex network between them. This network consists of three layers:
Schematically this looks something like this:
Now when we think about ties between those people especially in regard to tie-strengths we can come up with a couple of different definitions of ties ( I mentioned a couple of those in my blog post here https://twitterresearcher.wordpress.com/2012/05/24/tie-strength-in-twitter/)
Bridging vs. bonding ties:
Having all those definitions of ties we can now come up with a number of observations regarding the information diffusion between those people. The information diffusion is captured in the retweet network (see third layer in the schematic graphic) and the corresponding ties. In generall we want to look at how the different tie types affect the information diffused (retweets) between those people.
To get an overview over the data I will first have a look how many retweets have in total have been exchanged between the analyzed groups. I count how many retweets took place inside the group (blue) and between the groups (red). Each of the 170 groups is shown below:
Approximately a total of 214.000 retweets took place between groups (red) and 414.000 retweets that took place inside the groups (blue). In the graphic above we can clearly see the differences between the different interest groups. I’ve ordered the groups ascending to retweets inside the community and which makes us see that there are some groups that focus mostly on retweets inside the group (e.g. tennis or astronomy_physics) while other groups rather get mostly retweets from outside of their own group and do not retweet each other so much inside the group (e.g.poltics_news or liberal). Although we cannot clearly say that the group has an influence if it gets retweeted from outside the group, we can say that the members of the group at least have the choice to retweet other members of the group. If these members do not retweet each other it might have a reason about which you are free to speculate (or I will try to answer in the next blog post)
Given the different types of ties described above we can now ask the most important question:
How do the different non-valued bridging ties differ from the bonding ties in regard to their influence on the information diffused through those ties?
What do I mean by that? Having all retweets between the persons in the sample I want to find out through which ties these retweets have flown. So for example given that A has retweeted B three times , I ask the question which ties (that A and B already have in the friend and follower network or the interaction network) were “responsible” for this flow of information between those actors?
EXAMPLE: If two people have mentioned each other at least once, I will assume (according to the definition above) that they hold a reciprocated interaction tie. I will then assume that this tie was “responsible” for the retweet between them. NOTICE: This is a simplifying assumption because I assume that if there is a stronger tie it is always was responsible for the retweet and not the maybe underlying weaker tie (as in form of a friend and follower tie).
The assumption that I make here is therefore:
In order to compute which kind of ties were most successful of transmitting retweets, I compute the ratio of ties that had retweets that have flown through this TYPE of tie (e.g. ff_reciprocated_ties) and divide it through the amount of the same ties that no had no retweets (e.g. ff_reciprocated_ties between people where no retweet was exchanged between those persons). So if I have a total of 10.000 reciprocated ties and over 2000 a retweet took place while over the remaining 8000 no retweets have been transmitted the ratio for this type of tie is 0.25.
I have summarized the results in the table below. The std. deviation reports the deviation in the different retweet ties that belong to a certain edge type. (In the case of no_tie we have no data for no retweets because here we would have to count all the ties that are not present, which seems a bit unrealistic, given the structure of social networks)
As you can see in the table I have first of all differentiated if a tie belongs to a bridging tie or a bonding tie. Remember that bonding ties are between people who hold the same interest while bridging ties are between people who belong to different groups and thus share different interests.
As you can see first of all there are a couple of retweets that have taken place between people despite those people actually holding any ties. In the case of bridging ties we a bit more retweets than in the case of bonding ties. Yet regarding the total of almost 660.000 retweets, the approximately 73.000 retweets that took place without a tie are more or less only 10% of the total information diffusion. (So my appologies for the blog post on the importance of no ties was overstating their importance, given this new interpretation)
Friend and follower ties
What is more interesting are the friend and follower ties. We can see that in both cases holding a reciprocated tie with a person, results in a higher chance of getting retweeted by this person. Although when we look at the bonding ties this chance is almost 4 times as high, while in the bridging ties our chances improve only by less than 10%. When we compare the bonding with the bridging ties we clearly see that the reciprocated bonding ties have a magnitude of 10 higher chance of leading to a retweet than the bridging ties. This is very interesting. So despite the fact that of course bridging ties are important because they lead to a diffusion of information outside of the interest group, they are much more difficult to activate than ties between people who share the same interest. So from my point of view this fact shows exactly the weakness of weak ties. When I mean weak ties I refer to the bridging ties that link different topic interest communities together. We see that not only the weaker the tie the lower the chance of it carrying a retweet but also if the tie is a bridging tie the chances drop significantly.
Additionally we can also see that the reciprocated friend and follower ties correspond to the majority of the bandwidth of information exchanged. This is also an interesting fact since the stronger the ties get the higher the chance of obtaining a retweet through this tie, but at the same time the total amount of retweets flowing through these ties drops dramatically (we will also see this when we take a look at the valued at-interaction ties). Just by adding up the numbers we see that almost 3/4ths of all retweets inside the group have flown through the reciprocated friend and follower ties. So although those ties have only a ratio of 0.8 of retweets / no retweets they are the ties that are mostly responsible for the whole information diffusion inside the group.
When we analyze the interaction ties we find a similar pattern. We see that the bonding ties have a much higher chance of resulting in a retweet than their bridging counterparts, although the difference is not as dramatic. In general we also notice that the reciprocated at_ties have the higher chance of leading to retweets. Actually the ratio is higher than one in the reciprocated bonding ties. This means that per tie we obtain more than one retweet. From tie “maintainance perspective” it would seem smart to maintain such ties with your followers because on average they lead to the highest “earnings” or retweets. We shouldn’t jump the gun too early here, because up till now we have analyzed the rather “weak” ties. Why weak? Well having had a reciprocated conversation with a person is great but having had received 10 or 50 @ replies from that person is definitely a stronger tie, and might lead to a higher chance of getting retweeted by this person.
If we look at the valued ties we could replicate the table above and go through each tie strength separately, but its more fun to do this in a graphical way. I have therefore plotted the tie strength between two persons on the X-axis and the ratio (ties that had retweets flow through this type of tie / same type of ties that had no retweet) on the Y axis (make sure to click on the graphic to see it in full resolution)
So what do we see? Well first of all the red line marks the ratio of 1, which is receiving more retweets through this type of tie than not receiving retweets. Anything above one is awesome ;). You also notice that there is quite a lot of variance in the retweets, which is indicated by the error bars (std deviation). As the ties get stronger I would say that the standard deviation also gets higher (due to higher and less values in the retweets)
Bridging ties vs. bonding ties
What we notice is that both the bridging and bonding ties have a tendency to result in a higher chance of retweets flowing through this tie, the stronger they get. I would say this holds up to a certain point maybe the strength of 40? After this the curve starts to fluctuate so much that we can’t really tell if this behavior looks like this simply by chance (notice the high error bars). What we also see is that clearly the bridging ties have a lower chance of resulting in retweets than their bonding counterparts (comare green curve with the blue one). This is an observation that we have also noticed before. So again here it is, the weakness of weak ties. Weaker ties lead to a lower chance of resulting in retweets and the typical weak bridging ties also are much harder to activate than their bonding counterparts. What is not shown in this graph is the total number of retweets that have flown through those strong ties. Those are ~ 29000 retweets for bridging ties and ~ 37000 for bonding ties. Compared to the other tie types this is only a fraction of the total of exchanged retweets. Yet these strong ties in comparison have a very high chance leading to retweets, having sometimes ratios higher than 3 (i.e. there are thee times more retweets than flowing through this type of tie than no retweets flowing through this tie).
Well that was it for today. I will update this blog post with the reverse direction of ties tomorrow where Iwill have a look on the influence of outgoing ties on the incoming retweets. But don’t expect any surprises ;). Plus I will post the code that I used to generate this type of analysis.
I was recently analyzing Twitter networks and started to create groups according to a certain topic. An example would be the 100 most relevant people for the keyword “snowboarding”. Based on using the Twitter list feature where you can tag people with the word snowboarding the more often people are tagged the more they are relevant for this keyword.
Sharing knowledge in a public way has a long tradition. Analogously there is also a long tradition of the study of communities of practice evolving around the sharing of knowledge and the underlying computer mediated communication (Hildreth, Kimble, & Wright, 1998). Wasko et al (Wasko & Faraj, 2000) find that knowledge exchange is motivated by moral obligation and community interest rather than by narrow self-interest. Focusing on the behavioral rather the structural variables affecting knowledge exchange they find that people participate primarily out of community interest, generalized reciprocity and prosocial behavior. Millen et al (Millen, Fontaine, & Muller, 2002) emphasize the role of communities of practice on the improvement of organizational performance, which is also valid beyond the organizational boundaries. In the software development field such communities also hold a disciplinary background, similar work activities and tools, shared stories, contexts and values. Twitter is only a new way of social interaction between such communities beyond the existing ones (e.g., hallway exchanges and water-cooler conversations, meetings and conferences, brown bag lunches, newsletters, teleconferences, online environments, shared web spaces, email lists, discussion forums, and synchronous chat.). In the form of communities of interests, Henri and Pudelko (Henri & Pudelko, 2003) note a very similar phenomenon of people who are ” assembled around a topic of common interest. Its members take part in the community to exchange information, to obtain answers to personal questions or problems, to improve their understanding of a subject, to share common passions or to play.” Hence analyzing programming communities on Twitter is an ideal laboratory to explore how this new medium affects the flow of information in communities of practice or interest. For this task I have selected in a set of five keywords that represent programming communities, which represent five modern programming languages, which are widely used among software developers:
Those keywords are particularly useful because they are seldomly used outside of context of programming languages and do not stand for other communities that might also be represented in Twitter. Finally investigating how the informal connections and the sharing of information in such communities leads to acquiring social capital which then leads to a better communication relates directly to the research questions. The exploration of communities of practice which are considered to realize social capital especially for members who show, and share expertise and experience provides an interesting field both from a practical but also theoretical view.
For each of those keywords I assembled the 100 most relevant Twitter users according to the above mentioned list feature. The result list is somewhat puzzling (see figure 1) I have visualized it in a so called spring-embedding social network layout. Each node corresponds to a person and the color corresponds to a programming language. The more ties nodes have together the closer they are.
Of course the image is only a small part of the Twitter network, meaning that each of those 100 people are embedded in a bigger network, that is not shown here. To cover that I have calculated the number of ties that the those people have with the “outside” word, which is everything that beyond those 500 people. An overview of the communication has been captured in table 1, which shows the amount of Follower ties, @-replies and retweets for those groups.
Studying the communication pattern I was thinking: So what exactly does this mean? Is it that despite the fact that we are all developers and programmers somewhow we stick to our own turf, and rather neglect what is happening in other domains? Regarding the little theoretical overview towards communities of practice it seems like although there is exchange going on programmers prefer to connect with their own people, simply because the tweets are more relevant, if you do not program in a foreign language.
P.S. This perfectly fits into the homophily or assortative mixing considerations found in sociology. It is still puzzling to find it so emerging for the field of programming.
P.P.S. Normally to analyze such tendencies there is a SNA Metric called External-Internal-Index that calculates a significance how much those ties that we see differ from random distribution. This index also varifies this tendency, yet it is hard to interpret due to the lack of comparable data. I guess I will compare this tendencies with other groups on Twitter.
Some of you have probably read the very popular article on techcrunch on “The rise of interest based social network” such as pintrest, instagram, thumb, foodspotting and so on. While it seems like this is a new phenomenon I think that this kind of interest based social networks has existed for a long time in Twitter. While there are many theories why people form ties on Twitter, such following big stars (preferential attachment) or because of local proximity or closing triangles ( a friend of a friend) the most obvious reason is actually provided by twitter: “Follow your interests”. Actually at the time of writing this blog article I re-checked Twitter and saw that it now says “Find out what’s happening, right now, with the people and organizations you care about.”.
As you see on the left in the picture there we suspect that there is number of interest groups that are mainly interacting with each other but don’t care so much about other interest groups. It seems only natural that the principle of homophily (people like people who are alike) seems to foster such groups. In order to investigate such interest based social networks on Twitter, I have gathered 100 groups of 100 people based on a particular interest that is captured by a keyword.
To gather people that best represent a given interest or keyword I have used the Twitter list feature, where for each person you can see for which topics the person is listed. Using a rather sophisticated approach (I will cover it in another blogpost) I made sure to collect those people that are highly listed for a given keyword.
For each person I captured the number of tweets the person wrote and the number of retweets those tweets have received. Such corpus of 100 communities of each 100 persons allows us to study the “topic based social capital” – which can be defined on a number of levels.
This data allows us to analyze if there is such a thing like “topic based social capital” – which can somehow be the social capital aquired by individuals or groups that are based around a certain topic. On the individual level it means the more embedded (or central) I am in such a group, the higher we suspect the chances that I am important for this group and have some sort of influence on those people. Here when talking about influence I will simply measure the amount of retweets that person received from this group. On the group level we can think of this “topic based social capital” as a group feature, thus the better connected the group is and the more interaction takes place between their members, the higher the chances that those people actually exchange more information wich each other. This is also measured in the number of total retweets that have been exchanged between the members of this group. We will start the investigation with the group level version of social capital and cover the others later….
So in the overview we see that each community is based on a certain keyword and contains 100 persons and in total quite a number of tweets and retweets. In total that means that we are analyzing approximately 10 000 persons which in total have produced somewhere around 24 Mio Tweets and 95 Mio Retweets. I stored this data in a database in order to be able to access those communities any time.
The first ting that I noticed from an aggregate view is that when sorting the communities by the number of retweets that they managed to produce, we get interesting results.(Although here in this table the number of retweets is the total number of retweets and not the retweets that they managed to produce in their own community. We will come to this later). I noticed that the groups like celebrities, news, musicians, comedy, politicians managed to spark the highest amount of retweets. It is not that surprising, since we all know what popular TV and magazines are made from. In this sense twitter only represents what we are used to each day. When creating a ratio of Retweets/Tweet we and up with the same kind of sorting, meaning that those categories were most successfull at sparking retweets. If we took into account the number of followers these communities have, we might end up with a different result though (but we will also cover that in another blogpost). Back to the research question:
So regarding the social capital the group has we are interested if the higher it is the more retweets flow through the network.
To operationalize this question I have used networkX to compute three kinds of networks for those communities.
Now in order to measure the social capital the group has we can compute the densities in either the follower network or the interaction network. These densities will serve as a proxy of social capital the group has as a whole. The denser the network is the more embedded those people are with each other and the higher the total social capital of the group. In order to measure the resulting information diffusion I will also measure the density in the retweet network. The more retweets those people have exchanged with eachother INSIDE the community the more information diffusion took place.
We have a the MORE of something the MORE of something else relationship. Therfore using a regression is a nice way of analyzing those things.
To compute those densities for all of those communities I dumped the resulting networks using an edgelist format and then used networkX to compute the densities for each of those networks. Then I saved them in a csv format to import them into a statistics program like SPSS ( we could also use numpy or scipy to compute those regressions)
for project in communities: print "" print "############ Calculating Project %s ############### " % project print "" FF = nx.read_edgelist('data/%s_FF.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph()) AT = nx.read_edgelist('data/%s_AT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph()) RT = nx.read_edgelist('data/%s_RT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph()) FF_density = nx.density(FF) AT_density = nx.density(AT) RT_density = nx.density(RT) csv_writer.writerow([project, FF_density, AT_density, RT_density])
We end up with a datafile like this:
Now the last step is to see which type of independent variable either the social capital measured by friend and follower ties or the social capital measured by at ties captures the information diffusion that is happening in the network.
For this task I have created a simple regression in SPSS where I used a backward step method to exclude factors. So this model starts with including both the FF densities and the AT densities and then checks if it makes sense to get rid of one predictor because it does not explain enough of the variance in the data.
So we see since the ff-density and the at-density are quite correlated .742 p
Looking at the beta coefficients we see that the group bonding social capital as captured by the simple density in the interaction-network is able to explain 81% of the information diffsuion in the group. Thus we can come to the conclusion that the more group “bonding” social capital the group posseses, which is captured by the interactions they are having, the more information is exchanged between the members of the group.
The conclusion is somewhat not that much of a surprise, but we have seen an easy way of trying to investigate this question.
If we look at the groups with the highest social capital we see that other groups lead now. The groups with the highest social capital are programming languages “php”, “python” or “ruby” or some sports based such as “motorcross” or a german political party called the “piraten”. Where the groups of “celebries” for example only “rank” on somewhere the 80th place. THis means that celebrities are good at sparking a lot of retweets but they don’t really care about each other as a group. They don’t interact much with each other and they don’t retweet each other as a community.
If you have any questions, comments or ideas about this approach let me know.