Archive for

How to build your Twitterbot part 2

So after building a very annoying bot in part 1 we will try to make it a bit smarter so it can do actually something useful.

Today we will give the bot the methods: get_friends, get_followers, to find out more about himself.

In order to perform some actions we will write the methods: follow_user,  retweet_somebody and favorite_some_tweet.

Follow user will follow a person from our list of persons, retweet_somebody will simply retweet somebody, and favorite_some_tweet will favorite a random tweet.So in a way the robot will mimic a typical user behavior on twitter.

def get_friends(username)
    result = Twitter.friend_ids(username)
    friends_ids = result.ids
    old_cursor = 0
    next_cursor = result.next_cursor
    while old_cursor != next_cursor and next_cursor != 0
      old_cursor = next_cursor
        result = Twitter.friend_ids username, :cursor => next_cursor
      friends_ids += result.ids
        next_cursor = result.next_cursor
    return friends_ids
    @log.error "Couldn't get friends"

 def get_followers(username)
    result = Twitter.follower_ids username
    followers_ids = result.ids
    old_cursor = 0
    next_cursor = result.next_cursor
    while old_cursor != next_cursor and next_cursor != 0
      old_cursor = next_cursor
      result = Twitter.follower_ids username, :cursor => next_cursor
        followers_ids += result.ids
      next_cursor = result.next_cursor
    return followers_ids
    @log.error "Could'nt get followers"

Those two methods are almost similar,  where for a given name we can get the follower resp. friends ids.So having those methods in the Twitterbot class we will use them in the initialization to have a bot that knows something about himself.

def initialize(config_file)
  # Adding those lines to our initial init method
  #New find out what my friends are
  @friends_ids = self.get_friends(Twitter.user.screen_name)
  #Find out what my followers are
  @followers_ids = self.get_followers(Twitter.user.screen_name)

def retweet_somebody
            all_retweets = Twitter.retweeted_to_me(:count => 100)
            retweets = []
            #Strategy 1: Only keep english AND keyword AND more then 1 retweet
            all_retweets.each{|e| retweets << e if e.retweeted_status.user.lang == "en" && e.retweeted_status.text.include?(@twitter_config["keyword"]) && e.retweet_count > 1}
                        #Strategy 2: Just retweet the most retweeted retweet.
            #best_retweet = retweets.sort{|a,b| a.retweet_count.to_i  b.retweet_count.to_i} # retweet highly retweeted ones
            if retweets.length > 0
                best_retweet = retweets[rand(retweets.length)]
                @log.info "Retweeted: #{best_retweet.retweeted_status.text} with RT:#{best_retweet.retweet_count}"
                return best_retweet
                @log.info "Wanted to retweet but found not material matching #{@twitter_config["keyword"]}"
                return []
            @log.error "Failed to retweet #{best_retweet.retweeted_status.text}"

The favorite some tweet method works similiar like the retweet method, where we first get a bunch of tweets from our timeline and then consider tweets as favoritable candidates if those tweets have been retweeted by somebody before and if the tweet has been written by some of my followers. We could have come up with a different heuristic but this one works just fine.

def follow_user(user)
        @log.info "Followed person #{user.screen_name}"
            @log.error "Could not follow person #{user.sceen_name}"

The follow method simply makes the bot follow a given user. For it to work we will need to add the following lines to the init function in order to have a list of persons that we would like to follow:

#Read in People to Follow
  @persons_to_follow_ids = CSV.read(@twitter_config["followers_file"]).flatten.collect{|f| f.to_i}

  #Find out whom of the persons that i should follow i am already following
  @persons_to_follow_ids = @persons_to_follow_ids - @friends_ids

Having written all of those extra functions for the bot we can make our bot logic a bit more intelligent. We will simply roll a dice and depending on the outcome the bot will either follow a person, retweet a person, or facorite a tweet or at_message a person:

#Follow procedure only follow people on a chance 1/3
dice = rand(3)

if dice == 1

 #Chose up to 4 persons
 t.persons_to_follow_ids[0..rand(4)].each do |person_to_follow|

  #Get a User
  user = t.get_user(person_to_follow)

  if t.twitter_config["follow_people"] == true

  #At Message
  if t.twitter_config["message_people"] == true

elsif dice == 2

 #Retweet some of my friends
 if t.twitter_config["retweet_people"] == true

elsif dice == 3

 #Favorite some Tweet
 if t.twitter_config["favorite_tweets"] == true

 t.log.info "Skipped run this time"

We will call the bot periodically to perform one of the actions and see how he is perceived by persons. Please consider that all of the above is rather an experimental toy example of  a bot and shouldnt be used to build spam bots on Twitter.


How to build your Twitter bot Part 1

In case you want to build a Twitter bot, it is quite easy with ruby and a few gems. We will write a small bot that will know a list of target persons and try to follow them and chat them up with some pickup lines.

Well I know this is kind of close to spam, so please use caution, but in part 2 I will try to teach the bot to write something more sophisticated in order to be useful. Plus it might definitely make much more sense to have some state machine mechanism in the background to control the robot. But we will start with something easy now.

To get started all you will need in ruby are those libraries twitter, csv, logger, yaml. The twitter gem is great to handle everything that has to do with Twitter. The logger gem is great if you want to maintain a log file of all the things that are going on.The remaining csv and yaml gems are practical to write down some human readable config files. So that is the code we have so far.

require 'rubygems'
require 'twitter'
require 'csv'
require 'logger'
require 'yaml'

class Twitterbot
    attr_accessor :persons_to_text_ids, :twitter_config, :log

    def initialize(config_file)
        @twitter_config = YAML.load_file(config_file)
        @log = Logger.new(@twitter_config["log_file"])

        #Read in Pickup Lines
        @pickup_lines = CSV.read(@twitter_config["pickup_file"])

        #Read in People to Follow
        @persons_to_follow_ids = CSV.read(@twitter_config["persons_file"]).flatten.collect{|f| f.to_i}


Obviously we need some config file and a follower file. The config file will contain everything config parameter for a given account and the follower file will contain the followers per account that the account should follow.

#Config Twitterbot
consumer_key: "your consumer key"
consumer_secret: "your consumer secret"
access_token: "your access token"
access_token_secret: "your access token secret"

#Configure Behavior
persons_file: "your persons file.csv"
log_file: "your log file.log"
pickup_file: "your file with pickup lines.csv"
message_people: true

Ok lets have a look on the persons file. This is a simple .csv file where we will put a twitter id in each line:


And lets have a look at the pickup lines file. It should contain some “pickup lines” that the bot writes towards this person.

I've seen you are enthousiastic about xyz. Want to chat about it?
Have you heard about our xyz. I am quite interested in xyz. Do you want to be friends?

Well now we have everything that bot needs in order to get started. It only needs a method where he addresses the persons.

def at_message(user)
            pickup_line = "@#{user.screen_name} #{@pickup_lines[rand(@pickup_lines.length)]}"
            @log.info "At Messaged: #{pickup_line}"
            @log.error "Failed to adress member #{user.screen_name}"

So you see we are using the log to note down whatever the bot did. Well now its time for a test run. So fire up your ruby console (irb) and load yourself a bot and have a look if it addresses our first person in the list.

require 'init'
t = Twitterbot.new("your config file")
user = t.persons_to_text_ids[0]

I hope that works for you. If it does you can create a small robot.rb file that you going to call by a cronjob.

require 'init'

#Config Robot Client
if ARGV[0] == nil
    puts "You have to provide a config file e.g. ruby robot.rb twitter.yml"
    t = Twitterbot.new(ARGV[0])

if t.twitter_config["message_people"] == true

Finally the last thing left is to call the robot periodically using a cronjob. (Use: crontab -e) Lets suppose you want to call it every hour:

*/60 * * * * cd /home/plotti/twitterbot/;  ruby robot.rb your_config_file.yml

Tadaaa, you are done. Now you have  a bot that wakes up every hour and texts one of the people in the list.

As already mentioned, its not very useful and even quite spam like behavior, but I promise we will make the bot a bit smarter in the next episode in order to be useful.

Cheers Thomas

Knowledge exchange between programming communities


I was recently analyzing Twitter networks and started to create groups according to a certain topic. An example would be the 100 most relevant people for the keyword “snowboarding”. Based on using the Twitter list feature where you can tag people with the word snowboarding the more often people are tagged the more they are relevant for this keyword.


Sharing knowledge in a public way has a long tradition. Analogously there is also a long tradition of the study of communities of practice evolving around the sharing of knowledge and the underlying computer mediated communication (Hildreth, Kimble, & Wright, 1998). Wasko et al (Wasko & Faraj, 2000) find that knowledge exchange is motivated by moral obligation and community interest rather than by narrow self-interest. Focusing on the behavioral rather the structural variables affecting knowledge exchange they find that people participate primarily out of community interest, generalized reciprocity and prosocial behavior. Millen et al (Millen, Fontaine, & Muller, 2002) emphasize the role of communities of practice on the improvement of organizational performance, which is also valid beyond the organizational boundaries. In the software development field such communities also hold a disciplinary background, similar work activities and tools, shared stories, contexts and values. Twitter is only a new way of social interaction between such communities beyond the existing ones (e.g., hallway exchanges and water-cooler conversations, meetings and conferences, brown bag lunches, newsletters, teleconferences, online environments, shared web spaces, email lists, discussion forums, and synchronous chat.). In the form of communities of interests, Henri and Pudelko (Henri & Pudelko, 2003) note a very similar phenomenon of people who are ” assembled around a topic of common interest. Its members take part in the community to exchange information, to obtain answers to personal questions or problems, to improve their understanding of a subject, to share common passions or to play.” Hence analyzing programming communities on Twitter is an ideal laboratory to explore how this new medium affects the flow of information in communities of practice or interest. For this task I have selected in a set of five keywords that represent programming communities, which represent five modern programming languages, which are widely used among software developers:

  • Ruby
  • Python
  • Java
  • PHP
  • Pearl
  • I did not collect lists for C/++ because its very noisy as of having only one letter

Those keywords are particularly useful because they are seldomly used outside of context of programming languages and do not stand for other communities that might also be represented in Twitter. Finally investigating how the informal connections and the sharing of information in such communities leads to acquiring social capital which then leads to a better communication relates directly to the research questions. The exploration of communities of practice which are considered to realize social capital especially for members who show, and share expertise and experience provides an interesting field both from a practical but also theoretical view.

For each of those keywords I assembled the 100 most relevant Twitter users according to the above mentioned list feature. The result list is somewhat puzzling (see figure 1) I have visualized it in a so called spring-embedding social network layout. Each node corresponds to a person and the color corresponds to a programming language. The more ties nodes have together the closer they are.


Of course the image is only a small part of the Twitter network, meaning that each of those 100 people are embedded in a bigger network, that is not shown here. To cover that I have calculated the number of ties that the those people have with the “outside” word, which is everything that beyond those 500 people. An overview of the communication has been captured in table 1, which shows the amount of Follower ties, @-replies and retweets for those groups.

Table 1:



Studying the communication pattern I was thinking: So what exactly does this mean? Is it that despite the fact that we are all developers and programmers somewhow we stick to our own turf, and rather neglect what is happening in other domains? Regarding the little theoretical overview towards communities of practice it seems like although there is exchange going on programmers prefer to connect with  their own people, simply because the tweets are more relevant, if you do not program in a foreign language.

P.S. This perfectly fits into the homophily or assortative mixing considerations found in sociology. It is still  puzzling to find it so emerging for the field of programming.

P.P.S. Normally to analyze such tendencies  there is a SNA Metric called External-Internal-Index that calculates a significance how much those ties that we see differ from random distribution. This index also varifies this tendency, yet it is hard to interpret due to the lack of comparable data. I guess I will compare this tendencies with other groups on Twitter.



Interest based social capital

Some of you have probably read the very popular article on techcrunch on “The rise of interest based social network” such as pintrest, instagram, thumb, foodspotting and so on. While it seems like this is a new phenomenon I think that this kind of interest based social networks has existed for a long time in Twitter. While there are many theories why people form ties on Twitter, such following big stars (preferential attachment) or because of local proximity or closing triangles ( a friend of a friend) the most obvious reason is actually provided by twitter: “Follow your interests”. Actually at the time of writing this blog article I re-checked Twitter and saw that it now says “Find out what’s happening, right now, with the people and organizations you care about.”.

As you see on the left in the picture there we suspect that there is number of interest groups that are mainly interacting with each other but don’t care so much about other interest groups. It seems only natural that the principle of homophily (people like people who are alike) seems to foster such groups. In order to investigate such interest based social networks on Twitter, I have gathered 100 groups of 100 people based on a particular interest that is captured by a keyword.

To gather people that best represent a given interest or keyword I have used the Twitter list feature, where for each person you can see for which topics the person is listed. Using a rather sophisticated approach (I will cover it in another blogpost) I made sure to collect those people that are highly listed for a given keyword.
For each person I captured the number of tweets the person wrote and the number of retweets those tweets have received. Such corpus of 100 communities of each 100 persons allows us to study the “topic based social capital” – which can be defined on a number of levels.

Research Questions:

This data allows us to analyze if there is such a thing like “topic based social capital” – which can somehow be the social capital aquired by individuals or groups that are based around a certain topic. On the individual level it means the more embedded (or central) I am in such a group, the higher we suspect the chances that I am important for this group and have some sort of influence on those people. Here when talking about influence I will simply measure the amount of retweets that person received from this group. On the group level we can think of this “topic based social capital” as a group feature, thus the better connected the group is and the more interaction takes place between their members, the higher the chances that those people actually exchange more information wich each other. This is also measured in the number of total retweets that have been exchanged between the members of this group.  We will start the investigation with the group level version of social capital and cover the others later….

The Data

Open the data overview in google docs

So in the overview we see that each community is based on a certain keyword and contains 100 persons and in total quite a number of tweets and retweets. In total that means that we are analyzing approximately 10 000 persons which in total have produced somewhere around 24 Mio Tweets and 95 Mio Retweets. I stored this data in a database in order to be able to access those communities any time.

The first ting that I noticed from an aggregate view is that when sorting the communities by the number of retweets that they managed to produce, we get interesting results.(Although here in this table the number of retweets is the total number of retweets and not the retweets that they managed to produce in their own community. We will come to this later). I noticed that the groups like celebrities, news, musicians, comedy, politicians managed to spark the highest amount of retweets. It is not that surprising, since we all know what popular TV and magazines are made from. In this sense twitter only represents what we are used to each day. When creating a ratio of Retweets/Tweet we and up with the same kind of sorting, meaning that those categories were most successfull at sparking retweets. If we took into account the number of followers these communities have, we might end up with a different result though (but we will also cover that in another blogpost). Back to the research question:

So regarding the social capital the group has we are interested if the higher it is the more retweets flow through the network.


To operationalize this question I have used networkX to compute three kinds of networks for those communities.

  • A follower network – that captures all friend and follower ties between each of those 100 persons.
  • An “interaction networks” – that captures all of the @replies of those persons and
  • A retweet network – that captures all of the retweets that those persons have exchanged with each other.

Now in order to measure the social capital the group has we can compute the densities in either the follower network or the interaction network. These densities will serve as a proxy of social capital the group has as a whole. The denser the network is the more embedded those people are with each other and the higher the total social capital of the group. In order to measure the resulting information diffusion I will also measure the density in the retweet network. The more retweets those people have exchanged with eachother INSIDE the community the more information diffusion took place.

We have a the MORE of something the MORE of something else relationship. Therfore using a regression is a nice way of analyzing those things.

To compute those densities for all of those communities I dumped the resulting networks using an edgelist format and then used networkX to compute the densities for each of those networks. Then I saved them in a csv format to import them into a statistics program like SPSS ( we could also use numpy or scipy to compute those regressions)

for project in communities:

    print ""
    print "############ Calculating Project %s ############### " % project
    print ""

    FF = nx.read_edgelist('data/%s_FF.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph())
    AT = nx.read_edgelist('data/%s_AT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph())
    RT = nx.read_edgelist('data/%s_RT.edgelist' % project, nodetype=str, data=(('weight',float),),create_using=nx.DiGraph())

    FF_density = nx.density(FF)
    AT_density = nx.density(AT)
    RT_density = nx.density(RT)

    csv_writer.writerow([project, FF_density, AT_density, RT_density])

We end up with a datafile like this:

Open the datafile in google docs


Now the last step is to see which type of independent variable either the social capital measured by friend and follower ties or the social capital measured by at ties captures the information diffusion that is happening in the network.
For this task I have created a simple regression in SPSS where I used a backward step method to exclude factors. So this model starts with including both the FF densities and the AT densities and then checks if it makes sense to get rid of one predictor because it does not explain enough of the variance in the data.

Open the result of the regression analysis in google docs

So we see since the ff-density and the at-density are quite correlated .742 p


Looking at the beta coefficients we see that the group bonding social capital as captured by the simple density in the interaction-network is able to explain 81% of the information diffsuion in the group. Thus we can come to the conclusion that the more group “bonding” social capital the group posseses, which is captured by the interactions they are having, the more information is exchanged between the members of the group.

The conclusion is somewhat not that much of a surprise, but we have seen an easy way of trying to investigate this question.

If we look at the groups with the highest social capital we see that other groups lead now. The groups with the highest social capital are programming languages “php”, “python” or “ruby” or some sports based such as “motorcross” or a german political party called the “piraten”. Where the groups of “celebries” for example only “rank” on somewhere the 80th place. THis means that celebrities are good at sparking a lot of retweets but they don’t really care about each other as a group. They don’t interact much with each other and they don’t retweet each other as a community.

If you have any questions, comments or ideas about this approach let me know.
Cheers Thomas