Datamining Twitter: Part 1

In this short tutorial you will learn how to collect tweets using Ruby and only two gems.

It is part of a series in which I will show you what fantastic things you can do with Twitter these days, if you love mining data 🙂

The first gem I would like to introduce is sequel. It is a lightweight ORM layer that lets you interface with a number of databases in Ruby without pain. It works great with MySQL or SQLite. We will use SQLite today. I have been using MySQL in combination with Rails and the nice ActiveRecord ORM, but for most tasks it is a bit too bulky. One problem with SQLite, though, is that it does not cope well with several processes writing to the database at the same time. But we will bump into that later…

To get started, visit http://sequel.rubyforge.org/ and have a look at the examples. They are pretty straightforward. I can also recommend the cheat sheet at http://sequel.rubyforge.org/rdoc/files/doc/cheat_sheet_rdoc.html

Step 1

Install the sequel gem and you are ready to go.

sudo gem install sequel

Step 2

Let us set up a little database to hold the tweets. If you are familiar with ActiveRecord, you have probably used migrations before. Sequel works the same way: you write migration files and then simply run them. Here is mine to get you started with a very simple table. It is important to save it as a 01_migration_name.rb file; the number matters, because otherwise sequel won't know which migration to run first. I saved it as 01_create_table.rb.

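class CreateTweetTable < Sequel::Migration

  # Create the basic tweets table
  def up
    create_table :tweets do
      primary_key :id
      String :text
      String :username
      Time :created_at
    end
  end

  # Undo the migration by dropping the table again
  def down
    drop_table :tweets
  end

end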

Step 3

Run the first migration. You will find a great tutorial on migrations at http://steamcode.blogspot.com/2009/03/sequel-migrations.html

sequel -m . -M 1 sqlite://tweets.db

If you get a “URI::InvalidURIError: the scheme sqlite does not accept registry part: …” error, your database name probably contains some characters it shouldn't. Try to use only letters and numbers.
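If you build the connection string in a script, you could check the name up front. The following is only a sketch of the "letters and numbers only" rule of thumb from above; `safe_sqlite_url` is a hypothetical helper, and the pattern is my assumption, not a rule taken from sequel or from Ruby's URI library.

```ruby
# Hypothetical helper: accept only database names made of letters and
# digits followed by ".db", then build the sqlite:// connection string.
def safe_sqlite_url(name)
  unless name.match?(/\A[A-Za-z0-9]+\.db\z/)
    raise ArgumentError, "use only letters and numbers in #{name.inspect}"
  end
  "sqlite://#{name}"
end

puts safe_sqlite_url("tweets.db")
```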

So now you should have a SQLite database for the very basic needs of your tweets. But maybe you need a little more information on what you are capturing. So let's write our second migration. In addition to storing the text and the username, I want to store the guid of the tweet, the time zone and the language used.

class AddLangAndGuid < Sequel::Migration

    # Add the extra columns to the existing tweets table
    def up
        alter_table :tweets do
            add_column  :guid, Integer
            add_column  :lang, String
            add_column  :time_zone, String
        end
    end

    # Remove the extra columns again
    def down
        alter_table :tweets do
            drop_column :guid
            drop_column :lang
            drop_column :time_zone
        end
    end

end

After running

sequel -m . -M 2 sqlite://tweets.db

you have created a nice database that will hold your tweets.

Step 4

Let's see how it worked. To use sequel in your scripts you have to require rubygems and the sequel gem. What we want to do is connect to the database. Just fire up irb and get started:

require 'rubygems'
require 'sequel'

DB = Sequel.sqlite("tweets.db")
tweets = DB[:tweets]

In those few lines you loaded your database and now have a tweets collection that gives you access to your data. I think that is really convenient.

In part 2 I will show you how to collect them. Enjoy.



About plotti2k1

Thomas Plotkowiak works at the MCM Institute in the Social Media and Mobile Communication group, which belongs to the University of St. Gallen. His PhD research in social media investigates how the structure of social networks like Facebook and Twitter influences the diffusion of information. His main focus is Twitter, since it allows public access (and has a nice API). Make sure to also have a look at his recent publications. Thomas graduated in 2008 in Computer Science and Economics at the University of Mannheim and was involved at the computer science institutes for software development and multimedia technology: SWT and PI4. During his studies he focused on Artificial Intelligence, Multimedia Technology, Logistics and Business Informatics. In his diploma/master thesis he developed an ad hoc p2p audio engine for 3D games. Thomas was also a researcher for a year at the University of Waterloo in Canada and at Macquarie University in Sydney. He was part of the CSIRO ICT research group. In his free time Thomas likes to swim in his local lake (Drei Weiher), run and hike in the Appenzell region. Otherwise you will find him coding ideas he recently had or enjoying a beer with colleagues in the MeetingPoint or Schwarzer Engel.

