In part 3 we saw that we can use screen to run our collection in the background and detach from it safely. That approach has some minor drawbacks, though:
- To start the process I have to go through a manual setup routine of starting screen then executing the collection and then detaching from it.
- If my process dies somehow in screen, either from a buffer overflow or because I wasn't prepared for every eventuality and the process somehow disconnected from the source, my data collection will be incomplete.
Step 1: Daemons
So to compensate for those things I will show you a setup that makes it easier to ensure our process is running and collecting tweets. It uses the daemons gem and a bit of cronjob magic. To install the daemons gem, just write:
gem install daemons
We will need to create an additional file that will serve as our control program, starting and stopping the collection. I will call it collect_tweets_control.rb:
require 'rubygems'
require 'daemons'
Daemons.run("collect_tweets.rb")
We can use it like this:
ruby collect_tweets_control.rb start
(collect_tweets.rb is now running in the background)
ruby collect_tweets_control.rb restart
(...)
ruby collect_tweets_control.rb stop
I think it is quite cool :).
The first time, we will test it by running collect_tweets.rb without forking into the background:
ruby collect_tweets_control.rb run
If you are using files in the collect_tweets script, make sure you refer to them by their full path.
path = File.dirname(File.expand_path(__FILE__))
# log = Logger.new('collect_tweets.log')
log = Logger.new(path + "/" + 'collect_tweets.log')
# This also applies for your sqlite database
tweets = Sequel.sqlite(path + "/" + "tweets.sqlite")[:tweets]
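As far as I know, the reason is that a daemonized process usually does not keep the working directory you started it from, so relative file names resolve somewhere else. Here is a minimal sketch of the same idea using File.join. The helper name local_path is my own invention for illustration and not part of the original script.

```ruby
# Directory that contains this script, as an absolute path.
SCRIPT_DIR = File.dirname(File.expand_path(__FILE__))

# Hypothetical helper (an assumption, not in the original script):
# builds an absolute path for a file that lives next to this script.
def local_path(filename)
  File.join(SCRIPT_DIR, filename)
end

# Print the paths the log file and the database would use.
puts local_path('collect_tweets.log')
puts local_path('tweets.sqlite')
```

You could then pass local_path('collect_tweets.log') to Logger.new and local_path('tweets.sqlite') to Sequel.sqlite instead of concatenating strings by hand.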
Otherwise the daemon will complain about not finding your files. Make sure to check if it is running fine by running:
ruby collect_tweets_control.rb run
So now it's time to start our process with:
ruby collect_tweets_control.rb start
You will notice that it created a little .pid file, which indicates that our daemon is up and running. You can also check with:
ps aux | grep collect_tweets.rb
It should show you your process.
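If you prefer to check from Ruby, the .pid file can be used directly. The following is only a sketch under assumptions: I assume the pid file is named collect_tweets.rb.pid and sits next to the script (check which file was actually created on your machine), and the method names are mine.

```ruby
# Returns true if a process with the given pid currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence, nothing is sent
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Reads the pid from a pid file; returns nil if the file is missing
# or does not contain a plain number.
def read_pid(pid_file)
  return nil unless File.exist?(pid_file)
  content = File.read(pid_file).strip
  content =~ /\A\d+\z/ ? content.to_i : nil
end

# True if the pid file exists and points at a live process.
def daemon_running?(pid_file)
  pid = read_pid(pid_file)
  !pid.nil? && process_alive?(pid)
end

if __FILE__ == $PROGRAM_NAME
  # Assumed pid file name and location; adjust to what you see on disk.
  pid_file = File.join(File.dirname(File.expand_path(__FILE__)), 'collect_tweets.rb.pid')
  puts daemon_running?(pid_file) ? 'daemon is running' : 'daemon is not running'
end
```

A stale pid file (left behind after a crash) is handled too: the file exists, but the pid it names is no longer alive, so daemon_running? returns false.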
Step 2: Script
So our collection process is up and running. We can check the logfile to see if things are going well. But something might still happen that kills our process.
That's why I would like to have a cronjob that checks every 10 minutes whether my process is still doing fine.
If you are on Debian, cron should already be installed; otherwise just install it with apt-get.
In Debian the cron package is installed as part of the base system, and will be running by default.
You will find a nice tutorial on cronjob on debian-administration: here
We will first create a little .sh script that will check if our collection is still in progress. I call it check_collection.sh
#!/bin/sh
# Count the running collect_tweets.rb processes, ignoring the grep itself
up=`ps aux | grep collect_tweets.rb | grep -v "grep" | wc -l`
if [ $up -eq 0 ]
then
  # Nothing is running, so start the daemon again
  /usr/local/bin/ruby /home/plotti/twitter/filme/collect_tweets_control.rb start
else
  echo "Collection is running fine at `date`"
fi
Watch out for the backticks around date: unlike ordinary quotes, they run the command and insert its output. The script uses the ps command in combination with grep to look for our collection process, and wc -l counts the matching lines. If the process is running the count is 1 (or more), otherwise it is 0.
If it is not running we start our daemon again; otherwise we just output that the collection process is doing fine.
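To make the counting logic a bit more tangible, here is the same filter written as a small Ruby sketch. The method name and the sample ps output are made up for illustration; real ps aux lines have more columns.

```ruby
# Counts the lines of ps output that mention the script but are not the
# grep command itself (the same idea as `grep ... | grep -v grep | wc -l`).
def count_matching_processes(ps_output, script_name)
  ps_output.each_line.count do |line|
    line.include?(script_name) && !line.include?('grep')
  end
end

# Made-up sample of `ps aux` output, for illustration only.
SAMPLE_PS_OUTPUT = <<~PS
  plotti  4242  0.3  1.2  ruby collect_tweets.rb
  plotti  4250  0.0  0.1  grep collect_tweets.rb
PS

puts count_matching_processes(SAMPLE_PS_OUTPUT, 'collect_tweets.rb') # prints 1
```

Without the second condition the grep line would be counted as well, and the check would report a running collection even when only grep is there. Note that any other process with the script name in its command line, such as an editor that has the file open, is counted too.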
You might want to make it executable with chmod and try it out by typing:
chmod +x check_collection.sh
./check_collection.sh
Step 3: Cronjob
Now that everything is in place, we just need a crontab entry that runs our little script, which will take care of the respawn. To check if cron is running:
ps aux | grep cron
If it's not running, on Debian you can typically start it with the init script:
/etc/init.d/cron start
Type the following command to edit your crontab:
crontab -e
Each cronjob has the following syntax:
# +---------------- minute (0 - 59)
# |  +------------- hour (0 - 23)
# |  |  +---------- day of month (1 - 31)
# |  |  |  +------- month (1 - 12)
# |  |  |  |  +---- day of week (0 - 6) (Sunday=0 or 7)
# |  |  |  |  |
  *  *  *  *  *  command to be executed
So our command will look like this:
*/10 * * * * /home/plotti/twitter/check_collection.sh >> /var/log/cron
The */10 is a nice shortcut (instead of writing 0,10,20,30,40,50 * * * * ) for getting what we want. There is a cool cron generator here.
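If you want to convince yourself that the two forms are equivalent, the step notation can be expanded by hand. This is only a sketch of how I understand the */n syntax for the minute field; the method name expand_step is made up and it handles nothing but the plain */n form.

```ruby
# Expands a cron step expression such as "*/10" over the range min..max.
# Only the plain "*/n" form is supported in this sketch.
def expand_step(field, min, max)
  match = field.match(%r{\A\*/(\d+)\z})
  raise ArgumentError, "not a step expression: #{field}" unless match
  step = match[1].to_i
  raise ArgumentError, 'step must be positive' if step <= 0
  (min..max).step(step).to_a
end

# Minute field of our crontab entry
puts expand_step('*/10', 0, 59).join(',') # prints 0,10,20,30,40,50
```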
The last part appends the output of our script to the /var/log/cron file so we can see that it actually ran. You might want to check that file to see if anything went wrong.