Many recommendation algorithms suffer from the so-called cold-start problem. It is usually tackled by having the user fill out some initial forms or provide some initial ratings, e.g. for movies, to give the algorithm something to work with. Another idea is to use what is already out there, namely the information encoded in the friends and followers graph on Twitter.
I thought it would be fun to use my recent corpus of 16,000 Twitter users (who have been categorized by how people list them with the list feature) to determine what an arbitrary user is interested in. If this user follows one of these people, they might also be interested in the area that person represents. See the schema figure below. The approach is really quite simple: collect all the friend edges of a user, go through them, and check whether each friend appears in our pre-tagged set of users. The more friends we find from one category, the more this user seems to be interested in that topic.
Below is all that is needed to perform this user interest aggregation:
require '../config/environment'
require 'fastercsv' # harmless if the Rails environment already loads it

# Load the pre-tagged corpus: username => category and rank within that category.
interests = {}
rows = FasterCSV.read("#{RAILS_ROOT}/analysis/data/partitions/final_partitions_p100_200_0.2.csv")
rows.each do |row|
  interests[row[0]] = {:category => row[1], :count => row[2]}
end

# Twitter ids of the persons in the latest project, used as the pre-tagged set.
project = Project.last
ids = project.persons.collect{|p| p.twitter_id}

# Collect the target user via the project's own helper, then look the user up.
name = "zephoria"
Person.collect_person(name,2,100000)
person = Person.find_by_username(name)

# Keep only those friends that are part of the pre-tagged set.
out = person.friends_ids.select{|id| ids.include? id}

# Count matches per category and remember who matched, together with their rank.
personal_interests = {}
out.each do |id|
  username = Person.find_by_twitter_id(id).username
  interest = interests[username]
  next if interest.nil?
  personal_interests[interest[:category]] ||= {:count => 0, :names => []}
  personal_interests[interest[:category]][:count] += 1
  personal_interests[interest[:category]][:names] << [username, interest[:count]]
end

# Print the categories ordered by descending number of matches.
#personal_interests.sort{|a,b| b[1][:count] <=> a[1][:count]}
puts "Name: #{name} Interests: #{personal_interests.collect{|p| [p[0],p[1][:count]]}.sort{|a,b| b[1]<=>a[1]}.join(" ")}"
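To try the counting step without the Rails models or the Twitter data, here is a minimal, self-contained sketch of the same idea. Everything in it is an assumption made for illustration: the `TAGGED_USERS` table, the usernames, the ranks, and the helper names `aggregate_interests` and `ranked_categories` do not come from the real corpus or the script.

```ruby
# Toy stand-in for the pre-tagged corpus: username => category and rank.
# All names and numbers here are made up for illustration.
TAGGED_USERS = {
  "alice" => { :category => "tech",      :rank => 7 },
  "bob"   => { :category => "tech",      :rank => 23 },
  "carol" => { :category => "sociology", :rank => 27 }
}

# Count how many of a user's friends fall into each category.
# friends: array of usernames the user follows.
def aggregate_interests(friends, tagged_users)
  interests = Hash.new { |hash, category| hash[category] = { :count => 0, :names => [] } }
  friends.each do |friend|
    tag = tagged_users[friend]
    next if tag.nil? # this friend is not part of the tagged corpus
    entry = interests[tag[:category]]
    entry[:count] += 1
    entry[:names] << [friend, tag[:rank]]
  end
  interests
end

# Categories ordered from most to fewest matched friends.
def ranked_categories(interests)
  pairs = interests.map { |category, entry| [category, entry[:count]] }
  pairs.sort_by { |_, count| -count }
end

friends = ["alice", "bob", "carol", "dave"] # "dave" is untagged and ignored
result = aggregate_interests(friends, TAGGED_USERS)
ranked_categories(result).each { |category, count| puts "#{category}: #{count}" }
```

Looking friends up in a hash keyed by username avoids scanning the whole list of tagged ids for every friend, which might matter for users who follow many thousands of accounts.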
The final partitions file used in the code is simply the output of the task I described in my last blog post. I think the results of this very simple idea are quite satisfactory, but see for yourself. I have pre-computed the results for some people that I follow, and I am thinking of putting this online somewhere so you can check your own account. Below is the sample YAML output for the user zephoria (danah boyd). The number listed after each person shows how highly this person is ranked within the category.
---
- - tech
  - :count: 13
    :names:
    - - BillGates
      - "7"
    - - biz
      - "23"
    - - ev
      - "17"
    - - mattcutts
      - "73"
    - - anildash
      - "38"
    - - jeffjarvis
      - "122"
    - - Caterina
      - "123"
    - - Scobleizer
      - "12"
    - - chr1sa
      - "136"
    - - digiphile
      - "134"
    - - ginatrapani
      - "25"
    - - waltmossberg
      - "20"
    - - om
      - "28"
- - sociology
  - :count: 9
    :names:
    - - barrywellman
      - "27"
    - - SSRC_org
      - "8"
    - - eszter
      - "51"
    - - JessieNYC
      - "50"
    - - techsoc
      - "70"
    - - pewresearch
      - "62"
    - - marc_smith
      - "97"
    - - craigjcalhoun
      - "24"
    - - sociologically
      - "6"
- - musician_singer
  - :count: 7
    :names:
    - - ladygaga
      - "1"
    - - Pink
      - "5"
    - - katyperry
      - "2"
    - - ddlovato
      - "45"
    - - amandapalmer
      - "112"
    - - imogenheap
      - "18"
    - - trent_reznor
      - "21"
- - ceo
  - :count: 7
    :names:
    - - johnbattelle
      - "74"
    - - nilofer
      - "51"
    - - dsifry
      - "61"
    - - timoreilly
      - "4"
    - - loic
      - "64"
    - - Pistachio
      - "36"
    - - finkd
      - "34"
- - humanrights_activism_justice
  - :count: 7
    :names:
    - - globalvoices
      - "68"
    - - UNICEF
      - "7"
    - - CornelWest
      - "22"
    - - NaomiAKlein
      - "23"
    - - jilliancyork
      - "124"
    - - racialicious
      - "73"
    - - GEMSGIRLS
      - "144"
- - anthropology
  - :count: 6
    :names:
    - - hahhh
      - "63"
    - - DannyAnth
      - "60"
    - - BiellaColeman
      - "86"
    - - Grant27
      - "88"
    - - mwesch
      - "28"
    - - AmericanAnthro
      - "1"
- - tvshows_drama_actor_hollywood
  - :count: 5
    :names:
    - - BillCosby
      - "40"
    - - oliviawilde
      - "63"
    - - Janefonda
      - "91"
    - - Oprah
      - "57"
    - - iansomerhalder
      - "31"
... (rest of the output omitted)
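The script above only prints a one-line summary, so here is a rough sketch of how a nested YAML dump like this one could be produced from the aggregated hash. The constant `SAMPLE_INTERESTS`, its contents, and the helper `interests_to_yaml` are assumptions made for illustration, and the exact quoting of strings may differ between Ruby versions and YAML libraries.

```ruby
require 'yaml'

# Hypothetical aggregated result, in the shape the aggregation step builds:
# category => { :count => matches, :names => [[username, rank], ...] }
SAMPLE_INTERESTS = {
  "sociology" => { :count => 1, :names => [["carol", "27"]] },
  "tech"      => { :count => 2, :names => [["alice", "7"], ["bob", "23"]] }
}

# Sort the categories by descending match count and serialize the
# resulting [category, details] pairs as YAML.
def interests_to_yaml(interests)
  interests.sort { |a, b| b[1][:count] <=> a[1][:count] }.to_yaml
end

puts interests_to_yaml(SAMPLE_INTERESTS)
```

Sorting a hash yields an array of `[category, details]` pairs, which is why each category shows up as a two-element list in the dump rather than as a mapping key.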
Here are some shortened results (omitting the individual persons) for people I follow on Twitter. If you like, you can tell me in the comments how well this approach actually captured your interests.
Again, I'd like to note that finding out about a user's interests with this method does not require studying their tweets. Their friend ties already reveal quite a lot. The first couple of interests are often not that surprising, but some of the later ones revealed things about people that I was not aware of.
Cheers
Thomas