Finding out what people are interested in by using only structural information

Many recommendation algorithms suffer from the so-called cold start problem: they know nothing about a new user. The usual remedy is to have the user fill out some initial forms or provide some initial ratings, e.g. for movies, to give the algorithm something to work with. Another idea is to use what is already out there, namely the information encoded in the friend and follower graph on Twitter.

I thought it would be fun to use my recent corpus of 16,000 Twitter users (who have been categorized by how people list them with the list feature) to determine what an arbitrary user is interested in. If this user follows one of these people, he or she might also be interested in the area that they represent. See the schema figure below. The approach is really quite simple: collect all the friend edges of a user, go through them, and check whether each friend is in our pre-tagged set of users. The more users we find from one category, the more this user seems to be interested in that topic.

 

Below is all that is needed to perform this user interest aggregation:


require '../config/environment'

# Load the pre-tagged users: username => category and rank within that category.
interests = {}
rows = FasterCSV.read("#{RAILS_ROOT}/analysis/data/partitions/final_partitions_p100_200_0.2.csv")
rows.each do |row|
  interests[row[0]] = {:category => row[1], :count => row[2]}
end

# Twitter ids of everyone in the pre-tagged corpus.
project = Project.last
ids = project.persons.collect{|p| p.twitter_id}

# Collect the friends of the user we want to profile.
name = "zephoria"
Person.collect_person(name,2,100000)
person = Person.find_by_username(name)

# Keep only the friends that also appear in the pre-tagged corpus.
out = person.friends_ids.select{|id| ids.include?(id)}

# Count the matched friends per category and remember who they are.
personal_interests = {}
out.each do |id|
  username = Person.find_by_twitter_id(id).username
  interest = interests[username]
  next if interest.nil?
  personal_interests[interest[:category]] ||= {:count => 0, :names => []}
  personal_interests[interest[:category]][:count] += 1
  personal_interests[interest[:category]][:names] << [username, interest[:count]]
end

# Print the categories sorted by descending count.
#personal_interests.sort{|a,b| b[1][:count] <=> a[1][:count]}
puts "Name: #{name} Interests: #{personal_interests.collect{|p| [p[0],p[1][:count]]}.sort{|a,b| b[1]<=>a[1]}.join(" ")}"

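The same aggregation can be tried without the Rails models or the database. The following is only a minimal sketch, not the code used for this post: the `TAGGED` hash and the friend list are made-up sample data standing in for the partitions file and for `person.friends_ids`, and `aggregate_interests` is a hypothetical helper name.

```ruby
# Minimal, database-free sketch of the interest aggregation.
# TAGGED and the friend list are made-up sample data (assumptions);
# in the post this information comes from the partitions CSV and from Twitter.

# username => category and rank of that user within the category
TAGGED = {
  "BillGates"    => {:category => "tech",            :rank => 7},
  "ev"           => {:category => "tech",            :rank => 17},
  "barrywellman" => {:category => "sociology",       :rank => 27},
  "ladygaga"     => {:category => "musician_singer", :rank => 1}
}

# Counts, per category, how many of the given friends are in the tagged set.
# Friends that are not in the tagged set are ignored.
def aggregate_interests(friends, tagged)
  result = Hash.new { |hash, key| hash[key] = {:count => 0, :names => []} }
  friends.each do |username|
    tag = tagged[username]
    next if tag.nil?
    entry = result[tag[:category]]
    entry[:count] += 1
    entry[:names] << [username, tag[:rank]]
  end
  result
end

friends = ["BillGates", "ev", "barrywellman", "some_untagged_user"]
interests = aggregate_interests(friends, TAGGED)

# Sort categories by descending count and print them in the post's format.
summary = interests.map { |category, data| [category, data[:count]] }
                   .sort_by { |_, count| -count }
puts "Interests: #{summary.flatten.join(' ')}"
# prints "Interests: tech 2 sociology 1"
```

Keeping the tagged users in a hash keyed by username makes each friend lookup a single hash access, so the cost grows with the number of friends rather than with the size of the tagged corpus.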

The final partitions file in the code is simply the output of a task that I performed in my last blog post. I think the results of this very simple idea are quite satisfactory, but see for yourself. I have pre-computed the results for some people that I follow, and I am thinking of putting this online somewhere so you can check for yourself as well. Below is the sample YAML output for the user zephoria (danah boyd). The number listed under each person shows how high this person has been ranked in the category.


- - tech
  - :count: 13
    :names:
    - - BillGates
      - "7"
    - - biz
      - "23"
    - - ev
      - "17"
    - - mattcutts
      - "73"
    - - anildash
      - "38"
    - - jeffjarvis
      - "122"
    - - Caterina
      - "123"
    - - Scobleizer
      - "12"
    - - chr1sa
      - "136"
    - - digiphile
      - "134"
    - - ginatrapani
      - "25"
    - - waltmossberg
      - "20"
    - - om
      - "28"
- - sociology
  - :count: 9
    :names:
    - - barrywellman
      - "27"
    - - SSRC_org
      - "8"
    - - eszter
      - "51"
    - - JessieNYC
      - "50"
    - - techsoc
      - "70"
    - - pewresearch
      - "62"
    - - marc_smith
      - "97"
    - - craigjcalhoun
      - "24"
    - - sociologically
      - "6"
- - musician_singer
  - :count: 7
    :names:
    - - ladygaga
      - "1"
    - - Pink
      - "5"
    - - katyperry
      - "2"
    - - ddlovato
      - "45"
    - - amandapalmer
      - "112"
    - - imogenheap
      - "18"
    - - trent_reznor
      - "21"
- - ceo
  - :count: 7
    :names:
    - - johnbattelle
      - "74"
    - - nilofer
      - "51"
    - - dsifry
      - "61"
    - - timoreilly
      - "4"
    - - loic
      - "64"
    - - Pistachio
      - "36"
    - - finkd
      - "34"
- - humanrights_activism_justice
  - :count: 7
    :names:
    - - globalvoices
      - "68"
    - - UNICEF
      - "7"
    - - CornelWest
      - "22"
    - - NaomiAKlein
      - "23"
    - - jilliancyork
      - "124"
    - - racialicious
      - "73"
    - - GEMSGIRLS
      - "144"
- - anthropology
  - :count: 6
    :names:
    - - hahhh
      - "63"
    - - DannyAnth
      - "60"
    - - BiellaColeman
      - "86"
    - - Grant27
      - "88"
    - - mwesch
      - "28"
    - - AmericanAnthro
      - "1"
- - tvshows_drama_actor_hollywood
  - :count: 5
    :names:
    - - BillCosby
      - "40"
    - - oliviawilde
      - "63"
    - - Janefonda
      - "91"
    - - Oprah
      - "57"
    - - iansomerhalder
      - "31"
… (rest of the output omitted)

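A nested listing like the one above can be produced by sorting the per-category hash by count and dumping it with Ruby's standard `yaml` library. This is a small sketch under assumptions: the `personal_interests` hash here is made-up sample data in the same shape as the one built by the script, and the exact quoting in the dump may differ between Ruby versions.

```ruby
require 'yaml'

# Made-up sample result (assumption), shaped like the hash built by the script:
# category => {:count => matched friends, :names => [[username, rank], ...]}
personal_interests = {
  "sociology" => {:count => 1, :names => [["barrywellman", "27"]]},
  "tech"      => {:count => 2, :names => [["BillGates", "7"], ["ev", "17"]]}
}

# Hash#sort yields [category, data] pairs; order them by descending count.
sorted = personal_interests.sort { |a, b| b[1][:count] <=> a[1][:count] }

# Dump the sorted pairs as YAML, one nested entry per category.
yaml = sorted.to_yaml
puts yaml
```

Sorting before dumping is what puts the strongest interest first in the listing; a plain hash dump would keep insertion order instead.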

Here are some shortened results (omitting the individual persons) for people I follow on Twitter. If you like, you can tell me in the comments how well this approach actually captured your interests.

  • Name: plotti Interests: java 12 python 7 ruby 6 sociology 4 investor 2 database 2 tech 2 anthropology 1 anime 1 developer 1 innovation 1 mac_iphone 1 publicrelations 1 university 1 teaching 1
  • Name: barrywellman Interests: sociology 7 anthropology 3 linguistics 1 tech 1 multimedia 1 innovation 1 developer 1 highered 1
  • Name: marc_smith Interests: sociology 19 tech 14 innovation 7 ceo 7 investor 6 politics_news 5 geography 5 marketing 5 charity_philanthropy 4 developer 3 finance_economics 3 climatechange 2 comedy_funny 2 healthcare_medicine 2 highered 2 religion 1 director 1 publicrelations 1 mobile_smartphone 1 blogs 1 humanrights_activism_justice 1 pharma 1 multimedia 1 anthropology 1 engineering 1 hacking 1 branding 1 banking 1 mac_iphone 1 basketball 1 university 1 management 1 biology 1 democrat 1 radio 1 newspaper 1 ruby 1 agriculture 1 author 1 psychology_mentalhealth 1
  • Name: jorgefabrega Interests: sociology 8 tech 3 innovation 2 anthropology 1 mathematics 1 marketing 1 geography 1 developer 1 teaching 1 database 1 politics_news 1 university 1 finance_economics 1 philosophy 1 reporter 1
  • Name: PFCdgayo Interests: sociology 6 innovation 2 developer 2 mathematics 2 database 2 psychology_mentalhealth 1 php 1 comedy_funny 1 politics_news 1 engineering 1 anime 1 anthropology 1 tech 1 hacking 1 comics 1 biology 1 university 1
  • Name: chl Interests: python 11 flash 9 developer 8 tech 6 investor 6 ceo 5 database 3 ruby 3 html 3 biology 2 astronomy_physics 2 multimedia 1 gaming 1 geography 1 anthropology 1 mathematics 1 sociology 1 innovation 1 photography 1 neuroscience 1 buddhism 1 java 1 comedy_funny 1 chemistry 1 banking 1
  • Name: orgnet Interests: innovation 2 sociology 2 jewish 1 geography 1 publicrelations 1 management 1
  • Name: jure Interests: university 2 sociology 2 investor 1 liberal 1 sailing 1 finance_economics 1 database 1
  • Name: arnicas Interests: flash 8 python 7 developer 4 tech 4 html 3 database 2 sociology 2 innovation 2 jokes 2 engineering 2 astronomy_physics 2 tvshows_drama_actor_hollywood 2 mathematics 2 anime 1 dating 1 charity_philanthropy 1 multimedia 1 marketing 1 investor 1 anthropology 1 comedy_funny 1 politics_news 1 history 1 blogs 1 author 1 neuroscience 1 university 1 management 1 teaching 1 cinema 1 biology 1 climatechange 1 comics 1 reporter 1
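If you want to work with these summary lines programmatically, they are easy to split back into pairs. The following is a rough sketch: `parse_summary` is a hypothetical helper, and it assumes that category names never contain spaces, which holds for the lines shown here.

```ruby
# Parses one summary line of the form
# "Name: <user> Interests: <category> <count> <category> <count> ..."
# into the user name and a list of [category, count] pairs.
def parse_summary(line)
  match = line.strip.match(/\AName: (\S+) Interests: (.*)\z/)
  raise ArgumentError, "unexpected line format" if match.nil?
  pairs = match[2].split.each_slice(2).map do |category, count|
    [category, Integer(count)]
  end
  [match[1], pairs]
end

line = "Name: orgnet Interests: innovation 2 sociology 2 jewish 1 " \
       "geography 1 publicrelations 1 management 1"
name, pairs = parse_summary(line)

# The lines are already sorted by count, so the first entries are the top interests.
top = pairs.first(2)
puts "#{name}: #{top.map { |category, count| "#{category} (#{count})" }.join(', ')}"
# prints "orgnet: innovation (2), sociology (2)"
```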

Again, I’d like to note that finding out about a user’s interests with this method requires no study of their tweets. Their friend ties already reveal quite a lot. The first couple of interests are often not that surprising, but some of the later ones reveal things about people that I was not aware of.

Cheers

Thomas