
Finding out what people are interested in by using only structural information

Many recommendation algorithms these days suffer from the so-called cold-start problem. It is usually tackled by having the user fill out some initial forms or provide some initial ratings, e.g. for movies, to give the algorithm something to work with. Another idea is to use what is already out there, namely the information encoded in the friend and follower graph on Twitter.

I thought it would be fun to use my recent corpus of 16,000 Twitter users (who have been categorized by how people list them with the list feature) to determine what an arbitrary user is interested in. If this user follows one of these people, they might also be interested in the area that person represents. See the schema figure below. The approach is really quite simple: collect all the friend edges of a user, go through them, and check whether each person appears in our pre-tagged set of users. The more users we find from one category, the more this user seems to be interested in that topic.


Below is all that is needed to perform this user interest aggregation:
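As a minimal sketch of the steps described above, written in Python purely for illustration: the function name `aggregate_interests`, the `categories` mapping (category name to a ranked list of screen names), the already-fetched `friends` list and the toy data are all assumptions, not the layout of the partitions file or the exact script behind the results.

```python
from collections import Counter


def aggregate_interests(friends, categories):
    """Count, per category, how many of a user's friends are in the pre-tagged set.

    friends    -- iterable of screen names the user follows (assumed already fetched)
    categories -- dict: category name -> list of screen names, ordered by rank
                  (hypothetical layout, chosen for this sketch)
    """
    # Invert the category lists once: screen name -> [(category, rank), ...]
    tagged = {}
    for category, members in categories.items():
        for rank, name in enumerate(members, start=1):
            tagged.setdefault(name.lower(), []).append((category, rank))

    counts = Counter()   # category -> number of matching friends
    matches = {}         # category -> [(friend, rank in that category), ...]
    for friend in friends:
        for category, rank in tagged.get(friend.lower(), []):
            counts[category] += 1
            matches.setdefault(category, []).append((friend, rank))
    return counts, matches


# Toy data, invented for the example
categories = {
    "sociology": ["alice", "bob", "carol"],
    "python": ["dave", "alice"],
}
friends = ["Bob", "alice", "erin"]

counts, matches = aggregate_interests(friends, categories)
for category, n in counts.most_common():
    print(category, n, matches[category])
```

A person who is listed in several categories counts once for each of them, which is one possible way to handle the overlap between categories; friends that are not in the pre-tagged set are simply ignored.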

The final partitions file in the code is simply the output of a task that I performed in my last blog post. I think the results of this very simple idea are quite satisfactory, but see for yourself. I have pre-computed the results for some people that I follow and am thinking of putting this online somewhere so you can check your own. Below is the sample YAML output for the user zephoria (danah boyd). The second number next to each person in a category shows how highly this person has been ranked in that category.

Here are some shortened results (omitting the individual persons) for people I follow on Twitter. If you like, you can tell me in the comments how well this approach actually captured your interests.

  • Name: plotti Interests: java 12 python 7 ruby 6 sociology 4 investor 2 database 2 tech 2 anthropology 1 anime 1 developer 1 innovation 1 mac_iphone 1 publicrelations 1 university 1 teaching 1
  • Name: barrywellman Interests: sociology 7 anthropology 3 linguistics 1 tech 1 multimedia 1 innovation 1 developer 1 highered 1
  • Name: marc_smith Interests: sociology 19 tech 14 innovation 7 ceo 7 investor 6 politics_news 5 geography 5 marketing 5 charity_philanthropy 4 developer 3 finance_economics 3 climatechange 2 comedy_funny 2 healthcare_medicine 2 highered 2 religion 1 director 1 publicrelations 1 mobile_smartphone 1 blogs 1 humanrights_activism_justice 1 pharma 1 multimedia 1 anthropology 1 engineering 1 hacking 1 branding 1 banking 1 mac_iphone 1 basketball 1 university 1 management 1 biology 1 democrat 1 radio 1 newspaper 1 ruby 1 agriculture 1 author 1 psychology_mentalhealth 1
  • Name: jorgefabrega Interests: sociology 8 tech 3 innovation 2 anthropology 1 mathematics 1 marketing 1 geography 1 developer 1 teaching 1 database 1 politics_news 1 university 1 finance_economics 1 philosophy 1 reporter 1
  • Name: PFCdgayo Interests: sociology 6 innovation 2 developer 2 mathematics 2 database 2 psychology_mentalhealth 1 php 1 comedy_funny 1 politics_news 1 engineering 1 anime 1 anthropology 1 tech 1 hacking 1 comics 1 biology 1 university 1
  • Name: chl Interests: python 11 flash 9 developer 8 tech 6 investor 6 ceo 5 database 3 ruby 3 html 3 biology 2 astronomy_physics 2 multimedia 1 gaming 1 geography 1 anthropology 1 mathematics 1 sociology 1 innovation 1 photography 1 neuroscience 1 buddhism 1 java 1 comedy_funny 1 chemistry 1 banking 1
  • Name: orgnet Interests: innovation 2 sociology 2 jewish 1 geography 1 publicrelations 1 management 1
  • Name: jure Interests: university 2 sociology 2 investor 1 liberal 1 sailing 1 finance_economics 1 database 1
  • Name: arnicas Interests: flash 8 python 7 developer 4 tech 4 html 3 database 2 sociology 2 innovation 2 jokes 2 engineering 2 astronomy_physics 2 tvshows_drama_actor_hollywood 2 mathematics 2 anime 1 dating 1 charity_philanthropy 1 multimedia 1 marketing 1 investor 1 anthropology 1 comedy_funny 1 politics_news 1 history 1 blogs 1 author 1 neuroscience 1 university 1 management 1 teaching 1 cinema 1 biology 1 climatechange 1 comics 1 reporter 1

Again, I’d like to note that finding out about a user’s interests with this method requires no study of their tweets. Their friend ties already reveal quite a lot. The first couple of interests are often not that surprising, but some of the later ones reveal things about people that I was not aware of.




About plotti2k1

Thomas Plotkowiak works at the MCM Institute in the Social Media and Mobile Communication group, which belongs to the University of St. Gallen. His PhD research in social media examines how the structure of social networks like Facebook and Twitter influences the diffusion of information. His main focus of work is Twitter, since it allows public access (and has a nice API). Make sure to also have a look at his recent publications. Thomas graduated in 2008 in Computer Science and Economics at the University of Mannheim and was involved at the computer science institutes for software development and multimedia technology: SWT and PI4. During his studies he focused on Artificial Intelligence, Multimedia Technology, Logistics and Business Informatics. In his diploma/master thesis he developed an ad hoc p2p audio engine for 3D games. Thomas was also a researcher for a year at the University of Waterloo in Canada and at Macquarie University in Sydney. He was part of the CSIRO ICT research group. In his free time Thomas likes to swim in his local lake (Drei Weiher), run, and hike in the Appenzell region. Otherwise you will find him coding ideas he recently had or enjoying a beer with colleagues in the MeetingPoint or Schwarzer Engel.


4 thoughts on “Finding out what people are interested in by using only structural information”

  1. Interesting, but my list is not accurate — partially right and partially wrong. I think you need more data … probably grabbed too small of a data chunk. I did a similar Twitter “list analysis” last year, but focused on people, rather than interests.

    But, a nice start! Keep at it and let us know what you get.

    Posted by Valdis Krebs | July 26, 2012, 9:25 pm
    • Well, I’ve collected ~16,000 people and categorized them into 170 categories based on the keywords used in the lists that these people were members of. Assigning a person to a single category was pretty ambiguous for categories such as politics, liberal, democrat, magazine and so on, which seem to have a high overlap of members.

      I think this is why some of your interests might be wrong, along with the fact that there were only 8 matches between your friend ties and these people. If the categories were bigger, e.g. 1000 instead of 100 people, I would have gotten more matches, but I also think the sets would be fuzzier.

      Posted by plotti2k1 | July 27, 2012, 12:37 pm
  2. Interesting approach. Seems like the user interests obtained from social graph has much less noise than content based approaches.

    Posted by xghan | July 27, 2012, 5:37 am
    • I agree, people generally seem to have a good understanding which interest a certain person represents on twitter. By following these people I think people give away quite a lot about their interests, even more than they do by what they write about. In the end this is what Twitter is about: following your interests.

      Posted by plotti2k1 | July 27, 2012, 12:33 pm
