April 11, 2012

Data Mining on Social Media Platforms

Unless you are Rip Van Winkle, just woken up from a long sleep, you can hardly be unaware of the social media frenzy that has been, and still is, sweeping through the ecosystem. Facebook, Twitter and YouTube are the new trinity that everyone is interested in, not the Brahma-Vishnu-Maheshwar or the Father-Son-Holy Ghost that our ancestors talked about.

While usage and consumption of social media is what most of us are interested in, social scientists and market research professionals have a deeper interest in this amazing phenomenon. In the past, such research was carried out through traditional tools like questionnaires, but given the vast quantity of data and the speed with which it changes, we need newer and better tools. As a part of the curriculum of the MBA program at the Vinod Gupta School of Management, IIT Kharagpur, students were required to explore and present newer tools -- that are legitimately available at no cost -- for extracting and analysing social media data. In this post, we see a sample of their work -- a list of free, open source tools that will allow anyone to access data from various social media platforms and analyse it for information and insights.

NodeXL is a very simple tool based on Microsoft Excel that will allow you to extract a whole lot of data from most social media platforms. The following presentation has an overview of how it can be used.
Social media data is all about the study of graphs along with their nodes and edges. Some more information on the nomenclature can be found on this link, in addition to their home page at http://nodexl.codeplex.com/
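
To make the idea of nodes and edges concrete, here is a minimal sketch in plain Python. It is not how NodeXL works internally; the edge list, the names in it and the helper functions are all made up for illustration. Each person is a node, each connection is an edge, and a node's degree is simply the number of distinct people it is connected to.

```python
from collections import defaultdict

# Hypothetical "who is connected to whom" edge list; names are invented for illustration.
edges = [
    ("asha", "bikram"),
    ("asha", "chitra"),
    ("bikram", "chitra"),
    ("deepak", "asha"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def degree(adjacency):
    """Return node -> number of distinct neighbours (the node's degree)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


adjacency = build_adjacency(edges)
degrees = degree(adjacency)

# The node with the highest degree is the best connected person in this tiny network.
most_connected = max(degrees, key=degrees.get)
print(degrees)
print("Most connected:", most_connected)
```

Tools like NodeXL and Gephi compute this kind of measure, and many richer ones, over networks with thousands of nodes and draw the result as a picture.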

Gephi (pronounced jee-phai) is another very useful tool that can extract data from various social media platforms and analyse it. This is explained further in this tutorial. Tutorial - Gephi
The next interesting tool that we came across was Impure, and this is explained in the following presentation.

This needs a little more coding effort but the results are pretty impressive. More information is available in this tutorial and on their website.

Social media is the biggest battlefield (the Kurukshetra) between Google and Facebook (who are the Pandavas? who are the Kauravas?) but for us who dabble in social media, the intersection between the two is a rich source of data.

The best way to access the treasure trove of data inside Facebook is to use the Graph API explorer. If you go into your own Facebook page and search for this, you will land on the Graph API page, where you can obtain your authorisation token. Using this token to extract and analyse data is not easy, and this is where the power of Google Refine becomes evident.

You can also see this YouTube video to understand how to use these two powerful tools together. Two tutorials are available at http://mpvp4u.blogspot.in/2012/04/google-refine-tutorial.html and at http://gaarora.blogspot.in/2012/04/google-refinevgsom.html
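
For readers who prefer code to a spreadsheet-style tool, the token-based request described above can be sketched in Python. This is only an illustration under stated assumptions, not the method used in the tutorials: the helper names are hypothetical, the token is a placeholder, the `me/friends` path is just an example of a Graph API path, and the sample response is invented in the general shape of a Graph API list response (a `data` array of objects). No network call is made; the sketch only builds the URL you would fetch and flattens a response into rows, which is roughly the tidying job that Google Refine does for you.

```python
import json
from urllib.parse import urlencode

GRAPH_BASE_URL = "https://graph.facebook.com"


def build_graph_url(path, access_token, fields=None):
    """Build a Graph API request URL (hypothetical helper, no request is sent)."""
    query = {"access_token": access_token}
    if fields:
        # The Graph API accepts a comma-separated list of field names.
        query["fields"] = ",".join(fields)
    return f"{GRAPH_BASE_URL}/{path.strip('/')}?{urlencode(query)}"


def flatten_records(response_text):
    """Turn a JSON list response into simple rows with 'id' and 'name' keys."""
    payload = json.loads(response_text)
    return [
        {"id": item.get("id"), "name": item.get("name")}
        for item in payload.get("data", [])
    ]


# Placeholder token: replace with the one obtained from the Graph API explorer.
url = build_graph_url("/me/friends", "PLACEHOLDER_TOKEN", fields=["id", "name"])

# Invented sample response, used here instead of a live call.
sample_response = json.dumps(
    {"data": [
        {"id": "1", "name": "Example Friend A"},
        {"id": "2", "name": "Example Friend B"},
    ]}
)
records = flatten_records(sample_response)
print(url)
print(records)
```

Keeping URL construction separate from parsing means the parsing step can be tried out on a saved response before any real token is used.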

Last but not the least, let us look at Twittersave

This is a simple tool developed by Mr Amod Gupta of VGSOM that allows one to quickly extract some data from Twitter.

The philosophy that we try to instil in our students at VGSOM is that they should be creators -- not just consumers -- of knowledge, and these assignments are designed to help them go beyond textbooks and push the frontiers of what is possible in today's techno-economic environment. These are some examples of what we do at VGSOM. Keep reading this blog for more insights.


Sandip Maiti said...

This is an area of great interest to digital marketing professionals who wish to understand "conversation" dynamics on a social network: how topics get seeded, their travel paths, velocity etc., and to empirically relate the same to influencing factors. Facebook is poring over such data every day and is already leveraging the knowledge harnessed to sell "advertising space". It is not by chance that you see an advertisement that your friend may have 'liked' on FB. Data has proven that you are more likely to click on an ad that your friend has liked.

The data visualization tools are extremely useful in research and, for example, can help in visualizing the spread of a brand's buzz. Thanks to all the public domain APIs shared by the world's most powerful web services, we can do amazing things, mashing up the insights.

At our agency, we deal with vast amounts of analytics data when we do a brand campaign. Interestingly, what we have now realized is that brand custodians are 'burdened' with data; they need simple measurement frameworks to present insights on how their brand is engaged in the conversation space. We are thus working with an early model of measuring a brand's buzz index (across the entire social web, and not just one service).

IMHO we are just emerging from the dark ages in digital marketing.

ABTC said...

great blog