Using Twitter as a Data Source Is Challenging: A PhD Candidate Explains

Twitter generates a lot of data: its more than 316 million active users post over 500 million tweets every day. You would expect it to be a data goldmine, but it is not, as Wasim Ahmed, a PhD candidate at the University of Sheffield, explains in a post for the London School of Economics Impact Blog.

He explains that social networks like Twitter can be used alongside traditional data collection methods such as surveys and interviews, but that they bring challenges of their own. This is because content on Twitter, as on other new-age social networking sites, is user-generated.

One of the key challenges he points out is representativeness. He says, “Twitter users are not representative of the national offline population, not even representative of Internet users and most strikingly not representative of Twitter users.” This is because Twitter encourages people to post their opinions on whatever they choose, so it may be hard for a researcher to draw a proper sample of a population in a known location from Twitter.

He also talks about the issue of spam, where people are lured into clicking links and hashtags, and about how hard it is to determine whether a given account is real or has real followers. Both problems can make your data inaccurate and introduce discrepancies into your research.

Another issue is the cost of obtaining large amounts of data from Twitter, which becomes expensive if you are interested in data that goes back more than 7 days. On top of this, Twitter’s terms and conditions do not allow you to share the data that you obtain.

There is also the issue of asking for consent to use someone’s tweet in a publication. If you have retrieved a lot of tweets, it may not be possible to get consent from everyone who posted them. If you go ahead and publish them anyway, these people may take legal action if they discover you did not consult them.

What does this mean? Major networking platforms should come up with tools that make it easier for researchers to use the immense amount of data they hold and generate every day. Twitter has been testing user-generated polls, and it could refine that feature into something researchers could use to extract accurate data.