Tracking and Analysing Regional Alcohol Consumption Patterns through Social Media
External event by HighWire student Daniel Kershaw on mining social media
Monitoring the rate of alcohol consumption across the UK is a timely problem, with ever-increasing consumption levels and calls from public services about the effect it is having on people and society. However, the methods currently used are costly and time consuming, and do not supply sufficiently detailed results, as they look at users rather than users' patterns of drinking. In this paper we look into the ability of Twitter (a popular microblogging site) to monitor the rate of alcohol consumption in regions across the UK, looking into variation in term usage along with trends within a specific keyword set. The study was performed over a two-week window, analysing textual markers in over 17 million tweets, all geolocated within the UK. A score was given to each tweet based on the number of markers used and the sum of all markers in the tweet. Scores from different geolocation sets and time granularities were compared to the ground-truth data from the Health & Social Care Information Centre (HSCIC) weekly alcohol consumption pattern. We obtained high linear correlations between the ground truth and the Twitter alcohol score, with highs of 0.87 with a p-value of less than 0.01 for regions in the UK. The near-real-time monitoring of alcohol consumption, and the limited overheads that the use of social media incurs, mean that such a method could be used to inform decisions that in the past have relied purely on slow and laborious data collection methods (e.g. questionnaires). Variations in language were detected over time and across regions, with lags in certain words being detected, e.g. 'drunk' appearing at midnight and 'hungover' appearing from around midday.
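As a rough illustration of the scoring and comparison described above, the following Python sketch counts marker words in each tweet, sums the counts per bucket (for example a weekday or region), and computes a Pearson correlation against a reference series. The marker list, the function names, and the choice to aggregate by total occurrences are all assumptions made for this example; the abstract does not specify the study's actual keyword set or scoring formula.

```python
import math
import re
from collections import defaultdict

# Hypothetical marker set; the study's real keyword list is not given here.
MARKERS = {"drunk", "hungover", "beer", "wine", "pub"}


def tweet_score(text, markers=MARKERS):
    """Return (distinct markers used, total marker occurrences) for one tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [t for t in tokens if t in markers]
    return len(set(hits)), len(hits)


def score_by_bucket(tweets, markers=MARKERS):
    """Sum total marker occurrences per bucket.

    `tweets` is an iterable of (bucket, text) pairs, where the bucket is
    whatever grouping is being compared (assumed: weekday or region).
    """
    totals = defaultdict(int)
    for bucket, text in tweets:
        _, total = tweet_score(text, markers)
        totals[bucket] += total
    return dict(totals)


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)
```

To compare against ground truth, the per-bucket scores and the reference values would be put in the same bucket order and passed to `pearson`. A significance test (the p-value reported in the abstract) is not covered by this sketch.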