LONDON, SEPTEMBER 27 2012 — The process by which listeners decide what music they like – or dislike – on first hearing has become a step clearer, after a global hackathon event held jointly by EMI Music and Data Science London.
One hundred and seventy-five data scientists from across the world signed up for the Music Data Science Hackathon in July, a 24-hour event hosted in London but accessible online worldwide, competing for £6,500 in cash prizes provided by EMI and EMC.
During the event, more than 1,300 formulas and ideas, each based on analysis of data from EMI’s Million Interview Dataset, were submitted in answer to the question: “Can you predict if a listener will love a new song?” The EMI Million Interview Dataset is one of the richest and largest music appreciation datasets ever made available: a high-quality dataset compiled from global research that captures the interests, attitudes, behaviours, familiarity and appreciation of music expressed by music fans.
The final winning models came from teams and entrants from around the world. The winners were:
- 1st Prize winner: Shanda Innovations, a collective of data scientists from China
- 2nd Prize winner: Wang Qing, who is also based in China
- 3rd Prize winner: Martin O’Leary from the USA
- 1st Prize winner – London entrants: Dell Zhang
- 1st Prize winner – London entrants Data Visualisation Contest: Gregory Mead
The solutions to the task were scored using the online platform for predictive modelling competitions designed and run by Kaggle, which calculated an individual score for each entry. The most accurate solutions found that no single measure or set of attributes determines music preferences. Instead, preferences depend on combinations of various factors, although, to the surprise of many teams, age and gender turned out to be distinctly weak predictors of musical taste.
A number of statistical and mathematical techniques were used among the entries. For example, a popular machine learning technique was ‘random forest’, which builds a collection of ‘decision trees’, each trained on a different random selection of the EMI dataset. The predictions of the individual trees are then aggregated to determine the final prediction.
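The random forest idea described above can be illustrated with a minimal sketch. This is not the code used by any entrant, and it does not use the EMI dataset: the data is synthetic, the feature names (familiarity, artist rating, age) and the helper names `make_data`, `fit_forest` and `predict_forest` are hypothetical, and each "tree" is simplified to a single split (a decision stump) to keep the example short. It only shows the general pattern of training many trees on random selections of the data and aggregating their votes.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Fit a depth-1 tree: the single threshold split with the fewest errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority label
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=15, n_features=2, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(all_features, n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


def make_data(n=120, seed=1):
    """Synthetic listeners: [familiarity, artist_rating, age] -> 1 if they 'love' the song."""
    rng = random.Random(seed)
    rows = [[rng.random(), rng.random(), rng.randint(18, 70)] for _ in range(n)]
    labels = [1 if r[0] + r[1] > 1.0 else 0 for r in rows]  # age is deliberately irrelevant
    return rows, labels


rows, labels = make_data()
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.95, 0.95, 30]))
```

In practice an established library implementation with full-depth trees would normally be used; the stumps here are only a simplification for readability.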
The winning visualisations from Gregory Mead, CEO and co-founder of Musicmetric, included a graph plotting the relationship between how consumers feel about particular artists and the words they use to describe them. The line trends upwards: as would be expected, people generally use more positive words about music they like. There are plenty of outlier words, however. For example, some people use more negative words to describe artists that they really like, illustrating that unexpected or negative feelings about an artist can contribute to a positive response to that artist and their music. Being different and left-field can clearly have its rewards in popularity.
David Boyle, SVP Consumer Insight for EMI Music, said: “We were really pleased with how well this first hackathon with Data Science London went, and are very appreciative of the support we received from all our partners in making this such a success. We learned a great deal and huge congratulations to the winners. We haven’t uncovered the perfect solution to predicting consumer preference for songs yet, but through this competition and the skills and expertise of the data science community we’ve together moved closer to tackling this challenge. We plan to have many more events where we can continue to build a strong international community of data scientists interested in utilising and learning from EMI’s amazing dataset so that we can deliver an even better service to all our artists.”
“The results of the Music Data Science Hackathon prove the amazing power of the collective intelligence and the value of our data science community,” said Carlos Somohano, Founder of Data Science London. “The algorithms presented by the participants and the winners are absolutely awesome, world class. We are already planning the next hackathon, which will be focused on music and emotions.”
A more in-depth technical description of the solutions submitted by the winners can be found at: http://musicdatascience.com/the-winners/.
EMC, a world leader in data science and big data solutions, provided the event’s IT infrastructure and analytical tools to the contestants, as well as operational support for the competition through its Greenplum division.
“Community, learning and collaboration are at the heart of innovation. To succeed in the new world of Big Data, companies need to invest in innovation and experiment with data-sets to mine their real, untapped value,” said Chris Roche, Regional Director for EMC Greenplum. “The insights revealed in this hackathon hint at the power and potential that Big Data holds – both for intellectual discovery and for incremental business value for organisations of every kind. EMC will continue to invest in crowd sourcing events such as this and we look forward to the innovation and knowledge-creation they will inspire.”
The EMI Million Interview Dataset has been created over the last three years from interviews with consumers in 24 countries and across 15 languages about their music listening and consumption habits. At any given moment, at least 12 people are taking part in a survey somewhere in the world for EMI.
The Music Data Science Hackathon was hosted by EMI Music and non-profit organisation Data Science London, one of the largest and most active data science communities in the world. Kaggle hosted the competition on its collaborative, real-time, online platform for predictive modelling competitions, and UK consultancy Adatis sponsored the Data Visualisation Contest. The EMI Million Interview Dataset was built utilising research tools provided by Lightspeed Research and Perspective Research.