Exploring genre on SoundCloud, part II

In my previous post on this topic, I introduced a problem – how to understand the work that explicit genre categorisations are made to do by people uploading tracks to the SoundCloud audio-sharing website – and a potential solution – identifying the three categories most frequently used by each individual in a sample and studying regularities in the ways in which pairs of categories tend to pop up within the same group of three. I also presented some partial and preliminary findings in the form of a matrix comparing co-occurrences of the five genre categories most frequently used by people within an initial sample. And I either glossed over or left unmentioned a slew of problems, some of which we have so far addressed more successfully than others (these are only blog posts, and we haven’t finished the research yet). The biggest problem is the sample itself: the analysis was done on the basis of a snowball sample, when a random sample would be more appropriate. Hence the provisionality of all this. The analysis will soon be redone on the basis of a sample that will enable us to make more robust claims, but in the meantime I wanted to share our thought processes and working methods with the world because – quite apart from anything else – I’m excited about the patterns that are emerging.
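For readers who want to see the counting step concretely, here is a minimal sketch of "top three per user, then count pairs". It assumes, hypothetically, that each user's uploads are available as a plain list of genre tags; the data format, the names `uploads`, `top_three` and `pair_cooccurrences`, and the tie-breaking rule are all illustrative choices, not the project's actual code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the genre tag on each track a user has uploaded.
# The data format is an assumption made for illustration only.
uploads = {
    "user_a": ["house", "techno", "house", "deep house", "techno", "house"],
    "user_b": ["hip-hop", "rap", "hip-hop", "instrumental", "rap"],
    "user_c": ["house", "techno", "tech house", "techno"],
}


def top_three(tags):
    """Return the (up to) three categories a user applies most often."""
    # Ties are broken by first appearance, which is an arbitrary choice.
    return [tag for tag, _ in Counter(tags).most_common(3)]


def pair_cooccurrences(uploads):
    """Count, for each unordered pair of categories, how many users' top threes contain both."""
    pairs = Counter()
    for tags in uploads.values():
        # Sorting makes ('house', 'techno') and ('techno', 'house') the same key.
        top = sorted(top_three(tags))
        pairs.update(combinations(top, 2))
    return pairs


pairs = pair_cooccurrences(uploads)
print(pairs[("house", "techno")])   # 2: user_a and user_c
print(pairs[("hip-hop", "house")])  # 0: no user has both
```

The resulting counter is, in effect, the upper triangle of the co-occurrence matrix: one cell per pair of categories.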

Ah, those patterns. In part i, I pointed out the clearest one: the third most common genre category in our sample co-occurs very rarely with the first and second, but very frequently with the eighth. That is, there are many individuals in our sample whose top three most frequently used genre categories include both ‘house’ (the most common) and ‘techno’ (the second most common), and even more whose top three include both ‘hip-hop’ (the third most common) and ‘rap’ (the eighth most common), but very few whose top three include both ‘hip-hop’ and ‘house’ or both ‘hip-hop’ and ‘techno’. Picking further patterns out by hand in a 50×50 matrix is likely to be unreliable, and there are a number of approaches we could take to automating the process. For example, we could compare the actually occurring frequencies with what we would expect given the same total frequency for each individual category and completely random associations between the categories. That’s the sort of approach that corpus linguists take in identifying systematic relationships between pairs of lexical items such as ‘butter’ and ‘parsnips’. And in fact, it was one of the first things we tried. But while that approach can give a fairly robust measure of the strength of the association between any two categories, it’s of limited use in helping the researcher to get a sense of how larger numbers of categories might interrelate as a system.
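A minimal sketch of the observed-versus-expected comparison might look like the following. It uses pointwise mutual information (PMI), one of several association measures found in corpus linguistics, purely as an illustrative choice; the post does not say which measure was tried. The baseline is also a simplifying assumption: it treats each category's presence in a user's top three as independent of every other category's, ignoring the fact that each top three has exactly three slots.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(top_threes):
    """Compare observed pair co-occurrences with a random-association baseline.

    top_threes: one list of categories per user (that user's top three).
    Returns {(a, b): (observed, expected, pmi)} for every pair seen at least once.
    """
    n_users = len(top_threes)
    # Number of users with each category somewhere in their top three.
    singles = Counter(cat for top in top_threes for cat in set(top))
    # Number of users with each (alphabetically ordered) pair in their top three.
    observed = Counter(
        pair for top in top_threes for pair in combinations(sorted(set(top)), 2)
    )
    scores = {}
    for (a, b), obs in observed.items():
        # Under independence, P(a and b) = P(a) * P(b), so the expected
        # number of users with both is n_a * n_b / N.
        expected = singles[a] * singles[b] / n_users
        # Pointwise mutual information: log2(observed / expected).
        # Positive means the pair turns up together more often than chance.
        scores[(a, b)] = (obs, expected, math.log2(obs / expected))
    return scores


# Made-up top threes, for illustration only.
top_threes = [
    ["house", "techno", "trance"],
    ["house", "techno", "deep house"],
    ["hip-hop", "rap", "instrumental"],
    ["hip-hop", "rap", "trap"],
]
scores = association_scores(top_threes)
print(scores[("house", "techno")])  # (2, 1.0, 1.0)
```

Pairs that never co-occur (such as ‘hip-hop’ and ‘house’ here) are left out, because PMI is undefined for a zero count. PMI is also known to inflate scores for rare categories, which is one reason corpus linguists often prefer other measures. Either way, the output is still a list of pairwise scores, which illustrates the limitation noted above: it says little about how many categories hang together as a system.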

So we tried something else. This was to visualise the whole matrix as a graph, with categories represented by nodes connected together by edges representing co-occurrences in individual users’ top threes.
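Turning the matrix into a graph is mostly a change of representation. The sketch below assumes pair counts shaped like `{(a, b): count}` and builds an undirected weighted adjacency map. It also writes a CSV edge table with `Source`, `Target`, `Type` and `Weight` columns, which we believe are the column names Gephi's spreadsheet importer recognises; the function names and example counts are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_graph(pair_counts):
    """Turn {(a, b): count} into an undirected weighted adjacency map.

    Each category is a node; each pair found together in at least one user's
    top three becomes an edge weighted by the number of such users.
    """
    graph = defaultdict(dict)
    for (a, b), count in pair_counts.items():
        if count > 0 and a != b:
            graph[a][b] = count  # store both directions, since the graph is undirected
            graph[b][a] = count
    return dict(graph)


def write_edge_table(pair_counts, out):
    """Write a Source/Target/Type/Weight edge table as CSV, e.g. for import into Gephi."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), count in sorted(pair_counts.items()):
        if count > 0 and a != b:
            writer.writerow([a, b, "Undirected", count])


# Made-up counts, for illustration only.
example_pairs = {("house", "techno"): 3, ("hip-hop", "rap"): 5, ("classical", "dubstep"): 0}
graph = build_graph(example_pairs)
buffer = io.StringIO()
write_edge_table(example_pairs, buffer)
print(buffer.getvalue())
```

Pairs with a zero count produce no edge, so categories that never share a top three simply stay unconnected.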

Once that first step has been taken, further options become available. First, one may lay out the resulting graph using a force-directed algorithm in order to place connected nodes closer together and unconnected nodes further apart. Second, one may draw thicker and thinner lines between pairs of nodes depending on how many times the categories they represent appear together. Third, one may resize each node to reflect the frequency with which it is connected to other nodes (its weighted degree). Fourth, one may use a community detection algorithm to identify densely-connected groups of nodes within the network, and then colour-code those groups.
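The layout and community detection were done in Gephi, but the second and third options (line thickness and node size) are simple enough to sketch directly. Below, weighted degree is computed as the sum of a node's edge weights, and both quantities are mapped linearly onto a display range. The linear mapping and the pixel ranges (10 to 50 for nodes, 1 to 8 for lines) are assumptions for illustration; Gephi offers its own ranking settings for this.

```python
def weighted_degree(graph):
    """Sum of the weights of each node's edges (its weighted degree)."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def edge_weights(graph):
    """List each undirected edge once, keyed by its alphabetically ordered pair."""
    return {(a, b): w for a, nbrs in graph.items() for b, w in nbrs.items() if a < b}


def rescale(values, lo, hi):
    """Map numbers linearly onto [lo, hi], e.g. to get line widths or node sizes."""
    vmin, vmax = min(values.values()), max(values.values())
    if vmin == vmax:  # avoid dividing by zero when all values are equal
        return {k: (lo + hi) / 2 for k in values}
    return {k: lo + (v - vmin) * (hi - lo) / (vmax - vmin) for k, v in values.items()}


# Made-up adjacency map: {node: {neighbour: co-occurrence count}}.
graph = {
    "house": {"techno": 4, "deep house": 2},
    "techno": {"house": 4},
    "deep house": {"house": 2},
    "hip-hop": {"rap": 6},
    "rap": {"hip-hop": 6},
}
node_sizes = rescale(weighted_degree(graph), 10, 50)
line_widths = rescale(edge_weights(graph), 1, 8)
print(node_sizes["house"], node_sizes["deep house"])  # 50.0 10.0
print(line_widths[("hip-hop", "rap")])  # 8.0
```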

So if there is no-one with (say) both ‘classical’ and ‘dubstep’ among his or her top three categories, the nodes representing those categories will be unconnected and distant from one another; if there is a small handful of people with both these categories in their top three, those nodes will be connected by a skinny line; and if there are a great many such people, the nodes will be connected by a thick line and will probably be located close together in the visualised graph. Moreover, if ‘classical’ co-occurs frequently with other categories but ‘dubstep’ does not, then the node representing the ‘classical’ category will be larger than the node representing the ‘dubstep’ category. And if ‘classical’, ‘dubstep’, and ‘grindcore’ are all densely connected to one another through co-occurrence in many different SoundCloud users’ top threes, then they are likely to be detected as a community and given a single colour to distinguish them from other communities.
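"Densely connected" has a standard formalisation: modularity, which compares the edge weight falling inside each proposed community with what would be expected if edges were placed at random while preserving each node's weighted degree. The Louvain method searches for a partition with high modularity by repeatedly moving nodes between communities and then merging communities. The sketch below does *not* implement Louvain; it only scores a given partition, to show what Louvain is trying to increase. The graph and the ‘EDM’/‘urban’ labels are made up for illustration.

```python
from collections import defaultdict


def modularity(graph, community_of):
    """Newman-Girvan modularity of a partition of a weighted undirected graph.

    graph: {node: {neighbour: weight}}, with every edge stored in both directions.
    community_of: {node: community label}.
    Q = sum over communities c of  L_c / m - (d_c / 2m) ** 2, where m is the total
    edge weight, L_c the weight inside c and d_c the summed weighted degree of c.
    """
    degree = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    two_m = sum(degree.values())  # every edge is counted from both ends
    inside = defaultdict(float)  # 2 * L_c, because internal edges are seen twice
    total = defaultdict(float)   # d_c
    for node, nbrs in graph.items():
        c = community_of[node]
        total[c] += degree[node]
        for nbr, w in nbrs.items():
            if community_of[nbr] == c:
                inside[c] += w
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# A made-up graph and partitions, for illustration only.
graph = {
    "house": {"techno": 4, "deep house": 2},
    "techno": {"house": 4},
    "deep house": {"house": 2},
    "hip-hop": {"rap": 6},
    "rap": {"hip-hop": 6},
}
split = {"house": "EDM", "techno": "EDM", "deep house": "EDM",
         "hip-hop": "urban", "rap": "urban"}
lumped = {node: "all" for node in graph}
print(modularity(graph, split))   # 0.5
print(modularity(graph, lumped))  # 0.0
```

Putting everything in one community always scores zero, while a split that follows the dense clusters scores higher; a partition that cuts across them scores below zero.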

Enough explanation. Here’s the result, visualised using Gephi, with layout through the Fruchterman-Reingold algorithm and community detection through the Louvain method:

A pretty picture, but what does it tell us? First, it tells us to be cautiously optimistic about the potential of the method described above in exploring systematic relationships between genre categories on SoundCloud, because the communities it presents map onto intuitive groupings recognisable from the real world. If ‘classical’, ‘dubstep’, and ‘grindcore’ had really appeared as a community, for example, then we would have been forced to consider the possibility that our sample was extremely unrepresentative, or that there was something wrong with our chosen community detection algorithm, or even that we were barking up the wrong methodological tree by exploring genre in this way, because our cultural knowledge tells us that grindcore, classical music, and dubstep have little or nothing to do with one another and are very unlikely to be produced by the same people. What we see in the above is, however, a community dominated by musical genres with a historical association with black performers – for want of a better word, let’s call it the ‘urban’ community – and a community dominated by forms of electronic music designed for dancing to in clubs or at raves – for want of a better word, let’s call it the ‘EDM’ community – as well as a somewhat loosely-connected ‘community’ of… everything that is neither urban nor EDM, including guitar music (e.g. ‘rock’), un-danceable electronic music (e.g. ‘ambient’), and acoustic music (e.g. ‘classical’). (There are some interesting anomalies, but most of these are easily explained. For example, ‘instrumental’ is probably found within the ‘urban’ community because of the role it appears to play – at least within our sample – in setting hip-hop apart from EDM in the absence of its most obvious distinguishing feature, i.e. the rapped vocal.)

This doesn’t really qualify as a discovery, because – in common with record company executives everywhere – we already knew that urban music and EDM existed as socially real categories. And it doesn’t quite answer the problem with which I began part i, i.e. of how to distinguish electronic music from everything else, since electronic music extends beyond the EDM community in the graph above (as already noted, ambient is a form of electronic music even if it isn’t danceable; moreover, some of the supposedly ‘urban’ genres above, such as jungle, drum & bass, and dubstep, are both electronic and extremely danceable). But it suggests an approach to the study of genres as emic categories, and to studying them quantitatively as well as qualitatively.

So the key question now (apart from whether these patterns will recur in a random sample) is how to use groupings such as the above to inform other work on this project. For example, we could take a network of ‘follow’ relationships (such as the one explored in Anna’s first post on quantitative data collection) and partition it according to whether the individuals concerned primarily gave the tracks they uploaded genre categories from the ‘urban’ community or from the ‘EDM’ community. We’ve already done something like this with regard to geographical location, and found a tendency for producers identifying as based within the same city to follow one another, hinting at the continued importance of local music scenes in fostering appreciation of value, even where such appreciation is expressed via the internet. The challenge will be to approach an understanding of how genre and location interact in structuring music makers’ recognitions of value in one another’s work. Is an uploader of trance in Leeds more likely to ‘follow’ an uploader of house in Ibiza or an uploader of RnB in his or her hometown, for example? And why (not)?
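One simple way to quantify "a tendency to follow one another" within a partition is to compare the share of follow relationships that stay within a group against the share expected if follows ignored group membership entirely. This is a minimal sketch under that assumption, not a description of how the location analysis was actually done; the user IDs, cities and follows are invented, and the chance baseline is a rough approximation.

```python
from collections import Counter


def within_group_share(follows, group_of):
    """Share of follow relationships whose two ends carry the same label.

    follows: iterable of (follower, followee) pairs.
    group_of: {user: label}, e.g. a genre community or a home city.
    Pairs involving an unlabelled user are ignored.
    """
    labelled = [(a, b) for a, b in follows if a in group_of and b in group_of]
    if not labelled:
        return 0.0
    same = sum(1 for a, b in labelled if group_of[a] == group_of[b])
    return same / len(labelled)


def chance_share(group_of):
    """Rough same-label share if follows ignored labels altogether.

    Approximates a random pairing (with replacement) as the sum of squared group proportions.
    """
    sizes = Counter(group_of.values())
    n = sum(sizes.values())
    return sum((s / n) ** 2 for s in sizes.values())


# Made-up users and follows, for illustration only.
city = {"u1": "Leeds", "u2": "Leeds", "u3": "Ibiza", "u4": "Ibiza"}
follows = [("u1", "u2"), ("u2", "u1"), ("u3", "u4"), ("u1", "u3")]
print(within_group_share(follows, city))  # 0.75
print(chance_share(city))                 # 0.5
```

To start looking at genre and location together, one could label each user with a (community, city) tuple instead of a single label and compare the same two shares.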

Right now, we simply don’t know. Which is a good place to be.


5 thoughts on “Exploring genre on SoundCloud, part II”

  1. That’s a really interesting post. The question of the continuing importance of locality is going to be fascinating. On the one hand, local scenes seem unimportant in a digital world, yet they do seem to endure. From my own experience, I have made a number of connections with other artists through locality, but usually because we met at or played at the same ‘club night’. In these cases I have made broader connections than I otherwise would have, but they still roughly fit the subgenres of more left-field or less dance-oriented electronic music. I’m also fairly certain that without the offline connection I wouldn’t have made contact online.


    • Judging from some recent discussions/interviews with musicians, you’re not alone in making offline connections before online ones. It’s very interesting what you say about such connections and genres/subgenres – if I understand right, it sounds like face-to-face contexts create situations in which it’s easier to forge connections with people who are doing things slightly outside of your own area.

      I definitely agree that there are some fascinating questions to explore regarding the ongoing importance of locality. Musicians, for instance, often understand their music to be addressing an international audience, even through performances and pieces of music that respond to very specific places or groups of people. (I suppose that contradiction has been around since at least the invention of print, though…)


      • Yes, face-to-face interaction certainly increases the likelihood of making an online connection, though the offline contact doesn’t need to be direct; it could be as little as “I see you played Club X, I’m playing there next month. We might see each other some time…”. I also think that these kinds of offline contacts make it more likely that the online connection can be maintained past the initial ‘holiday romance’ period. It might be worth mentioning that it can work in the other direction, with an online contact being the precursor to an offline one, usually through meeting at a gig or playing together. Again, though, for me, the establishment of an offline relationship makes a long-term online contact much more sustainable.


  2. So interesting, and so glad to see trance is on there, yet as one ‘genre’ which obscures the fine ‘sub-genre’ gradations. The current ‘hoo-ha’ (not sure if debate would be the right word really!) about “proper trance” having to be at least 138bpm (humph!) is an example of how qualitative work with producers/consumers can complement your fascinating research insights.

    Mixcloud next? 😉


    • Yes – those sub-genre gradations and arguments are always fascinating! (Although also reminiscent of the People’s Front of Judea sketch in Life of Brian.) I think there’s a tension on SoundCloud between wanting to identify your track accurately so that people who like exactly that kind of music will find it, and wanting it to show up on the maximum possible number of searches so that other people will find it too. So the subgenre terms do turn up, but (except with subgenres of house: the most popular EDM genre on SoundCloud) they are relatively scarce in our data. And yes: Mixcloud next! We’re already making plans…

      In other news, I’ve now recreated the above graph using a random sample, and the results are broadly the same, except that dubstep and drum & bass now appear where they ‘ought’ to, i.e. in the EDM community. I think that dubstep may have been pulled into the ‘urban’ community above by its links to trap – but those are possibly misleading, because there’s both an electronic genre called ‘trap’ and a rap genre called ‘trap’, and it seems likely to me that the people uploading both dubstep and trap are not uploading the same kind of trap as the people uploading both hip hop and trap.

