Exploring genre on SoundCloud, part I

One of the problems you’re always going to face when studying electronic music is the need to decide what you think ‘electronic music’ means. It’s a question of genre, and as Paul DiMaggio acknowledged in one of his most influential papers, genre is at once a formal and a social concept:

Literally, a genre is a ‘kind’ or ‘type’ of art. The notion of genre presumes that some aggregation principle enables observers to sort cultural products into categories. Formalists treat genres as comprising works that share conventions of form or content… Art historians also define genres in terms of shared conventions, but focus as well on social relations among producers in identifying ‘schools’ or ‘artistic movements’… Although students of popular culture and literary theorists of the ‘reader-response’ school consider formal similarities, they acknowledge that genres are partially constituted by the audiences that support them (DiMaggio 1987, p. 441)

Of the three approaches, the formalist one is the hardest to make headway with – and electronic music is a case in point. Defining electronic music as ‘music made with electronic instruments’ might seem straightforward, for example, but it doesn’t get you very far because pop and rap are also made with electronic instruments, yet – culturally speaking – are considered to be very different things. So one of the first things the project team started to look at was what SoundCloud users do – or rather, used to do – with the ‘genre’ free text field.[1]

One of our first findings was that, while it was not uncommon for tracks to be assigned to the ‘electronic’ genre, it was far more common for them to be assigned to a genre that would generally be categorised as a form of electronic music, for example ‘techno’, ‘dubstep’, or ‘house’. The question was, how to identify the set of ‘electronic’ genres. At first, we approached the problem by looking at the genres that were most commonly used and relying on our cultural knowledge to decide whether they were likely to be electronic or not. ‘Trance’ yes, ‘classical’ no. But having so much data to work with (even in this small initial sample) raised the possibility of approaching the problem quantitatively. And that’s where things got really interesting.

It turned out that, while SoundCloud only allowed a user to assign a single genre to a track, users often gave tracks multiple genres by informal means, for example separating genre terms with slashes, backslashes, or commas (‘and’ and ‘&’ are also used in this way, but less often, and they appear within genres such as ‘drum and bass’ or ‘rock and roll’). These multi-genre strings were recognised by SoundCloud as unique genres – that is, it treated ‘pop’, ‘rap’, and ‘pop / rap’ as three different genres – as too were alternative spellings of the same genres, e.g. ‘hip hop’ and ‘hip-hop’ were different genres from SoundCloud’s point of view.
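To make the problem concrete, here is a minimal sketch of how such free-text strings might be pulled apart into individual genre terms. It is not our original program: the function name `normalise_genre_string`, the exact separator set, and the lowercasing step are assumptions made for illustration, and the and/ampersand and space/hyphen rules follow the description in the next paragraph.

```python
import re

# Assumed separator set: slash, backslash, and comma.
SEPARATORS = re.compile(r"[/\\,]")


def normalise_genre_string(raw):
    """Split a free-text genre string into a list of normalised genre terms."""
    terms = []
    # Lowercasing is an assumption; the post does not say how case was handled.
    for part in SEPARATORS.split(raw.lower()):
        # Replace the standalone word 'and' with an ampersand.
        part = re.sub(r"\band\b", "&", part)
        # Remove whitespace and hyphens so spelling variants collapse together.
        part = re.sub(r"[\s\-]+", "", part)
        if part:  # skip empty pieces left by doubled or trailing separators
            terms.append(part)
    return terms


for raw in ["pop / rap", "Hip-Hop", "hip hop", "Drum and Bass, Techno"]:
    print(raw, "->", normalise_genre_string(raw))
```

Splitting happens before the ‘and’ replacement, so ‘and’ is never treated as a separator here; that keeps ‘drum and bass’ intact at the cost of missing strings where ‘and’ really does join two genres.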

Using a short Python program, we took all the ‘genre’ strings from all the tracks that had been uploaded by an initial snowball sample of 1500 users, cut them up into smaller strings wherever the most common separators (e.g. ‘/’) were present, changed the word ‘and’ to the ampersand (so that ‘drum and bass’ and ‘drum & bass’ could be treated as a single genre), and removed spaces and hyphens (so that ‘hip hop’, ‘hip-hop’, and ‘hiphop’ all became ‘hiphop’). We then identified the resulting genre terms most commonly used by each user, and created a matrix showing how often each pair of the 50 most common genre terms in the sample as a whole appeared together among the three most common genre terms used by a single individual. Here’s the part of that matrix covering the five most common terms overall:

             house  techno  hiphop  deephouse  electronic
house            –     359      39        484         139
techno         359       –       9        141          97
hiphop          39       9       –         11          50
deephouse      484     141      11          –          42
electronic     139      97      50         42           –

Reading the first row shows you how many times the term ‘house’ appeared alongside ‘techno’, ‘hiphop’, ‘deephouse’, or ‘electronic’ among the three most common genre terms in a single SoundCloud user’s uploads. So there were, for example, 359 SoundCloud users in our initial sample whose top three genre terms included both ‘house’ and ‘techno’: no surprise there, because these were the two most common genre terms overall. But the co-occurrence of terms does not appear to be random, as we clearly see when we look at how the first and second most common genre terms relate to the third and fourth. It turns out that ‘house’ co-occurs more often with ‘deephouse’ (484 users) than with ‘techno’ (359), even though ‘deephouse’ was the less common term overall, and that ‘hiphop’ was among the top three terms of very few of the people whose top three included ‘techno’ (9 users) or ‘house’ (39). Instead, ‘hiphop’ co-occurred most commonly with the eighth most common genre term, ‘rap’. Altogether, there were 562 users in our sample whose three most commonly used genre terms included both ‘rap’ and ‘hiphop’: the highest rate of co-occurrence in the sample.
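The counting behind a matrix like this can be sketched as follows. This is an illustration under stated assumptions rather than our actual code: the function names `top_terms` and `cooccurrence` are hypothetical, the three users and their tracks are invented toy data, and the restriction to the 50 most common terms in the whole sample is left out for brevity.

```python
from collections import Counter
from itertools import combinations


def top_terms(track_terms, n=3):
    """Return the n genre terms one user applied most often across their tracks."""
    counts = Counter(term for terms in track_terms for term in terms)
    return [term for term, _ in counts.most_common(n)]


def cooccurrence(users, n=3):
    """Count, for each pair of terms, the users whose top-n terms include both."""
    pairs = Counter()
    for track_terms in users.values():
        # Sorting makes each pair's key order-independent: (a, b) with a < b.
        for a, b in combinations(sorted(top_terms(track_terms, n)), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented toy data: user id -> one list of normalised genre terms per track.
users = {
    "user_a": [["house"], ["house", "deephouse"], ["techno"]],
    "user_b": [["house"], ["techno"], ["techno"], ["electronic"]],
    "user_c": [["hiphop", "rap"], ["rap"], ["hiphop"]],
}

pairs = cooccurrence(users)
print(pairs[("house", "techno")])   # 2
print(pairs[("hiphop", "techno")])  # 0
```

Each user contributes at most one count to any pair, which is why the cells in the matrix above are numbers of users rather than numbers of tracks. Note that `Counter.most_common` breaks ties by insertion order, so a user with more than three equally common terms would need an explicit tie-breaking rule.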

So already we start to see patterns in the usage of the SoundCloud ‘genre’ field that are suggestive of associations and disassociations between musical genres. These in turn hint at relationships among SoundCloud users (and, one might conjecture, among music-makers offline), which is to say at DiMaggio’s genre-defining ‘social relations among producers’ (above). For example, it would seem plausible that ‘rap’ and ‘hiphop’ co-occur so frequently because rapping is such a prominent feature of hip-hop music, while ‘hiphop’ and ‘techno’ co-occur so rarely because techno is a form of electronic music, which is produced by a different group of people from those who produce hip-hop. Eminem (2002) once rapped that ‘nobody listens to techno’, but it might have been more accurate to say that, generally speaking, rappers don’t listen to techno. What the above figures begin to suggest is that they don’t tend to upload it to SoundCloud either.

In part II, we’ll take a look at what happens when the co-occurrence matrix is visualised as a network.

[1] This feature was recently removed from the SoundCloud uploader; it’s still available, but through a slightly less obvious mechanism.


DiMaggio, Paul (1987). ‘Classification in art’. American Sociological Review 52 (4): 440-455.
Eminem (2002). ‘Without me’. New York: Shady Records / Aftermath Entertainment.


4 thoughts on “Exploring genre on SoundCloud, part I”

  1. Hi Daniel,
    Interesting, I look forward to the follow up post. This initial post prompts me to ask about the nature of your sample, and how that could affect the results. E.g. if you take some sub-samples within your initial sample, can you find groups of rappers who do listen to techno?
    Could you write a bit more about your sampling method? I get ‘snowball’ but exactly how, and if you plan to change it when you move from this ‘initial’ sample to a larger sample.


    • Thanks for the comment! Those are good questions. In this blog, we report on our work in progress, and any findings we discuss here are preliminary (there’s a report scheduled to appear in July, but even that’s comparatively early days). The main thing to bear in mind for now is that a snowball sample is not a representative sample, which is one of the reasons I’ve presented the above as tentative and suggestive. The analysis here goes beyond what we’d originally planned to do, and it will need to be redone using a random sample. When we do that, it will still be possible, for example, that – absolutely by chance – we miss out a group of techno-loving rappers (and incidentally I think your idea of zooming in on seemingly anomalous groups like that is a very good one), but we’ll be able to provide a statistical measure of confidence that this wasn’t what produced the patterns. In the meantime, keep dropping back to see how our work is developing!


  2. Interestingly, SoundCloud automatically generates tags for uploaded material which seem wildly inaccurate; I was recently tagged as krautrock (mmm possibly) and black metal (not likely). I’ve never really found electronic a useful tag for a genre, as electronic instruments and production are used, as you point out, by so many; I’m not even sure I know what to expect from ‘electronica’ or IDM but they are a bit more useful. Surely, the reason for multiple tags and the reason for defining the music amongst producers is simple – SoundCloud is all about getting your music heard and the best way is to appear in as many searches and ‘music like this’ lists as possible. The more genre tags the better.


    • I think that the reason SoundCloud now foregrounds (potentially multiple) tags rather than a genre field may have been that it realised people were kicking against the requirement to use a single genre (by putting things like ‘pop / rap’, as above). Which in itself is interesting from a theoretical point of view. And the distinction you bring up between ‘electronic’ and ‘electronica’ is a fascinating one. ‘Electronic’ might be anything from instrumental trap to a theremin solo. But ‘electronica’ – if someone tells me I’m going to hear that, I form much clearer expectations.

