Research methodology

Qualitative research proceeded primarily through audio-recorded semi-structured interviews with London-based electronic music-makers (including producers, DJs, and one promoter). The interview protocol is included below. Ethnographic observation and note taking were carried out at a total of three electronic music gigs in London: at one, two interviewees were among the performers; at another, one of the workshop invitees was among the performers; at a third, one of the interviewees was in the audience. Two of the interviewees were already known to the researchers at the beginning of the project. All three team members carried out interviews.

Our initial quantitative approach was to follow a snowball sampling method in collecting public data automatically from SoundCloud. Snowball sampling involves starting with a seed individual, then finding out who is connected to that individual, then finding out who is connected to those individuals in turn, and so on. We did this on the basis of the ‘follow’, ‘like’, ‘comment’, ‘share’, and ‘group’ relationships. Having done this, we were able to construct networks of SoundCloud users for closer analysis. Because we started the snowball sampling process with our interviewees, we were able to start relating patterns in these networks to qualitative findings quite early on in the research process.
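The snowball procedure described above can be sketched as a breadth-first traversal. This is a minimal illustration, not the project's own code: the `FOLLOWS` dictionary is hypothetical toy data standing in for responses from the SoundCloud API, and the function and parameter names are assumptions made for the example.

```python
from collections import deque

def snowball_sample(seed, get_neighbours, max_depth):
    """Breadth-first snowball sample.

    Returns a dict mapping each discovered user to their degrees of
    separation from the seed, stopping at max_depth.
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand users at the final wave
        for other in get_neighbours(user):
            if other not in depth:
                depth[other] = depth[user] + 1
                queue.append(other)
    return depth

# Hypothetical 'follow' data: who each user is connected to.
FOLLOWS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
}

sample = snowball_sample("seed", lambda u: FOLLOWS.get(u, []), max_depth=2)
print(sample)
```

With `max_depth=2` the user `e`, who is three steps from the seed, is never reached. In a dense network the number of users found at each wave grows very quickly, which is the practical difficulty discussed next.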

However, this approach ran into a number of problems. One was the sheer density of the network. One of our interviewees had over 7000 followers, and followed over 1000 people in turn; others that we encountered in the snowball sampling process had millions of followers. Some people commented on hundreds of tracks. Many groups had tens of thousands of members, some over 100000. All these factors combined meant that many millions of people would frequently be found within just two degrees of separation of a single individual. Yet at the same time, we would be prevented from discovering who all of those people were, because of limitations that the SoundCloud API places on downloads of information: one might find, for example, that a given user was a member of a group with 120000 members, but one would be unable to discover the identity of more than 8199 of them. Under such circumstances, snowball sampling becomes impractical, or even meaningless. We could not, for instance, use measures of network centrality to determine which individuals were most influential, because our knowledge of the ‘network’ was always artificially constrained in ways that we were unable to control, or be sure that we fully understood. A further problem was that we began to notice clear patterns in the data that had been collected, yet were unable to make statements about the extent to which those patterns might be generalisable across SoundCloud as a whole because we had no reason to suppose a snowball sample to be representative.

Following discussion at the expert workshop we organised on 15 May 2014, it was decided that we would change tack, adopting, in the first instance, an ‘ego-network’ approach based around each of our informants (that is to say, a series of snowball samples, each time stopping at one degree of separation), and in the second instance, a random sampling approach across SoundCloud as a whole.

An ego-network is a network consisting of all individuals connected to a particular individual, together with the interrelationships between them. Such networks are usually constructed on the basis of information elicited from that individual; we chose to gather data via the SoundCloud API instead. The ego-networks we created were based around follow relationships: taking each interviewee as a seed, we would use the SoundCloud API to discover (a) what tracks he or she had uploaded and (b) who he or she followed and was followed by. Then, we would discover who each of those individuals followed and what tracks they had uploaded, ignoring anyone who neither followed nor was followed by the seed. To study how individuals interacted using the commenting facility, we used a similar but more restricted process: identifying the ten individuals whose tracks the seed user most frequently commented on and the ten individuals who most frequently commented on the seed user’s tracks.
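The follow-based ego-network construction can be illustrated as follows. This is a sketch under stated assumptions: the `follows` dictionary is hypothetical toy data rather than API output, and the function name `ego_network` is introduced only for the example.

```python
def ego_network(seed, follows):
    """Build a one-degree ego-network from follow data.

    follows maps each user to the set of users they follow.
    Returns (members, edges), where edges are directed (follower, followed)
    pairs restricted to the seed and its direct contacts.
    """
    # Alters: everyone the seed follows, plus everyone who follows the seed.
    alters = set(follows.get(seed, set()))
    alters |= {u for u, followed in follows.items() if seed in followed}
    alters.discard(seed)
    members = alters | {seed}
    # Keep only follow relationships between members; anyone who neither
    # follows nor is followed by the seed is ignored.
    edges = {
        (u, v)
        for u in members
        for v in follows.get(u, set())
        if v in members and v != u
    }
    return members, edges

# Hypothetical follow data.
follows = {
    "seed": {"a", "b"},
    "a": {"b", "x"},
    "c": {"seed", "a"},
    "x": {"a"},
}

members, edges = ego_network("seed", follows)
print(sorted(members))
print(sorted(edges))
```

Here `x` follows `a` but has no direct follow relationship with the seed, so it is left out of the network along with its edges.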

To collect our random sample, we began by ascertaining the approximate highest number currently in use as a user ID on SoundCloud. This could only be an estimate, because new user IDs are being added all the time. As of early July, when we began collecting the sample, this number was approaching 104 million. We then generated random numbers between 1 and 104 million and used the SoundCloud API to collect the following data for each account: track profile data for all the user’s tracks; all the comments on those tracks; user profile data for the users who had left those comments; user profile data for everyone followed by the user; user profile data for everyone following the user; all the comments the user had left on other people’s tracks; track profile data for those tracks; user profile data for the uploaders of those tracks. In order to study the relationship between following and genre, we also downloaded data on all tracks uploaded by users followed by members of the sample who had uploaded tracks. As noted above, limitations of the SoundCloud API prevented us from downloading more than 8199 items of data in each of these categories; in practice, the primary consequence of this limitation was to prevent us from being able to track all followers of an extreme minority of very popular SoundCloud users.
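Drawing candidate user IDs uniformly at random might look like the sketch below. It assumes sampling without replacement and a fixed random seed for reproducibility; neither detail is stated in the text, and the function name `draw_candidate_ids` is hypothetical. The step of requesting each account from the SoundCloud API is deliberately left out.

```python
import random

# Approximate highest user ID in use, as reported in the text.
MAX_USER_ID = 104_000_000

def draw_candidate_ids(n, max_id=MAX_USER_ID, seed=None):
    """Return n distinct user IDs drawn uniformly from 1..max_id."""
    rng = random.Random(seed)
    # Sampling from a range object avoids building a 104-million-item list.
    return rng.sample(range(1, max_id + 1), n)

candidate_ids = draw_candidate_ids(5, seed=42)
print(candidate_ids)
```

Because every ID in the range is equally likely to be drawn, the resulting sample does not depend on any seed individual's position in the network, unlike a snowball sample.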

Comment data were filtered for language, because non-English comments were outside the scope of the project. However, filtering them turned out to be an interesting problem (a) because of the prevalence of re-spellings such as ‘fakkkkkkk’ (for ‘fuck’), and (b) because of the use of specialised and non-standard lexis such as ‘synth’ and ‘woot’ (which might in turn be re-spelled, e.g. as ‘wooooot’). Initial attempts to identify language using the Python package guess-language (which purports to be able to detect more than 60 languages) proved highly inaccurate. Greater accuracy was achieved by searching for words within each comment that could be found in the OpenOffice spell checker dictionary for the English language, but this ran into problems because of (a) the heavy use of specialised English lexis such as ‘synth’ and ‘remix’ in non-English comments and (b) the large number of common words in continental European languages that also form acceptable strings in the English language. The accuracy of this procedure was improved by comparing, for each comment, the number of its words found in the spell checker dictionary for English with the number found in the spell checker dictionaries for other languages, but this resulted in a notable slowdown in processing speed and continued to mis-identify a proportion of comments. The solution we eventually settled on was to check the words in each comment against a list of the 5000 most common words in the British National Corpus. Excluding less common words had the useful effect of removing rare English words that are also (coincidentally or otherwise) to be found in other European languages (e.g. ‘para’). If more than 50% of the words in a comment were to be found in that list, then the comment was deemed to be in English.
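The final rule (more than 50% of a comment's words appearing in a list of common English words) can be sketched as follows. The tiny `COMMON_WORDS` set is a hypothetical stand-in for the 5000-word list drawn from the British National Corpus, and the tokenisation with a regular expression is an assumption, since the text does not say how comments were split into words.

```python
import re

# Hypothetical stand-in for the 5000 most common BNC words.
COMMON_WORDS = {"the", "this", "is", "a", "i", "it", "love", "track", "so", "good"}

def is_english(comment, common_words=COMMON_WORDS, threshold=0.5):
    """Deem a comment English if more than `threshold` of its words are common."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return False  # nothing to judge, e.g. a comment made up of emoticons
    hits = sum(1 for w in words if w in common_words)
    return hits / len(words) > threshold

print(is_english("I love this track"))
print(is_english("so good wooooot"))
print(is_english("para mi es muy bueno"))
print(is_english("good synth"))
```

The second comment passes despite the re-spelling ‘wooooot’, because two of its three words are in the list. The last comment fails because exactly half of its words are in the list, and the rule requires more than half.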

Quantitative data were stored in an SQLite database, analysed with bespoke programs, and visualised with Gephi, with the latter’s Force Atlas algorithm used for positioning of nodes and with its implementation of the Louvain Method used for community detection (however, it was not used for calculation of centrality; see below). Comments were processed, cleaned, and sorted with bespoke programs, and analysed with AntConc. Qualitative data (interviews and notes) were transcribed, then read through independently by members of the team, who compared observations afterwards. Data scraping programs were written in Python 2.7 using SoundCloud’s Software Development Kit (SDK) for that language (this consists of a collection of programming tools to allow us to write programs to collect SoundCloud data). Other programs were also written in Python 2.7, with additional use of the Numpy and NetworkX libraries. The eigenvector centrality of nodes was calculated using NetworkX and then imported into Gephi, because Gephi’s approach to eigenvector centrality ignores edge weight. Code written in the course of the research is another output of this project, and is freely available at
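To show why edge weight matters for eigenvector centrality, the sketch below computes it by power iteration with and without weights. It is a pure-Python stand-in written for illustration only: the project itself used NetworkX, and the edge list, function name, and parameters here are hypothetical.

```python
import math

def eigenvector_centrality(edges, use_weights=True, max_iter=1000, tol=1e-10):
    """Power iteration on an undirected graph given as (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        w = w if use_weights else 1.0
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    x = {n: 1.0 / len(adj) for n in adj}
    for _ in range(max_iter):
        # Each node's new score is its old score plus the weighted sum of its
        # neighbours' scores. Keeping the old score (an identity shift)
        # prevents oscillation on bipartite graphs such as paths.
        new = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {n: val / norm for n, val in new.items()}
        if sum(abs(new[n] - x[n]) for n in adj) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical path a-b-c-d in which the a-b tie is much stronger than the rest.
EDGES = [("a", "b", 10.0), ("b", "c", 1.0), ("c", "d", 1.0)]

weighted = eigenvector_centrality(EDGES)
unweighted = eigenvector_centrality(EDGES, use_weights=False)
print(weighted)
print(unweighted)
```

When weights are ignored, `b` and `c` occupy symmetric positions on the path and receive the same score. When the strong a-b tie is taken into account, `b` scores higher than `c`, which is the kind of difference that is lost if edge weight is disregarded.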

Interview protocol

  1. Legal name and professional aliases
  2. Date and place of birth; current residence
  3. Ethnicity, sexuality
  4. Occupation and parents’ occupations
  5. How in your experience do producers of electronic music show that they value other people’s work?
  6. How do you show other producers of electronic music that you value their work?
  7. How have other producers of electronic music indicated that they value your work?
  8. [Follow-up to 5, 6, and 7.] Are online networks and forums the most typical contexts for these expressions of value? Are there contexts where expressions of value tend to be made face to face?
  9. Who do you value as music producers? Who do you seek to emulate and how?
  10. Do you sell your works? How? Do you work as a performer? In what kinds of contexts?
  11. [Follow-up to 10.] About what proportion of your income comes from selling your works and/or performing?
  12. How much time do you spend making music per week? [Follow-up: How does this compare to the amount of time you spend doing other things – e.g. work, uni, pastimes?]
  13. In what kinds of venues do you present your music to audiences?
  14. [Follow-up/clarification.] What online networks do you make use of? Advantages? Disadvantages?
  15. [Follow-up/clarification.] In what offline contexts do you present your music? How often? With anyone else? What kinds of arrangements are involved?
  16. What is the highest level of education you have attained?
  17. How did you get into music?
  18. [This is formally addressed on the release form, but some additional texture is often appropriate, especially at the completion of the interview.] Are you happy for us to identify you by name in academic publications? By professional alias? Would you prefer to remain anonymous? Is there anything you have said in this interview that you would prefer we did not attribute to you by name? [Make any necessary notes and modifications to the release form and initial them.]