Introduction to this Project

This tutorial shows how to use R and RStudio with Spotify’s API to gain insight into Spotify’s music algorithms and to apply that information to optimized set lists and networking opportunities. For artists, the main benefit is building a set list from the data Spotify provides. That data gives artists better insight into how a show will progress, helps them arrange sets for minimal instrument changes, and lets them plot a progression for the overall theme and feeling of a concert.

Initial tools

The key sources in this project are as follows:

R coding language
RStudio
Spotify API
R plugins

Each of these offers its own benefits and lends itself to the project in its own right, as explained below.

R coding language

R is the backbone of this project. While it is not as common as languages like HTML, C++, or Python, it is free, open source, and extremely strong at compiling data, which makes it a strong contender for this project. To obtain R, head to https://www.r-project.org/, find the “CRAN” section under the “download” tab, and select a mirror for your region. For the US, many options are available, and I would recommend whichever is closest to you. Installing this gives you access to the base language, but to make the system work for us, we will need something more user friendly.

RStudio

RStudio is a piece of software that interacts directly with the R language. It lets us use extra plugins to get more out of R without having to code everything ourselves. RStudio is a very popular tool for working with R, with a large community behind it and plenty of extra guides should you need help navigating the program or expanding on the work presented in this guide. To obtain RStudio, head to https://www.rstudio.com/ and select the “RStudio” option under the “Products” tab. From there, select the “RStudio Desktop” option. For our needs, we want the “Open source edition”. The site should automatically detect and recommend the version of RStudio for your operating system, but if it does not, simply select your operating system from the options available. We are using the open source version for several reasons: it is free, we are not using R to code in a professional manner, and the Pro edition has a yearly subscription of $1,000. While the site states that “support” for this version is only “community forums”, this is actually a huge benefit, as there are many helpful guides, videos, and alternatives that let you learn and solve problems quickly.

Spotify API

While there are many candidate APIs to use for data, Spotify’s availability and ease of access make it a prime option. This is further enhanced by the fact that Spotify is the most popular music streaming platform and has a lot of technology under the hood that benefits the scope of this project. Spotify’s algorithms break down an artist’s music into several different audio features, offering many ways to better understand and shed light on one’s music. If you already have developer credentials for Spotify, you are off to a great start. If not, see below for how to gain these credentials and begin using them.

Gaining developer credentials

This step is extremely important, as you will need these credentials for any of this to work properly. Gaining access to the developer side of Spotify is not difficult, however; the process is highly automated, since people create their own tools, apps, and widgets on a regular basis. To begin, you will need a Spotify account. It can be an existing account or a new one, premium or free. So long as you have an account, you will be good to go. Then head over to https://developer.spotify.com/ to begin the process. Log in with your account of choice and accept the Terms of Service. From here, you will be able to go to your “Dashboard” section and click the “Create an app” button. This will generate two long strings that will be essential: 1) the client ID, and 2) the client secret. These remain available through this section of the dashboard if you ever need them, and can be copied and pasted into RStudio when needed. They are what authorize you to access Spotify’s API and reach any of the data you are after.

Gathering data from the Spotify API

You are now ready to access the Spotify API and download audio features data for your target artist as well as for your target artist’s related artists.

Loading the Spotify R package

spotifyr is an R package that makes working with the Spotify API easier. This code checks whether the package is already installed on your computer, installs it if necessary, and loads it into memory so the script can use it.

if (!require("spotifyr")) install.packages("spotifyr")

## Loading required package: spotifyr

## Warning: package 'spotifyr' was built under R version 4.1.3

library(spotifyr)
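The install-and-load pattern above repeats for every package in this guide. As a minimal sketch, assuming you would rather check all of them in one step, a small helper can do the same job. The function name ensure_packages is hypothetical; it is not part of R or spotifyr.

```r
# Hypothetical helper: install any packages from `pkgs` that are not yet
# available, then return (invisibly) the names that had to be installed.
ensure_packages <- function(pkgs) {
  # requireNamespace() checks availability without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Base packages ship with R, so nothing is installed here
needed <- ensure_packages(c("stats", "utils"))

# For this guide, the call would be:
# ensure_packages(c("spotifyr", "ggridges", "ggplot2", "tidyverse", "plotly"))
```

The helper only makes sure the packages are installed. You would still call library() for each one afterwards, as the guide does.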

API authorization

As noted earlier, accessing the Spotify API requires your Spotify client ID and client secret. First, remove the # at the start of each of the following five lines of code. Then, replace PASTEYOURSPOTIFYIDHERE with your client ID, replace PASTEYOURSPOTIFYSECRETHERE with your client secret, and run the code. Be sure to retain the quote marks in the first two lines of code. Omitting them will produce an error.

#id <- 'PASTEYOURSPOTIFYIDHERE'
#secret <- 'PASTEYOURSPOTIFYSECRETHERE'
#Sys.setenv(SPOTIFY_CLIENT_ID = id)
#Sys.setenv(SPOTIFY_CLIENT_SECRET = secret)
#access_token <- get_spotify_access_token()
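If the token request fails, the first thing to check is whether both environment variables were actually set. The sketch below uses only base R; the helper name credentials_set is hypothetical and is not part of spotifyr.

```r
# Hypothetical helper: TRUE only when both Spotify environment variables
# hold a non-empty value.
credentials_set <- function() {
  id <- Sys.getenv("SPOTIFY_CLIENT_ID")
  secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")
  nzchar(id) && nzchar(secret)
}

if (!credentials_set()) {
  message("Set SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET before requesting a token.")
}
```

Running this before get_spotify_access_token() separates a missing credential from other problems, such as a mistyped secret or a network issue.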

Specify a target artist

Use this code to tell the script the name of the artist whose audio features data you want to retrieve. For this demonstration, I’m using spiritbox. You can use the name of any other artist or band with content on Spotify. Be sure to retain the quote marks. Note that data may not yet be available for newer artists or bands.

AlphaArtist <- "spiritbox"

Retrieve audio features data for the target artist

Nothing needs to be edited here. Just run the code, which will retrieve audio features data for the specified artist and store the data in a data frame called AAFeatures and also retrieve the artist’s Spotify ID for use in other parts of the script.

AAFeatures <- get_artist_audio_features(artist = AlphaArtist, 
                        include_groups = c("album","single"),
                        return_closest_artist = TRUE,
                        dedupe_albums = TRUE,
                        authorization = get_spotify_access_token())
Alpha_ArtistID <- AAFeatures$artist_id[1]
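Because return_closest_artist = TRUE accepts the closest search match, it is worth confirming that the artist returned is the one you asked for. This is a minimal sketch with a hypothetical helper name, same_artist. It only compares names, so treat it as a quick sanity check and not as proof.

```r
# Hypothetical helper: compare the requested name with the name returned,
# ignoring case and surrounding spaces.
same_artist <- function(requested, returned) {
  tolower(trimws(requested)) == tolower(trimws(returned))
}

same_artist("spiritbox", "Spiritbox")   # TRUE
same_artist("spiritbox", "Spirit Box")  # FALSE

# With the objects from this guide, the check would be:
# same_artist(AlphaArtist, AAFeatures$artist_name[1])
```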

Understanding The Audio Features

Here is a breakdown of the audio features that we will be using throughout this project:

Danceability: How easy it is to dance to a track, based mostly on the regularity and strength of its beats.

Valence: The overall positivity of a track. A song with a higher valence has a more upbeat and cheerful sound.

Energy: The energetic feel of a song. Faster-sounding songs or songs with lots of noise will have a high energy.

Tempo: A song’s beats per minute (BPM); a higher BPM gives a higher value here.

Loudness: The loudness of a track, measured in decibels (dB). Louder tracks chart higher here. While a listener can change the playback volume on their end, this value comes from Spotify’s scan of the initial upload.

Speechiness: The presence of spoken words in a song. The more of a track that consists of speech, the higher it scores.

Instrumentalness: The lack of words or vocals in a track. It plays interestingly with speechiness, as the two are not exact inverses of one another.

Liveness: Detects an audience in the track. Higher numbers mean the track was more likely recorded live.

Acousticness: How acoustic a song is. A higher score means the track is more likely to have been made with acoustic instruments or without electronic assistance.

Duration: The length of a track, in milliseconds (ms).

Key: The overall key of the track. Spotify assigns a key to the song using its average pitch.
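Two of these features are easier to read after a small conversion. Spotify reports key as an integer in pitch class notation (0 = C, 1 = C#, and so on up to 11 = B, with -1 when no key was detected) and duration in milliseconds. The sketch below converts both; the helper names are hypothetical. spotifyr also returns ready-made columns such as key_mode, which this guide uses later, so this is only to show what the raw numbers mean.

```r
# Pitch class names indexed by Spotify's integer key (0 = C ... 11 = B)
pitch_classes <- c("C", "C#", "D", "D#", "E", "F",
                   "F#", "G", "G#", "A", "A#", "B")

# Hypothetical helper: translate integer keys to names;
# -1 (no key detected) becomes NA
key_to_name <- function(key) {
  idx <- key + 1
  idx[key < 0] <- NA
  pitch_classes[idx]
}

# Hypothetical helper: format milliseconds as minutes:seconds
format_duration <- function(ms) {
  sprintf("%d:%02d",
          as.integer(ms %/% 60000),
          as.integer((ms %% 60000) %/% 1000))
}

key_to_name(c(0, 7, -1))   # "C" "G" NA
format_duration(215000)    # "3:35"
```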

Finding Similar Artists

Running this code will reach into Spotify and find the 20 artists most closely related to our alpha artist. This differs from looking them up through Spotify’s front end, because the front end adds bias based on your listening habits. This code pulls from Spotify’s aggregate listening data instead, drawing on all listeners to compile an average. It will also build a data set for our artists and save it as a .csv file to extract data from for mapping. No edits are needed, as we have designated the band we want to focus on above.

Similar_Artists <- get_related_artists(Alpha_ArtistID)

Similar_Artists$name

##  [1] "Silent Planet"     "ERRA"              "Darko US"         
##  [4] "Make Them Suffer"  "Thornhill"         "Alpha Wolf"       
##  [7] "Currents"          "Invent Animate"    "Novelists FR"     
## [10] "Northlane"         "Sleep Token"       "Void Of Vision"   
## [13] "Aviana"            "Crystal Lake"      "Monuments"        
## [16] "Hollow Front"      "Kingdom Of Giants" "Oceans Ate Alaska"
## [19] "Volumes"           "LANDMVRKS"

Similar_Artists_IDs <- as.list(Similar_Artists$id)
Compare_Features <- AAFeatures

print("Getting data for:")

## [1] "Getting data for:"

for (ID in Similar_Artists_IDs) {
  # Retrieve audio features for one related artist
  This_Extraction <- get_artist_audio_features(artist = ID,
                          include_groups = c("album","single"),
                          return_closest_artist = TRUE,
                          dedupe_albums = TRUE,
                          authorization = get_spotify_access_token())
  # Append this artist's tracks to the combined data frame
  Compare_Features <- rbind(Compare_Features, This_Extraction)
  print(This_Extraction$artist_name[1])
  # Pause briefly between requests
  Sys.sleep(2)
}

## [1] "Silent Planet"
## [1] "ERRA"
## [1] "Darko US"
## [1] "Make Them Suffer"
## [1] "Thornhill"
## [1] "Alpha Wolf"
## [1] "Currents"
## [1] "Invent Animate"
## [1] "Novelists FR"
## [1] "Northlane"
## [1] "Sleep Token"
## [1] "Void Of Vision"
## [1] "Aviana"
## [1] "Crystal Lake"
## [1] "Monuments"
## [1] "Hollow Front"
## [1] "Kingdom Of Giants"
## [1] "Oceans Ate Alaska"
## [1] "Volumes"
## [1] "LANDMVRKS"

print("All data extracted")

## [1] "All data extracted"
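The extraction loop above stops entirely if a single request fails. As a minimal sketch, assuming you would rather skip a failing artist and keep going, the loop can be wrapped in tryCatch(). The names collect_features and fake_fetch are hypothetical; fake_fetch stands in for get_artist_audio_features() so the example runs without API access.

```r
# Hypothetical helper: call fetch_fun(id) for each id, skip ids that fail,
# and bind the successful results into one data frame.
collect_features <- function(ids, fetch_fun, pause = 0) {
  results <- list()
  for (id in ids) {
    one <- tryCatch(fetch_fun(id), error = function(e) {
      message("Skipping ", id, ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(one)) results[[length(results) + 1]] <- one
    Sys.sleep(pause)
  }
  do.call(rbind, results)
}

# Stand-in for get_artist_audio_features(); it fails for one id on purpose
fake_fetch <- function(id) {
  if (id == "bad") stop("no data returned")
  data.frame(artist_id = id, energy = 0.9)
}

combined <- collect_features(c("a", "bad", "b"), fake_fetch)
nrow(combined)  # 2
```

Collecting the pieces in a list and binding them once at the end also avoids growing the data frame on every pass through the loop.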

Audio_Features <- subset(Compare_Features,
                         select = -c(album_images,artists,available_markets))

write.csv(Audio_Features,"Audio_Features.csv", row.names = FALSE)
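The three columns dropped before writing the file appear to be list columns, which write.csv() cannot write. As a sketch, assuming you want to find such columns automatically instead of naming them by hand, base R can detect them. The toy data frame below is made up and stands in for Compare_Features.

```r
# Toy data frame with one list column, standing in for Compare_Features
toy <- data.frame(track_name = c("Song A", "Song B"), energy = c(0.8, 0.6))
toy$available_markets <- list(c("US", "CA"), c("US"))

# Find list columns automatically rather than naming them by hand
is_list_col <- vapply(toy, is.list, logical(1))
flat <- toy[, !is_list_col, drop = FALSE]

names(flat)  # "track_name" "energy"
```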

Creating Ridgeline Plots

To better understand our data, we can visualize it. We already have the numerical values, and the packages below give us the tools to turn them into ridgeline plots. Running the code below loads everything we need. The link in the code comment is included if you want a better understanding of ridgelines or want to construct different ones.

# Ridgeline plots. See: https://r-charts.com/distribution/ggridges/

if (!require("ggridges")) install.packages("ggridges")

## Loading required package: ggridges

## Warning: package 'ggridges' was built under R version 4.1.3

library(ggridges)

if (!require("ggplot2")) install.packages("ggplot2")

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.1.3

library(ggplot2)

if (!require("tidyverse")) install.packages("tidyverse")

## Loading required package: tidyverse

## Warning: package 'tidyverse' was built under R version 4.1.3

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## v purrr   0.3.4

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(tidyverse)

Comparing Related Artists

These ridgeline plots create one graph for each of our audio features. Each chart covers our alpha artist and the top 20 related artists for a single feature and shows where our alpha artist sits within the data provided. These ridge plots give us a better understanding of the features and where we land in relation to our peers at a glance. If any edits are desired based on the link above, they can be made in the code here, but if we are happy with the output as shown, we can run the code as is. For each chart, we put artist_name on the y-axis and one of our audio features on the x-axis.

#Ridgeline plots

ggplot(Audio_Features, aes(x = danceability,

y = fct_reorder(artist_name,danceability),

fill = stat(x))) +

labs(x="Danceability", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Danceability", option = "C")

## Picking joint bandwidth of 0.0359
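The plots in this section differ only in the column being charted, so the repeated code could be folded into one function. The sketch below is one way to do that under a few assumptions: the name ridge_plot is hypothetical, the packages ggplot2 (3.3.0 or later), ggridges, and forcats are installed, and after_stat() is used as the current ggplot2 spelling of stat().

```r
# Hypothetical helper: one ridgeline plot for any numeric column.
# Assumes ggplot2 (3.3.0 or later), ggridges, and forcats are installed.
ridge_plot <- function(data, feature, label = feature) {
  ggplot2::ggplot(data, ggplot2::aes(
      x = .data[[feature]],
      y = forcats::fct_reorder(.data[["artist_name"]], .data[[feature]]),
      fill = ggplot2::after_stat(x))) +
    ggplot2::labs(x = label, y = NULL) +
    ggridges::geom_density_ridges_gradient() +
    ggplot2::scale_fill_viridis_c(name = label, option = "C")
}

# Usage with the data frame from this guide:
# ridge_plot(Audio_Features, "danceability", "Danceability")
# ridge_plot(Audio_Features, "energy", "Energy")
```

Passing the column name as a string and looking it up with .data[[feature]] is what lets a single function serve every feature.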

ggplot(Audio_Features, aes(x = energy,

y = fct_reorder(artist_name,energy),

fill = stat(x))) +

labs(x="Energy", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Energy", option = "C")

## Picking joint bandwidth of 0.02

ggplot(Audio_Features, aes(x = loudness,

y = fct_reorder(artist_name,loudness),

fill = stat(x))) +

labs(x="Loudness", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Loudness", option = "C")

## Picking joint bandwidth of 0.576

ggplot(Audio_Features, aes(x = speechiness,

y = fct_reorder(artist_name,speechiness),

fill = stat(x))) +

labs(x="Speechiness", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Speechiness", option = "C")

## Picking joint bandwidth of 0.0222

ggplot(Audio_Features, aes(x = acousticness,

y = fct_reorder(artist_name,acousticness),

fill = stat(x))) +

labs(x="Acousticness", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Acousticness", option = "C")

## Picking joint bandwidth of 0.00671

ggplot(Audio_Features, aes(x = instrumentalness,

y = fct_reorder(artist_name,instrumentalness),

fill = stat(x))) +

labs(x="Instrumentalness", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Instrumentalness", option = "C")

## Picking joint bandwidth of 0.0738

ggplot(Audio_Features, aes(x = liveness,

y = fct_reorder(artist_name,liveness),

fill = stat(x))) +

labs(x="Liveness", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Liveness", option = "C")

## Picking joint bandwidth of 0.0575

ggplot(Audio_Features, aes(x = valence,

y = fct_reorder(artist_name,valence),

fill = stat(x))) +

labs(x="Valence", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Valence", option = "C")

## Picking joint bandwidth of 0.0484

ggplot(Audio_Features, aes(x = tempo,

y = fct_reorder(artist_name,tempo),

fill = stat(x))) +

labs(x="Tempo", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Tempo", option = "C")

## Picking joint bandwidth of 9.74

ggplot(Audio_Features, aes(x = duration_ms,

y = fct_reorder(artist_name,duration_ms),

fill = stat(x))) +

labs(x="Milliseconds", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Milliseconds", option = "C")

## Picking joint bandwidth of 14600
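The order of artists in each ridgeline comes from fct_reorder(), which by default sorts by the median of the feature, lowest at the bottom. If you want the numbers behind that ordering, a base R sketch like the one below computes them. The toy data is made up and stands in for Audio_Features.

```r
# Toy data standing in for Audio_Features (values are made up)
toy <- data.frame(
  artist_name = c("Band A", "Band A", "Band A", "Band B", "Band B", "Band B"),
  energy      = c(0.90, 0.95, 0.85, 0.60, 0.70, 0.65)
)

# Median energy per artist, sorted from lowest to highest
medians <- sort(tapply(toy$energy, toy$artist_name, median))
medians

# Position of one artist in that ordering (1 = lowest median)
which(names(medians) == "Band A")  # 2
```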

Looking at Top Tracks

These lines of code pull the top tracks for our alpha artist and for each related artist. This allows us to compare each group’s individual songs. Running this code creates a table for us.

Top_Tracks <- get_artist_top_tracks(Alpha_ArtistID, market = "US")

Artist_count <- 0

print("Getting data for:")

## [1] "Getting data for:"

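for (ID in Similar_Artists_IDs) {
  # Count and print progress through the related artists
  Artist_count <- Artist_count + 1
  print(Artist_count)
  # Retrieve this artist's top tracks and append them to the table
  This_Extraction <- get_artist_top_tracks(ID, market = "US")
  Top_Tracks <- rbind(Top_Tracks, This_Extraction)
  Sys.sleep(1)
}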

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20

print("All data extracted")

## [1] "All data extracted"

Top_Tracks$track_id <- Top_Tracks$id

Top_Tracks <- merge(Top_Tracks, Audio_Features, by = "track_id")
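Two side effects of merge() matter for the rest of this guide. Columns that exist in both tables get .x and .y suffixes, and rows without a match in both tables are dropped. The sketch below shows both with made-up toy tables standing in for Top_Tracks and Audio_Features.

```r
# Toy tables standing in for Top_Tracks and Audio_Features (values made up)
top <- data.frame(track_id = c("t1", "t2", "t3"),
                  duration_ms = c(200000, 210000, 220000))
features <- data.frame(track_id = c("t1", "t2"),
                       duration_ms = c(200000, 210000),
                       energy = c(0.9, 0.8))

merged <- merge(top, features, by = "track_id")

names(merged)  # "track_id" "duration_ms.x" "duration_ms.y" "energy"
nrow(merged)   # 2, because "t3" has no match in `features`
```

This is why the duration plot for top tracks refers to duration_ms.x and not duration_ms.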

Creating Ridgeline Plots of Related Artists’ Top Tracks

Now that we have extracted the data for our top tracks, we can run the code below to create a set of ridgelines.

ggplot(Top_Tracks, aes(x = danceability,

y = fct_reorder(artist_name,danceability),

fill = stat(x))) +

labs(x="Danceability - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Danceability", option = "C")

## Picking joint bandwidth of 0.0456

ggplot(Top_Tracks, aes(x = energy,

y = fct_reorder(artist_name,energy),

fill = stat(x))) +

labs(x="Energy - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Energy", option = "C")

## Picking joint bandwidth of 0.0245

ggplot(Top_Tracks, aes(x = loudness,

y = fct_reorder(artist_name,loudness),

fill = stat(x))) +

labs(x="Loudness - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Loudness", option = "C")

## Picking joint bandwidth of 0.541

ggplot(Top_Tracks, aes(x = speechiness,

y = fct_reorder(artist_name,speechiness),

fill = stat(x))) +

labs(x="Speechiness - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Speechiness", option = "C")

## Picking joint bandwidth of 0.0255

ggplot(Top_Tracks, aes(x = acousticness,

y = fct_reorder(artist_name,acousticness),

fill = stat(x))) +

labs(x="Acousticness - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Acousticness", option = "C")

## Picking joint bandwidth of 0.00734

ggplot(Top_Tracks, aes(x = instrumentalness,

y = fct_reorder(artist_name,instrumentalness),

fill = stat(x))) +

labs(x="Instrumentalness - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Instrumentalness", option = "C")

## Picking joint bandwidth of 0.0456

ggplot(Top_Tracks, aes(x = liveness,

y = fct_reorder(artist_name,liveness),

fill = stat(x))) +

labs(x="Liveness - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Liveness", option = "C")

## Picking joint bandwidth of 0.0695

ggplot(Top_Tracks, aes(x = valence,

y = fct_reorder(artist_name,valence),

fill = stat(x))) +

labs(x="Valence - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Valence", option = "C")

## Picking joint bandwidth of 0.0561

ggplot(Top_Tracks, aes(x = tempo,

y = fct_reorder(artist_name,tempo),

fill = stat(x))) +

labs(x="Tempo - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Tempo", option = "C")

## Picking joint bandwidth of 12.8

ggplot(Top_Tracks, aes(x = duration_ms.x,

y = fct_reorder(artist_name,duration_ms.x),

fill = stat(x))) +

labs(x="Milliseconds - Top Tracks Only", y=NULL) +

geom_density_ridges_gradient() +

scale_fill_viridis_c(name = "Milliseconds", option = "C")

## Picking joint bandwidth of 15800

Related Artists Tracks, By Key

Running the script below will create a stacked bar plot showing the different keys used by each artist, giving a breakdown of preferred keys.

library(dplyr)

keys <- Audio_Features %>% count(artist_name, key_mode)

keys$Artist <- keys$artist_name

ggplot(keys, aes(fill=Artist, y=n, x=key_mode)) +

geom_bar(position="stack", stat="identity") +

coord_flip() +

ggtitle("Key frequency, by artist") +

ylab("Track count") +

xlab("")

Top Tracks, by Key

Alternatively, if we want to do this for the artists’ top tracks, we can run this:

library(dplyr)

keys <- Top_Tracks %>% count(artist_name, key_mode)

keys$Artist <- keys$artist_name

ggplot(keys, aes(fill=key_mode, y=n, x=Artist)) +

geom_bar(position="stack", stat="identity") +

coord_flip() +

ggtitle("Key frequency, by artist") +

ylab("Track count") +

xlab("")
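The bar charts show the full spread of keys. If you only want each artist’s single most used key, the counts can be reduced to one row per artist. This is a base R sketch on made-up data in the same shape as the keys table above.

```r
# Toy counts in the same shape as `keys` (values are made up)
toy_keys <- data.frame(
  artist_name = c("Band A", "Band A", "Band B", "Band B"),
  key_mode    = c("C major", "A minor", "C major", "E minor"),
  n           = c(4, 9, 7, 2)
)

# For each artist, keep the row with the highest track count
by_artist <- split(toy_keys, toy_keys$artist_name)
favorite <- do.call(rbind, lapply(by_artist, function(g) g[which.max(g$n), ]))

as.character(favorite$key_mode)  # "A minor" "C major"
```

Note that which.max() keeps only the first row when two keys tie for the highest count.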

Finding Our Own Data

If we want to do some introspective research, we can do that as well. Below is a line of code that pulls per-track audio features for an album. We can find these track IDs in our “AAFeatures” table. From there, we can create a new table based on the information gathered. Since this is Spiritbox’s latest album, “Eternal Blue”, we can name the table “Eternal_Blue”.

Eternal_Blue = get_track_audio_features('2glEXDEzubpETiDRXfC4oX,2rFaJ6NRIRWn335tQr8lWD,3yk51U329nwdpeIHV0O5ez,6nhA5FGEnH5wm3cr2zhMf7,39sAePHCDbaZlpLow8lRp4,09egp10m2GLUAgNoutZQ7A,3q7kMFce0TnDafVUzq8IpE,6gJ0ydZombiOIs4NaxnFXR,4y8iYaEWA7MrMiyNFqidtR,4JZycTDeBXufDFM8mbRbeq,6I5zXzSDByTEmYZ7ePVQeB,0hdDPaUbhi1OkzhyicPSBb')

Eternal_Blue <- cbind(Eternal_Blue, "Track"=1:nrow(Eternal_Blue))
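The track IDs above were pasted in by hand. As a sketch, assuming the AAFeatures table carries album_name, track_number, and track_id columns (an assumption about spotifyr’s output that you should check with names(AAFeatures)), the same comma-separated string could be built by filtering the table. The toy data and IDs below are placeholders.

```r
# Toy table standing in for AAFeatures; the column names are assumed to
# match spotifyr's output, and the IDs are placeholders
toy <- data.frame(
  album_name   = c("Album One", "Album One", "Album Two"),
  track_number = c(2, 1, 1),
  track_id     = c("id_b", "id_a", "id_c")
)

# Keep one album, put its tracks in running order, and join the IDs
album <- toy[toy$album_name == "Album One", ]
album <- album[order(album$track_number), ]
id_string <- paste(album$track_id, collapse = ",")

id_string  # "id_a,id_b"
```

With the real data, the filter would use "Eternal Blue" as the album name, and id_string would be passed to get_track_audio_features().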

Plotting our Own Albums

For this, we will need another package for plotting our own albums. Plotly offers an alternative charting solution that works better with single targets, such as plotting an album.

if (!require("plotly")) install.packages("plotly")

## Loading required package: plotly

## Warning: package 'plotly' was built under R version 4.1.3

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(plotly)

Charting our Album

Using Plotly, we can now plot our album. Using the code below, we can chart each audio feature across the tracks on the album. These line charts create an easier readout on a per-track basis.

plot_ly(Eternal_Blue, x = ~Track, y = ~danceability,

               name = 'danceability',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~energy,

               name = 'energy',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~loudness,

               name = 'loudness',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~speechiness,

               name = 'speechiness',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~acousticness,

               name = 'acousticness',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~instrumentalness,

               name = 'instrumentalness',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~liveness,

               name = 'liveness',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~valence,

               name = 'valence',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~tempo,

               name = 'tempo',

               type = 'scatter',

               mode = 'lines+markers')

plot_ly(Eternal_Blue, x = ~Track, y = ~duration_ms,

               name = 'duration_ms',

               type = 'scatter',

               mode = 'lines+markers')
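Most of the features charted above run from 0 to 1, but tempo is in beats per minute and loudness is in decibels, so they cannot share an axis with the others as they stand. As a minimal sketch, assuming you want to compare the shape of several features across an album on one scale, each column can be rescaled to the 0-1 range first. The helper name rescale01 is hypothetical and the values are made up.

```r
# Hypothetical helper: rescale a numeric vector to the 0-1 range.
# A vector whose values are all equal would give NaN here.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy values standing in for three tracks of an album (made up)
tempo <- c(90, 120, 150)
loudness <- c(-8, -6, -4)

rescale01(tempo)     # 0.0 0.5 1.0
rescale01(loudness)  # 0.0 0.5 1.0
```

The rescaled values only show how each track compares with the rest of the same album; they are no longer in BPM or decibels.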

Decoding Spotify Audio Features Data

Paul Zielenski

2022-11-03