This tutorial shows how to use R and RStudio together with Spotify’s API to gain insight into how Spotify’s algorithms describe music, and how to use that information to plan set lists and find networking opportunities. For artists, the main benefit is being able to build a set list from the audio features Spotify reports. With that information, artists can get a better sense of how a show will progress, arrange sets to minimize instrument changes, and plot the overall theme and feeling of a concert.
The key sources in this project are as follows:
R coding language
RStudio
Spotify API
R plugins
Each of these offers its own benefits and contributes to the project in its own right, as explained below.
R is the backbone of this project. While it is not as common as things like HTML, C++, or Python, it is free, open source, and very good at working with data, which makes it a strong contender for this project. To obtain R, head to https://www.r-project.org/, find the “CRAN” section under the “download” tab, and select a mirror for your region. For the US, many options are available, and I would recommend whichever is closest to you. Installing this gives you the base language, but to make it work well for us, we will need something more user friendly.
RStudio is a piece of software that works directly with the R language. It lets us use extra plugins (packages) to get more out of R without having to code everything ourselves. RStudio is a very popular tool for working with R, with a large community behind it and plenty of extra guides should you need help navigating the program or expanding on the work presented in this guide. To obtain RStudio, head to https://www.rstudio.com/ and select the “RStudio” option under the “Products” tab. From there, select the “RStudio Desktop” option. For our needs, we want the “Open source edition”. The site should automatically detect and recommend the version of RStudio for your operating system; if it does not, simply select your operating system from the options available. We are using the open source edition for several reasons: it is free, we are not using R in a professional setting, and the Pro edition has a yearly subscription of $1,000. While the site states that “support” for this edition is limited to “community forums”, this is actually a huge benefit, as there are many helpful guides, videos, and alternatives that make it possible to learn and solve problems quickly.
While there are many candidate APIs to use for data, the availability and ease of access of Spotify’s make it a prime option. It also helps that Spotify is one of the most popular music streaming platforms and has a lot of technology under the hood that benefits the scope of this project. Spotify’s algorithms break down an artist’s music into several different audio features, offering many ways to better understand and shed light on one’s music. If you already have developer credentials for Spotify, you are off to a great start. If not, see below for how to get these credentials and begin using them.
These credentials are extremely important, as nothing in this project will work properly without them. Gaining access to the developer side of Spotify is not difficult, however. The process is highly automated, because people regularly create their own tools, apps, and widgets. To begin, you will need a Spotify account. It can be an existing account or a new one, premium or free. So long as you have an account, you will be good to go. Once you have one, head over to https://developer.spotify.com/ to begin the process. Log in with your account of choice and accept the Terms of Service. From here, you will be able to go to your “Dashboard” section and click the “Create an app” button. This will generate two long strings that are essential: 1) the client ID, and 2) the client secret. These remain available in this section of the dashboard if you ever need them again, and can be copied and pasted into RStudio when needed. They are what authorize you to access Spotify’s API, and you will need them to reach any of the data.
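One common way to hand the credentials to R is through environment variables. The sketch below assumes the variable names SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET, which spotifyr's get_spotify_access_token() reads by default. The values shown are placeholders, not real credentials.

```r
# Placeholder values: replace them with the client ID and client secret
# shown on your own Spotify developer dashboard.
Sys.setenv(SPOTIFY_CLIENT_ID = "your-client-id-here")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "your-client-secret-here")

# Sanity check that both values are visible to the current R session.
credentials_set <- nzchar(Sys.getenv("SPOTIFY_CLIENT_ID")) &&
  nzchar(Sys.getenv("SPOTIFY_CLIENT_SECRET"))
print(credentials_set)
```

Setting the values this way only lasts for the current R session, so the lines need to be run again each time RStudio is restarted.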
You are now ready to access the Spotify API and download audio features data for your target artist as well as for your target artist’s related artists.
spotifyr is an R package that makes working with the Spotify API easier. This code checks whether the package is already installed on your computer, installs it if necessary, and loads it into memory so the script can use it.
if (!require("spotifyr")) install.packages("spotifyr")
## Loading required package: spotifyr
## Warning: package 'spotifyr' was built under R version 4.1.3
library(spotifyr)
Use this code to tell the script the name of the artist whose audio features data you want to retrieve. For this demonstration, I’m using spiritbox. You can use the name of any other artist or band with content on Spotify. Be sure to retain the quote marks. Note that data may not yet be available for newer artists or bands.
AlphaArtist <- "spiritbox"
Nothing needs to be edited here. Just run the code, which will retrieve audio features data for the specified artist and store the data in a data frame called AAFeatures and also retrieve the artist’s Spotify ID for use in other parts of the script.
AAFeatures <- get_artist_audio_features(artist = AlphaArtist,
                                        include_groups = c("album", "single"),
                                        return_closest_artist = TRUE,
                                        dedupe_albums = TRUE,
                                        authorization = get_spotify_access_token())
Alpha_ArtistID <- AAFeatures$artist_id[1]
Here is a breakdown of the audio features that we will be using throughout this project:
Danceability: How easy it is to dance to a track, based mostly on the regularity and strength of its beats.
Valence: The overall positivity of a track. A song with a higher valence has a more upbeat and cheerful sound.
Energy: The energetic feel of a song. Faster sounding songs or songs with lots of noise will have a high energy.
Tempo: The estimated tempo of a song in beats per minute (BPM). Unlike most features here, it is reported in BPM rather than as a score.
Loudness: The overall loudness of a track, measured in decibels (dB). Values are usually negative, and values closer to 0 are louder. This reflects the recording as uploaded to Spotify, not the listener’s volume setting.
Speechiness: The presence of spoken words in a track. The more speech-like the recording, the higher the score.
Instrumentalness: Tracks the lack of vocals in a track. It plays interestingly with speechiness, as the two are not simply the inverse of one another.
Liveness: Detects the presence of an audience in the recording. Higher numbers mean the track was more likely recorded live.
Acousticness: How likely the track is to be acoustic. A higher score means it was more likely made with acoustic instruments or without electronic assistance.
Duration: The length of a track in milliseconds (ms).
Key: The overall key of the track, reported as an integer in pitch-class notation (0 = C, 1 = C♯/D♭, and so on).
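Two of these features are easier to read after a small conversion. The sketch below assumes the key is given as an integer in standard pitch-class notation (0 = C through 11 = B, with -1 meaning no key was detected) and that duration is given in milliseconds. The helper names key_to_name and ms_to_minutes are made up for this illustration and are not part of spotifyr.

```r
# Pitch-class names for the integers 0-11 (0 = C, 1 = C#, ..., 11 = B).
pitch_classes <- c("C", "C#", "D", "D#", "E", "F",
                   "F#", "G", "G#", "A", "A#", "B")

# Hypothetical helper: turn integer keys into note names.
# Anything outside 0-11 (such as -1 for "no key detected") becomes NA.
key_to_name <- function(key) {
  out <- rep(NA_character_, length(key))
  valid <- !is.na(key) & key >= 0 & key <= 11
  out[valid] <- pitch_classes[key[valid] + 1]
  out
}

# Hypothetical helper: convert a duration in milliseconds to minutes.
ms_to_minutes <- function(duration_ms) {
  duration_ms / 60000
}

key_to_name(c(0, 7, -1))   # "C" "G" NA
ms_to_minutes(210000)      # 3.5
```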
Running this code will reach into Spotify and find the 20 artists most closely related to our alpha artist. This is different from looking them up through Spotify’s front end, because the front end adds bias based on your own listening habits. This code instead draws on Spotify’s aggregate listening data, which is compiled from all listeners. It will also collect audio features for each related artist and save the combined data set as a .csv file that we can use later for charting. No edits are needed, as we have already designated the band we want to focus on above.
Similar_Artists <- get_related_artists(Alpha_ArtistID)
Similar_Artists$name
## [1] "Silent Planet" "ERRA" "Darko US"
## [4] "Make Them Suffer" "Thornhill" "Alpha Wolf"
## [7] "Currents" "Invent Animate" "Novelists FR"
## [10] "Northlane" "Sleep Token" "Void Of Vision"
## [13] "Aviana" "Crystal Lake" "Monuments"
## [16] "Hollow Front" "Kingdom Of Giants" "Oceans Ate Alaska"
## [19] "Volumes" "LANDMVRKS"
Similar_Artists_IDs <- as.list(Similar_Artists$id)
Compare_Features <- AAFeatures
print("Getting data for:")
## [1] "Getting data for:"
for (ID in Similar_Artists_IDs) {
  # Retrieve audio features for one related artist
  This_Extraction <- get_artist_audio_features(artist = ID,
                                               include_groups = c("album", "single"),
                                               return_closest_artist = TRUE,
                                               dedupe_albums = TRUE,
                                               authorization = get_spotify_access_token())
  # Append the new rows to the combined table
  Compare_Features <- rbind(Compare_Features, This_Extraction)
  print(This_Extraction$artist_name[1])
  Sys.sleep(2)  # pause between requests
}
## [1] "Silent Planet"
## [1] "ERRA"
## [1] "Darko US"
## [1] "Make Them Suffer"
## [1] "Thornhill"
## [1] "Alpha Wolf"
## [1] "Currents"
## [1] "Invent Animate"
## [1] "Novelists FR"
## [1] "Northlane"
## [1] "Sleep Token"
## [1] "Void Of Vision"
## [1] "Aviana"
## [1] "Crystal Lake"
## [1] "Monuments"
## [1] "Hollow Front"
## [1] "Kingdom Of Giants"
## [1] "Oceans Ate Alaska"
## [1] "Volumes"
## [1] "LANDMVRKS"
print("All data extracted")
## [1] "All data extracted"
Audio_Features <- subset(Compare_Features,
select = -c(album_images,artists,available_markets))
write.csv(Audio_Features,"Audio_Features.csv", row.names = FALSE)
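A loop like the one above stops at the first request that fails, which loses everything not yet collected. One possible way to make it more forgiving is to wrap each request in tryCatch() and bind the results once at the end. The sketch below is only an illustration: safe_fetch and fake_fetch are hypothetical names, and fake_fetch stands in for get_artist_audio_features() so the example runs without calling Spotify.

```r
# Hypothetical helper: call fetch_fun(id) and return NULL instead of
# stopping the whole script when the request fails.
safe_fetch <- function(id, fetch_fun) {
  tryCatch(
    fetch_fun(id),
    error = function(e) {
      message("Skipping ", id, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for get_artist_audio_features(). It fails for one made-up ID
# so the error path can be seen without contacting Spotify.
fake_fetch <- function(id) {
  if (id == "bad_id") stop("no data returned")
  data.frame(artist_id = id, energy = 0.5)
}

ids <- c("id_a", "bad_id", "id_b")
results <- lapply(ids, safe_fetch, fetch_fun = fake_fetch)

# Drop the failed requests, then bind all successful results in one step.
results <- Filter(Negate(is.null), results)
combined <- do.call(rbind, results)
print(combined$artist_id)
```

In the real script, fake_fetch would be replaced by a small function that calls get_artist_audio_features() with the same arguments used above, followed by the Sys.sleep() pause.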
To better understand our data, we can turn the numerical values into plots. Running the code below loads all the tools we need to draw them. The link in the first comment leads to a guide on ridgeline plots, in case you want a better understanding of them or want to construct different ones.
# Ridgeline plots. See: https://r-charts.com/distribution/ggridges/
if (!require("ggridges")) install.packages("ggridges")
## Loading required package: ggridges
## Warning: package 'ggridges' was built under R version 4.1.3
library(ggridges)
if (!require("ggplot2")) install.packages("ggplot2")
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.3
library(ggplot2)
if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyverse)
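With these packages loaded, a ridgeline plot of a single audio feature could be sketched as follows. This is a minimal example under stated assumptions: toy_features holds randomly generated placeholder numbers, not Spotify data, and the artist names are invented. With real data, Audio_Features and its artist_name and energy columns would take the place of the toy table.

```r
library(ggplot2)
library(ggridges)

# Placeholder data: random numbers standing in for real audio features.
set.seed(1)
toy_features <- data.frame(
  artist_name = rep(c("Artist A", "Artist B", "Artist C"), each = 30),
  energy = c(rbeta(30, 8, 2), rbeta(30, 5, 3), rbeta(30, 3, 3))
)

# One density ridge per artist, showing how energy is distributed.
ridge_plot <- ggplot(toy_features,
                     aes(x = energy, y = artist_name, fill = artist_name)) +
  geom_density_ridges(alpha = 0.7) +
  theme_ridges() +
  theme(legend.position = "none") +
  xlab("Energy") +
  ylab("")

print(ridge_plot)
```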
These lines of code pull the top tracks for the alpha artist and for each related artist, which lets us compare each group’s individual songs. Running this code creates a table called Top_Tracks and then merges it with the audio features we collected earlier.
Top_Tracks <- get_artist_top_tracks(Alpha_ArtistID, market = "US")
Artist_count <- 0
print("Getting data for:")
## [1] "Getting data for:"
for (ID in Similar_Artists_IDs) {
  Artist_count <- Artist_count + 1
  print(Artist_count)
  This_Extraction <- get_artist_top_tracks(ID, market = "US")
  Top_Tracks <- rbind(Top_Tracks, This_Extraction)
  Sys.sleep(1)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
print("All data extracted")
## [1] "All data extracted"
Top_Tracks$track_id <- Top_Tracks$id
Top_Tracks <- merge(Top_Tracks, Audio_Features, by = "track_id")
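It is worth knowing that merge() keeps only the rows whose track_id appears in both tables, so a top track with no matching row in Audio_Features would silently drop out of the result. The sketch below shows this behaviour with two tiny made-up tables. The IDs and numbers are placeholders, not Spotify data.

```r
# Made-up IDs and values, used only to show how merge() matches rows.
toy_top_tracks <- data.frame(
  track_id = c("t1", "t2", "t3"),
  popularity = c(70, 65, 60)
)
toy_audio_features <- data.frame(
  track_id = c("t1", "t3", "t4"),
  energy = c(0.9, 0.8, 0.7)
)

# Default behaviour: keep only track IDs present in both tables.
inner <- merge(toy_top_tracks, toy_audio_features, by = "track_id")

# all.x = TRUE keeps every top track and fills missing features with NA.
left <- merge(toy_top_tracks, toy_audio_features, by = "track_id", all.x = TRUE)

print(nrow(inner))  # 2
print(nrow(left))   # 3
```

Comparing the row counts before and after the merge is a quick way to check whether any top tracks were lost.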
To see how often each key appears among the artists’ top tracks, we can count the tracks by artist and key and plot the counts as a stacked bar chart:
library(dplyr)
keys <- Top_Tracks %>% count(artist_name, key_mode)
keys$Artist <- keys$artist_name
ggplot(keys, aes(fill=key_mode, y=n, x=Artist)) +
geom_bar(position="stack", stat="identity") +
coord_flip() +
ggtitle("Key frequency, by artist") +
ylab("Track count") +
xlab("")
If we want to take a closer look at our own music, we can do that as well. Below is a call that pulls per-track audio features for an album. We can find the track IDs in our AAFeatures table. From there, we can store the results in a new table. Since this is Spiritbox’s latest album, “Eternal Blue”, we name the table Eternal_Blue.
Eternal_Blue = get_track_audio_features('2glEXDEzubpETiDRXfC4oX,2rFaJ6NRIRWn335tQr8lWD,3yk51U329nwdpeIHV0O5ez,6nhA5FGEnH5wm3cr2zhMf7,39sAePHCDbaZlpLow8lRp4,09egp10m2GLUAgNoutZQ7A,3q7kMFce0TnDafVUzq8IpE,6gJ0ydZombiOIs4NaxnFXR,4y8iYaEWA7MrMiyNFqidtR,4JZycTDeBXufDFM8mbRbeq,6I5zXzSDByTEmYZ7ePVQeB,0hdDPaUbhi1OkzhyicPSBb')
# Add a running track number, in the order the IDs were listed, for the x-axis
Eternal_Blue <- cbind(Eternal_Blue, "Track" = seq_len(nrow(Eternal_Blue)))
To plot our own albums, we will need another plotting package. Plotly offers an alternative charting solution that works better with single targets, such as one album.
if (!require("plotly")) install.packages("plotly")
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.1.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(plotly)
Using Plotly, we can now plot our album. With the code below, we chart each audio feature track by track across the album. These line charts make the data easier to read on a per-track basis.
plot_ly(Eternal_Blue, x = ~Track, y = ~danceability,
name = 'danceability',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~energy,
name = 'energy',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~loudness,
name = 'loudness',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~speechiness,
name = 'speechiness',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~acousticness,
name = 'acousticness',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~instrumentalness,
name = 'instrumentalness',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~liveness,
name = 'liveness',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~valence,
name = 'valence',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~tempo,
name = 'tempo',
type = 'scatter',
mode = 'lines+markers')
plot_ly(Eternal_Blue, x = ~Track, y = ~duration_ms,
name = 'duration_ms',
type = 'scatter',
mode = 'lines+markers')
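Since the ten calls above differ only in the column being plotted, they could also be produced by a small helper and a loop. The sketch below is one possible way to do that, not the only one. plot_feature is a hypothetical helper name, and toy_album is a placeholder table with invented values standing in for Eternal_Blue.

```r
library(plotly)

# Hypothetical helper: a line chart of one feature across track numbers.
plot_feature <- function(data, feature) {
  plot_ly(
    data,
    x = ~Track,
    y = as.formula(paste0("~", feature)),  # build the formula from the column name
    name = feature,
    type = "scatter",
    mode = "lines+markers"
  )
}

# Placeholder album table; real values would come from Eternal_Blue.
toy_album <- data.frame(
  Track = 1:4,
  energy = c(0.9, 0.7, 0.8, 0.6),
  valence = c(0.3, 0.4, 0.2, 0.5)
)

# Build one chart per feature and keep them in a named list.
features <- c("energy", "valence")
charts <- lapply(features, function(f) plot_feature(toy_album, f))
names(charts) <- features

charts$energy
```

With the real table, features would list the ten column names used above, and each chart could be displayed by name, for example charts$tempo.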