Links:
Course Repository: https://github.com/HumblePasty/EAS648
Lab04 Webpage: https://humblepasty.github.io/EAS648/Lab/04/
Lab04 Repository: https://github.com/HumblePasty/EAS648/tree/master/Lab/04
Assignment
- Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.
- Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.
Haolin’s Note:
For this assignment, I’ll complete the task in 2 parts:
- Part I: Utilize sentiment analysis to study a textual document in a manner you find suitable.
- Part II: Conduct an analysis of ambiguous text within the dataset.
Task:
Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.
Note: For this task, I analyzed the album “Halloqveen” by Qveen Herby.
# using spotifyr to grab lyrics data
# install.packages("spotifyr")
# install_github('charlie86/spotifyr')
library(spotifyr)
# # using genius to fetch lyrics, failed
# library(geniusr)
# library(dplyr)
# library(tidytext)
#
# Sys.setenv(GENIUS_API_TOKEN = 'MopeXJwy6Z5B5Jye3YCvgEYurwGxqFjjhkSrLZk2564jxgXTWjNtXxD94GllMwEc')
#
# # Lyrics
# thingCalledLoveLyrics <- get_lyrics_id(song_id = 81524)
# switching to spotifyr library
# STEP 1: setting token
# get spotify access token:
Sys.setenv(SPOTIFY_CLIENT_ID = '5a8451083ff143ada759c6395dd343e1')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'aaaf9164b35d4fc1ada87f7e3a5f3b08')
access_token = get_spotify_access_token()
# get album data
Album_Songs = get_album_tracks("4g1ZRSobMefqF6nelkgibi", authorization = access_token)
# spotifyr does not provide a function for fetching lyrics, so I use an external API provided by a GitHub author:
library(httr)
library(jsonlite)
# get the lyrics data
lyrics_df = data.frame(track_number = character(), line_number = character(), track_title = character(), timeTag = character(), words = character(), stringsAsFactors = F)
# use for loop to fetch lyrics song by song
for (i in seq_len(nrow(Album_Songs))) {
  # build and send the request for this track
  # API Source: https://github.com/akashrchandran/spotify-lyrics-api
  response = GET("https://spotify-lyric-api-984e7b4face0.herokuapp.com", query = list(trackid = Album_Songs$id[i], format = "lrc"))
  # parse the result
  response = content(response, "parsed")
  # convert to data frame, one row per lyric line
  line_index = 1
  for (line in response$lines) {
    new_row = data.frame(track_number = i, line_number = line_index, track_title = Album_Songs$name[i], timeTag = line$timeTag, words = line$words, stringsAsFactors = FALSE)
    lyrics_df = rbind(lyrics_df, new_row)
    line_index = line_index + 1
  }
}
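Growing a data frame with rbind() inside a loop copies it on every iteration. A minimal sketch of an alternative follows, assuming the parsed response has the same `lines` structure used above (a list of entries with `timeTag` and `words`). The names `mock_response` and `lines_to_df` are hypothetical and exist only for this illustration; no network call is made.

```r
# Hypothetical stand-in for content(response, "parsed"); no API call is made
mock_response <- list(lines = list(
  list(timeTag = "00:21.54", words = "Like a whisper have you heard the news"),
  list(timeTag = "00:29.50", words = "In the darkness, she was laid to rest")
))

# Convert one parsed response into a data frame in a single step
lines_to_df <- function(response, track_number, track_title) {
  n <- length(response$lines)
  data.frame(
    track_number = rep(track_number, n),
    line_number  = seq_len(n),
    track_title  = rep(track_title, n),
    timeTag      = vapply(response$lines, function(l) l$timeTag, character(1)),
    words        = vapply(response$lines, function(l) l$words, character(1)),
    stringsAsFactors = FALSE
  )
}

demo_df <- lines_to_df(mock_response, track_number = 1, track_title = "Obitchuary")
print(demo_df)
```

With this approach, the per-track data frames could be collected in a list and combined once at the end with do.call(rbind, ...).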
# the lyrics can also be saved to a csv file and read back later
# write.csv(lyrics_df, "lyrics.csv", row.names = T)
lyrics_df = read.csv("lyrics.csv")
library(knitr)
kable(head(lyrics_df))
X | track_number | line_number | track_title | timeTag | words |
---|---|---|---|---|---|
1 | 1 | 1 | Obitchuary | 00:21.54 | Like a whisper have you heard the news |
2 | 1 | 2 | Obitchuary | 00:25.61 | That it’s over for a bitch like you? |
3 | 1 | 3 | Obitchuary | 00:29.50 | In the darkness, she was laid to rest |
4 | 1 | 4 | Obitchuary | 00:33.41 | Got a Birkin, I got no regrets |
5 | 1 | 5 | Obitchuary | 00:38.63 | You may not recognize me |
6 | 1 | 6 | Obitchuary | 00:42.33 | Obitchuary |
# load the sentiment library and the lexicon
library(tidytext)
##
## Attaching package: 'tidytext'
## The following object is masked from 'package:spotifyr':
##
## tidy
library(textdata)
##
## Attaching package: 'textdata'
## The following object is masked from 'package:httr':
##
## cache_info
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ textdata::cache_info() masks httr::cache_info()
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
head(get_sentiments("afinn"))
## # A tibble: 6 × 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
# get_sentiments("bing")
# get_sentiments("nrc")
# Combine the lyrics with lexicon
# pre-process
library(stringr)
## we need to make sure that the lyrics are characters
lyrics_df$words = as.character(lyrics_df$words)
tidy_song <- lyrics_df %>%
  unnest_tokens(word, words)
# join with the bing lexicon and compute net sentiment (positive - negative) per line
song_sentiment <- tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(track_title, index = line_number, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
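The net score computed above is simply the number of positive words minus the number of negative words on each line. A minimal base R sketch of that arithmetic, assuming a tiny made-up lexicon (`toy_lexicon`, `toy_lines` and `net_sentiment` are hypothetical names, and the word labels are illustrative rather than taken from the bing lexicon):

```r
# Hypothetical mini-lexicon; the real bing lexicon is far larger
toy_lexicon <- data.frame(
  word      = c("regrets", "darkness", "magic", "love"),
  sentiment = c("negative", "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

toy_lines <- c("in the darkness she found magic and love",
               "no regrets in the darkness")

net_sentiment <- function(line, lexicon) {
  # Lower-case and split on anything that is not a letter or apostrophe
  tokens <- strsplit(tolower(line), "[^a-z']+")[[1]]
  # Look up each token; unmatched tokens become NA and are ignored
  matched <- lexicon$sentiment[match(tokens, lexicon$word)]
  sum(matched == "positive", na.rm = TRUE) - sum(matched == "negative", na.rm = TRUE)
}

scores <- vapply(toy_lines, net_sentiment, numeric(1), lexicon = toy_lexicon,
                 USE.NAMES = FALSE)
print(scores)
```

Note that the second toy line scores negative even though "no regrets" reads as positive, because a word-by-word count ignores the negation.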
# plot the data
ggplot(song_sentiment, aes(index, sentiment, fill = track_title)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~track_title)
# show the most common positive and negative words
word_counts <- tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
word_counts %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()
## Selecting by n
# generate word clouds
library(wordcloud)
## Loading required package: RColorBrewer
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(RColorBrewer)
tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining with `by = join_by(word)`
Task:
Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.
# compute line-level sentiment for every song using sentimentr (which accounts for context between words, such as negation)
library(sentimentr)
song_sentiment_sent <- lyrics_df %>%
  get_sentences() %>%
  sentiment_by(by = c('track_title', 'line_number')) %>%
  as.data.frame()
ggplot(song_sentiment_sent, aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Lyrics sentiment analysis with sentimentr") +
  geom_line(show.legend = F) +
  facet_wrap(~track_title)
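To give a feel for what "accounting for context between words" means, here is a deliberately simplified base R sketch of negation handling. It is not sentimentr's actual algorithm, which weights a cluster of words around each polarized word. The polarity values, the negator list and the function name `score_with_negation` are all assumptions made for illustration.

```r
# Hypothetical polarity values and negator list, for illustration only
toy_polarity <- c(good = 1, happy = 1, bad = -1, regrets = -1)
toy_negators <- c("not", "no", "never")

score_with_negation <- function(line) {
  tokens <- strsplit(tolower(line), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  total <- 0
  for (i in seq_along(tokens)) {
    value <- toy_polarity[tokens[i]]
    # Skip tokens that are not in the toy lexicon
    if (is.na(value)) next
    # Flip the sign when the previous token is a negator
    if (i > 1 && tokens[i - 1] %in% toy_negators) value <- -value
    total <- total + value
  }
  unname(total)
}

no_regrets_score <- score_with_negation("Got a Birkin, I got no regrets")
not_happy_score  <- score_with_negation("I am not happy")
print(c(no_regrets_score, not_happy_score))
```

A plain word count would score these two lines with the opposite signs, which is the kind of difference that can appear between the tidytext and sentimentr plots.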
Let’s take a deeper look at the song “Abracadabra”
# Let's take a deeper look at the song "Abracadabra"
song_circle <- lyrics_df %>%
  filter(track_title == "Abracadabra")
ggplot(song_sentiment_sent[song_sentiment_sent$track_title == "Abracadabra",], aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Song \"Abracadabra\" sentiment line plot") +
  geom_line(show.legend = F)
# analyze the emotions in the lyrics
song_emotion <- lyrics_df %>%
  filter(track_title == "Abracadabra") %>%
  get_sentences() %>%
  emotion_by(by = c('track_title', 'words')) %>%
  as.data.frame()
Sentiment analysis in R usually involves libraries like syuzhet, tidytext, and text2vec, but they have limitations in handling the complexity of natural human language:
Subjectivity
- Issue: Sentiment analysis often struggles to accurately gauge the subjectivity in a text. Subjective statements are based on personal opinions, feelings, or beliefs, whereas objective statements are factual and verifiable.
- Example: “I think this movie is amazing!” vs. “This movie was released in 2020.” The first sentence is subjective and should be identified as such by sentiment analysis algorithms.
Tone
- Issue: Detecting the tone of a text is crucial as it can completely change the meaning. The tone can be serious, ironic, sarcastic, playful, etc.
- Example: “What a brilliant performance!” can be a genuine compliment or a sarcastic remark, depending on the tone.
Context
- Issue: Words and phrases can have different meanings in different contexts. Sentiment analysis algorithms may struggle to understand context.
- Example: “This is sick!” could be negative in a healthcare context but positive when referring to a skateboard trick.
Polarity
- Issue: Polarity refers to identifying whether a sentiment is positive, negative, or neutral. Words can have different polarities in different situations.
- Example: “This film is unpredictably shocking.” The word “shocking” can be positive (exciting) or negative (disturbing), depending on the context.
Irony and Sarcasm
- Issue: Irony and sarcasm are particularly challenging as they often imply the opposite of the literal meanings of words.
- Example: “Great! Another rainy day.” This might be classified as positive sentiment when, in fact, it’s likely negative due to sarcasm.
Comparisons
- Issue: Comparisons can be difficult to interpret correctly because they often involve both positive and negative sentiments.
- Example: “This phone has a better camera than my previous one but a much shorter battery life.” This sentence contains both positive and negative sentiments.
Neutral Language
- Issue: Determining the neutrality of a statement is tricky, especially when it contains elements that could be construed as slightly positive or negative.
- Example: “The movie was three hours long.” This statement is neutral but could be misinterpreted depending on the algorithm’s training data (e.g., if long movies are generally seen as negative).
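Several of the examples above can be checked against a word-level lexicon to see where the score and the intended meaning diverge. The sketch below is only an illustration in base R: `toy_values`, `lexicon_score` and the "intended" labels are assumptions, not output from any real lexicon.

```r
# Hypothetical word scores; a real lexicon would differ in size and values
toy_values <- c(great = 1, brilliant = 1, amazing = 1, sick = -1, shocking = -1)

lexicon_score <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  # Tokens missing from the toy lexicon yield NA and are dropped
  sum(toy_values[tokens], na.rm = TRUE)
}

examples <- data.frame(
  text = c("Great! Another rainy day.",
           "This is sick!",
           "The movie was three hours long."),
  intended = c("negative (sarcasm)", "positive (skateboard slang)", "neutral"),
  stringsAsFactors = FALSE
)
examples$lexicon_score <- vapply(examples$text, lexicon_score, numeric(1),
                                 USE.NAMES = FALSE)
print(examples)
```

In this toy setup only the neutral sentence is scored in line with its intended reading; the sarcastic and slang sentences come out with the wrong sign.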
# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Comin' with the bad bitch magic (yeah)",c("track_title", "words", "emotion_type", "ave_emotion")])
 | track_title | words | emotion_type | ave_emotion |
---|---|---|---|---|
129 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anger | 0.2857143 |
130 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anger_negated | 0.0000000 |
131 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anticipation | 0.0000000 |
132 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anticipation_negated | 0.0000000 |
133 | Abracadabra | Comin’ with the bad bitch magic (yeah) | disgust | 0.2857143 |
134 | Abracadabra | Comin’ with the bad bitch magic (yeah) | disgust_negated | 0.0000000 |
135 | Abracadabra | Comin’ with the bad bitch magic (yeah) | fear | 0.2857143 |
136 | Abracadabra | Comin’ with the bad bitch magic (yeah) | fear_negated | 0.0000000 |
137 | Abracadabra | Comin’ with the bad bitch magic (yeah) | joy | 0.0000000 |
138 | Abracadabra | Comin’ with the bad bitch magic (yeah) | joy_negated | 0.0000000 |
139 | Abracadabra | Comin’ with the bad bitch magic (yeah) | sadness | 0.2857143 |
140 | Abracadabra | Comin’ with the bad bitch magic (yeah) | sadness_negated | 0.0000000 |
141 | Abracadabra | Comin’ with the bad bitch magic (yeah) | surprise | 0.0000000 |
142 | Abracadabra | Comin’ with the bad bitch magic (yeah) | surprise_negated | 0.0000000 |
143 | Abracadabra | Comin’ with the bad bitch magic (yeah) | trust | 0.0000000 |
144 | Abracadabra | Comin’ with the bad bitch magic (yeah) | trust_negated | 0.0000000 |
In the song “Abracadabra”, there are many explicit words, which makes it easy for R to extract the sentiment of those lines.
For example, in the line above, words like “bad” and “bitch” are clear lexical signals that both sentimentr and tidytext can easily pick up.
However, these tools are not good at understanding metaphors, irony, sarcasm, or idioms. For example:
# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Game over, put chills in your bones I told ya",c("track_title", "words", "emotion_type", "ave_emotion")])
 | track_title | words | emotion_type | ave_emotion |
---|---|---|---|---|
209 | Abracadabra | Game over, put chills in your bones I told ya | anger | 0 |
210 | Abracadabra | Game over, put chills in your bones I told ya | anger_negated | 0 |
211 | Abracadabra | Game over, put chills in your bones I told ya | anticipation | 0 |
212 | Abracadabra | Game over, put chills in your bones I told ya | anticipation_negated | 0 |
213 | Abracadabra | Game over, put chills in your bones I told ya | disgust | 0 |
214 | Abracadabra | Game over, put chills in your bones I told ya | disgust_negated | 0 |
215 | Abracadabra | Game over, put chills in your bones I told ya | fear | 0 |
216 | Abracadabra | Game over, put chills in your bones I told ya | fear_negated | 0 |
217 | Abracadabra | Game over, put chills in your bones I told ya | joy | 0 |
218 | Abracadabra | Game over, put chills in your bones I told ya | joy_negated | 0 |
219 | Abracadabra | Game over, put chills in your bones I told ya | sadness | 0 |
220 | Abracadabra | Game over, put chills in your bones I told ya | sadness_negated | 0 |
221 | Abracadabra | Game over, put chills in your bones I told ya | surprise | 0 |
222 | Abracadabra | Game over, put chills in your bones I told ya | surprise_negated | 0 |
223 | Abracadabra | Game over, put chills in your bones I told ya | trust | 0 |
224 | Abracadabra | Game over, put chills in your bones I told ya | trust_negated | 0 |
This line clearly conveys some kind of intimidation, which I would label as something like “anticipation_negative”, but sentimentr does not detect it: every emotion score is 0.
Dealing with these issues often involves preprocessing the text data, fine-tuning sentiment analysis models, and sometimes incorporating more advanced natural language processing techniques such as context-aware or deep learning models. Understanding these challenges is crucial for interpreting sentiment analysis results accurately and for improving the algorithms used for this task.
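As one illustration of the preprocessing idea, idioms can be rewritten into single words that a lexicon is able to score before the text is analyzed. This is a minimal base R sketch under stated assumptions: the idiom table, the replacement words, their scores, and the function names `replace_idioms` and `score_text` are all hypothetical.

```r
# Hypothetical idiom table: phrase -> single word that a lexicon can score
toy_idioms   <- c("chills in your bones" = "terrified", "game over" = "defeated")
# Hypothetical scores for the replacement words
idiom_values <- c(terrified = -1, defeated = -1)

replace_idioms <- function(text, idioms) {
  text <- tolower(text)
  for (phrase in names(idioms)) {
    # fixed = TRUE treats the phrase as a literal string, not a regex
    text <- gsub(phrase, idioms[[phrase]], text, fixed = TRUE)
  }
  text
}

score_text <- function(text, values) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(values[tokens], na.rm = TRUE)
}

lyric <- "Game over, put chills in your bones I told ya"
raw_score          <- score_text(lyric, idiom_values)
preprocessed_score <- score_text(replace_idioms(lyric, toy_idioms), idiom_values)
print(c(raw = raw_score, preprocessed = preprocessed_score))
```

Without the replacement step the toy scorer finds nothing in the line, much like the all-zero emotion table above; after it, the line scores as negative. A hand-built idiom table does not scale well, which is part of why context-aware models are attractive.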
That is the end of Lab 04