EAS 648 Lab 04: Spatial Temporal analysis in R

Links:

Course Repository: https://github.com/HumblePasty/EAS648

Lab04 Webpage: https://humblepasty.github.io/EAS648/Lab/04/

Lab04 Repository: https://github.com/HumblePasty/EAS648/tree/master/Lab/04

Assignment

  1. Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.
  2. Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.

Haolin’s Note:

For this assignment, I’ll complete the task in 2 parts:

  • Part I: Utilize sentiment analysis to study a textual document in a manner you find suitable.
  • Part II: Conduct an analysis of ambiguous text within the dataset.

Figure: Flow chart of sentiment analysis in R

Part I: Utilize sentiment analysis to study a textual document

Task:

Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.

Note: For this task, I analyzed the album “Halloqveen” by Qveen Herby.

1.1 Fetch Data from Spotify

# using spotifyr to fetch the album and track data
# install.packages("spotifyr")
# devtools::install_github('charlie86/spotifyr')
library(spotifyr)

# # first attempt: fetching lyrics with geniusr (failed)
# library(geniusr)
# library(dplyr)
# library(tidytext)
# 
# Sys.setenv(GENIUS_API_TOKEN = 'MopeXJwy6Z5B5Jye3YCvgEYurwGxqFjjhkSrLZk2564jxgXTWjNtXxD94GllMwEc')
# 
# # Lyrics
# thingCalledLoveLyrics <- get_lyrics_id(song_id = 81524)

# switching to spotifyr library
# STEP 1: setting token
# get spotify access token:
Sys.setenv(SPOTIFY_CLIENT_ID = '5a8451083ff143ada759c6395dd343e1')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'aaaf9164b35d4fc1ada87f7e3a5f3b08')

access_token = get_spotify_access_token()

# get album data
Album_Songs = get_album_tracks("4g1ZRSobMefqF6nelkgibi", authorization = access_token)

# spotifyr does not provide a function for fetching lyrics, so I use an external API published by a GitHub author:
library(httr)
library(jsonlite)

# get the lyrics data
lyrics_df = data.frame(track_number = character(), line_number = character(), track_title = character(), timeTag = character(), words = character(), stringsAsFactors = F)

# use for loop to fetch lyrics song by song
for (i in 1:nrow(Album_Songs)) {
  # build the request: the track ID and output format go in the query string
  # API Source: https://github.com/akashrchandran/spotify-lyrics-api
  response = GET("https://spotify-lyric-api-984e7b4face0.herokuapp.com", query = list(trackid = Album_Songs$id[i], format = "lrc"))
  
  # parse the result
  response = content(response, "parsed")
  
  # convert to data frame
  line_index = 1
  for (line in response$lines) {
    new_row = data.frame(track_number = i, line_number = line_index, track_title = Album_Songs$name[i], timeTag = line$timeTag, words = line$words, stringsAsFactors = FALSE)
    lyrics_df = rbind(lyrics_df, new_row)
    line_index = line_index + 1
  }
}

# optionally cache the lyrics in a CSV file, then reload them from disk
# write.csv(lyrics_df, "lyrics.csv", row.names = T)
lyrics_df = read.csv("lyrics.csv")

library(knitr)

kable(head(lyrics_df))
| X | track_number | line_number | track_title | timeTag | words |
|---|---|---|---|---|---|
| 1 | 1 | 1 | Obitchuary | 00:21.54 | Like a whisper have you heard the news |
| 2 | 1 | 2 | Obitchuary | 00:25.61 | That it’s over for a bitch like you? |
| 3 | 1 | 3 | Obitchuary | 00:29.50 | In the darkness, she was laid to rest |
| 4 | 1 | 4 | Obitchuary | 00:33.41 | Got a Birkin, I got no regrets |
| 5 | 1 | 5 | Obitchuary | 00:38.63 | You may not recognize me |
| 6 | 1 | 6 | Obitchuary | 00:42.33 | Obitchuary |
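
The row-by-row `rbind()` in the loop above works for a single album, but it copies the data frame on every iteration. A minimal alternative sketch, assuming each parsed response has the shape the loop relies on (a `lines` list whose elements carry `timeTag` and `words`), builds the rows for one track in a single step. The `mock_response` object and the `lines_to_df()` helper are hypothetical and only stand in for the real API output.

```r
# Hypothetical stand-in for content(response, "parsed"); the real API output
# is assumed to have the same `lines` -> `timeTag` / `words` structure.
mock_response <- list(
  lines = list(
    list(timeTag = "00:21.54", words = "Like a whisper have you heard the news"),
    list(timeTag = "00:29.50", words = "In the darkness, she was laid to rest")
  )
)

# Hypothetical helper: turn one parsed response into a data frame of lyric lines.
lines_to_df <- function(response, track_number, track_title) {
  lines <- response$lines
  data.frame(
    track_number = rep(track_number, length(lines)),
    line_number  = seq_along(lines),
    track_title  = rep(track_title, length(lines)),
    timeTag      = vapply(lines, function(l) l$timeTag, character(1)),
    words        = vapply(lines, function(l) l$words, character(1)),
    stringsAsFactors = FALSE
  )
}

mock_df <- lines_to_df(mock_response, track_number = 1, track_title = "Obitchuary")
print(mock_df)
```

With real data, the per-track data frames could be collected in a list and combined once at the end with `do.call(rbind, ...)`.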

1.2 Sentiment Analysis by Songs

# load the sentiment library and the lexicon
library(tidytext)
## 
## Attaching package: 'tidytext'
## The following object is masked from 'package:spotifyr':
## 
##     tidy
library(textdata)
## 
## Attaching package: 'textdata'
## The following object is masked from 'package:httr':
## 
##     cache_info
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ textdata::cache_info() masks httr::cache_info()
## ✖ dplyr::filter()        masks stats::filter()
## ✖ purrr::flatten()       masks jsonlite::flatten()
## ✖ dplyr::lag()           masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
head(get_sentiments("afinn"))
## # A tibble: 6 × 2
##   word       value
##   <chr>      <dbl>
## 1 abandon       -2
## 2 abandoned     -2
## 3 abandons      -2
## 4 abducted      -2
## 5 abduction     -2
## 6 abductions    -2
# get_sentiments("bing")
# get_sentiments("nrc")
# Combine the lyrics with lexicon

# pre-process
library(stringr)
# make sure the lyrics column is of type character
lyrics_df$words = as.character(lyrics_df$words)

# tokenize the lyrics into one word per row
tidy_song <- lyrics_df %>%
  unnest_tokens(word,words)

# join with lexicon value
song_sentiment <- tidy_song %>%
  inner_join(get_sentiments("bing"))%>%
  count(track_title, index = line_number, sentiment)%>%
  spread(sentiment, n, fill = 0)%>%
  mutate(sentiment = positive - negative)
## Joining with `by = join_by(word)`
# plot the data
ggplot(song_sentiment, aes(index, sentiment, fill = track_title)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~track_title)
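
To make the join-and-count step above concrete, here is a minimal base-R sketch of the same idea on two lyric lines. The four-word `toy_lexicon` is made up for illustration and is not the Bing lexicon; `tokenize()` and `net_sentiment()` are hypothetical helpers. The net score per line is the number of positive matches minus the number of negative matches, mirroring the `mutate()` call above.

```r
# Toy lexicon, invented for illustration (not the real Bing lexicon).
toy_lexicon <- data.frame(
  word      = c("regrets", "darkness", "magic", "rest"),
  sentiment = c("negative", "negative", "positive", "positive"),
  stringsAsFactors = FALSE
)

toy_lines <- c(
  "In the darkness, she was laid to rest",
  "Got a Birkin, I got no regrets"
)

# Lowercase, split on anything that is not a letter or apostrophe, drop empties.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z']+")[[1]]
  tokens[tokens != ""]
}

# Net sentiment for one line: positive matches minus negative matches.
net_sentiment <- function(line, lexicon) {
  matched <- lexicon$sentiment[match(tokenize(line), lexicon$word)]
  sum(matched == "positive", na.rm = TRUE) - sum(matched == "negative", na.rm = TRUE)
}

toy_line_scores <- vapply(toy_lines, net_sentiment, numeric(1),
                          lexicon = toy_lexicon, USE.NAMES = FALSE)
print(data.frame(line_number = seq_along(toy_lines), sentiment = toy_line_scores))
```

In this toy setup the second line scores negatively even though it says “no regrets”, because a plain word-level lookup ignores the negation.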

1.3 Sentiment Analysis by Words

# show the most common positive and negative words
word_counts <- tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()
## Joining with `by = join_by(word)`
word_counts %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()
## Selecting by n

# generate word clouds
library(wordcloud)
## Loading required package: RColorBrewer
library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(RColorBrewer)

tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)
## Joining with `by = join_by(word)`

Part II: Conduct an analysis of ambiguous text within the dataset

Task:

Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.

2.1 Analyse Sentiment with Sentimentr

# compute the sentiment of each lyric line with sentimentr (which accounts for valence shifters between words)
library(sentimentr)

song_sentiment_sent <- lyrics_df %>%
    get_sentences() %>%
    sentiment_by(by = c('track_title', 'line_number'))%>%
  as.data.frame()

ggplot(song_sentiment_sent, aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Lyrics sentiment analysis with sentimentr") +
  geom_line(show.legend = F) +
  facet_wrap(~track_title)
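
Unlike a word-level join, sentimentr adjusts a word's polarity using nearby valence shifters such as negators. The following base-R sketch illustrates only the negation idea, with a made-up lexicon and a one-word lookback window. It is a simplification for intuition and not sentimentr's actual algorithm; `toy_polarity`, `toy_negators`, and `score_with_negation()` are all hypothetical names.

```r
# Made-up polarity scores and negators, for illustration only.
toy_polarity <- c(regrets = -1, good = 1, bad = -1)
toy_negators <- c("no", "not", "never")

# Score a sentence, flipping a word's polarity when the previous word is a negator.
score_with_negation <- function(sentence) {
  tokens <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  total <- 0
  for (i in seq_along(tokens)) {
    value <- toy_polarity[tokens[i]]
    if (is.na(value)) next                      # word not in the toy lexicon
    if (i > 1 && tokens[i - 1] %in% toy_negators) {
      value <- -value                           # negator flips the sign
    }
    total <- total + unname(value)
  }
  total
}

plain_score   <- score_with_negation("I got regrets")
negated_score <- score_with_negation("Got a Birkin, I got no regrets")
print(c(plain = plain_score, negated = negated_score))
```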

2.2 Song “Abracadabra”

Let’s take a deeper look at the song “Abracadabra”

# Let's take a deeper look at the song "Abracadabra"
song_circle <- lyrics_df %>%
  filter(track_title == "Abracadabra")

ggplot(song_sentiment_sent[song_sentiment_sent$track_title == "Abracadabra",], aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Song \"Abracadabra\" sentiment line plot") +
  geom_line(show.legend = F)

# analyze the emotions in the lyrics
song_emotion <- lyrics_df %>%
  filter(track_title == "Abracadabra")%>%
  get_sentences() %>%
  emotion_by(by = c('track_title', 'words'))%>%
  as.data.frame()

2.3 Ambiguous Text in Sentiment Analysis in R

Sentiment analysis in R usually relies on libraries like syuzhet, tidytext, and text2vec, but these have limitations in handling the complexity of natural human language, including the following:

Subjectivity

  • Issue: Sentiment analysis often struggles to accurately gauge the subjectivity in a text. Subjective statements are based on personal opinions, feelings, or beliefs, whereas objective statements are factual and verifiable.

  • Example: “I think this movie is amazing!” vs. “This movie was released in 2020.” The first sentence is subjective and should be identified as such by sentiment analysis algorithms.

Tone

  • Issue: Detecting the tone of a text is crucial as it can completely change the meaning. The tone can be serious, ironic, sarcastic, playful, etc.
  • Example: “What a brilliant performance!” can be a genuine compliment or a sarcastic remark, depending on the tone.

Context

  • Issue: Words and phrases can have different meanings in different contexts. Sentiment analysis algorithms may struggle to understand context.
  • Example: “This is sick!” could be negative in a healthcare context but positive when referring to a skateboard trick.

Polarity

  • Issue: Polarity refers to whether a sentiment is positive, negative, or neutral. The same word can have different polarities in different situations.
  • Example: “This film is unpredictably shocking.” The word “shocking” can be positive (exciting) or negative (disturbing), depending on the context.

Irony and Sarcasm

  • Issue: Irony and sarcasm are particularly challenging as they often imply the opposite of the literal meanings of words.
  • Example: “Great! Another rainy day.” This might be classified as positive sentiment when, in fact, it’s likely negative due to sarcasm.

Comparisons

  • Issue: Comparisons can be difficult to interpret correctly because they often involve both positive and negative sentiments.
  • Example: “This phone has a better camera than my previous one but a much shorter battery life.” This sentence contains both positive and negative sentiments.

Neutral Language

  • Issue: Determining the neutrality of a statement is tricky, especially when it contains elements that could be construed as slightly positive or negative.
  • Example: “The movie was three hours long.” This statement is neutral but could be misinterpreted depending on the algorithm’s training data (e.g., if long movies are generally seen as negative).
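
Several of these issues can be reproduced with even a tiny lexicon. The sketch below scores two of the example sentences with a made-up word list (not a real lexicon). The sarcastic sentence comes out positive and the slang use of “sick” comes out negative, because a word-level lookup has no access to tone or context. The names `toy_word_scores` and `lexicon_score()` are hypothetical.

```r
# Made-up word scores for illustration; real lexicons are far larger.
toy_word_scores <- c(great = 1, brilliant = 1, amazing = 1, sick = -1, shocking = -1)

# Sum the scores of all words in a sentence that appear in the toy lexicon.
lexicon_score <- function(sentence, scores = toy_word_scores) {
  tokens <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  sum(scores[tokens], na.rm = TRUE)
}

examples <- c(
  sarcasm = "Great! Another rainy day.",
  context = "This is sick!"
)
example_scores <- vapply(examples, lexicon_score, numeric(1))
print(example_scores)
```
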
# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Comin' with the bad bitch magic (yeah)",c("track_title", "words", "emotion_type", "ave_emotion")])
| | track_title | words | emotion_type | ave_emotion |
|---|---|---|---|---|
| 129 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anger | 0.2857143 |
| 130 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anger_negated | 0.0000000 |
| 131 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anticipation | 0.0000000 |
| 132 | Abracadabra | Comin’ with the bad bitch magic (yeah) | anticipation_negated | 0.0000000 |
| 133 | Abracadabra | Comin’ with the bad bitch magic (yeah) | disgust | 0.2857143 |
| 134 | Abracadabra | Comin’ with the bad bitch magic (yeah) | disgust_negated | 0.0000000 |
| 135 | Abracadabra | Comin’ with the bad bitch magic (yeah) | fear | 0.2857143 |
| 136 | Abracadabra | Comin’ with the bad bitch magic (yeah) | fear_negated | 0.0000000 |
| 137 | Abracadabra | Comin’ with the bad bitch magic (yeah) | joy | 0.0000000 |
| 138 | Abracadabra | Comin’ with the bad bitch magic (yeah) | joy_negated | 0.0000000 |
| 139 | Abracadabra | Comin’ with the bad bitch magic (yeah) | sadness | 0.2857143 |
| 140 | Abracadabra | Comin’ with the bad bitch magic (yeah) | sadness_negated | 0.0000000 |
| 141 | Abracadabra | Comin’ with the bad bitch magic (yeah) | surprise | 0.0000000 |
| 142 | Abracadabra | Comin’ with the bad bitch magic (yeah) | surprise_negated | 0.0000000 |
| 143 | Abracadabra | Comin’ with the bad bitch magic (yeah) | trust | 0.0000000 |
| 144 | Abracadabra | Comin’ with the bad bitch magic (yeah) | trust_negated | 0.0000000 |

The song “Abracadabra” contains many explicit words, which makes it comparatively easy for R to extract the sentiment of its lines.

For example, in the line above, words like “bad” and “bitch” are clear lexical cues that sentimentr and tidytext pick up easily.

However, lexicon-based tools such as tidytext and sentimentr are not good at understanding metaphorical, ironic, or sarcastic lines, or lines built on idioms. For example:

# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Game over, put chills in your bones I told ya",c("track_title", "words", "emotion_type", "ave_emotion")])
| | track_title | words | emotion_type | ave_emotion |
|---|---|---|---|---|
| 209 | Abracadabra | Game over, put chills in your bones I told ya | anger | 0 |
| 210 | Abracadabra | Game over, put chills in your bones I told ya | anger_negated | 0 |
| 211 | Abracadabra | Game over, put chills in your bones I told ya | anticipation | 0 |
| 212 | Abracadabra | Game over, put chills in your bones I told ya | anticipation_negated | 0 |
| 213 | Abracadabra | Game over, put chills in your bones I told ya | disgust | 0 |
| 214 | Abracadabra | Game over, put chills in your bones I told ya | disgust_negated | 0 |
| 215 | Abracadabra | Game over, put chills in your bones I told ya | fear | 0 |
| 216 | Abracadabra | Game over, put chills in your bones I told ya | fear_negated | 0 |
| 217 | Abracadabra | Game over, put chills in your bones I told ya | joy | 0 |
| 218 | Abracadabra | Game over, put chills in your bones I told ya | joy_negated | 0 |
| 219 | Abracadabra | Game over, put chills in your bones I told ya | sadness | 0 |
| 220 | Abracadabra | Game over, put chills in your bones I told ya | sadness_negated | 0 |
| 221 | Abracadabra | Game over, put chills in your bones I told ya | surprise | 0 |
| 222 | Abracadabra | Game over, put chills in your bones I told ya | surprise_negated | 0 |
| 223 | Abracadabra | Game over, put chills in your bones I told ya | trust | 0 |
| 224 | Abracadabra | Game over, put chills in your bones I told ya | trust_negated | 0 |

This line clearly conveys intimidation, a kind of negative anticipation, but sentimentr does not detect it: every emotion score is 0.

Dealing with these issues often involves preprocessing the text data, fine-tuning sentiment analysis models, and sometimes incorporating more advanced natural language processing techniques such as context-aware or deep learning models. Understanding these challenges is crucial both for interpreting sentiment analysis results accurately and for improving the algorithms behind them.
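
One lightweight form of preprocessing is extending the lexicon with domain-specific entries before scoring. The sketch below is only an illustration: both the base word list and the added entry for “chills” are invented, and the score assigned to it is a subjective choice rather than a value from any published lexicon. The helpers `extend_lexicon()` and `score_line()` are hypothetical.

```r
# Invented base lexicon; the lyric line below contains none of its words.
base_lexicon <- c(bad = -1, magic = 1)

# Hypothetical domain-specific additions chosen by hand for this album.
custom_entries <- c(chills = -1)

# Custom entries override base entries that share the same word.
extend_lexicon <- function(base, extra) {
  combined <- c(extra, base)
  combined[!duplicated(names(combined))]
}

# Sum the scores of all words in a line that appear in the lexicon.
score_line <- function(line, lexicon) {
  tokens <- strsplit(tolower(line), "[^a-z']+")[[1]]
  sum(lexicon[tokens], na.rm = TRUE)
}

lyric_line <- "Game over, put chills in your bones I told ya"
score_before <- score_line(lyric_line, base_lexicon)
score_after  <- score_line(lyric_line, extend_lexicon(base_lexicon, custom_entries))
print(c(before = score_before, after = score_after))
```

A hand-tuned entry like this fixes one idiom at a time, so it complements rather than replaces context-aware models.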


That is the end of Lab 04