EAS 648 Lab 04: Spatial Temporal analysis in R

Links:

Course Repository: https://github.com/HumblePasty/EAS648

Lab04 Webpage: https://humblepasty.github.io/EAS648/Lab/04/

Lab04 Repository: https://github.com/HumblePasty/EAS648/tree/master/Lab/04

Assignment

Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.

Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.

Haolin’s Note:

For this assignment, I’ll complete the task in 2 parts:

Part I: Utilize sentiment analysis to study a textual document in a manner you find suitable.

Part II: Conduct an analysis of ambiguous text within the dataset.

Flow Chart

Flow Chart of Sentiment Analysis in R

Part I: Utilize sentiment analysis to study a textual document

Task:

Utilize sentiment analysis to study a textual document in a manner you find suitable. Select a lexicon library for assessing the sentiment of the dataset, such as determining whether it is positive, negative, or joyous. Present your findings using appropriate charts and provide an explanation of the results in 1-2 paragraphs.

Note: For this task, I analyzed the lyrics of the album Halloqveen by Qveen Herby.

1.1 Fetch Data from Spotify

# use spotifyr to fetch the album's track list
# install.packages("spotifyr")
# devtools::install_github('charlie86/spotifyr')
library(spotifyr)

# # first attempt: fetching the lyrics with geniusr (failed)
# library(geniusr)
# library(dplyr)
# library(tidytext)
# 
# Sys.setenv(GENIUS_API_TOKEN = '<your-genius-api-token>')
# 
# # Lyrics
# thingCalledLoveLyrics <- get_lyrics_id(song_id = 81524)

# switch to the spotifyr library
# STEP 1: set the Spotify API credentials
# keep real credentials out of published code (for example, store them in ~/.Renviron)
Sys.setenv(SPOTIFY_CLIENT_ID = '<your-client-id>')
Sys.setenv(SPOTIFY_CLIENT_SECRET = '<your-client-secret>')

# request a Spotify access token
access_token = get_spotify_access_token()

# get album data
Album_Songs = get_album_tracks("4g1ZRSobMefqF6nelkgibi", authorization = access_token)

# spotifyr does not provide a function for fetching lyrics, so the lyrics come from an external API published by a GitHub author:
library(httr)
library(jsonlite)

# empty data frame that collects the lyrics
lyrics_df = data.frame(track_number = integer(), line_number = integer(), track_title = character(), timeTag = character(), words = character(), stringsAsFactors = FALSE)

# fetch the lyrics song by song
for (i in seq_len(nrow(Album_Songs))) {
  # send the request with the track id as a query parameter
  # API Source: https://github.com/akashrchandran/spotify-lyrics-api
  response = GET("https://spotify-lyric-api-984e7b4face0.herokuapp.com", query = list(trackid = Album_Songs$id[i], format = "lrc"))
  
  # parse the response body into an R list
  parsed = content(response, "parsed")
  
  # append one row per lyric line
  line_index = 1
  for (line in parsed$lines) {
    new_row = data.frame(track_number = i, line_number = line_index, track_title = Album_Songs$name[i], timeTag = line$timeTag, words = line$words, stringsAsFactors = FALSE)
    lyrics_df = rbind(lyrics_df, new_row)
    line_index = line_index + 1
  }
}
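The loop above grows `lyrics_df` one row at a time. As a minimal sketch of an alternative, the helper below converts one parsed response into a data frame in a single step. `lines_to_df` and `mock_response` are hypothetical names introduced only for illustration; the mock mimics the `lines`, `timeTag`, and `words` fields used in the loop, so the example runs without calling the API.

```r
# Hypothetical helper (not part of the original lab): convert one parsed
# lyrics response into a data frame with one row per lyric line.
lines_to_df <- function(parsed, track_number, track_title) {
  lines <- parsed$lines
  if (length(lines) == 0) {
    # Return an empty frame with the same columns when no lyrics come back.
    return(data.frame(track_number = integer(), line_number = integer(),
                      track_title = character(), timeTag = character(),
                      words = character(), stringsAsFactors = FALSE))
  }
  data.frame(
    track_number = track_number,
    line_number  = seq_along(lines),
    track_title  = track_title,
    timeTag      = vapply(lines, function(l) l$timeTag, character(1)),
    words        = vapply(lines, function(l) l$words, character(1)),
    stringsAsFactors = FALSE
  )
}

# Mock response with the same shape the loop above relies on (assumed).
mock_response <- list(lines = list(
  list(timeTag = "00:21.54", words = "Like a whisper have you heard the news"),
  list(timeTag = "00:29.50", words = "In the darkness, she was laid to rest")
))

mock_df <- lines_to_df(mock_response, track_number = 1L, track_title = "Obitchuary")
print(mock_df)
```

With real data, one such data frame per track could be collected in a list and combined once with `do.call(rbind, ...)`, which avoids repeated copying of `lyrics_df`.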

# optionally cache the lyrics in a CSV file, then reload them from that file
# write.csv(lyrics_df, "lyrics.csv", row.names = TRUE)
lyrics_df = read.csv("lyrics.csv")

library(knitr)

kable(head(lyrics_df))

X	track_number	line_number	track_title	timeTag	words
1	1	1	Obitchuary	00:21.54	Like a whisper have you heard the news
2	1	2	Obitchuary	00:25.61	That it’s over for a bitch like you?
3	1	3	Obitchuary	00:29.50	In the darkness, she was laid to rest
4	1	4	Obitchuary	00:33.41	Got a Birkin, I got no regrets
5	1	5	Obitchuary	00:38.63	You may not recognize me
6	1	6	Obitchuary	00:42.33	Obitchuary

1.2 Sentiment Analysis by Songs

# load the sentiment library and the lexicon
library(tidytext)

## 
## Attaching package: 'tidytext'

## The following object is masked from 'package:spotifyr':
## 
##     tidy

library(textdata)

## 
## Attaching package: 'textdata'

## The following object is masked from 'package:httr':
## 
##     cache_info

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ textdata::cache_info() masks httr::cache_info()
## ✖ dplyr::filter()        masks stats::filter()
## ✖ purrr::flatten()       masks jsonlite::flatten()
## ✖ dplyr::lag()           masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

head(get_sentiments("afinn"))

## # A tibble: 6 × 2
##   word       value
##   <chr>      <dbl>
## 1 abandon       -2
## 2 abandoned     -2
## 3 abandons      -2
## 4 abducted      -2
## 5 abduction     -2
## 6 abductions    -2

# get_sentiments("bing")
# get_sentiments("nrc")

# Combine the lyrics with the lexicon

# pre-process
library(stringr)
## make sure that the lyrics are stored as character strings
lyrics_df$words = as.character(lyrics_df$words)

# split each lyric line into one word per row
tidy_song <- lyrics_df %>%
  unnest_tokens(word, words)

# join with the bing lexicon, count positive and negative words per line,
# and compute the net sentiment (positive minus negative)
song_sentiment <- tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(track_title, index = line_number, sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)

## Joining with `by = join_by(word)`

# plot the data
ggplot(song_sentiment, aes(index, sentiment, fill = track_title)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~track_title)
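To make the net-sentiment calculation above easier to follow, here is a small worked sketch on made-up data. `toy_lyrics` and `toy_lexicon` are hypothetical stand-ins for `lyrics_df` and the bing lexicon, and `pivot_wider()` is used in place of `spread()` only as an illustration of the same reshaping step.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy stand-ins (assumed data, not from the album).
toy_lyrics <- data.frame(
  line_number = 1:3,
  words = c("good happy day", "bad sad bad night", "good but bad"),
  stringsAsFactors = FALSE
)
toy_lexicon <- data.frame(
  word = c("good", "happy", "bad", "sad"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

toy_net <- toy_lyrics %>%
  unnest_tokens(word, words) %>%               # one word per row
  inner_join(toy_lexicon, by = "word") %>%     # keep only lexicon words
  count(line_number, sentiment) %>%            # words per line and polarity
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)      # net score per line

print(toy_net)
```

Line 1 has two positive words and no negative ones, line 2 has three negative words, and line 3 has one of each, so the net scores are 2, -3, and 0.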

1.3 Sentiment Analysis by Words

# show the most common positive and negative words
word_counts <- tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  ungroup()

## Joining with `by = join_by(word)`

word_counts %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  coord_flip()

## Selecting by n
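The "Selecting by n" message appears because `top_n()` was not told which column to rank by and chose `n` itself. A minimal sketch of a more explicit alternative uses `slice_max()` from dplyr; `toy_counts` below is made-up data standing in for `word_counts`.

```r
library(dplyr)

# Toy word counts (assumed values, for illustration only).
toy_counts <- data.frame(
  word = c("love", "good", "bad", "dead"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n = c(5L, 2L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Keep the most frequent word within each sentiment, ranking explicitly by n.
toy_top <- toy_counts %>%
  group_by(sentiment) %>%
  slice_max(order_by = n, n = 1) %>%
  ungroup()

print(toy_top)
```

Naming the ranking column makes the selection rule visible in the code instead of in a console message.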

# generate word clouds
library(wordcloud)

## Loading required package: RColorBrewer

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

library(RColorBrewer)

tidy_song %>%
  inner_join(get_sentiments("bing")) %>%
  count(word, sentiment, sort = TRUE) %>%
  acast(word ~ sentiment, value.var = "n", fill = 0) %>%
  comparison.cloud(colors = c("red", "blue"),
                   max.words = 100)

## Joining with `by = join_by(word)`

Part II: Conduct an analysis of ambiguous text within the dataset

Task:

Conduct an analysis of ambiguous text within the dataset. Reflect on and provide examples of issues related to subjectivity, tone, context, polarity, irony, sarcasm, comparisons, and the use of neutral language in 2-3 paragraphs.

2.1 Analyse Sentiment with Sentimentr

# score the sentiment of every lyric line with sentimentr
# (which accounts for inter-word effects such as negation)
library(sentimentr)

song_sentiment_sent <- lyrics_df %>%
  get_sentences() %>%
  sentiment_by(by = c('track_title', 'line_number')) %>%
  as.data.frame()

ggplot(song_sentiment_sent, aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Lyrics sentiment analysis with sentimentr") +
  geom_line(show.legend = F) +
  facet_wrap(~track_title)
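The difference between sentimentr and a plain word count is easiest to see on a negated sentence. The sketch below scores two made-up sentences (`toy_text` is not from the lyrics); a word-level lexicon would see "happy" in both, while sentimentr should flip the sign of the negated one.

```r
library(sentimentr)

# Two toy sentences that share the same sentiment word.
toy_text <- c("I am happy.", "I am not happy.")

# Score each sentence; as.data.frame() mirrors the conversion used above.
toy_scores <- as.data.frame(sentiment(get_sentences(toy_text)))

print(toy_scores[, c("element_id", "sentiment")])
```

The first sentence receives a positive score and the second a negative one, because "not" acts as a negator on "happy".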

2.2 Song “Abracadabra”

Let’s take a deeper look at the song “Abracadabra”

# Let's take a deeper look at the song "Abracadabra"
song_circle <- lyrics_df %>%
  filter(track_title == "Abracadabra")

ggplot(song_sentiment_sent[song_sentiment_sent$track_title == "Abracadabra",], aes(line_number, ave_sentiment, fill = track_title, color = track_title)) +
  ggtitle("Song \"Abracadabra\" sentiment line plot") +
  geom_line(show.legend = F)

# analyze the emotions in the lyrics
song_emotion <- lyrics_df %>%
  filter(track_title == "Abracadabra")%>%
  get_sentences() %>%
  emotion_by(by = c('track_title', 'words'))%>%
  as.data.frame()

2.3 Ambiguous Text in Sentiment Analysis in R

Sentiment analysis in R usually involves libraries such as syuzhet, tidytext, and text2vec, but these tools have limitations in handling the complexity of natural human language. The main issues are:

Subjectivity

Issue: Sentiment analysis often struggles to accurately gauge the subjectivity in a text. Subjective statements are based on personal opinions, feelings, or beliefs, whereas objective statements are factual and verifiable.

Example: “I think this movie is amazing!” vs. “This movie was released in 2020.” The first sentence is subjective and should be identified as such by sentiment analysis algorithms.

Tone

Issue: Detecting the tone of a text is crucial as it can completely change the meaning. The tone can be serious, ironic, sarcastic, playful, etc.

Example: “What a brilliant performance!” can be a genuine compliment or a sarcastic remark, depending on the tone.

Context

Issue: Words and phrases can have different meanings in different contexts. Sentiment analysis algorithms may struggle to understand context.

Example: “This is sick!” could be negative in a healthcare context but positive when referring to a skateboard trick.

Polarity

Issue: Polarity refers to identifying whether a sentiment is positive, negative, or neutral. Words can have different polarities in different situations.

Example: “This film is unpredictably shocking.” The word “shocking” can be positive (exciting) or negative (disturbing), depending on the context.

Irony and Sarcasm

Issue: Irony and sarcasm are particularly challenging as they often imply the opposite of the literal meanings of words.

Example: “Great! Another rainy day.” This might be classified as positive sentiment when, in fact, it’s likely negative due to sarcasm.

Comparisons

Issue: Comparisons can be difficult to interpret correctly because they often involve both positive and negative sentiments.

Example: “This phone has a better camera than my previous one but a much shorter battery life.” This sentence contains both positive and negative sentiments.

Neutral Language

Issue: Determining the neutrality of a statement is tricky, especially when it contains elements that could be construed as slightly positive or negative.

Example: “The movie was three hours long.” This statement is neutral but could be misinterpreted depending on the algorithm’s training data (e.g., if long movies are generally seen as negative).
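Several of the issues above can be reproduced with a very small word-level scorer. The sketch below is not how tidytext or sentimentr work internally; `toy_lexicon` and `score_text` are hypothetical and exist only to show why counting polarity words misses sarcasm, context, and neutral statements.

```r
# Toy polarity lexicon (assumed values): +1 positive, -1 negative.
toy_lexicon <- c(great = 1, brilliant = 1, amazing = 1, sick = -1, shocking = -1)

# Hypothetical scorer: sum the polarity of every known word in the text.
score_text <- function(text, lexicon) {
  cleaned <- gsub("[^a-z']+", " ", tolower(text))   # drop punctuation
  tokens <- strsplit(cleaned, " ", fixed = TRUE)[[1]]
  sum(lexicon[tokens], na.rm = TRUE)                # unknown words count as 0
}

examples <- c(
  "Great! Another rainy day.",        # sarcasm
  "This is sick!",                    # context-dependent slang
  "The movie was three hours long."   # neutral statement
)

toy_scores <- vapply(examples, score_text, numeric(1), lexicon = toy_lexicon)
print(toy_scores)
```

The sarcastic sentence scores +1 because of "great", "This is sick!" scores -1 no matter whether it describes an illness or a skateboard trick, and the neutral sentence scores 0 only because none of its words happen to be in the lexicon.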

# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Comin' with the bad bitch magic (yeah)",c("track_title", "words", "emotion_type", "ave_emotion")])

	track_title	words	emotion_type	ave_emotion
129	Abracadabra	Comin’ with the bad bitch magic (yeah)	anger	0.2857143
130	Abracadabra	Comin’ with the bad bitch magic (yeah)	anger_negated	0.0000000
131	Abracadabra	Comin’ with the bad bitch magic (yeah)	anticipation	0.0000000
132	Abracadabra	Comin’ with the bad bitch magic (yeah)	anticipation_negated	0.0000000
133	Abracadabra	Comin’ with the bad bitch magic (yeah)	disgust	0.2857143
134	Abracadabra	Comin’ with the bad bitch magic (yeah)	disgust_negated	0.0000000
135	Abracadabra	Comin’ with the bad bitch magic (yeah)	fear	0.2857143
136	Abracadabra	Comin’ with the bad bitch magic (yeah)	fear_negated	0.0000000
137	Abracadabra	Comin’ with the bad bitch magic (yeah)	joy	0.0000000
138	Abracadabra	Comin’ with the bad bitch magic (yeah)	joy_negated	0.0000000
139	Abracadabra	Comin’ with the bad bitch magic (yeah)	sadness	0.2857143
140	Abracadabra	Comin’ with the bad bitch magic (yeah)	sadness_negated	0.0000000
141	Abracadabra	Comin’ with the bad bitch magic (yeah)	surprise	0.0000000
142	Abracadabra	Comin’ with the bad bitch magic (yeah)	surprise_negated	0.0000000
143	Abracadabra	Comin’ with the bad bitch magic (yeah)	trust	0.0000000
144	Abracadabra	Comin’ with the bad bitch magic (yeah)	trust_negated	0.0000000

The song “Abracadabra” contains many explicit words, which makes it easy for R to extract the sentiment of those lines.

For example, in the line above, words like “bad” and “bitch” are clear signals that sentimentr and tidytext pick up easily.
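The value 0.2857143 in the table is consistent with two emotion words out of seven words in the line (2/7). The sketch below reproduces that arithmetic in base R. It assumes that `ave_emotion` is the share of words in the line that match an emotion category and that the two matching words are "bad" and "bitch"; neither assumption is confirmed by the output itself.

```r
line <- "Comin' with the bad bitch magic (yeah)"

# Simple tokenization: lower-case, drop brackets, split on spaces.
tokens <- strsplit(gsub("[^a-z' ]", "", tolower(line)), " +")[[1]]
n_words <- length(tokens)

# Assumption: these are the two words the emotion lexicon matched.
flagged <- c("bad", "bitch")

# Share of flagged words in the line.
share <- sum(tokens %in% flagged) / n_words
print(n_words)
print(round(share, 7))
```

Under these assumptions the same two words drive the anger, disgust, fear, and sadness rows, which would explain why all four share one value.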

However, lexicon-based tools such as tidytext and sentimentr are not good at understanding metaphors, irony, sarcasm, or idioms. For example:

# show the sentiments of a single line
kable(song_emotion[song_emotion$words == "Game over, put chills in your bones I told ya",c("track_title", "words", "emotion_type", "ave_emotion")])

	track_title	words	emotion_type
209	Abracadabra	Game over, put chills in your bones I told ya	anger
210	Abracadabra	Game over, put chills in your bones I told ya	anger_negated
211	Abracadabra	Game over, put chills in your bones I told ya	anticipation
212	Abracadabra	Game over, put chills in your bones I told ya	anticipation_negated
213	Abracadabra	Game over, put chills in your bones I told ya	disgust
214	Abracadabra	Game over, put chills in your bones I told ya	disgust_negated
215	Abracadabra	Game over, put chills in your bones I told ya	fear
216	Abracadabra	Game over, put chills in your bones I told ya	fear_negated
217	Abracadabra	Game over, put chills in your bones I told ya	joy
218	Abracadabra	Game over, put chills in your bones I told ya	joy_negated
219	Abracadabra	Game over, put chills in your bones I told ya	sadness
220	Abracadabra	Game over, put chills in your bones I told ya	sadness_negated
221	Abracadabra	Game over, put chills in your bones I told ya	surprise
222	Abracadabra	Game over, put chills in your bones I told ya	surprise_negated
223	Abracadabra	Game over, put chills in your bones I told ya	trust
224	Abracadabra	Game over, put chills in your bones I told ya	trust_negated

This line clearly conveys a kind of intimidation (something like negative anticipation), but sentimentr does not detect it.

Dealing with these issues often involves preprocessing the text data, fine-tuning sentiment analysis models, and sometimes incorporating more advanced natural language processing techniques such as context-aware or deep learning models. The libraries above can still be used for sentiment analysis, but understanding these limitations is crucial for interpreting the results accurately and for improving the methods used for this task.
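As one illustration of the preprocessing idea, the sketch below replaces a known idiom with a single placeholder token before the lexicon lookup. Everything here is hypothetical: `idiom_map`, `toy_emotion_lexicon`, `replace_idioms`, and `tag_emotions` are illustrative names, the lexicon is a toy, and this is not a feature of sentimentr or tidytext.

```r
# Hypothetical idiom table: phrase -> placeholder token.
idiom_map <- c("chills in your bones" = "idiom_fear")

# Toy emotion lexicon (assumed entries): token -> emotion.
toy_emotion_lexicon <- c(idiom_fear = "fear", bad = "anger")

# Replace each known idiom with its placeholder token.
replace_idioms <- function(text, idioms) {
  out <- tolower(text)
  for (phrase in names(idioms)) {
    out <- gsub(phrase, idioms[[phrase]], out, fixed = TRUE)
  }
  out
}

# Look up the emotions of all tokens after idiom replacement.
tag_emotions <- function(text, idioms, lexicon) {
  cleaned <- gsub("[^a-z_' ]", "", replace_idioms(text, idioms))
  tokens <- strsplit(cleaned, " +")[[1]]
  unname(lexicon[tokens[tokens %in% names(lexicon)]])
}

lyric <- "Game over, put chills in your bones I told ya"

emotions_without <- tag_emotions(lyric, character(0), toy_emotion_lexicon)
emotions_with <- tag_emotions(lyric, idiom_map, toy_emotion_lexicon)

print(emotions_without)  # no emotion found word by word
print(emotions_with)     # the idiom is now tagged
```

A hand-built idiom table only covers phrases someone thought to list, which is why context-aware models are the more general remedy.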

That is the end of Lab 04

References

https://s-ai-f.github.io/Natural-Language-Processing/sentiment-analysis.html

https://www.tidytextmining.com/sentiment.html

https://github.com/akashrchandran/spotify-lyrics-api

https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html

EAS648 Lab04

Haolin Li (haolinli@umich.edu)

2023-11-14
