Links:
Course Repository: https://github.com/HumblePasty/EAS648
Lab03 Webpage: https://humblepasty.github.io/EAS648/Lab/03/
Lab03 Repository: https://github.com/HumblePasty/EAS648/tree/master/Lab/03
Assignment
- Conduct a comprehensive exploration of the spatiotemporal nuances within the provided social media dataset. This could, for example, investigate the underlying factors contributing to the observed differences between nature and non-nature photographs, or analyze the temporal and geographic variations that might account for specific trends in photograph sharing. You are expected to employ advanced statistical and geospatial techniques to delve deeper into this phenomenon. To substantiate your findings, create compelling visual representations that effectively illustrate the spatiotemporal dynamics at play. Provide a robust argument based on your analytical insights.
- Social media data include various sources of noise related to the frequency of sharing photographs that can obscure meaningful patterns. Your task is to develop a technique for noise reduction that surpasses basic filtering methods. Additionally, generate a visually engaging GIF that elucidates the spatiotemporal dynamics within the data. Alongside this, propose a hypothesis that elucidates the observed patterns based on your knowledge of the region and geographic processes. Your hypothesis should reflect your understanding of the factors of social media sharing and how they influence the temporal and spatial aspects of photograph sharing.
Haolin’s Note:
For this assignment, I’ll complete the task in 2 parts:
- Part I: Spatiotemporal exploration of the Flikr dataset
- Part II: Explore a method for reducing niose for social media data
# Loading libraries
library(readr)
library(ggplot2)
library(ggthemes)
library(gganimate)
library(foreign)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rnaturalearth)
## Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
library(rnaturalearthdata)
##
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
##
## countries110
library(gifski)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(wesanderson)
library(ggpmisc)
## Loading required package: ggpp
## Registered S3 methods overwritten by 'ggpp':
## method from
## heightDetails.titleGrob ggplot2
## widthDetails.titleGrob ggplot2
##
## Attaching package: 'ggpp'
## The following object is masked from 'package:ggplot2':
##
## annotate
## Registered S3 method overwritten by 'ggpmisc':
## method from
## as.character.polynomial polynom
##
## Attaching package: 'ggpmisc'
## The following object is masked from 'package:ggpp':
##
## quadrant_example.df
# Loading the data:
MichFlickr <- read.csv("AdvLab3prj/MichiganFlickr.csv")
# Add columns
MichFlickr$count<- 1
# Is nature threshold: 0.6
MichFlickr$Nature<- MichFlickr$predict_Na > 0.6
head(MichFlickr)
## X fid cat field_1_1 id
## 1 1 1 58 58 101778
## 2 2 2 59 59 101779
## 3 3 3 60 60 101780
## 4 4 4 61 61 101781
## 5 5 5 62 62 101782
## 6 6 6 63 63 101783
## url dateupload latitude
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_m.jpg 1090510921 45.08638
## 2 https://farm1.staticflickr.com/1/101779_562306283b_m.jpg 1090510923 45.08633
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_m.jpg 1090510926 45.08633
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_m.jpg 1090510930 45.08633
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_m.jpg 1090510932 45.08633
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_m.jpg 1090510936 45.08633
## longitude owner title
## 1 -84.17036 44124269523@N01 Roaring Fire
## 2 -84.17069 44124269523@N01 Geese Reflected
## 3 -84.17069 44124269523@N01 The Night Sky
## 4 -84.17069 44124269523@N01 Sunset #1
## 5 -84.17069 44124269523@N01 Sunset #2
## 6 -84.17069 44124269523@N01 Sunset #3
## url_sq predict_Na Landuse
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_s.jpg 0.3283439 41
## 2 https://farm1.staticflickr.com/1/101779_562306283b_s.jpg 0.9205280 41
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_s.jpg 0.9658680 41
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_s.jpg 0.9744647 41
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_s.jpg 0.9365814 41
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_s.jpg 0.9627538 41
## count Nature
## 1 1 FALSE
## 2 1 TRUE
## 3 1 TRUE
## 4 1 TRUE
## 5 1 TRUE
## 6 1 TRUE
# Get Michigan Map data
states <- map_data("state")
mich <- subset(states, region == "michigan")
mich_county <- subset(map_data("county"), region == "michigan")
# Plot Michigan with Counties
mich =
ggplot(data = mich, mapping = aes(x = long, y = lat, group = group)) +
coord_fixed(1.3) +
geom_polygon(color = "black", fill = "orange")
mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") + # add couties border
geom_polygon(color = "black", fill = NA) # get the state border back on top
Lab Instruction used data in 2010 and did not find any significant pattern. Here I try to use data in other time (year 2017).
# Process the data
MichFlickr$date <- as.Date(as.POSIXct(MichFlickr$dateupload, origin="1970-01-01"))
#min <- as.Date("2008-01-01")
#max <- as.Date("2015-01-01")
animateMich <- MichFlickr %>%
filter( date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31'))
Plot time series
p1 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMich, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Date: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p1 + shadow_wake(0.1), fps=2)
Filter for natural photos
animateMichNature <-animateMich %>%
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE")
p2 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMichNature, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Natural Photos\nDate: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p2 + shadow_wake(0.1), fps=2)
animateMichNature <-animateMich %>%
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "FALSE")
p3 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMichNature, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Non-Natural Photos\nDate: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p3 + shadow_wake(0.1), fps=2)
As analysed in the lab instruction, photo activity is usually highest in summer seasons
MichFlickr$year <- year(MichFlickr$date)
MichFlickr$month <- month(MichFlickr$date, label = TRUE)
MichFlickr$day <- day(MichFlickr$date)
# construct daily data
daily_photography <- MichFlickr %>%
mutate(day = as.Date(date, format="%Y-%m-%d")) %>%
group_by(day) %>% # group by the day column
summarise(total_photos=sum(count)) %>% # calculate the SUM of all precipitation that occurred on each day
na.omit()
# construct monthly data
daily_monthly <- daily_photography %>%
mutate(month = month(ymd(daily_photography$day), label = TRUE, abbr = FALSE),
year = year(as.Date(day, format = "%Y-%m-%d"))) %>%
group_by(year,month) %>%
summarise(total.qty = sum(total_photos))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# plot the lines
daily_monthly %>%
filter(year > 2004) %>%
ggplot(aes(x = month, y = total.qty, group = year)) +
geom_line(aes(color = as.factor(year))) +
scale_color_discrete() +
labs(title = "Total Flickr Photographs for Michigan", x = "", y = "Total Photographs",
subtitle = "Activity is highest for the summer months") +
theme_classic()
With this result, I wonder if this trend have some spatial pattern, which is:
Will photos taken in summer distribute more in certain area(s)?
For example, if people are more likely to come to norther Michigan in summer to take photos of the beatiful scenery of the forests, this should be shown on the map.
Hence the hypothesis:
People tend to come to northern Michigan to capture beatiful scenery. Summer photos will distribute more in northern parts of Michigan, and photos taken in summer will have higher percentage of natural photos than those taken in winter.
# select photos taken in the summer
MichFlickr_summer = MichFlickr %>%
mutate(season = case_when(
month(date) %in% c(3,4,5) ~ "spring",
month(date) %in% c(6,7,8) ~ "summer",
month(date) %in% c(9,10,11) ~ "fall",
TRUE ~ "winter"
)) %>%
filter(season == "summer")
# select photos in the winter for comparison
MichFlickr_winter = MichFlickr %>%
mutate(season = case_when(
month(date) %in% c(3,4,5) ~ "spring",
month(date) %in% c(6,7,8) ~ "summer",
month(date) %in% c(9,10,11) ~ "fall",
TRUE ~ "winter"
)) %>%
filter(season == "winter")
head(MichFlickr_summer)
## X fid cat field_1_1 id
## 1 1 1 58 58 101778
## 2 2 2 59 59 101779
## 3 3 3 60 60 101780
## 4 4 4 61 61 101781
## 5 5 5 62 62 101782
## 6 6 6 63 63 101783
## url dateupload latitude
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_m.jpg 1090510921 45.08638
## 2 https://farm1.staticflickr.com/1/101779_562306283b_m.jpg 1090510923 45.08633
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_m.jpg 1090510926 45.08633
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_m.jpg 1090510930 45.08633
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_m.jpg 1090510932 45.08633
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_m.jpg 1090510936 45.08633
## longitude owner title
## 1 -84.17036 44124269523@N01 Roaring Fire
## 2 -84.17069 44124269523@N01 Geese Reflected
## 3 -84.17069 44124269523@N01 The Night Sky
## 4 -84.17069 44124269523@N01 Sunset #1
## 5 -84.17069 44124269523@N01 Sunset #2
## 6 -84.17069 44124269523@N01 Sunset #3
## url_sq predict_Na Landuse
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_s.jpg 0.3283439 41
## 2 https://farm1.staticflickr.com/1/101779_562306283b_s.jpg 0.9205280 41
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_s.jpg 0.9658680 41
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_s.jpg 0.9744647 41
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_s.jpg 0.9365814 41
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_s.jpg 0.9627538 41
## count Nature date year month day season
## 1 1 FALSE 2004-07-22 2004 7月 22 summer
## 2 1 TRUE 2004-07-22 2004 7月 22 summer
## 3 1 TRUE 2004-07-22 2004 7月 22 summer
## 4 1 TRUE 2004-07-22 2004 7月 22 summer
## 5 1 TRUE 2004-07-22 2004 7月 22 summer
## 6 1 TRUE 2004-07-22 2004 7月 22 summer
# Plot the data in summer
animateMichSummer <-MichFlickr_summer %>%
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE")
p4 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMichSummer, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Natural Photos Taken in Summer, 2017\nDate: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p4 + shadow_wake(0.1), fps=2)
# Repeat for winter
animateMichWinter <-MichFlickr_winter %>%
filter(date >= as.Date('2018-01-01') & date <= as.Date('2018-12-31') & Nature == "TRUE")
p5 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMichWinter, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Natural Photos Taken in Winter, 2018\nDate: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p5 + shadow_wake(0.1), fps=2)
Here I go with the simplest way: directly compare the mean latitude and mean logitude of the two photo sets
# Compare the mean latitudes
mean_lat_summer = mean(MichFlickr_summer$latitude)
mean_lat_winter = mean(MichFlickr_winter$latitude)
mean_lat_summer
## [1] 43.23381
mean_lat_winter
## [1] 42.90469
# Compare the percentage of natural photos:
percent_nat_summer = nrow(filter(MichFlickr_summer, Nature == "TRUE")) / nrow(filter(MichFlickr_summer, Nature == "FALSE"))
percent_nat_winter = nrow(filter(MichFlickr_winter, Nature == "TRUE")) / nrow(filter(MichFlickr_winter, Nature == "FALSE"))
percent_nat_summer
## [1] 0.3248652
percent_nat_winter
## [1] 0.1786464
As we can see with the result, photos taken in summer have higher mean latitude and higher percentage to be natural photos. Thus our hypothesis is supported.
TBD: More advanced analysis, such as spatial clustering of the data points, and compare the different clustering result (for example the centers of the clusters) of two photo sets
Task:
Social media data include various sources of noise related to the frequency of sharing photographs that can obscure meaningful patterns. Your task is to develop a technique for noise reduction that surpasses basic filtering methods. Additionally, generate a visually engaging GIF that elucidates the spatiotemporal dynamics within the data. Alongside this, propose a hypothesis that elucidates the observed patterns based on your knowledge of the region and geographic processes. Your hypothesis should reflect your understanding of the factors of social media sharing and how they influence the temporal and spatial aspects of photograph sharing.
Feedback:
With all due respect, this task seems to be too vague and vast for a lab assignment. A little bit more clearification and instruction (such as giving paper for reference, specifying the tool packs) is REALLY helpful for us.
To solve this task, first need to define what is noise data.
The common cause of noise data by user may due to repeatedly uploading photos taken in similar areas within a short period of time.
With this definiton, we can use filter function to eliminate the repeated data.
#
# Create new columns with rounded lat and lng
MichFlickr = MichFlickr %>%
mutate(lat_rnd = round(latitude, 3), lng_rnd = round(longitude, 3))
MichFlickr_denoised = MichFlickr %>%
distinct(owner, date, lat_rnd, lng_rnd, .keep_all = TRUE)
animateMichDenoised <-MichFlickr_denoised %>%
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE")
p6 <-mich + theme_void() +
geom_polygon(data = mich_county, fill = NA, color = "white") +
geom_polygon(color = "black", fill = NA) +
geom_point(data = animateMichDenoised, aes(longitude, latitude), inherit.aes = FALSE) +
labs(title = 'Denoised Photos in year 2017\nDate: {format(frame_time, "%b %d %Y")}') +
transition_time(date)
animate(p6 + shadow_wake(0.1), fps=2)