EAS 648 Lab 03: Spatial Temporal analysis in R

Links:

Course Repository: https://github.com/HumblePasty/EAS648

Lab03 Webpage: https://humblepasty.github.io/EAS648/Lab/03/

Lab03 Repository: https://github.com/HumblePasty/EAS648/tree/master/Lab/03

Assignment

  1. Conduct a comprehensive exploration of the spatiotemporal nuances within the provided social media dataset. This could, for example, investigate the underlying factors contributing to the observed differences between nature and non-nature photographs, or analyze the temporal and geographic variations that might account for specific trends in photograph sharing. You are expected to employ advanced statistical and geospatial techniques to delve deeper into this phenomenon. To substantiate your findings, create compelling visual representations that effectively illustrate the spatiotemporal dynamics at play. Provide a robust argument based on your analytical insights.
  2. Social media data include various sources of noise related to the frequency of sharing photographs that can obscure meaningful patterns. Your task is to develop a technique for noise reduction that surpasses basic filtering methods. Additionally, generate a visually engaging GIF that elucidates the spatiotemporal dynamics within the data. Alongside this, propose a hypothesis that elucidates the observed patterns based on your knowledge of the region and geographic processes. Your hypothesis should reflect your understanding of the factors of social media sharing and how they influence the temporal and spatial aspects of photograph sharing.

Haolin’s Note:

For this assignment, I’ll complete the task in 2 parts:

  • Part I: Spatiotemporal exploration of the Flikr dataset
  • Part II: Explore a method for reducing niose for social media data

Part I: Spatiotemporal Exploration of MichiFlikr

# Loading libraries
library(readr)
library(ggplot2)
library(ggthemes)
library(gganimate)
library(foreign)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rnaturalearth)
## Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
library(rnaturalearthdata)
## 
## Attaching package: 'rnaturalearthdata'
## The following object is masked from 'package:rnaturalearth':
## 
##     countries110
library(gifski)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(wesanderson)
library(ggpmisc)
## Loading required package: ggpp
## Registered S3 methods overwritten by 'ggpp':
##   method                  from   
##   heightDetails.titleGrob ggplot2
##   widthDetails.titleGrob  ggplot2
## 
## Attaching package: 'ggpp'
## The following object is masked from 'package:ggplot2':
## 
##     annotate
## Registered S3 method overwritten by 'ggpmisc':
##   method                  from   
##   as.character.polynomial polynom
## 
## Attaching package: 'ggpmisc'
## The following object is masked from 'package:ggpp':
## 
##     quadrant_example.df

Loading MichiganFlikr

# Loading the data:
MichFlickr <- read.csv("AdvLab3prj/MichiganFlickr.csv")

# Add columns
MichFlickr$count<- 1
# Is nature threshold: 0.6
MichFlickr$Nature<- MichFlickr$predict_Na > 0.6

head(MichFlickr)
##   X fid cat field_1_1     id
## 1 1   1  58        58 101778
## 2 2   2  59        59 101779
## 3 3   3  60        60 101780
## 4 4   4  61        61 101781
## 5 5   5  62        62 101782
## 6 6   6  63        63 101783
##                                                        url dateupload latitude
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_m.jpg 1090510921 45.08638
## 2 https://farm1.staticflickr.com/1/101779_562306283b_m.jpg 1090510923 45.08633
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_m.jpg 1090510926 45.08633
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_m.jpg 1090510930 45.08633
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_m.jpg 1090510932 45.08633
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_m.jpg 1090510936 45.08633
##   longitude           owner           title
## 1 -84.17036 44124269523@N01    Roaring Fire
## 2 -84.17069 44124269523@N01 Geese Reflected
## 3 -84.17069 44124269523@N01   The Night Sky
## 4 -84.17069 44124269523@N01       Sunset #1
## 5 -84.17069 44124269523@N01       Sunset #2
## 6 -84.17069 44124269523@N01       Sunset #3
##                                                     url_sq predict_Na Landuse
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_s.jpg  0.3283439      41
## 2 https://farm1.staticflickr.com/1/101779_562306283b_s.jpg  0.9205280      41
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_s.jpg  0.9658680      41
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_s.jpg  0.9744647      41
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_s.jpg  0.9365814      41
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_s.jpg  0.9627538      41
##   count Nature
## 1     1  FALSE
## 2     1   TRUE
## 3     1   TRUE
## 4     1   TRUE
## 5     1   TRUE
## 6     1   TRUE

Plot Michigan with Counties

# Get Michigan Map data
states <- map_data("state")
mich <- subset(states, region == "michigan")
mich_county <- subset(map_data("county"), region == "michigan")

# Plot Michigan with Counties
mich = 
  ggplot(data = mich, mapping = aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "orange")

mich + theme_void() +
  geom_polygon(data = mich_county, fill = NA, color = "white") + # add couties border
  geom_polygon(color = "black", fill = NA) # get the state border back on top

Plot Time-Serie Flikr Data

Lab Instruction used data in 2010 and did not find any significant pattern. Here I try to use data in other time (year 2017).

# Process the data
MichFlickr$date <- as.Date(as.POSIXct(MichFlickr$dateupload, origin="1970-01-01"))
#min <- as.Date("2008-01-01")
#max  <- as.Date("2015-01-01")
animateMich <- MichFlickr %>% 
  filter( date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31'))

Plot time series

p1 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        
       geom_point(data = animateMich, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Date: {format(frame_time, "%b %d %Y")}') +
        transition_time(date)

animate(p1 + shadow_wake(0.1), fps=2)

Filter for natural photos

animateMichNature <-animateMich %>% 
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE") 

p2 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        geom_point(data = animateMichNature, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Natural Photos\nDate: {format(frame_time, "%b %d %Y")}') +
        transition_time(date)

animate(p2 + shadow_wake(0.1), fps=2)

Filter non-natural photos

animateMichNature <-animateMich %>% 
filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "FALSE") 

p3 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        geom_point(data = animateMichNature, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Non-Natural Photos\nDate: {format(frame_time, "%b %d %Y")}') +
        transition_time(date) 

animate(p3 + shadow_wake(0.1), fps=2)

Time and Spatial Analysis

As analysed in the lab instruction, photo activity is usually highest in summer seasons

MichFlickr$year <- year(MichFlickr$date)
MichFlickr$month <- month(MichFlickr$date, label = TRUE)
MichFlickr$day <- day(MichFlickr$date)

# construct daily data
daily_photography <- MichFlickr %>%
  mutate(day = as.Date(date, format="%Y-%m-%d")) %>%
  group_by(day) %>% # group by the day column
  summarise(total_photos=sum(count)) %>%  # calculate the SUM of all precipitation that occurred on each day
  na.omit()

# construct monthly data
daily_monthly <- daily_photography %>%
    mutate(month =  month(ymd(daily_photography$day), label = TRUE, abbr = FALSE),
           year  = year(as.Date(day, format = "%Y-%m-%d"))) %>%
    group_by(year,month) %>%
    summarise(total.qty = sum(total_photos))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
# plot the lines
daily_monthly  %>%
    filter(year > 2004)  %>% 
    ggplot(aes(x = month, y = total.qty, group = year)) +
    geom_line(aes(color = as.factor(year))) +
    scale_color_discrete() + 
    labs(title = "Total Flickr Photographs for Michigan", x = "", y = "Total Photographs",
         subtitle = "Activity is highest for the summer months") +
   theme_classic()

With this result, I wonder if this trend have some spatial pattern, which is:

Will photos taken in summer distribute more in certain area(s)?

For example, if people are more likely to come to norther Michigan in summer to take photos of the beatiful scenery of the forests, this should be shown on the map.

Hence the hypothesis:

People tend to come to northern Michigan to capture beatiful scenery. Summer photos will distribute more in northern parts of Michigan, and photos taken in summer will have higher percentage of natural photos than those taken in winter.

Testing the hypothesis:

STEP 1: Process the data

# select photos taken in the summer
MichFlickr_summer = MichFlickr %>%
  mutate(season = case_when(
      month(date) %in% c(3,4,5) ~ "spring",
      month(date) %in% c(6,7,8) ~ "summer",
      month(date) %in% c(9,10,11) ~ "fall",
      TRUE ~ "winter"
    )) %>%
  filter(season == "summer")

# select photos in the winter for comparison
MichFlickr_winter = MichFlickr %>%
  mutate(season = case_when(
      month(date) %in% c(3,4,5) ~ "spring",
      month(date) %in% c(6,7,8) ~ "summer",
      month(date) %in% c(9,10,11) ~ "fall",
      TRUE ~ "winter"
    )) %>%
  filter(season == "winter")

head(MichFlickr_summer)
##   X fid cat field_1_1     id
## 1 1   1  58        58 101778
## 2 2   2  59        59 101779
## 3 3   3  60        60 101780
## 4 4   4  61        61 101781
## 5 5   5  62        62 101782
## 6 6   6  63        63 101783
##                                                        url dateupload latitude
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_m.jpg 1090510921 45.08638
## 2 https://farm1.staticflickr.com/1/101779_562306283b_m.jpg 1090510923 45.08633
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_m.jpg 1090510926 45.08633
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_m.jpg 1090510930 45.08633
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_m.jpg 1090510932 45.08633
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_m.jpg 1090510936 45.08633
##   longitude           owner           title
## 1 -84.17036 44124269523@N01    Roaring Fire
## 2 -84.17069 44124269523@N01 Geese Reflected
## 3 -84.17069 44124269523@N01   The Night Sky
## 4 -84.17069 44124269523@N01       Sunset #1
## 5 -84.17069 44124269523@N01       Sunset #2
## 6 -84.17069 44124269523@N01       Sunset #3
##                                                     url_sq predict_Na Landuse
## 1 https://farm1.staticflickr.com/1/101778_37e4536da6_s.jpg  0.3283439      41
## 2 https://farm1.staticflickr.com/1/101779_562306283b_s.jpg  0.9205280      41
## 3 https://farm1.staticflickr.com/1/101780_a04f7cfa78_s.jpg  0.9658680      41
## 4 https://farm1.staticflickr.com/1/101781_d0fc02c3bf_s.jpg  0.9744647      41
## 5 https://farm1.staticflickr.com/1/101782_7db5046b1f_s.jpg  0.9365814      41
## 6 https://farm1.staticflickr.com/1/101783_e4ec3882ff_s.jpg  0.9627538      41
##   count Nature       date year month day season
## 1     1  FALSE 2004-07-22 2004   7月  22 summer
## 2     1   TRUE 2004-07-22 2004   7月  22 summer
## 3     1   TRUE 2004-07-22 2004   7月  22 summer
## 4     1   TRUE 2004-07-22 2004   7月  22 summer
## 5     1   TRUE 2004-07-22 2004   7月  22 summer
## 6     1   TRUE 2004-07-22 2004   7月  22 summer

STEP 2: Plot the data points in animation

# Plot the data in summer
animateMichSummer <-MichFlickr_summer %>% 
  filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE")

p4 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        
       geom_point(data = animateMichSummer, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Natural Photos Taken in Summer, 2017\nDate: {format(frame_time, "%b %d %Y")}') +
        transition_time(date)

animate(p4 + shadow_wake(0.1), fps=2)

# Repeat for winter
animateMichWinter <-MichFlickr_winter %>% 
  filter(date >= as.Date('2018-01-01') & date <= as.Date('2018-12-31') & Nature == "TRUE")

p5 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        
       geom_point(data = animateMichWinter, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Natural Photos Taken in Winter, 2018\nDate: {format(frame_time, "%b %d %Y")}') +
        transition_time(date)

animate(p5 + shadow_wake(0.1), fps=2)

STEP 3: Analyse the spatial trend

Here I go with the simplest way: directly compare the mean latitude and mean logitude of the two photo sets

# Compare the mean latitudes
mean_lat_summer = mean(MichFlickr_summer$latitude)
mean_lat_winter = mean(MichFlickr_winter$latitude)
mean_lat_summer
## [1] 43.23381
mean_lat_winter
## [1] 42.90469
# Compare the percentage of natural photos:
percent_nat_summer = nrow(filter(MichFlickr_summer, Nature == "TRUE")) / nrow(filter(MichFlickr_summer, Nature == "FALSE"))
percent_nat_winter = nrow(filter(MichFlickr_winter, Nature == "TRUE")) / nrow(filter(MichFlickr_winter, Nature == "FALSE"))
percent_nat_summer
## [1] 0.3248652
percent_nat_winter
## [1] 0.1786464

STEP 4: Discussion

As we can see with the result, photos taken in summer have higher mean latitude and higher percentage to be natural photos. Thus our hypothesis is supported.

TBD: More advanced analysis, such as spatial clustering of the data points, and compare the different clustering result (for example the centers of the clusters) of two photo sets

Part II: Noise Reduction Method

Task:

Social media data include various sources of noise related to the frequency of sharing photographs that can obscure meaningful patterns. Your task is to develop a technique for noise reduction that surpasses basic filtering methods. Additionally, generate a visually engaging GIF that elucidates the spatiotemporal dynamics within the data. Alongside this, propose a hypothesis that elucidates the observed patterns based on your knowledge of the region and geographic processes. Your hypothesis should reflect your understanding of the factors of social media sharing and how they influence the temporal and spatial aspects of photograph sharing.

Feedback:

With all due respect, this task seems to be too vague and vast for a lab assignment. A little bit more clearification and instruction (such as giving paper for reference, specifying the tool packs) is REALLY helpful for us.

Discussion

To solve this task, first need to define what is noise data.

The common cause of noise data by user may due to repeatedly uploading photos taken in similar areas within a short period of time.

With this definiton, we can use filter function to eliminate the repeated data.

Implementation

# 

# Create new columns with rounded lat and lng
MichFlickr = MichFlickr %>%
  mutate(lat_rnd = round(latitude, 3), lng_rnd = round(longitude, 3))

MichFlickr_denoised = MichFlickr %>%
  distinct(owner, date, lat_rnd, lng_rnd, .keep_all = TRUE)

Plot the denoised data

animateMichDenoised <-MichFlickr_denoised %>% 
  filter(date >= as.Date('2017-01-01') & date <= as.Date('2017-12-31') & Nature == "TRUE")

p6 <-mich + theme_void() + 
        geom_polygon(data = mich_county, fill = NA, color = "white") +
        geom_polygon(color = "black", fill = NA) + 
        
       geom_point(data = animateMichDenoised, aes(longitude, latitude), inherit.aes = FALSE) +
        labs(title = 'Denoised Photos in year 2017\nDate: {format(frame_time, "%b %d %Y")}') +
        transition_time(date)

animate(p6 + shadow_wake(0.1), fps=2)