
In this chapter, we’ll finally start exploring our data by summarizing, plotting and mapping.

Once you have resolved any ambiguous tags and removed false positives, you are ready to start analyzing your clean data set. This chapter will walk you through some simple procedures for working with and visualizing the cleaned sample data set; you can modify these scripts to work with your own data. For a more in-depth R tutorial, we strongly recommend working through R for Data Science by Garrett Grolemund and Hadley Wickham.

Load required packages

Follow the instructions in Chapter 2 to install the following packages before loading, if you haven’t already done so.
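
The following is a minimal loading sketch covering the packages used in this chapter; adjust it to match your own setup (for example, you may prefer to load the full tidyverse rather than dplyr alone):

library(motus)         # tagme(), getRunsFilters(), sunRiseSet()
library(dplyr)         # data manipulation: tbl(), left_join(), mutate(), filter()
library(ggplot2)       # plotting
library(lubridate)     # working with dates and times
library(sf)            # spatial objects used for mapping
library(rnaturalearth) # ne_countries(), ne_load() base maps
library(ggmap)         # get_stamenmap(), ggmap()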

Load data

If you followed along with the previous chapter (Chapter 5 - Data cleaning) and are working with the cleaned df.alltags.sub file, you can skip this step and go directly to Summarizing your data.

Otherwise, if you saved your data as an RDS file, you can load it using:

df.alltags.sub <- readRDS("./data/dfAlltagsSub.rds") # change dir to local directory

Or, if you’ve applied a custom filter to your .motus file, you can load the previously downloaded sample Motus data (see Chapter 3) and clean it now. The main benefit of the custom filter is that it is applied directly to the .motus file, which gives you more flexibility when using dplyr functions to manage and filter the data (e.g., you can select different variables than we included in the RDS file in Chapter 5 - Data cleaning). This approach also lets you more readily integrate new data added to your database with the tagme() function. Because we select the same variables and filter the same records below, the result is the same data set as the readRDS() statement above:

# load the .motus file (remember 'motus.sample' is both username and password)
sql.motus <- tagme(176, update = TRUE, dir = "./data/")
## Checking for new data in project 176
## Updating metadata
## activity:     1 new batch records to check
## batchID  1977125 (#     1 of      1): got    156 activity records
## Downloaded 156 activity records
## nodeData:     0 new batch records to check
## Fetching deprecated batches
## Total deprecated batches: 6
## New deprecated batches: 0
tbl.alltags <- tbl(sql.motus, "alltagsGPS")

# obtain a table object of the filter
tbl.filter <- getRunsFilters(sql.motus, "filtAmbigFalsePos")

# filter and convert the table into a dataframe, with a few modifications
df.alltags.sub <- left_join(tbl.alltags, tbl.filter, by = c("runID", "motusTagID")) %>%
  mutate(probability = ifelse(is.na(probability), 1, probability)) %>%
  filter(probability > 0) %>%
  select(-noise, -slop, -burstSlop, -done, -bootnum, -codeSet, 
         -mfg, -nomFreq, -markerNumber, -markerType, -tagDepComments, 
         -fullID, -deviceID, -recvDeployAlt, 
         -speciesGroup, -gpsLat, -gpsLon, -recvSiteName) %>%
  collect() %>%
  mutate(time = as_datetime(ts),  # work with times AFTER transforming to flat file
         tagDeployStart = as_datetime(tagDeployStart),
         tagDeployEnd = as_datetime(tagDeployEnd),
         recvDeployName = if_else(is.na(recvDeployName), 
                                  paste(recvDeployLat, recvDeployLon, sep=":"), 
                                  recvDeployName))

Note that if your project is very large, you may want to convert only a portion of it to the dataframe, to avoid memory issues.

Details on filtering the tbl prior to collecting as a dataframe are available in Converting to flat data in Chapter 3.

Here we simply subset the already-collected dataframe to a single tag, motusTagID 16047; you can decide how best to subset your own data (e.g., by species or year). A sketch of filtering at the tbl stage instead follows the code below:

# create a subset for a single tag, to keep the dataframe small
df.alltags.16047 <- filter(df.alltags.sub, motusTagID == 16047) 
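
If memory is an issue, the same subset can instead be taken at the tbl stage, before collect(), so that only the matching records are ever read into memory. A minimal sketch, assuming the tbl.alltags and tbl.filter objects created above:

# sketch: subset within the database, then collect only the matching rows
df.alltags.16047 <- left_join(tbl.alltags, tbl.filter, by = c("runID", "motusTagID")) %>%
  mutate(probability = ifelse(is.na(probability), 1, probability)) %>%
  filter(probability > 0,
         motusTagID == 16047) %>% # filter applied before collect() to limit memory use
  collect() %>%
  mutate(time = as_datetime(ts))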

Summarizing your data

Here we will run through some basic commands, starting with the summary() function to view a selection of variables in a data frame:

sql.motus %>% 
  tbl("alltags") %>% 
  select(ts, motusTagID, runLen, speciesEN, tagDepLat, tagDepLon, 
         recvDeployLat, recvDeployLon) %>% 
  collect() %>%
  mutate(time = as_datetime(ts)) %>% 
  summary()
##        ts              motusTagID        runLen        speciesEN        
##  Min.   :1.438e+09   Min.   :-1953   Min.   :   2.0   Length:188354     
##  1st Qu.:1.477e+09   1st Qu.:22758   1st Qu.:   2.0   Class :character  
##  Median :1.477e+09   Median :22897   Median :  12.0   Mode  :character  
##  Mean   :1.482e+09   Mean   :22604   Mean   : 206.5                     
##  3rd Qu.:1.489e+09   3rd Qu.:22905   3rd Qu.: 153.0                     
##  Max.   :1.498e+09   Max.   :24303   Max.   :2474.0                     
##                                                                         
##    tagDepLat       tagDepLon      recvDeployLat    recvDeployLon    
##  Min.   :11.12   Min.   :-80.69   Min.   :-42.50   Min.   :-143.68  
##  1st Qu.:50.19   1st Qu.:-63.75   1st Qu.:-42.50   1st Qu.: -73.88  
##  Median :50.19   Median :-63.75   Median : 50.19   Median : -63.91  
##  Mean   :50.15   Mean   :-65.86   Mean   : 10.36   Mean   : -69.12  
##  3rd Qu.:50.19   3rd Qu.:-63.75   3rd Qu.: 50.20   3rd Qu.: -63.75  
##  Max.   :51.80   Max.   :-63.75   Max.   : 62.89   Max.   : -60.02  
##  NA's   :80949   NA's   :80949    NA's   :173      NA's   :173      
##       time                       
##  Min.   :2015-07-23 10:10:54.90  
##  1st Qu.:2016-10-19 02:00:53.27  
##  Median :2016-10-26 13:19:24.00  
##  Mean   :2016-12-15 03:37:44.72  
##  3rd Qu.:2017-03-14 08:46:57.19  
##  Max.   :2017-06-26 16:11:04.47  
## 
# same summary for the filtered sql data
df.alltags.sub %>% 
  select(time, motusTagID, runLen, speciesEN, tagDepLat, tagDepLon, 
         recvDeployLat, recvDeployLon) %>% 
  summary()
##       time                          motusTagID        runLen      
##  Min.   :2015-08-03 06:37:11.58   Min.   :-1953   Min.   :   4.0  
##  1st Qu.:2016-10-06 19:18:39.83   1st Qu.:22897   1st Qu.:  27.0  
##  Median :2016-10-09 21:50:50.03   Median :22897   Median :  97.0  
##  Mean   :2016-09-05 19:18:32.42   Mean   :22262   Mean   : 234.7  
##  3rd Qu.:2016-10-19 10:42:18.48   3rd Qu.:22897   3rd Qu.: 287.0  
##  Max.   :2017-04-20 22:33:19.25   Max.   :23316   Max.   :1371.0  
##                                                                   
##   speciesEN           tagDepLat       tagDepLon      recvDeployLat   
##  Length:48225       Min.   :50.19   Min.   :-80.69   Min.   :-42.50  
##  Class :character   1st Qu.:50.19   1st Qu.:-63.75   1st Qu.: 50.20  
##  Mode  :character   Median :50.19   Median :-63.75   Median : 50.20  
##                     Mean   :50.34   Mean   :-65.56   Mean   : 49.87  
##                     3rd Qu.:50.19   3rd Qu.:-63.75   3rd Qu.: 50.20  
##                     Max.   :51.80   Max.   :-63.75   Max.   : 51.82  
##                     NA's   :92      NA's   :92       NA's   :164     
##  recvDeployLon   
##  Min.   :-80.69  
##  1st Qu.:-63.75  
##  Median :-63.75  
##  Mean   :-65.28  
##  3rd Qu.:-63.75  
##  Max.   :-62.99  
##  NA's   :164

The dplyr package allows you to easily summarize data by group, manipulate variables, or create new variables based on your data.

We can manipulate existing variables or create new ones with dplyr’s mutate() function. Here we’ll use the time variable (ts converted to date/time format above) to make new variables for year and day of year (doy).

We’ll also remove the set of points with missing receiver latitude and longitudes. These may be useful in some contexts (for example if the approximate location of the receiver is known) but can cause warnings or errors when plotting.

df.alltags.sub <- df.alltags.sub %>%
  mutate(year = year(time), # extract year from time
         doy = yday(time)) %>% # extract numeric day of year from time
  filter(!is.na(recvDeployLat))

head(df.alltags.sub)
## # A tibble: 6 × 52
##     hitID runID batchID      ts tsCor…¹   sig sigsd  freq freqsd motus…² ambigID
##   <int64> <int>   <int>   <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl>   <int>   <int>
## 1   45107  8886      53  1.45e9  1.45e9    52     0     4      0   16047      NA
## 2   45108  8886      53  1.45e9  1.45e9    54     0     4      0   16047      NA
## 3   45109  8886      53  1.45e9  1.45e9    55     0     4      0   16047      NA
## 4   45110  8886      53  1.45e9  1.45e9    52     0     4      0   16047      NA
## 5   45111  8886      53  1.45e9  1.45e9    49     0     4      0   16047      NA
## 6  199885 23305      64  1.45e9  1.45e9    33     0     4      0   16047      NA
## # … with 41 more variables: port <chr>, nodeNum <chr>, runLen <int>,
## #   motusFilter <dbl>, tagProjID <int>, mfgID <chr>, tagType <chr>,
## #   tagModel <chr>, tagLifespan <dbl>, tagBI <dbl>, pulseLen <dbl>,
## #   tagDeployID <int>, speciesID <int>, tagDeployStart <dttm>,
## #   tagDeployEnd <dttm>, tagDepLat <dbl>, tagDepLon <dbl>, tagDepAlt <dbl>,
## #   tagDeployTest <int>, recvDeployID <int>, recvDeployLat <dbl>,
## #   recvDeployLon <dbl>, recv <chr>, recvDeployName <chr>, …

We can also summarize information by group, in this case motusTagID, and apply various functions to each group, such as the total number of detections (nDet) for each tag, the number of receivers each tag was detected on, the first and last detection times, and the total number of days with at least one detection:

tagSummary <- df.alltags.sub %>%
  group_by(motusTagID) %>% 
  summarize(nDet = n(),
            nRecv = length(unique(recvDeployName)),
            timeMin = min(time),
            timeMax = max(time),
            totDay = length(unique(doy)))

head(tagSummary)
## # A tibble: 6 × 6
##   motusTagID  nDet nRecv timeMin             timeMax             totDay
##        <int> <int> <int> <dttm>              <dttm>               <int>
## 1      -1953    10     1 2017-03-18 06:21:12 2017-03-25 05:54:54      2
## 2      16011   116     1 2015-08-03 06:37:11 2015-08-05 20:41:12      3
## 3      16035   415     5 2015-08-14 17:53:49 2015-09-02 14:06:09      6
## 4      16036    62     1 2015-08-17 21:56:44 2015-08-17 21:58:52      1
## 5      16037  1278     3 2015-08-23 15:13:57 2015-09-08 18:37:16     14
## 6      16038    70     1 2015-08-20 18:42:33 2015-08-22 22:19:37      3

We can also group by multiple variables; applying the same functions as above but grouping by both motusTagID and recvDeployName gives us information for each tag at each receiver it was detected on. Because we are grouping by recvDeployName, each group contains only one receiver, so nRecv will be 1 for every row; this is not very informative on its own, but it helps illustrate how grouping works:

tagRecvSummary <- df.alltags.sub %>%
  group_by(motusTagID, recvDeployName) %>% 
  summarize(nDet = n(),
            nRecv = length(unique(recvDeployName)),
            timeMin = min(time),
            timeMax = max(time),
            totDay = length(unique(doy)), .groups = "drop")

head(tagRecvSummary)
## # A tibble: 6 × 7
##   motusTagID recvDe…¹  nDet nRecv timeMin             timeMax             totDay
##        <int> <chr>    <int> <int> <dttm>              <dttm>               <int>
## 1      -1953 LLICALD…    10     1 2017-03-18 06:21:12 2017-03-25 05:54:54      2
## 2      16011 North B…   116     1 2015-08-03 06:37:11 2015-08-05 20:41:12      3
## 3      16035 Brier2      38     1 2015-09-02 14:03:19 2015-09-02 14:06:09      1
## 4      16035 D'Estim…    32     1 2015-09-02 07:58:43 2015-09-02 08:04:24      1
## 5      16035 Netitis…   274     1 2015-08-14 17:53:49 2015-09-01 21:35:32      5
## 6      16035 Southwe…    65     1 2015-09-02 13:06:13 2015-09-02 13:14:39      1
## # … with abbreviated variable name ¹​recvDeployName

Plotting your data

Plotting your data is a powerful way to visualize broad and fine-scale detection patterns. This section gives a brief introduction to plotting using ggplot2. For more in-depth information on ggplot2, we recommend the Cookbook for R and the RStudio ggplot2 cheatsheet.

To make coarse-scale plots with large files, we suggest first rounding the detection time to the nearest hour or day so that processing time is faster. Here we round detection times to the nearest hour, then make a basic plot of hourly detections by motusTagID:

df.alltags.sub.2 <- df.alltags.sub %>%
  mutate(hour = round_date(time, unit = "hour")) %>% # round detection time to the nearest hour
  select(motusTagID, port, tagDeployStart, tagDepLat, tagDepLon, 
         recvDeployLat, recvDeployLon, recvDeployName, antBearing, speciesEN, year, doy, hour) %>% 
  distinct()

ggplot(data = df.alltags.sub.2, aes(x = hour, y = as.factor(motusTagID))) +
  theme_bw() +
  geom_point() + 
  labs(x = "Hour", y = "MotusTagID")

Let’s focus only on tags deployed in 2016 and colour the points by species:

ggplot(data = filter(df.alltags.sub.2, year(tagDeployStart) == 2016), 
       aes(x = hour, y = as.factor(motusTagID), colour = speciesEN)) +
  theme_bw() + 
  geom_point() +
  labs(x = "Hour", y = "MotusTagID") +
  scale_colour_discrete(name = "Species")

We can see how tags moved latitudinally by first ordering by hour, and colouring by motusTagID:

df.alltags.sub.2 <- arrange(df.alltags.sub.2, hour)

ggplot(data = filter(df.alltags.sub.2, year(tagDeployStart) == 2016), 
       aes(x = hour, y = recvDeployLat, col = as.factor(motusTagID), 
           group = as.factor(motusTagID))) + 
  theme_bw() + 
  geom_point() +
  geom_path() +
  labs(x = "Hour", y = "Receiver latitude") +
  scale_colour_discrete(name = "MotusTagID")

Now let’s look at more detailed plots of signal variation. We use the full df.alltags.sub dataframe so that we can get signal strength for each detection of a specific tag. Let’s examine fall 2016 detections of tag 22897 at Niapiskau; we facet the plot by deployment name, ordered by decreasing latitude:

ggplot(data = filter(df.alltags.sub, 
                     motusTagID == 22897, 
                     recvDeployName == "Niapiskau"), 
       aes(x = time, y = sig)) +
  theme_bw() + 
  geom_point() + 
  labs(x = "Time", y = "Signal strength") +
  facet_grid(recvDeployName ~ .)

We use the sunRiseSet() function to get sunrise and sunset times for all detections. We then zoom in on a certain timeframe and add that information to the above plot with geom_vline(), which draws an orange vertical line at sunrise and a blue one at sunset:

# add sunrise and sunset times to the dataframe
df.alltags.sub <- sunRiseSet(df.alltags.sub, lat = "recvDeployLat", lon = "recvDeployLon") 

ggplot(data = filter(df.alltags.sub, motusTagID == 22897,
                     time > "2016-10-11",
                     time < "2016-10-17",
                     recvDeployName == "Niapiskau"), 
       aes(x = time, y = sig)) +
  theme_bw() + 
  geom_point() + 
  labs(x = "Time of year", y = "Signal strength") +
  geom_vline(aes(xintercept = sunrise), col = "orange") + 
  geom_vline(aes(xintercept = sunset), col = "blue")

We can see that during this period, the tag was most often detected during the day, suggesting it may be actively foraging in this area during this time.
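
To put a number on that impression, one quick check is the proportion of these detections falling between sunrise and sunset; a sketch using the sunrise and sunset columns added by sunRiseSet() above:

# proportion of detections between sunrise and sunset for this tag and site
df.alltags.sub %>%
  filter(motusTagID == 22897,
         time > "2016-10-11", time < "2016-10-17",
         recvDeployName == "Niapiskau") %>%
  summarize(propDaytime = mean(time > sunrise & time < sunset, na.rm = TRUE))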

The same plots can provide valuable movement information when the receivers are ordered geographically. We do this for motusTagID 16039:

# We'll first order receiver names by latitude (for plotting)
df.alltags.sub <- mutate(df.alltags.sub, 
                         recvDeployName = reorder(recvDeployName, recvDeployLat))

ggplot(data = filter(df.alltags.sub, 
                     motusTagID == 16039,
                     time < "2015-10-01"), 
       aes(x = time, y = recvDeployName)) +
  theme_bw() + 
  geom_point() + 
  labs(x = "Time of year", y = "Receiver name (ordered by latitude)")

We zoom in on a section of this plot and look at antenna bearings to see directional movement past stations:

ggplot(data = filter(df.alltags.sub, motusTagID == 16039, 
                     time > "2015-09-14", 
                     time < "2015-10-01"), 
       aes(x = time, y = sig, col = as.factor(antBearing))) +
  theme_bw() + 
  geom_point() + 
  labs(x = "Time of day", y = "Signal strength") +
  scale_color_discrete(name = "Antenna bearing") +
  facet_grid(recvDeployName ~ .)

This plot shows the typical fly-by pattern of a migrating animal, with signal strength increasing and then decreasing as the tag moves through the beams of the antennas.

Mapping your data

To generate maps of tag paths, we will once again use summarized data so we can work with a much smaller data set for faster processing. Here we’ll summarize detections by day. The code is similar to what we used in Chapter 5 - Data cleaning, but this time we wrap it in a simple function, since we will likely want to repeat this type of summary.

# Simplify the data by summarizing by the runID
# If you want to summarize at a finer (or coarser) scale, you can also create
# other groups.

fun.getpath <- function(df) {
  df %>%
    filter(tagProjID == 176, # keep only tags registered to the sample project
           !is.na(recvDeployLat), # drop detections with missing receiver coordinates
           recvDeployLat != 0) %>% 
    group_by(motusTagID, runID, recvDeployName, ambigID, 
             tagDepLon, tagDepLat, recvDeployLat, recvDeployLon) %>%
    summarize(max.runLen = max(runLen), time = mean(time), .groups = "drop") %>%
    arrange(motusTagID, time) %>%
    data.frame()
} # end of function call

df.alltags.path <- fun.getpath(df.alltags.sub)
df.alltags.sub.path <- df.alltags.sub %>%
  filter(tagProjID == 176) %>% # only tags registered to project
  arrange(motusTagID, time) %>%       # order by time stamp for each tag
  mutate(date = as_date(time)) %>%    # create date variable
  group_by(motusTagID, date, recvDeployName, ambigID, 
           tagDepLon, tagDepLat, recvDeployLat, recvDeployLon)

df.alltags.path <- fun.getpath(df.alltags.sub.path)
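
You can inspect the first few rows of the result to confirm the structure of the summarized path data (output not shown here):

head(df.alltags.path)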

Creating simple maps with the sf package

Mapping allows us to spatially view our data and connect sequential detections that may represent flight paths in some cases. First we load base maps from the rnaturalearth package. Please see Chapter 4 for more detail if you have not already downloaded the rnaturalearth shapefiles.

world <- ne_countries(scale = "medium", returnclass = "sf") 
lakes <- ne_load(type = "lakes", scale = "medium", category = 'physical',
                 returnclass = "sf",
                 destdir = "map-data") # use this if already downloaded shapefiles
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/runner/work/motus/motus/vignettes/articles/map-data", layer: "ne_50m_lakes"
## with 412 features
## It has 39 fields
## Integer64 fields read as strings:  scalerank ne_id

Then, to map the paths, we set the x-axis and y-axis limits based on the location of receivers with detections. Depending on your data, these might need to be modified to encompass the deployment location of the tags, if tags were not deployed near towers with detections. We then use ggplot and sf to plot the map and tag paths. We add points for receivers and lines connecting consecutive detections by motusTagID. We also include an ‘x’ for where tags were deployed.

# just use the tags that we have examined carefully and filtered (in the
# previous chapter)
df.tmp <- df.alltags.path %>%
  filter(motusTagID %in% c(16011, 16035, 16036, 16037, 16038, 16039)) %>%
  arrange(time)  %>% # arrange by hour
  as.data.frame()

# set limits to map based on locations of detections, ensuring they include the
# deployment locations
xmin <- min(df.tmp$recvDeployLon, na.rm = TRUE) - 2
xmax <- max(df.tmp$recvDeployLon, na.rm = TRUE) + 2
ymin <- min(df.tmp$recvDeployLat, na.rm = TRUE) - 1
ymax <- max(df.tmp$recvDeployLat, na.rm = TRUE) + 1

# map
ggplot(data = world) + 
  geom_sf(colour = NA) +
  geom_sf(data = lakes, colour = NA, fill = "white") +
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax), expand = FALSE) +
  theme_bw() + 
  labs(x = "", y = "") +
  geom_path(data = df.tmp, 
            aes(x = recvDeployLon, y = recvDeployLat, 
                group = as.factor(motusTagID), colour = as.factor(motusTagID))) +
  geom_point(data = df.tmp, aes(x = recvDeployLon, y = recvDeployLat), 
             shape = 16, colour = "black") +
  geom_point(data = df.tmp, 
             aes(x = tagDepLon, y = tagDepLat), colour = "red", shape = 4) +
  scale_colour_discrete("motusTagID") 

If we have already downloaded the full receiver metadata, we can also add points for all receivers that were active during a given time period but did not detect these birds.

# get receiver metadata
tbl.recvDeps <- tbl(sql.motus, "recvDeps")
df.recvDeps <- tbl.recvDeps %>% 
  collect() %>% 
  mutate(timeStart = as_datetime(tsStart),
         timeEnd = as_datetime(tsEnd),
         # for deployments with no end dates, make an end date a year from now
         timeEnd = if_else(is.na(timeEnd), Sys.time() + years(1), timeEnd))

# get running intervals for all receiver deployments
siteOp <- with(df.recvDeps, interval(timeStart, timeEnd))

# set the date range you're interested in
dateRange <- interval(as_date("2015-08-01"), as_date("2016-01-01"))

# create new variable "active" which will be set to TRUE if the receiver was
# active at some point during your specified date range, and FALSE if not
df.recvDeps$active <- int_overlaps(siteOp, dateRange) 

# create map with receivers active during specified date range as red, and
# receivers with detections as yellow
ggplot(data = world) + 
  geom_sf(colour = NA) +
  geom_sf(data = lakes, colour = NA, fill = "white") +
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax), expand = FALSE) +
  theme_bw() + 
  labs(x = "", y = "") +
  geom_point(data = filter(df.recvDeps, active == TRUE), 
             aes(x = longitude, y = latitude), 
             shape = 16, colour = "grey60") +
  geom_path(data = df.tmp, 
            aes(x = recvDeployLon, y = recvDeployLat, 
                group = as.factor(motusTagID), colour = as.factor(motusTagID))) +
  geom_point(data = df.tmp, aes(x = recvDeployLon, y = recvDeployLat), 
             shape = 16, colour = "black") +
  geom_point(data = df.tmp, 
             aes(x = tagDepLon, y = tagDepLat), colour = "red", shape = 4) +
  scale_colour_discrete("motusTagID") 

Mapping with the ggmap package

Mapping with ggmap allows you to select from multiple base layers.
There are several ways to use the ggmap package. One is with Google Maps, which, as of October 16, 2018, requires a Google Maps API key (see Troubleshooting for more details). An alternative is to use open-source map tiles, such as Stamen map tiles, which do not require an API key. The following example uses Stamen map tiles.

The first step is to create a map with a specified map area, maptype (“terrain”, “toner”, or “watercolor”, among others), and level of zoom (an integer from 3 to 21, where 3 is roughly continent level and 10 is city scale; for Stamen maps this controls the level of detail). We then add points for receivers and lines connecting consecutive detections by motusTagID.

gmap <-  get_stamenmap(bbox = c(left = -90, right = -57, bottom = 35, top = 55),
                       maptype = "terrain-background", # select maptype
                       zoom = 6) # zoom, must be a whole number

# just use the tags that we have examined carefully and filtered (in the
# previous chapter)
df.tmp <- df.alltags.path %>%
  filter(motusTagID %in% c(16011, 16035, 16036, 16037, 16038, 16039)) %>%
  arrange(time)  %>% # arrange by hour
  as.data.frame()

ggmap(gmap) +
  theme_bw() + 
  geom_point(data = df.tmp, aes(x = recvDeployLon, y = recvDeployLat), 
             shape = 21, colour = "black", fill = "yellow") +
  geom_path(data = df.tmp, 
            aes(x = recvDeployLon, y = recvDeployLat, group = motusTagID, col = as.factor(motusTagID))) +
  scale_color_discrete(name = "MotusTagID")

The functions above provide examples of how you can begin exploring your data and are by no means exhaustive.
