Some tags also collect GPS points, which can be used for more precise locations (as opposed to the receiver location recvDeployLat/recvDeployLon). In this article we will show you how to work efficiently with these GPS points.
First, let’s get set up. The motus.sample data (project 176) doesn’t have GPS points, so in these next examples we’ll be using a different database called gps_tags.
## <SQLiteConnection>
## Path: /home/runner/work/motus/motus/vignettes/articles/data/gps_sample.motus
## Extensions: TRUE
If your data has GPS points, you should be able to follow along using your own data (replace PROJECT_NUMBER with the number corresponding to your project).
gps_tags <- tagme(PROJECT_NUMBER, update = TRUE, new = TRUE)
GPS points are stored in the gps table.
tbl(gps_tags, "gps")
## # Source: table<gps> [?? x 11]
## # Database: sqlite 3.39.3 [/home/runner/work/motus/motus/vignettes/articles/data/gps_sample.motus]
## gpsID batchID ts gpsts lat lon alt quality lat_mean lon_m…¹
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 313414 127792 1510337399 NA 33.5 -104. 1074 NA NA NA
## 2 313415 127792 1510340999 NA 33.5 -104. 1072. NA NA NA
## 3 4967619 370452 1457110815 NA 36.5 -76.0 -3.4 NA NA NA
## 4 4967620 370452 1457114415 NA 36.5 -76.0 8.7 NA NA NA
## 5 4967621 370452 1457118015 NA 36.5 -76.0 5.3 NA NA NA
## 6 4967622 370452 1457121615 NA 36.5 -76.0 -6.6 NA NA NA
## 7 4967623 370452 1457125215 NA 36.5 -76.0 -3.1 NA NA NA
## 8 4967624 370452 1457128815 NA 36.5 -76.0 2.5 NA NA NA
## 9 4967625 370452 1457132415 NA 36.5 -76.0 -4.8 NA NA NA
## 10 4967626 370452 1457136016 NA 36.5 -76.0 1 NA NA NA
## # … with more rows, 1 more variable: n_fixes <int>, and abbreviated variable
## # name ¹lon_mean
Working with GPS points
You can work with GPS points in one of two ways:
- For small databases: use the alltagsGPS view directly (can be slow)
- For large databases: filter first, then use the getGPS() function to match GPS points to your hits
alltagsGPS view
The alltagsGPS view combines hits and GPS points.
## # Source: SQL [?? x 7]
## # Database: sqlite 3.39.3 [/home/runner/work/motus/motus/vignettes/articles/data/gps_sample.motus]
## hitID runID batchID ts gpsLat gpsLon gpsAlt
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 32811108 7984227 118721 1509139743. NA NA NA
## 2 32811109 7984227 118721 1509139823. NA NA NA
## 3 32912898 7985620 120474 1515877560. NA NA NA
## 4 32912899 7985620 120474 1515877600. NA NA NA
## 5 32913271 7985787 120474 1515878154. NA NA NA
## 6 32913272 7985787 120474 1515878194. NA NA NA
## 7 32913404 7985850 120474 1515878250. NA NA NA
## 8 32913405 7985850 120474 1515878329. NA NA NA
## 9 32913717 7985998 120474 1515878623. NA NA NA
## 10 32913718 7985998 120474 1515878662. NA NA NA
## # … with more rows
Note that not all hits have a GPS point associated, but we can filter to those that do:
tbl(gps_tags, "alltagsGPS") %>%
  select(hitID, runID, batchID, ts, gpsLat, gpsLon, gpsAlt) %>%
  filter(!is.na(gpsLat))
## # Source: SQL [?? x 7]
## # Database: sqlite 3.39.3 [/home/runner/work/motus/motus/vignettes/articles/data/gps_sample.motus]
## hitID runID batchID ts gpsLat gpsLon gpsAlt
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 449016552 24439773 370452 1457111891. 36.5 -76.0 -3.4
## 2 449016553 24439773 370452 1457111897. 36.5 -76.0 -3.4
## 3 449016554 24439773 370452 1457111904. 36.5 -76.0 -3.4
## 4 449016555 24439774 370452 1457111897. 36.5 -76.0 -3.4
## 5 449016556 24439774 370452 1457111904. 36.5 -76.0 -3.4
## 6 449016557 24439775 370452 1457111891. 36.5 -76.0 -3.4
## 7 449016558 24439775 370452 1457111910. 36.5 -76.0 -3.4
## 8 449016559 24439773 370452 1457111910. 36.5 -76.0 -3.4
## 9 449016560 24439774 370452 1457111916. 36.5 -76.0 -3.4
## 10 449016561 24439775 370452 1457111916. 36.5 -76.0 -3.4
## # … with more rows
Note: The alltagsGPS view is the same as the alltags view but includes GPS points. Because of this, the alltagsGPS view can be slower to work with, particularly if you have a large database.
Filtering then adding GPS data
For example, let’s work with a subset of the alltags view, including only King Rails.
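As a rough illustration of the "filter first" idea, here is a minimal sketch using a toy in-memory SQLite table rather than a real Motus database. The table contents are made up, and using speciesEN as the filter column is an assumption for the example. It assumes the DBI, RSQLite and dbplyr packages are installed.

```r
library(dplyr)

# Toy in-memory database standing in for a .motus file (values are made up)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "alltags", data.frame(
  hitID = 1:4,
  ts = c(1505763333, 1505763412, 1505778822, 1505778901),
  speciesEN = c("King Rail", "Sora", "King Rail", "Sora")
))

# The filter is translated to SQL and runs inside the database (lazy table)
rails_lazy <- tbl(con, "alltags") %>%
  filter(speciesEN == "King Rail")

# collect() pulls only the filtered rows into a local data frame
rails_flat <- collect(rails_lazy)

DBI::dbDisconnect(con)
```

Only the rows that survive the filter are ever loaded into memory, which is why filtering before matching GPS points tends to scale better for large databases.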
Now let’s retrieve the daily median location of GPS points for these data. Note that we use both the original database gps_tags and the data subset rails. Also note that the getGPS() function requires the original, numeric ts column, so if you want a date/time column it’s best to create it under a different name (e.g. time = as_datetime(ts)) rather than overwriting ts.
index_GPS <- getGPS(src = gps_tags, data = rails)
This table is an index matching GPS points to specific hitIDs, so the next step is to join it into your data subset using the left_join() function from the dplyr package. Note that at this point, we need to use collect() to ensure that rails is ‘flat’ (i.e. a data frame, not a database; see Converting to flat data for more details).
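The join step can be sketched on toy data as follows. The two tibbles are made-up stand-ins for a collected subset and for the index returned by getGPS(); the names rails_flat, index_toy and rails_gps are hypothetical. The sketch also adds a date/time column under a new name so that the numeric ts column is left intact. It assumes the lubridate package is installed.

```r
library(dplyr)

# Made-up stand-ins: a flat (collected) subset of hits and a GPS index
rails_flat <- tibble(
  hitID = 1:3,
  ts = c(1505763333, 1505763412, 1505778822)
)
index_toy <- tibble(
  hitID = 1:2,
  gpsLat = c(40.4, 40.4),
  gpsLon = c(-76.1, -76.1)
)

rails_gps <- rails_flat %>%
  # Keep every hit; hits without a match get NA in the GPS columns
  left_join(index_toy, by = "hitID") %>%
  # Add a readable date/time without overwriting the numeric ts
  mutate(time = lubridate::as_datetime(ts))
```

Because left_join() keeps all rows of the left-hand table, the third hit remains in the result with missing GPS coordinates.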
We can subset the columns to see if it worked as expected (we filter to non-missing, because not all hits have a GPS point when using the default matching).
## # A tibble: 90 × 9
## hitID runID batchID ts gpsLat gpsLon gpsAlt gpsTs_…¹ gpsTs…²
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 34126729 8211543 127792 1505763333. 40.4 -76.1 106. 1.51e9 1.51e9
## 2 34126730 8211543 127792 1505763412. 40.4 -76.1 106. 1.51e9 1.51e9
## 3 34126795 8211576 127792 1505778822. 40.4 -76.1 106. 1.51e9 1.51e9
## 4 34126796 8211576 127792 1505778901. 40.4 -76.1 106. 1.51e9 1.51e9
## 5 34126846 8211593 127792 1505837262. 40.4 -76.1 106. 1.51e9 1.51e9
## 6 34126847 8211593 127792 1505837341. 40.4 -76.1 106. 1.51e9 1.51e9
## 7 34128916 8212308 132026 1505348966. 40.4 -76.1 105. 1.51e9 1.51e9
## 8 34128917 8212308 132026 1505349006. 40.4 -76.1 105. 1.51e9 1.51e9
## 9 1136575629 47145854 742386 1496844344. 44.0 -79.5 313. 1.50e9 1.50e9
## 10 1136575630 47145854 742386 1496844383. 44.0 -79.5 313. 1.50e9 1.50e9
## # … with 80 more rows, and abbreviated variable names ¹gpsTs_min, ²gpsTs_max
More ways of matching GPS points
By default, getGPS() matches GPS points to hits by date. However, we can match GPS locations to hitIDs according to one of several different time values, specified by the by argument.
by can be one of three options:
- the median location within by = X minutes of a hitID
  - here, X can be any number greater than zero and represents the size of the time block in minutes over which to calculate a median location
  - be aware that you should ideally not choose a period smaller than the frequency at which GPS fixes are recorded, or some hits will not be associated with GPS
For example, the median location within 60 minutes of a hitID.
index_GPS <- getGPS(src = gps_tags, data = rails, by = 60)
index_GPS
## # A tibble: 78 × 6
## hitID gpsLat gpsLon gpsAlt gpsTs_min gpsTs_max
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 34126729 40.4 -76.1 111. 1505761796 1505761796
## 2 34126730 40.4 -76.1 111. 1505761796 1505761796
## 3 34126795 40.4 -76.1 110. 1505776197 1505776197
## 4 34126796 40.4 -76.1 110. 1505776197 1505776197
## 5 34126846 40.4 -76.1 106. 1505837998 1505837998
## 6 34126847 40.4 -76.1 106. 1505837998 1505837998
## 7 34128916 40.4 -76.1 100. 1505350490 1505350490
## 8 34128917 40.4 -76.1 100. 1505350490 1505350490
## 9 1136575629 44.0 -79.5 312. 1496846031 1496846031
## 10 1136575630 44.0 -79.5 312. 1496846031 1496846031
## # … with 68 more rows
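To make the time-block idea concrete, here is a minimal sketch on made-up data. It assumes fixed blocks aligned to the Unix epoch, which is an illustration of the concept and not necessarily how getGPS() implements it internally; all object names are hypothetical.

```r
library(dplyr)

# Made-up GPS fixes and hits, with timestamps in seconds
gps <- tibble(
  ts  = c(0, 600, 1200, 4000, 4300),
  lat = c(40.1, 40.2, 40.3, 41.0, 41.2),
  lon = c(-76.1, -76.2, -76.3, -77.0, -77.2)
)
hits <- tibble(hitID = 1:3, ts = c(300, 3700, 9000))

block_minutes <- 60  # plays the role of by = 60

# Median GPS location within each time block
gps_blocks <- gps %>%
  mutate(block = floor(ts / (block_minutes * 60))) %>%
  group_by(block) %>%
  summarize(
    gpsLat = median(lat),
    gpsLon = median(lon),
    gpsTs_min = min(ts),
    gpsTs_max = max(ts)
  )

# Assign each hit to its block; hits in a block with no fixes are dropped
index_sketch <- hits %>%
  mutate(block = floor(ts / (block_minutes * 60))) %>%
  inner_join(gps_blocks, by = "block") %>%
  select(-block, -ts)
```

The third hit falls in a block that contains no GPS fixes, so it has no match. This is the situation the warning above describes when the block is shorter than the interval between fixes.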
- by = "daily" median location (default, used in first example)
  - similar to by = X except the duration is 24hr (same as by = 1440)
  - this method is most suitable for receiver deployments at fixed locations.
index_GPS <- getGPS(src = gps_tags, data = rails, by = "daily")
index_GPS
## # A tibble: 90 × 6
## hitID gpsLat gpsLon gpsAlt gpsTs_min gpsTs_max
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 34126729 40.4 -76.1 106. 1505693095 1505776197
## 2 34126730 40.4 -76.1 106. 1505693095 1505776197
## 3 34126795 40.4 -76.1 106. 1505693095 1505776197
## 4 34126796 40.4 -76.1 106. 1505693095 1505776197
## 5 34126846 40.4 -76.1 106. 1505780097 1505863198
## 6 34126847 40.4 -76.1 106. 1505780097 1505863198
## 7 34128916 40.4 -76.1 105. 1505350490 1505433291
## 8 34128917 40.4 -76.1 105. 1505350490 1505433291
## 9 1136575629 44.0 -79.5 313. 1496797130 1496879031
## 10 1136575630 44.0 -79.5 313. 1496797130 1496879031
## # … with 80 more rows
- or the by = "closest" location in time
  - individual GPS lat/lons are returned, matching the closest hitID timestamp
  - this method is most accurate for mobile deployments, but is potentially slower than by = X.
  - you can also specify a cutoff, which will only match GPS records that are within cutoff = X minutes of the hit. This way you can avoid situations where the ‘closest’ GPS record is actually days away.
For example, the closest location in time noted within 2 hours of a hit.
index_GPS <- getGPS(src = gps_tags, data = rails, by = "closest", cutoff = 120)
index_GPS
## # A tibble: 90 × 6
## hitID gpsID gpsLat gpsLon gpsAlt gpsTs
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 34126729 23978179 40.4 -76.1 111. 1505761796
## 2 34126730 23978179 40.4 -76.1 111. 1505761796
## 3 34126795 23978184 40.4 -76.1 100. 1505780097
## 4 34126796 23978184 40.4 -76.1 100. 1505780097
## 5 34126846 23978200 40.4 -76.1 106. 1505837998
## 6 34126847 23978200 40.4 -76.1 106. 1505837998
## 7 34128916 23978065 40.4 -76.1 100. 1505350490
## 8 34128917 23978065 40.4 -76.1 100. 1505350490
## 9 1136575629 22242037 44.0 -79.5 312. 1496846031
## 10 1136575630 22242037 44.0 -79.5 312. 1496846031
## # … with 80 more rows
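The closest-in-time matching with a cutoff can be illustrated with a small sketch on made-up data. This is a conceptual stand-in and not the internals of getGPS(); object names are hypothetical, and cross_join() requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Made-up GPS fixes and hits, with timestamps in seconds
gps <- tibble(
  gpsID = 1:3,
  gpsTs = c(0, 3600, 90000),
  gpsLat = c(40.1, 40.2, 44.0),
  gpsLon = c(-76.1, -76.2, -79.5)
)
hits <- tibble(hitID = 1:3, ts = c(100, 3000, 40000))

cutoff_minutes <- 120  # plays the role of cutoff = 120

closest <- hits %>%
  cross_join(gps) %>%                 # pair every hit with every GPS fix
  mutate(gap = abs(ts - gpsTs)) %>%   # time difference in seconds
  group_by(hitID) %>%
  slice_min(gap, n = 1, with_ties = FALSE) %>%  # nearest fix per hit
  ungroup() %>%
  filter(gap <= cutoff_minutes * 60) %>%        # drop matches beyond the cutoff
  select(hitID, gpsID, gpsLat, gpsLon, gpsTs)
```

The third hit's nearest fix is roughly ten hours away, so the cutoff removes it instead of returning a misleading location. Pairing every hit with every fix is also why a closest-match approach can be slower than block medians on large data.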
To keep all hitIDs, regardless of whether they match to GPS data or not, use the argument keepAll = TRUE. This results in NA for gpsLat, gpsLon and gpsAlt where there is no corresponding GPS point (otherwise the hit is omitted).
index_GPS <- getGPS(src = gps_tags, data = rails, keepAll = TRUE)
index_GPS
## # A tibble: 2,748 × 6
## hitID gpsLat gpsLon gpsAlt gpsTs_min gpsTs_max
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 32811108 NA NA NA NA NA
## 2 32811109 NA NA NA NA NA
## 3 32912898 NA NA NA NA NA
## 4 32912899 NA NA NA NA NA
## 5 32913271 NA NA NA NA NA
## 6 32913272 NA NA NA NA NA
## 7 32913404 NA NA NA NA NA
## 8 32913405 NA NA NA NA NA
## 9 32913717 NA NA NA NA NA
## 10 32913718 NA NA NA NA NA
## # … with 2,738 more rows
Using GPS locations
Now that we have our GPS data (either through loading alltagsGPS or using the getGPS() function), we can use these coordinates when cleaning our data in Chapter 5.
In Chapter 5, we used the receiver deployment latitude and longitude: recvDeployLat, recvDeployLon.
However, now that you have GPS data, you can create receiver latitude and longitude variables (recvLat, recvLon, recvAlt) based on the coordinates recorded by the receiver GPS (gpsLat, gpsLon, gpsAlt), and where those are not available, infilled with coordinates from the receiver deployment metadata (recvDeployLat, recvDeployLon, recvDeployAlt).
Missing GPS coordinates may appear as NA, or as 0 or 999 if there was a problem with the unit recording.
Because we are changing values in the data, we’ll need to flatten the data first.
Using the example from Chapter 5, starting with Checking receivers:
sql.motus <- tagme(176, update = TRUE, dir = "./data/")
## Checking for new data in project 176
## Updating metadata
## activity: 1 new batch records to check
## batchID 1977125 (# 1 of 1): got 156 activity records
## Downloaded 156 activity records
## nodeData: 0 new batch records to check
## Fetching deprecated batches
## Total deprecated batches: 6
## New deprecated batches: 0
df.alltagsGPS <- tbl(sql.motus, "alltagsGPS") %>%
  mutate(recvLat = if_else(is.na(gpsLat) | gpsLat %in% c(0, 999),
                           recvDeployLat, gpsLat),
         recvLon = if_else(is.na(gpsLon) | gpsLon %in% c(0, 999),
                           recvDeployLon, gpsLon),
         recvAlt = if_else(is.na(gpsAlt), recvDeployAlt, gpsAlt)) %>%
  collect() # Flatten the data
# Take a look
select(df.alltagsGPS, hitID, recvLat, recvLon)
## # A tibble: 187,879 × 3
## hitID recvLat recvLon
## <int64> <dbl> <dbl>
## 1 45107 42.6 -72.7
## 2 45108 42.6 -72.7
## 3 45109 42.6 -72.7
## 4 45110 42.6 -72.7
## 5 45111 42.6 -72.7
## 6 199885 42.7 -72.5
## 7 199886 42.7 -72.5
## 8 199887 42.7 -72.5
## 9 199888 42.7 -72.5
## 10 199889 42.7 -72.5
## # … with 187,869 more rows
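To see the infill logic in isolation, here is a minimal sketch on a made-up local tibble (the values are invented for illustration). It applies the same if_else() rule to latitude only.

```r
library(dplyr)

# Made-up rows covering each case: valid GPS, NA, 0 and 999
toy <- tibble(
  gpsLat = c(42.6, NA, 0, 999),
  recvDeployLat = c(42.5, 42.7, 43.0, 43.1)
)

toy <- toy %>%
  # Fall back to deployment metadata when the GPS value is missing or a placeholder
  mutate(recvLat = if_else(is.na(gpsLat) | gpsLat %in% c(0, 999),
                           recvDeployLat, gpsLat))
```

Only the first row keeps its GPS latitude; the other three take the deployment latitude instead.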
Continuing on in Chapter 5 with these values, you would then replace all instances of recvDeployLat with recvLat and recvDeployLon with recvLon.
You would also use this flattened data frame, df.alltagsGPS, rather than the un-flattened tbl.alltagsGPS.