Spatial Analytics

Geography, some economics and data analysis.

Get going with purrr

Phil Donovan / 2018-09-17


Coming from a Python background, it took me, like many others, a long time to get my head around the lapply() function, and I could never really be bothered to understand the rest of the apply suite. I just didn’t understand those functions: they behaved weirdly and took their arguments in a different order to lapply(). Fortunately, I wasn’t the only one who found the apply suite a pain; enter the purrr package, which provides a suite of lapply()-like functions with a consistent interface.

Many people, including myself, have asked what the point of the purrr package is. The thread linked above has an answer from R Jesus himself, Hadley Wickham, who explains some of the base R idiosyncrasies that purrr rectifies, particularly with respect to consistent syntax. Anyhow, purrr is great, so keep reading. If you want more reasoning, follow that link!

The most basic purrr function is map(), which behaves in a very similar fashion to lapply(). Perhaps the only real difference between map() and lapply() is that purrr functions also accept functions specified as formulas, where . (or .x) stands for the current element, e.g.:

library(purrr)

test <- list("love" = 1, "hate" = 2)

map(test, ~. + 10)
## $love
## [1] 11
## 
## $hate
## [1] 12
lapply(test, function(x) {x + 10})
## $love
## [1] 11
## 
## $hate
## [1] 12
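
To make the formula shorthand a bit more concrete, here is a minimal sketch (the object names are my own, purely for illustration) of three ways to write the same call. Inside a formula, both . and .x refer to the current element.

```r
library(purrr)

test <- list("love" = 1, "hate" = 2)

# Three ways to add 10 to every element
with_function <- map(test, function(x) x + 10)  # ordinary anonymous function
with_dot      <- map(test, ~ . + 10)            # formula, element written as .
with_dot_x    <- map(test, ~ .x + 10)           # formula, element written as .x

# All three should match what lapply() returns for the same function
identical(with_function, with_dot)
identical(with_dot_x, lapply(test, function(x) x + 10))
```

Which form you use is mostly a matter of taste; the formula version just saves some typing for short functions.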

However, map() is not the only function that comes with purrr! There are several other functions, such as map_df(), which, instead of returning a list, converts each output to a data frame and row-binds them, returning a single data frame of the results.

iris_split <- split(iris, iris$Species)

# as_tibble() comes from the tibble package (as.tibble() is deprecated)
map_df(iris_split, ~ tibble::as_tibble(.))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows
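
The example above simply stitches the pieces back together, so here is a slightly more useful sketch, assuming the dplyr package is installed (map_df() relies on it for the row-binding). Each species is reduced to a one-row data frame, and the .id argument keeps the list names as a column. The column name mean_sepal_length is my own choice.

```r
library(purrr)

iris_split <- split(iris, iris$Species)

# One-row data frame per species; .id stores the list names in a new column
species_means <- map_df(
  iris_split,
  ~ data.frame(mean_sepal_length = mean(.x$Sepal.Length)),
  .id = "Species"
)

species_means
```

The result should have three rows, one per species, with the species name alongside its mean sepal length.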

Another key variant is the map_if() function, which applies the function only to the elements for which a given predicate is true, e.g. a column that is a double as opposed to, say, a character. Elements that fail the test are returned unchanged.

map_if(mtcars, is.double, ~ . * 10000)
## $mpg
##  [1] 210000 210000 228000 214000 187000 181000 143000 244000 228000 192000
## [11] 178000 164000 173000 152000 104000 104000 147000 324000 304000 339000
## [21] 215000 155000 152000 133000 192000 273000 260000 304000 158000 197000
## [31] 150000 214000
## 
## $cyl
##  [1] 60000 60000 40000 60000 80000 60000 80000 40000 40000 60000 60000
## [12] 80000 80000 80000 80000 80000 80000 40000 40000 40000 40000 80000
## [23] 80000 80000 80000 40000 40000 40000 80000 60000 80000 40000
## 
## $disp
##  [1] 1600000 1600000 1080000 2580000 3600000 2250000 3600000 1467000
##  [9] 1408000 1676000 1676000 2758000 2758000 2758000 4720000 4600000
## [17] 4400000  787000  757000  711000 1201000 3180000 3040000 3500000
## [25] 4000000  790000 1203000  951000 3510000 1450000 3010000 1210000
## 
## $hp
##  [1] 1100000 1100000  930000 1100000 1750000 1050000 2450000  620000
##  [9]  950000 1230000 1230000 1800000 1800000 1800000 2050000 2150000
## [17] 2300000  660000  520000  650000  970000 1500000 1500000 2450000
## [25] 1750000  660000  910000 1130000 2640000 1750000 3350000 1090000
## 
## $drat
##  [1] 39000 39000 38500 30800 31500 27600 32100 36900 39200 39200 39200
## [12] 30700 30700 30700 29300 30000 32300 40800 49300 42200 37000 27600
## [23] 31500 37300 30800 40800 44300 37700 42200 36200 35400 41100
## 
## $wt
##  [1] 26200 28750 23200 32150 34400 34600 35700 31900 31500 34400 34400
## [12] 40700 37300 37800 52500 54240 53450 22000 16150 18350 24650 35200
## [23] 34350 38400 38450 19350 21400 15130 31700 27700 35700 27800
## 
## $qsec
##  [1] 164600 170200 186100 194400 170200 202200 158400 200000 229000 183000
## [11] 189000 174000 176000 180000 179800 178200 174200 194700 185200 199000
## [21] 200100 168700 173000 154100 170500 189000 167000 169000 145000 155000
## [31] 146000 186000
## 
## $vs
##  [1]     0     0 10000 10000     0 10000     0 10000 10000 10000 10000
## [12]     0     0     0     0     0     0 10000 10000 10000 10000     0
## [23]     0     0     0 10000     0 10000     0     0     0 10000
## 
## $am
##  [1] 10000 10000 10000     0     0     0     0     0     0     0     0
## [12]     0     0     0     0     0     0 10000 10000 10000     0     0
## [23]     0     0     0 10000 10000 10000 10000 10000 10000 10000
## 
## $gear
##  [1] 40000 40000 40000 30000 30000 30000 30000 40000 40000 40000 40000
## [12] 30000 30000 30000 30000 30000 30000 40000 40000 40000 30000 30000
## [23] 30000 30000 30000 40000 50000 50000 50000 50000 50000 40000
## 
## $carb
##  [1] 40000 40000 10000 10000 20000 10000 40000 20000 20000 40000 40000
## [12] 30000 30000 30000 40000 40000 40000 10000 20000 10000 10000 20000
## [23] 20000 40000 20000 10000 20000 20000 40000 60000 80000 20000
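
Because every column of mtcars is a double, the output above doesn’t really show the predicate filtering anything. A small sketch with a made-up mixed list (the names are hypothetical) shows what happens to elements that fail the test:

```r
library(purrr)

mixed <- list(a = 1.5, b = "text", c = 2L)

# Only the double is multiplied; the character and the integer pass through untouched
scaled <- map_if(mixed, is.double, ~ .x * 10)

str(scaled)
```

Here only a should change (to 15), while b and c come back exactly as they went in.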

There is a whole host of other map functions that return their output as a specific type rather than a list, such as map_chr(), map_dbl(), map_int() and map_lgl().
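
As a rough illustration of these typed variants, the sketch below uses map_dbl(), map_chr() and map_lgl() on the built-in mtcars and iris datasets. Each one returns an atomic vector of the named type, and errors if the function’s output doesn’t fit that type.

```r
library(purrr)

# map_dbl() returns a double vector instead of a list
col_means <- map_dbl(mtcars, mean)

# map_chr() returns a character vector
col_types <- map_chr(iris, ~ class(.x)[1])

# map_lgl() returns a logical vector
is_num <- map_lgl(iris, is.numeric)

col_means
col_types
is_num
```

The strictness is the point: you always know what type you are getting back, which is not the case with sapply().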

Another function worth presenting is imap(), an indexed map. According to the documentation, this function is useful if you need to compute on both the value and the position of an element. In a formula, .x is the value and .y is its name or, for unnamed input like the one below, its position.

imap_chr(sample(10), ~ paste0(.y, ": ", .x))
##  [1] "1: 8"  "2: 6"  "3: 9"  "4: 10" "5: 5"  "6: 3"  "7: 7"  "8: 4" 
##  [9] "9: 2"  "10: 1"
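
Since sample() gives a different result on every run, here is a deterministic sketch of the same idea. The scores vector is made up for illustration; it shows that .y is the name when the input is named and the position when it is not.

```r
library(purrr)

scores <- c(maths = 90, art = 75)

# Named input: .y is the element name
by_name <- imap_chr(scores, ~ paste0(.y, ": ", .x))

# Unnamed input: .y is the position
by_position <- imap_chr(c("a", "b", "c"), ~ paste0(.y, ": ", .x))

by_name
by_position
```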

Meanwhile, walk() calls a function purely for its side effects and returns the input .x ‘invisibly’, rather than the output of the function. This makes it easy to use in a pipe, say to write CSV files, as it doesn’t return a list of empty values. Its indexed variant, iwalk(), works the same way:

iwalk(mtcars, ~ cat(.y, ": ", median(.x), "\n", sep = ""))
## mpg: 19.2
## cyl: 6
## disp: 196.3
## hp: 123
## drat: 3.695
## wt: 3.325
## qsec: 17.71
## vs: 0
## am: 0
## gear: 4
## carb: 2

Note that the iwalk() call above printed no list.
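
The CSV use case mentioned earlier could look something like the sketch below. The output folder and file names are hypothetical, and everything is written to a temporary directory so nothing is left lying around.

```r
library(purrr)

iris_split <- split(iris, iris$Species)

# Hypothetical output folder inside the session's temporary directory
out_dir <- file.path(tempdir(), "iris_csv")
dir.create(out_dir, showWarnings = FALSE)

# Write one CSV per species; .x is the data frame, .y is the species name
result <- iwalk(
  iris_split,
  ~ write.csv(.x, file.path(out_dir, paste0(.y, ".csv")), row.names = FALSE)
)

# iwalk() hands back its input unchanged, which is what makes it pipe-friendly
identical(result, iris_split)
list.files(out_dir)
```

Had map() been used instead, the result would have been a list of NULLs, one for each write.csv() call.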

Most of these functions also have a ‘2’ variant (map2(), walk2() and so on) which iterates over two lists in parallel, where the first input is denoted by .x and the second by .y.

test_1 <- 1:2
test_2 <- 3:4

map2(test_1, test_2, ~.x + .y)
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 6
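
The ‘2’ variants come in typed flavours too. A minimal sketch with made-up inputs, which also shows that a length-one input is recycled against the longer one:

```r
library(purrr)

widths  <- c(2, 3, 4)
heights <- c(10, 20, 30)

# map2_dbl() pairs up the elements and returns a double vector
areas <- map2_dbl(widths, heights, ~ .x * .y)

# A length-one input is recycled to match the other one
labels <- map2_chr(c("a", "b", "c"), "_id", ~ paste0(.x, .y))

areas
labels
```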

Finally, there are the pmap() variants, which are similar to map2() but take a single list containing any number of inputs and iterate over them in parallel. In a formula, the inputs are denoted by ..1, ..2, ..3 etc.

test_3 <- 5:6
test_4 <- 7:8

all_tests <- list(test_1, test_2, test_3, test_4)
pmap(all_tests, ~..1 + ..2 + ..3 + ..4)
## [[1]]
## [1] 16
## 
## [[2]]
## [1] 20
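
With more than a couple of inputs, ..1, ..2 and friends get hard to read. One alternative, sketched here with made-up values, is to name the list elements so that they are matched to the arguments of an ordinary function. Since a data frame is just a list of columns, the same trick works row by row.

```r
library(purrr)

params <- list(
  x = c(1, 2),
  y = c(3, 4),
  z = c(5, 6)
)

# Names in the list are matched to the function's argument names
by_name <- pmap_dbl(params, function(x, y, z) x + y * z)

# A data frame is a list of columns, so pmap() can also work through its rows
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
by_row <- pmap_dbl(df, function(x, y, z) x + y * z)

by_name
by_row
```

Both calls should give 16 and 26, i.e. 1 + 3 * 5 and 2 + 4 * 6.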

Overall, I find that the map functions behave better and are easier to grasp than their base R equivalents, but each to their own. If you know the apply suite and are happy with it, keep using it!

For lesser known purrr tricks, I suggest you read this.