Some intriguing psychology papers (open access)

This post presents a compilation of links to psychology papers; I have chosen papers I find intriguing, particularly for working with in class. All papers are open access (or come from open-access repositories), which makes classroom work easier. The papers cover a broad range of topics, mostly with a focus on general interest. The perspective is an applied one; I have not tried to select based on methodological rigor. The collection is structured along the well-known classification of psychological work: social, personality, cognitive. I have added ‘social media/psychoinformatics’ as this reflects a topic I am quite interested in.

I am unsure about compilations of ‘must read’ psych articles, but I have found some. Such sites may provide a more succinct and broader perspective on much-read, influential, interesting, or high-quality science papers. Here’s a short list:

Different ways to count NAs over multiple columns

There are a number of ways in R to count NAs (missing values). A common use case is to count the NAs over multiple columns, i.e., a whole dataframe. That’s basically the question “how many NAs are there in each column of my dataframe?” This post demonstrates some ways to answer this question.

Way 1: using sapply

A typical (or classical) way in R to achieve some iteration is using apply and friends. sapply iterates over a list and simplifies the result (hence the “s” in sapply) if possible.

sapply(mtcars, function(x) sum(is.na(x)))
#>  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
#>    0    0    0    0    0    0    0    0    0    0    0


Pros: Straightforward. No dependencies on other packages. Tried and true.
Cons: Not type-stable; you cannot be sure you will always get the same data type back from this function, so you might get something you did not expect. That’s no problem in interactive use, but it is not what you want for programming.
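If type stability matters, base R offers alternatives. The following is a minimal sketch, not part of the original comparison: vapply() is like sapply() but demands a template for each result, and colSums() works on the logical matrix returned by is.na(). The name na_counts is just chosen for illustration.

```r
# vapply() fails loudly if a result is not a single integer,
# so the return type is known in advance.
na_counts <- vapply(mtcars, function(x) sum(is.na(x)), FUN.VALUE = integer(1))
na_counts

# is.na() on a dataframe returns a logical matrix;
# colSums() then counts the TRUEs per column (as doubles).
colSums(is.na(mtcars))
```

Both calls need no extra packages; vapply() is probably the safer choice inside functions.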

Way 2: using purrr::map

map maps (applies) a function to each element of a vector/list. Here, the following code reads as “Apply ‘sum(is.na(.))’ to each column of mtcars”. Mind the tilde ~ before the function. The dot . refers to the respective column.

library(tidyverse)
map(mtcars, ~sum(is.na(.)))
#> $mpg
#> [1] 0
#>
#> $cyl
#> [1] 0
#>
#> $disp
#> [1] 0
#>
#> $hp
#> [1] 0
#>
#> $drat
#> [1] 0
#>
#> $wt
#> [1] 0
#>
#> $qsec
#> [1] 0
#>
#> $vs
#> [1] 0
#>
#> $am
#> [1] 0
#>
#> $gear
#> [1] 0
#>
#> $carb
#> [1] 0
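map always returns a list, which is verbose for simple counts. As a hedged sketch (assuming purrr is installed; na_counts is a name picked for illustration), the typed variant map_int returns a named integer vector instead and throws an error if an element is not a single integer:

```r
library(purrr)

# sum() of a logical vector is an integer, so map_int() is applicable here.
na_counts <- map_int(mtcars, ~ sum(is.na(.)))
na_counts
```

This combines the compact output of sapply with a predictable return type.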
map applies a function to a list element

So, what does map do? It applies a function .fun over all elements of a list .list:

map(.list, .fun)

.list must either be a list or a simple vector. map is convenient for iteration as a replacement for “for-loops”:

library(tidyverse)
library(broom)  # provides tidy()
mtcars %>%
  select(hp, cyl, mpg) %>%  # only three for the sake of demonstration
  map(~cor.test(.x, mtcars$cyl) %>% tidy)
#> $hp
#>    estimate statistic      p.value parameter  conf.low conf.high
#> 1 0.8324475  8.228604 3.477861e-09        30 0.6816016 0.9154223
#>                                 method alternative
#> 1 Pearson's product-moment correlation   two.sided
#>
#> $cyl
#>   estimate statistic p.value parameter conf.low conf.high
#> 1        1       Inf       0        30        1         1
#>                                 method alternative
#> 1 Pearson's product-moment correlation   two.sided
#>
#> $mpg
#>    estimate statistic      p.value parameter   conf.low  conf.high
#> 1 -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
#>                                 method alternative
#> 1 Pearson's product-moment correlation   two.sided

BTW, it would be nice to combine the tidy output elements ($hp, $cyl, $mpg) into a dataframe:

mtcars %>%
  select(hp, cyl, mpg) %>%  # only three for the sake of demonstration
  map_df(~cor.test(.x, mtcars$cyl) %>% tidy)
#>     estimate statistic      p.value parameter   conf.low  conf.high
#> 1  0.8324475  8.228604 3.477861e-09        30  0.6816016  0.9154223
#> 2  1.0000000       Inf 0.000000e+00        30  1.0000000  1.0000000
#> 3 -0.8521620 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
#>                                 method alternative
#> 1 Pearson's product-moment correlation   two.sided
#> 2 Pearson's product-moment correlation   two.sided
#> 3 Pearson's product-moment correlation   two.sided

map_df maps the function (that’s what comes after “~”) to each list element (i.e., column) of the selected data. If possible, the resulting elements will be row-bound into a dataframe. To make the output of cor.test nice (i.e., tidy) we again use tidy.

Extract elements from a list using map

Say we are only interested in the p-value (OMG). How to extract each of the 3 p-values in our example?

mtcars %>%
  select(hp, cyl, mpg) %>%  # only three for the sake of demonstration
  map(~cor.test(.x, mtcars$cyl) %>% tidy) %>%
  map("p.value")
#> $hp
#> [1] 3.477861e-09
#>
#> $cyl
#> [1] 0
#>
#> $mpg
#> [1] 6.112687e-10

To extract several elements, say the p-value and the test statistic, we can use the [ operator (wrapped in backticks so that it can be passed as a function):

mtcars %>%
  select(hp, cyl, mpg) %>%  # only three for the sake of demonstration
  map(~cor.test(.x, mtcars$cyl) %>% tidy) %>%
  map(`[`, c("p.value", "statistic"))
#> $hp
#>        p.value statistic
#> 1 3.477861e-09  8.228604
#>
#> $cyl
#>   p.value statistic
#> 1       0       Inf
#>
#> $mpg
#>        p.value statistic
#> 1 6.112687e-10 -8.919699

[ is some kind of “extractor” function; it extracts elements, and returns a list or data frame:

mtcars["hp"] %>% head
#>                    hp
#> Mazda RX4         110
#> Mazda RX4 Wag     110
#> Datsun 710         93
#> Hornet 4 Drive    110
#> Hornet Sportabout 175
#> Valiant           105

mtcars["hp"] %>% head %>% str
#> 'data.frame':    6 obs. of  1 variable:
#>  $ hp: num  110 110 93 110 175 105

x <- list(1, 2, 3)
x[1]
#> [[1]]
#> [1] 1
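To make the contrast explicit, here is a small base-R sketch (the variable names are made up for illustration): [ keeps the container, whereas [[ pulls out the element itself.

```r
x <- list(1, 2, 3)

as_sublist <- x[1]    # still a list, of length 1
as_element <- x[[1]]  # the number stored in the first slot

class(as_sublist)
class(as_element)

# The same holds for dataframes, which are lists of columns:
hp_df  <- mtcars["hp"]    # a dataframe with one column
hp_vec <- mtcars[["hp"]]  # a plain numeric vector
```

That is why [ is handy when the result should stay a dataframe, and [[ when a function wants a bare vector.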


Maybe more convenient, there is a function called magrittr::extract. It’s a wrapper around [:

library(magrittr)
mtcars %>%
  select(hp, cyl, mpg) %>%  # only three for the sake of demonstration
  map(~cor.test(.x, mtcars$cyl) %>% tidy) %>%
  map_df(extract, c("p.value", "statistic"))
#>        p.value statistic
#> 1 3.477861e-09  8.228604
#> 2 0.000000e+00       Inf
#> 3 6.112687e-10 -8.919699

Comparing the pipe with base methods

Some say the pipe (#tidyverse) makes analyses in R easier. I agree. This post demonstrates some examples.

Let’s take the mtcars dataset as an example.

data(mtcars)
?mtcars

Say we would like to compute the correlation between fuel economy (mpg, miles per gallon) and horsepower (hp).

Base approach 1

cor(mtcars[, c("mpg", "hp")])

##            mpg         hp
## mpg  1.0000000 -0.7761684
## hp  -0.7761684  1.0000000

We use the [-operator (function) to select the columns; note that df[, c(col1, col2)] sees dataframes as matrices, and spits out a dataframe, not a vector:

class(mtcars[, c("mpg", "hp")])

## [1] "data.frame"

That’s ok, because cor expects a matrix or a dataframe as input. Alternatively, we can understand dataframes as lists as in the following example.

Base approach 2

cor.test(x = mtcars[["mpg"]], y = mtcars[["hp"]])

##
##  Pearson's product-moment correlation
##
## data:  mtcars[["mpg"]] and mtcars[["hp"]]
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor
## -0.7761684

The [[-operator extracts a column from a list (a dataframe is technically a list), and extracts it as a vector. This is useful as some functions, such as cor.test, don’t digest dataframes, but want vectors as input (here x, y).

Pipe approach 1

We will use dplyr for demonstrating the pipe approach.

library(dplyr)
mtcars %>%
  select(mpg, hp) %>%
  cor

##            mpg         hp
## mpg  1.0000000 -0.7761684
## hp  -0.7761684  1.0000000

If you are not acquainted with dplyr, the %>% operator can be translated as “then do”. More specifically, the result of the left-hand side (lhs) is passed as input to the right-hand side (rhs). Easy, right?
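The “then do” reading of %>% can be checked directly. The following sketch (the names piped and nested are chosen for illustration) shows that x %>% f(y) is just another way of writing f(x, y):

```r
library(magrittr)  # dplyr re-exports the same %>% operator

piped  <- mtcars %>% head(3)  # lhs becomes the first argument of head()
nested <- head(mtcars, 3)     # the classical, nested way

identical(piped, nested)
```

So the pipe does not change what is computed; it only changes the order in which the steps are written down.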
Pipe approach 2

We will need broom here, a package that renders some R output into a nice (i.e., tidy) dataframe. For example, cor.test does not spit out a nice dataframe when left in the wild. Applying tidy() from broom to the output, we will get a nice dataframe:

library(broom)
cor.test(x = mtcars[["mpg"]], y = mtcars[["hp"]]) %>% tidy

##     estimate statistic      p.value parameter   conf.low  conf.high
## 1 -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
##                                 method alternative
## 1 Pearson's product-moment correlation   two.sided

# same:
tidy(cor.test(x = mtcars[["mpg"]], y = mtcars[["hp"]]))

##     estimate statistic      p.value parameter   conf.low  conf.high
## 1 -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
##                                 method alternative
## 1 Pearson's product-moment correlation   two.sided

This code can be made simpler using dplyr:

mtcars %>%
  do(tidy(cor.test(.$mpg, .$hp)))

##     estimate statistic      p.value parameter   conf.low  conf.high
## 1 -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
##                                 method alternative
## 1 Pearson's product-moment correlation   two.sided

The function do from dplyr runs any function, provided it spits out a dataframe. That’s why we first apply tidy from broom, and run do afterwards. The dot . refers to the dataframe as handed over from the last step. We need this piece because cor.test does not know any variable by the name mpg (unless you have attached mtcars beforehand).

This code produces the same result:

mtcars %>%
  do(cor.test(.$mpg, .$hp) %>% tidy) %>%
  knitr::kable()

estimate    statistic  p.value  parameter  conf.low    conf.high   method                                alternative
-0.7761684  -6.742388  2e-07    30         -0.8852686  -0.5860994  Pearson’s product-moment correlation  two.sided

Pipe approach 3

The package magrittr (load it with library(magrittr)) provides some pipe variants, most importantly perhaps the “exposition pipe”, %$%:

mtcars %$%
  cor.test(mpg, hp) %>%
  tidy

##     estimate statistic      p.value parameter   conf.low  conf.high
## 1 -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
##                                 method alternative
## 1 Pearson's product-moment correlation   two.sided


Why is it useful? Let’s spell out the code above in more detail.

• Line 1: “Hey R, pick up mtcars but do not simply pass over this dataframe, but pull out each column and pass those columns over”
• Line 2: “Run the function cor.test with mpg and hp” and then …
• Line 3: “Tidy the result up. Not necessary here but quite nice”.

Remember that cor.test does not accept a dataframe as input. It expects two vectors. That’s why we need to transform the dataframe mtcars into a bundle of vectors (i.e., the columns).
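For comparison, base R has a close relative of the exposition pipe: with() evaluates an expression with the columns of a dataframe made available as variables. A minimal sketch (res is just an illustrative name):

```r
# with() exposes the columns of mtcars, so cor.test() can find mpg and hp.
res <- with(mtcars, cor.test(mpg, hp))
res$estimate
```

The exposition pipe does much the same thing, but it slots into a chain of pipes, which with() does less gracefully.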

Recap

In sum, I think the pipe makes life easier. Of course, one needs to get used to it. But after a while, it’s much simpler than working with deeply nested [ brackets.

Enjoy the pipe!