How to read GitHub files into R easily


Downloading a folder (repository) from GitHub as a whole

The most direct way to get data from GitHub onto your computer, and from there into R, is to download the whole repository. On the repository's page, click the big green button labeled “Clone or download” and choose “Download ZIP”.

Of course, if you use Git and GitHub, cloning the repository is the more appropriate route. Although it may appear more advanced, cloning has the clear advantage of giving you the full set of GitHub features. After all, the whole purpose of GitHub is to provide a history of the file(s), and that purpose is not really served if you just download the most recent snapshot. In the end, the choice is yours.

Note that “repository” can be thought of as “folder” or “project”.

Once downloaded, you need to unzip the folder, that is, “extract” or “unpack” its contents. On many machines, you can do this by right-clicking the icon and choosing something like “Extract here”.

Once extracted, just navigate to the folder and open whatever file you are inclined to.
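If you prefer to stay in R, the download-and-unzip steps above can be sketched in code. This is only a minimal sketch: the helper names github_zip_url() and fetch_repo() are made up for illustration, and it assumes the repository is public and that GitHub serves a zip archive of a branch at https://github.com/user/repository/archive/branch.zip.

```r
# Hypothetical helper: build the zip-archive URL for one branch of a repo.
# Assumption: public repo, archive served at .../archive/<branch>.zip
github_zip_url <- function(user, repo, branch = "master") {
  sprintf("https://github.com/%s/%s/archive/%s.zip", user, repo, branch)
}

# Hypothetical helper: download the archive into `dest_dir` and unpack it.
# Returns the paths of the extracted files.
fetch_repo <- function(user, repo, branch = "master", dest_dir = tempdir()) {
  zip_path <- file.path(dest_dir, paste0(repo, "-", branch, ".zip"))
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(github_zip_url(user, repo, branch), zip_path, mode = "wb")
  unzip(zip_path, exdir = dest_dir)
}

# Usage (needs an internet connection):
# files <- fetch_repo("sebastiansauer", "Daten_Unterricht")
# head(files)
```

The files land in a temporary directory by default; pass your own dest_dir if you want to keep them.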

Downloading individual files from GitHub

In case you do not want to download the whole repository, individual files can be downloaded and read into R quite easily:

library(readr)  # for read_csv
library(knitr)  # for kable
myfile <- "https://raw.github.com/sebastiansauer/Daten_Unterricht/master/Affairs.csv"

Affairs <- read_csv(myfile)
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   affairs = col_integer(),
##   gender = col_character(),
##   age = col_double(),
##   yearsmarried = col_double(),
##   children = col_character(),
##   religiousness = col_integer(),
##   education = col_integer(),
##   occupation = col_integer(),
##   rating = col_integer()
## )
kable(head(Affairs))
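| X1 | affairs | gender | age | yearsmarried | children | religiousness | education | occupation | rating |
|---:|--------:|:-------|----:|-------------:|:---------|--------------:|----------:|-----------:|-------:|
|  1 |       0 | male   |  37 |        10.00 | no       |             3 |        18 |          7 |      4 |
|  2 |       0 | female |  27 |         4.00 | no       |             4 |        14 |          6 |      4 |
|  3 |       0 | female |  32 |        15.00 | yes      |             1 |        12 |          1 |      4 |
|  4 |       0 | male   |  57 |        15.00 | yes      |             5 |        18 |          6 |      5 |
|  5 |       0 | male   |  22 |         0.75 | no       |             2 |        17 |          6 |      3 |
|  6 |       0 | female |  32 |         1.50 | no       |             2 |        17 |          5 |      5 |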

Let’s quickly deconstruct the GitHub URL above. In general, we need to write:

https://raw.github.com/user/repository/branch/file.name.
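To make the pattern concrete, here is a small sketch that assembles such a URL from its parts. The function name github_raw_url() is hypothetical, not part of any package, and defaulting the branch to “master” is just an assumption that you should adjust to the repo at hand.

```r
# Hypothetical helper: assemble a raw-file URL following the pattern
# https://raw.github.com/user/repository/branch/file.name
github_raw_url <- function(user, repo, path, branch = "master") {
  sprintf("https://raw.github.com/%s/%s/%s/%s", user, repo, branch, path)
}

# Rebuild the URL used in the example above
myfile <- github_raw_url("sebastiansauer", "Daten_Unterricht", "Affairs.csv")
print(myfile)
```

The result can be handed to read_csv() exactly like the hand-written string.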

In many cases, the branch will be “master” (newer repositories often use “main” instead). You can easily check this on the page of the repo you want to download, where the current branch is shown in the branch selector.

I’ve noticed that downloading and unzipping a repository from GitHub can cause confusion, so it might be easier to provide a code snippet like the one shown above.

BTW: read.csv should work equally well.
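Here is a quick sketch of the base R route. To keep it runnable without an internet connection, it writes a tiny stand-in file (the first two rows and three of the columns from the table above) to a temporary location; the stand-in assumes the original file has an empty first header field, which is what the 'X1' warning above suggests. One small difference to expect: base R should name the unnamed first column X rather than X1.

```r
# Tiny stand-in for Affairs.csv: note the empty first column name
csv_text <- c(
  '"","affairs","gender"',
  '"1",0,"male"',
  '"2",0,"female"'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Base R reader, no extra packages needed
affairs_base <- read.csv(tmp)
print(names(affairs_base))
print(affairs_base)

# With the real file (needs an internet connection):
# Affairs <- read.csv(myfile)
```

Also keep in mind that read.csv returns a plain data.frame, whereas read_csv returns a tibble.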
