library(tidyverse)
<- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d_path <- read_csv(d_path)
d nrow(d)
[1] 344
May 14, 2023
Zählen Sie fehlende Werte im Datensatz penguins
zeilenweise.
[1] 0 0 0 5 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
[260] 0 0 0 0 0 0 0 0 0 1 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[334] 0 0 0 0 0 0 0 0 0 0 0
d_na_only <-
d %>%
rowwise() %>%
mutate(na_n = sum(is.na(cur_data()))) %>% # "na_n" für "Anzahl (n) NA"
ungroup()
d_na_only %>%
pull(na_n)
[1] 0 0 0 5 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
[260] 0 0 0 0 0 0 0 0 0 1 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[334] 0 0 0 0 0 0 0 0 0 0 0
[1] 0 0 0 5 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
[260] 0 0 0 0 0 0 0 0 0 1 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[334] 0 0 0 0 0 0 0 0 0 0 0
Der folgende Weg funktioniert nur, wenn alle Variablen vom Typ numeric
sind.
Wir definieren eine Funktion, die die NAs des Vektors x
zählt:
Diese Funktion “mappen” wir wir auf jede Spalte (und lassen uns einen numerischen Vektor dbl
(“double”) zurückgeben):
rownames species island bill_length_mm
0 0 0 2
bill_depth_mm flipper_length_mm body_mass_g sex
2 2 2 11
year
0
Categories:
---
exname: filter-na5
expoints: 1
extype: string
exsolution: NA
categories:
- 2023
- eda
- na
- string
date: '2023-05-14'
slug: filter-na5
title: filter-na5
---
# Aufgabe
Zählen Sie fehlende Werte im Datensatz `penguins` *zeilenweise*.
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
</br>
# Lösung
## Setup
```{r}
library(tidyverse)
d_path <- "https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv"
d <- read_csv(d_path)
nrow(d)
```
## Weg 1
```{r}
apply(d, 1, function(x) sum(is.na(x)))
```
## Weg 2
```{r}
d_na_only <-
d %>%
rowwise() %>%
mutate(na_n = sum(is.na(cur_data()))) %>% # "na_n" für "Anzahl (n) NA"
ungroup()
d_na_only %>%
pull(na_n)
```
## Weg 3
```{r}
rowSums(is.na(d))
```
## Weg 4
Der folgende Weg funktioniert nur, wenn alle Variablen vom Typ `numeric` sind.
```{r error=TRUE}
d %>%
rowwise() %>%
mutate(na_n = sum(is.na(c_across(everything()))))
```
## Weg 5
Wir definieren eine Funktion, die die NAs des Vektors `x` zählt:
```{r}
sum_na <- function(x) {sum(is.na(x))}
```
Diese Funktion "mappen" wir wir auf jede Spalte (und lassen uns einen numerischen Vektor `dbl` ("double") zurückgeben):
```{r}
d |>
map_dbl(sum_na)
```
---
Categories:
- 2023
- eda
- na
- string