Download \(n=10^k\) tweets (with \(k=4\)) from Twitter via the Twitter API; each tweet should be addressed to a prominent person.
Refer to the following people and their Twitter accounts:
- Markus_Soeder
- karl_lauterbach
Prepare the text data with basic text-mining methods (tokenizing, removing stop words, removing numbers, …).
Then use the data to run a sentiment analysis.
Compare the results for all persons studied.
Solution
library(rtweet)
library(tidyverse)
library(tidytext)
library(lsa) # stop words
library(SnowballC) # stemming
data(sentiws, package ="pradadata")
First, authenticate with the API and download the tweets:
Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6649 of `x` matches multiple rows in `y`.
ℹ Row 3102 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Joining with `by = join_by(word)`
Joining with `by = join_by(word)`
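The warning above reports that the join between tokens and lexicon is many-to-many: a token occurs many times in the tweets, and at least one word appears in more than one lexicon row. The following is a minimal sketch of two ways to handle this, run on invented data. The objects `tokens` and `lexicon` and their values are hypothetical stand-ins, not the real tweets or the `sentiws` lexicon; only the `relationship` argument is taken from the warning text itself.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens, one row per token occurrence
tokens <- tibble(word = c("gut", "gut", "schlecht", "haus"))

# Hypothetical mini lexicon in which "gut" is listed twice
lexicon <- tibble(
  word    = c("gut", "gut", "schlecht"),
  neg_pos = c("pos", "pos", "neg"),
  value   = c(0.4, 0.2, -0.8)
)

# Option 1: state explicitly that a many-to-many match is expected.
# Each "gut" token is then paired with both lexicon rows.
joined_all <- inner_join(tokens, lexicon, by = "word",
                         relationship = "many-to-many")

# Option 2: collapse duplicate lexicon entries first (here: mean value),
# so that every token matches at most one lexicon row.
lexicon_unique <- lexicon %>%
  group_by(word, neg_pos) %>%
  summarise(value = mean(value), .groups = "drop")

joined_unique <- inner_join(tokens, lexicon_unique, by = "word")
```

Option 1 only silences the warning and keeps the duplicated matches, which gives words with several lexicon entries more weight in the averages. Option 2 changes the result; whether averaging duplicate entries is appropriate depends on why the lexicon lists a word more than once.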
Warning in inner_join(., sentiws): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 17223 of `x` matches multiple rows in `y`.
ℹ Row 2894 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Categories:
textmining
twitter
programming
Source Code
---
extype: string
exsolution: NA
exname: twitter06
expoints: 1
categories:
- textmining
- twitter
- programming
date: '2022-10-28'
slug: twitter06
title: twitter06

---

# Exercise

Download $n=10^k$ tweets (with $k=4$) from Twitter via the Twitter API;
each tweet should be addressed to a prominent person.

Refer to the following people and their Twitter accounts:

- `Markus_Soeder`
- `karl_lauterbach`

Prepare the text data with basic text-mining methods (tokenizing, removing stop words, removing numbers, ...).

Then use the data to run a sentiment analysis.

Compare the results for all persons studied.

# Solution

```{r}
library(rtweet)
library(tidyverse)
library(tidytext)
library(lsa)  # stop words
library(SnowballC)  # stemming
```

```{r}
data(sentiws, package = "pradadata")
```

First, authenticate with the API and download the tweets:

```{r eval=FALSE}
source("/Users/sebastiansaueruser/credentials/hate-speech-analysis-v01-twitter.R")
auth <- rtweet_app(bearer_token = Bearer_Token)
```

```{r eval=FALSE}
# n = 1e4 per account, as required by the exercise (k = 4)
tweets_to_kl <- search_tweets("@karl_lauterbach", n = 1e4, include_rts = FALSE)
#write_rds(tweets_to_kl, file = "tweets_to_kl.rds", compress = "gz")
tweets_to_ms <- search_tweets("@Markus_Soeder", n = 1e4, include_rts = FALSE)
#write_rds(tweets_to_ms, file = "tweets_to_ms.rds", compress = "gz")
```

```{r echo=FALSE}
tweets_to_kl_raw <- read_rds(file = "/Users/sebastiansaueruser/datasets/Twitter/tweets_to_karl_lauterbach.rds")
tweets_to_ms_raw <- read_rds(file = "/Users/sebastiansaueruser/datasets/Twitter/tweets_to_ms.rds")
```

We wrap the preprocessing for each screen name in a function,
which makes things easier later on:

```{r}
prepare_tweets <- function(tweets){
  tweets %>%
    select(full_text) %>%
    # strip URLs before tokenizing, since the tokenizer would otherwise break them up
    mutate(full_text = str_remove_all(full_text, "https?://\\S+")) %>%
    unnest_tokens(output = word, input = full_text) %>%
    anti_join(tibble(word = lsa::stopwords_de)) %>%
    # drop tokens that consist only of digits
    filter(!str_detect(word, "^[[:digit:]]+$")) %>%
    drop_na()
}
```

Test:

```{r}
kl_prepped <- prepare_tweets(tweets_to_kl_raw)
head(kl_prepped)
```

```{r}
ms_prepped <- prepare_tweets(tweets_to_ms_raw)
head(ms_prepped)
```

Looks fine.

We wrap the sentiment analysis in a function as well:

```{r}
get_tweets_sentiments <- function(tweets){
  tweets %>%
    inner_join(sentiws) %>%
    group_by(neg_pos) %>%
    summarise(senti_avg = mean(value, na.rm = TRUE),
              senti_sd = sd(value, na.rm = TRUE),
              senti_n = n())
}
```

Test:

```{r}
kl_prepped %>%
  get_tweets_sentiments()
```

Test:

```{r}
tweets_to_kl_raw %>%
  prepare_tweets() %>%
  get_tweets_sentiments()
```

Looks fine.

We could also wrap both functions into one:

```{r}
prep_sentiments <- function(tweets) {
  tweets %>%
    prepare_tweets() %>%
    get_tweets_sentiments()
}
```

```{r}
tweets_to_kl_raw %>%
  prep_sentiments()
```

Okay, now we apply the function to each screen name, that is, to the tweets addressed to each screen name.

```{r}
tweets_list <- list(kl = tweets_to_kl_raw, ms = tweets_to_ms_raw)
```

```{r}
sentis <- tweets_list %>%
  map_df(prep_sentiments, .id = "id")
```

---

Categories:

- textmining
- twitter
- programming
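Because the real tweets require API credentials, the preprocessing and sentiment steps can also be tried offline. The following is a minimal sketch on invented data: `toy_tweets`, `toy_stopwords`, and `toy_lexicon` are hypothetical stand-ins for the downloaded tweets, `lsa::stopwords_de`, and `sentiws`, and the sentiment values are made up. Only the column names (`full_text`, `word`, `neg_pos`, `value`) follow the source above.

```r
library(dplyr)
library(tibble)
library(stringr)
library(tidytext)

# Invented example tweets (hypothetical data)
toy_tweets <- tibble(full_text = c(
  "Das ist eine gute Idee 2022",
  "Schlechte Politik, siehe https://example.org"
))

# Hypothetical stand-ins for the stop word list and the sentiment lexicon
toy_stopwords <- tibble(word = c("das", "ist", "eine", "siehe"))
toy_lexicon <- tibble(
  word    = c("gute", "schlechte"),
  neg_pos = c("pos", "neg"),
  value   = c(0.5, -0.6)
)

toy_tokens <- toy_tweets %>%
  # remove URLs while the text is still intact
  mutate(full_text = str_remove_all(full_text, "https?://\\S+")) %>%
  # one lowercased token per row
  unnest_tokens(output = word, input = full_text) %>%
  # remove stop words
  anti_join(toy_stopwords, by = "word") %>%
  # remove tokens that consist only of digits
  filter(!str_detect(word, "^[[:digit:]]+$"))

toy_sentiments <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%
  group_by(neg_pos) %>%
  summarise(senti_avg = mean(value), senti_n = n())
```

With real data, the same two steps would be applied per account and the resulting summaries compared side by side; tokens without a lexicon entry (here `idee` and `politik`) drop out in the inner join and do not enter the averages.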