An interesting hypothesis has surfaced: every non-hyperbolic tweet from Donald Trump’s account comes from an iPhone (his staff), and every hyperbolic tweet comes from an Android (Trump himself).
For example, when Trump wishes the Olympic team good luck, it’s from an iPhone. When he’s insulting Iraq, it’s from an Android. Is this a legitimate pattern? Let’s do some math following David Robinson’s approach.
First, we retrieve the necessary data and clean it up:
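# Load the Twitter client and data-wrangling packages
library(dplyr)
library(purrr)
library(twitteR)
# Authenticate with credentials stored in R options
setup_twitter_oauth(getOption("twitter_consumer_key"),
                    getOption("twitter_consumer_secret"),
                    getOption("twitter_access_token"),
                    getOption("twitter_access_token_secret"))
# Fetch up to 3,200 recent tweets and convert them to a data frame
trump_tweets <- userTimeline("realDonaldTrump", n = 3200)
trump_tweets_df <- tbl_df(map_df(trump_tweets, as.data.frame))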
library(tidyr)
# Keep the relevant columns and pull the device name out of the source HTML
tweets <- trump_tweets_df %>%
  select(id, statusSource, text, created) %>%
  extract(statusSource, "source", "Twitter for (.*?)<") %>%
  filter(source %in% c("iPhone", "Android"))
Overall, this includes 628 tweets from the iPhone and 762 tweets from the Android.
Next, let’s see whether there are any patterns in time of day.
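The `extract()` step above depends on the regular expression `Twitter for (.*?)<` capturing the device name from the HTML in `statusSource`. Here is a minimal base R sketch of that capture. The three sample strings are assumptions about what the field looks like, not rows from the actual data set.

```r
# Sample statusSource values (assumed format, for illustration only)
status_source <- c(
  '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
)

# Same pattern as in the extract() call: lazily capture everything
# between "Twitter for " and the next "<"
pattern <- "Twitter for (.*?)<"
matches <- regmatches(status_source,
                      regexec(pattern, status_source, perl = TRUE))

# Each match holds the full match plus the capture group;
# strings that do not match yield NA
device <- vapply(matches,
                 function(m) if (length(m) == 2) m[2] else NA_character_,
                 character(1))

# Mirror the filter() step: keep only the two devices of interest
keep <- device %in% c("iPhone", "Android")
print(data.frame(device, keep))
```

Sources that do not match the pattern, such as the web client, produce `NA` and are dropped by the filter.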
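library(ggplot2)
library(lubridate)
library(scales)
# Share of each device's tweets posted in each hour of the day
tweets %>%
  count(source, hour = hour(with_tz(created, "EST"))) %>%
  # Group by source explicitly so each device's percentages sum to 100%
  group_by(source) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  ggplot(aes(hour, percent, color = source)) +
  geom_line() +
  scale_y_continuous(labels = percent_format()) +
  labs(x = "Hour of day (EST)",
       y = "% of tweets",
       color = "")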
Trump on the Android does a lot more tweeting in the morning, while the campaign posts from the iPhone more in the afternoon and early evening.
Let’s do some sentiment analysis to see if we can back up the hypothesis: are Trump’s tweets more negative on Android?
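# Assumes tweet_words (one row per word, with id and source) and nrc
# (a word-to-sentiment lexicon) were built in an earlier step not shown here
# Total number of words per source
sources <- tweet_words %>%
  group_by(source) %>%
  mutate(total_words = n()) %>%
  ungroup() %>%
  distinct(id, source, total_words)
# Count sentiment-bearing words per source, filling missing combinations with 0
by_source_sentiment <- tweet_words %>%
  inner_join(nrc, by = "word") %>%
  count(sentiment, id) %>%
  ungroup() %>%
  complete(sentiment, id, fill = list(n = 0)) %>%
  inner_join(sources) %>%
  group_by(source, sentiment, total_words) %>%
  summarize(words = sum(n)) %>%
  ungroup()
head(by_source_sentiment)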
## # A tibble: 6 x 4
##    source    sentiment total_words words
##     <chr>        <chr>       <int> <dbl>
## 1 Android        anger        4901   321
## 2 Android anticipation        4901   256
## 3 Android      disgust        4901   207
## 4 Android         fear        4901   268
## 5 Android          joy        4901   199
## 6 Android     negative        4901   560
library(broom)
# For each sentiment, run a Poisson test comparing word rates between sources
sentiment_differences <- by_source_sentiment %>%
  group_by(sentiment) %>%
  do(tidy(poisson.test(.$words, .$total_words)))
sentiment_differences
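To make the Poisson test above concrete, here is a minimal base R sketch for a single sentiment. The Android figures (321 "anger" words out of 4901) come from the table above. The iPhone figures are hypothetical placeholders, since the post does not print them.

```r
# Android counts taken from the by_source_sentiment output above
android_words <- 321
android_total <- 4901

# iPhone counts are HYPOTHETICAL, chosen only to illustrate the call
iphone_words <- 200
iphone_total <- 4500

# Two-sample Poisson test: compares the rate of "anger" words per word written.
# The first element of each vector is Android, so the estimate is the
# Android rate divided by the iPhone rate.
result <- poisson.test(c(android_words, iphone_words),
                       c(android_total, iphone_total))

rate_ratio <- unname(result$estimate)
print(rate_ratio)
print(result$conf.int)  # 95% confidence interval for the rate ratio
```

With these placeholder inputs the rate ratio is roughly 1.47, which would mean about 47% more "anger" words per word on Android. The real ratio depends on the actual iPhone counts.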
Let’s visualize the difference with a 95% confidence interval:
Trump’s Android account uses 40-80% more words related to disgust, sadness, fear, anger, and other “negative” sentiments than the iPhone account does. This looks pretty convincing.
Another key observation is that Trump’s iPhone tweets seem much more likely to include a picture or a link, which fits an “announcement narrative” from his campaign. Let’s see if this is true.
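library(stringr)
# Skip quoted tweets, then count tweets with and without a t.co link per source
tweet_picture_counts <- tweets %>%
  filter(!str_detect(text, '^"')) %>%
  count(source,
        picture = ifelse(str_detect(text, "t.co"),
                         "Picture/link", "No picture/link"))
# Side-by-side bars: tweets with and without a picture/link for each device
ggplot(tweet_picture_counts, aes(source, n, fill = picture)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "", y = "Number of tweets", fill = "")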
As it turns out, tweets from the iPhone are 38 times as likely to contain either a picture or a link as tweets from the Android.
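A ratio like this can be computed by dividing the share of iPhone tweets that contain a picture or link by the corresponding Android share. The base R sketch below shows the arithmetic on a table shaped like `tweet_picture_counts`. The counts are hypothetical placeholders rather than the real data, so the resulting ratio is not the figure quoted above.

```r
# HYPOTHETICAL counts in the same layout as tweet_picture_counts
counts <- data.frame(
  source  = c("Android", "Android", "iPhone", "iPhone"),
  picture = c("No picture/link", "Picture/link",
              "No picture/link", "Picture/link"),
  n       = c(95, 5, 40, 60)
)

# Total tweets per source, and tweets with a picture/link per source
with_link <- counts$picture == "Picture/link"
totals <- tapply(counts$n, counts$source, sum)
linked <- tapply(counts$n[with_link], counts$source[with_link], sum)

# Share of each source's tweets that contain a picture or link
share <- linked / totals[names(linked)]

# How many times as likely an iPhone tweet is to contain a picture/link
relative_likelihood <- unname(share["iPhone"] / share["Android"])
print(relative_likelihood)
```

Comparing shares rather than raw counts matters here because the two devices account for different numbers of tweets.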
From time of day, sentiment, and tweet format, the argument that Trump’s own tweets come only from the Android seems pretty convincing!