Ellie Frymire’s journey began when she discovered the joy of puzzles as a child. This love of puzzles propelled her to pursue mathematics throughout her education, and she graduated from Villanova University in 2013 with a bachelor’s degree in mathematics. She then began her career as a consultant for Deloitte Consulting. After four years of seeing the truth in the data often misunderstood, she discovered that there is a greater challenge when it comes to our data: insights are meaningless if they can’t be understood by everyone. Ellie recognized the power of meaningful design and intentional presentation of data, and how it can shape understanding. This desire to tell the story of data brought her to Parsons School of Design, where she obtained her Master of Science in Data Visualization.
Ellie’s project #metoo is an exploration in the effective presentation of data. The #metoo movement is an increasingly complex conversation, and this project aims to help the viewer understand the overarching Twitter sentiment. After scraping and analyzing nearly 1.4 million tweets, she uses design and presentation of data to answer the question “what are people really saying about #metoo?”
In the wake of several women coming forward against Harvey Weinstein, Alyssa Milano started an online movement behind the hashtag #metoo on October 15th, 2017. She posted,
“If you’ve been sexually harassed or assaulted write ‘me too’ as a reply to this tweet” (@Alyssa_Milano).
What followed was a flood of stories, building a community of support, natively and primarily through social media. The movement encouraged more women to come forward — not only validating the experience of victims, but exposing more perpetrators beyond Weinstein. But is that all that was said within #metoo? This project explores the text of tweets from the 6 months following the birth of the digitally native social movement. By using unsupervised k-means cluster analysis, we can uncover organic themes. The project aims to answer the question: “what are people really saying with #metoo?”
#metoo was shared millions of times from countries all over the world. It sparked new hashtags in other languages, like #balancetonporc and #yotambien. The data used in this project was scraped from the public Twitter search page. The only requirements were that the tweet used the hashtag #metoo and was posted between October 14th, 2017 and April 14th, 2018. The result was nearly 1.4 million tweets: 1,392,076, to be exact. The bar chart below shows tweets per day on a log scale over those 6 months. There are clear peaks for the first wave of #metoo tweets (October 16th), the naming of the Silence Breakers as Time’s Person of the Year (December 5th), the Golden Globes and the announcement of the #timesup movement (January 7th), and the Oscars (March 4th).
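As a rough illustration of how such a chart can be prepared, the sketch below filters tweets by hashtag and date window, counts them per day, and takes the base-10 logarithm of each count. The record layout (`created_at`, `text`), the function names, and the years in the date window are assumptions made for this example, not details taken from the project.

```python
import math
from collections import Counter
from datetime import date, datetime

# Collection window; the years are assumed from the October 2017 start date.
START, END = date(2017, 10, 14), date(2018, 4, 14)

def in_scope(tweet):
    """True if the tweet uses #metoo and falls inside the window."""
    day = datetime.fromisoformat(tweet["created_at"]).date()
    return "#metoo" in tweet["text"].lower() and START <= day <= END

def daily_log_counts(tweets):
    """Map each day to log10(number of in-scope tweets on that day)."""
    counts = Counter(
        datetime.fromisoformat(t["created_at"]).date()
        for t in tweets
        if in_scope(t)
    )
    return {day: math.log10(n) for day, n in sorted(counts.items())}

# Hypothetical sample records, not real data.
sample = (
    [{"created_at": "2017-10-16T09:00:00", "text": "#MeToo"}] * 100
    + [{"created_at": "2017-12-05T12:00:00", "text": "#metoo silence breakers"}] * 10
    + [{"created_at": "2018-05-01T00:00:00", "text": "#metoo outside the window"}]
    + [{"created_at": "2017-11-01T00:00:00", "text": "no hashtag here"}]
)
heights = daily_log_counts(sample)
```

On a log scale, a day with 100 tweets is only twice as tall as a day with 10, which keeps the quieter days visible next to the largest peaks.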
K-means clustering is an unsupervised machine learning process that uses the input to find natural groups in the data. In this case, the only information used to create these groups was the words in the tweets. For example, if the word “trump” or “vote” was used, the tweet was grouped with other tweets that use those words as well. This process is critical for this project because it does not require human intervention. Through k-means clustering, we can let the words “speak for themselves” and create groups just by the nature of the words within.
The roughly 1.4 million tweets in this dataset are analyzed using “bag-of-words”, a process which identifies unique words and counts their occurrences across the entire corpus. Because of the size of the data, 26,193,288 unique words were found. To make this more manageable, the clustering process used only a small portion of them: a word must be present in at least 0.5% of the corpus and cannot be present in more than 99% of it. That narrowed the total down to 348 words.
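The filtering step can be sketched in a few lines of plain Python. This is a minimal illustration, assuming that “present in X% of the corpus” means the share of tweets containing the word (its document frequency) and that tweets are split on whitespace; `build_vocabulary` and `to_vectors` are hypothetical helper names, and the toy example uses a much higher lower threshold than 0.5% so the effect is visible on four tweets.

```python
from collections import Counter

def build_vocabulary(tweets, min_df=0.005, max_df=0.99):
    """Keep words whose share of tweets lies between min_df and max_df."""
    tokenized = [set(t.lower().split()) for t in tweets]
    doc_freq = Counter(word for words in tokenized for word in words)
    n = len(tokenized)
    return sorted(w for w, c in doc_freq.items() if min_df <= c / n <= max_df)

def to_vectors(tweets, vocabulary):
    """Binary bag-of-words: 1 if the word occurs in the tweet, else 0."""
    rows = []
    for t in tweets:
        words = set(t.lower().split())
        rows.append([1 if w in words else 0 for w in vocabulary])
    return rows

# Hypothetical toy corpus, not real data.
tweets = [
    "#metoo vote for change",
    "#metoo please vote",
    "#metoo thank you for sharing",
    "#metoo sharing my story",
]
vocab = build_vocabulary(tweets, min_df=0.5, max_df=0.99)
vectors = to_vectors(tweets, vocab)
print(vocab)    # ['for', 'sharing', 'vote']
print(vectors)  # [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
```

The hashtag itself appears in every tweet, so the upper threshold removes it, while words that appear only once fall below the lower threshold.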
The clustering process considers whether each word was included in a tweet, as well as its relationships to other words. The result is 425 clusters, each containing a group of tweets.
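To make the grouping step concrete, here is a small, self-contained sketch of k-means (Lloyd's algorithm) on binary word vectors. It is not the project's actual implementation: the deterministic farthest-first starting points are a simplification chosen so the example is reproducible, and the five-word vocabulary and four vectors are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from the centroids chosen so far (a simplifying assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iter=100):
    """Alternate between assigning points and updating centroids."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Columns (hypothetical vocabulary): moore, trump, vote, story, thank
points = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1]
```

The two tweets that share “trump” and “vote” land in one cluster and the two that share “story” land in the other, without any labels being supplied.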
The final clusters vary in size. The largest cluster, with over 81,000 tweets, serves as a “catch-all” for tweets without an obvious direction or intention. The remaining clusters range in size from 17,000 tweets down to 32. Hover over a cluster to find out more, including its size, name, and top 10 words.
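One simple way to produce such a per-cluster summary is sketched below. It assumes that “top words” means the words appearing in the most tweets of a cluster, which the project does not state explicitly; `summarize_clusters`, the `ignore` parameter, and the sample tweets and labels are all hypothetical.

```python
from collections import Counter, defaultdict

def summarize_clusters(tweets, labels, top_n=10, ignore=frozenset({"#metoo"})):
    """Return the size and most frequent words for each cluster label."""
    sizes = Counter(labels)
    word_counts = defaultdict(Counter)
    for tweet, label in zip(tweets, labels):
        # Count each word once per tweet, skipping the shared hashtag.
        word_counts[label].update(set(tweet.lower().split()) - ignore)
    return {
        label: {
            "size": size,
            "top_words": [w for w, _ in word_counts[label].most_common(top_n)],
        }
        for label, size in sizes.items()
    }

# Hypothetical tweets and cluster assignments, not real data.
tweets = [
    "#metoo vote trump",
    "#metoo please vote moore",
    "#metoo thank you for sharing",
    "#metoo thank you",
    "#metoo thank goodness",
]
labels = [0, 0, 1, 1, 1]
summary = summarize_clusters(tweets, labels, top_n=2)
print(summary[0]["size"], summary[1]["size"])  # 2 3
print(summary[1]["top_words"])                 # ['thank', 'you']
```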
Although each individual cluster is interesting, 425 is quite a lot to parse through. By applying a qualitative lens and examining the top words and the top tweets in each cluster, we can find some interesting themes brought to life from the data. The following are just a few of the themes that arose from the clusters. Each theme comprises anywhere from 2 to 10 individual clusters. Each circle represents a single tweet, and the size of the circle represents the tweet’s engagement. The top 1000 tweets from each theme are shown here, but there are many more in each cluster.
Although #metoo was born out of experiences in Hollywood, its effect reverberated well beyond. In these six clusters, the tweets discuss #metoo in politics from both sides of the aisle.
Top words: vote, please, moore, democrat, president, american, donald, end, country, trump.