A study of the influence of the MeToo movement on women's representation in the media


#MeToo, a catalyst for change?


MeToo. Two words, a hashtag, and a civil rights movement with a tremendous impact on our society. Everything started on October 15th, 2017, with a viral Twitter hashtag from Alyssa Milano. The movement, initially created by Tarana Burke, spread all over the world, inviting women to break the silence on sexual harassment, violence, and discrimination via social networks. The media stepped in with widespread coverage and amplified the movement's influence. As a result, gender inequality and the unsafe climate created by certain members of society became a trending and polarizing topic across all media. Still, the size of MeToo's impact outside the realm of social media remains an open question. Using the Quotebank [1] dataset from 2015 to 2020, we investigate how significant that impact was.


The MeToo movement would not exist without social media, especially Twitter. Since October 2017, women from various parts of the world have been breaking their silence about the misconduct they suffered and trying to hold public figures accountable (mostly male politicians and entertainment celebrities). The movement transcended virtual spaces and took to the streets. The activists' goals are simple: women are not objects, and sexual harassment should be talked about. Not only should the past be brought to light, but no one should be afraid to speak up and fight for what is right.

As the Google Trends chart below shows, the emergence of MeToo (which did not exist as a viral hashtag before Alyssa Milano wrote her famous tweet) created a wave of searches on the topic of sexual harassment, which peaked at the maximum interest index during the last week of November 2017. The index then returned to its previous level, suggesting that the MeToo movement might not have had a large long-term influence outside of social media.

This is why we will try to answer the following questions:

Did MeToo have an impact on the place of women in media? And if so, was it all positive, or did it have negative repercussions?


Women speak much less than men.

After a first look at the data, we can see that the percentage of women speakers in the whole dataset varies between 15% and 20%. We also see that the percentage starts rising slightly in late 2017, which coincides with the first MeToo tweet. We could then deduce that MeToo gave women more space in the media, and our study would be done...

It is of course not that simple: even though the rise in percentage is statistically significant, we cannot draw any conclusions this early.
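The story does not say which test was used to establish significance. As one plausible option, a two-proportion z-test compares the share of women speakers before and after a cut-off date. The sketch below is only an illustration under that assumption; the function name and the counts are placeholders, not values from Quotebank.

```python
import math


def two_proportion_z_test(women_a, total_a, women_b, total_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = women_a / total_a
    p_b = women_b / total_b
    # Pooled proportion under the null hypothesis that both shares are equal.
    pooled = (women_a + women_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts (NOT taken from Quotebank): quotes by women / all quotes,
# before and after the cut-off date.
z, p = two_proportion_z_test(women_a=1560, total_a=10000,
                             women_b=1900, total_b=10000)
print(f"z = {z:.2f}, p = {p:.2e}")
```

A small p-value only says that the two shares differ; it says nothing about what caused the difference, which is why the conclusion has to wait.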

When speaking about women in general or about MeToo, men hold a majority of about 60%.

Distribution of the percentage of women speakers in Quotebank and other subdatasets

Across the different subsets of Quotebank, this graph shows that the share of women speakers has generally increased:

  • The red line fitting the "General" plot roughly goes from 15.6% to 19%.
  • The blue line fitting the "When talking about women" plot goes from 38% to 43%.
  • The orange line fitting the "When talking about MeToo" plot goes from 39.5% to 43.7%.
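The percentages and fitted lines listed above could be computed roughly as follows. This is a minimal sketch: the `(period, gender)` input format and the `"female"` label are assumptions, and ordinary least squares is used here because the story does not state how its lines were fitted. The toy quotes are placeholders.

```python
from collections import defaultdict


def women_share_by_period(quotes):
    """Return {period: percentage of quotes whose speaker is a woman}."""
    totals = defaultdict(int)
    women = defaultdict(int)
    for period, gender in quotes:
        totals[period] += 1
        if gender == "female":
            women[period] += 1
    return {p: 100.0 * women[p] / totals[p] for p in sorted(totals)}


def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy input: (period, speaker gender) pairs; placeholder values only.
quotes = [
    ("2015", "male"), ("2015", "male"), ("2015", "male"), ("2015", "female"),
    ("2017", "male"), ("2017", "male"), ("2017", "female"), ("2017", "female"),
    ("2019", "male"), ("2019", "female"), ("2019", "female"), ("2019", "female"),
]
shares = women_share_by_period(quotes)
slope, intercept = fit_line(list(shares.values()))
start = intercept
end = intercept + slope * (len(shares) - 1)
print(shares)
print(f"fitted line goes from {start:.1f}% to {end:.1f}%")
```

The two endpoints of the fitted line are what the bullets above report for each subset.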

    OK, but what do these men have so much to talk about?

    To find out, we trained a machine learning model on social media data, namely from Reddit, and measured how accurately it can classify a sentence as misogynistic or not. When applying it to Quotebank's subdatasets (talking about women or about MeToo), we get a very low ratio of misogynistic quotes to total quotes for both, with a slightly higher ratio when MeToo is the subject.

    What we had actually created was a detector of extremely sexist and rude quotes: a couple of months was not enough for us to refine the model until it could detect subtle sexism, which could be a project in itself. The model is still somewhat useful for extracting sexist quotes on the fly as examples, such as:

    "That's all they do is bitch, moan, and complain... What happened to you today, sweetheart, did they not chill your rosé? Was the trolley not running down at the mall?" - Bill Burr

    "Feminism causes women to hate men, kill their children, become witches, whores & lesbians." - Alex Hall

    "I am a rapist now. I would never rape you because you do not deserve it… slut!" - Jair Bolsonaro

    Running a sentiment analysis...

    Analysing how sentiment varies across categories may help us understand how certain groups of people reacted to the MeToo movement, and how the overall sentiment around women changed. The sentiment analysis was run on both subdatasets, and we separated quotes by the speakers' gender and age. We also extracted the average sentiment around two key dates for the movement: the 15th of October (date of the initial tweet) and the 8th of March (International Women's Day).
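The grouping step described above can be sketched as follows. The sketch assumes each quote already carries a compound sentiment score in [-1, 1] (the kind of score VADER-style analysers produce; the story does not name its tool). The age cut-offs, field names and scores are illustrative assumptions, not the ones used in the study.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Bucket a speaker's age; these cut-offs are illustrative assumptions."""
    if age < 35:
        return "young"
    if age < 60:
        return "middle-aged"
    return "old"


def mean_compound_by_group(rows):
    """Average compound score per (year, gender, age group)."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row["year"], row["gender"], age_group(row["age"]))
        buckets[key].append(row["compound"])
    return {key: mean(scores) for key, scores in buckets.items()}


# Placeholder rows; the compound scores are made up for illustration.
rows = [
    {"year": 2015, "gender": "female", "age": 30, "compound": 0.20},
    {"year": 2015, "gender": "female", "age": 28, "compound": 0.40},
    {"year": 2019, "gender": "female", "age": 31, "compound": 0.50},
    {"year": 2019, "gender": "male", "age": 65, "compound": 0.10},
]
averages = mean_compound_by_group(rows)
for key, value in sorted(averages.items()):
    print(key, round(value, 2))
```

Comparing the 2015 and 2019 averages of the same (gender, age group) pair is what gives the pre/post contrasts reported in this section.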

    Young people are more positive than old people.

    The subject of MeToo brings down the sentiment from clearly positive to neutral and slightly negative.

    The sentiment of middle-aged and old men did not change between pre-MeToo (2015) and post-MeToo (2019). All women and younger men became more positive.

    All the claims above were tested during our work and are statistically significant.

    Mean sentiment compound value for different categories


    What words appear the most in the quotes?

    Shown below are 3 word clouds generated from the subdataset where the subject revolves around women.

    First, 2015, before the MeToo hashtag existed. The main words we see are related to family, while "sexual assault" and "domestic violence" appear in small print.

    Pre wordcloud

    Then, 2017, the year the movement took off. New terms appear (e.g. "sexual harassment", "sexual misconduct"...). Such words and bigrams are expected to blow up, since this is the peak of the movement; any other result would have been surprising.

    During wordcloud

    Finally, 2019, when the movement is supposed to be past its peak. The main words go back to family-related ones, but keywords related to MeToo are now more present than in 2015. This is a great result: even though the main subject around women is back to where it was, we can see that the movement influenced the content of the quotes after 2017.

    Post wordcloud
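Word clouds like the three above are driven by term frequencies. The sketch below shows one way to count single words and bigrams per year, assuming a simple regex tokenizer and a tiny stop-word list; both are simplifications, and the example quotes are invented placeholders, not Quotebank data.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "for", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]


def term_frequencies(quotes):
    """Count single words and adjacent word pairs across all quotes.

    Bigrams are formed after stop-word removal, so they pair the
    remaining neighbouring tokens.
    """
    counts = Counter()
    for quote in quotes:
        tokens = tokenize(quote)
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


# Invented placeholder quotes, grouped by year.
quotes_by_year = {
    2015: ["Domestic violence is a family issue.", "She cares for the family."],
    2017: ["Sexual harassment was ignored.", "Sexual harassment is a crime."],
}
for year, quotes in quotes_by_year.items():
    print(year, term_frequencies(quotes).most_common(3))
```

The resulting counts can be handed to any word cloud renderer, which scales each term by its frequency.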

    Our findings agree with an article published by The Economist, which can be found here. In the image below, the graph on the right shows how mentions of "sexual harassment" in news articles peaked between 2017 and 2018 and then dropped, but not back to their pre-2017 level.


    What topics appear the most in the quotes?

    With the help of NLP libraries, we extracted the main topics present in Quotebank when the subject revolves around women. For this, tokenization and lemmatisation were applied, and we took into account unigrams and bigrams only. Some topics related to MeToo appear (e.g. Harvey Weinstein, sexual abuse, marriage equality...). This is promising and shows a certain influence of the movement. However, the evidence is not strong enough to claim that MeToo had such a big impact that it became a full topic in and of itself. The closest our clustering came is cluster 8, where, for λ = 0.85, we can infer a topic centered around men-women relations, where men can sometimes go too far.
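If the λ above is the relevance weight used by LDAvis-style topic viewers (an assumption, since the story does not name its tool), it balances how probable a term is within a topic against how specific the term is to that topic: relevance = λ · log p(w|t) + (1 − λ) · log(p(w|t) / p(w)). The sketch below ranks terms under that definition; all probabilities are placeholders.

```python
import math


def relevance(p_word_given_topic, p_word, lam=0.85):
    """LDAvis-style relevance of a term for a topic.

    lam = 1 ranks terms by their probability within the topic;
    lam = 0 ranks them by lift, i.e. p(w|t) / p(w).
    """
    return (lam * math.log(p_word_given_topic)
            + (1 - lam) * math.log(p_word_given_topic / p_word))


def top_terms(topic_dist, corpus_dist, lam=0.85, k=3):
    """Return the k terms with the highest relevance for one topic."""
    scored = {
        word: relevance(p, corpus_dist[word], lam)
        for word, p in topic_dist.items()
        if p > 0
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Placeholder probabilities: p(w|t) for one topic and p(w) over the corpus.
topic = {"woman": 0.30, "man": 0.25, "harassment": 0.05, "people": 0.40}
corpus = {"woman": 0.10, "man": 0.10, "harassment": 0.005, "people": 0.30}

print(top_terms(topic, corpus, lam=0.85))  # favours frequent terms
print(top_terms(topic, corpus, lam=0.0))   # favours topic-specific terms
```

With a high λ such as 0.85, common words dominate the ranking, so a cluster is read mostly through its most frequent terms.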


    To Conclude...

    Going back to our main question: Did MeToo have any impact on the place of women in media?

    The analyses above make a strong argument that MeToo did have an influence. However, this has to be taken with a grain of salt. The rise in women speakers, the shift in sentiment, and the slight change in word usage might be a result of what transpired in October 2017, but these results and the explosion of the MeToo movement could also both be caused by something outside the scope of our study and Quotebank.

    Quotebank dataset

    The data explored in this story comes from the Quotebank dataset, a collection of 178 million quotations from various English-language news articles. Every quote was sanitized and analyzed by a machine learning model named Quobert, based on BERT [2], the NLP model developed by Google. For each quote captured from 2008 to 2020, Quotebank contains the probabilities of its potential speakers. It is publicly available here.

    Data Exclusion Rules

    Data included in this story may differ slightly from other published reports due to certain data decisions. For the purposes of these analyses, the Quotebank dataset was processed as follows:

    1. The whole speaker-attributed dataset from 2008 to 2020 is around 32.8 GB in size, which is too much to handle for a datastory at our scale. Moreover, the MeToo movement did not take off until the latter part of the 2010s, so using all data since 2008 would be a waste of memory and time. We therefore kept only the quotes from 2015 to 2020.

    2. Since Quotebank assigns several candidate speakers to a single quote, each with a different degree of certainty, only the speaker with the highest probability is kept for the datastory; the other speakers are discarded.

    3. Quotebank sometimes attributes quotes to an "Unknown" speaker. The quote is discarded if "Unknown" is the only possibility. If there are multiple candidates, the most certain known speaker is kept and the rest are discarded.

    4. The quotes were heavily filtered. Since the story revolves around MeToo and the place of women in media, 2 subdatasets were created with keyword-based filtering:

  • A subdataset representing quotes where women are mentioned or talked about.
  • Another, much smaller subdataset built only from MeToo-related keywords, in order to obtain quotes that are more representative of the movement.
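Rules 2 to 4 above can be sketched as a small preprocessing pass. Everything named here is an assumption made for illustration: the record layout (`quotation`, `candidates` as name/probability pairs), the `"Unknown"` label, the helper names, and the keyword lists, which the story does not publish.

```python
import re


def most_likely_known_speaker(candidates, unknown="Unknown"):
    """Return the known speaker with the highest probability, or None."""
    known = [(name, float(prob)) for name, prob in candidates if name != unknown]
    if not known:
        return None  # only "Unknown" was proposed: the quote is discarded
    return max(known, key=lambda pair: pair[1])[0]


def build_keyword_filter(keywords):
    """Compile a case-insensitive whole-word matcher for a keyword list."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


# Hypothetical keyword lists; the real ones are not given in the story.
WOMEN_FILTER = build_keyword_filter(["woman", "women", "she", "her"])
METOO_FILTER = build_keyword_filter(["metoo", "me too", "sexual harassment"])


def prepare(records):
    """Keep one known speaker per quote and tag the two subdatasets."""
    for record in records:
        speaker = most_likely_known_speaker(record["candidates"])
        if speaker is None:
            continue
        text = record["quotation"]
        yield {
            "quotation": text,
            "speaker": speaker,
            "about_women": bool(WOMEN_FILTER.search(text)),
            "about_metoo": bool(METOO_FILTER.search(text)),
        }


# Placeholder records with made-up speakers and probabilities.
records = [
    {"quotation": "Women deserve to be heard.",
     "candidates": [["Unknown", "0.6"], ["Speaker A", "0.3"], ["Speaker B", "0.1"]]},
    {"quotation": "The weather is nice.",
     "candidates": [["Unknown", "1.0"]]},
    {"quotation": "Sexual harassment must stop.",
     "candidates": [["Speaker C", "0.9"], ["Unknown", "0.1"]]},
]
prepared = list(prepare(records))
for row in prepared:
    print(row)
```

Keyword matching on whole words keeps the filter cheap enough to run over the full 2015 to 2020 slice, at the cost of missing quotes that discuss the topic without using any listed keyword.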
    References

    [1] Vaucher, T., Spitz, A., Catasta, M. & West, R. (2021). Quotebank: A Corpus of Quotations from a Decade of News. In WSDM ’21, March 8–12, 2021, Virtual Event, Israel.

    [2] Devlin, J., Chang, M.W., Lee, K. & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT.