[Reducing Gender Bias in Google Translate] Algorithmic Inclusion: A Scalable Approach to Reducing Gender Bias in Google Translate

Time: Wednesday 10-Jun-2020 23:30 (This is a past event.)

Discussion Facilitator:


Motivation / Abstract
Machine learning (ML) models for language translation can be skewed by societal biases reflected in their training data. One such example, gender bias, often becomes more apparent when translating between a gender-specific language and one that is less so. For instance, Google Translate historically translated the Turkish equivalent of “He/she is a doctor” into the masculine form, and the Turkish equivalent of “He/she is a nurse” into the feminine form.
Questions Discussed
- Sources of bias other than in training data
- How creating a masculine-to-feminine translation dataset would differ for a language other than English
- How this work would be extended from sentence to document level translation
- Other examples of post-editing applications, at Google or otherwise
- What other solutions were considered for reducing gender bias in translations
Key Takeaways
- Machine learning systems inherit bias from their training data, which in turn reflects societal biases. Even human evaluation of translations exposes bias, as some readers prefer masculine or feminine translations in certain situations.
- Developing a model that first detects gendered queries and then translates them into the other gendered versions is not scalable, in part because building a classifier to detect gender neutrality in each source language is very data-intensive. Post-editing, that is, rewriting the translated output with a sentence-level rewriter, is scalable and is the better approach, especially when translating from gender-neutral languages into English, since it does not require a gender-neutrality detector.
- Post-editing translations reduces bias by 95% across four translation pairs, as measured by the bias reduction metric.
- Melvin and the team at Google Research plan to extend this work from the sentence level to the document level as well, so be on the lookout for that research, hopefully coming out soon.
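
The post-editing idea from the takeaways above can be illustrated with a minimal sketch. This is not the system described in the talk: a small hand-written pronoun table (a hypothetical stand-in, as are all function names here) plays the role of the sentence-level rewriter, and the sketch assumes the default English translation is masculine. It only shows the overall shape: translate first, rewrite the English output, and offer both variants when the rewrite changes something.

```python
import re
from typing import List

# Hypothetical toy lookup table standing in for the sentence-level rewriter.
# It ignores ambiguous cases (e.g. "his" meaning "hers").
MASC_TO_FEM = {"he": "she", "him": "her", "his": "her", "himself": "herself"}

# Match any table entry as a whole word, regardless of case.
_PATTERN = re.compile(r"\b(" + "|".join(MASC_TO_FEM) + r")\b", re.IGNORECASE)


def rewrite_to_feminine(sentence: str) -> str:
    """Swap masculine pronouns for feminine ones, preserving capitalization."""
    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        replacement = MASC_TO_FEM[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, sentence)


def post_edit(default_translation: str) -> List[str]:
    """Return [feminine, masculine] variants if the rewrite changed the
    sentence; otherwise return the single original translation."""
    rewritten = rewrite_to_feminine(default_translation)
    if rewritten == default_translation:
        return [default_translation]
    return [rewritten, default_translation]


print(post_edit("He is a doctor."))
print(post_edit("The doctor is here."))
```

Because the rewrite operates only on the English output, nothing in this sketch needs to inspect the source language, which is the property the takeaways credit for the approach's scalability.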
Stream Categories:
 Algorithmic Inclusion