The People, Politics, & Histories Behind Machine Learning Datasets

Time: Thursday 22-Oct-2020 23:30 (This is a past event.)

Motivation / Abstract
Discussions of fairness in machine learning datasets typically focus on the *content* of those datasets, in how they (mis)represent certain groups of data subjects. Comparatively less attention has been paid to the histories, values, and norms that are embedded in the *creation* of these datasets. This paper remedies this gap in the literature, laying out a framework for understanding the labor embedded in dataset construction, and thereby articulating new avenues for contesting these data.
Questions Discussed
• What do the problems and limitations of benchmark machine learning datasets mean for machine learning models?
• Why is interdisciplinarity important to machine learning research?
• What can the machine learning community do to ensure appropriate responsibility for the "afterlife" of machine learning datasets?
Key Takeaways
• We must move beyond (only) focusing on the statistical properties of datasets, and move towards examining the contextual and contingent conditions of datasets' construction
• Datasets act as infrastructure for machine learning research and development, which limits the avenues of contesting these datasets
• Genealogy provides a method of de-naturalizing data infrastructure, opening up new avenues of contestation and intervention
Stream Categories:
 AI Ethics