Time: Thursday 22-Oct-2020 23:30 (This is a past event.)
Motivation / Abstract
Discussions of fairness in machine learning datasets typically focus on the *content* of those datasets, specifically on how they (mis)represent certain groups of data subjects. Comparatively less attention has been paid to the histories, values, and norms that are embedded in the *creation* of these datasets. This paper addresses this gap in the literature, laying out a framework for understanding the labor embedded in dataset construction, and thereby articulating new avenues for contesting these data.
• What do the problems and limitations of benchmark machine learning datasets mean for machine learning models?
• Why is interdisciplinarity important to machine learning research?
• What can the machine learning community do to ensure appropriate responsibility for the "afterlife" of machine learning datasets?
• We must move beyond (only) focusing on the statistical properties of datasets, and move towards examining the contextual and contingent conditions of datasets' construction
• Datasets act as infrastructure for machine learning research and development, which limits the avenues of contesting these datasets
• Genealogy provides a method of de-naturalizing data infrastructure, opening up new avenues of contestation and intervention