Past Recording
The People, Politics, & Histories Behind Machine Learning Datasets
Thursday Oct 22 2020 23:30 GMT
Why This Is Interesting

Discussions of fairness in machine learning datasets typically focus on the content of those datasets and on how they (mis)represent certain groups of data subjects. Comparatively less attention has been paid to the histories, values, and norms that are embedded in the creation of these datasets. This paper addresses that gap in the literature, laying out a framework for understanding the labor embedded in dataset construction and thereby articulating new avenues for contesting these data.

Discussion Points

• What do the problems and limitations of benchmark machine learning datasets mean for machine learning models?
• Why is interdisciplinarity important to machine learning research?
• What can the machine learning community do to ensure appropriate responsibility for the “afterlife” of machine learning datasets?


• We must move beyond (only) focusing on the statistical properties of datasets, and move towards examining the contextual and contingent conditions of datasets’ construction
• Datasets act as infrastructure for machine learning research and development, which limits the avenues of contesting these datasets
• Genealogy provides a method of de-naturalizing data infrastructure, opening up new avenues of contestation and intervention

Time of Recording: Thursday Oct 22 2020 23:30 GMT