Introduction to Data-Centric AI

IAP 2023

Typical machine learning classes teach techniques to produce effective models for a given dataset. In real-world applications, data is messy, and improving the model is not the only way to get better performance. You can also improve the dataset itself rather than treating it as fixed. Data-Centric AI (DCAI) is an emerging science that studies techniques to improve datasets, which is often the best way to improve performance in practical ML applications. Good data scientists have long done this manually through ad hoc trial and error and intuition; DCAI instead treats data improvement as a systematic engineering discipline.

This is the first-ever course on DCAI. This class covers algorithms to find and fix common issues in ML data and to construct better datasets, concentrating on data used in supervised learning tasks like classification. All material taught in this course is highly practical and focused on impactful aspects of real-world ML applications, rather than on the mathematical details of how particular models work. You can take this course to learn practical techniques not covered in most ML classes, which will help mitigate the “garbage in, garbage out” problem that plagues many real-world ML applications.
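As a rough illustration of what “finding issues in ML data” can mean for classification, here is a minimal sketch of one simple heuristic: flag examples whose given label receives a low predicted probability from a model. It assumes you already have out-of-sample predicted probabilities (for example, from cross-validation), so the model has not simply memorized a wrong label. The function names and the 0.5 threshold are hypothetical choices made for this example; this is not the course’s own algorithm or any library’s API.

```python
from typing import List, Sequence


def self_confidence(labels: Sequence[int], pred_probs: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probability the model assigns to each example's *given* label."""
    return [row[y] for y, row in zip(labels, pred_probs)]


def flag_suspect_labels(
    labels: Sequence[int],
    pred_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, chosen only for illustration
) -> List[int]:
    """Return indices of examples whose label looks doubtful, most suspicious first.

    Assumes pred_probs are out-of-sample, i.e. each row was produced by a model
    that did not train on that example.
    """
    scores = self_confidence(labels, pred_probs)
    suspects = [i for i, s in enumerate(scores) if s < threshold]
    # Lowest self-confidence first: these are the best candidates for manual review.
    return sorted(suspects, key=lambda i: scores[i])


if __name__ == "__main__":
    # Toy 3-class data: example 2 is labeled 0, but the model strongly predicts class 2.
    labels = [0, 1, 0, 2]
    pred_probs = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.05, 0.15, 0.80],
        [0.30, 0.30, 0.40],
    ]
    print(flag_suspect_labels(labels, pred_probs))  # prints [2, 3]
```

A flagged example is not necessarily mislabeled; the ranking only tells you which examples to inspect (and possibly relabel or remove) first, which is the data-centric alternative to tweaking the model.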

Flawed Data (comic inspired by XKCD 2494 “Flawed Data”)

Registration

Sign up for the IAP class by filling out this registration form.

Syllabus

Each lecture has an accompanying lab assignment, a hands-on programming exercise in Python / Jupyter Notebook. You can work on these on your own, in groups, and/or in office hours. This is a not-for-credit IAP class, so you don’t need to hand in homework.

General information

Dates: Tuesday, January 17 – Friday, January 27, 2023
Lecture: 6-120, 1pm–2pm
Office hours: 2-132, 3pm–5pm (every day, after lecture)

Staff: This class is co-taught by Anish, Curtis, Jonas, Cody, Ola, and Sharon.
Questions: Post on Piazza (preferred) or email us at dcai@mit.edu.
Twitter: Follow us at @dcai_course.

Prerequisites

Anyone is welcome to take this course, regardless of background. To get the most out of this course, we recommend that you:

Acknowledgements

We thank Elaine Mello / MIT Open Learning for making it possible for us to record lecture videos, Kate Weishaar / MIT Office of Experiential Learning for supporting this class, and Ashay Athalye / MIT SOUL for editing the lecture videos.


Licensed under CC BY-NC-SA.