Schedule

Readings in normal font should be completed and annotated ahead of lecture.
Readings in italic provide optional additional depth on the material.
Assignments are listed on the day when I suggest you begin working on them.

Reading sources:

PSC: Lecture notes I’ve written for this course.
PDSH: The Python Data Science Handbook by Vanderplas (2016).
BHN: Fairness and Machine Learning: Limitations and Opportunities by Barocas, Hardt, and Narayanan (2023).

Week 1

Tue Feb. 13	Welcome!
	We introduce our topic and discuss how the course works.
	Learning Objectives: Getting Oriented	Reading: Course syllabus; Collaboration; Why I Don't Grade by Jesse Stommel	Notes: Welcome slides; Data, Patterns, and Models	Warmup: Set up your software.	Assignments: Math pre-assessment.
Thu Feb. 15	The Classification Workflow in Python
	We work through a simple, complete example of training and evaluating a classification model on a small data set.
	Learning Objectives: Navigation; Experimentation	Reading: PDSH: Data Manipulation with Pandas (through "Aggregation and Grouping")	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Meet the Palmer Penguins!	Assignments: Blog Post: Penguins

Week 2

Tue Feb. 20	Linear Score-Based Classification
	We study a fundamental method for binary classification in which data points are assigned scores. Scores above a certain threshold are assigned to one class; scores below are assigned to another.
	Learning Objectives: Theory; Experimentation	Reading: Linear Classifiers from MITx	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Graphing Decision Boundaries	Assignments: Reflective Goal-Setting (actual due date: 2/27)
Thu Feb. 22	Statistical Decision Theory and Automated Decision-Making
	We discuss the theory of making automated decisions based on a score function. We go into detail on thresholding, error rates, and cost-based optimization.
	Learning Objectives: Theory; Experimentation	Reading: PDSH: Introduction to Numpy	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Choosing a Threshold	Assignments: Blog Post: Whose Costs?
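
As a rough illustration of the two Week 2 topics, the sketch below scores points with a linear function, thresholds the scores into two classes, and then picks the candidate threshold with the lowest total cost. The weights, data points, and cost values are invented for illustration and are not taken from the course notes; the function names are hypothetical.

```python
def linear_score(x, w):
    """Score a data point x as the inner product of x with weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(scores, threshold):
    """Assign class 1 to scores above the threshold and class 0 otherwise.

    Treating a score exactly equal to the threshold as class 0 is an
    arbitrary choice made for this sketch.
    """
    return [1 if s > threshold else 0 for s in scores]


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Total cost of false positives and false negatives at a threshold."""
    preds = classify(scores, threshold)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return cost_fp * fp + cost_fn * fn


# Made-up weights, data, and labels (assumptions for illustration only).
w = [1.0, -1.0]
X = [[2.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.0, 2.0]]
y = [1, 1, 0, 0]

scores = [linear_score(x, w) for x in X]

# Try each observed score as a candidate threshold and keep the cheapest.
# Here a false negative is assumed to cost five times a false positive.
best_threshold = min(
    sorted(scores),
    key=lambda t: total_cost(scores, y, t, cost_fp=1.0, cost_fn=5.0),
)
```

Changing `cost_fp` and `cost_fn` can move the chosen threshold, which is one way to see why the question of whose costs are counted matters.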

Week 3

Tue Feb. 27	Auditing Fairness
	We introduce the topics of fairness and disparity in automated decision systems using a famous case study.
	Learning Objectives: Social Responsibility; Experimentation	Reading: BHN: Introduction; Machine Bias by Julia Angwin et al. for ProPublica	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Experiencing (Un)Fairness	Assignments: Reflective Goal-Setting due today
Thu Feb. 29	Statistical Definitions of Fairness in Automated Decision-Making
	We offer formal mathematical definitions of several natural intuitions of fairness, review how to assess them empirically on data in Python, and prove that two major definitions are incompatible with each other.
	Learning Objectives: Social Responsibility; Theory	Reading: BHN: Classification (ok to skip "Relationships between criteria" and below)	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Reading Check	Assignments: Blog Post: Bias Replication Study and/or Blog Post: Women in Data Science Conference

Week 4

Tue Mar. 05	Normative Theory of Fairness
	We discuss some of the broad philosophical and political positions that underlie the theory of fairness, and connect these positions to statistical definitions.
	Learning Objectives: Social Responsibility	Reading: BHN: Relative Notions of Fairness	Notes: Discussion guide shared in Slack	Warmup: COMPAS and Equality of Opportunity
Thu Mar. 07	Critical Perspectives: Interrogate Your Task
	We discuss several critical views that seek to move our attention beyond the fairness of algorithms and towards their role in sociotechnical systems. We center two questions: Who benefits from a given data science task? What tasks could we approach instead if our aims were to uplift the oppressed?
	Learning Objectives: Social Responsibility	Reading: Data Feminism: The Power Chapter by Catherine D'Ignazio and Lauren Klein; "The Digital Poorhouse" by Virginia Eubanks; "Studying Up: Reorienting the study of algorithmic fairness around issues of power" by Barabas et al.	Notes: Discussion guide shared in Slack	Warmup: Power, Data, and Studying Up	Assignments: Blog Post: Limitations of the Quantitative Approach

Week 5

Tue Mar. 12	Critical Perspectives: Interrogate Your Data
	We discuss the importance of understanding the context of data when planning and executing data science, and effectively communicating this context when sharing our findings.
	Learning Objectives: Social Responsibility	Reading: Data Feminism: The Numbers Don't Speak For Themselves by Catherine D'Ignazio and Lauren Klein; Datasheets for Datasets by Timnit Gebru et al.	Notes: Discussion guide shared in Slack	Warmup: Data Context and Data Sheets
Thu Mar. 14	Introduction to Model Training: The Perceptron
	We study the perceptron as an example of a linear model with a training algorithm. Our understanding of this algorithm and its shortcomings will form the foundation of our future explorations in empirical risk minimization.
	Learning Objectives: Theory	Reading: No reading today, but please be ready to put some extra time into the warmup. It may be useful to review our lecture notes on score-based classification and decision theory when completing the warmup.	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Linear Models, Perceptron, and Torch	Assignments: Blog Post: Implementing Perceptron

Week 6

Tue Mar. 19	Spring Break!
	Warmup: TBD
Thu Mar. 21	Spring Break!
	Warmup: TBD

Week 7

Tue Mar. 26	Convex Empirical Risk Minimization
	We introduce the framework of convex empirical risk minimization, which offers a principled approach to overcoming the many limitations of the perceptron algorithm.
	Learning Objectives: Theory	Reading: Convexity Examples by Stephen D. Boyles, pages 1-7 (ok to stop when we start talking about gradients and Hessians)	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Practice with Convex Functions	Assignments: Mid-semester reflection (actual due date: 4/2)
Thu Mar. 28	Gradient Descent
	We study a method for finding the minima of convex functions using techniques from calculus and linear algebra.
	Learning Objectives: Theory	Reading: No reading today, but please budget some extra time for the warmup.	Notes: Lecture notes; Live notes (Google Colab)	Warmup: A First Look at Gradient Descent	Assignments: Blog Post: Implementing Logistic Regression
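
For a sense of what Thursday's method looks like in code, here is a minimal sketch of gradient descent on a simple convex function. The function, step size, and iteration count are arbitrary choices made for illustration; they are not taken from the lecture notes or the warmup.

```python
def f(w):
    """A convex quadratic in two variables, minimized at w = (1, -2)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 2.0) ** 2


def grad_f(w):
    """Gradient of f, computed by hand from the formula above."""
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 2.0)]


def gradient_descent(grad, w0, learning_rate=0.1, num_steps=200):
    """Repeatedly step against the gradient, starting from w0."""
    w = list(w0)
    for _ in range(num_steps):
        g = grad(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


w_min = gradient_descent(grad_f, w0=[0.0, 0.0])
```

With this step size each coordinate's distance from the minimizer shrinks by a constant factor per step, so `w_min` ends up very close to `(1, -2)`. A step size that is too large for the function's curvature would make the iterates diverge instead.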

Week 8

Tue Apr. 02	Feature Maps and Regularization
	We re-introduce feature maps as a method for learning nonlinear decision boundaries, and add regularization to the empirical risk minimization problem in order to control the complexity of our learned models.
	Learning Objectives: Theory; Experimentation	Reading: No reading today -- please think hard about your project pitches!	Notes: Lecture notes; Live notes (Google Colab)	Warmup: Project Pitches	Assignments: Mid-semester reflection due today; Project proposal (actual due date: 4/9)
Thu Apr. 04	Linear Regression
	We introduce linear regression through the framework of convex empirical risk minimization.
	Learning Objectives: Theory	Reading: No additional reading, but you may need to open up your linear algebra textbook in order to complete the warmup.	Notes: Linear Regression; Live notes (Google Colab)	Warmup: Eigenvalues and Linear Systems

Week 9

Tue Apr. 09	Vectorization and Feature Engineering
	We illustrate the interplay of vectorization and feature engineering on image data.
	Learning Objectives: Experimentation; Implementation	Reading: Image Kernels Explained Visually by Victor Powell	Notes: Vectorization and Feature Engineering; Live notes (Google Colab)	Warmup: Kernel Convolution	Assignments: Blog Post: Sparse Kernel Machines
Thu Apr. 11	Kernel Methods
	We introduce kernel methods as an alternative approach to the problem of fitting nonlinear models to data.
	Learning Objectives: Theory	Reading: Classification and K-Nearest Neighbours by Hiroshi Shimodaira for a course at the University of Edinburgh	Notes: Kernel Methods; Live notes (Google Colab)	Warmup: Introducing Kernel Regression

Week 10

Tue Apr. 16	The Problem of Features and Deep Learning
	We motivate deep learning as an approach to the problem of learning complex nonlinear features in data.
	Learning Objectives: Theory; Implementation	Reading: (none listed)	Notes: The Problem of Features and Deep Learning; Live notes (Google Colab)	Warmup: Project Check In
Thu Apr. 18	Modern Optimization
	We briefly introduce two concepts in optimization that have enabled large-scale deep learning: stochastic first-order optimization techniques and automatic differentiation.
	Learning Objectives: Theory; Implementation	Reading: (none listed)	Notes: Modern Optimization for Deep Learning; Live notes (Google Colab)	Warmup: Introducing Stochastic Gradient Descent	Assignments: Blog Post: The Adam Algorithm for Optimization

Week 11

Tue Apr. 23	Deep Image Classification
	We return to the image classification problem, using deep learning and large-scale optimization to optimize convolutional kernels as part of the training process.
	Learning Objectives: Theory; Implementation	Reading: Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning	Notes: Deep Image Classification; Live notes (Google Colab)	Warmup: Project Check In
Thu Apr. 25	Deeper Image Classification
	We continue working on an extended practical case study of deep learning for image classification.
	Learning Objectives: Theory; Implementation	Reading: Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning	Notes: Deep Image Classification; Live notes (Google Colab)	Warmup: What Needs to Be Learned?

Finals Period

During the reading and final exam period, you’ll meet with me one-on-one for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.

References

Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. Cambridge, Massachusetts: The MIT Press.

Vanderplas, Jacob T. 2016. Python Data Science Handbook: Essential Tools for Working with Data. First edition. Sebastopol, CA: O’Reilly Media, Inc.