Schedule

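Reading sources: PDSH refers to the Python Data Science Handbook (Vanderplas 2016) and BHN to Fairness and Machine Learning: Limitations and Opportunities (Barocas, Hardt, and Narayanan 2023).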

Week 1

Tue
Feb. 13
Welcome!
We introduce our topic and discuss how the course works.
Learning Objectives
Getting Oriented
Reading
Course syllabus
Collaboration
Why I Don't Grade by Jesse Stommel
Notes
Welcome slides
Data, Patterns, and Models
Warmup
Set up your software.
Assignments
Math pre-assessment.
Thu
Feb. 15
The Classification Workflow in Python
We work through a simple, complete example of training and evaluating a classification model on a small data set.
Learning Objectives
Navigation
Experimentation
Reading
PDSH: Data Manipulation with Pandas (through "Aggregation and Grouping")
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Meet the Palmer Penguins!
Assignments
Blog Post: Penguins
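The split, fit, predict, and evaluate steps of this session's workflow can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses NumPy only, synthetic two-class data stands in for the Palmer Penguins measurements, and a nearest-centroid classifier (a hypothetical choice made for brevity, not necessarily the model used in lecture) plays the role of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: two classes of 2D measurements with different means.
n_per_class = 50
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(n_per_class, 2)),
               rng.normal(loc=[3.0, 3.0], size=(n_per_class, 2))])
y = np.repeat([0, 1], n_per_class)

# 1. Split the data into a training set and a held-out test set.
perm = rng.permutation(len(y))
train, test = perm[:70], perm[70:]

# 2. Fit: store the mean feature vector (centroid) of each class in the training data.
classes = np.unique(y[train])
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])

# 3. Predict: assign each point to the class whose centroid is nearest.
def predict(X_new):
    dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# 4. Evaluate only on data the model never saw during fitting.
test_accuracy = np.mean(predict(X[test]) == y[test])
```

Evaluating on the held-out rows, rather than the rows used to compute the centroids, is what keeps the accuracy estimate honest.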

Week 2

Tue
Feb. 20
Linear Score-Based Classification
We study a fundamental method for binary classification in which each data point is assigned a score. Points whose scores lie above a chosen threshold are assigned to one class; points whose scores lie below it are assigned to the other.
Learning Objectives
Theory
Experimentation
Reading
Linear Classifiers from MITx.
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Graphing Decision Boundaries
Assignments
ACTUAL REAL DUE DATE: Reflective Goal-Setting due 2/27
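The score-and-threshold rule described for this session can be sketched as follows, assuming a linear score s = <w, x> with a weight vector `w` and threshold `t` that have already been chosen (the data and numbers here are hypothetical).

```python
import numpy as np

def linear_scores(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Score each row x_i of X with the linear function s_i = <w, x_i>."""
    return X @ w

def classify(X: np.ndarray, w: np.ndarray, t: float) -> np.ndarray:
    """Assign class 1 to points scoring at least t, class 0 otherwise."""
    return (linear_scores(X, w) >= t).astype(int)

# Hypothetical example: three points in 2D and a hand-picked weight vector.
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [-2.0, 0.0]])
w = np.array([1.0, 1.0])
t = 0.0

scores = linear_scores(X, w)   # 3.0, -0.5, -2.0
labels = classify(X, w, t)     # 1, 0, 0
```

The decision boundary is the set of points where <w, x> = t, which in two dimensions is a straight line.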
Thu
Feb. 22
Statistical Decision Theory and Automated Decision-Making
We discuss the theory of making automated decisions based on a score function. We go into detail on thresholding, error rates, and cost-based optimization.
Learning Objectives
Theory
Experimentation
Reading
PDSH: Introduction to Numpy
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Choosing a Threshold
Assignments
Blog Post: Whose Costs?
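Thresholding, error rates, and cost-based optimization fit together as in the sketch below. It assumes we already have scores and true labels, and that a false positive costs `c_fp` and a false negative costs `c_fn` (both hypothetical parameters); the threshold is chosen by a simple grid search over the observed scores.

```python
import numpy as np

def error_rates(scores, y, t):
    """Return (false positive rate, false negative rate) at threshold t."""
    y_hat = scores >= t
    fpr = np.mean(y_hat[y == 0])    # fraction of true negatives predicted positive
    fnr = np.mean(~y_hat[y == 1])   # fraction of true positives predicted negative
    return fpr, fnr

def average_cost(scores, y, t, c_fp, c_fn):
    """Average cost per data point at threshold t."""
    y_hat = scores >= t
    n_fp = np.sum(y_hat & (y == 0))
    n_fn = np.sum(~y_hat & (y == 1))
    return (c_fp * n_fp + c_fn * n_fn) / len(y)

def best_threshold(scores, y, c_fp, c_fn):
    """Grid-search the observed scores (plus +inf, i.e. 'predict nobody positive')."""
    candidates = np.append(np.unique(scores), np.inf)
    costs = [average_cost(scores, y, t, c_fp, c_fn) for t in candidates]
    return candidates[int(np.argmin(costs))]

# Hypothetical scores and labels.
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2])
y = np.array([0, 0, 1, 1, 1, 0])
```

With equal costs the best threshold here is 0.35; making false positives five times as expensive (`c_fp=5, c_fn=1`) pushes it up to 0.65, trading missed positives for fewer false alarms.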

Week 3

Tue
Feb. 27
Auditing Fairness
We introduce the topics of fairness and disparity in automated decision systems using a famous case study.
Learning Objectives
Social Responsibility
Experimentation
Reading
BHN: Introduction
Machine Bias by Julia Angwin et al. for ProPublica.
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Experiencing (Un)Fairness
Assignments
Reflective Goal-Setting due today
Thu
Feb. 29
Statistical Definitions of Fairness in Automated Decision-Making
We offer formal mathematical definitions of several natural intuitions of fairness, review how to assess them empirically on data in Python, and prove that two major definitions are incompatible with each other.
Learning Objectives
Social Responsibility
Theory
Reading
BHN: Classification (ok to skip "Relationships between criteria" and below)
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Reading Check
Assignments
Blog Post: Bias Replication Study
and/or
Blog Post: Women in Data Science Conference
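Assessing fairness definitions empirically mostly comes down to computing conditional rates within each group. A minimal sketch, assuming binary labels `y`, binary predictions `y_hat`, and a group attribute `group` (all hypothetical data): for a binary prediction, separation asks for equal FPR and FNR across groups, while sufficiency asks for equal positive and negative predictive values (PPV and NPV).

```python
import numpy as np

def group_metrics(y, y_hat, group):
    """Per-group FPR, FNR, PPV, and NPV of binary predictions.
    A group with no examples in some conditioning set yields nan for that rate."""
    results = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y[m], y_hat[m]
        results[g] = {
            "FPR": np.mean(yp[yt == 0] == 1),  # P(Yhat=1 | Y=0, group=g)
            "FNR": np.mean(yp[yt == 1] == 0),  # P(Yhat=0 | Y=1, group=g)
            "PPV": np.mean(yt[yp == 1] == 1),  # P(Y=1 | Yhat=1, group=g)
            "NPV": np.mean(yt[yp == 0] == 0),  # P(Y=0 | Yhat=0, group=g)
        }
    return results

# Hypothetical audit data for two groups, "a" and "b".
y     = np.array([1, 0, 1, 0, 1, 0, 0, 0])
y_hat = np.array([1, 0, 0, 1, 1, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

metrics = group_metrics(y, y_hat, group)
```

In this toy example the two groups have equal PPV (0.5) but unequal FPR (0.5 versus 1/3), a small instance of the tension between definitions that the session's incompatibility result makes precise when base rates differ.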

Week 4

Tue
Mar. 05
Normative Theory of Fairness
We discuss some of the broad philosophical and political positions that underlie the theory of fairness, and connect these positions to statistical definitions.
Learning Objectives
Social Responsibility
Reading
BHN: Relative Notions of Fairness
Notes
Discussion guide shared in Slack
Warmup
COMPAS and Equality of Opportunity
Thu
Mar. 07
Critical Perspectives: Interrogate Your Task
We discuss several critical views that seek to move our attention beyond the fairness of algorithms and towards their role in sociotechnical systems. We center two questions: who benefits from a given data science task? What tasks could we approach instead if our aims were to uplift the oppressed?
Learning Objectives
Social Responsibility
Reading
Data Feminism: The Power Chapter by Catherine D'Ignazio and Lauren Klein
"The Digital Poorhouse" by Virginia Eubanks
"Studying Up: Reorienting the study of algorithmic fairness around issues of power" by Barabas et al.
Notes
Discussion guide shared in Slack
Warmup
Power, Data, and Studying Up
Assignments
Blog Post: Limitations of the Quantitative Approach

Week 5

Tue
Mar. 12
Critical Perspectives: Interrogate Your Data
We discuss the importance of understanding the context of data when planning and executing data science, and effectively communicating this context when sharing our findings.
Learning Objectives
Social Responsibility
Reading
Data Feminism: The Numbers Don't Speak For Themselves by Catherine D'Ignazio and Lauren Klein
Datasheets for Datasets by Timnit Gebru et al.
Notes
Discussion guide shared in Slack
Warmup
Data Context and Data Sheets
Thu
Mar. 14
Introduction to Model Training: The Perceptron
We study the perceptron as an example of a linear model with a training algorithm. Our understanding of this algorithm and its shortcomings will form the foundation of our future explorations in empirical risk minimization.
Learning Objectives
Theory
Reading
No reading today, but please be ready to put some extra time into the warmup. It may be useful to review our lecture notes on score-based classification and decision theory when completing the warmup.
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Linear Models, Perceptron, and Torch
Assignments
Blog Post: Implementing Perceptron
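The perceptron's training loop is short enough to sketch directly. This is one common variant, assuming labels in {-1, +1} and a feature matrix `X` whose last column is a constant 1 for the bias; the names `perceptron`, `max_steps`, and `seed` are hypothetical and the lecture notes may use a different label convention.

```python
import numpy as np

def perceptron(X, y, max_steps=1000, seed=0):
    """Perceptron for labels y in {-1, +1}; X should include a constant column."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(max_steps):
        if np.all(y * (X @ w) > 0):    # every point strictly on its correct side
            break
        i = rng.integers(n)            # pick a random data point
        if y[i] * (X[i] @ w) <= 0:     # misclassified (or exactly on the boundary)
            w = w + y[i] * X[i]        # nudge w toward classifying x_i correctly
    return w

# Hypothetical linearly separable data (last column is the constant feature).
X = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w = perceptron(X, y)
```

The `max_steps` cap matters: on data that are not linearly separable the loop never reaches the stopping condition, which is one of the shortcomings that motivates empirical risk minimization.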

Week 6

Tue
Mar. 19
Spring Break!
Warmup
TBD
Thu
Mar. 21
Spring Break!
Warmup
TBD

Week 7

Tue
Mar. 26
Convex Empirical Risk Minimization
We introduce the framework of convex empirical risk minimization, which offers a principled approach to overcoming the many limitations of the perceptron algorithm.
Learning Objectives
Theory
Reading
Convexity Examples by Stephen D. Boyles, pages 1-7 (ok to stop when we start talking about gradients and Hessians).
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Practice with Convex Functions
Assignments
ACTUAL REAL DUE DATE: Mid-semester reflection due 4/2
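Convex empirical risk minimization averages a convex loss over the training data. The sketch below assumes the logistic loss as the example loss (the lecture may use others) and spot-checks the convexity inequality L(λa + (1-λ)b) ≤ λL(a) + (1-λ)L(b) numerically on hypothetical random data; a spot check is an illustration, not a proof.

```python
import numpy as np

def logistic_loss(s, y):
    """Logistic loss for scores s and labels y in {0, 1}.
    logaddexp(0, -s) = log(1 + e^{-s}), computed without overflow."""
    return y * np.logaddexp(0, -s) + (1 - y) * np.logaddexp(0, s)

def empirical_risk(w, X, y):
    """Average loss of the linear model s = Xw over the training data."""
    return np.mean(logistic_loss(X @ w, y))

# Hypothetical data, used only to spot-check convexity.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)

def convexity_holds(a, b, lam):
    """Check the convexity inequality at one point on the segment from b to a."""
    lhs = empirical_risk(lam * a + (1 - lam) * b, X, y)
    rhs = lam * empirical_risk(a, X, y) + (1 - lam) * empirical_risk(b, X, y)
    return lhs <= rhs + 1e-12   # small tolerance for floating-point round-off
```

Convexity is what makes the minimization problem well behaved: any local minimum of the empirical risk is a global one.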
Thu
Mar. 28
Gradient Descent
We study a method for finding the minima of convex functions using techniques from calculus and linear algebra.
Learning Objectives
Theory
Reading
No reading today, but please budget some extra time for the warmup.
Notes
Lecture notes
Live notes (Google Colab)
Warmup
A First Look at Gradient Descent
Assignments
Blog Post: Implementing Logistic Regression
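Gradient descent repeatedly steps against the gradient of the empirical risk. A minimal sketch, assuming logistic regression (whose risk has gradient (1/n) X^T (σ(Xw) - y)), a fixed step size `alpha`, and hypothetical synthetic data; the function names are illustrative only.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def empirical_risk(w, X, y):
    """Average logistic loss of the linear model s = Xw, labels y in {0, 1}."""
    s = X @ w
    return np.mean(y * np.logaddexp(0, -s) + (1 - y) * np.logaddexp(0, s))

def gradient(w, X, y):
    """Gradient of the logistic empirical risk: (1/n) X^T (sigmoid(Xw) - y)."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def gradient_descent(X, y, alpha=0.5, steps=500):
    """Plain gradient descent from w = 0 with a constant step size."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - alpha * gradient(w, X, y)
    return w

# Hypothetical data: a constant column plus two features, with noisy labels.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = (X @ np.array([0.5, 2.0, -1.0]) + rng.normal(size=200) > 0).astype(float)

w = gradient_descent(X, y)
```

A finite-difference comparison against `gradient` is a cheap way to catch algebra mistakes before trusting the optimizer.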

Week 8

Tue
Apr. 02
Feature Maps and Regularization
We re-introduce feature maps as a method for learning nonlinear decision boundaries, and add regularization to the empirical risk minimization problem in order to control the complexity of our learned models.
Learning Objectives
Theory
Experimentation
Reading
No reading today; please think hard about your project pitches!
Notes
Lecture notes
Live notes (Google Colab)
Warmup
Project Pitches
Assignments
Mid-semester reflection due today. ACTUAL REAL DUE DATE: Project proposal due 4/9
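Feature maps and regularization combine as in the sketch below. It assumes a quadratic feature map φ(x1, x2) = (1, x1, x2, x1², x2², x1·x2) and an L2 penalty λ‖w‖² added to the logistic empirical risk (penalizing the bias weight too, for simplicity); the data, names, and weights are hypothetical.

```python
import numpy as np

def quadratic_features(X):
    """Map each row (x1, x2) to (1, x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def regularized_risk(w, Phi, y, lam):
    """Logistic empirical risk on features Phi plus the penalty lam * ||w||^2."""
    s = Phi @ w
    risk = np.mean(y * np.logaddexp(0, -s) + (1 - y) * np.logaddexp(0, s))
    return risk + lam * np.sum(w ** 2)

# Hypothetical data: label 1 outside the unit circle, 0 inside.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (np.sum(X**2, axis=1) > 1).astype(float)

Phi = quadratic_features(X)
# A linear rule in feature space, nonlinear in the original space:
# score = x1^2 + x2^2 - 1, so the boundary is the unit circle.
w_circle = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
```

Without the penalty, scaling `w_circle` up always lowers the risk on separable data like this, so the weights can grow without bound; with λ > 0 that growth is eventually outweighed by the penalty.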
Thu
Apr. 04
Linear Regression
We introduce linear regression through the framework of convex empirical risk minimization.
Learning Objectives
Theory
Reading
No additional reading, but you may need to open up your linear algebra textbook in order to complete the warmup.
Notes
Linear Regression
Live notes (Google Colab)
Warmup
Eigenvalues and Linear Systems
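For squared-error loss, the empirical risk minimizer solves the normal equations XᵀX w = Xᵀy, which is where the linear-systems background comes in. A minimal sketch on hypothetical noiseless data, assuming X has full column rank so that XᵀX is invertible:

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares weights from the normal equations X^T X w = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: a constant column plus two features, noiseless targets.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w_hat = fit_linear_regression(X, y)
```

Solving the system with `np.linalg.solve` avoids forming an explicit inverse; for ill-conditioned X, `np.linalg.lstsq(X, y, rcond=None)` is the more numerically robust choice.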

Week 9

Tue
Apr. 09
Vectorization and Feature Engineering
We illustrate the interplay of vectorization and feature engineering on image data.
Learning Objectives
Experimentation
Implementation
Reading
Image Kernels Explained Visually by Victor Powell
Notes
Vectorization and Feature Engineering
Live notes (Google Colab)
Warmup
Kernel Convolution
Assignments
Blog Post: Sparse Kernel Machines
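The interplay of vectorization and image kernels can be seen by computing the same kernel application twice: once with explicit loops, once vectorized with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This sketch assumes no padding and, like many image-processing tutorials, does not flip the kernel (strictly, a cross-correlation; identical to convolution for symmetric kernels). The function names and the example kernel are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_kernel_loops(img, k):
    """Slide kernel k over img (no padding) using explicit Python loops."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def apply_kernel_vectorized(img, k):
    """Same computation without Python loops: view all windows, then contract."""
    windows = sliding_window_view(img, k.shape)   # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)

# A Laplacian-style edge kernel: it responds to curvature, not to linear ramps.
k = np.array([[0, -1, 0],
              [-1, 4, -1],
              [0, -1, 0]])
img = np.arange(25.0).reshape(5, 5)   # a linear ramp, so the output is all zeros
```

The vectorized version pushes the loops into compiled NumPy code, which is typically much faster on real images.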
Thu
Apr. 11
Kernel Methods
We introduce kernel methods as an alternative approach to the problem of fitting nonlinear models to data.
Learning Objectives
Theory
Reading
Classification and K-Nearest Neighbours by Hiroshi Shimodaira for a course at the University of Edinburgh
Notes
Kernel Methods
Live notes (Google Colab)
Warmup
Introducing Kernel Regression
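As one concrete kernel method, here is a sketch of kernel ridge regression with a Gaussian (RBF) kernel. This is an assumption on our part about the formulation; the lecture may present kernel methods differently. The model is f(x) = Σᵢ aᵢ k(x, xᵢ), with coefficients solving (K + λI) a = y; `gamma`, `lam`, and the helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0))   # clip tiny negative round-off

def fit_kernel_ridge(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam*I) a = y for the coefficients a."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_new, X_train, a, gamma=1.0):
    """Evaluate f(x) = sum_i a_i k(x, x_i) at each row of X_new."""
    return rbf_kernel(X_new, X_train, gamma) @ a

# Hypothetical nonlinear data: one period of a sine wave.
X = np.linspace(0, 2 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])

a = fit_kernel_ridge(X, y)
pred = predict_kernel_ridge(X, X, a)
```

The model is linear in the coefficients `a` yet nonlinear in x, and it never builds an explicit feature map: all it needs are kernel evaluations between data points.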

Week 10

Tue
Apr. 16
The Problem of Features and Deep Learning
We motivate deep learning as an approach to the problem of learning complex nonlinear features in data.
Learning Objectives
Theory
Implementation
Reading

Notes
The Problem of Features and Deep Learning
Live notes (Google Colab)
Warmup
Project Check In
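A tiny example shows why learning features matters. The four XOR points cannot be separated by any linear score on the raw inputs, but one hidden layer of ReLU features makes a linear score on those features work. The weights below are hand-picked for illustration (a hypothetical construction); deep learning would instead learn `W1`, `b1`, and `w2` from data.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

# The four XOR inputs and labels: no single line separates the 1s from the 0s.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hidden layer: features h1 = relu(x1 + x2) and h2 = relu(x1 + x2 - 1).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
H = relu(X @ W1 + b1)

# Output layer: an ordinary linear score on the new features.
w2 = np.array([1.0, -2.0])
scores = H @ w2   # 0, 1, 1, 0
```

The nonlinearity lives entirely in the hidden layer; the final step is the same kind of linear score-based classifier studied earlier in the course.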
Thu
Apr. 18
Modern Optimization
We briefly introduce two concepts in optimization that have enabled large-scale deep learning: stochastic first-order optimization techniques and automatic differentiation.
Learning Objectives
Theory
Implementation
Reading

Notes
Modern Optimization for Deep Learning
Live notes (Google Colab)
Warmup
Introducing Stochastic Gradient Descent
Assignments
Blog Post: The Adam Algorithm for Optimization
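Stochastic first-order optimization replaces the full gradient with a cheap estimate computed on a random minibatch. A minimal sketch, assuming least-squares loss on hypothetical data with the gradient derived by hand; in a framework such as PyTorch, automatic differentiation would supply that gradient instead. The parameters `alpha`, `epochs`, and `batch_size` are illustrative choices.

```python
import numpy as np

def sgd_least_squares(X, y, alpha=0.05, epochs=50, batch_size=10, seed=0):
    """Minibatch SGD on the mean squared error (1/n) * ||Xw - y||^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        order = rng.permutation(n)                   # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]    # indices of one minibatch
            Xb, yb = X[idx], y[idx]
            # Minibatch gradient: an unbiased estimate of the full gradient.
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
            w = w - alpha * grad
    return w

# Hypothetical data with small label noise.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w = sgd_least_squares(X, y)
```

Each step touches only `batch_size` rows, which is what lets this style of optimizer scale to data sets far too large for full-gradient steps.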

Week 11

Tue
Apr. 23
Deep Image Classification
We return to the image classification problem, using deep learning and large-scale optimization to learn convolutional kernels as part of the training process.
Learning Objectives
Theory
Implementation
Reading
Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning
Notes
Deep Image Classification
Live notes (Google Colab)
Warmup
Project Check In
Thu
Apr. 25
Deeper Image Classification
We continue working on an extended practical case study of deep learning for image classification.
Learning Objectives
Theory
Implementation
Reading
Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning
Notes
Deep Image Classification
Live notes (Google Colab)
Warmup
What Needs to Be Learned?

Finals Period

During the reading and final exam period, you’ll meet with me 1-1 for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.



© Phil Chodrow, 2024

References

Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. Cambridge, Massachusetts: The MIT Press.
Vanderplas, Jacob T. 2016. Python Data Science Handbook: Essential Tools for Working with Data. First edition. Sebastopol, CA: O’Reilly Media, Inc.