S022 - Introduction to Statistical Computing and Data Science in Education (Teaching Fellow)




This course focuses on applying modern data science and machine learning tools to real-world datasets in education. We first teach tools for exploring new datasets in order to identify patterns, make predictions from flexible models, and visualize data in ways that communicate complex associations. We will also expand on the core conceptual building blocks taught in S-40 to provide more flexible approaches to estimation and inference, with a particular focus on the bootstrap. Throughout, we will learn statistical computing in R, an increasingly important skill in the modern, data-driven era. By the end of the course, students will be able to independently analyze data of various types, carrying a project from preparing the data for analysis to writing technical reports of their findings. Topics covered will likely include classification and regression trees, random forests, regularized regression methods, cross-validation, data wrangling, model selection, bootstrapping, simulation, data visualization, and, possibly, analysis of text. While we assume foundational statistical knowledge, we do not assume any prior familiarity with statistical computing or the R language. Students who are interested in learning R before the course starts should contact the instructor.