Jay Baxter

About Me

Hi, I'm Jay Baxter, a Data Scientist on the Media Science team at Twitter Boston. Previously, I was an MIT Computer Science student (S.B. '13, M.Eng. '14). My master's thesis was BayesDB, a Bayesian database table that lets users query the probable implications of their data as easily as SQL lets them query raw data.

There's not much here except a glorified resume, but feel free to check out some of the projects I've worked on and a list of the classes I've taken.

Email: jaynbaxter [at] gmail [dot] com. Twitter


Here's a selection of some of the bigger projects I've worked on (including my research, jobs, and independent work).


BayesDB

BayesDB, a Bayesian database table I built for my M.Eng. thesis, lets users query the probable implications of their data in the same way a SQL database lets them query the data itself. Users can detect predictive relationships between variables, infer missing values, simulate probable rows, and identify statistically similar database entries using inferences that are based in part on CrossCat, a nonparametric Bayesian model.

Diffeo Knowledge Graph

As a machine learning intern at Diffeo, I worked on hierarchical probabilistic models for cross-document entity coreference (entity resolution). This involved experimenting with structure priors, different sets of MCMC moves for structure learning, and semi-supervised parameter learning techniques.

"Nutrition Facts" for Wikipedia Article Content

How much of the O.J. Simpson article is about football, and how much is about his trial? How does the topic breakdown vary across the versions of Wikipedia in different languages? Shahar Ronen and I used latent Dirichlet allocation and other techniques to answer these questions.

Recommender System for Reddit

I implemented and tested various flavors of nearest-neighbor methods, singular value decompositions, and probabilistic matrix factorizations to recommend subreddits using collaborative filtering (paper).

Food Recognition

I built a computer vision system that recognizes food from the Pittsburgh Fast Food Image dataset with over 80% accuracy by first probabilistically labelling ingredients with a semantic texton forest, then using an SVM with pairwise local ingredient-level features and a histogram intersection kernel (paper).

Viewshed Analysis

During my internship at Palantir, I worked on the map application and geospatial data analysis. My hack week project with Vineet Gopal and David Skiff, adding viewshed analysis (determining line-of-sight using elevation data) to Palantir, was one of two hack week projects that were selected for presentation at Govcon 2012.

Natural Language Question Answering System

In 2012, I spent MIT's January term, IAP, working on the START web-based natural language question answering system with CSAIL's InfoLab Group.

Google Book Alerts

During my summer internship at Google in 2011, I integrated book search with Google Alerts to make Google Book Alerts, which notifies users when books that match their query (by author, title, subject, or full text) become available.

Neocortex-inspired Learning Algorithms

As a summer intern at Numenta in 2010, I worked on Numenta's development platform, which provides tools to create, train, and test a hierarchical temporal memory (HTM) system.

Smartphone Sensor Data Analysis

In 2009 and 2010, I worked on the FunF project studying social networks using sensors on Android smartphones with the Human Dynamics Group at the MIT Media Lab. I developed the backend that processed and analyzed sensor data that was uploaded from the phones in real time.


Here's a history of the coursework I did while I was at MIT. Graduate classes are marked with (G), and advanced undergraduate courses are marked with (A).

Machine Learning, Artificial Intelligence, and Statistics

  • Harvard CS281 Advanced Machine Learning (G) [Listener]
  • 6.867 Machine Learning (G)
  • 6.869 Advances in Computer Vision (G)
  • 6.437 Inference and Information (G)
  • Harvard Stat221 Statistical Computing and Visualization (G)
  • MAS.S60 Practical Natural Language Processing
  • 6.370 Battlecode AI Programming Competition
  • 6.034 Artificial Intelligence

Computer Systems

  • 6.885 Advanced Topics in Data Processing (G) [Listener]
  • 6.035 Programming Language Engineering (A)
  • 6.814 Database Systems (A)
  • 6.033 Computer Systems Engineering
  • 6.004 Computation Structures
  • 6.02 Intro to EECS II: Digital Communication Systems
  • 6.01 Introduction to EECS I: software, control, circuits, planning

Software Engineering

  • 6.170 Software Studio
  • 6.005 Software Construction
  • 21W.789 Communicating with Mobile Technology

Math and Algorithms

  • 6.046 Design and Analysis of Algorithms
  • 6.006 Introduction to Algorithms
  • 6.041 Probabilistic Systems Analysis
  • 18.06 Linear Algebra
  • 18.03 Differential Equations
  • 18.02 Multivariable Calculus
  • 6.042 Mathematics for Computer Science


Economics

  • 14.12 Game Theory
  • 14.01 Principles of Microeconomics
  • 14.02 Principles of Macroeconomics

Natural Sciences

  • 8.022 Physics II: Electricity and Magnetism
  • 7.013 Introductory Biology
  • 3.091 Solid State Chemistry


  • 24.00 Problems of Philosophy
  • 9.00 Introduction to Psychology
  • MAS.A12 Games and Puzzles
  • 21A.00 Introduction to Anthropology