Jay Baxter

Hi, I'm Jay Baxter. I lead Community Notes Machine Learning at X as a Sr. Staff ML Engineer.

I was formerly a lead on Twitter's Cortex Applied Machine Learning Research team, where we pushed the state of the art in real-time, large-scale recommender systems.

I studied computer science and AI at MIT (S.B. '13, M.Eng. '14) where I built BayesDB. I also did software engineering and machine learning internships at Palantir, Google, Diffeo, and Numenta.

Find me on X: @_jaybaxter_


Here's a selection of the projects I've worked on, spanning research, jobs, and independent work: some huge, and some teeny tiny.

Community Notes

Community Notes (formerly Birdwatch) is a crowdsourced approach to adding context to potentially misleading posts. I founded and lead the ML team, which builds an open-source algorithm to identify notes that people from a wide range of viewpoints find helpful. We encourage external researchers to download and analyze the fully public data and algorithm, as Vitalik Buterin did here.
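
Under the hood, the published scoring approach is matrix factorization: each rater and note gets a learned viewpoint factor plus an intercept, and a note whose intercept stays high after the factors explain away viewpoint-correlated rating patterns is considered broadly helpful. A toy numpy sketch of that idea (made-up ratings and a simplified training loop; the production algorithm has many more details):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings: rows = raters, cols = notes, 1 = helpful, 0 = not, NaN = unrated.
R = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, np.nan],
    [0.0, np.nan, 1.0],
    [1.0, 0.0, 1.0],
])
n_users, n_notes = R.shape
k = 1  # one-dimensional "viewpoint" factor, as in the public write-ups

mu = 0.0
bu, bn = np.zeros(n_users), np.zeros(n_notes)
fu = 0.1 * rng.standard_normal((n_users, k))
fn = 0.1 * rng.standard_normal((n_notes, k))

lr, lam = 0.05, 0.03
obs = [(u, n, R[u, n]) for u in range(n_users) for n in range(n_notes)
       if not np.isnan(R[u, n])]

for _ in range(2000):
    for u, n, r in obs:
        # rating ~ global mean + user intercept + note intercept + factor dot
        e = r - (mu + bu[u] + bn[n] + fu[u] @ fn[n])
        mu += lr * e
        bu[u] += lr * (e - lam * bu[u])
        bn[n] += lr * (e - lam * bn[n])
        fu[u], fn[n] = (fu[u] + lr * (e * fn[n] - lam * fu[u]),
                        fn[n] + lr * (e * fu[u] - lam * fn[n]))

# Notes with a high intercept are rated helpful across the viewpoint axis.
helpfulness = bn
print(helpfulness)
```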

Push Notification Volume Personalization

I led development of a new notification opt-out prediction model that we used to personalize notification volume (blog post of our work, published by my collaborators after I had moved on to Birdwatch). Via a utility-maximization approach, it drove a large amount of mDAU without significant impact on opt-outs.
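
The utility-maximization idea can be sketched simply: send a push only when its expected engagement value outweighs the expected cost of an opt-out. Everything below (the probabilities, the cost constant, the function names) is a hypothetical illustration, not the production system:

```python
# Assumed constant: one opt-out forfeits many future notification opens,
# so it carries a large cost relative to a single open.
OPT_OUT_COST = 50.0

def expected_utility(p_open: float, p_opt_out: float,
                     open_value: float = 1.0,
                     opt_out_cost: float = OPT_OUT_COST) -> float:
    """Expected utility of sending one push to one user."""
    return p_open * open_value - p_opt_out * opt_out_cost

def select_pushes(candidates):
    """Keep only candidates whose expected utility is positive.

    candidates: list of (push_id, p_open, p_opt_out) triples from
    upstream engagement and opt-out prediction models (hypothetical).
    """
    return [pid for pid, p_open, p_opt in candidates
            if expected_utility(p_open, p_opt) > 0]

sent = select_pushes([
    ("breaking_news", 0.30, 0.001),  # 0.30 - 0.05 > 0 -> send
    ("low_interest",  0.05, 0.002),  # 0.05 - 0.10 < 0 -> drop
])
print(sent)  # ['breaking_news']
```

Because the opt-out predictions are per-user, the same threshold naturally yields different notification volumes for different users.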

Embedding-based Candidate Generation

I pioneered using two-tower neural networks to generate candidates in large-scale recommender systems at Twitter (e.g. account recommendations, notifications, tweets from people you don't follow), and wrote this paper documenting how we tackled dataset bias, the biggest hurdle to getting these models to work well: Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation at Twitter
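
A minimal numpy sketch of the two-tower setup, with toy data: each tower maps its side's features into a shared embedding space, candidates are scored by dot product, and training treats other items in the batch as negatives. The log-frequency ("logQ") correction shown here is one standard fix for the sampling bias that in-batch negatives introduce, not necessarily the approach from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, W1, W2):
    """Tiny MLP tower: raw features -> L2-normalized embedding."""
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    z = h @ W2
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

d_user, d_item, d_hidden, d_emb, batch = 8, 6, 16, 4, 5
Wu1 = rng.standard_normal((d_user, d_hidden))
Wu2 = rng.standard_normal((d_hidden, d_emb))
Wi1 = rng.standard_normal((d_item, d_hidden))
Wi2 = rng.standard_normal((d_hidden, d_emb))

users = rng.standard_normal((batch, d_user))
items = rng.standard_normal((batch, d_item))    # items[i] is users[i]'s positive
item_freq = rng.uniform(0.01, 0.5, size=batch)  # catalog sampling frequencies

U = tower(users, Wu1, Wu2)
V = tower(items, Wi1, Wi2)

# In-batch softmax: every other item in the batch serves as a negative.
# Subtracting log(frequency) down-weights popular items, which are
# oversampled as in-batch negatives (the "logQ" correction).
logits = U @ V.T - np.log(item_freq)[None, :]
logits -= logits.max(axis=1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(batch), np.arange(batch)]).mean()
print(loss)
```

At serving time only the item tower's embeddings need to be precomputed; candidates come from an approximate nearest-neighbor lookup against the user embedding.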

I also worked on centralized embeddings that could be used by multiple product teams' models, and described some basic techniques here: Fighting Redundancy and Model Decay with Embeddings.

Recommendations for New Users

I built the first models to rank account recommendations for new users (previously, heuristics re-ranked what the candidate generators produced), resulting in follow and DAU wins in A/B tests.

User States and Causal Retention Drivers

I produced a new user state model, a hidden Markov model, which is still used company-wide 7 years later: in key metrics/OKRs, to split all A/B test results, and as an impactful feature in many production models.

The original use of the model was to determine precisely when users switched behavior modes, which enabled analyses aimed at identifying causal retention drivers using observational techniques such as propensity score matching and natural experiments.
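
For intuition, a user-state HMM of this flavor can be decoded with the Viterbi algorithm to find the most likely day a user switched modes. The states, emission buckets, and probabilities below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state user HMM: state 0 = "casual", state 1 = "engaged".
# Emissions are daily activity buckets: 0 = inactive, 1 = light, 2 = heavy.
start = np.array([0.6, 0.4])
trans = np.array([[0.95, 0.05],    # states are sticky day to day
                  [0.10, 0.90]])
emit  = np.array([[0.70, 0.25, 0.05],   # casual users are mostly inactive
                  [0.05, 0.35, 0.60]])  # engaged users are mostly heavy

def viterbi(obs):
    """Most likely hidden-state sequence for a series of daily observations."""
    T, S = len(obs), len(start)
    logp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    logp[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        scores = logp[t - 1][:, None] + np.log(trans)  # S x S: prev -> cur
        back[t] = scores.argmax(axis=0)
        logp[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

days = [0, 0, 1, 0, 2, 2, 1, 2, 2, 2]
states = viterbi(days)
switch_day = states.index(1) if 1 in states else None
print(states, switch_day)
```

The inferred switch day is what makes change-point-style causal analysis possible: you can align users on the day their state flipped and ask what happened just before it.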


BayesDB

BayesDB, an open source Bayesian database I built for my M.Eng. thesis, lets users query the probable implications of their data in the same way a SQL database lets them query the data itself. Users can detect predictive relationships between variables, infer missing values, simulate probable rows, and identify statistically similar database entries using inferences that are based in part on CrossCat, a nonparametric Bayesian model. Paper.

Knowledge Graph Entity Resolution

As a machine learning intern at Diffeo in 2013, I worked on hierarchical probabilistic models for cross-document entity coreference (entity resolution), experimenting with structure priors and different sets of MCMC moves for structure learning, along with semi-supervised parameter learning techniques.

Viewshed Analysis

During my 2012 internship at Palantir, I worked on the map application and geospatial data analysis. My team's hack week project, adding viewshed analysis (determining line-of-sight using elevation data) to Palantir, won an award, and for some reason its patent is cited 3x more than my next most-cited paper...

Natural Language Question Answering System

In 2012, I spent MIT's January term, IAP, working on the START web-based natural language question answering system with CSAIL's InfoLab Group.

Google Book Alerts

During my summer internship at Google in 2011, I integrated book search with Google Alerts to make Google Book Alerts, which notifies users when books matching their query (by author, title, subject, or full text) become available.

Neocortex-inspired Learning Algorithms

As a summer intern at Numenta in 2010, I worked on Numenta's development platform that provides tools to create, train, and test a hierarchical temporal memory (HTM) system.

Smartphone Sensor Data Analysis

In 2009 and 2010, with the Human Dynamics Group at the MIT Media Lab, I worked on the FunF project studying in-person social networks (e.g. predicting disease and opinion spread) using smartphone data from opted-in study participants who lived in the same dorm and filled out daily surveys. I developed the backend that processed and analyzed sensor data that was uploaded from the phones in real time.

Recommender System for Reddit

I implemented and tested various flavors of nearest-neighbor methods, singular value decompositions, and probabilistic matrix factorizations to recommend subreddits using collaborative filtering (writeup).
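
As a flavor of the SVD variant, here's a toy numpy sketch: factor a binary user x subreddit matrix with a rank-2 truncated SVD, then recommend the unseen subreddit with the highest reconstructed score. The subreddit names and matrix are made up:

```python
import numpy as np

# Toy binary user x subreddit interaction matrix (1 = user posts there).
subs = ["python", "ml", "cooking", "baking", "datascience"]
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],   # target user: never tried r/datascience
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=float)

# Rank-2 truncated SVD projects users and subreddits into a shared
# low-dimensional space; unseen entries are scored by reconstruction.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

user = 1
scores = R_hat[user].copy()
scores[R[user] > 0] = -np.inf  # don't re-recommend known subreddits
best = subs[int(scores.argmax())]
print(best)  # 'datascience'
```

The low-rank structure is doing the collaborative filtering: the target user's neighbors in the programming cluster lift the score of the subreddit they share.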

Food Recognition

I built a computer vision system that recognizes food from the Pittsburgh Fast Food Image dataset with over 80% accuracy by first probabilistically labelling ingredients with a semantic texton forest, then using an SVM with pairwise local ingredient-level features and a histogram intersection kernel (paper). (This was >10 years ago; you could do way better now with off-the-shelf CNNs!)
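
The histogram intersection kernel at the core of that SVM is simple to state: the similarity of two histograms is the total mass they share, bin by bin. A small numpy sketch (toy histograms; the paper's features were richer):

```python
import numpy as np

def histogram_intersection_kernel(X, Y):
    """K[i, j] = sum_k min(X[i, k], Y[j, k]).

    A standard kernel for comparing histograms (here, hypothetical
    ingredient-label histograms): two images score highly when their
    histograms overlap heavily, and identical normalized histograms
    score 1.0.
    """
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=-1)

# Two toy 4-bin ingredient histograms, each normalized to sum to 1.
a = np.array([[0.5, 0.3, 0.2, 0.0]])
b = np.array([[0.4, 0.4, 0.0, 0.2]])
K = histogram_intersection_kernel(a, b)
print(K)  # per-bin mins: 0.4 + 0.3 + 0.0 + 0.0 = 0.7
```

A kernel matrix like this can be handed to any SVM implementation that accepts precomputed kernels, e.g. scikit-learn's `SVC(kernel='precomputed')`.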