Jay Baxter
Hi, I'm Jay Baxter. I lead Community Notes Machine Learning at X as a Sr. Staff ML Engineer.
I was formerly a lead on Twitter's Cortex Applied Machine Learning Research team, where we pushed the state of the art in real-time, large-scale recommender systems.
I studied computer science and AI at MIT (S.B. '13, M.Eng. '14) where I built BayesDB. I also did software engineering and machine learning internships at Palantir, Google, Diffeo, and Numenta.
Find me on X: @_jaybaxter_
Projects
Here's a selection of projects I've worked on across research, jobs, and independent work: some huge, and some teeny tiny.
Community Notes
Push Notification Volume Personalization
Embedding-based Candidate Generation
Recommendations for New Users
I built the first models to rank account recommendations for new users (previously, heuristics were used to re-rank what the candidate generators produced), resulting in follow and DAU wins in A/B tests.
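To give a flavor of the setup (the features, labels, and model below are hypothetical stand-ins, not what actually shipped), a pointwise ranker can re-score candidate-generator output by predicted follow probability:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per (new user, candidate account) pair; the real
# feature set and model choice are not public.
# columns: [log(candidate followers), topic overlap, mutual follows]
X_train = np.array([[3.2, 0.8, 2.0],
                    [5.1, 0.1, 0.0],
                    [4.0, 0.6, 1.0],
                    [2.0, 0.2, 0.0]])
y_train = np.array([1, 0, 1, 0])  # did the new user follow?

model = LogisticRegression().fit(X_train, y_train)

# Re-rank candidate-generator output by predicted follow probability.
candidates = np.array([[4.5, 0.7, 1.0], [6.0, 0.0, 0.0]])
order = np.argsort(-model.predict_proba(candidates)[:, 1])
print(order)  # candidate indices, best first
```

The point is just the shape of the change: learned scores replace hand-tuned heuristic re-ranking of the candidate pool.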
User States and Causal Retention Drivers
I produced a new user state model, a hidden Markov model, which is still used company-wide 7 years later in key metrics/OKRs, to split all A/B test results, and as an impactful feature in many production models.
The original use of the model was to determine precisely when users switched behavior modes, which enabled analyses aimed at identifying causal retention drivers using observational techniques such as propensity score matching and natural experiments.
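For intuition, here's a minimal sketch of the decoding step at the heart of this kind of model: a tiny hand-rolled Viterbi decode over a made-up three-state HMM with bucketed weekly-activity emissions. All states and parameters here are illustrative assumptions, not the production model.

```python
import numpy as np

# Hypothetical 3-state user model; states and parameters are made up.
states = ["light", "core", "power"]
log_pi = np.log([0.6, 0.3, 0.1])        # initial state distribution
log_A = np.log([[0.90, 0.08, 0.02],     # state transition matrix
                [0.10, 0.80, 0.10],
                [0.02, 0.08, 0.90]])
# Emissions: days-active-per-week bucketed as 0-2 / 3-5 / 6-7.
log_B = np.log([[0.80, 0.15, 0.05],
                [0.20, 0.60, 0.20],
                [0.05, 0.15, 0.80]])

def viterbi(obs):
    """Most likely hidden state sequence for an observation sequence."""
    T, K = len(obs), len(states)
    dp = np.full((T, K), -np.inf)       # best log-prob ending in each state
    back = np.zeros((T, K), dtype=int)  # backpointers for path recovery
    dp[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_A   # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

# A user ramping up, then churning: weeks of bucketed activity.
print(viterbi([0, 1, 2, 2, 2, 1, 0, 0]))
```

Decoding the most likely state path is what pins down when a user switched modes, which is the hook for the causal analyses above.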
BayesDB
BayesDB, an open source Bayesian database I built for my M.Eng. thesis, lets users query the probable implications of their data in the same way a SQL database lets them query the data itself. Users can detect predictive relationships between variables, infer missing values, simulate probable rows, and identify statistically similar database entries using inferences that are based in part on CrossCat, a nonparametric Bayesian model.
Paper.
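For flavor, here's roughly what querying looks like in BQL, BayesDB's SQL-like query language. The table, columns, and exact syntax below are approximate, written from memory rather than copied from the docs:

```python
# Approximate, from-memory BQL (Bayesian Query Language) examples; the
# table, columns, and exact syntax here are illustrative, not verbatim.
queries = [
    # Which column pairs are probably predictively related?
    "ESTIMATE PAIRWISE DEPENDENCE PROBABILITY FROM patients;",
    # Fill in missing values only where the model is confident enough.
    "INFER blood_pressure FROM patients WITH CONFIDENCE 0.9;",
    # Draw probable rows from the model's inferred joint distribution.
    "SIMULATE age, blood_pressure FROM patients TIMES 10;",
]
# With a real BayesDB connection one would execute each query against
# the database; printed here since no database is attached.
for q in queries:
    print(q)
```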
Knowledge Graph Entity Resolution
As a machine learning intern at Diffeo in 2013, I worked on hierarchical probabilistic models for cross-document entity coreference (entity resolution). This involved experimenting with structure priors and with different sets of MCMC moves for structure learning, along with semi-supervised parameter learning techniques.
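As a toy illustration of the MCMC flavor (not Diffeo's actual models), here's a collapsed Gibbs sweep that clusters mentions into entities under a Chinese-restaurant-process prior with a crude token-overlap likelihood:

```python
import random

# Toy sketch of coreference as MCMC over a clustering of mentions: a
# CRP prior plus a crude token-overlap likelihood. The real models were
# hierarchical and far richer.
mentions = [{"john", "smith", "md"}, {"dr", "john", "smith"},
            {"j", "smith"}, {"jane", "smith"}]
alpha = 1.0  # CRP concentration: propensity to open new entities

def likelihood(mention, cluster):
    """Crude score: token overlap with the cluster's pooled tokens."""
    pooled = set().union(*(mentions[i] for i in cluster))
    return (len(mention & pooled) + 0.5) / (len(mention | pooled) + 1.0)

assign = list(range(len(mentions)))  # start: each mention its own entity
for sweep in range(100):
    for i, m in enumerate(mentions):
        clusters = {}
        for j, c in enumerate(assign):
            if j != i:
                clusters.setdefault(c, []).append(j)
        labels = list(clusters)
        # CRP prior x likelihood for each existing entity...
        weights = [len(clusters[c]) * likelihood(m, clusters[c]) for c in labels]
        # ...plus the option of opening a fresh entity.
        labels.append(max(assign) + 1)
        weights.append(alpha * 0.5)  # crude base score for a new entity
        assign[i] = random.choices(labels, weights=weights)[0]
print(assign)  # mentions 0-2 usually share an entity; 3 stays apart
```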
Viewshed Analysis
During my 2012 internship at Palantir, I worked on the map application and geospatial data analysis. My team's hack week project, adding viewshed analysis (determining line-of-sight using elevation data) to Palantir, won an award, and for some reason its patent is cited 3x more than my next most cited paper...
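The core computation is simple to sketch: walk the ray from observer to target over an elevation grid and check whether terrain ever rises above the sight line. Real viewshed implementations interpolate elevations and sweep every cell, but the blocking test is the same idea:

```python
import numpy as np

def line_of_sight(elev, observer, target, eye_height=1.8):
    """True if `target` is visible from `observer` on an elevation grid.

    Samples points along the ray and checks whether the terrain ever
    rises above the sight line (a simplified sketch of the core test).
    """
    (r0, c0), (r1, c1) = observer, target
    h0 = elev[r0, c0] + eye_height
    h1 = elev[r1, c1]
    steps = int(max(abs(r1 - r0), abs(c1 - c0)))
    for t in (k / steps for k in range(1, steps)):
        r, c = round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0))
        sight_height = h0 + t * (h1 - h0)  # sight-line height at this cell
        if elev[r, c] > sight_height:
            return False  # terrain blocks the ray
    return True

# A ridge between observer and target blocks the view.
elev = np.array([[0, 0, 0, 0, 0],
                 [0, 0, 9, 0, 0],
                 [0, 0, 0, 0, 0]], dtype=float)
print(line_of_sight(elev, (1, 0), (1, 4)))  # False: blocked at (1, 2)
```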
Natural Language Question Answering System
In 2012, I spent MIT's January term (IAP) working on the START web-based natural language question answering system with CSAIL's InfoLab Group.
Google Book Alerts
During my summer internship at Google in 2011, I integrated book search with Google Alerts to make Google Book Alerts, which notifies users when books that match their query (by author, title, subject, or fulltext) become available.
Neocortex-inspired Learning Algorithms
As a summer intern at Numenta in 2010, I worked on Numenta's development platform, which provides tools to create, train, and test hierarchical temporal memory (HTM) systems.
Smartphone Sensor Data Analysis
In 2009 and 2010, with the Human Dynamics Group at the MIT Media Lab, I worked on the FunF project, studying in-person social networks (e.g. predicting disease and opinion spread) using smartphone data from opted-in study participants who lived in the same dorm and filled out daily surveys. I developed the backend that processed and analyzed sensor data uploaded from the phones in real time.
Recommender System for Reddit
I implemented and tested various flavors of nearest-neighbor methods, singular value decompositions, and probabilistic matrix factorizations to recommend subreddits using collaborative filtering (writeup).
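A minimal version of the SVD flavor looks like this: factor a toy user × subreddit interaction matrix, then recommend the unvisited subreddits with the highest reconstructed scores. The data and dimensions here are made up:

```python
import numpy as np

# Toy user x subreddit interaction matrix (0 = no interaction).
R = np.array([[5, 3, 0, 0],
              [4, 0, 0, 1],
              [0, 0, 5, 4],
              [0, 1, 4, 0]], dtype=float)

# Truncated SVD: keep k latent dimensions and reconstruct scores.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Recommend only subreddits the user hasn't visited yet.
user = 3
unseen = np.where(R[user] == 0)[0]
ranked = unseen[np.argsort(-R_hat[user, unseen])]
print(ranked)  # subreddit indices, best first
```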
Food Recognition
I built a computer vision system that recognizes food from the Pittsburgh Fast Food Image dataset with over 80% accuracy by first probabilistically labelling ingredients with a semantic texton forest, then using an SVM with pairwise local ingredient-level features and a histogram intersection kernel (paper). (This was >10 years ago; you could do way better now with off-the-shelf CNNs!)
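For the curious, the classifier stage is easy to sketch with a precomputed histogram intersection kernel; the histograms below are random stand-ins for the real ingredient histograms, and the two-class setup is simplified:

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection(A, B):
    """Histogram intersection kernel: K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=-1)

# Random stand-ins for per-image ingredient histograms (the real
# pipeline built these from semantic-texton-forest ingredient labels).
rng = np.random.default_rng(0)
X_train = rng.dirichlet(np.ones(8), size=40)  # 40 images, 8 "ingredients"
y_train = rng.integers(0, 2, size=40)         # two food classes
X_test = rng.dirichlet(np.ones(8), size=5)

# SVC with a precomputed Gram matrix lets us plug in the custom kernel.
clf = SVC(kernel="precomputed")
clf.fit(hist_intersection(X_train, X_train), y_train)
print(clf.predict(hist_intersection(X_test, X_train)))
```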