About Me
Hi, I'm Jay Baxter, a Data Scientist on the Media Science team at Twitter Boston. Previously, I was an MIT Computer Science student (S.B. '13, M.Eng. '14). My master's thesis was BayesDB, a Bayesian database table that lets users query the probable implications of their data as easily as SQL lets them query raw data.
There's not much here except a glorified resume, but feel free to check out some of the projects I've worked on and a list of the classes I've taken.
Email: jaynbaxter [at] gmail [dot] com.
Twitter
Projects
Here's a selection of the bigger projects I've worked on (including my research, jobs, and independent work).
BayesDB
BayesDB, a Bayesian database table I built for my M.Eng. thesis, lets users query the probable implications of their data in the same way a SQL database lets them query the data itself. Users can detect predictive relationships between variables, infer missing values, simulate probable rows, and identify statistically similar database entries using inferences that are based in part on CrossCat, a nonparametric Bayesian model.
Diffeo Knowledge Graph
As a machine learning intern at Diffeo, I worked on hierarchical probabilistic models for cross-document entity coreference (entity resolution). This involved experimenting with structure priors and different sets of MCMC moves for structure learning, as well as semi-supervised parameter learning techniques.
"Nutrition Facts" for Wikipedia Article Content
How much of the O.J. Simpson article is about football, and how much is about his trial? How does the topic breakdown vary across the versions of Wikipedia in different languages? Shahar Ronen and I used latent Dirichlet allocation and other techniques to answer these questions.
Recommender System for Reddit
I implemented and tested various flavors of nearest-neighbor methods, singular value decompositions, and probabilistic matrix factorizations to recommend subreddits using collaborative filtering (paper).
Food Recognition
I built a computer vision system that recognizes food from the Pittsburgh Fast Food Image dataset with over 80% accuracy by first probabilistically labelling ingredients with a semantic texton forest, then using an SVM with pairwise local ingredient-level features and a histogram intersection kernel (paper).
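For readers unfamiliar with the histogram intersection kernel mentioned above, here is a minimal sketch of the general technique. It is not the code from the project: the function names and the toy histograms are made up for illustration, and the real system used ingredient-level features that are not reproduced here.

```python
from typing import List, Sequence


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection kernel: sum of the bin-wise minima."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def gram_matrix(histograms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise kernel values for a collection of histograms."""
    return [[histogram_intersection(p, q) for q in histograms] for p in histograms]


# Toy (hypothetical) feature histograms for three images.
toy = [[3, 1, 0], [1, 2, 4], [0, 0, 5]]
K = gram_matrix(toy)
```

A matrix like `K` is the kind of input a kernel SVM can consume when the library accepts a precomputed kernel.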
Viewshed Analysis
During my internship at Palantir, I worked on the map application and geospatial data analysis. My hack week project with Vineet Gopal and David Skiff, adding viewshed analysis (determining line-of-sight using elevation data) to Palantir, was one of two hack week projects that were selected for presentation at Govcon 2012.
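To give a rough idea of what a line-of-sight check on elevation data looks like, here is a simplified sketch. It is not the hack week implementation: it assumes evenly spaced elevation samples along a straight path between observer and target, ignores Earth curvature and refraction, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def has_line_of_sight(
    profile: Sequence[float],
    observer_height: float = 0.0,
    target_height: float = 0.0,
) -> bool:
    """Return True if the last sample is visible from the first.

    `profile` holds terrain elevations sampled at even spacing along the
    path from observer (first entry) to target (last entry).
    """
    n = len(profile)
    if n < 2:
        raise ValueError("profile needs at least an observer and a target sample")
    start = profile[0] + observer_height
    end = profile[-1] + target_height
    for i in range(1, n - 1):
        # Height of the straight sight line above sample i.
        sight = start + (end - start) * i / (n - 1)
        if profile[i] > sight:
            return False  # terrain blocks the view
    return True
```

A viewshed can then be approximated by running this check from one observer location to every other cell of an elevation grid and marking the cells that come back visible.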
Natural Language Question Answering System
In 2012, I spent MIT's January term, IAP, working on the START web-based natural language question answering system with CSAIL's InfoLab Group.
Google Book Alerts
During my summer internship at Google in 2011, I integrated book search with Google Alerts to make Google Book Alerts, which notifies users when books that match their query (by author, title, subject, or full text) become available.
Neocortex-inspired Learning Algorithms
As a summer intern at Numenta in 2010, I worked on Numenta's development platform that provides tools to create, train, and test a hierarchical temporal memory (HTM) system.
Smartphone Sensor Data Analysis
In 2009 and 2010, I worked on the FunF project studying social networks using sensors on Android smartphones with the Human Dynamics Group at the MIT Media Lab. I developed the backend that processed and analyzed sensor data that was uploaded from the phones in real time.