I build multimodal ML systems that run at internet scale.
Founding member of the team behind IAS's $150M trust & safety classification suite, detecting low-prevalence unsafe content across 3T impressions/day. Patented.
Real systems running in production at scale.
Built the core Trust & Safety classification product from zero. Detects hard-to-catch, low-prevalence unsafe content across TikTok, YouTube, Meta, and Twitter. 3 trillion impressions/day at 34% YoY growth. Deployed across 23 countries. Patented.
Built an internal Python package for end-to-end multimodal labeling. Vector search finds relevant content, LLM+VLM classifies it, human A/B testing validates, and DSPy optimizes prompts in a continuous RLAIF loop. Cut labeling cost by 99.5%.
Led PEFT/LoRA adoption for deepfake detection models, slashing compute by ~80% and boosting experiment throughput 3x. No performance degradation.
Directed end-to-end development of multimodal and DistilBERT-based models extended to classify longer token sequences. Catches low-prevalence misinformation, the kind of content that evades simple filters, across 22M+ videos per day in production.
Built a pseudo-labeling pipeline using SigLIP and Vicuna that cut ML development costs by 90%+ across 49 video categories. Separately led synthetic data generation with GPT-Neo that boosted Brand Safety precision by 49%.
Built custom translation models with OpenNMT for 42 language pairs. Translates 75 trillion characters annually while reducing costs by 67% for the core contextual classification pipeline.
Built a causal inference pipeline ingesting 8 billion ad impressions per day to measure real cause-and-effect on conversion rates for ad campaigns using Bayesian methods.
Built predictive models and churn strategies that identified customer behavior patterns and converted 18% of one-time buyers into repeat customers.
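The PEFT/LoRA work above rests on a simple idea: instead of fine-tuning a full pretrained weight matrix, freeze it and train only a low-rank update alongside it. A minimal NumPy sketch of the adapter math, not the production code (real projects would use a library such as Hugging Face PEFT; all names here are hypothetical):

```python
import numpy as np

class LoRALinear:
    """Low-rank adapter around a frozen weight matrix.

    Rather than updating the full d_out x d_in matrix W, we train only two
    small matrices A (r x d_in) and B (d_out x r), so the effective weight
    is W + (alpha / r) * B @ A. With r << d, the trainable parameter count
    collapses, which is where the compute savings come from.
    """

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                           # frozen pretrained weight
        self.A = rng.normal(0, 0.01, size=(r, W.shape[1]))   # trainable, small random init
        self.B = np.zeros((W.shape[0], r))                   # trainable, zero init
        self.scale = alpha / r

    def forward(self, x):
        # Base path uses the frozen weight; adapter path adds the low-rank update.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

# Because B starts at zero, the adapter is an exact no-op at initialization,
# so training begins from the pretrained model's behavior.
W = np.random.default_rng(1).normal(size=(768, 768))
layer = LoRALinear(W, r=8)
x = np.ones(768)
assert np.allclose(layer.forward(x), W @ x)
print(layer.trainable_params(), "trainable vs", W.size, "frozen")
```

At rank 8 on a 768x768 layer, only about 2% of the weights are trainable, which is the mechanism behind the large compute reduction the bullet describes.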
We classify extremely low-prevalence, nuanced categories of unsafe content in social media video. Previously, building a new classifier meant weeks of work: sampling the right balance of rare positives from production data that is overwhelmingly safe content, designing experiments to extract signal, then paying for expensive, time-consuming human annotation of subjective user-generated video. Each iteration was slow, costly, and hard to scale.
An internal Python package that closes the entire loop. It finds relevant content in production via vector search, accepts custom prompts (versioned in GitHub), and processes any modality. For video, it extracts keyframes, deduplicates frames, then sends everything to a cost-optimized LLM + VLM system for multimodal fusion. The output includes multilabel classifications, topic/subtopic stratification for comprehensive dataset sampling, and artifacts ready for model training or human review.
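The keyframe deduplication step can be illustrated with a perceptual-hash sketch. This is a simplified stand-in, not the package's actual implementation: downsample each frame, threshold at the mean to get a bit signature, and drop frames whose signatures are within a few bits of one already kept.

```python
import numpy as np

def average_hash(frame, hash_size=8):
    """Perceptual hash: box-downsample a grayscale frame to hash_size x hash_size,
    then threshold each cell at the overall mean to get a bit vector."""
    h, w = frame.shape
    # Crude box downsampling (assumes dimensions divide evenly by hash_size).
    small = frame.reshape(hash_size, h // hash_size,
                          hash_size, w // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def dedupe_keyframes(frames, max_distance=5):
    """Keep a frame only if its hash differs from every kept frame's hash
    by more than max_distance bits (Hamming distance)."""
    kept, hashes = [], []
    for i, frame in enumerate(frames):
        sig = average_hash(frame)
        if all(np.count_nonzero(sig != k) > max_distance for k in hashes):
            kept.append(i)
            hashes.append(sig)
    return kept

# Two near-identical frames plus one unrelated frame.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 64)).astype(float)
b = a + rng.normal(0, 0.1, size=(64, 64))              # near-duplicate of a
c = rng.integers(0, 256, size=(64, 64)).astype(float)  # unrelated frame
print(dedupe_keyframes([a, b, c]))
```

Dropping near-duplicate frames before the LLM + VLM stage matters because inference cost scales with the number of frames sent, and adjacent video keyframes are often visually redundant.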
Human reviewers validate AI labels through a custom A/B testing UI. Their responses feed back into the package, where DSPy and GEPA optimize the original prompts automatically. This is essentially an RLAIF pipeline. The LLM-as-judge acts as a reward model, with engineered reward functions calibrated against human baselines to prevent reward hacking. The loop runs continuously: label, validate, optimize, retrain.
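The anti-reward-hacking gate in that loop can be sketched in a few lines. Everything below is a hypothetical simplification: the `label_fn` and `propose_fn` stubs stand in for the real LLM calls and the DSPy/GEPA optimizer, and the key idea is that a candidate prompt is adopted only if it improves agreement with human labels, not merely the judge's own reward.

```python
def human_agreement(ai_labels, human_labels):
    """Fraction of items where the AI label matches the human A/B review."""
    return sum(a == h for a, h in zip(ai_labels, human_labels)) / len(human_labels)

def rlaif_step(prompt, items, human_labels, label_fn, propose_fn, margin=0.02):
    """One turn of the label -> validate -> optimize loop.

    The optimizer's candidate prompt is accepted only if it beats the current
    prompt on human-validated agreement by at least `margin`. Calibrating
    against the human baseline is what keeps the LLM-as-judge reward from
    being gamed.
    """
    baseline = human_agreement([label_fn(prompt, x) for x in items], human_labels)
    candidate = propose_fn(prompt)
    score = human_agreement([label_fn(candidate, x) for x in items], human_labels)
    if score >= baseline + margin:
        return candidate, score    # adopt the improved prompt
    return prompt, baseline        # keep the current prompt

# Toy stubs: prompt "v2" labels every item correctly, "v1" always answers 1.
human = [1, 0, 1, 1]
def label_fn(prompt, x):
    return x if prompt == "v2" else 1
def propose_fn(prompt):
    return "v2"

prompt, score = rlaif_step("v1", [1, 0, 1, 1], human, label_fn, propose_fn)
```

Gating on human agreement rather than judge score is the design choice that makes a continuous, mostly-automated loop safe to run: the judge can drift, but a drifting judge cannot promote a prompt past the human-calibrated bar.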
Not a laundry list. Every skill tied to something I shipped.
Open to conversations about ML, data science leadership, and impactful opportunities.