Available · July 2026

Benjamin Holderbein

I build retrieval and LLM systems that actually ship. Currently at Asurion replacing a 3-year-old semantic search with a modular RAG pipeline serving 10,000+ daily queries, while finishing my M.S. in Data Science & AI at USF.

01 About

AI engineer with a research bent and a builder’s instinct.

I'm an AI Engineer based in San Francisco. My day-to-day is RAG plumbing: ingestion, chunking, embeddings, retrieval, evals — making sure the thing that comes out the other end is actually better than what was there before.

Right now I'm at Asurion, where I built a modular RAG system that replaces a three-year-old semantic search serving 10,000+ daily customer queries. Before that I shipped a React Native app at StudyStudio.ai, an NLP matching system at USF's Data Institute that's still in production, and time-series pipelines for residential energy research at Frontier Energy.

Outside of work I ran USF's 100+ member rock climbing club for two years. I like things that are precise, durable, and slightly understated.

02 Now · May 2026
  • 01 Building the next iteration of Asurion's retrieval stack: an eval harness over BGE / Qwen3 / Nemotron / Gemini.
  • 02 Wrapping up my M.S. in Data Science & AI at USF (June 2026).
  • 03 Reading: Sebastian Raschka, Build a Large Language Model.
03 Selected work · 8 projects
2026

Ecommerce Ticket Triage Assistant

Multi-service GCP system that triages support tickets by predicted priority with a fine-tuned DistilBERT classifier. FastAPI backend on Cloud Run with Firebase auth and per-user rate limits, Cloud SQL Postgres scoped by uid, Vertex AI for training and the model registry.

GCP · Cloud Run · FastAPI · DistilBERT · Vertex AI
2025

LLM From Scratch

From-scratch GPT-2 small in PyTorch (~124M params): BPE tokenizer with a sliding-window dataloader, multi-head causal self-attention, the full transformer stack, and a pretraining loop that samples generations mid-training. Each file maps to a stage of Sebastian Raschka's Build a Large Language Model.

PyTorch · Transformers · tiktoken · Python
2026

Claude Orchestrate

A Claude Code skill and matching sub-agent that turn the main agent into a coordinator. Decomposes a plan into waves, spawns implementer sub-agents in isolated git worktrees, runs them in parallel where safe, and stays out of the way between checkpoints.

Claude Code · Agents · Git Worktrees
2024

Cellular Microscopy Counting

U-Net CNN for image segmentation that automates cell counting for biomedical research. Mean error of 1.4 cells, beating the 3-cell target by 2x and replacing manual counting.

PyTorch · U-Net · CV
04 Experience
2025 — Now

AI Engineer, Intern

Asurion · San Francisco
Built a modular multi-tenant RAG ingestion pipeline replacing a 3-year-old semantic search serving 10,000+ daily customer queries, moving deployment from full-code changes to zero-code config. Benchmarked four embedding models (BGE, Qwen3, Nemotron, Gemini) for production selection. On the voice side, led the opus-mt vs Gemini eval for IVR translation: opus-mt shipped at 7.3× lower latency (p50 138 ms vs 1,011 ms) with an 88% good-rate on an 800-row EN-ES test set. Ran a 10K-message corpus experiment that surfaced an email-dictation failure in 76% of voice-safety calls, and shipped the fix.
2025

Software Engineer, Intern

StudyStudio.ai · San Francisco
Developed and deployed a cross-platform mobile app using React Native, TypeScript, and Clerk for iOS and Android. Delivered a polished prototype mirroring core web functionality and ready for store deployment.
2024

AI Engineer, Intern

USF Data Institute · San Francisco
Designed and shipped an NLP algorithm that automates internship/student matching at USF — clustering qualifications and matching them against employer requirements with LLMs. Built a preprocessing pipeline that cut runtime and API cost by 40%. Still in production.
2022 & 2023

Data Scientist, Intern

Frontier Energy · Davis, CA
Two summer stints analyzing 40+ data channels monitoring residential building performance across California pilot sites. Built an Azure Data Explorer dashboard and a Python pipeline that cleaned, validated, and resampled time-series data from 1-second to 5-minute intervals.
05 Education
2025 — 2026

M.S. in Data Science & AI

University of San Francisco · San Francisco
Expected June 2026
2021 — 2025

B.S. in Data Science

University of San Francisco · San Francisco
3.96 GPA
06 Contact · Say hello