Things I've Built.
LLM infrastructure, agent systems, and AI-native products. Production-grade, mostly battle-tested.
Tier-scoped static system prompts serving 50k sessions/day. Moved dynamic user context to synthetic tool results, reducing uncached token ratio from 94% to 18%.
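A minimal sketch of the pattern: the system prompt is a byte-identical static string per tier (so the cached prefix never fragments), and per-user context rides in as a synthetic tool result after it. Payload shape loosely follows Anthropic's Messages API prompt-caching docs; the tier prompts, model id, and tool name here are illustrative, not the production values.

```python
import json

TIER_PROMPTS = {
    # Hypothetical tier-scoped prompts: one static string per tier so the
    # cacheable prefix is identical for every user on that tier.
    "free": "You are the assistant for free-tier users. ...",
    "pro": "You are the assistant for pro-tier users. ...",
}

def build_request(tier: str, user_context: dict, user_message: str) -> dict:
    """Assemble a Messages-API-style payload with a cached static prefix."""
    return {
        "model": "claude-sonnet-4",  # placeholder model id
        "system": [
            {
                "type": "text",
                "text": TIER_PROMPTS[tier],
                # Mark the static prefix cacheable; everything after may vary.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {"role": "user", "content": "Hi"},
            # Dynamic per-user context arrives as a synthetic tool call/result
            # pair, so it never touches the cached system-prompt prefix.
            {
                "role": "assistant",
                "content": [{"type": "tool_use", "id": "ctx_1",
                             "name": "load_user_context", "input": {}}],
            },
            {
                "role": "user",
                "content": [
                    {"type": "tool_result", "tool_use_id": "ctx_1",
                     "content": json.dumps(user_context)},
                    {"type": "text", "text": user_message},
                ],
            },
        ],
    }
```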
Provider-agnostic agent framework with opinionated tool-calling loop across Claude, GPT-4o, and Gemini. Parallel tool execution, retry budgets, full Langfuse trace observability.
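The core loop can be sketched like this: each backend (Claude, GPT-4o, Gemini) is normalized behind a small `Provider` interface, tool calls in a turn run in parallel, and provider errors draw down a retry budget. The `Turn`/`Provider` names and the internal message format are this sketch's assumptions, not the framework's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Protocol

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Turn:
    text: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)

class Provider(Protocol):
    # One adapter per backend normalizes its response into a Turn.
    def complete(self, messages: list[dict]) -> Turn: ...

def run_agent(provider: Provider, tools: dict[str, Callable[..., str]],
              prompt: str, retry_budget: int = 3, max_turns: int = 8) -> str:
    messages: list[dict] = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        for attempt in range(retry_budget):
            try:
                turn = provider.complete(messages)
                break
            except Exception:
                if attempt == retry_budget - 1:
                    raise  # retry budget exhausted
        if not turn.tool_calls:
            return turn.text  # model produced a final answer
        # Independent tool calls in one turn execute in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda c: (c.name, tools[c.name](**c.args)), turn.tool_calls))
        messages.append({"role": "assistant", "content": turn.text})
        for name, result in results:  # internal normalized tool-result role
            messages.append({"role": "tool", "name": name, "content": result})
    raise RuntimeError("max turns exceeded")
```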
Batched chunk enhancement using Anthropic's Contextual Retrieval method. ~49% retrieval failure reduction, ~97% processing cost reduction via Batch API vs real-time.
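The batching step can be sketched as one Batch API entry per chunk, each asking the model to situate the chunk within the full document (the prompt wording follows Anthropic's published Contextual Retrieval example; the `custom_id`/`params` field names and model id are illustrative). The shared document prefix is identical across entries, which is also what makes it cache-friendly.

```python
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the \
overall document for the purposes of improving search retrieval of the chunk."""

def batch_requests(doc: str, chunks: list[str],
                   model: str = "claude-haiku") -> list[dict]:
    """Build one Message-Batches-style entry per chunk (field names
    illustrative)."""
    return [
        {
            "custom_id": f"chunk-{i}",
            "params": {
                "model": model,
                "max_tokens": 200,
                # Identical <document> prefix across all entries.
                "messages": [{"role": "user", "content":
                              CONTEXT_PROMPT.format(document=doc, chunk=c)}],
            },
        }
        for i, c in enumerate(chunks)
    ]
```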
Curated registry of composable LLM agent skills — each a self-contained tool with schema, implementation, and test suite. Pre-dates Claude Code's native plugin system.
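A registry like this can be sketched as a name-keyed map of skills, each bundling a JSON Schema, an implementation, and (in the real repo) a test suite; `Skill` and `to_tool_specs` here are illustrative names, and the export format is the generic name/description/input_schema shape most providers accept.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    schema: dict             # JSON Schema for the tool's input
    run: Callable[..., str]  # the implementation

REGISTRY: dict[str, Skill] = {}

def register(skill: Skill) -> Skill:
    """Add a skill, rejecting name collisions so composition stays safe."""
    if skill.name in REGISTRY:
        raise ValueError(f"duplicate skill: {skill.name}")
    REGISTRY[skill.name] = skill
    return skill

def to_tool_specs() -> list[dict]:
    """Export every registered skill in a provider-neutral tool format."""
    return [{"name": s.name, "description": s.description,
             "input_schema": s.schema} for s in REGISTRY.values()]
```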
Sub-200ms latency voice agent on LiveKit. Streaming ASR → LLM inference with tool calls → TTS in a single pipeline. Evaluated FastRTC vs LiveKit for production deployment.
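The pipeline shape can be sketched as chained async streams, so TTS starts on the first sentence instead of waiting for the full LLM response (the key to the latency budget). The stage interfaces here are this sketch's assumptions, not LiveKit's actual API.

```python
import asyncio
from typing import AsyncIterator, Callable

async def pipeline(
    audio_frames: AsyncIterator[bytes],
    asr: Callable[[AsyncIterator[bytes]], AsyncIterator[str]],
    llm: Callable[[str], AsyncIterator[str]],
    tts: Callable[[str], AsyncIterator[bytes]],
) -> AsyncIterator[bytes]:
    """ASR -> LLM -> TTS, streamed stage to stage."""
    async for transcript in asr(audio_frames):   # incremental transcripts
        async for sentence in llm(transcript):   # sentence-sized LLM chunks
            async for frame in tts(sentence):    # synthesized audio frames
                yield frame
```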
Real-time signal monitoring dashboard for algo trading strategies. FastAPI + TimescaleDB for time-series, client-side Plotly.js for live charts. Railway, zero-downtime deploys.
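The downsampling the charts rely on is a standard `time_bucket` + `avg()` aggregation; a client-side equivalent (illustrative table and column names) is:

```python
from collections import defaultdict

def time_bucket(width_s: int,
                points: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Client-side equivalent of TimescaleDB's
      SELECT time_bucket('5 seconds', ts) AS bucket, avg(price)
      FROM ticks GROUP BY bucket ORDER BY bucket;
    Floors each timestamp to its bucket and averages the values."""
    buckets: dict[float, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % width_s].append(value)
    return sorted((b, sum(v) / len(v)) for b, v in buckets.items())
```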
Hybrid BM25 + embedding search with RRF fusion across a multilingual product catalogue. jieba for Chinese and CAMeL Tools for Arabic tokenization. LanceDB for vector storage.
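The fusion step is plain Reciprocal Rank Fusion over the BM25 and vector ranked lists: score(d) = Σ 1/(k + rank), with k = 60 as the conventional constant.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each ranker contributes 1/(k + rank) to the doc's score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by both retrievers dominate; a document seen by only one retriever still surfaces, just lower.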