01 / work

Things
I've Built.

LLM infrastructure, agent systems, and AI-native products. Production-grade, mostly battle-tested.

7+
Projects
50k
Daily sessions
10×
Cost reduction
Featured project
Live · Production
01 / LLM Infrastructure
Multi-Tenant Chatbot with KV-Cache Architecture

50k sessions/day. Tier-scoped static system prompts, dynamic context via synthetic tool results — solved the cache fragmentation problem at scale.

18%
Uncached tokens
94%
Cache hit gain
10×
Cost reduction
Python AWS ECS PostgreSQL Langfuse Anthropic API
View case study
LLM Infrastructure
3 projects
01
Production
Multi-Tenant Chatbot — KV-Cache Architecture

Tier-scoped static system prompts serving 50k sessions/day. Moved dynamic user context to synthetic tool results, reducing uncached token ratio from 94% to 18%.

Python AWS ECS PostgreSQL Langfuse
02
Production
Multi-LLM Agentic Loop

Provider-agnostic agent framework with opinionated tool-calling loop across Claude, GPT-4o, and Gemini. Parallel tool execution, retry budgets, full Langfuse trace observability.

Python LiteLLM Langfuse
03
Production
Document Processing — Contextual Retrieval

Batched chunk enhancement using Anthropic's Contextual Retrieval method. ~49% retrieval failure reduction, ~97% processing cost reduction via Batch API vs real-time.

Python Anthropic API LanceDB BM25
Agent Systems
2 projects
04
In progress
AgentBox — AI Agent Skills Marketplace

Curated registry of composable LLM agent skills — each a self-contained tool with schema, implementation, and test suite. Pre-dates Claude Code's native plugin system.

Claude Code MCP Node.js
05
Production
Voice Agent — Real-Time LLM Interview System

Sub-200ms latency voice agent on LiveKit. Streaming ASR → LLM inference with tool calls → TTS in a single pipeline. Evaluated FastRTC vs LiveKit for production deployment.

FastAPI LiveKit Whisper
Data & Monitoring
2 projects
06
Production
Trading Strategy Monitor

Real-time signal monitoring dashboard for algo trading strategies. FastAPI + TimescaleDB for time-series, client-side Plotly.js for live charts. Railway, zero-downtime deploys.

FastAPI TimescaleDB Railway Plotly.js
07
Production
Product Visual Search

Hybrid BM25 + embedding search with RRF fusion across a multilingual product catalogue. Chinese jieba and Arabic CAMeL Tools for tokenization. LanceDB for vector storage.

Ruby on Rails LanceDB BM25 Firecrawl
AI-indexed · llms.txt enabled