01 / work

Things
I've Built.

LLM infrastructure, agent systems, and AI-native products. Production-grade, mostly battle-tested.

Projects

50k

Daily sessions

10×

Cost reduction

Featured project

Live · Production

01 / LLM Infrastructure

Multi-Tenant Chatbot with KV-Cache Architecture

50k sessions/day. Tier-scoped static system prompts, dynamic context via synthetic tool results — solved the cache fragmentation problem at scale.

18%

Uncached tokens

94%

Cache hit gain

10×

Cost reduction

Python AWS ECS PostgreSQL Langfuse Anthropic API

View case study

LLM Infrastructure

3 projects

Production

Multi-Tenant Chatbot — KV-Cache Architecture

Tier-scoped static system prompts serving 50k sessions/day. Moved dynamic user context to synthetic tool results, reducing uncached token ratio from 94% to 18%.

Python AWS ECS PostgreSQL Langfuse

Case study

Production

Multi-LLM Agentic Loop

Provider-agnostic agent framework with opinionated tool-calling loop across Claude, GPT-4o, and Gemini. Parallel tool execution, retry budgets, full Langfuse trace observability.

Python LiteLLM Langfuse

Case study

Production

Document Processing — Contextual Retrieval

Batched chunk enhancement using Anthropic's Contextual Retrieval method. ~49% retrieval failure reduction, ~97% processing cost reduction via Batch API vs real-time.

Python Anthropic API LanceDB BM25

Case study

Agent Systems

2 projects

In progress

AgentBox — AI Agent Skills Marketplace

Curated registry of composable LLM agent skills — each a self-contained tool with schema, implementation, and test suite. Pre-dates Claude Code's native plugin system.

Claude Code MCP Node.js

Case study

Production

Voice Agent — Real-Time LLM Interview System

Sub-200ms latency voice agent on LiveKit. Streaming ASR → LLM inference with tool calls → TTS in a single pipeline. Evaluated FastRTC vs LiveKit for production deployment.

FastAPI LiveKit Whisper

Case study

Data & Monitoring

2 projects

Production

Trading Strategy Monitor

Real-time signal monitoring dashboard for algo trading strategies. FastAPI + TimescaleDB for time-series, client-side Plotly.js for live charts. Railway, zero-downtime deploys.

FastAPI TimescaleDB Railway Plotly.js

Case study

Production

Product Visual Search

Hybrid BM25 + embedding search with RRF fusion across a multilingual product catalogue. Chinese jieba and Arabic CAMeL Tools for tokenization. LanceDB for vector storage.

Ruby on Rails LanceDB BM25 Firecrawl

Case study

ThingsI've Built.

Things
I've Built.