Available for contracts · Abu Dhabi / Remote

I help teams ship AI that actually works.

From zero to production-grade LLM infrastructure. No hype, no fluff — systems that scale, cost models that don't surprise you, and code you can own after I'm gone.

Availability
Taking new contracts Q2 2025
Limited slots. Ideal start: 4–6 weeks' notice.
Based: Abu Dhabi, UAE · remote-friendly
Timezone: UTC+4 · async-first
Format: Fixed-scope or retainer
Stack: Python · TypeScript · AWS · Claude
Languages: English · Traditional Chinese · Arabic (basic)
What I do

Services

01
⚙️
LLM Infrastructure

Production-grade pipelines from prototype to scale. Multi-provider routing, KV-cache optimization, cost instrumentation, and observability baked in from day one.

Agentic loop architecture
KV-cache & prompt engineering audit
Multi-tenant system design
Langfuse / LiteLLM integration
From $4,500 / week
02
🔍
Retrieval & Search

Hybrid semantic + lexical search systems with multilingual support. Contextual retrieval, BM25 fusion, and vector stores that stay fast at production document volumes.

RAG pipeline design & build
LanceDB / Pinecone setup
Arabic & Chinese tokenization
Retrieval quality benchmarking
From $3,800 / week
03
🌐
GEO & Arabic AI

Generative Engine Optimization for Arabic-language markets. Position your brand in AI-generated answers across GPT, Claude, and Gemini before the window closes.

Arabic content & llms.txt strategy
GEO visibility audit
MENA go-to-market framing
Structured data for AI crawlers
From $2,800 / week
Process

How it works

01
Reach out

Send a brief: what you're building, where you're stuck, your timeline. No forms — email works fine.

Day 0
02
Scoping call

30-minute call to map the problem, agree on scope, and confirm fit. I'll push back if the scope isn't right.

Within 48 hrs
03
Engagement

Fixed-scope sprints or monthly retainer. Weekly async updates. You own everything on delivery.

Week 1–8
Evidence

Selected work

01
Python · AWS ECS · Anthropic API
Multi-Tenant KV-Cache Architecture

Reduced the uncached-token ratio from 94% to 18% across 50k daily sessions — ~10× inference cost reduction with no user-facing changes.

View case study
02
FastAPI · LiveKit · Whisper
Real-Time Voice Agent

Sub-200ms latency voice pipeline with streaming ASR, LLM tool calls, and TTS. Production-deployed for a B2B interview automation platform.

View case study
Let's talk

Tell me what you're working on. I read every message and reply within 24 hours.

Or email directly: [email protected]
Got a hard AI problem?
Let's figure it out together.
Get in touch
AI-indexed · llms.txt enabled