Available for contracts · Abu Dhabi / Remote

I help teams ship AI that actually works.

From zero to production-grade LLM infrastructure. No hype, no fluff — systems that scale, cost models that don't surprise you, and code you can own after I'm gone.

Availability
Taking new contracts Q2 2025
Limited slots. Ideal start: 4–6 weeks' notice.
Based: Abu Dhabi, UAE · remote-friendly
Timezone: UTC+4 · async-first
Format: Fixed-scope or retainer
Stack: Python · TypeScript · AWS · Claude
Languages: English · Traditional Chinese · Arabic (basic)
What I do

Services

01
⚙️
LLM Infrastructure

Production-grade pipelines from prototype to scale. Multi-provider routing, KV-cache optimization, cost instrumentation, and observability baked in from day one.

Agentic loop architecture
KV-cache & prompt engineering audit
Multi-tenant system design
Langfuse / LiteLLM integration
From $4,500 / week
02
🔍
Retrieval & Search

Hybrid semantic + lexical search systems with multilingual support. Contextual retrieval, BM25 fusion, and vector stores that stay fast at production document volumes.

RAG pipeline design & build
LanceDB / Pinecone setup
Arabic & Chinese tokenization
Retrieval quality benchmarking
From $3,800 / week
03
🌐
GEO & Arabic AI

Generative Engine Optimization for Arabic-language markets. Position your brand in AI-generated answers across GPT, Claude, and Gemini before the window closes.

Arabic content & llms.txt strategy
GEO visibility audit
MENA go-to-market framing
Structured data for AI crawlers
From $2,800 / week
Process

How it works

01
Reach out

Send a brief: what you're building, where you're stuck, your timeline. No forms — email works fine.

Day 0
02
Scoping call

30-minute call to map the problem, agree on scope, and confirm fit. I'll push back if the scope isn't right.

Within 48 hrs
03
Engagement

Fixed-scope sprints or monthly retainer. Weekly async updates. You own everything on delivery.

Week 1–8
Evidence

Selected work

01
Python · AWS ECS · Anthropic API
Multi-Tenant KV-Cache Architecture

Reduced the uncached-token ratio from 94% to 18% across 50k daily sessions — ~10× inference cost reduction with no user-facing changes.

View case study
02
FastAPI · LiveKit · Whisper
Real-Time Voice Agent

Sub-200ms latency voice pipeline with streaming ASR, LLM tool calls, and TTS. Production-deployed for a B2B interview automation platform.

View case study
Let's talk

Tell me what you're working on. I read every message and reply within 24 hours.

Or email directly: [email protected]
Got a hard AI problem?
Let's figure it out together.
Get in touch
AI-indexed · llms.txt enabled