All Writing
Why KV-Cache Optimization and Progressive Disclosure Are at War in Multi-Tenant LLM Apps
The tension between serving dynamic, role-aware content and keeping inference costs low isn't a framework problem — it's an architectural one.