Projects

Selected work across AI infrastructure, inference platforms, and MLOps tooling. Most are open source — explore the code on GitHub.

LLM Inference Gateway

A high-throughput inference gateway for serving open-weight LLMs with token streaming, request batching, and per-tenant rate limiting. Built to run on Kubernetes with autoscaling backed by GPU node pools.

Python
vLLM
FastAPI
Kubernetes
Triton
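
To make the per-tenant rate limiting concrete, here is a minimal sketch using a token bucket per tenant. The token-bucket choice, the class names, and the parameters (rate, capacity, cost) are assumptions for illustration, not a description of the gateway's actual code.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so a new tenant can burst
        self.updated = None      # timestamp of the last refill

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Keeps one independent bucket per tenant ID (hypothetical helper)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, tenant_id, cost=1.0, now=None):
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            bucket = self.buckets[tenant_id] = TokenBucket(self.rate, self.capacity)
        return bucket.allow(cost, now)
```

A request handler would call `limiter.allow(tenant_id)` before forwarding to the model and reject the request when it returns `False`. Passing a larger `cost` for longer prompts is one way to limit by tokens rather than by request count.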

MLOps Platform Blueprint

Reference architecture and Terraform modules for an end-to-end MLOps platform on AWS — feature store, model registry, CI/CD for models, and automated rollout with shadow deployments.

Terraform
AWS
SageMaker
MLflow
GitHub Actions

GPU Cost Observability

A monitoring stack that attributes GPU utilization and cloud spend to individual models and teams, with Grafana dashboards and Prometheus exporters for inference workloads.

Go
Prometheus
Grafana
DCGM
Kubernetes
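
One simple way to attribute spend is to split a billing period's GPU cost in proportion to the GPU-seconds each team and model consumed. The sketch below assumes that allocation rule and a hypothetical input of `(team, model, gpu_seconds)` samples; the actual stack may weight usage differently.

```python
from collections import defaultdict


def attribute_cost(usage, total_cost):
    """Split total_cost across (team, model) pairs by share of GPU-seconds.

    usage: iterable of (team, model, gpu_seconds) samples (hypothetical shape).
    Returns a dict mapping (team, model) to its share of total_cost.
    """
    totals = defaultdict(float)
    for team, model, gpu_seconds in usage:
        totals[(team, model)] += gpu_seconds

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No recorded usage: nothing to attribute.
        return {key: 0.0 for key in totals}

    return {
        key: total_cost * seconds / grand_total
        for key, seconds in totals.items()
    }
```

For example, if one model used 400 of 1,000 GPU-seconds in a period that cost 100 units, this rule assigns it 40 units.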

Vision Pipeline Toolkit

A modular toolkit for building real-time computer vision pipelines with hardware-accelerated decoding, model ensembling, and ONNX/TensorRT export for edge deployment.

Python
PyTorch
TensorRT
ONNX
OpenCV

Embeddings Search Service

A semantic search microservice with pluggable vector stores, hybrid retrieval, and a thin caching layer to keep tail latencies predictable under load.

Python
Qdrant
FastAPI
Redis
Docker
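
Hybrid retrieval usually means merging a keyword ranking with a vector-similarity ranking. The blurb above does not say which fusion method the service uses, so this sketch assumes reciprocal rank fusion (RRF), a common choice; the function name and the `k=60` default are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) in every list it appears in,
    with rank starting at 1. Higher total score ranks first, and ties
    are broken by document ID for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF uses only rank positions, so keyword scores and vector distances never need to be normalized to a common scale.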

Model Deployment CLI

A developer-friendly CLI that packages, validates, and rolls out models to staging and production with reproducible builds and automatic canary checks.

TypeScript
Node.js
Docker
Helm
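
An automatic canary check can be as simple as comparing the canary's error rate against the current production baseline. The sketch below is written in Python for consistency with the other examples on this page, although the CLI itself is TypeScript; the function name, thresholds, and inputs are all assumptions.

```python
def canary_passes(baseline_errors, baseline_total,
                  canary_errors, canary_total,
                  max_increase=0.01, min_requests=100):
    """Return True if the canary's error rate is acceptably close to baseline.

    The check fails closed: with too little canary traffic or no baseline
    traffic, it returns False rather than promoting on thin evidence.
    """
    if canary_total < min_requests or baseline_total == 0:
        return False
    baseline_rate = baseline_errors / baseline_total
    canary_rate = canary_errors / canary_total
    return canary_rate <= baseline_rate + max_increase
```

A rollout command could poll this check during the canary window and roll back automatically on the first failure.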