Michał Wojdylak

AI Infrastructure Engineer

Building production AI systems, LLM infrastructure, inference platforms and cloud-native ML solutions.

Latest writing

June 10, 2026 · 2 min read

Deploying LLMs in Production: An Infrastructure Playbook

A practical walkthrough of the infrastructure decisions behind serving large language models reliably — from GPU selection to batching, autoscaling, and observability.

LLM, Infrastructure, Inference

May 22, 2026 · 1 min read

Cutting GPU Inference Costs Without Hurting Latency

Quantization, batching, and right-sizing strategies that reduced our inference bill by 60% while keeping p99 latency flat.

Inference, Optimization, Cost

April 30, 2026 · 2 min read

Building an MLOps Platform on Kubernetes from Scratch

The core building blocks of a production MLOps platform — model registry, CI/CD for models, and safe rollouts with canaries and shadow deployments.

MLOps, Kubernetes, Infrastructure

Featured projects


LLM Inference Gateway

A high-throughput inference gateway for serving open-weight LLMs with token streaming, request batching, and per-tenant rate limiting. Built to run on Kubernetes with autoscaling backed by GPU node pools.

Python, vLLM, FastAPI, Kubernetes, Triton

MLOps Platform Blueprint

Reference architecture and Terraform modules for an end-to-end MLOps platform on AWS — feature store, model registry, CI/CD for models, and automated rollout with shadow deployments.

Terraform, AWS, SageMaker, MLflow, GitHub Actions

GPU Cost Observability

A monitoring stack that attributes GPU utilization and cloud spend to individual models and teams, with Grafana dashboards and Prometheus exporters for inference workloads.

Go, Prometheus, Grafana, DCGM, Kubernetes