Java / JVM · Distributed Systems · Kubernetes · Elasticsearch

Distributed systems don't fail loudly.

They degrade quietly, accumulate debt silently, and block the next phase of work long before anyone names the blocker. I've spent 17 years finding those blockers and removing them.

The work takes three forms

1 · Build the platform that makes the next thing possible

RCSB Protein Data Bank (5B requests/yr, 14.5M users): integrated Elastic APM, surfaced sequential Elasticsearch execution that no application metric had caught, and cut P99 latency by 33% with zero caller-code changes.

[video: SCaLE 20x — RCSB → K8s] [slides: Observability with Elastic APM]

DESY: ISPyB refactor that let the beamline team bring one of the facility's most commercially significant beamlines live.

OpenStack → Kubernetes migration: $200K direct savings.

[video: WeAreDevelopers World Congress 2024]

Now building: a fully automated trading-strategy platform — concept to live production with no manual handoffs.

[case study]

2 · Generate the evidence that prevents the wrong thing from being built

Code-level assessment of a distributed storage engine: modeled 5×–10× growth, located where metadata coordination, request fan-out, and object-storage latency combine into production constraints — before any appeared.

72-scenario load study of SciCat's metadata API; showed 20+ European and US facilities where their scaling ceiling sits.

[paper: NOBUGS 2024] [DOI: 10.5281/zenodo.15056189]

3 · Extract the method and leave it behind

Tango REST specification → industry standard across 100+ facilities.

Published, reusable benchmarking framework and load-testing methodology.

[Zenodo records] [GitHub: waltz]

Still shipping

Production Java is where I'm sharpest — including low-level Unsafe-based work — and I intend to keep shipping it.

Java/JVM · Node/TypeScript · Rust · C++ · ELK · Kubernetes

Python for tooling and analysis, not production.

Selected talks

  1. Reactive Programming for Tango-Controls · 40th Tango Users Meeting · 2026

    Composing async event streams from heterogeneous scientific instruments with shared upstream multicast — operator UIs stay live without polling.

  2. How I Saved $200K with 0 Code in K8s · WeAreDevelopers World Congress · 2024

    Infrastructure cost reduction as a platform decision — VPA + rightsizing found savings the migration itself had missed.

  3. Performance Benchmarking of Node.js, Rust, Python and Go · NOBUGS · 2024

    Methodology for eliminating toolchain bias from benchmark results across four runtimes at 20+ facilities.

  4. Migrating RCSB.org to Kubernetes · SCaLE 20x · 2023

    How a migration becomes an observability project when you instrument first.

Full list of talks →