Case Study

From Idea to Live Bot in One Command

Live on AWS eu-central-1 · testnet grace period, mainnet one env var away · ~$2.50/month total infra

[Figure: pipeline diagram. The /pipeline command triggers Step Functions, which runs the Designer, Selector, Researcher, and Deployer agents. Legend: LLM agent · deterministic · trigger / output.]

The problem

The lifecycle of a new trading strategy used to look like this: write the idea as Backtrader code, smoke-test it by hand, run an Optuna notebook, copy-paste the results CSV, analyse parameters in a REPL, write a spec document manually, translate the logic into the production bot's codebase, configure YAML, deploy, and hope. Every handoff was manual, undocumented at the boundary, and dependent on human memory between sessions. The gap between "idea" and "live bot" was measured in days.

This is the same failure mode I've spent years removing from scientific infrastructure: the system works, but every transition through it requires a person to carry context that nothing else holds.

What it does now

One Telegram command:

/pipeline "fast ema crosses slow ema" 5

produces a deployed, live Lambda bot in a single unattended execution. Behind that command, AWS Step Functions orchestrates four agents on ECS Fargate:

  1. Designer (LLM): turns the natural-language idea into a strategy spec, a pure-Python implementation, and smoke tests.
  2. Selector (deterministic): runs a quick single-pass backtest on every candidate instrument (~1 minute per symbol) and ranks the instruments by composite score.
  3. Researcher (deterministic): runs Optuna TPE optimisation on the top three candidates only and produces equity charts as evidence.
  4. Deployer (LLM): generates the production Lambda from the spec, runs contract tests, executes sam deploy, registers the bot in the fleet config, and reports to Telegram.
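
The four-stage chain above can be sketched as an Amazon States Language definition built in Python. This is a minimal sketch, not the project's actual state machine: the cluster ARN and task-definition names are placeholders, the function name is hypothetical, and a real Fargate task would also need network configuration, retries, and error handling. Only the stage order and the use of Step Functions with ECS Fargate come from the text.

```python
import json

# Stage order taken from the text; everything else here is an assumption.
STAGES = ["Designer", "Selector", "Researcher", "Deployer"]


def build_state_machine(stages, cluster_arn, task_definitions):
    """Chain one ECS RunTask state per stage, in order."""
    states = {}
    for index, name in enumerate(stages):
        state = {
            "Type": "Task",
            # Synchronous ECS integration: the state completes when the task exits.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster_arn,
                "TaskDefinition": task_definitions[name],
            },
        }
        if index + 1 < len(stages):
            state["Next"] = stages[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": stages[0], "States": states}


# Placeholder identifiers, for illustration only.
definition = build_state_machine(
    STAGES,
    "arn:aws:ecs:eu-central-1:000000000000:cluster/example",
    {name: f"{name.lower()}-task" for name in STAGES},
)
print(json.dumps(definition, indent=2))
```

Keeping the stage list as plain data means a stage can be added or reordered without touching the code that builds the states.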

Fleet management afterwards: /fleet, /pause, /resume, /kill, /pnl.

The decisions worth writing down

Strategy-as-spec

The canonical artifact is strategy_spec.json: typed parameters with ranges, a state-machine definition, signal types, and a natural-language logic block. The Designer generates it, the Researcher updates it with optimised values, and the Deployer reads it to generate production code. Research and production never share a framework; they share a contract. This is the same instinct as the Tango REST specification: when two worlds must cooperate, define the boundary as data, not shared code.
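
To make the contract idea concrete, here is a hypothetical shape for such a spec, together with the kind of range check the Researcher's update step might apply. The field names, parameter ranges, default values, and the wording of the logic block are all illustrative assumptions; the real strategy_spec.json schema is not shown in this case study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Parameter:
    name: str
    type: str  # "int" or "float" (assumed vocabulary)
    low: float
    high: float
    value: float

    def validate(self):
        if not self.low <= self.value <= self.high:
            raise ValueError(
                f"{self.name}={self.value} outside [{self.low}, {self.high}]"
            )


# Hypothetical spec for an EMA-cross idea; every value is illustrative.
spec = {
    "name": "ema_cross",
    "parameters": [
        asdict(Parameter("fast_period", "int", 5, 50, 12)),
        asdict(Parameter("slow_period", "int", 20, 200, 26)),
    ],
    "states": ["flat", "long"],
    "signals": ["enter_long", "exit_long"],
    "logic": "Enter long when the fast EMA crosses above the slow EMA; "
             "exit on the reverse cross.",
}


def apply_optimised(spec, best):
    """Return a copy of the spec with optimised values, rejecting out-of-range ones."""
    updated = json.loads(json.dumps(spec))  # deep copy via a JSON round trip
    for raw in updated["parameters"]:
        if raw["name"] in best:
            raw["value"] = best[raw["name"]]
        Parameter(**raw).validate()
    return updated


updated = apply_optimised(spec, {"fast_period": 9})
print(json.dumps(updated["parameters"], indent=2))
```

Because the spec round-trips through JSON, any stage can consume it without importing another stage's code, which is the point of sharing a contract rather than a framework.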

Screen cheap, optimise expensive

The original design ran optimisation before instrument selection. A full Optuna run (200 trials) across eight or more symbols takes hours, while a single default-parameter backtest takes about a minute per symbol and eliminates instruments that show no affinity with the strategy. The pipeline now screens first, so it spends its optimisation compute only on candidates that have already earned it.
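
The ordering can be sketched as a two-stage funnel. This is a sketch under stated assumptions: quick_score stands in for the single default-parameter backtest and optimise for the full Optuna run; both are hypothetical callables, and the symbols and scores below are made-up placeholders rather than results.

```python
def screen_then_optimise(symbols, quick_score, optimise, top_n=3):
    """Score every symbol cheaply, then run the expensive step on the best few."""
    scores = {symbol: quick_score(symbol) for symbol in symbols}
    shortlist = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return {symbol: optimise(symbol) for symbol in shortlist}


# Placeholder data: not real instruments or real backtest scores.
stub_scores = {"SYM_A": 0.2, "SYM_B": 0.9, "SYM_C": 0.5, "SYM_D": 0.7, "SYM_E": 0.1}
optimised_calls = []


def quick_score(symbol):
    return stub_scores[symbol]


def optimise(symbol):
    optimised_calls.append(symbol)  # record which symbols reach the expensive stage
    return {"symbol": symbol}


results = screen_then_optimise(list(stub_scores), quick_score, optimise)
print(optimised_calls)
```

With five candidates and top_n=3, the expensive step runs three times instead of five; the saving grows with the number of instruments screened.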

Notify, don't block

All four agents run unattended. The Deployer sends a full summary to Telegram before deploying, but does not wait for a reply. Stage duration is unpredictable because it depends on Fargate capacity allocation, so a blocking approval gate becomes a race between token expiry and a human noticing a message in time. The gate infrastructure remains in the codebase, switchable per agent, for stages that may need a hard stop. /pause and /kill exist for when seeing is not enough.
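
A per-agent gate switch of this kind might look like the sketch below. The names GateConfig, run_stage, notify, and wait_for_approval are hypothetical, and the real gate mechanism is not described here beyond being switchable per agent; the sketch only shows notification always happening while blocking stays opt-in.

```python
from dataclasses import dataclass, field


@dataclass
class GateConfig:
    # Agents named here must be approved before continuing; empty means unattended.
    blocking: set = field(default_factory=set)


def run_stage(agent, summary, gates, notify, wait_for_approval):
    """Always notify; block only if this agent's gate is switched on."""
    notify(f"[{agent}] {summary}")
    if agent in gates.blocking and not wait_for_approval(agent):
        return "halted"
    return "continued"


messages = []

# Default: notify and carry on without waiting for anyone.
unattended = run_stage(
    "Deployer", "summary before deploy", GateConfig(), messages.append,
    wait_for_approval=lambda agent: False,
)

# Gate switched on for one agent: a missing approval stops the stage.
gated = run_stage(
    "Deployer", "summary before deploy", GateConfig(blocking={"Deployer"}),
    messages.append, wait_for_approval=lambda agent: False,
)
print(unattended, gated, len(messages))
```

In both runs the summary is sent, so the human sees the same information either way; the switch only decides whether the pipeline waits for them.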

LLM agents only where judgment lives

Two of the four agents use an LLM (idea→spec, spec→code). Screening and optimisation are deterministic, reproducible, and cheap. Drawing that line deliberately keeps the pipeline auditable: every number that reaches production came from a backtest, not a model's opinion.

Status and numbers

Live on AWS (eu-central-1) · testnet grace period · mainnet: one env var away

  • Total infra cost: ~$2.50/month
  • Production Lambda cold start: irrelevant at 5-minute tick intervals
  • Per-bot runtime cost: ~$0.01/month
  • Java 21 / GraalVM native-image path: proven in a reference bot, scheduled for when the fleet reaches 3+ profitable bots

Stack

Python 3.12 agents (Claude via Bedrock) · AWS Step Functions · ECS Fargate · Lambda · DynamoDB · S3 · EventBridge · SAM/CloudFormation · Backtrader · Optuna · Telegram Bot API · Java 21 / GraalVM (reference production bot)

What this demonstrates, in the framing of the rest of this site: build the platform that makes the next thing possible (idea to live bot, one command); generate evidence before committing (screen, optimise, promote — testnet to mainnet, Python to Java, each gate opened by proof); leave the method behind (strategy-as-spec, the bug catalog, notify-don't-block as an operational pattern). The system is small. The discipline is the product.