The problem
The lifecycle of a new trading strategy used to look like this: write the idea as Backtrader code, smoke-test it by hand, run an Optuna notebook, copy-paste the results CSV, analyse parameters in a REPL, write a spec document manually, translate the logic into the production bot's codebase, configure YAML, deploy, and hope. Every handoff was manual, undocumented at the boundary, and dependent on human memory between sessions. The gap between "idea" and "live bot" was measured in days.
This is the same failure mode I've spent years removing from scientific infrastructure: the system works, but every transition through it requires a person to carry context that nothing else holds.
What it does now
One Telegram command:
/pipeline "fast ema crosses slow ema" 5
produces a live deployed Lambda bot in a single unattended execution. Behind that command, AWS Step Functions orchestrates four agents on ECS Fargate:
- Designer (LLM): turns the natural-language idea into a strategy spec, a pure-Python implementation, and smoke tests.
- Selector (deterministic): runs a quick single-pass backtest on every candidate instrument (~1 minute per symbol) and ranks the results by composite score.
- Researcher (deterministic): runs Optuna TPE optimisation on the top three candidates only, with equity charts as evidence.
- Deployer (LLM): generates the production Lambda from the spec, runs contract tests, executes sam deploy, registers the bot in the fleet config, and reports to Telegram.
Fleet management afterwards: /fleet, /pause, /resume, /kill, /pnl.
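The four-stage hand-off described above can be sketched as an Amazon States Language definition built in Python. This is a minimal sketch, not the pipeline's actual state machine: the cluster name and task-definition names are placeholders, and a real Fargate task would also need network configuration, retries, and error handling, all omitted here.

```python
import json

AGENTS = ["Designer", "Selector", "Researcher", "Deployer"]


def build_definition(agents, cluster="pipeline-cluster"):
    """Chain one ECS Fargate task per agent (cluster and task names are hypothetical)."""
    states = {}
    for i, name in enumerate(agents):
        state = {
            "Type": "Task",
            # The .sync integration waits for the ECS task to finish
            # before the state machine moves to the next state.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": cluster,
                "TaskDefinition": f"{name.lower()}-agent",
            },
        }
        if i + 1 < len(agents):
            state["Next"] = agents[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": agents[0], "States": states}


def execution_order(definition):
    """Follow Next pointers from StartAt until a state marked End."""
    order = []
    current = definition["StartAt"]
    while True:
        order.append(current)
        state = definition["States"][current]
        if state.get("End"):
            return order
        current = state["Next"]


definition = build_definition(AGENTS)
print(json.dumps(definition, indent=2))
print(execution_order(definition))
```

Keeping the definition as plain data makes the stage order checkable in a unit test before anything is deployed.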
The decisions worth writing down
Strategy-as-spec
The canonical artifact is strategy_spec.json: typed parameters with ranges, a state-machine definition, signal types, and a natural-language logic block. The Designer generates it, the Researcher updates it with optimised values, the Deployer reads it to generate production code. Research and production never share a framework; they share a contract. This is the same instinct as the Tango REST specification: when two worlds must cooperate, define the boundary as data, not shared code.
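To make the contract idea concrete, here is a minimal sketch of what such a spec and its validation might look like. The field names (parameters, states, signals, logic), the parameter names, and the signal names are assumptions for illustration; the real strategy_spec.json schema is not shown in this write-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    kind: str     # "int" or "float"
    low: float
    high: float
    value: float  # Designer default, later replaced by the Researcher


def validate(spec: dict) -> list[Param]:
    """Check the illustrative spec shape and return its typed parameters."""
    for key in ("parameters", "states", "signals", "logic"):
        if key not in spec:
            raise ValueError(f"spec is missing '{key}'")
    params = [Param(**p) for p in spec["parameters"]]
    for p in params:
        if not p.low <= p.value <= p.high:
            raise ValueError(f"{p.name}={p.value} outside [{p.low}, {p.high}]")
        if p.kind == "int" and p.value != int(p.value):
            raise ValueError(f"{p.name} must be an integer")
    return params


def apply_optimised(spec: dict, best: dict) -> dict:
    """Return a copy of the spec with optimised values, re-validated."""
    updated = {
        **spec,
        "parameters": [
            {**p, "value": best.get(p["name"], p["value"])}
            for p in spec["parameters"]
        ],
    }
    validate(updated)
    return updated


# Hypothetical spec for the EMA-cross idea; all names are illustrative.
spec = {
    "parameters": [
        {"name": "fast_period", "kind": "int", "low": 2, "high": 50, "value": 12},
        {"name": "slow_period", "kind": "int", "low": 10, "high": 200, "value": 26},
    ],
    "states": {"FLAT": ["LONG"], "LONG": ["FLAT"]},
    "signals": ["ENTER_LONG", "EXIT_LONG"],
    "logic": "fast ema crosses slow ema",
}

optimised = apply_optimised(spec, {"fast_period": 9, "slow_period": 40})
print([p["value"] for p in optimised["parameters"]])
```

Because each stage only reads and writes this data, an out-of-range value is rejected at the boundary rather than surfacing later in generated production code.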
Screen cheap, optimise expensive
The original design ran optimisation before instrument selection. Full Optuna (200 trials) across eight-plus symbols costs hours, while a single default-parameter backtest costs a minute and eliminates instruments that have no affinity with the strategy. The pipeline now spends its compute where the candidates have already earned it.
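The ordering can be sketched as a two-pass funnel: score every symbol with the cheap backtest, then spend the expensive optimisation only on the shortlist. The scoring and optimisation functions below are stand-ins with made-up scores and symbol names, not the pipeline's Backtrader or Optuna code.

```python
def screen_then_optimise(symbols, quick_score, optimise, top_n=3):
    """Cheap pass on every symbol, expensive pass on the top_n only."""
    scores = {s: quick_score(s) for s in symbols}
    shortlist = sorted(symbols, key=scores.get, reverse=True)[:top_n]
    return {s: optimise(s) for s in shortlist}


# Stand-in data: hypothetical symbols with invented composite scores.
FAKE_SCORES = {
    "SYM1": 0.2, "SYM2": 1.4, "SYM3": -0.3, "SYM4": 0.9,
    "SYM5": 0.1, "SYM6": 1.1, "SYM7": -0.8, "SYM8": 0.4,
}
calls = {"quick": 0, "full": 0}


def quick_score(symbol):
    # Stands in for one default-parameter backtest (about a minute each).
    calls["quick"] += 1
    return FAKE_SCORES[symbol]


def optimise(symbol):
    # Stands in for a full 200-trial optimisation run.
    calls["full"] += 1
    return {"symbol": symbol, "trials": 200}


results = screen_then_optimise(list(FAKE_SCORES), quick_score, optimise)
print(list(results), calls)
```

With eight symbols this runs eight cheap backtests and three full optimisations instead of eight full optimisations.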
Notify, don't block
All four agents run unattended. The Deployer sends a full summary to Telegram before deploying. Stage duration is unpredictable because it depends on Fargate capacity allocation, so a blocking approval gate becomes a race between token expiry and a human noticing a message in time. The gate infrastructure remains in the codebase, switchable per agent, for stages that may need a hard stop. /pause and /kill exist for when seeing is not enough.
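One way to picture the pattern is a stage runner that always notifies and only blocks when that agent's gate is switched on. This is a sketch under assumptions: the GATES mapping, run_stage, and the callback signatures are hypothetical and do not reflect the codebase's actual gate mechanism.

```python
# Hypothetical per-agent switch: True means "hard stop until approved".
GATES = {"Designer": False, "Selector": False, "Researcher": False, "Deployer": False}


def run_stage(agent, summarise, act, notify, wait_for_approval, gates):
    """Notify first, then act; block only if this agent's gate is on."""
    summary = summarise()
    notify(f"[{agent}] {summary}")
    if gates.get(agent, False) and not wait_for_approval(agent):
        return None  # gate enabled and approval not given: skip the action
    return act()


sent = []       # stands in for messages pushed to Telegram
approvals = []  # records whether approval was ever requested


def deny(agent):
    approvals.append(agent)
    return False


# Default configuration: the summary is sent and the action proceeds.
result = run_stage(
    "Deployer",
    summarise=lambda: "deploying ema-cross bot",
    act=lambda: "deployed",
    notify=sent.append,
    wait_for_approval=deny,
    gates=GATES,
)

# Same stage with the gate switched on: the action waits on approval.
gated = run_stage(
    "Deployer",
    summarise=lambda: "deploying ema-cross bot",
    act=lambda: "deployed",
    notify=sent.append,
    wait_for_approval=deny,
    gates={**GATES, "Deployer": True},
)
print(result, gated, sent, approvals)
```

In the default path the approval callback is never invoked, so nothing in the run depends on a human responding in time.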
LLM agents only where judgment lives
Two of the four agents use an LLM (idea→spec, spec→code). Screening and optimisation are deterministic, reproducible, cheap. Drawing that line deliberately keeps the pipeline auditable: every number that reaches production came from a backtest, not a model's opinion.
Status and numbers
Live on AWS (eu-central-1) · testnet grace period · mainnet: one env var away
- Total infra cost: ~$2.50/month
- Production Lambda cold start: irrelevant at 5-minute tick intervals
- Per-bot runtime cost: ~$0.01/month
- Java 21 / GraalVM native-image path: proven in the reference bot, scheduled for when the fleet reaches 3+ profitable bots
Stack
Python 3.12 agents (Claude via Bedrock) · AWS Step Functions · ECS Fargate · Lambda · DynamoDB · S3 · EventBridge · SAM/CloudFormation · Backtrader · Optuna · Telegram Bot API · Java 21 / GraalVM (reference production bot)
What this demonstrates, in the framing of the rest of this site: build the platform that makes the next thing possible (idea to live bot, one command); generate evidence before committing (screen, optimise, promote — testnet to mainnet, Python to Java, each gate opened by proof); leave the method behind (strategy-as-spec, the bug catalog, notify-don't-block as an operational pattern). The system is small. The discipline is the product.