Milestones

All 9 implementation milestones with deliverables and done-when criteria.

Nine milestones sequence the implementation. Each milestone builds on the previous and has a specific "done when" criterion that is testable, not aspirational.


Milestone 0: Skeleton

Timeline: Day 1–2
Goal: A running process with DB, API, and audit writer. No pipeline yet.

Deliverables:

  • Repo structure, pyproject.toml with deps (fastapi, uvicorn, sqlalchemy, alembic, pydantic)
  • config.py with typed settings (Pydantic BaseSettings, reads DATABASE_URL from env)
  • PostgreSQL database + Alembic migration 0001: audit_events, system_health tables
  • AuditWriter implemented and unit-tested
  • GET /health returns a hardcoded heartbeat
  • GET /audit returns audit events (empty list)

Done when: uv run python -m syris.main starts. Hit /health and get a JSON response. Write an audit event via test helper. See it at GET /audit. All unit-tested.


Milestone 1: Pipeline skeleton with real audit output

Timeline: Week 1
Goal: An event flows end-to-end through Normalize → Route → Execute (fast only) and every stage emits a queryable audit event. The full routing cascade structure is wired — even if most branches are stubs — so future milestones add intents without touching the router's structure.

Deliverables:

  • schemas/ package: MessageEvent, RoutingDecision, AuditEvent (Pydantic v2)
  • storage/models.py + repos for events, audit, routing_decisions
  • normaliser.py: accepts raw dict + channel, returns MessageEvent, persists + audits
  • router.py: full cascade structure wired — filters → IFTTT rules → fastpath → LLM fallback. Ships with one fastpath intent (timer.set). Rules and LLM fallback are stubs that return a canned routing.unhandled response, but the cascade is real and traversed in order.
  • pipeline/executor.py: fast lane only, calls NoopTool, audits
  • tools/executor.py: scope check + idempotency + NoopTool + audit
  • pipeline/responder.py: stub final stage — logs that a response would be sent, does nothing yet
  • GET /events, GET /audit, GET /audit?trace_id=X working

Fastpath intents added this milestone: timer.set

Done when: POST a raw event → full trace at /audit?trace_id=X showing exactly 4 audit events (event.ingested, routing.decided, tool_call.attempted, tool_call.succeeded). POST same event again → event.deduped. Sending an event the router cannot match → routing.unhandled in audit with no tool call attempted. Zero log-spelunking required.


Milestone 2: Task engine + LLM-planned tasks

Timeline: Week 2
Goal: Multi-step workflows with checkpointing, retries, and crash recovery. Adds the llm_plan execution lane so the LLM can act as executor — not just router — for inputs that require dynamic step sequences.

Deliverables:

  • schemas/tasks.py: Task, Step, RetryPolicy
  • storage/repos/tasks.py
  • tasks/engine.py: claim → execute → checkpoint loop with FOR UPDATE SKIP LOCKED
  • tasks/step_runner.py: run step, handle retries, write checkpoint
  • tasks/state.py: state machine enforcement (illegal transitions blocked at DB level)
  • tasks/recovery.py: startup reconciliation
  • tasks/llm_runner.py: handles steps of type llm_decide. At each such step, calls the LLM with the user's original intent plus all prior step outputs, and receives a structured decision: either call a named tool next, or mark the task complete. The resulting tool call is dispatched through the normal tool executor with full gating. The step sequence is built dynamically as the task executes — it is not fixed at creation.
  • GET /tasks, GET /tasks/{id}, POST /tasks/{id}/cancel|pause|resume

Fastpath intents added this milestone: task.status, task.cancel

Done when: Create a 3-step task with a NoopTool at step 2. Kill the process mid-step-2. Restart. Task resumes from step 2 (not step 1). Audit shows interruption and recovery. Run 10 times — no duplicated side effects. Separately, route an unrecognised input through llm_plan → task created → LLM runner fires → llm.step_decided + tool_call.attempted in audit.


Milestone 3: Safety layer + Approvals + Response synthesis

Timeline: Week 3
Goal: Autonomy levels, risk classification, and approval gates working end-to-end. Also: for user-initiated events, a response is synthesised and sent back through the originating channel.

Deliverables:

  • safety/autonomy.py: read/write current level, persist history
  • safety/risk.py: classify tool action → risk level
  • safety/gates.py: gate decision logic per autonomy × risk matrix
  • safety/dryrun.py: preview protocol
  • schemas/approvals.py + storage/repos/approvals.py
  • GET /approvals, POST /approvals/{id}/approve|deny
  • POST /controls/autonomy
  • pipeline/responder.py (full implementation): replaces the stub from Milestone 1. After any user-initiated event completes (fast-lane or multi-step), collects the tool outputs and calls the LLM to compose a natural-language reply. Dispatches the reply via the originating channel's outbound adapter. The send itself goes through the tool executor with normal gating — at A1, sending a reply email still requires approval.

Fastpath intents added this milestone: autonomy.set, approval.list, approval.approve, approval.deny

Done when: Set autonomy to A1. Trigger a medium-risk tool call. Approval created at /approvals, tool not executed, gate.required in audit. Approve via API. Tool executes. gate.approved + tool_call.succeeded + response.sent in audit. Full trace queryable. Separately, trigger a low-risk fast-lane action — response.sent appears in audit with the composed reply.


Milestone 4: Scheduler + Watchers

Timeline: Week 4
Goal: Timers and scheduled events flow through the pipeline proactively.

Deliverables:

  • scheduler/loop.py: cron + interval + one-shot loop
  • storage/repos/schedules.py
  • watchers/base.py + HeartbeatWatcher
  • GET /schedules, POST /schedules, PATCH /schedules/{id}
  • GET /watchers, PATCH /watchers/{id}
  • GET /health now uses real heartbeat data

Fastpath intents added this milestone: schedule.create, schedule.list, schedule.cancel, timer.set (full implementation replacing stub), schedule.pause

Done when: Create a 30-second interval schedule. Observe schedule.fired in audit every ~30 seconds. Heartbeat appears at /health with real uptime. Disable watcher via API → confirm it stops ticking. Kill process, restart → schedule catches up per policy.


Milestone 5: Rules Engine

Timeline: Week 5
Goal: IFTTT-style rules fire, suppress correctly, and emit child events.

Deliverables:

  • rules/engine.py + condition evaluator
  • storage/repos/rules.py (rules stored in DB)
  • Debounce + dedupe tracking on rule records
  • Quiet hours enforcement (add quiet_hours_policies table)
  • GET /rules, PATCH /rules/{id}

Fastpath intents added this milestone: rule.list, rule.enable, rule.disable, rule.create

Done when: Rule matching ha_event fires → emits child event with parent_event_id set → both in audit with same trace_id. Same event fired 5× in 1 second with 10s debounce → 1 rule.triggered + 4 rule.suppressed in audit.


Milestone 6: LLM conversation quality

Timeline: Week 6
Goal: The LLM is actually good at its job. It gets structured context, knows what tools are available, and produces consistent, useful responses. The architecture should support the model running multi-turn tool calling in an agentic loop — making it not just an LLM, but a true agent. This is the milestone where SYRIS goes from "LLM wired in" to "LLM worth using."

Deliverables:

  • llm/context.py: builds the context bundle passed to the LLM on any call. Includes: the user's original message, recent conversation history scoped to thread_id, relevant recent audit events (last N completions, active tasks), and the current tool registry (so the LLM knows what capabilities exist).
  • llm/prompts.py: prompt templates and a SYRIS system prompt covering personality, role, response style, and constraints. Includes few-shot examples for the most common fallback intents.
  • llm/provider.py: thin CompletionProvider interface wrapping the inference backend. Swappable — initial impl can be a direct API call; later milestones can swap in SGLang or another engine without touching callers.
  • Thread tracking: thread_id propagated through MessageEvent and stored on tasks, so conversation history is joinable.
  • GET /llm/context?trace_id=X debug endpoint: shows exactly what context the LLM received for a given trace.

Done when: Send two related messages in the same thread. LLM response to the second message demonstrably references the first (verifiable via /llm/context). Tool registry is present in context — LLM correctly selects a registered tool for an ambiguous input rather than hallucinating one. System prompt and templates are version-controlled and testable.


Milestone 7: MCP Integration

Timeline: Week 7–8
Goal: An MCP server's tools appear in the tool registry and execute with full SYRIS gating. Fastpath intents are auto-registered for discovered MCP tools.

Deliverables:

  • mcp/connection.py: persistent connection + reconnect
  • mcp/provider.py: tool discovery + registry sync
  • mcp/adapter.py: MCPToolAdapter(BaseTool)
  • mcp/trust.py: TrustPolicy schema + loader
  • On discovery, dynamically register fastpath intents for each MCP tool (e.g. mcp.<server>.<tool_name>)
  • GET /integrations showing MCP server health
  • MCP connection lifecycle audit events

Done when: Connect a real MCP server. Tools appear in GET /integrations and in the tool registry visible to the LLM context. Execute one tool → full audit trail (scope check, risk, idempotency, gate, result). Disconnect server → health degrades in dashboard within 30 seconds. LLM correctly selects a newly-registered MCP tool for a relevant input.


Milestone 8: Worker Skeleton

Timeline: Week 9
Goal: A gated job submission and status reporting mechanism exists.

Deliverables:

  • workers/manager.py: job table, spawn/progress/cancel
  • workers/runtimes/process.py: OS process isolation
  • GET /state shows job count

Done when: Submit a stub long-running job via API → observe in /state → cancel → see cancellation in audit.


Milestone 9: First real integrations

Timeline: Week 10+
Goal: A Home Assistant adapter or email adapter working end-to-end with live data, including a full response loop.

Deliverables:

  • First inbound adapter (e.g. webhook receiver for Home Assistant events or email ingest)
  • First outbound tool (e.g. HA service call or email send)
  • Secrets store wired to real credentials
  • Approval flow exercised with a real risky action (e.g. device control)
  • Full response loop exercised: inbound message → route → tool call → LLM-composed reply → sent via outbound adapter

Done when: A real-world event flows in, routes correctly, executes a tool with full audit trail, requires and passes approval if risk demands it, and a natural-language reply is sent back through the originating channel. SYRIS is useful.