Tracker Observability: Understanding Plan and Task State in Soorma
When you run a multi-step agent workflow, two questions come up immediately: what is happening right now, and what went wrong when it fails. The Tracker service is Soorma's answer to both.
The Observability Gap in Event-Driven Systems
Event-driven architectures are powerful, but they trade sequential traceability for concurrency. When a Planner emits five task events and three Workers start processing them in parallel, there is no single call stack to inspect. The correlation chain lives in the event envelope, not in a thread.
Soorma's Tracker service exists to reconstruct that chain into a readable state machine — without requiring your agent code to manually instrument every step.
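To make the idea concrete, here is a minimal sketch of rebuilding per-plan event chains from envelopes that arrive interleaved. The `Envelope` dataclass and its `sequence` field are hypothetical stand-ins, not Soorma's actual envelope schema. Only `correlation_id` comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical event envelope; not Soorma's real schema."""
    event_type: str
    correlation_id: str  # shared by every event that belongs to one plan
    sequence: int        # assumed monotonic arrival counter


def group_by_correlation(events: list[Envelope]) -> dict[str, list[str]]:
    """Rebuild one ordered chain of event types per correlation_id."""
    chains: dict[str, list[Envelope]] = defaultdict(list)
    for event in events:
        chains[event.correlation_id].append(event)
    return {
        cid: [e.event_type for e in sorted(chain, key=lambda e: e.sequence)]
        for cid, chain in chains.items()
    }


# Events from two plans arrive interleaved, as they would from parallel Workers.
interleaved = [
    Envelope("parts.check.requested", "plan-a", 1),
    Envelope("parts.check.requested", "plan-b", 2),
    Envelope("parts.check.completed", "plan-b", 3),
    Envelope("parts.check.completed", "plan-a", 4),
]
chains = group_by_correlation(interleaved)
print(chains["plan-a"])
```

Grouping on the correlation key is what turns a flat event stream back into something that reads like a call stack for each plan.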
Plan and Task State Model
Every plan in Soorma has a lifecycle:
PENDING → IN_PROGRESS → COMPLETED | FAILED | CANCELLED
Each task within a plan follows the same shape:
PENDING → RUNNING → DELEGATED | WAITING | COMPLETED | FAILED | CANCELLED
When a plan is created via PlanContext.create_from_goal(), a plan record is persisted in the Memory service and the Tracker begins observing it automatically. As the event bus delivers task completions, the platform updates task states. When all tasks reach a terminal state, the plan closes.
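The two lifecycles and the plan-closing rule can be sketched as plain enums. This is an illustration only: the class names are hypothetical, and treating DELEGATED and WAITING as non-terminal is an assumption, not something the Tracker documentation confirms here.

```python
from enum import Enum


class PlanStatus(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


class TaskStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DELEGATED = "DELEGATED"
    WAITING = "WAITING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumption: DELEGATED and WAITING are intermediate states, not terminal ones.
TERMINAL_TASK_STATES = {TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED}


def plan_is_closed(task_states: list[TaskStatus]) -> bool:
    """True once every task is terminal; a plan with no tasks is treated as open."""
    return bool(task_states) and all(s in TERMINAL_TASK_STATES for s in task_states)


print(plan_is_closed([TaskStatus.COMPLETED, TaskStatus.RUNNING]))
print(plan_is_closed([TaskStatus.COMPLETED, TaskStatus.FAILED]))
```

The first call reports an open plan because one task is still RUNNING; the second reports a closed plan even though a task failed, since FAILED is terminal.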
Starting a Plan
Plan state is created and managed through PlanContext — a durable state machine in the Memory service that the Tracker observes automatically as events flow.
from soorma.agents.planner import Planner, GoalContext
from soorma.context import PlatformContext
from soorma.plan_context import PlanContext
from soorma_common.state import StateConfig, StateAction, StateTransition

# Assumes `planner` is a Planner instance configured elsewhere in this module.
@planner.on_goal("maintenance.goal")
async def plan_maintenance(goal: GoalContext, context: PlatformContext) -> None:
    states = {
        "start": StateConfig(
            state_name="start",
            description="Initial state",
            default_next="parts_check",
        ),
        "parts_check": StateConfig(
            state_name="parts_check",
            description="Check parts availability",
            action=StateAction(
                event_type="parts.check.requested",
                response_event="parts.check.completed",
                data={"vehicle_id": "{{goal_data.vehicle_id}}"},
            ),
            transitions=[
                StateTransition(on_event="parts.check.completed", to_state="done")
            ],
        ),
        "done": StateConfig(
            state_name="done",
            description="Terminal state",
            is_terminal=True,
        ),
    }
    plan = await PlanContext.create_from_goal(
        goal=goal,
        context=context,
        state_machine=states,
        current_state="start",
        status="pending",
    )
    # plan.plan_id is the stable identifier for Tracker queries
    await plan.execute_next()
As plan.execute_next() emits task events with response_event declared, those events carry the originating correlation_id. The Tracker records each response_event completion against the plan automatically. Use plan.plan_id to query plan state via context.tracker.get_plan_progress().
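One way to picture how the three states in the example advance is a small transition resolver. The `State` and `Transition` dataclasses below are hypothetical stand-ins that mirror only the StateConfig fields used in the example, and the resolution order (terminal check, then event match, then `default_next`) is an assumption rather than PlanContext's documented behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Transition:
    on_event: str
    to_state: str


@dataclass
class State:
    """Hypothetical stand-in mirroring the StateConfig fields used above."""
    default_next: Optional[str] = None
    transitions: list[Transition] = field(default_factory=list)
    is_terminal: bool = False


def next_state(states: dict[str, State], current: str, event: Optional[str] = None) -> str:
    """Resolve the next state name; stay in place if nothing matches."""
    state = states[current]
    if state.is_terminal:
        return current
    for transition in state.transitions:
        if transition.on_event == event:
            return transition.to_state
    return state.default_next or current


states = {
    "start": State(default_next="parts_check"),
    "parts_check": State(transitions=[Transition("parts.check.completed", "done")]),
    "done": State(is_terminal=True),
}

after_start = next_state(states, "start")
after_response = next_state(states, after_start, "parts.check.completed")
print(after_start, after_response)
```

Under these assumptions the plan moves from start to parts_check without waiting on an event, then stays in parts_check until the declared response event arrives.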
How Workers Complete Tasks
Workers complete tasks by emitting their declared response_event. The Tracker observes this automatically — there is no emit_progress() write API on the tracker wrapper.
from soorma.context import PlatformContext
from soorma.task_context import TaskContext

# Assumes `worker` is a Worker instance and `query_inventory` is your own helper.
@worker.on_task("parts.check.requested")
async def check_parts(task: TaskContext, context: PlatformContext) -> None:
    result = await query_inventory(task.data.get("vehicle_id"))
    # task.complete() emits the response_event; the Tracker records it automatically
    await task.complete({"result": result})
If the Worker process crashes mid-task, no response event is emitted. The task remains in RUNNING state, allowing the system to identify stalled tasks without polling. For richer state management — delegations, retries, sub-task tracking — use TaskContext directly. See ARCHITECTURE_PATTERNS.md Section 5.
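Because a crashed Worker leaves its task in RUNNING, a stalled task can be spotted by its age. The sketch below shows one way to do that over a list of task records. The `TaskRecord` shape, its `started_at` field, and the ten-minute threshold are all assumptions made for illustration, not the Tracker's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TaskRecord:
    """Hypothetical task record; field names are assumptions."""
    task_id: str
    status: str
    started_at: datetime


def find_stalled(tasks: list[TaskRecord], now: datetime, max_age: timedelta) -> list[str]:
    """Return IDs of tasks that have sat in RUNNING longer than max_age."""
    return [
        t.task_id
        for t in tasks
        if t.status == "RUNNING" and now - t.started_at > max_age
    ]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    TaskRecord("parts-check", "RUNNING", now - timedelta(minutes=30)),
    TaskRecord("schedule-appointment", "RUNNING", now - timedelta(minutes=2)),
    TaskRecord("notify-customer", "COMPLETED", now - timedelta(hours=1)),
]
stalled = find_stalled(tasks, now, max_age=timedelta(minutes=10))
print(stalled)  # ['parts-check']
```

Passing `now` in explicitly keeps the check deterministic and easy to test.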
Querying Plan State
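The context.tracker wrapper exposes a get_plan_progress() method for direct request/response reads. The call is still awaited; it is synchronous only in the sense that you ask for the current state instead of waiting for an event: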
# tenant_id and user_id come from the originating goal or task event
progress = await context.tracker.get_plan_progress(
    plan_id=plan.plan_id,
    tenant_id=goal.tenant_id,
    user_id=goal.user_id,
)
print(progress.status)  # IN_PROGRESS | COMPLETED | FAILED
# The task keys below are illustrative
print(progress.tasks["parts-check"].status)  # COMPLETED
print(progress.tasks["schedule-appointment"].status)  # RUNNING
This is particularly useful for:
- Human-in-the-loop checkpoints: pause until an approval task transitions to COMPLETED
- Planner retry logic: inspect which tasks failed before deciding whether to retry or escalate
- Client-side status polling: your frontend can query plan state without subscribing to the event bus
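The polling use cases in this list share one pattern: read the status repeatedly until it is terminal. Here is a minimal, self-contained sketch of that loop. `wait_for_terminal` and `fake_fetch` are hypothetical helpers; in real code the fetch callable would wrap a read such as get_plan_progress(), and representing statuses as plain strings is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


async def wait_for_terminal(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Poll fetch_status until it returns a terminal status or attempts run out."""
    for _ in range(max_attempts):
        status = await fetch_status()
        if status in TERMINAL:
            return status
        await asyncio.sleep(interval)
    raise TimeoutError("plan did not reach a terminal state in time")


# Stand-in for a real Tracker read; yields a scripted sequence of statuses.
_fake_statuses = iter(["PENDING", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])


async def fake_fetch() -> str:
    return next(_fake_statuses)


final = asyncio.run(wait_for_terminal(fake_fetch, interval=0.01))
print(final)
```

Bounding the loop with `max_attempts` keeps a stalled plan from blocking the caller indefinitely.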
What Gets Recorded Automatically
The platform records the following without any manual instrumentation in your agent code:
| Event | Recorded By |
|---|---|
| Plan created | PlanContext.create_from_goal() |
| Task emitted | context.bus.request() with response_event |
| Task received | Worker on_task handler entry |
| Task completed | task.complete() emits the response event |
| Plan closed | All tasks in terminal state |
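The core lifecycle is captured automatically. Because the tracker wrapper has no emit_progress() write API, additional detail such as result payloads travels in the response event that task.complete() emits, and richer state management (delegations, retries, sub-task tracking) goes through TaskContext.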
What v0.9.1 Ships
The v0.9.1 Tracker service includes:
- context.tracker.get_plan_progress(): synchronous plan state read
- context.tracker.get_plan_tasks(): task execution history for a plan
- context.tracker.get_plan_timeline(): event execution timeline
- context.tracker.query_agent_metrics(): agent performance metrics
- Automatic state recording via response_event and correlation_id: no write API needed
See the Tracker service README and ARCHITECTURE_PATTERNS.md Section 5 for the full state management specification.
Next up: Service Discovery and the Schema Registry in Soorma.