Tracker Observability: Understanding Plan and Task State in Soorma

When you run a multi-step agent workflow, two questions come up immediately: what is happening right now, and what went wrong when it fails. The Tracker service is Soorma's answer to both.

The Observability Gap in Event-Driven Systems

Event-driven architectures are powerful, but they trade sequential traceability for concurrency. When a Planner emits five task events and three Workers start processing them in parallel, there is no single call stack to inspect. The correlation chain lives in the event envelope, not in a thread.

Soorma's Tracker service exists to reconstruct that chain into a readable state machine — without requiring your agent code to manually instrument every step.
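To make the idea of reconstructing a chain concrete, here is a minimal sketch of grouping interleaved events by correlation_id. The Envelope class and its sequence field are hypothetical simplifications for illustration, not the Soorma event schema, and the Tracker's actual implementation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical, simplified event envelope (not the Soorma schema)."""
    event_type: str
    correlation_id: str
    sequence: int  # assumed arrival order; a real envelope may use timestamps


def group_by_correlation(events):
    """Group event types by correlation_id, ordered by arrival within each chain."""
    chains = defaultdict(list)
    for event in sorted(events, key=lambda e: e.sequence):
        chains[event.correlation_id].append(event.event_type)
    return dict(chains)


# Events from two plans, interleaved as parallel Workers might produce them.
events = [
    Envelope("parts.check.requested", "plan-a", 1),
    Envelope("parts.check.requested", "plan-b", 2),
    Envelope("parts.check.completed", "plan-b", 3),
    Envelope("parts.check.completed", "plan-a", 4),
]

chains = group_by_correlation(events)
```

Each entry in chains is now a readable, ordered history for one plan, even though the events arrived interleaved.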

Plan and Task State Model

Every plan in Soorma has a lifecycle:

PENDING → IN_PROGRESS → COMPLETED | FAILED | CANCELLED

Each task within a plan follows the same shape:

PENDING → RUNNING → DELEGATED | WAITING | COMPLETED | FAILED | CANCELLED

When a plan is created via PlanContext.create_from_goal(), a plan record is persisted in the Memory service and the Tracker begins observing it automatically. As the event bus delivers task completions, the platform updates task states. When all tasks reach a terminal state, the plan closes.
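The lifecycle above can be sketched as two enums plus a rule for deriving a plan's status from its tasks. This is an illustrative model only: it assumes DELEGATED and WAITING are intermediate states, and the precedence of FAILED over CANCELLED is a choice made for this example, not documented platform behavior.

```python
from enum import Enum


class PlanStatus(Enum):
    PENDING = "PENDING"
    IN_PROGRESS = "IN_PROGRESS"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


class TaskStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DELEGATED = "DELEGATED"
    WAITING = "WAITING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumption: DELEGATED and WAITING are intermediate, not terminal.
TERMINAL_TASK_STATES = {
    TaskStatus.COMPLETED,
    TaskStatus.FAILED,
    TaskStatus.CANCELLED,
}


def derive_plan_status(task_states):
    """Derive a plan status from a list of task states (illustrative policy)."""
    if not task_states or all(s is TaskStatus.PENDING for s in task_states):
        return PlanStatus.PENDING
    if not all(s in TERMINAL_TASK_STATES for s in task_states):
        return PlanStatus.IN_PROGRESS
    # Every task is terminal, so the plan closes with one of three outcomes.
    if any(s is TaskStatus.FAILED for s in task_states):
        return PlanStatus.FAILED
    if any(s is TaskStatus.CANCELLED for s in task_states):
        return PlanStatus.CANCELLED
    return PlanStatus.COMPLETED
```

For example, a plan with one COMPLETED task and one RUNNING task derives to IN_PROGRESS, and it only closes once both tasks are terminal.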

Starting a Plan

Plan state is created and managed through PlanContext — a durable state machine in the Memory service that the Tracker observes automatically as events flow.

from soorma.agents.planner import Planner, GoalContext
from soorma.context import PlatformContext
from soorma.plan_context import PlanContext
from soorma_common.state import StateConfig, StateAction, StateTransition

# `planner` is assumed to be a Planner instance created elsewhere in this module.
@planner.on_goal("maintenance.goal")
async def plan_maintenance(goal: GoalContext, context: PlatformContext) -> None:
    states = {
        "start": StateConfig(
            state_name="start",
            description="Initial state",
            default_next="parts_check",
        ),
        "parts_check": StateConfig(
            state_name="parts_check",
            description="Check parts availability",
            action=StateAction(
                event_type="parts.check.requested",
                response_event="parts.check.completed",
                data={"vehicle_id": "{{goal_data.vehicle_id}}"},
            ),
            transitions=[
                StateTransition(on_event="parts.check.completed", to_state="done")
            ],
        ),
        "done": StateConfig(
            state_name="done",
            description="Terminal state",
            is_terminal=True,
        ),
    }

    plan = await PlanContext.create_from_goal(
        goal=goal,
        context=context,
        state_machine=states,
        current_state="start",
        status="pending",
    )
    # plan.plan_id is the stable identifier for Tracker queries
    await plan.execute_next()

When plan.execute_next() emits a task event that declares a response_event, the event carries the originating correlation_id. The Tracker records each response_event completion against the plan automatically. Use plan.plan_id to query plan state via context.tracker.get_plan_progress().

How Workers Complete Tasks

Workers complete tasks by emitting their declared response_event. The Tracker observes this automatically — there is no emit_progress() write API on the tracker wrapper.

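from soorma.context import PlatformContext
from soorma.task_context import TaskContext

# `worker` is assumed to be a worker agent instance created elsewhere;
# query_inventory() is a placeholder for your own inventory lookup.
@worker.on_task("parts.check.requested")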
async def check_parts(task: TaskContext, context: PlatformContext) -> None:
    result = await query_inventory(task.data.get("vehicle_id"))

    # task.complete() emits the response_event — the Tracker records it automatically
    await task.complete({"result": result})

If the Worker process crashes mid-task, no response event is emitted and the task remains in the RUNNING state, so a stalled task stays visible in plan state instead of disappearing silently. For richer state management (delegations, retries, sub-task tracking), use TaskContext directly. See ARCHITECTURE_PATTERNS.md Section 5.
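The source does not specify how stalled tasks are surfaced, so the following is only a sketch of one plausible approach: given a snapshot of task records, flag any task that has been RUNNING longer than a chosen threshold. The TaskSnapshot class, its started_at field, and the task ids are hypothetical and not part of the Tracker API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TaskSnapshot:
    """Hypothetical view of a task record; field names are assumptions."""
    task_id: str
    status: str
    started_at: datetime


def find_stalled(tasks, now, max_runtime):
    """Return ids of tasks that have been RUNNING longer than max_runtime."""
    return [
        t.task_id
        for t in tasks
        if t.status == "RUNNING" and now - t.started_at > max_runtime
    ]


# A fixed "now" keeps the example deterministic.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    TaskSnapshot("parts-check", "RUNNING", now - timedelta(minutes=45)),
    TaskSnapshot("schedule-appointment", "RUNNING", now - timedelta(minutes=2)),
    TaskSnapshot("notify-customer", "COMPLETED", now - timedelta(hours=1)),
]

stalled = find_stalled(tasks, now, max_runtime=timedelta(minutes=30))
```

Only the task that has been RUNNING for 45 minutes is flagged; the threshold is a policy decision that depends on how long your Workers normally take.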

Querying Plan State

The context.tracker wrapper exposes an awaitable get_plan_progress() method for on-demand reads of plan state, with no event bus subscription required:

# tenant_id and user_id come from the originating goal or task event
progress = await context.tracker.get_plan_progress(
    plan_id=plan.plan_id,
    tenant_id=goal.tenant_id,
    user_id=goal.user_id,
)

print(progress.status)          # one of the plan states, e.g. IN_PROGRESS
# Task keys below are illustrative; use the task identifiers from your own plan.
print(progress.tasks["parts-check"].status)   # e.g. COMPLETED
print(progress.tasks["schedule-appointment"].status)  # e.g. RUNNING

This is particularly useful for:

Human-in-the-loop checkpoints: pause until an approval task transitions to COMPLETED
Planner retry logic: inspect which tasks failed before deciding whether to retry or escalate
Client-side status polling: your frontend can query plan state without subscribing to the event bus
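A checkpoint or status poller along these lines might look like the sketch below. It deliberately avoids importing Soorma: get_progress is any async callable returning a mapping of task key to status, which in real code would wrap context.tracker.get_plan_progress(). The helper names, the fake tracker, and the "approval" task key are all hypothetical.

```python
import asyncio


async def wait_for_task(get_progress, task_key, *, interval=0.01, max_polls=50):
    """Poll until the named task reaches a terminal state, then return it.

    get_progress is any async callable returning a {task_key: status} mapping.
    """
    terminal = {"COMPLETED", "FAILED", "CANCELLED"}
    for _ in range(max_polls):
        statuses = await get_progress()
        status = statuses.get(task_key)
        if status in terminal:
            return status
        await asyncio.sleep(interval)
    raise TimeoutError(f"{task_key} did not finish after {max_polls} polls")


def make_fake_tracker(sequence):
    """Stand-in for the Tracker: yields successive snapshots, then repeats the last."""
    snapshots = iter(sequence)
    last = {}

    async def get_progress():
        nonlocal last
        last = next(snapshots, last)
        return last

    return get_progress


fake = make_fake_tracker([
    {"approval": "PENDING"},
    {"approval": "RUNNING"},
    {"approval": "COMPLETED"},
])
final_status = asyncio.run(wait_for_task(fake, "approval"))
```

Because the helper returns the terminal status rather than a boolean, the same loop serves retry logic: a Planner can branch on FAILED versus COMPLETED before deciding whether to retry or escalate. The bounded max_polls keeps a stalled task from blocking the caller forever.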

What Gets Recorded Automatically

The platform records the following without any manual instrumentation in your agent code:

Event	Recorded By
Plan created	`PlanContext.create_from_goal()`
Task emitted	`context.bus.request()` with `response_event`
Task received	Worker `on_task` handler entry
Task completed	`task.complete()` emits the response event
Plan closed	All tasks in terminal state

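The tracker wrapper has no emit_progress() write API, so none of this requires manual calls. Richer detail, such as result payloads, travels in the data passed to task.complete(); delegations, retries, and sub-task tracking are handled through TaskContext.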

What v0.9.1 Ships

The v0.9.1 Tracker service includes:

context.tracker.get_plan_progress(): on-demand plan state read
context.tracker.get_plan_tasks(): task execution history for a plan
context.tracker.get_plan_timeline(): event execution timeline
context.tracker.query_agent_metrics(): agent performance metrics
Automatic state recording via response_event and correlation_id: no write API needed

See the Tracker service README and ARCHITECTURE_PATTERNS.md Section 5 for the full state management specification.

Next up: Service Discovery and the Schema Registry in Soorma.