Investigation Pipeline
Pipeline Overview
Every BWTS alert investigation follows a fixed sequence of phases. Two design choices keep it fast. First, Phase 1 runs 4 specialist agents in parallel, so its duration is set by the slowest agent rather than the sum of all four. Second, gate-based dependency tracking wakes the Manager as soon as a phase's last task completes, with no polling delay.
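As an in-process analogy only, the phase ordering can be sketched with Python's asyncio. The real pipeline coordinates separate agents through task records and gates, not a single process. The `specialist` coroutine, the agent labels and the delays below are hypothetical stand-ins. `asyncio.gather` plays the role of the Phase 1 Gate: synthesis starts only when the slowest of the four specialists has finished.

```python
import asyncio

async def specialist(name: str, delay: float, log: list[str]) -> str:
    # Hypothetical stand-in for one agent: simulate work, record completion.
    await asyncio.sleep(delay)
    log.append(f"{name} done")
    return f"{name} findings"

async def run_investigation(log: list[str]) -> dict[str, object]:
    # Phase 1: all four specialists run concurrently. gather() returns only
    # once every one has finished (the job the Phase 1 Gate does), and it
    # returns results in dispatch order regardless of finishing order.
    phase1 = await asyncio.gather(
        specialist("data", 0.03, log),
        specialist("manual", 0.01, log),
        specialist("pms", 0.02, log),
        specialist("casefile", 0.01, log),
    )
    # Synthesis: performed by the Manager itself, never delegated.
    log.append("synthesis")
    # Phase 2: a single targeted remediation task, so one await acts as the gate.
    remediation = await specialist("manual-phase2", 0.01, log)
    # Report dispatch happens only after Phase 2 has completed.
    log.append("report")
    return {"phase1": phase1, "remediation": remediation}

log: list[str] = []
result = asyncio.run(run_investigation(log))
print(log)
```

The four "done" entries can appear in any order, but "synthesis" always follows all of them, which is the property the gate guarantees.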
sequenceDiagram
    participant App as Dashboard App
    participant AM as Alert Monitor
    participant MGR as IOT Manager
    participant DA as Data Agent
    participant MA as Manual Agent
    participant PMS as PMS Agent
    participant CF as Casefile Agent
    participant RPT as Report Agent
    participant ENG as Engineering Team

    App->>AM: Alert email + API record
    activate AM
    AM->>AM: Claim & deduplicate
    AM->>MGR: Create alert task (alert_monitor_v1)
    deactivate AM

    activate MGR
    Note over MGR: Parse alert, assess severity
    MGR->>DA: Phase 1 — Sensor trends
    MGR->>MA: Phase 1 — All possible causes
    MGR->>PMS: Phase 1 — Maintenance history
    MGR->>CF: Phase 1 — Past incidents
    MGR->>MGR: Create Phase 1 Gate (blocked by all 4)
    deactivate MGR

    activate DA
    activate MA
    activate PMS
    activate CF
    Note over DA,CF: Running in parallel

    DA-->>MGR: Trends, correlations, anomaly timeline
    deactivate DA
    MA-->>MGR: All causes with alarm codes
    deactivate MA
    PMS-->>MGR: Overdue status, service gaps
    deactivate PMS
    CF-->>MGR: Past incidents, fleet patterns
    deactivate CF

    Note over MGR: Gate resolves → auto-wake

    activate MGR
    Note over MGR: SYNTHESIS — Cross-reference all evidence — Confirm root causes — Assign confidence levels

    MGR->>MA: Phase 2 — Targeted remediation
    MGR->>MGR: Create Phase 2 Gate
    deactivate MGR

    activate MA
    MA-->>MGR: Step-by-step procedures + safety warnings
    deactivate MA

    Note over MGR: Gate resolves → auto-wake

    activate MGR
    MGR->>RPT: Compile full context → Report
    deactivate MGR

    activate RPT
    RPT->>RPT: Build HTML report
    RPT->>ENG: Email report
    RPT-->>MGR: Delivery confirmed
    deactivate RPT

    Note over MGR: Mark investigation complete ✓

Phase-by-Phase Detail
Alert Detection
The BWTS dashboard application monitors sensor data continuously. When a parameter crosses a threshold (e.g., UV intensity drops below the IMO compliance level), the app creates an alert record in the API and sends a notification email.
The Alert Monitor agent picks up the alert on its next scheduled run, claims it so that it is processed only once, and fetches all other unresolved alerts so that the investigation starts from the complete set of active alerts.
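Claiming and deduplicating matter because one sensor breach reaches the Alert Monitor twice: once as an API record and once as an email. A minimal sketch, assuming both channels carry the same alert ID. The `AlertEvent` and `AlertMonitor` classes, the `alert_id` field and the method names are hypothetical, not the agent's real interface.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class AlertEvent:
    alert_id: str   # hypothetical: the same ID appears in the API record and the email
    source: str     # "api" or "email"
    parameter: str

class AlertMonitor:
    """Hypothetical sketch: forward each alert to the Manager at most once."""

    def __init__(self) -> None:
        self._claimed: set[str] = set()
        self._lock = threading.Lock()  # claim must be atomic if runs overlap
        self.forwarded: list[str] = []

    def try_claim(self, alert_id: str) -> bool:
        # Check-and-set under a single lock, so two concurrent runs
        # cannot both claim the same alert.
        with self._lock:
            if alert_id in self._claimed:
                return False
            self._claimed.add(alert_id)
            return True

    def handle(self, events: list[AlertEvent]) -> None:
        for event in events:
            if self.try_claim(event.alert_id):
                # Stand-in for "create alert task for the IOT Manager".
                self.forwarded.append(event.alert_id)

monitor = AlertMonitor()
monitor.handle([
    AlertEvent("A-1", "api", "UV intensity"),
    AlertEvent("A-1", "email", "UV intensity"),  # same alert via the second channel
    AlertEvent("A-2", "api", "Flow rate"),
])
print(monitor.forwarded)  # ['A-1', 'A-2']
```

In a real deployment the claim would more plausibly be a conditional update in the shared alert store (set a claimed flag only if it is still unset), so that separate processes agree; the in-memory set only illustrates the logic.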
graph LR
    A["Sensor Breach"] --> B["API Record"]
    A --> C["Email Notification"]
    B --> D["Alert Monitor"]
    C --> D
    D --> E["Claim + Deduplicate"]
    E --> F["Forward to Manager"]
    style A fill:#fef3c7,stroke:#f59e0b
    style D fill:#dbeafe,stroke:#3b82f6
    style F fill:#ede9fe,stroke:#8b5cf6

Phase 1 — Parallel Investigation
The IOT Manager receives the alert task and immediately dispatches 4 specialist agents in parallel:
| Agent | Investigation Focus | Data Source |
|---|---|---|
| Data Analysis | Sensor trends, correlations, anomaly timeline | PostgreSQL (telemetry) |
| Manual Agent | All possible causes for the alarm type | PureBallast 3.1 manual (621 pages) |
| PMS Agent | Maintenance history, overdue components | PostgreSQL (maintenance log) |
| Casefile Agent | Past similar incidents, fleet patterns | Task history + events DB |

After dispatching, the Manager creates a Phase 1 Gate: a task blocked by all 4 subtask IDs. When the last specialist finishes, the gate automatically unblocks and wakes the Manager.
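Dispatch and gate creation can be sketched as plain task records. The `Task` fields (`status`, `blocked_by`), the `dispatch_phase1` function and the in-memory `tasks` dict are assumptions that loosely mirror the `status`/`blockedBy` labels in the gate diagram; they are not the real task store's schema.

```python
import itertools
from dataclasses import dataclass, field

_ids = itertools.count(1)  # hypothetical ID generator for the sketch

@dataclass
class Task:
    assignee: str
    description: str
    status: str = "todo"
    blocked_by: set[int] = field(default_factory=set)
    id: int = field(default_factory=lambda: next(_ids))

def dispatch_phase1(tasks: dict[int, Task], alert_id: str) -> Task:
    focuses = {
        "Data Agent": "Sensor trends, correlations, anomaly timeline",
        "Manual Agent": "All possible causes for the alarm type",
        "PMS Agent": "Maintenance history, overdue components",
        "Casefile Agent": "Past similar incidents, fleet patterns",
    }
    subtask_ids: set[int] = set()
    for agent, focus in focuses.items():
        task = Task(agent, f"[{alert_id}] Phase 1: {focus}")
        tasks[task.id] = task
        subtask_ids.add(task.id)
    # The gate is an ordinary task assigned back to the Manager and blocked
    # by all four subtask IDs; the Manager can then go idle instead of polling.
    gate = Task("IOT Manager", f"[{alert_id}] Phase 1 Gate",
                status="blocked", blocked_by=subtask_ids)
    tasks[gate.id] = gate
    return gate

tasks: dict[int, Task] = {}
gate = dispatch_phase1(tasks, "A-1")
print(gate.status, len(gate.blocked_by))  # blocked 4
```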
Synthesis
The Manager personally cross-references all Phase 1 outputs:
- Check data evidence — Is the decline gradual (degradation) or sudden (failure)?
- Cross-check maintenance — Is the component overdue for service?
- Validate manual causes — Does the alarm code match what the data shows?
- Check history — Has this happened before? What worked last time?
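The first check, gradual versus sudden decline, could be approximated with a simple heuristic. This is an illustrative sketch only: the `classify_decline` function and its 50% step-share threshold are assumptions, and the Manager's actual judgement is not described as a fixed rule.

```python
def classify_decline(readings: list[float], step_share: float = 0.5) -> str:
    """Hypothetical heuristic: 'sudden' if one step accounts for most of the drop.

    readings: chronologically ordered sensor values (e.g. UV intensity).
    step_share: fraction of the total decline a single step must reach to
    count as a sudden failure (assumed value; would need tuning per sensor).
    """
    if len(readings) < 2:
        return "insufficient data"
    total_drop = readings[0] - readings[-1]
    if total_drop <= 0:
        return "no decline"
    # Largest drop between consecutive readings.
    largest_step = max(a - b for a, b in zip(readings, readings[1:]))
    if largest_step >= step_share * total_drop:
        return "sudden (failure)"
    return "gradual (degradation)"

print(classify_decline([100, 97, 94, 91, 88, 85]))  # gradual (degradation)
print(classify_decline([100, 99, 99, 60, 59]))      # sudden (failure)
```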
Each root cause gets a confidence rating:
| Rating | Meaning | Criteria |
|---|---|---|
| HIGH | Very likely | 3+ independent sources agree |
| MEDIUM | Probable | 2 sources agree |
| LOW | Possible | Single source or conflicting evidence |

Synthesis is the one step the Manager never delegates, because it requires reasoning across all evidence sources simultaneously.
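The confidence criteria map directly onto a small function. A sketch, assuming the Manager can list which independent sources support a cause and whether any source contradicts it. The `confidence` function and source labels are hypothetical, and treating any conflict as forcing LOW is one reading of "conflicting evidence" in the table.

```python
def confidence(supporting_sources: set[str], conflicting: bool = False) -> str:
    """Rate a root cause with the HIGH/MEDIUM/LOW criteria from the table.

    supporting_sources: independent sources that agree, e.g. {"data", "pms"}.
    Using a set means the same source cannot be counted twice.
    conflicting: True if any source contradicts the cause (caps the rating at LOW).
    """
    if conflicting or len(supporting_sources) <= 1:
        return "LOW"
    if len(supporting_sources) == 2:
        return "MEDIUM"
    return "HIGH"

print(confidence({"data", "pms", "casefile"}))                 # HIGH
print(confidence({"data", "manual"}))                          # MEDIUM
print(confidence({"data", "manual", "pms"}, conflicting=True)) # LOW
```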
Phase 2 — Targeted Remediation
The Manager sends the confirmed root causes to the Machinery Manual Agent for a second, targeted search:
- Exact step-by-step repair procedures for each confirmed cause
- Safety warnings (quoted verbatim from the manual)
- Required tools and spare parts
- Manual section and page references
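One way to make the Phase 2 request explicit is a record per confirmed cause that mirrors the four items above, plus a check that flags anything still missing before the report is dispatched. Both the `Remediation` dataclass and the `missing_items` check are hypothetical; the source does not say the Manager performs such a check at this point.

```python
from dataclasses import dataclass, field

REQUIRED_FIELDS = ("steps", "safety_warnings", "tools_and_parts", "references")

@dataclass
class Remediation:
    cause: str
    steps: list[str] = field(default_factory=list)            # exact repair procedure, in order
    safety_warnings: list[str] = field(default_factory=list)  # quoted verbatim from the manual
    tools_and_parts: list[str] = field(default_factory=list)  # required tools and spare parts
    references: list[str] = field(default_factory=list)       # manual section and page

def missing_items(confirmed: list[str],
                  results: dict[str, Remediation]) -> dict[str, list[str]]:
    """Flag, per confirmed cause, which Phase 2 items came back empty."""
    gaps: dict[str, list[str]] = {}
    for cause in confirmed:
        rem = results.get(cause)
        if rem is None:
            gaps[cause] = ["no remediation returned"]
            continue
        empty = [name for name in REQUIRED_FIELDS if not getattr(rem, name)]
        if empty:
            gaps[cause] = empty
    return gaps

# Placeholder causes and content, for illustration only.
results = {"cause A": Remediation("cause A", steps=["step 1"], references=["section ref"])}
gaps = missing_items(["cause A", "cause B"], results)
print(gaps)
```

Empty safety warnings are flagged for review rather than rejected, since not every procedure necessarily carries a warning.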
Report & Delivery
The Final Report & Email Agent receives the complete investigation package and:
- Validates all required inputs are present
- Generates a structured HTML report using report_builder.py
- Applies urgency-based colour coding (red/amber/blue)
- Sends the report via Gmail SMTP to the engineering team
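The validation and delivery steps can be sketched with the standard library's `email` and `smtplib` modules. The required-key names, the sender and recipient parameters and the function names are assumptions. `smtp.gmail.com` on port 587 with STARTTLS is Gmail's standard submission endpoint; Gmail typically requires an app password (or OAuth) rather than the account password.

```python
import smtplib
from email.message import EmailMessage

REQUIRED_KEYS = ("alerts", "diagnosis", "actions")  # hypothetical package layout

def validate_package(package: dict) -> list[str]:
    """Return the names of required inputs that are missing or empty."""
    return [key for key in REQUIRED_KEYS if not package.get(key)]

def build_message(html_body: str, subject: str, sender: str,
                  recipients: list[str]) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # Plain-text fallback first, HTML alternative second (multipart/alternative).
    msg.set_content("This report requires an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    return msg

def send_via_gmail(msg: EmailMessage, user: str, app_password: str) -> None:
    # Gmail submission endpoint: upgrade to TLS on port 587, then authenticate.
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(user, app_password)
        server.send_message(msg)

missing = validate_package({"alerts": ["A-1"], "diagnosis": ["cause A"], "actions": []})
print(missing)  # ['actions']
```

Validating before building the message means an incomplete package is reported back instead of producing a half-empty email.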
The report contains:
- Alert Summary — All active alerts in a table
- Diagnosis — Confirmed root causes with confidence levels
- Recommended Actions — Step-by-step remediation grouped by urgency
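A minimal sketch of the three-section layout with urgency colour coding. The urgency labels (`immediate`, `soon`, `routine`), the hex colours, the `build_report` function and the input dict shapes are assumptions chosen only to illustrate the red/amber/blue scheme; report_builder.py itself is not reproduced here.

```python
from html import escape

# Hypothetical urgency levels mapped to the red/amber/blue scheme,
# listed most urgent first (dicts keep insertion order).
URGENCY_COLOURS = {"immediate": "#dc2626", "soon": "#f59e0b", "routine": "#3b82f6"}

def build_report(alerts: list[dict], diagnosis: list[dict], actions: list[dict]) -> str:
    parts = ["<h2>Alert Summary</h2>",
             "<table><tr><th>Alert</th><th>Parameter</th></tr>"]
    for a in alerts:
        parts.append(f"<tr><td>{escape(a['id'])}</td><td>{escape(a['parameter'])}</td></tr>")
    parts.append("</table>")

    parts.append("<h2>Diagnosis</h2><ul>")
    for d in diagnosis:
        parts.append(f"<li>{escape(d['cause'])} ({escape(d['confidence'])})</li>")
    parts.append("</ul>")

    parts.append("<h2>Recommended Actions</h2>")
    # Group actions by urgency, most urgent first, each heading in its colour.
    # Actions with an urgency outside these three are not rendered in this sketch.
    for urgency, colour in URGENCY_COLOURS.items():
        group = [x for x in actions if x["urgency"] == urgency]
        if not group:
            continue
        parts.append(f'<h3 style="color:{colour}">{urgency.title()}</h3><ol>')
        parts.extend(f"<li>{escape(x['step'])}</li>" for x in group)
        parts.append("</ol>")
    return "\n".join(parts)
```

Escaping every inserted value keeps text copied from alerts or the manual from breaking the HTML.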
Gate-Based Wake Pattern
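The critical design feature that eliminates polling delays: each phase ends in a gate task that is blocked by that phase's subtasks. When the last blocker completes, the gate resolves and automatically wakes the Manager, so the next phase starts without waiting for a polling interval.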
graph TD
    subgraph "Phase 1"
        T1["Data Agent Task"]
        T2["Manual Agent Task"]
        T3["PMS Agent Task"]
        T4["Casefile Agent Task"]
    end

    G1["Phase 1 Gate\nstatus: blocked\nblockedBy: T1, T2, T3, T4"]

    T1 -- "done ✓" --> G1
    T2 -- "done ✓" --> G1
    T3 -- "done ✓" --> G1
    T4 -- "done ✓" --> G1

    G1 -- "All resolved → auto-wake" --> SYN["Manager: Synthesis"]

    SYN --> T5["Manual Agent Phase 2"]
    T5 --> G2["Phase 2 Gate\nblockedBy: T5"]
    G2 -- "Resolved → auto-wake" --> RPT["Manager: Dispatch Report"]

    style G1 fill:#fef3c7,stroke:#f59e0b
    style G2 fill:#fef3c7,stroke:#f59e0b
    style SYN fill:#fce7f3,stroke:#ec4899
    style RPT fill:#fef2f2,stroke:#ef4444
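The wake mechanism amounts to a completion hook rather than a polling loop: whenever a task is marked done, only the gates that list it as a blocker are re-checked, and a gate whose blockers are all done is resolved and its owner woken. A sketch under assumptions: the `TaskStore` class, its methods and the `on_wake` callback are hypothetical, since the real task store's interface is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    id: str
    status: str = "todo"
    blocked_by: set[str] = field(default_factory=set)

class TaskStore:
    def __init__(self, on_wake: Callable[[str], None]) -> None:
        self.tasks: dict[str, Task] = {}
        self.dependents: defaultdict[str, set[str]] = defaultdict(set)  # blocker -> gates
        self.on_wake = on_wake  # hypothetical hook that wakes the gate's owner

    def add(self, task: Task) -> None:
        self.tasks[task.id] = task
        for blocker in task.blocked_by:
            self.dependents[blocker].add(task.id)
        # A specialist may finish before its gate exists, so check on creation too.
        self._maybe_resolve(task.id)

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].status = "done"
        # Event-driven: only gates that depend on this task are re-checked.
        for gate_id in sorted(self.dependents.pop(task_id, set())):
            self._maybe_resolve(gate_id)

    def _maybe_resolve(self, gate_id: str) -> None:
        gate = self.tasks[gate_id]
        if gate.status != "blocked" or not gate.blocked_by:
            return
        if all(b in self.tasks and self.tasks[b].status == "done" for b in gate.blocked_by):
            gate.status = "resolved"
            self.on_wake(gate_id)

woken: list[str] = []
store = TaskStore(on_wake=woken.append)
for tid in ("T1", "T2", "T3", "T4"):
    store.add(Task(tid))
store.add(Task("G1", status="blocked", blocked_by={"T1", "T2", "T3", "T4"}))
for tid in ("T3", "T1", "T4"):
    store.complete(tid)
print(woken)  # []
store.complete("T2")
print(woken)  # ['G1']
```

Checking on gate creation as well as on each completion covers the race where a fast specialist finishes before the Manager has created the gate.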