AI incident dataset
The Incident Collector (Agent #1 of the AI Guardrail Lab) needs raw material. We curate a set of ten or more incidents drawn from four canonical sources.
Sources
| Source | What it gives |
|---|---|
| OECD AI Incidents and Hazards Monitor (AIM) | Government-tier catalogued incidents with structured fields |
| AIAAIC Repository | Independent journalism-style incident archive |
| Stanford AI Index — AI-related incidents | Academic-tier catalogued cases |
| Damien Charlotin’s tracker | Practitioner-curated, court-decision focus |
What an incident record looks like
Each curated incident is one JSONL row:
```json
{ "id": "INC-2024-0142", "title": "...", "date_occurred": "2024-08-15", "system_type": "LLM chatbot", "deployment_context": "customer support", "harm_type": ["misinformation", "financial"], "severity": 4, "description": "...", "sources": ["...", "..."], "lessons": "...", "related_incidents": ["INC-2024-0089"]}
```
Why JSONL
The Incident Collector reads one record per line, processes it, and writes a .md sidecar (Stage 2). Stage 3 analyzers then fan out: Root Cause, Threat Modeling, and Guardrail Designer each read the same record through a different lens.
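A minimal Python sketch of that flow, assuming records follow the schema above. The analyzer functions (`root_cause`, `threat_modeling`, `guardrail_designer`), `write_sidecar`, `run_pipeline`, and the sidecar layout are hypothetical stand-ins, not the Lab's actual agents; the point is only that Stage 2 runs once per line and every Stage 3 analyzer receives the same parsed record.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical Stage 3 analyzers: each reads the same record through its own lens.
# Real analyzers would call models or rules; these stubs return a summary string.
def root_cause(rec: dict) -> str:
    return f"root-cause:{rec['id']}"

def threat_modeling(rec: dict) -> str:
    return f"threats:{rec['id']}:{','.join(rec['harm_type'])}"

def guardrail_designer(rec: dict) -> str:
    return f"guardrails:{rec['id']}:severity={rec['severity']}"

ANALYZERS: dict[str, Callable[[dict], str]] = {
    "root_cause": root_cause,
    "threat_modeling": threat_modeling,
    "guardrail_designer": guardrail_designer,
}

def write_sidecar(rec: dict, out_dir: Path) -> Path:
    """Stage 2 (hypothetical layout): render one record as a Markdown sidecar named after its id."""
    lines = [
        f"# {rec['id']}: {rec['title']}",
        "",
        f"- Date occurred: {rec['date_occurred']}",
        f"- System type: {rec['system_type']}",
        f"- Harm types: {', '.join(rec['harm_type'])}",
        f"- Severity: {rec['severity']}",
        "",
        rec["description"],
    ]
    path = out_dir / f"{rec['id']}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

def run_pipeline(jsonl_path: Path, out_dir: Path) -> dict[str, dict[str, str]]:
    """Read one record per line, write its sidecar, then fan out to every analyzer."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: dict[str, dict[str, str]] = {}
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines between records
            rec = json.loads(line)
            write_sidecar(rec, out_dir)
            results[rec["id"]] = {name: fn(rec) for name, fn in ANALYZERS.items()}
    return results
```

Keeping the analyzers in a name-to-function mapping means adding a fourth lens is one dictionary entry; none of them needs to re-read the file.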
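JSONL keeps the pipeline streaming-friendly and grep-able: each record is a self-contained line, so a stage can handle incidents one at a time without loading the whole file, and a plain grep returns complete records.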
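To illustrate the streaming property, a generator can parse one line at a time and chain lazy filters on top of it. `iter_incidents` and `with_harm` are hypothetical helper names. The same one-record-per-line property is why grepping the file for an incident id returns whole records (including any that list that id under `related_incidents`).

```python
import io
import json
from typing import Iterable, Iterator, TextIO

def iter_incidents(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-blank line; only the current line is held in memory."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line number so the bad row is easy to find.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc

def with_harm(records: Iterable[dict], harm: str) -> Iterator[dict]:
    """Lazily keep only records tagged with the given harm type."""
    return (r for r in records if harm in r.get("harm_type", []))

# Usage: an in-memory stream stands in for an open .jsonl file.
data = io.StringIO(
    '{"id": "INC-A", "harm_type": ["financial"]}\n'
    '{"id": "INC-B", "harm_type": ["misinformation"]}\n'
)
print([r["id"] for r in with_harm(iter_incidents(data), "financial")])  # ['INC-A']
```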
Download
| Format | Link |
|---|---|
| JSONL (canonical) | link added on publication |
| CSV (Excel-friendly) | (auto-generated from JSONL) |
| Markdown (human-readable) | (auto-generated, one file per incident) |
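The CSV export is described as auto-generated from the JSONL. One way to do that, sketched here under assumptions, is Python's standard `csv` module. The `jsonl_to_csv` name and the `'; '` separator for list fields are illustrative choices, and keys outside the documented schema are simply left out of the CSV.

```python
import csv
import io
import json

# Column order follows the fields of the documented JSONL schema.
FIELDS = ["id", "title", "date_occurred", "system_type", "deployment_context",
          "harm_type", "severity", "description", "sources", "lessons",
          "related_incidents"]
# List-valued fields are joined into one cell (separator is an assumption).
LIST_FIELDS = {"harm_type", "sources", "related_incidents"}

def jsonl_to_csv(jsonl_text: str) -> str:
    """Flatten JSONL records into CSV text; rows are built from FIELDS only, so extra keys are dropped."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        row = {k: ("; ".join(rec.get(k, [])) if k in LIST_FIELDS else rec.get(k, ""))
               for k in FIELDS}
        writer.writerow(row)
    return out.getvalue()
```

Joining list fields keeps one incident per spreadsheet row; the JSONL remains the canonical copy if the lists ever need to be recovered exactly.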
How to extend
NBS engineers can add their own incidents — internal post-mortems, near-misses, observability anomalies. The schema is open; the Incident Collector will pick them up if they land in the configured inbox.
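Before dropping a record into the inbox, a quick local check can catch malformed rows. This sketch assumes which fields are required, a YYYY-MM-DD date, and a 1 to 5 severity scale (the sample record only shows `4`). `validate`, `append_to_inbox`, and any inbox path are hypothetical; the real location comes from the Incident Collector's configuration.

```python
import json
import re
from pathlib import Path

# Hypothetical minimum the Incident Collector needs; the schema itself is open,
# so extra keys (e.g. internal ticket links) pass through untouched.
REQUIRED = {"id": str, "title": str, "date_occurred": str,
            "harm_type": list, "severity": int, "description": str}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def validate(rec: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks ingestible."""
    problems = [f"missing or wrong type: {k}" for k, t in REQUIRED.items()
                if not isinstance(rec.get(k), t)]
    if isinstance(rec.get("date_occurred"), str) and not DATE_RE.match(rec["date_occurred"]):
        problems.append("date_occurred must be YYYY-MM-DD")
    sev = rec.get("severity")
    if isinstance(sev, int) and not 1 <= sev <= 5:
        problems.append("severity outside assumed 1-5 scale")
    return problems

def append_to_inbox(rec: dict, inbox: Path) -> None:
    """Validate, then append the record as exactly one JSONL line."""
    problems = validate(rec)
    if problems:
        raise ValueError("; ".join(problems))
    inbox.parent.mkdir(parents=True, exist_ok=True)
    with inbox.open("a", encoding="utf-8") as fh:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Appending rather than rewriting the file means a post-mortem can be added without touching records already in the inbox.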
Read next
The 9 agents: Incident Collector is Agent #1; see the full line-up.
Day 2 reveal: the reveal session where this dataset is ingested live.
References: back to the references hub.