AI incident dataset

The Incident Collector (Agent #1 of the AI Guardrail Lab) needs raw material. We curate a set of ten or more incidents drawn from four canonical sources.

Sources

Source	What it gives
OECD AI Incidents and Hazards Monitor (AIM)	Government-tier catalogued incidents with structured fields
AIAAIC Repository	Independent journalism-style incident archive
Stanford AI Index — AI-related incidents	Academic-tier catalogued cases
Damien Charlotin’s AI Hallucination Cases database	Practitioner-curated; court decisions involving AI-hallucinated content

What an incident record looks like

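Each curated incident is one JSONL row. The example below is pretty-printed for readability; in the file, each record sits on a single line: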

{
  "id": "INC-2024-0142",
  "title": "...",
  "date_occurred": "2024-08-15",
  "system_type": "LLM chatbot",
  "deployment_context": "customer support",
  "harm_type": ["misinformation", "financial"],
  "severity": 4,
  "description": "...",
  "sources": ["...", "..."],
  "lessons": "...",
  "related_incidents": ["INC-2024-0089"]
}
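A record can be checked against the shape shown above before it enters the pipeline. This is a minimal sketch, assuming every field in the example is required and has the JSON type it has there; the `INC-YYYY-NNNN` id pattern is inferred only from the two example ids, and `validate_incident` is a hypothetical helper, not part of the Incident Collector.

```python
import re
from datetime import date

# Field names and JSON types copied from the example record. Treating all of
# them as required is an assumption; no formal schema is published here.
REQUIRED_FIELDS = {
    "id": str,
    "title": str,
    "date_occurred": str,
    "system_type": str,
    "deployment_context": str,
    "harm_type": list,
    "severity": int,
    "description": str,
    "sources": list,
    "lessons": str,
    "related_incidents": list,
}

# Hypothetical id pattern, inferred only from "INC-2024-0142" and "INC-2024-0089".
ID_PATTERN = re.compile(r"INC-\d{4}-\d{4}")


def validate_incident(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")

    if isinstance(record.get("id"), str) and not ID_PATTERN.fullmatch(record["id"]):
        problems.append("id: does not match INC-YYYY-NNNN")

    if isinstance(record.get("date_occurred"), str):
        try:
            date.fromisoformat(record["date_occurred"])  # expects YYYY-MM-DD
        except ValueError:
            problems.append("date_occurred: not an ISO 8601 date")

    for field in ("harm_type", "sources", "related_incidents"):
        value = record.get(field)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{field}: all entries must be strings")

    # The severity scale is not documented here, so no range check is made.
    return problems
```

Applied to one JSONL line, this would be `validate_incident(json.loads(line))`. Returning a list of problems instead of raising on the first one lets a curator fix a hand-written record in a single pass.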

Why JSONL

The Incident Collector reads one record per line, processes it, and writes a .md sidecar (Stage 2). The Stage 3 analyzers then fan out: Root Cause, Threat Modeling, and Guardrail Designer each read the same record through a different lens.
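The three stages can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sidecar filename (`<id>.md`), its contents, and the three analyzer functions are hypothetical stand-ins rather than the lab's actual agents, and the fan-out here calls the analyzers one after another instead of concurrently.

```python
import json
from pathlib import Path
from typing import Callable, Iterator


# Hypothetical Stage 3 analyzers. Real agents would call models or tools;
# these stubs only show that every analyzer receives the same record.
def root_cause(record: dict) -> str:
    return f"root cause review of {record['id']}"


def threat_modeling(record: dict) -> str:
    return f"threat model for {record['system_type']}"


def guardrail_designer(record: dict) -> str:
    return f"guardrails for {', '.join(record['harm_type'])}"


ANALYZERS: dict[str, Callable[[dict], str]] = {
    "root_cause": root_cause,
    "threat_modeling": threat_modeling,
    "guardrail_designer": guardrail_designer,
}


def read_incidents(path: Path) -> Iterator[dict]:
    """Stage 1: yield one parsed record per non-blank line, never loading the whole file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def write_sidecar(record: dict, out_dir: Path) -> Path:
    """Stage 2: write a minimal Markdown sidecar named after the record id (assumed layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sidecar = out_dir / f"{record['id']}.md"
    sidecar.write_text(f"# {record['title']}\n\n{record['description']}\n", encoding="utf-8")
    return sidecar


def run_pipeline(path: Path, out_dir: Path) -> dict[str, dict[str, str]]:
    results = {}
    for record in read_incidents(path):
        write_sidecar(record, out_dir)
        # Stage 3: fan out the same record to every analyzer.
        results[record["id"]] = {name: fn(record) for name, fn in ANALYZERS.items()}
    return results
```

Because each analyzer takes the parsed record rather than the sidecar, adding a fourth lens is one more entry in `ANALYZERS`.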

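Because every record is a complete JSON object on its own line, JSONL keeps the pipeline streaming-friendly (records are processed one at a time, without loading the whole file) and grep-able (a line-oriented search returns whole records).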
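Both properties can be seen in a small filter. This sketch (the function name `incidents_with_harm` is hypothetical) holds only one line in memory at a time and uses the same substring test a shell `grep '"financial"' incidents.jsonl` would, then parses the line to confirm the match is actually in `harm_type`.

```python
import json
from typing import Iterable, Iterator


def incidents_with_harm(lines: Iterable[str], harm: str) -> Iterator[dict]:
    """Yield records whose harm_type contains `harm`, reading one line at a time."""
    # Serialise the label the way it appears in the file, e.g. '"financial"'.
    # Assumes plain ASCII labels such as those in the example record.
    needle = json.dumps(harm)
    for line in lines:
        # Cheap text pre-filter: the same test grep performs on the raw file.
        if needle not in line:
            continue
        record = json.loads(line)
        # The substring may have matched another field, so confirm on the parsed record.
        if harm in record.get("harm_type", []):
            yield record
```

Passing an open file handle, as in `incidents_with_harm(open("incidents.jsonl", encoding="utf-8"), "financial")`, keeps memory use flat regardless of how large the dataset grows.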

Download

Format	Link
JSONL (canonical)	link added on publication
CSV (Excel-friendly)	(auto-generated from JSONL)
Markdown (human-readable)	(auto-generated, one file per incident)
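The CSV export could be derived from the canonical JSONL along these lines. This is a sketch, not the actual export script: the column order follows the example record, and flattening list fields with `"; "` is an assumption. Fields outside the example schema are dropped from the CSV.

```python
import csv
import json
from typing import Iterable, TextIO

# Column order follows the example record; extra fields are ignored (assumption).
COLUMNS = [
    "id", "title", "date_occurred", "system_type", "deployment_context",
    "harm_type", "severity", "description", "sources", "lessons",
    "related_incidents",
]
LIST_FIELDS = {"harm_type", "sources", "related_incidents"}


def jsonl_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Convert JSONL lines to CSV rows on `out`; return the number of rows written."""
    # Missing fields become empty cells (DictWriter's default restval is "").
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # Flatten JSON arrays into a single cell so each incident stays one row.
        row = {
            key: "; ".join(value) if key in LIST_FIELDS else value
            for key, value in record.items()
        }
        writer.writerow(row)
        count += 1
    return count
```

When writing to disk, open the output with `newline=""` as the `csv` module requires; choosing `encoding="utf-8-sig"` adds a byte-order mark that helps Excel detect UTF-8.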

How to extend

NBS engineers can add their own incidents, such as internal post-mortems, near-misses, and observability anomalies. The schema is open, and the Incident Collector picks up any record that lands in the configured inbox.
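One way the inbox pickup might work is sketched below. The inbox is modelled as a directory of `*.jsonl` files; that layout, de-duplication by `id`, and skipping malformed lines with a printed warning are all assumptions for illustration, and `pick_up_inbox` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Iterator


def pick_up_inbox(inbox: Path, seen: set[str]) -> Iterator[dict]:
    """Yield new records from every *.jsonl file in `inbox`, skipping ids already in `seen`."""
    for path in sorted(inbox.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                if not line.strip():
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError as err:
                    # One malformed line should not block the rest of the inbox.
                    print(f"{path.name}:{line_no}: skipped ({err.msg})")
                    continue
                rid = record.get("id") if isinstance(record, dict) else None
                if not isinstance(rid, str) or rid in seen:
                    continue  # no usable id, or already collected
                seen.add(rid)
                yield record
```

Keeping `seen` outside the function lets repeated polls of the same inbox return only records that arrived since the last run.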