
A field guide to the NIST AI Risk Management Framework

A practical walkthrough of Govern, Map, Measure, and Manage — and how to turn each function from a binder of documentation into controls that actually fire at the moment a decision is made.

The NIST AI Risk Management Framework is the most widely referenced map of AI governance in the United States, and one of the most widely misread. Teams cite it in board decks and vendor questionnaires, then quietly treat it as a checklist to be survived. It is neither a checklist nor a certification — it is a structure for thinking about AI risk, and structure only pays off if you carry it down to enforcement. This guide walks the four functions as a practitioner would, and names where most programs run out of road: the framework's hardest function is also the one organizations build last and weakest.

What the framework is, and what it isn't

The AI RMF is voluntary — NIST guidance, not regulation. There is no certificate, no auditor who stamps you compliant, no pass/fail line. That is a feature: it lets the framework describe capabilities a risk program should have rather than thresholds it must hit, which keeps it usable across a hospital, a community bank, and a logistics company alike.

It is organized around four functions: Govern, Map, Measure, and Manage. Govern is cross-cutting, sitting underneath the other three rather than beside them. Those three form a loose loop: understand the context (Map), assess and track the risks (Measure), and act on them (Manage). The framework is deliberately not prescriptive about how, which is why so many programs stall: it is easy to produce documentation that gestures at all four functions and changes nothing about what your AI is actually allowed to do. Read the framework as a question. Not "do you have a policy for this?" but "when an AI system does something consequential, what happens?"

Govern — the function that holds the other three up

Govern is about accountability and the standing roles that make risk management a habit rather than an event. It asks who owns AI risk, who can approve a system for use, and who gets told when something goes wrong. Because it is cross-cutting, weakness here undermines everything else: a flawless risk assessment is worthless if no one is accountable for acting on it.

On paper, Govern is the easiest function to fake: write a policy, name a committee, declare it satisfied. The honest test is different. Governance is real only when accountability is specific and enforced: when a named owner has bounded authority, when an action outside those bounds cannot quietly self-approve, and when the escalation path actually routes to a human on the hook. Write down who is accountable for each consequential action, what the AI system is permitted to do, and what happens at the edge of that permission. Then make those bounds something the system is checked against, not merely described by.
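
One way to picture "bounds the system is checked against" is a small declaration plus a check, sketched here in Python. Everything in it is an illustrative assumption rather than anything the framework prescribes: the `Authority` fields, the refund agent, the 200.0 limit, and the three result strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    """Declared bounds for one agent (all field names are hypothetical)."""
    owner: str                  # named human accountable for this agent
    allowed_actions: frozenset  # action types the agent may take on its own
    max_amount: float           # largest amount it may act on without review
    escalate_to: str            # human who receives anything at the edge


def check_authority(auth: Authority, action: str, amount: float) -> str:
    """Return 'within_bounds', 'escalate', or 'denied' for a proposed action."""
    if action not in auth.allowed_actions:
        return "denied"         # outside the declared permission entirely
    if amount > auth.max_amount:
        return "escalate"       # permitted action type, but over the limit
    return "within_bounds"


# Hypothetical agent: may issue refunds up to 200.0 and nothing else.
refund_agent = Authority(
    owner="head-of-support",
    allowed_actions=frozenset({"issue_refund"}),
    max_amount=200.0,
    escalate_to="support-duty-manager",
)

print(check_authority(refund_agent, "issue_refund", 50.0))
print(check_authority(refund_agent, "issue_refund", 950.0))
print(check_authority(refund_agent, "close_account", 0.0))
```

The point of the sketch is that the agent never decides whether it is in bounds. The declaration is written once, by the accountable owner, and a separate check is what every proposed action has to pass.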

Map — know where the AI acts and who it touches

Map is about context: for each AI system, what it is for, how it is intended to be used, who is affected by its outputs, and what could go wrong. It turns "we use AI" into a specific, enumerable list of decisions the AI participates in.

Done well, Map produces an inventory with teeth — not a spreadsheet of model names, but a register of actions: each AI-touched decision, its intended use, its stakeholders, and its stakes. A model inventory tells you what you bought; an action inventory tells you where harm could enter. Stakes drive everything downstream — a model that ranks marketing emails and one that denies a loan are not in the same risk tier, and Map is where you say so. The common failure is mapping models instead of consequences, never pinning down the moments where an AI's output becomes an action with a real effect on a real person. If you cannot point to the exact decision points, you cannot control them.
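
A minimal sketch of an action register, as opposed to a model inventory, might look like the following. The field names and the tiering rule are assumptions made for illustration; a real program would set its own criteria for what counts as high stakes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ActionEntry:
    """One AI-touched decision, not one model (field names are hypothetical)."""
    decision: str                   # the point where output becomes action
    intended_use: str
    affected: tuple                 # who is touched by the outcome
    hard_to_reverse: bool
    affects_rights_or_money: bool


def tier_for(entry: ActionEntry) -> Tier:
    """Illustrative tiering rule: stakes, not model type, set the tier."""
    if entry.affects_rights_or_money and entry.hard_to_reverse:
        return Tier.HIGH
    if entry.affects_rights_or_money or entry.hard_to_reverse:
        return Tier.MEDIUM
    return Tier.LOW


register = [
    ActionEntry("rank_marketing_email", "order promotional emails",
                ("subscribers",), False, False),
    ActionEntry("deny_loan", "decline a consumer credit application",
                ("applicants",), True, True),
]

# List the highest-stakes decision points first.
for entry in sorted(register, key=tier_for, reverse=True):
    print(tier_for(entry).name, entry.decision)
```

Under this rule the loan denial lands in the top tier and the email ranking in the bottom one, which is the distinction Map exists to make explicit.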

Measure — assess, test, and track, including what's hard to measure

Measure is the analytic function: assess the risks you mapped, test the system against them, and track those measures over time. Accuracy, bias, robustness, drift — Measure is where you put numbers on behavior and watch them move.

It is also where honesty is most required, because some of the most important risks resist clean measurement. Generative systems make this acute. How do you put a stable metric on the propensity to fabricate, on the harm of a confidently wrong answer, on behavior that shifts with a prompt you didn't anticipate? A mature program says which risks it can quantify, which it can only observe, and which it currently cannot see at all — rather than reporting a comforting number that measures the easy thing instead of the dangerous one. Treat the unmeasurable risks as a reason to constrain behavior, not a reason to look away, and track every measure over time rather than at a single launch gate.
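
The three-way split between what can be quantified, what can only be observed, and what cannot be seen can be recorded directly, so that the gaps are as visible as the numbers. The sketch below is one hypothetical shape for that record; the class names, the example risks, and the thresholds are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Visibility(Enum):
    QUANTIFIED = "quantified"   # a stable metric exists
    OBSERVED = "observed"       # seen in review or incidents, no stable metric
    UNSEEN = "unseen"           # known blind spot


@dataclass
class RiskMeasure:
    risk: str
    visibility: Visibility
    threshold: Optional[float] = None   # only meaningful when quantified
    history: List[Tuple[str, float]] = field(default_factory=list)

    def record(self, period: str, value: float) -> None:
        """Append a reading so the measure is tracked over time."""
        self.history.append((period, value))

    def needs_constraint(self) -> bool:
        """Anything unmeasured, or measured and over threshold, stays constrained."""
        if self.visibility is not Visibility.QUANTIFIED:
            return True
        if self.threshold is None or not self.history:
            return True
        return self.history[-1][1] > self.threshold


error_rate = RiskMeasure("error rate on held-out cases", Visibility.QUANTIFIED, 0.05)
error_rate.record("week 1", 0.02)
error_rate.record("week 2", 0.03)

drift = RiskMeasure("input drift score", Visibility.QUANTIFIED, 0.20)
drift.record("week 1", 0.05)
drift.record("week 2", 0.31)

fabrication = RiskMeasure("fabricated answers", Visibility.OBSERVED)

for measure in (error_rate, drift, fabrication):
    print(measure.risk, measure.visibility.value, measure.needs_constraint())
```

Here only the error rate comes back as not needing a constraint. The drift score has crossed its threshold in the latest reading, and the fabrication risk has no metric at all, so both default to constrained.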

Manage — the function where control actually lives

Manage is where the framework asks you to act: prioritize the risks you've measured, decide how to treat them, and respond when something goes wrong. Govern set accountability, Map found the decisions, and Measure assessed them; Manage is where you do something about it. It is also the runtime function. Govern, Map, and Measure can largely be put in place before a system ever runs, even though their measures keep being tracked afterward; Manage is the only one that has to operate at the point of action, at the live moment an AI is about to approve, deny, send, or move money.

Risk treatment that exists only as a documented plan is not treatment; it is intention. Real treatment is the gate that sits in front of the action and decides whether it may proceed. That is the part most programs underbuild — and the part that separates a governance posture from a governance system.
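
The difference between a documented plan and a gate can be shown in a few lines. In this sketch, which assumes a hypothetical payment action and a hypothetical limit of 1000, the check is wired in front of the action so that the action body cannot run unless the check passes.

```python
from functools import wraps


class ActionBlocked(Exception):
    """Raised when the gate refuses to let an action run."""


def gated(check):
    """Wrap an action so `check` runs first and can stop it."""
    def decorate(action):
        @wraps(action)
        def wrapper(*args, **kwargs):
            allowed, reason = check(*args, **kwargs)
            if not allowed:
                raise ActionBlocked(reason)   # the action body never executes
            return action(*args, **kwargs)
        return wrapper
    return decorate


def payment_limit(account: str, amount: float):
    """Hypothetical treatment: payments over 1000 may not proceed."""
    if amount > 1_000:
        return False, f"amount {amount} exceeds limit of 1000"
    return True, "within limit"


sent = []  # stands in for the real side effect


@gated(payment_limit)
def send_payment(account: str, amount: float) -> str:
    sent.append((account, amount))
    return "sent"


send_payment("acct-1", 250.0)
try:
    send_payment("acct-2", 5_000.0)
except ActionBlocked as exc:
    print("blocked:", exc)

print(sent)
```

Only the first payment reaches `sent`. The second is stopped before any side effect occurs, which is what distinguishes treatment from intention.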

The honest critique: too much paper, too little enforcement

Walk into most AI governance programs and the investment is lopsided. Govern has a policy and a committee. Map has an inventory. Measure has dashboards. These are real and worth having. But Manage — the act-time function — is too often a paragraph saying risks "will be mitigated through appropriate controls," with no control that fires when an action crosses a line.

The reason is structural. Govern, Map, and Measure produce artifacts — documents, registers, charts — that look like progress and survive a review. Manage, done properly, requires something harder: a control that sits in the live path of a decision and is willing to say no. Documentation is comfortable; enforcement is not. So programs accumulate paper around three functions and leave the fourth as good intentions. But the framework does not let you off this hook. A risk you have governed, mapped, and measured but not managed at the point of action is a risk you have described, not reduced.

From function to control

This is where the framework meets enforcement. KAiM Helm is a deterministic control engine: it evaluates a proposed AI action against six axes — Authority, Policy, Evidence, Harm, Regulation, and Escalation — and returns one of three outcomes, ALLOW, BLOCK, or ESCALATE, with a signed record of why. Read against the framework, the upstream functions feed the gate, and Manage becomes the gate itself.

RMF function | What it produces | How it becomes a control in KAiM Helm
Govern | Accountability, roles, bounded authority | Each agent's authority is declared up front; the Authority axis checks every action against those bounds, so an out-of-bounds action cannot self-approve
Map | Intended use, affected parties, stakes per decision | The mapped action and its stakes set which checks apply and how stringent they are; the higher the stakes, the tighter the gate
Measure | Assessments, tests, evidence of risk | The Policy, Evidence, Harm, and Regulation axes encode those measures as conditions the action must satisfy at runtime, not just at launch
Manage | Treat, prioritize, respond, at the point of action | The live gate itself: ALLOW lets a clean action proceed, BLOCK stops one that crosses a line, ESCALATE routes an uncertain one to a human, and every outcome is captured in a signed, append-only record

The shift is small to describe and large in effect: the same risk thinking the framework already asks for stops living in a binder and starts living in the decision path. When an AI is about to act, something checks it — and either lets it through, stops it, or hands it to a person. That is Manage made real, and it is the only version of Manage that protects anyone.
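
To make the shape of such a gate concrete, here is a generic sketch in Python. It is not KAiM Helm's API or implementation. The only details taken from the description above are the six axis names, the three outcomes, and the idea of a signed, append-only record. The combining rule (any BLOCK wins, then any ESCALATE, otherwise ALLOW), the HMAC-SHA256 signature, the hash chaining, and every function name are assumptions made for illustration.

```python
import hashlib
import hmac
import json

AXES = ("authority", "policy", "evidence", "harm", "regulation", "escalation")
SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; use a managed key


def evaluate(action: dict, checks: dict) -> tuple:
    """Run one check per axis and combine the verdicts deterministically."""
    verdicts = {axis: checks[axis](action) for axis in AXES}
    if "BLOCK" in verdicts.values():
        outcome = "BLOCK"          # assumed rule: any BLOCK wins
    elif "ESCALATE" in verdicts.values():
        outcome = "ESCALATE"       # assumed rule: otherwise any ESCALATE wins
    else:
        outcome = "ALLOW"
    return outcome, verdicts


def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def append_record(log: list, action: dict, outcome: str, verdicts: dict) -> dict:
    """Append a signed record chained to the previous record's signature."""
    prev = log[-1]["signature"] if log else ""
    body = {"action": action, "outcome": outcome, "verdicts": verdicts, "prev": prev}
    record = dict(body, signature=_sign(body))
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Check every signature and every link in the chain."""
    prev = ""
    for record in log:
        body = {k: v for k, v in record.items() if k != "signature"}
        if record["prev"] != prev:
            return False
        if not hmac.compare_digest(record["signature"], _sign(body)):
            return False
        prev = record["signature"]
    return True


# Hypothetical checks: most axes pass; two carry simple example rules.
checks = {axis: (lambda action: "ALLOW") for axis in AXES}
checks["authority"] = lambda a: "ALLOW" if a["amount"] <= a["limit"] else "ESCALATE"
checks["harm"] = lambda a: "BLOCK" if a["type"] == "delete_records" else "ALLOW"

log = []
for action in (
    {"type": "refund", "amount": 50, "limit": 200},
    {"type": "refund", "amount": 900, "limit": 200},
    {"type": "delete_records", "amount": 0, "limit": 200},
):
    outcome, verdicts = evaluate(action, checks)
    append_record(log, action, outcome, verdicts)
    print(action["type"], action["amount"], outcome)

print("log intact:", verify(log))
```

The three example actions come out as ALLOW, ESCALATE, and BLOCK respectively. Because each record's signature covers the previous record's signature, editing an earlier outcome after the fact causes `verify` to fail, which is the property an append-only record is meant to provide.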

KAiM Helm is at the design-partner and controlled-demonstration stage. The controls described here are real and demonstrable; we make no customer-deployment or certification claims, and nothing in this guide should be read as one.

Key takeaways


If you have done the documentation but are not sure what would actually happen the next time an AI system is about to take a consequential action, that uncertainty is the gap — and it lives in Manage. A Control Gap Assessment is a scoped read of your AI decisions against exactly that question: which risks you have governed, mapped, and measured, and which ones have a control that fires at the point of action. It tells you, honestly, where the paper ends and the enforcement begins.