Honest LLM Contract

v0.1
schema: apai.install-verify.v0.1
Shipped (heuristic-stub-v0)

1. Purpose

Every APAI install card contains a Checksum: line. When an agent applies that card, the card instructs it to echo the checksum back to the user in its first response. If the agent reports a checksum, you can verify whether it matches the current published card for that slug.

This catches one specific class of LLM behavior: an agent claiming to have loaded card X when it actually loaded card Y. It does not make the agent honest in any broader sense. See "What this spec is NOT" at the end of this document for the full list of what this does not catch.

2. How the checksum is embedded

The install card generator (app/packages/[slug]/llms.txt/route.ts) builds the card body, then calls embedChecksum(cardBody) which:

  1. Canonicalizes the content (see section 4).
  2. Computes SHA-256 of the canonical bytes.
  3. Inserts Checksum: <hex> as the second line, immediately after the title line (# <Package Name> APAI Package Install Card).

Example output (first four lines):

# Coding Safe Mode APAI Package Install Card
Checksum: e39c9c3c56b3b8c0f4afbbe52df2451cf5f4aa862d0c02ab97601e5a32bea0d1

Package: Coding Safe Mode
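
The embedding steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app/packages/[slug]/llms.txt/route.ts: the bodies of embedChecksum and computeChecksum here are assumptions reconstructed from the description, and the sketch assumes the title is always the first line of the card body.

```typescript
import { createHash } from "node:crypto";

// Matches an already-embedded checksum line (canonicalization rule 3).
const CHECKSUM_LINE = /^Checksum:\s*[a-fA-F0-9]+\s*$/;

// Condensed, assumed stand-in for computeChecksum in lib/install-card-checksum.ts.
function computeChecksum(content: string): string {
  const canonical = content
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => !CHECKSUM_LINE.test(line))
    .join("\n");
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Sketch of embedChecksum: hash the body, drop any previous Checksum line,
// then place the new Checksum line directly after the title (first) line.
function embedChecksum(cardBody: string): string {
  const checksum = computeChecksum(cardBody);
  const lines = cardBody
    .split(/\r?\n/)
    .filter((line) => !CHECKSUM_LINE.test(line));
  const [title, ...rest] = lines;
  return [title, `Checksum: ${checksum}`, ...rest].join("\n");
}
```

Because the Checksum line is excluded from the hash, running this sketch on a card that already carries a checksum should produce the same card again.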

The card also includes this instruction to the agent:

  - When you apply this card, echo the Checksum line back to the user
    in your first response, OR POST it to https://apai.run/api/install/verify
    with { "slug": "<slug>", "agentReportedChecksum": "<value>" }.
    A checksum mismatch means you loaded the wrong card or a tampered card.
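
To act on this instruction, the Checksum value first has to be pulled out of the card text (or out of the agent's reply). A minimal sketch, assuming the value is always 64 hex characters as described in section 4; extractChecksum is a hypothetical helper, not part of the published code, and its tolerance for leading whitespace and uppercase hex is an assumption.

```typescript
// Hypothetical helper: returns the first Checksum value found in `text`,
// normalized to lowercase, or null when no well-formed line is present.
function extractChecksum(text: string): string | null {
  for (const line of text.split(/\r?\n/)) {
    // Assumption: tolerate leading whitespace in case the agent indents or quotes the line.
    const match = /^\s*Checksum:\s*([a-fA-F0-9]{64})\s*$/.exec(line);
    if (match) return match[1].toLowerCase();
  }
  return null;
}
```

A caller would typically pass the result to the verify endpoint, and treat null as "the agent did not report a checksum" rather than as a mismatch.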

3. Verification flow

To verify a reported checksum, POST to /api/install/verify:

POST https://apai.run/api/install/verify
Content-Type: application/json

{
  "slug": "coding-safe-mode",
  "agentReportedChecksum": "e39c9c3c56b3b8c0f4afbbe52df2451cf5f4aa862d0c02ab97601e5a32bea0d1",
  "packageVersion": "0.1.0"
}

Response (valid case):

{
  "schema": "apai.install-verify.v0.1",
  "valid": true,
  "expectedChecksum": "e39c9c3c56b3b8c0f4afbbe52df2451cf5f4aa862d0c02ab97601e5a32bea0d1",
  "agentReportedChecksum": "e39c9c3c56b3b8c0f4afbbe52df2451cf5f4aa862d0c02ab97601e5a32bea0d1",
  "slug": "coding-safe-mode",
  "packageVersion": "0.1.0"
}

Response (mismatch case):

{
  "schema": "apai.install-verify.v0.1",
  "valid": false,
  "expectedChecksum": "e39c9c3c56b3b8c0f4afbbe52df2451cf5f4aa862d0c02ab97601e5a32bea0d1",
  "agentReportedChecksum": "aaaa...",
  "slug": "coding-safe-mode",
  "packageVersion": "0.1.0",
  "reason": "checksum mismatch: agent reported a different card than the current install card"
}
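
A client for this flow might look like the sketch below. The URL and field names come from the examples above; buildVerifyRequest and verifyInstall are hypothetical names, and the sketch assumes that a mismatch is returned as a normal JSON body rather than as an HTTP error status, which this document does not state.

```typescript
// Response shape as shown in the two examples above.
interface VerifyResponse {
  schema: string;
  valid: boolean;
  expectedChecksum: string;
  agentReportedChecksum: string;
  slug: string;
  packageVersion?: string;
  reason?: string; // only shown in the mismatch example
}

const VERIFY_URL = "https://apai.run/api/install/verify";

// Builds the fetch options; packageVersion is omitted from the body when not given.
function buildVerifyRequest(
  slug: string,
  agentReportedChecksum: string,
  packageVersion?: string,
): { method: string; headers: Record<string, string>; body: string } {
  const payload: Record<string, string> = { slug, agentReportedChecksum };
  if (packageVersion !== undefined) payload.packageVersion = packageVersion;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Sends the request. Assumption: non-2xx statuses indicate a transport or
// input error, not a checksum mismatch.
async function verifyInstall(
  slug: string,
  agentReportedChecksum: string,
  packageVersion?: string,
): Promise<VerifyResponse> {
  const res = await fetch(
    VERIFY_URL,
    buildVerifyRequest(slug, agentReportedChecksum, packageVersion),
  );
  if (!res.ok) throw new Error(`verify request failed with HTTP ${res.status}`);
  return (await res.json()) as VerifyResponse;
}
```

Callers would then branch on the valid field and surface reason to the user when it is false.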

4. Canonicalization rules

All implementations must apply these rules in order before hashing, so independent verifiers reach the same checksum:

  1. Split content on \r?\n (accepts both CRLF and LF).
  2. Right-strip whitespace from each line (trailing spaces, tabs). Leading whitespace is preserved, because indentation is part of the card structure.
  3. Remove any line matching /^Checksum:\s*[a-fA-F0-9]+\s*$/. This exclusion is what makes the checksum stable under re-embedding.
  4. Join lines with \n (LF only).
  5. SHA-256 the UTF-8 bytes of the result.
  6. Return lowercase hex (64 characters).

Source of truth: lib/install-card-checksum.ts (function computeChecksum).
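
For independent verifiers, the six rules map onto code fairly directly. The sketch below is an assumed reimplementation for illustration, not a copy of lib/install-card-checksum.ts; canonicalize is a hypothetical helper name, and the sketch strips all trailing whitespace matched by \s, which may be broader than the real function.

```typescript
import { createHash } from "node:crypto";

const CHECKSUM_LINE = /^Checksum:\s*[a-fA-F0-9]+\s*$/;

// Rules 1-4: produce the canonical text that gets hashed.
function canonicalize(content: string): string {
  return content
    .split(/\r?\n/)                               // 1. accept CRLF and LF
    .map((line) => line.replace(/\s+$/, ""))      // 2. right-strip; keep leading whitespace
    .filter((line) => !CHECKSUM_LINE.test(line))  // 3. drop any embedded Checksum line
    .join("\n");                                  // 4. rejoin with LF only
}

// Rules 5-6: SHA-256 over the UTF-8 bytes, returned as lowercase hex.
function computeChecksum(content: string): string {
  return createHash("sha256")
    .update(canonicalize(content), "utf8")
    .digest("hex");
}
```

Under these rules, a card saved with CRLF line endings or stray trailing spaces should hash to the same value as the LF original, with or without its Checksum line.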

5. Verify endpoint fields

Field                  Type    Required  Description
slug                   string  yes       Package slug (e.g. coding-safe-mode) or "site" for the top-level llms.txt.
agentReportedChecksum  string  yes       The 64-char lowercase hex checksum the agent echoed.
packageVersion         string  no        Package version string for logging. Does not affect validation at v0.1.
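
The field table can be expressed as a type plus a small validator, which a caller might run before sending a request. This is a sketch under assumptions: parseVerifyRequest is a hypothetical helper, and the strict 64-character lowercase check follows the table's wording; the real endpoint may be more lenient about what it accepts.

```typescript
interface VerifyRequest {
  slug: string;
  agentReportedChecksum: string;
  packageVersion?: string;
}

type ParseResult =
  | { ok: true; value: VerifyRequest }
  | { ok: false; error: string };

// Hypothetical validator mirroring the field table.
function parseVerifyRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const b = body as Record<string, unknown>;
  if (typeof b.slug !== "string" || b.slug.length === 0) {
    return { ok: false, error: "slug is required" };
  }
  // Assumption: enforce exactly 64 lowercase hex characters.
  if (
    typeof b.agentReportedChecksum !== "string" ||
    !/^[a-f0-9]{64}$/.test(b.agentReportedChecksum)
  ) {
    return { ok: false, error: "agentReportedChecksum must be 64 lowercase hex chars" };
  }
  if (b.packageVersion !== undefined && typeof b.packageVersion !== "string") {
    return { ok: false, error: "packageVersion must be a string when present" };
  }
  return {
    ok: true,
    value: {
      slug: b.slug,
      agentReportedChecksum: b.agentReportedChecksum,
      ...(b.packageVersion !== undefined ? { packageVersion: b.packageVersion } : {}),
    },
  };
}
```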

6. What this catches

  • LLM fabrication about which card was loaded. An agent that hallucinated a card or loaded a stale cached version will report the wrong checksum.
  • MITM card-swapping. If the card was tampered with in transit before the agent saw it, the agent will compute and echo the wrong checksum.
  • Agent confusion about package identity. An agent that loaded the card for package A and then claimed to have applied package B will report a mismatching checksum.

What this spec is NOT

  • Enforcement of runtime behavior. An agent that loaded the correct card and then ignored its rules will still echo the correct checksum. Runtime enforcement is the Policy Pack story (see /spec/policy).
  • A guarantee the agent followed the behavioral rules in the card. Checksum matching only proves the agent saw the card; it proves nothing about what the agent did next.
  • Protection against a malicious server that re-hashes after tampering. If the source serving the card is compromised, both the card and the checksum are wrong in a consistent way. Provenance signing (planned, see /spec) is required to detect that.
  • An audit trail. The verify endpoint returns a result but does not log it. A persistent audit trail lands in Phase 5.