---
name: Per-user agent registration keys
description: Replace the shared [wac] API_KEY with per-user registration keys issued from the RIA Agents page on RIA Hub.
type: plan
---
# Per-user agent registration keys — plan
**Status:** design only; nothing implemented.

**Owner (toolkit side):** `ria-toolkit-oss`

**Owner (hub side):** `ria-hub` / `controller`

**Related:** [screens_agent_handoff.md](./screens_agent_handoff.md), [agent_tx_protocol.md](./agent_tx_protocol.md)

---
## Context (current state)
Today, `ria-agent register` calls `POST {hub_url}/screens/agents/register` with
an `X-API-Key` header ([cli.py:41-64](../src/ria_toolkit_oss/agent/cli.py#L41-L64)).
The hub validates that header against a single shared secret — `[wac] API_KEY`
in the hub's `app.ini` ([legacy_executor.py:821-822](../src/ria_toolkit_oss/agent/legacy_executor.py#L821-L822)).
The hub responds with `{agent_id, token}`; the agent persists both to
`~/.ria/agent.json` and uses `token` as the bearer on the WS handshake
afterwards.
Consequences of the shared secret:
- Every agent operator holds the same key → no per-user attribution in logs.
- Revoking one operator forces a rotation across every deployed agent.
- A key leaked through shell history exposes registration for the whole fleet.
- Nothing ties a registered agent to a human in the hub's user table.
## Goal
A user signs into `riahub.ai`, opens an **RIA Agents** page, mints a key, and
uses it once with `ria-agent register`. The resulting agent is owned by that
user; the key can be revoked without affecting anyone else's agents.
The agent-side `token` returned by `/screens/agents/register` keeps its current
role (bearer for the WS handshake). Only the *registration* credential
changes.

---
## User flow
1. User signs into `https://riahub.ai`.
2. User navigates to **Settings → RIA Agents** (or a top-level `/agents`
page — see open question O1).
3. User clicks **Generate registration key**. A modal shows the key **once**,
with copy-to-clipboard. Only a prefix + hash is stored server-side.
4. User runs, on the agent host:
```
ria-agent register --hub https://riahub.ai --api-key ria_reg_<...>
```
5. Hub validates the key, creates an agent row owned by the user, sets
`consumed_at` on the key (one-shot) or bumps `last_used_at` (multi-use; see O2),
and returns `{agent_id, token}` exactly as today.
6. The agent list on the same page shows the new agent's `name`, `hardware[]`,
`last_heartbeat`, and **Revoke** / **Rename** actions.
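Step 3 above can be sketched as follows. This is a minimal illustration, not hub code: it assumes the `ria_reg_<base64url of 32 random bytes>` format from the Security notes, and `mint_registration_key`, `KEY_MARKER`, and `PREFIX_LEN` are hypothetical names. It also assumes the display prefix is taken from the random part of the key, since the first 8 characters of the full plaintext are the constant `ria_reg_` marker.

```python
import secrets

KEY_MARKER = "ria_reg_"  # format taken from the Security notes section
PREFIX_LEN = 8           # assumed length of the display prefix


def mint_registration_key() -> tuple[str, str]:
    """Return (plaintext, display_prefix). Hypothetical helper."""
    # token_urlsafe(32) is unpadded base64url of 32 random bytes (43 chars).
    body = secrets.token_urlsafe(32)
    plaintext = KEY_MARKER + body
    # The prefix comes from the random part; the marker is identical
    # for every key and would not distinguish rows in the UI.
    return plaintext, body[:PREFIX_LEN]


# The plaintext is shown to the user once; only the prefix and a hash
# of the plaintext would be stored server-side.
plaintext, prefix = mint_registration_key()
```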

---
## Scope split
### Toolkit (`ria-toolkit-oss`)
The CLI already sends `X-API-Key`, so no protocol change is required. Two
small quality-of-life changes:
| # | Change | File |
|---|--------|------|
| T1 | Update `--api-key` help text and [cli.py:8 docstring](../src/ria_toolkit_oss/agent/cli.py#L8) to say "personal registration key from the RIA Agents page" rather than "Hub API key". | [agent/cli.py](../src/ria_toolkit_oss/agent/cli.py) |
| T2 | On registration failure, if the response body is JSON with a `reason` field (`invalid_key` / `expired` / `already_consumed` / `revoked`), surface it verbatim instead of the raw `HTTPError`. Makes user-facing errors actionable. | [agent/cli.py:56-61](../src/ria_toolkit_oss/agent/cli.py#L56-L61) |
No change to `config.py`, `ws_client.py`, or the streamer — the `token`
returned by register is still what authenticates the WS connection.
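T2 could look roughly like the sketch below. It is deliberately independent of whichever HTTP library `cli.py` uses: it takes only the status code and raw response body. The function name `registration_error_message` and the message wording are assumptions, not existing toolkit code.

```python
import json


def registration_error_message(status: int, body: bytes) -> str:
    """Build a user-facing message from a failed register response."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON (both subclass ValueError).
        payload = None
    reason = payload.get("reason") if isinstance(payload, dict) else None
    if isinstance(reason, str) and reason:
        # Surface the hub's reason verbatim, e.g. "revoked" or "expired".
        return f"Registration rejected by hub: {reason} (HTTP {status})"
    # No usable reason field: fall back to a generic message.
    return f"Registration failed with HTTP {status}"
```

Keeping the parsing in a pure function like this makes the error path testable without a live hub.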
### Hub (`ria-hub` / `controller`)
Paths below are inferred from [screens_agent_handoff.md](./screens_agent_handoff.md)
(`controller/app/modules/...`). Hub team should sanity-check before starting.
#### Prior art — check RIA Conductor first
The RIA Conductor feature is believed to already implement similar key
generation (likely for authenticating conductors to the hub). **Before
building anything in this section, read the Conductor key code** and decide
whether to:
- **Reuse** it as-is (shared `registration_keys` table, `kind` column
discriminating `conductor` vs. `agent`) — preferred if the shapes line up.
- **Extract** the hashing / minting / revoke primitives into a shared
`registration_keys` module that both features depend on.
- **Fork** a parallel `agent_registration_keys` table — only if the
Conductor model is materially different (e.g. per-org scoping, different
lifetime rules) and forcing a merge would distort one or both features.
Whichever path is chosen should be decided up front and noted on the PR, so
we don't end up with two near-identical key subsystems by accident. The
security notes below (argon2id, one-time reveal, rate limits, audit logging)
apply regardless of which path is taken — confirm Conductor already does
these; if not, the fix belongs in the shared code, not this feature.
#### Data model
New collection (Mongo) or table (if Postgres is used for users):
```
registration_keys
  _id
  user_id        # FK to hub users
  name           # user-supplied label, e.g. "lab laptop"
  key_prefix     # first 8 chars after the constant `ria_reg_` marker, for UI display
  key_hash       # argon2id or bcrypt of the full plaintext
  created_at
  expires_at     # optional; null = no expiry
  consumed_at    # null until first successful registration (if one-shot)
  revoked_at     # null unless explicitly revoked
  last_used_at   # updated on every successful use (if multi-use)
```
Augment the existing agents collection with `owner_user_id` (FK) and
`registered_via_key_id` (FK to `registration_keys._id`).
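As an illustration of how these fields interact, the sketch below models a key row as a dataclass and derives the rejection reason that H4 would report. The `reusable` field is an assumption (it mirrors the `reusable?` flag in the H1 body and is not part of the schema above), and the reason strings are the ones listed under T2.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class RegistrationKey:
    user_id: str
    name: str
    key_prefix: str
    key_hash: str
    created_at: datetime
    reusable: bool = False  # assumed field, mirrors the H1 `reusable?` flag
    expires_at: Optional[datetime] = None
    consumed_at: Optional[datetime] = None
    revoked_at: Optional[datetime] = None
    last_used_at: Optional[datetime] = None


def rejection_reason(key: RegistrationKey, now: datetime) -> Optional[str]:
    """Return None if the key may be used, else a T2-style reason string."""
    if key.revoked_at is not None:
        return "revoked"
    if key.expires_at is not None and now >= key.expires_at:
        return "expired"
    if not key.reusable and key.consumed_at is not None:
        return "already_consumed"
    return None


now = datetime.now(timezone.utc)
key = RegistrationKey("u1", "lab laptop", "AbCdEfGh", "<hash>", created_at=now)
```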
Decide O2 before building: one-shot vs. reusable. Recommendation: one-shot by
default with an optional "reusable for N days" toggle, since one-shot is the
lower-blast-radius default.
#### Endpoints
| # | Endpoint | Notes |
|---|----------|-------|
| H1 | `POST /api/v1/user/registration-keys` | Auth: session cookie. Body: `{name, expires_in_days?, reusable?}`. Returns plaintext key **once**. |
| H2 | `GET /api/v1/user/registration-keys` | Auth: session cookie. Lists the caller's keys (prefix + metadata, never plaintext). |
| H3 | `DELETE /api/v1/user/registration-keys/{id}` | Auth: session cookie. Revokes. |
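| H4 | `POST /screens/agents/register` (existing) | Change auth: instead of string-comparing `X-API-Key` against `[wac] API_KEY`, find candidate rows by `key_prefix` and verify the full key against `key_hash` (a salted password hash cannot be queried by equality). Reject if revoked / expired / consumed. Set `owner_user_id` and `registered_via_key_id` on the new agent row. |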
| H5 | `GET /api/v1/user/agents` | Auth: session cookie. Lists the caller's agents for the UI. |
| H6 | `DELETE /api/v1/user/agents/{id}` | Auth: session cookie. De-registers and closes any live WS. |
H4 is the only backwards-incompatible change. See the migration section for
how to ship it without breaking existing deployments.
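The H4 lookup might be sketched as below. Several things here are assumptions for illustration only: `hashlib.scrypt` is a standard-library stand-in for the argon2id hash the plan calls for, the in-memory `rows` list stands in for the `registration_keys` store, each row carries a hypothetical separate `salt` field, and the prefix is assumed to be the 8 characters following the `ria_reg_` marker. Because a salted hash cannot be matched with an equality query, the sketch narrows candidates by prefix and then verifies each one.

```python
import hashlib
import hmac
import os
from typing import Optional

KEY_MARKER = "ria_reg_"
PREFIX_LEN = 8  # assumed prefix length


def hash_key(plaintext: str, salt: bytes) -> bytes:
    # scrypt is a stdlib stand-in here; the plan specifies argon2id.
    return hashlib.scrypt(plaintext.encode(), salt=salt, n=2**14, r=8, p=1)


def find_key(presented: str, rows: list[dict]) -> Optional[dict]:
    """Return the stored row matching the presented key, or None."""
    if not presented.startswith(KEY_MARKER):
        return None
    start = len(KEY_MARKER)
    prefix = presented[start:start + PREFIX_LEN]
    for row in rows:
        if row["key_prefix"] != prefix:
            continue
        candidate = hash_key(presented, row["salt"])
        # Constant-time comparison of the derived hashes.
        if hmac.compare_digest(candidate, row["key_hash"]):
            return row
    return None


# Usage with a single stored key.
salt = os.urandom(16)
key = KEY_MARKER + "A" * 43
rows = [{"key_prefix": "AAAAAAAA", "salt": salt, "key_hash": hash_key(key, salt)}]
```

A `None` result would map to the `invalid_key` reason; revoked, expired, and consumed checks would then run on the returned row.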
#### Frontend
New page — **Settings → RIA Agents** — two panels:
- **Registration keys:** table (name, prefix, created, expires, last used,
revoke button) + "Generate" button that opens the one-time-reveal modal.
- **Agents:** table (name, hardware, status, last heartbeat, rename, revoke).
This layout matches the existing Gitea-style Settings sidebar, assuming RIA Hub
is Gitea-based (see O3).

---
## Migration from the shared `[wac] API_KEY`
The shared key is likely in use on every existing deployment. To avoid a
flag day:
1. **Dual-accept window.** H4 accepts *either* a per-user key (lookup by
hash) *or* the legacy `[wac] API_KEY` string. When the legacy key is used,
the resulting agent has `owner_user_id = null` and a warning is logged.
2. **Admin UI surfaces "unowned" agents** so an admin can re-assign them or
ask owners to re-register.
3. **Deprecation window of one release**, then H4 rejects the legacy key and
the `[wac] API_KEY` config is removed from `app.ini`.
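The dual-accept step could be structured along these lines. This is a sketch under assumptions: `authenticate_registration` and the `lookup_user_key` callback (which would wrap the per-user key lookup and return the owning user id, or `None`) are hypothetical names, and the logger name is a placeholder. Passing `legacy_key=None` models the state after the deprecation window.

```python
import hmac
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("hub.agent_register")  # placeholder logger name


def authenticate_registration(
    presented: str,
    legacy_key: Optional[str],
    lookup_user_key: Callable[[str], Optional[str]],
) -> Tuple[bool, Optional[str]]:
    """Return (accepted, owner_user_id)."""
    # Prefer the per-user key path.
    owner = lookup_user_key(presented)
    if owner is not None:
        return True, owner
    # Fall back to the shared secret while the dual-accept window is open.
    if legacy_key and hmac.compare_digest(presented.encode(), legacy_key.encode()):
        log.warning("agent registered with legacy [wac] API_KEY; owner_user_id is null")
        return True, None
    return False, None


# Example: one known per-user key, plus a legacy shared secret.
known = {"ria_reg_example": "user-42"}
result_user = authenticate_registration("ria_reg_example", "shared", known.get)
result_legacy = authenticate_registration("shared", "shared", known.get)
result_after = authenticate_registration("shared", None, known.get)
```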
No toolkit-side migration needed — existing `~/.ria/agent.json` files already
store the post-registration `token`, which keeps working regardless of how
registration itself was authenticated.

---
## Security notes
- Store `key_hash` with a password hash (argon2id), not a fast hash. The key
is a secret-equivalent: treat it like a password.
- Plaintext key format: `ria_reg_<base64url of 32 random bytes>`. Prefix makes
the purpose obvious in leaked logs and lets scanners (trufflehog etc.)
recognize it.
- One-time reveal in the UI — never persist or re-display the plaintext.
- Rate-limit H4 per source IP and per `key_prefix` to blunt brute-force on
leaked prefixes. Lock a key out after N failed attempts in M minutes.
- Log every H4 call (success + failure, with key prefix and source IP)
to the audit trail.
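The "N failed attempts in M minutes" lockout above can be sketched as a sliding window keyed by a bucket string (a source IP or a key prefix). This is a single-process, in-memory illustration; the class name `FailureLockout` and the default values of 5 failures in 600 seconds are placeholders, not recommendations from this plan. A real hub would likely need shared storage across workers.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FailureLockout:
    """Lock a bucket after max_failures failures within window_s seconds."""

    def __init__(self, max_failures: int = 5, window_s: float = 600.0) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _prune(self, bucket: str, now: float) -> Deque[float]:
        # Drop failures that have aged out of the window.
        q = self._failures[bucket]
        while q and now - q[0] > self.window_s:
            q.popleft()
        return q

    def record_failure(self, bucket: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(bucket, now).append(now)

    def is_locked(self, bucket: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return len(self._prune(bucket, now)) >= self.max_failures


# Usage: three failures from one IP within a 60-second window.
lockout = FailureLockout(max_failures=3, window_s=60.0)
for t in (0.0, 1.0, 2.0):
    lockout.record_failure("ip:192.0.2.1", now=t)
```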

---
## Open questions
- **O1.** Where does the page live? A top-level `/agents` route is
discoverable; `/user/settings/agents` matches Gitea's existing IA. Pick
before F7 (frontend task).
- **O2.** One-shot vs. reusable keys (default and whether both are offered).
Recommendation above; needs product sign-off.
- **O3.** Is RIA Hub's web UI really a Gitea fork? URL patterns
(`/qoherent/-/packages/...`, `.git` clones) suggest yes, but the "Settings"
integration plan depends on confirming this. If it isn't, F7 is a standalone
page instead.
- **O4.** Does the agent bearer `token` need per-user scoping too, or is
ownership-at-registration enough? Today the token is opaque and not tied
to a user in the WS handler. Probably fine to defer until after per-user
keys ship.
- **O5.** Should admins be able to mint keys on behalf of other users (for
onboarding)? If yes, H1 needs an admin-scoped variant.
- **O6.** Conductor reuse decision — reuse / extract / fork. Must be answered
before any hub-side code lands. See "Prior art" above.

---
## Out of scope
- SSO / OIDC for agent-to-hub auth (current `token` bearer is kept as-is).
- Per-agent capability scoping beyond what `--allow-tx` already does at
registration time.
- Fleet provisioning (N agents from one key); covered instead by the "reusable"
flag in O2, if that's the chosen default.

---
## MVP cut
If the hub team wants the smallest shippable slice:
- H1, H2, H3, H4 (with dual-accept), H5.
- Frontend: registration-keys panel only; reuse the existing agents admin
view if one already exists.
- T1 toolkit copy-change.
Defer H6, rename flows, T2, and audit logging to a follow-up.