# RIA Hub — Example Files

This repository contains example input files, configurations, and expected outputs for every tool on RIA Hub. If you are new to the platform, start here. Download any file and follow the walkthrough for the tool you want to try.

---

## What is RIA Hub?

RIA Hub is a collaborative platform for RF and machine learning workflows. It combines a Git-based repository system with a suite of specialized tools that cover the full pipeline from raw IQ recordings to live inference deployments:

| Stage | Tools |
|-------|-------|
| **Collect** | Library — browse, organize, and share RF recordings and models |
| **Curate** | Dataset Manager — slice, qualify, augment, and inspect radio datasets |
| **Train** | Model Builder — train, optimize, and compress PyTorch models |
| **Deploy** | Application Packager — compose and build inference applications |
| **Run** | Screens — deploy live RF inference pipelines on real hardware |

---

## Repository Structure

```
RIA_Example/
│
├── recordings/
│   └── example_iq_recording.h5       # Raw IQ capture (input to Curator)
│
├── datasets/
│   ├── example_radio_dataset.h5      # Curated radio dataset (Curator output / Model Trainer input)
│   └── example_synthetic_dataset.h5  # Synthetically generated dataset (Generator output)
│
├── models/
│   ├── example_model.ckpt            # PyTorch checkpoint (Model Trainer input / output)
│   └── example_model.onnx            # Exported ONNX model (Screens / Application Packager input)
│
├── applications/
│   └── example_application.json      # Application Composer output; buildable in RIA Screens (Application Packager)
│
└── .ria/
    ├── train.yaml                    # Example Model Trainer workflow (committed to .ria/)
    ├── example_application.yaml      # Example Application Composer build workflow (committed to .ria/)
    └── curator-configs/
        └── example_curator_config.json  # Example curation configuration for the Curator tool
```

---

## Tool Walkthroughs

### Library

The Library is a cross-repository browser for all RF and ML assets on the
platform. It automatically discovers files pushed to any repository you have access to.

**To explore the example recording:**

1. Import `recordings/example_iq_recording.h5` into any repository via **New Repository → Upload Files** or by pushing via Git LFS.
2. Navigate to **Library** in the top navigation bar.
3. Select the **Recordings** tab. Your file will appear with metadata and a spectrogram thumbnail.
4. Click the file to open the detail view — you can inspect signal properties, view the spectrogram, and copy the file to another repository.

**Supported asset types in the Library:**

| Type | Extension | Description |
|------|-----------|-------------|
| Recording | `.h5` / `.hdf5` | Raw IQ capture files |
| Radio Dataset | `.h5` / `.hdf5` | Labelled, curated training datasets |
| PyTorch Module | `.py` | PyTorch model definitions with an `nn.Module` class |
| PyTorch State Dict | `.pt` / `.pth` | Model weights / state dictionaries |
| PyTorch Checkpoint | `.ckpt` | Training checkpoints with weights, optimizer state, and metadata |
| ONNX Graph | `.onnx` | Portable inference models |

---

### Dataset Manager — Curator

The Curator takes raw IQ recordings and produces a labelled, ready-to-train HDF5 dataset. It applies a configurable DSP pipeline: slicing, quality filtering, and optional augmentation.

**Example files:** `recordings/example_iq_recording.h5`, `.ria/curator-configs/example_curator_config.json`
**Expected output:** `datasets/example_radio_dataset.h5`

**Steps:**

1. Upload `example_iq_recording.h5` to a repository (the Curator reads from the Library).
2. Go to **Dataset Manager → Curator**.
3. Select your recording from the Library panel on the left.
4.
Configure the pipeline using the settings below, or load `example_curator_config.json` as a reference:
   - **Data type:** `IQ`
   - **Slicer:** `simple` — slice length `1024`
   - **Qualifier:** `rms` — minimum threshold `0.01` (filters out silent/noise-only slices)
   - **Augmentation:** `basic` policy — `2` augmented copies per slice
5. Set a dataset name, description, and radio task label.
6. Click **Curate**. A progress bar tracks the Celery task.
7. When complete, commit the output dataset to your repository.

**Slicer options:**

| Slicer | Best for |
|--------|----------|
| `simple` | Fixed-length slices, good starting point |
| `random` | Randomized slice positions |
| `overlap` | Overlapping slices for smaller datasets |

**Qualifier options:**

| Qualifier | Filters on |
|-----------|------------|
| `rms` | Root mean square amplitude |
| `snr` | Signal-to-noise ratio estimate |
| `energy` | Total signal energy |
| `bandwidth` | Occupied bandwidth |

---

### Dataset Manager — Inspector

The Inspector runs diagnostic analysis on an existing dataset — class balance, per-class statistics, anomaly detection, and dataset comparisons.

**Example file:** `datasets/example_radio_dataset.h5`

**Steps:**

1. Go to **Dataset Manager → Inspector**.
2. Select `example_radio_dataset.h5` from the file picker.
3. Choose an analysis type:
   - **Balance** — see how many samples exist per class label
   - **Per-Class Stats** — per-class mean, std, and distribution
   - **Anomaly Detection** — flag outlier samples
   - **Compare** — select a second dataset to diff against

---

### Dataset Manager — Generator

The Generator creates synthetic labelled datasets from a parameter sweep without requiring any hardware or recordings.

**Expected output:** `datasets/example_synthetic_dataset.h5`

**Steps:**

1. Go to **Dataset Manager → Generator**.
2.
Configure a modulation sweep:
   - **Sampling strategy:** `grid`
   - **Parameters:** SNR from `-5` to `20` dB in steps of `5`; modulation types: `[BPSK, QPSK, 8PSK, 16QAM]`
   - **Signal:** length `1024`, sample rate `1e6`
   - **Channel model:** `awgn`
   - **Output backend:** `pytorch`
3. Click **Generate**. The task runs in the background.
4. Download the resulting `.h5` file when complete.

---

### Model Builder — Model Trainer

The Model Trainer builds a training workflow YAML and commits it to your repository. A Gitea Actions runner then executes the training job.

**Example files:** `datasets/example_radio_dataset.h5`, `models/example_model.ckpt` (optional pre-trained start)
**Expected output:** `.riahub/workflows/train.yaml` in your repository, plus a trained `example_model.ckpt` artifact

**Steps:**

1. Go to **Model Builder → Model Trainer**.
2. In **Repository**, select the repository where you want to store the workflow and output artifacts.
3. In **Model**, choose an architecture (e.g. `ResNet1D`) or use `example_model.ckpt` as a starting checkpoint.
4. In **Dataset**, select `example_radio_dataset.h5` from the Library.
5. Configure training:
   - **Optimizer:** `Adam`, learning rate `1e-3`
   - **Epochs:** `20`
   - **Batch size:** `64`
   - **Criterion:** `CrossEntropyLoss`
6. Enable **ONNX Export** in the Evaluation section to automatically export the trained model.
7. Click **Commit Workflow**. A `train.yaml` is committed to `.riahub/workflows/` and a CI run starts.
8. Monitor the run in **Actions** within your repository.

The committed workflow file matches `.ria/train.yaml` in this repository.

---

### Model Builder — Hyperparameter Optimization

HPO runs a sweep over a configurable search space, training multiple model variants and ranking them by a target metric.

**Example file:** `datasets/example_radio_dataset.h5`
**Expected output:** `.riahub/workflows/hpo.yaml`

**Steps:**

1. Go to **Model Builder → HPO**.
2. Configure the same model and dataset as in Model Trainer.
3.
In the **Search Space** panel, define ranges to sweep:
   - Learning rate: `1e-4` to `1e-2` (log scale)
   - Optimizer: `[Adam, SGD]`
   - Batch size: `[32, 64, 128]`
4. Set **Trials:** `12` and **Target metric:** `val_accuracy`.
5. Click **Commit Workflow**.

See `workflows/hpo.yaml` for the expected output format.

---

### Model Builder — Model Compression

Compression applies pruning and/or quantization to reduce model size for edge deployment. The output is an ONNX file.

**Example files:** `models/example_model.ckpt`, `datasets/example_radio_dataset.h5`
**Expected output:** `models/example_model.onnx`

**Steps:**

1. Go to **Model Builder → Compression**.
2. Select `example_model.ckpt` as the source model and `example_radio_dataset.h5` as the calibration dataset.
3. Configure the compression pipeline (pruning ratio, quantization bits).
4. Click **Commit Workflow**. The Actions job exports the compressed model to ONNX automatically.
5. The resulting `.onnx` file is committed back to your repository.

---

### Application Packager — Application Composer

The Application Composer is a visual node-graph editor for wiring together C++ operator blocks into an inference application. The output is an application JSON file.

**Example file:** `applications/example_application.json`

**Steps:**

1. Go to **Application Packager → Application Composer**.
2. Browse the **Operators** panel on the left. Drag an operator onto the canvas.
3. Wire operator ports together by dragging from an output port to an input port.
4. Configure each operator's parameters in the sidebar.
5. Click **Commit Application** to save `example_application.json` to your repository.
6. Click **Build** to trigger a build workflow on a registered runner.

The application JSON format is documented in `schemas/application/ria_application.schema.json`. See `applications/example_application.json` for a minimal working example.
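As a quick offline sanity check of a composed application file, you can verify that every flow endpoint refers to an operator declared in `ops`. A minimal Python sketch, assuming the JSON layout shown in the File Format Reference (the `inference` operator and its `InferenceOp` class name here are illustrative, not part of the shipped examples):

```python
import json

# Minimal application description following the documented layout.
app = {
    "app_name": "my_inference_app",
    "backend": "native",
    "target_profile": "native-x86",
    "ops": [
        {"name": "source", "class_name": "UDPSourceOp", "type": "source",
         "inputs": [], "outputs": [{"name": "output"}],
         "specs": [{"name": "port", "value": "5000", "arg_type": "int"}]},
        # Illustrative downstream operator so the flow below has a target.
        {"name": "inference", "class_name": "InferenceOp", "type": "compute",
         "inputs": [{"name": "input"}], "outputs": [], "specs": []},
    ],
    "flows": [
        {"upstream": "source", "downstream": "inference",
         "port_pairs": {"output": "input"}},
    ],
}

def check_flows(app):
    """Return flow endpoints that don't match any declared op name."""
    op_names = {op["name"] for op in app["ops"]}
    missing = []
    for flow in app["flows"]:
        for endpoint in (flow["upstream"], flow["downstream"]):
            if endpoint not in op_names:
                missing.append(endpoint)
    return missing

print(check_flows(app))  # an empty list means every flow endpoint is defined
```

The same check works on a file loaded with `json.load(open("applications/example_application.json"))`; full schema validation is better done against `ria_application.schema.json` with a JSON Schema validator.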
**Target profiles:**

| Profile | Use when |
|---------|----------|
| `native-x86` | Standard x86 Linux deployment |
| `native-arm64` | ARM edge devices |
| `nvidia-x86` | GPU-accelerated inference on x86 |
| `RIA Screens` | Web-based runtime environment for monitoring the app |

---

### Screens

Screens deploys a packaged RF inference application to a live pipeline. You build an app from Application Composer, configure a data source (live SDR, file playback, or synthetic), and start the pipeline. Results stream back to the browser in real time.

#### App package format

A Screens app package is a `.tar.gz` containing:

- `manifest.json` — describes the app (models, GUI layout, data source, preprocessor)
- ONNX model file(s) at the path(s) listed in `manifest.models[].path`

**Data source types:**

| Type | Description |
|------|-------------|
| `synthetic` | Built-in AWGN tone generator — no hardware required |
| `recording` | Play back a `.h5` IQ recording from the Library |
| `sdr` | Live data from a connected SDR device |
| `agent` | Live data from a remote SDR via an edge agent node |
| `numpy_raw` | Play back a `.npy` raw IQ file |

#### ONNX model requirements

The Zone Fingerprinting model contract:

- **Input:** `iq_features` — shape `[1, 128]`, dtype `float32`
- **Output:** `scores` — shape `[1, 5]`, dtype `float32` (softmax probabilities)
- **Opset:** `>= 13`

When building your own Screens app, export your model to ONNX with matching input/output names and shapes, then reference them in `manifest.json`.

---

### RIA Projects

Projects group your datasets, models, training runs, and deployed applications into a single tracked entity. The project dashboard shows a three-stage pipeline view: Data Management → Model Building → Deployment.

**Steps:**

1. Go to **Projects → New Project**.
2. Name your project and create it.
3. Link assets from the Library using the **Link Asset** button on each pipeline stage.
4.
As you run Curator, Model Trainer, and Screens jobs, link the outputs to track progress through the pipeline.

---

## File Format Reference

### HDF5 Radio Dataset (`.h5`)

Curated and generated datasets share a common HDF5 layout:

```
dataset.h5
├── data/      # IQ samples, shape [N, slice_length, 2] (float32)
├── labels/    # Integer class labels, shape [N]
├── metadata/  # Recording metadata carried through from source
└── attrs      # Dataset-level attributes: name, version, radio_task, backend
```

### Application JSON (`application.json`)

```json
{
  "app_name": "my_inference_app",
  "backend": "native",
  "target_profile": "native-x86",
  "ops": [
    {
      "name": "source",
      "class_name": "UDPSourceOp",
      "type": "source",
      "inputs": [],
      "outputs": [{ "name": "output" }],
      "specs": [
        { "name": "port", "value": "5000", "arg_type": "int" }
      ]
    }
  ],
  "flows": [
    {
      "upstream": "source",
      "downstream": "inference",
      "port_pairs": { "output": "input" }
    }
  ]
}
```

---

## Getting Help

- Full platform documentation is available in the **Docs** section of RIA Hub.
- Open an issue in this repository if an example file is missing, broken, or out of date.
- For tool-specific questions, use the in-app help panels (the `?` icon on each tool page).
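---

Finally, a quick way to confirm that a dataset file matches the HDF5 layout in the File Format Reference is to round-trip a tiny file. A minimal sketch, assuming `h5py` and NumPy are installed; the file name and attribute values are illustrative:

```python
import numpy as np
import h5py

# Write a tiny dataset following the documented layout, then read it back.
# N = 4 slices of length 1024; IQ stored as [N, slice_length, 2] float32.
with h5py.File("tiny_dataset.h5", "w") as f:
    f.create_dataset("data", data=np.zeros((4, 1024, 2), dtype=np.float32))
    f.create_dataset("labels", data=np.arange(4) % 2)  # integer class labels, shape [N]
    f.create_group("metadata")                  # metadata carried through from source
    f.attrs["name"] = "tiny_example"            # dataset-level attributes
    f.attrs["radio_task"] = "classification"    # illustrative value

with h5py.File("tiny_dataset.h5", "r") as f:
    assert f["data"].shape == (4, 1024, 2)
    assert f["data"].dtype == np.float32
    assert "metadata" in f
    print(sorted(f.attrs))  # the dataset-level attribute names
```

The same read-side checks apply to `datasets/example_radio_dataset.h5` or any Curator/Generator output.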