fix: harden annotation pipeline and CLI robustness

- Replace bare metadata["sample_rate"] access with .get() + clear ValueError in threshold_qualifier, energy_detector, cusum_annotator, parallel_signal_separator, and signal_isolation
- Add --sample-rate option to energy, threshold, cusum, and separate CLI commands with a pre-flight error if sample rate is still absent
- Normalize namespaced metadata keys (e.g. BlockGenerator:Foo:sample_rate) to standard keys on legacy .npy load
- Cap threshold_qualifier smoothing window at 1% of signal length to prevent over-smoothing short recordings into a flat envelope
- Warn when threshold or energy detector returns 0 annotations due to constant-envelope signal; point to cusum as the right tool
- Enforce --overwrite before any work begins; error fires before load and detection, not after
- Fix qualify_slice off-by-one that silently dropped the last slice
- Surface split failures in parallel_signal_separator via warnings.warn instead of swallowing them silently
- Add threshold annotation example image to getting_started docs
2026-04-28 16:31:35 -04:00
commit 0a1bef8453
parent e5a3d327e5
10 changed files with 328 additions and 216 deletions
--- a/docs/_build/html/_sources/intro/getting_started.rst.txt
+++ b/docs/_build/html/_sources/intro/getting_started.rst.txt
@ -6,10 +6,10 @@ This is a practical reference for the ``ria`` CLI from ``ria-toolkit-oss``.
 **Scope of this guide:**
-* Installation and setup
+* **Installation and SDR driver prerequisites** — how to install RIA Toolkit OSS and configure the system drivers your hardware requires
-* End-to-end CLI workflows
+* **End-to-end CLI workflow** — a step-by-step walkthrough from hardware discovery through capture, annotation, and processing
-* Full command reference for CLI features
+* **Full command reference** — options, flags, and examples for every ``ria`` command
-* Brief scripting section
+* **Python scripting preview** — using the toolkit API directly without the CLI
 **Official resources:**
@ -18,76 +18,15 @@ This is a practical reference for the ``ria`` CLI from ``ria-toolkit-oss``.
 * `PyPI package <https://pypi.org/project/ria-toolkit-oss/>`_
 * `RIA Hub Conda package <https://riahub.ai/qoherent/-/packages/conda/ria-toolkit-oss>`_
 .. contents:: Contents
   :local:
   :depth: 2
   :backlinks: none
 1) Installation and Setup
 ==========================
-1.1 Installation with Conda
+Before using the ``ria`` CLI, follow the :doc:`Installation <installation>` guide to
----------------------------
+install RIA Toolkit OSS and any SDR drivers required for your hardware.
 RIA Toolkit OSS is available as a Conda package on RIA Hub. This is typically the easiest
 path when using SDR tooling that depends on native/system libraries.
 .. code-block:: bash
   conda update --force conda
   conda config --add channels https://riahub.ai/api/packages/qoherent/conda
   conda activate base
   conda install ria-toolkit-oss
 Verify:
 .. code-block:: bash
   conda list | grep ria-toolkit-oss
-1.2 Installation with pip
+1.1 SDR driver prerequisites
 --------------------------
 Use pip unless you specifically need to edit toolkit source.
 .. code-block:: bash
   python3 -m venv .venv
   source .venv/bin/activate
   pip install --upgrade pip
   pip install ria-toolkit-oss
 Verify CLI entrypoint:
 .. code-block:: bash
   ria --help
 ``pyproject.toml`` defines two script entry points:
 * ``ria``
 * ``ria-tools``
 Both point to the same CLI module (``ria_toolkit_oss_cli.cli:cli``).
 1.3 Optional install from source
 ----------------------------------
 Use this for local development or testing unreleased changes.
 .. code-block:: bash
   git clone https://riahub.ai/qoherent/ria-toolkit-oss.git
   cd ria-toolkit-oss
   python3 -m venv .venv
   source .venv/bin/activate
   pip install -e .
 1.4 SDR driver prerequisites
 -----------------------------
 Toolkit package install does not install all system SDR drivers. Install vendor/runtime
@ -95,11 +34,22 @@ dependencies for the hardware you use.
 Examples (depends on device and OS):
-* USRP: UHD drivers
+.. list-table::
-* Pluto: libiio / IIO utilities
+   :widths: 25 75
-* BladeRF: libbladeRF
+   :header-rows: 1
-* HackRF: libhackrf
+
-* RTL-SDR: librtlsdr
+   * - Device
     - Driver Package
   * - USRP
     - UHD drivers
   * - Pluto
     - libiio / IIO utilities
   * - BladeRF
     - libbladeRF
   * - HackRF
     - libhackrf
   * - RTL-SDR
     - librtlsdr
 See repo docs under ``docs/source/sdr_guides/*`` and your OS package instructions.
@ -119,18 +69,34 @@ Top-level CLI follows this model:
 **Top-level commands:**
-* ``discover``
+.. list-table::
-* ``init``
+   :widths: 25 75
-* ``capture``
+   :header-rows: 1
-* ``view``
+
-* ``annotate`` (group)
+   * - Command
-* ``convert``
+     - Purpose
-* ``split``
+   * - :ref:`discover <cmd-discover>`
-* ``combine``
+     - Probe SDR drivers and enumerate attached hardware
-* ``generate`` (group)
+   * - :ref:`init <cmd-init>`
-* ``transform`` (group)
+     - Create and manage user metadata defaults
-* ``transmit``
+   * - :ref:`capture <cmd-capture>`
-* ``synth`` (alias of ``generate`` in command bindings)
+     - Record IQ samples from a connected SDR
   * - :ref:`view <cmd-view>`
     - Generate visualizations from IQ files
   * - :ref:`annotate <cmd-annotate>`
     - Label signal regions manually or with auto-detection (group)
   * - :ref:`convert <cmd-convert>`
     - Convert between IQ file formats
   * - :ref:`split <cmd-split>`
     - Split, trim, or extract recordings
   * - :ref:`combine <cmd-combine>`
     - Merge multiple recordings by concatenation or addition
   * - :ref:`generate / synth <cmd-generate>`
     - Generate synthetic IQ signals (group; ``synth`` is an alias)
   * - :ref:`transform <cmd-transform>`
     - Apply augmentations or impairments to recordings (group)
   * - :ref:`transmit <cmd-transmit>`
     - Transmit IQ through a TX-capable SDR
 3) Quick End-to-End Workflow
@ -158,10 +124,8 @@ provenance fields.
 .. code-block:: bash
   ria init
-   # or non-interactive
+   ria init --author "Jane Doe" --project "rf-campaign-1" --location "Lab-A"  # non-interactive
-   ria init --author "Jane Doe" --project "rf-campaign-1" --location "Lab-A"
+   ria init --show  # show config
   # show config
   ria init --show
 3.3 Capture IQ
@ -227,13 +191,14 @@ Replay recorded or synthesized IQ through a transmit-capable SDR.
 .. code-block:: bash
   ria transmit -d hackrf -f 2.44G -s 2e6 --input capture.sigmf-data
-   # or generated waveform
+   ria transmit -d hackrf --generate lfm --continuous  # generated waveform
   ria transmit -d hackrf --generate lfm --continuous
 4) Command Reference
 =====================
 .. _cmd-discover:
 4.1 ``discover``
 -----------------
@ -263,6 +228,8 @@ Replay recorded or synthesized IQ through a transmit-capable SDR.
  hidden in default output.
 .. _cmd-init:
 4.2 ``init``
 -------------
@ -309,6 +276,8 @@ Replay recorded or synthesized IQ through a transmit-capable SDR.
   generate metadata, and YAML config loading paths).
 .. _cmd-capture:
 4.3 ``capture``
 ----------------
@ -382,6 +351,8 @@ Device selection (``--device``) is optional if only one device is detected. Exac
   ria capture -c capture_config.yaml
 .. _cmd-view:
 4.4 ``view``
 -------------
@ -442,7 +413,21 @@ Device selection (``--device``) is optional if only one device is detected. Exac
   ria view capture.npy --type full --title "Test Capture" --format pdf
   ria view capture.npy --show --no-save
   ria view old.npy --legacy --type simple
   ria view recordings\qam64_35.npy --type simple
   ria view recordings\qam64_35.npy --type full
 .. figure:: ../images/recordings/qam64_35.png
   :alt: Example output of ria view recordings\qam64_35.npy --type simple
   Output of ``ria view recordings\qam64_35.npy --type simple``
 .. figure:: ../images/recordings/qam64_35-full.png
   :alt: Example output of ria view recordings\qam64_35.npy --type full
   Output of ``ria view recordings\qam64_35.npy --type full``
 .. _cmd-annotate:
 4.5 ``annotate`` group
 -----------------------
@ -459,8 +444,30 @@ Device selection (``--device``) is optional if only one device is detected. Exac
   ria annotate <subcommand> ...
-**Subcommands:** ``list``, ``add``, ``remove``, ``clear``, ``energy``, ``cusum``,
+**Subcommands:**
-``threshold``, ``separate``
+
 .. list-table::
   :widths: 25 75
   :header-rows: 1
   * - Subcommand
     - Purpose
   * - ``list``
     - Inspect all annotations on a recording
   * - ``add``
     - Add one annotation with explicit sample-domain bounds
   * - ``remove``
     - Remove one annotation by index
   * - ``clear``
     - Remove all annotations from a recording
   * - ``energy``
     - Auto-detect regions above the estimated noise floor
   * - ``cusum``
     - Auto-detect regime changes using change-point detection
   * - ``threshold``
     - Auto-detect regions using normalized magnitude thresholding
   * - ``separate``
     - Decompose annotations into narrower spectral components
 **General behavior:**
@ -587,8 +594,16 @@ annotations.
   ria annotate add capture.sigmf-data --start 10000 --count 5000 --label burst
   ria annotate energy capture.sigmf-data --label signal --threshold 1.3
   ria annotate cusum capture.sigmf-data --min-duration 5
   ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%
   ria annotate separate capture.sigmf-data --indices 0,1 --verbose
 .. figure:: ../images/recordings/sample_recording3_annotated.png
   :alt: Example output of ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%
   Output of ``ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%``
 .. _cmd-convert:
 4.6 ``convert``
 ----------------
@ -629,6 +644,8 @@ inferred from the output file extension.
   ria convert old.npy --format sigmf --legacy --overwrite
 .. _cmd-split:
 4.7 ``split``
 --------------
@ -670,6 +687,8 @@ Choose exactly one operation per invocation:
   ria split annotated.sigmf-data --extract-annotations --annotation-label payload
 .. _cmd-combine:
 4.8 ``combine``
 ----------------
@ -717,6 +736,8 @@ Choose exactly one operation per invocation:
   ria combine signal.npy pattern.npy out.npy --mode add --align-mode repeat-spaced --repeat-spacing 10000
 .. _cmd-generate:
 4.9 ``generate`` group (and ``synth`` alias)
 ---------------------------------------------
@ -728,15 +749,34 @@ Choose exactly one operation per invocation:
 ``ria synth ...`` is an alias for ``ria generate ...``.
-**Shape:**
+**Usage:**
 .. code-block:: bash
   ria generate <subcommand> [subcommand options] [common options]
 **Available subcommands:**
-``tone``, ``noise``, ``chirp``, ``square``, ``sawtooth``, ``qam``, ``apsk``, ``pam``,
+
-``fsk``, ``ook``, ``oqpsk``, ``gmsk``, ``psk``
+.. list-table::
   :widths: 30 70
   :header-rows: 1
   * - Subcommand(s)
     - Description
   * - ``tone``
     - Clean sinusoidal calibration/reference source
   * - ``noise``
     - Baseline noise floor data or controlled additive-noise synthesis
   * - ``chirp``
     - Sweep-based radar/sonar-style signals and bandwidth occupancy tests
   * - ``square``, ``sawtooth``
     - Periodic waveform primitives
   * - ``qam``, ``apsk``, ``pam``, ``psk``
     - Digital modulation families with pulse-shaping filter support
   * - ``fsk``
     - Frequency-shift keying with configurable tone spacing
   * - ``ook``, ``oqpsk``, ``gmsk``
     - On-off keying and continuous-phase modulation schemes
 **Common options shared across all generators:**
@ -760,22 +800,16 @@ Multipath and IQ imbalance flags apply impairment-style post-processing during g
 Options: ``--frequency``, ``--amplitude``, ``--phase``
 Clean sinusoidal calibration/reference source.
 ``noise``
 ~~~~~~~~~~
 Options: ``--noise-type {gaussian,uniform}``, ``--power``
 Baseline noise floor data or controlled additive-noise synthesis.
 ``chirp``
 ~~~~~~~~~~
 Options: ``--bandwidth`` (required), ``--period`` (required), ``--type {up,down,up_down}``
 Sweep-based radar/sonar-style signals and bandwidth occupancy tests.
 ``square``
 ~~~~~~~~~~~
@ -826,6 +860,8 @@ symbol transition sharpness).
   ria synth psk -s 2e6 -r 100e3 -M 8 -N 8000 -o psk8.npy
 .. _cmd-transform:
 4.10 ``transform`` group
 -------------------------
@ -834,7 +870,7 @@ symbol transition sharpness).
 * Apply algorithmic transforms to existing recordings.
 * Run reusable augmentations/impairments for dataset diversity and robustness testing.
-**Shape:**
+**Usage:**
 .. code-block:: bash
@ -895,6 +931,8 @@ inspect parameter hints. ``--view`` writes a PNG preview alongside transform out
   ria transform custom my_filter in.npy out.npy --transform-dir ./my_transforms --params cutoff=0.2
 .. _cmd-transmit:
 4.11 ``transmit``
 ------------------
@ -993,17 +1031,7 @@ experiment-specific fields on the CLI.
   ria generate noise --config generate.yaml
-6) Practical Tips and Safety
+6) Version Notes
 =============================
 * Use ``ria discover`` before capture/transmit sessions.
 * Keep TX gain conservative first; validate with attenuators/dummy loads when needed.
 * Prefer SigMF for interoperable metadata and annotations.
 * For long workflows, keep outputs organized by campaign directories and consistent prefixes.
 * Use ``--verbose`` when debugging device init or driver issues.
 7) Version Notes
 =================
 These notes are based on the current implementation and should be re-validated against future
@ -1016,18 +1044,19 @@ releases.
 3. Multiple non-CLI modules still import ``utils.*``, which can create runtime dependency
   coupling when using only ``ria-toolkit-oss`` in isolation.
 .. tip::
   If you observe unexpected import errors after install, check the package version and
   changelog, then test ``ria --help`` in a clean virtual environment.
-8) Brief Scripting (Python) Preview
+7) Brief Scripting (Python) Preview
 =====================================
 For quick non-CLI use:
 .. code-block:: python
-   from ria_toolkit_oss.datatypes import Recording
+   from ria_toolkit_oss.data import Recording
   from ria_toolkit_oss.io import load_recording, to_sigmf
   from ria_toolkit_oss.transforms import iq_augmentations, iq_impairments
@ -1037,47 +1066,3 @@ For quick non-CLI use:
   to_sigmf(imp, filename="capture_awgn", path=".")
 You can also call annotation algorithms and block-generator primitives from Python directly.
 9) Cheat Sheet
 ===============
 .. code-block:: bash
   # Install
   pip install ria-toolkit-oss
   # Discover
   ria discover -v
   # Init defaults
   ria init --author "Jane" --project "rf1" --location "Lab-A"
   # Capture
   ria capture -d pluto -f 2.44G -s 2e6 -n 1000000 -o cap.sigmf-data
   # View
   ria view cap.sigmf-data --type simple
   # Annotate
   ria annotate energy cap.sigmf-data --threshold 1.2
   ria annotate list cap.sigmf-data --verbose
   # Convert
   ria convert cap.sigmf-data cap.npy
   # Split
   ria split cap.sigmf-data --split-every 100000 --output-dir chunks
   # Combine
   ria combine chunks/a.npy chunks/b.npy merged.npy
   # Generate
   ria generate qam -s 2e6 -r 100e3 -M 16 -N 5000 -o qam16.npy
   # Transform
   ria transform augment channel_swap cap.npy
   ria transform impair add_awgn_to_signal cap.npy --params snr=10
   # Transmit
   ria transmit -d hackrf --input cap.sigmf-data -f 2.44G -s 2e6
--- a/docs/source/intro/getting_started.rst
+++ b/docs/source/intro/getting_started.rst
@ -594,8 +594,14 @@ annotations.
   ria annotate add capture.sigmf-data --start 10000 --count 5000 --label burst
   ria annotate energy capture.sigmf-data --label signal --threshold 1.3
   ria annotate cusum capture.sigmf-data --min-duration 5
   ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%
   ria annotate separate capture.sigmf-data --indices 0,1 --verbose
 .. figure:: ../images/recordings/sample_recording3_annotated.png
   :alt: Example output of ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%
   Output of ``ria annotate threshold recordings/sample_recording3.npy --threshold 0.7 --label 70%``
 .. _cmd-convert:
--- a/src/ria_toolkit_oss/annotations/cusum_annotator.py
+++ b/src/ria_toolkit_oss/annotations/cusum_annotator.py
@ -38,7 +38,12 @@ def annotate_with_cusum(
    :type annotation_type: str
    """
-    sample_rate = recording.metadata["sample_rate"]
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Supply it with --sample-rate when using the CLI, or set recording.sample_rate before calling this function."
+        )
    center_frequency = recording.metadata.get("center_frequency", 0)
    # Create an object of the time segmenter
--- a/src/ria_toolkit_oss/annotations/energy_detector.py
+++ b/src/ria_toolkit_oss/annotations/energy_detector.py
@ -6,6 +6,7 @@ and occupied bandwidth calculation following ITU-R SM.328 standard.
 """
 import json
+import warnings
 from typing import Tuple
 import numpy as np
@ -119,6 +120,17 @@ def detect_signals_energy(
    if active:
        boundaries.append((start, len(smoothed_power) - start))
+    if not boundaries and noise_floor > 0:
+        peak = float(np.max(smoothed_power))
+        dynamic_range = peak / noise_floor
+        if dynamic_range < threshold_factor:
+            warnings.warn(
+                f"detect_signals_energy: no signal boundaries found — dynamic range {dynamic_range:.2f}x is below "
+                f"the threshold factor {threshold_factor}x. The signal may be constant-envelope (e.g. CW or chirp). "
+                "If the entire recording is signal, use 'ria annotate cusum' to segment it as a single region.",
+                stacklevel=2,
+            )
    # Merge boundaries that are closer than min_distance
    merged_boundaries = []
    if boundaries:
@ -135,7 +147,12 @@ def detect_signals_energy(
        merged_boundaries.append((start, length))
    # Create annotations from detected boundaries
-    sample_rate = recording.metadata["sample_rate"]
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Supply it with --sample-rate when using the CLI, or set recording.sample_rate before calling this function."
+        )
    center_frequency = recording.metadata.get("center_frequency", 0)
    # Validate frequency method
@ -351,7 +368,12 @@ def annotate_with_obw(
        >>> annotated = annotate_with_obw(recording, label="signal_obw")
    """
    signal = recording.data[0]
-    sample_rate = recording.metadata["sample_rate"]
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Set recording.sample_rate before calling this function."
+        )
    center_freq = recording.metadata.get("center_frequency", 0)
    # Calculate OBW
--- a/src/ria_toolkit_oss/annotations/parallel_signal_separator.py
+++ b/src/ria_toolkit_oss/annotations/parallel_signal_separator.py
@ -49,6 +49,7 @@ allowing splitting of overlapping signals into separate training samples.
 """
 import json
+import warnings
 from typing import List, Optional, Tuple
 import numpy as np
@ -401,7 +402,12 @@ def split_recording_annotations(
        return recording
    signal = recording.data[0]
-    sample_rate = recording.metadata["sample_rate"]
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Supply it with --sample-rate when using the CLI, or set recording.sample_rate before calling this function."
+        )
    center_frequency = recording.metadata.get("center_frequency", 0.0)
    # Build new annotation list
@ -425,8 +431,11 @@ def split_recording_annotations(
                else:
                    # No components found, keep original
                    new_annotations.append(anno)
-            except Exception:
-                # Split failed for any reason, keep original
+            except Exception as e:
+                warnings.warn(
+                    f"split_recording_annotations: failed to split annotation at index {i} ({e}); keeping original.",
+                    stacklevel=2,
+                )
                new_annotations.append(anno)
        else:
            # Not in split list, keep as-is
--- a/src/ria_toolkit_oss/annotations/qualify_slice.py
+++ b/src/ria_toolkit_oss/annotations/qualify_slice.py
@ -24,7 +24,7 @@ def qualify_slice_from_annotations(recording: Recording, slice_length: int):
    output_recordings = []
-    for i in range((len(recording.data[0]) // slice_length) - 1):
+    for i in range(len(recording.data[0]) // slice_length):
        start_index = slice_length * i
        end_index = slice_length * (i + 1)
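The effect of the loop-bound change above can be checked with a small standalone sketch. `slice_bounds` is a hypothetical helper written for illustration only; it is not part of the toolkit and merely mirrors the two `range(...)` bounds shown in the hunk.

```python
def slice_bounds(n_samples: int, slice_length: int, legacy: bool = False) -> list:
    """Return (start, end) index pairs for every full slice.

    legacy=True reproduces the old bound, range((n // slice_length) - 1),
    which stopped one slice early.
    """
    n_slices = n_samples // slice_length
    if legacy:
        n_slices -= 1  # old behaviour: the final full slice was never visited
    return [(slice_length * i, slice_length * (i + 1)) for i in range(max(n_slices, 0))]


old = slice_bounds(1000, 250, legacy=True)
new = slice_bounds(1000, 250)
print(len(old), len(new))  # 3 4
print(new[-1])             # (750, 1000)
```

With 1000 samples and a slice length of 250, the old bound yields three slices and never reaches samples 750 to 999; the corrected bound yields all four.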
--- a/src/ria_toolkit_oss/annotations/signal_isolation.py
+++ b/src/ria_toolkit_oss/annotations/signal_isolation.py
@ -35,17 +35,24 @@ def isolate_signal(recording: Recording, annotation: Annotation) -> Recording:
    isolation_bw = anno_bw
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Set recording.sample_rate before calling isolate_signal."
+        )
    # frequency shift the center of the box about zero
    shifted_signal_slice = frequency_shift_iq_samples(
        iq_samples=signal_slice,
-        sample_rate=recording.metadata["sample_rate"],
+        sample_rate=sample_rate,
        shift_frequency=-1 * anno_base_center_freq,
    )
    # filter
-    if isolation_bw < recording.metadata["sample_rate"] - 1:
+    if isolation_bw < sample_rate - 1:
        filtered_signal = apply_complex_lowpass_filter(
-            signal=shifted_signal_slice, cutoff_frequency=isolation_bw, sample_rate=recording.metadata["sample_rate"]
+            signal=shifted_signal_slice, cutoff_frequency=isolation_bw, sample_rate=sample_rate
        )
    else:
--- a/src/ria_toolkit_oss/annotations/threshold_qualifier.py
+++ b/src/ria_toolkit_oss/annotations/threshold_qualifier.py
@ -42,6 +42,7 @@ classification or demodulation stages.
 """
 import json
+import warnings
 from typing import Optional
 import numpy as np
@ -216,11 +217,21 @@ def threshold_qualifier(
    """
    # Extract signal and metadata
    sample_data = recording.data[channel]
-    sample_rate = recording.metadata["sample_rate"]
+    sample_rate = recording.metadata.get("sample_rate")
+    if sample_rate is None:
+        raise ValueError(
+            "Recording metadata does not contain 'sample_rate'. "
+            "Supply it with --sample-rate when using the CLI, or set recording.sample_rate before calling this function."
+        )
    center_frequency = recording.metadata.get("center_frequency", 0)
+    n_samples = len(sample_data)
     if window_size is None:
         window_size = max(64, int(sample_rate * 0.001))
+    # Cap at 1% of signal length so short recordings aren't over-smoothed into
+    # a flat envelope that collapses the dynamic range below the early-exit guard.
+    window_size = min(window_size, max(64, n_samples // 100))
    # --- 1. SIGNAL CONDITIONING ---
    # Convert to power (Magnitude squared)
@ -237,6 +248,12 @@ def threshold_qualifier(
    # Soft early exit: keep a guard for low-contrast noise, but compute it from
    # the quieter tail of the envelope so burst-heavy captures are not rejected.
    if dynamic_range_ratio < 1.5:
+        warnings.warn(
+            f"threshold_qualifier: dynamic range ratio {dynamic_range_ratio:.2f} is below 1.5 — "
+            "the signal appears to be constant-envelope or pure noise, so no burst boundaries can be found. "
+            "If the entire recording is signal, use 'ria annotate cusum' to segment it as a single region.",
+            stacklevel=2,
+        )
        return Recording(data=recording.data, metadata=recording.metadata, annotations=recording.annotations)
    trigger_val = noise_floor + threshold * (max_power - noise_floor)
@ -296,7 +313,7 @@ def threshold_qualifier(
    # burst energy does not bleed through the long window into adjacent regions,
    # which would inflate macro_residual_max and push the trigger above the
    # faint burst's average power.
-    macro_window_size = max(window_size * 16, int(sample_rate * 0.02))
+    macro_window_size = min(max(window_size * 16, int(sample_rate * 0.02)), max(window_size * 2, n_samples // 4))
    macro_kernel = np.ones(macro_window_size, dtype=np.float64) / macro_window_size
    # Expand each annotated range by half the macro window on both sides so that
    # the long convolution cannot "see" the leading/trailing edges of already-
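The two window caps introduced in this file can be worked through numerically. The sketch below copies the expressions from the hunks into two hypothetical helper functions (`capped_window` and `capped_macro_window` are names chosen here, not toolkit functions) to show how a short recording changes the result.

```python
from typing import Optional


def capped_window(sample_rate: float, n_samples: int, window_size: Optional[int] = None) -> int:
    """Smoothing window: 1 ms of samples by default, capped at 1% of the signal (min 64)."""
    if window_size is None:
        window_size = max(64, int(sample_rate * 0.001))
    return min(window_size, max(64, n_samples // 100))


def capped_macro_window(sample_rate: float, n_samples: int, window_size: int) -> int:
    """Macro window: the old formula, now capped at a quarter of the signal."""
    uncapped = max(window_size * 16, int(sample_rate * 0.02))
    return min(uncapped, max(window_size * 2, n_samples // 4))


# Short recording: 10,000 samples at 2 MS/s
w_short = capped_window(2e6, 10_000)
m_short = capped_macro_window(2e6, 10_000, w_short)

# Long recording: 1,000,000 samples at 2 MS/s
w_long = capped_window(2e6, 1_000_000)
m_long = capped_macro_window(2e6, 1_000_000, w_long)

print(w_short, m_short)  # 100 2500
print(w_long, m_long)    # 2000 40000
```

For the short recording the uncapped values would have been 2000 and 40000 samples, that is, a macro window four times longer than the recording itself. The long recording is unaffected by either cap.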
--- a/src/ria_toolkit_oss/io/recording.py
+++ b/src/ria_toolkit_oss/io/recording.py
@ -175,6 +175,15 @@ def from_npy(file: os.PathLike | str, legacy: bool = False) -> Recording:
            )
            data = first  # already loaded without pickle (numeric array)
            metadata = np.load(f, allow_pickle=True).tolist()
+            # Normalize namespaced keys (e.g. "BlockGenerator:Foo:sample_rate") to
+            # their bare equivalents so downstream code can find them reliably.
+            _STANDARD_KEYS = {"sample_rate", "center_frequency", "bandwidth"}
+            if isinstance(metadata, dict):
+                for k in list(metadata):
+                    if ":" in k:
+                        bare = k.rsplit(":", 1)[-1]
+                        if bare in _STANDARD_KEYS and bare not in metadata:
+                            metadata[bare] = metadata[k]
            try:
                annotations = list(np.load(f, allow_pickle=True))
            except EOFError:
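The key normalization added to `from_npy` can be exercised in isolation. This is a minimal sketch, assuming the same three standard keys; `normalize_namespaced_keys` is a hypothetical standalone function, and unlike the hunk it returns a copy and skips non-string keys rather than mutating the loaded dict in place.

```python
STANDARD_KEYS = {"sample_rate", "center_frequency", "bandwidth"}


def normalize_namespaced_keys(metadata: dict) -> dict:
    """Copy namespaced entries such as 'A:B:sample_rate' to their bare key.

    A bare key that is already present wins; the namespaced original is kept.
    """
    out = dict(metadata)
    for key in list(metadata):
        if isinstance(key, str) and ":" in key:
            bare = key.rsplit(":", 1)[-1]  # text after the last colon
            if bare in STANDARD_KEYS and bare not in out:
                out[bare] = metadata[key]
    return out


legacy = {
    "BlockGenerator:Foo:sample_rate": 1e6,
    "center_frequency": 2.4e9,
    "Other:center_frequency": 1.0,  # ignored: the bare key already exists
    "BlockGenerator:Foo:gain": 20,  # ignored: not a standard key
}
normalized = normalize_namespaced_keys(legacy)
print(normalized["sample_rate"])       # 1000000.0
print(normalized["center_frequency"])  # 2400000000.0
```

Because existing bare keys take precedence, a file that already carries `sample_rate` is never overridden by a namespaced duplicate.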
--- a/src/ria_toolkit_oss_cli/ria_toolkit_oss/annotate.py
+++ b/src/ria_toolkit_oss_cli/ria_toolkit_oss/annotate.py
@ -51,7 +51,7 @@ def detect_input_format(filepath):
        raise click.ClickException(f"Unknown format for '{filepath}'. Supported: .sigmf, .npy, .wav, .blue")
-def determine_output_path(input_path, output_path, fmt, quiet, overwrite):
+def determine_output_path(input_path, output_path, fmt, overwrite):
    input_path = Path(input_path)
    input_is_annotated = input_path.stem.endswith("_annotated")
@ -63,24 +63,20 @@ def determine_output_path(input_path, output_path, fmt, quiet, overwrite):
    else:
        target = input_path.with_name(f"{input_path.stem}_annotated{input_path.suffix}")
-    if fmt == "sigmf":
-        final_path = normalize_sigmf_path(target)
-        if not quiet:
-            click.echo(f"Saving SigMF metadata to: {final_path}")
-    else:
-        final_path = target
-        if not quiet:
-            click.echo(f"Saving to: {final_path}")
+    final_path = normalize_sigmf_path(target) if fmt == "sigmf" else target
-    # Always allow writing to _annotated files; guard against overwriting originals
-    target_is_annotated = final_path.stem.endswith("_annotated")
-    if final_path.exists() and not target_is_annotated and final_path != input_path:
-        click.echo(f"Error: {final_path} is not an annotated file and cannot be overwritten.", err=True)
-        return None
+    if final_path.exists() and not overwrite:
+        raise click.ClickException(f"{final_path} already exists. Use --overwrite to replace it.")
    return final_path
+def check_output_available(input_path, output_path, overwrite):
+    """Raise ClickException before any work begins if the output file already exists."""
+    fmt = detect_input_format(Path(input_path))
+    determine_output_path(input_path=input_path, output_path=output_path, fmt=fmt, overwrite=overwrite)
 def save_recording_auto(recording, output_path, input_path, quiet=False, overwrite=False):
    """Save recording, auto-detecting format from extension.
@ -90,11 +86,16 @@ def save_recording_auto(recording, output_path, input_path, quiet=False, overwri
    input_path = Path(input_path)
    fmt = detect_input_format(input_path)
    # Determine output path
    output_path = determine_output_path(
-        input_path=input_path, output_path=output_path, fmt=fmt, quiet=quiet, overwrite=overwrite
+        input_path=input_path, output_path=output_path, fmt=fmt, overwrite=overwrite
    )
+    if not quiet:
+        if fmt == "sigmf":
+            click.echo(f"Saving SigMF metadata to: {output_path}")
+        else:
+            click.echo(f"Saving to: {output_path}")
    if fmt == "sigmf":
        # Normalize path for SigMF
        base_path = output_path
@ -312,6 +313,8 @@ def add(input, start, count, label, freq_lower, freq_upper, comment, annotation_
    except Exception as e:
        raise click.ClickException(f"Failed to load recording: {e}")
+    check_output_available(input, output, overwrite)
    # Validate sample range
    n_samples = len(recording.data[0])
    if start < 0:
@ -363,12 +366,9 @@ def add(input, start, count, label, freq_lower, freq_upper, comment, annotation_
        if comment:
            click.echo(f"  Comment: {comment}")
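-    try:
-        save_recording_auto(recording, output, input, quiet, overwrite)
-        if not quiet:
-            click.echo("  ✓ Saved")
-    except Exception as e:
-        raise click.ClickException(f"Failed to save: {e}")
+    save_recording_auto(recording, output, input, quiet, overwrite)
+    if not quiet:
+        click.echo("  ✓ Saved")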
 # ============================================================================
@ -466,8 +466,6 @@ def clear(input, output, overwrite, force, quiet):
    if not quiet:
        click.echo(f"\nCleared {count_before} annotation(s)")
    recording._annotations = []
    try:
        save_recording_auto(recording, output_path=input, input_path=input, quiet=quiet, overwrite=True)
        if not quiet:
@ -503,6 +501,7 @@ def clear(input, output, overwrite, force, quiet):
    default="standalone",
    help="Annotation type",
 )
+@click.option("--sample-rate", type=float, default=None, help="Sample rate in Hz (overrides metadata; required if not in file)")
@click.option("--output", "-o", type=click.Path(), help="Output file path")
@click.option("--overwrite", is_flag=True, help="Overwrite input file (non-SigMF only)")
@click.option("--quiet", is_flag=True, help="Quiet mode")
@ -517,6 +516,7 @@ def energy(
    nfft,
    obw_power,
    annotation_type,
+    sample_rate,
    output,
    overwrite,
    quiet,
@ -539,8 +539,11 @@ def energy(
      ria annotate energy signal.npy --threshold 1.5 --min-distance 10000
      ria annotate energy signal.sigmf-data --freq-method obw
      ria annotate energy signal.sigmf-data --freq-method full-detected
+      ria annotate energy signal.npy --sample-rate 1e6
    """
+    check_output_available(input, output, overwrite)
    try:
        recording = load_recording(input)
        if not quiet:
@ -548,6 +551,15 @@ def energy(
    except Exception as e:
        raise click.ClickException(f"Failed to load recording: {e}")
+    if sample_rate is not None:
+        recording.sample_rate = sample_rate
+    if recording.sample_rate is None:
+        raise click.ClickException(
+            "Recording metadata does not contain a sample rate. "
+            "Provide it with --sample-rate (e.g. --sample-rate 1e6)."
+        )
    if not quiet:
        click.echo("\nDetecting signals using energy-based method...")
        click.echo("  Time detection:")
@ -575,12 +587,12 @@ def energy(
        if not quiet:
            click.echo(f"  ✓ Added {added} annotation(s)")
+    except Exception as e:
+        raise click.ClickException(f"Energy detection failed: {e}")
+    save_recording_auto(recording, output, input, quiet, overwrite)
+    if not quiet:
+        click.echo("  ✓ Saved")
-    except Exception as e:
-        raise click.ClickException(f"Energy detection failed: {e}")
 # ============================================================================
@ -601,10 +613,11 @@ def energy(
    default="standalone",
    help="Annotation type",
 )
@click.option("--sample-rate", type=float, default=None, help="Sample rate in Hz (overrides metadata; required if not in file)")
@click.option("--output", "-o", type=click.Path(), help="Output file path")
@click.option("--overwrite", is_flag=True, help="Overwrite input file (non-SigMF only)")
@click.option("--quiet", is_flag=True, help="Quiet mode")
-def cusum(input, label, min_duration, window_size, tolerance, annotation_type, output, overwrite, quiet):
+def cusum(input, label, min_duration, window_size, tolerance, annotation_type, sample_rate, output, overwrite, quiet):
    """Auto-detect segments using CUSUM method.
    Detects signal state changes (on/off, amplitude transitions). Best for
@ -616,7 +629,10 @@ def cusum(input, label, min_duration, window_size, tolerance, annotation_type, o
    Examples:
      ria annotate cusum signal.sigmf-data --min-duration 5.0
      ria annotate cusum data.npy --min-duration 10.0 --label state
      ria annotate cusum data.npy --sample-rate 1e6
    """
    check_output_available(input, output, overwrite)
    try:
        recording = load_recording(input)
        if not quiet:
@ -624,6 +640,15 @@ def cusum(input, label, min_duration, window_size, tolerance, annotation_type, o
    except Exception as e:
        raise click.ClickException(f"Failed to load recording: {e}")
    if sample_rate is not None:
        recording.sample_rate = sample_rate
    if recording.sample_rate is None:
        raise click.ClickException(
            "Recording metadata does not contain a sample rate. "
            "Provide it with --sample-rate (e.g. --sample-rate 1e6)."
        )
    if not quiet:
        click.echo("\nDetecting segments using CUSUM...")
        click.echo(f"  Min duration: {min_duration} ms")
@ -644,12 +669,12 @@ def cusum(input, label, min_duration, window_size, tolerance, annotation_type, o
        if not quiet:
            click.echo(f"  ✓ Added {added} annotation(s)")
+    except Exception as e:
+        raise click.ClickException(f"CUSUM detection failed: {e}")
+    save_recording_auto(recording, output, input, quiet, overwrite)
+    if not quiet:
+        click.echo("  ✓ Saved")
-    except Exception as e:
-        raise click.ClickException(f"CUSUM detection failed: {e}")
 # ============================================================================
@ -675,10 +700,11 @@ def cusum(input, label, min_duration, window_size, tolerance, annotation_type, o
    help="Annotation type",
 )
@click.option("--channel", type=int, default=0, help="Channel index to annotate (default: 0)")
@click.option("--sample-rate", type=float, default=None, help="Sample rate in Hz (overrides metadata; required if not in file)")
@click.option("--output", "-o", type=click.Path(), help="Output file path")
@click.option("--overwrite", is_flag=True, help="Overwrite input file (non-SigMF only)")
@click.option("--quiet", is_flag=True, help="Quiet mode")
-def threshold(input, threshold, label, window_size, annotation_type, channel, output, overwrite, quiet):
+def threshold(input, threshold, label, window_size, annotation_type, channel, sample_rate, output, overwrite, quiet):
    """Auto-detect signals using threshold method.
    Detects samples above a percentage of maximum magnitude. Best for simple
@ -688,10 +714,13 @@ def threshold(input, threshold, label, window_size, annotation_type, channel, ou
    Examples:
      ria annotate threshold signal.sigmf-data --threshold 0.7 --label wifi
      ria annotate threshold data.npy --threshold 0.5 --window-size 2048
      ria annotate threshold data.npy --threshold 0.4 --sample-rate 1e6
    """
    if not (0.0 <= threshold <= 1.0):
        raise click.ClickException(f"--threshold must be between 0.0 and 1.0, got {threshold}")
    check_output_available(input, output, overwrite)
    try:
        recording = load_recording(input)
        if not quiet:
@ -699,11 +728,21 @@ def threshold(input, threshold, label, window_size, annotation_type, channel, ou
    except Exception as e:
        raise click.ClickException(f"Failed to load recording: {e}")
    if sample_rate is not None:
        recording.sample_rate = sample_rate
    if recording.sample_rate is None:
        raise click.ClickException(
            "Recording metadata does not contain a sample rate. "
            "Provide it with --sample-rate (e.g. --sample-rate 1e6)."
        )
    if not quiet:
        click.echo("\nDetecting signals using threshold qualifier...")
        click.echo(f"  Threshold: {threshold * 100:.1f}% of max magnitude")
        click.echo(f"  Window size: {'auto (1ms)' if window_size is None else f'{window_size} samples'}")
        click.echo(f"  Channel: {channel}")
        click.echo(f"  Sample rate: {recording.sample_rate:.0f} Hz")
    try:
        initial_count = len(recording.annotations)
@ -719,12 +758,12 @@ def threshold(input, threshold, label, window_size, annotation_type, channel, ou
        if not quiet:
            click.echo(f"  ✓ Added {added} annotation(s)")
+    except Exception as e:
+        raise click.ClickException(f"Threshold detection failed: {e}")
+    save_recording_auto(recording, output, input, quiet, overwrite)
+    if not quiet:
+        click.echo("  ✓ Saved")
-    except Exception as e:
-        raise click.ClickException(f"Threshold detection failed: {e}")
 # ============================================================================
@ -738,11 +777,12 @@ def threshold(input, threshold, label, window_size, annotation_type, channel, ou
@click.option("--nfft", type=int, default=65536, help="FFT size for spectral analysis")
@click.option("--noise-threshold-db", type=float, help="Noise floor threshold in dB (auto-estimated if not specified)")
@click.option("--min-component-bw", type=float, default=50e3, help="Min component bandwidth in Hz")
@click.option("--sample-rate", type=float, default=None, help="Sample rate in Hz (overrides metadata; required if not in file)")
@click.option("--output", "-o", type=click.Path(), help="Output file path")
@click.option("--overwrite", is_flag=True, help="Overwrite input file (non-SigMF only)")
@click.option("--quiet", is_flag=True, help="Quiet mode")
@click.option("--verbose", is_flag=True, help="Verbose output (show detected components)")
-def separate(input, indices, nfft, noise_threshold_db, min_component_bw, output, overwrite, quiet, verbose):
+def separate(input, indices, nfft, noise_threshold_db, min_component_bw, sample_rate, output, overwrite, quiet, verbose):
    """
    Auto-detect parallel frequency-offset signals and split into sub-bands.
@ -768,6 +808,8 @@ def separate(input, indices, nfft, noise_threshold_db, min_component_bw, output,
      ria annotate separate signal.npy --min-component-bw 100000
    """
    check_output_available(input, output, overwrite)
    try:
        recording = load_recording(input)
        if not quiet:
@ -775,6 +817,15 @@ def separate(input, indices, nfft, noise_threshold_db, min_component_bw, output,
    except Exception as e:
        raise click.ClickException(f"Failed to load recording: {e}")
    if sample_rate is not None:
        recording.sample_rate = sample_rate
    if recording.sample_rate is None:
        raise click.ClickException(
            "Recording metadata does not contain a sample rate. "
            "Provide it with --sample-rate (e.g. --sample-rate 1e6)."
        )
    # Parse indices if specified
    indices_list = get_indices_list(indices=indices, recording=recording)
@ -821,8 +872,9 @@ def separate(input, indices, nfft, noise_threshold_db, min_component_bw, output,
                        f"{format_sample_count(ann.sample_start + ann.sample_count)}: {freq_range}"
                    )
+    except Exception as e:
+        raise click.ClickException(f"Spectral separation failed: {e}")
+    save_recording_auto(recording, output, input, quiet, overwrite)
+    if not quiet:
+        click.echo("  ✓ Saved")
-    except Exception as e:
-        raise click.ClickException(f"Spectral separation failed: {e}")
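
Review note: the same sample-rate pre-flight check now appears in ``energy``, ``cusum``, ``threshold``, and ``separate``. The sketch below shows how that override-then-validate logic could be captured in one place. It is an illustration only: ``resolve_sample_rate`` is a hypothetical name that does not exist in this commit, and it raises ``ValueError`` to stay standard-library only, whereas the CLI commands raise ``click.ClickException``.

```python
from typing import Optional


def resolve_sample_rate(metadata_rate: Optional[float], cli_rate: Optional[float]) -> float:
    """Hypothetical helper: return the effective sample rate or fail early.

    metadata_rate: value found in the recording metadata (None if absent).
    cli_rate: value passed via --sample-rate (None if the flag was omitted).
    """
    # An explicit --sample-rate value overrides whatever the file metadata holds.
    rate = cli_rate if cli_rate is not None else metadata_rate
    if rate is None:
        # Same message the commands emit; ValueError stands in for ClickException here.
        raise ValueError(
            "Recording metadata does not contain a sample rate. "
            "Provide it with --sample-rate (e.g. --sample-rate 1e6)."
        )
    return float(rate)
```

A command body could then assign the returned value to ``recording.sample_rate`` and convert the ``ValueError`` into a ``click.ClickException`` at the call site.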