Validation & Methodology

Cough Monitoring Accuracy Designed for the Real World

A deep dive into how Hyfe quantifies agreement with ground truth at the level decisions are actually made: hourly and daily cough rates.

24h Continuous Recording
Dual Independent Annotation
≥0.90 Target Concordance
01

What Does "Accurate" Mean?

Accuracy depends on what you are measuring. Event-level metrics — sensitivity, precision, false positives per hour — describe detection performance at the level of individual cough events.

But clinicians and researchers rely on rates: hourly and daily cough burden. A system can detect events well and still misrepresent the rate that matters.
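
A minimal sketch shows the gap. It assumes a simple model for illustration only: a detector finds each true cough with a fixed sensitivity and adds false positives at a constant hourly rate. Under that model, the expected reported rate is a straight line in the true rate. Its slope is the sensitivity and its intercept is the false-positive rate. The function name and the numbers are hypothetical, not Hyfe results.

```python
def expected_reported_rate(true_rate: float, sensitivity: float, fp_per_hour: float) -> float:
    """Expected coughs/hour reported under a simple linear detector model (illustrative)."""
    # Each true cough is detected with probability `sensitivity`;
    # false positives add a constant number of spurious events per hour.
    return sensitivity * true_rate + fp_per_hour


# Hypothetical detector: 90% sensitivity, 2 false positives per hour.
for true_rate in (0, 5, 20, 60):
    reported = expected_reported_rate(true_rate, sensitivity=0.9, fp_per_hour=2.0)
    print(f"true={true_rate:>3}  reported={reported:5.1f}  error={reported - true_rate:+5.1f}")
```

Under this toy model, the sign of the error depends on cough burden. Quiet hours are overcounted and busy hours are undercounted, even though the event-level metrics are the same in every hour. A rate-level analysis exposes this as a slope below 1 and a positive intercept.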

This page focuses on agreement in rates.

02

Correlation Is Not Enough

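A sensor can be perfectly correlated with reality and still be systematically wrong. A device that always reports exactly twice the true cough count has r = 1, yet every reading is off by a factor of two.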

Pearson's r measures linear association — how tightly values move together. It does not measure agreement with the line of perfect equality (y = x).

For measurement systems deployed in clinical contexts, the relevant question is not whether readings co-vary. It is whether they match.

"Do readings match reality, or just move in the same direction?"

  • Pearson r: rewards clustering around any line (insufficient)
  • Lin's CCC: rewards clustering around y = x (clinically valid)
Figure: Correlation vs. agreement scatterplot (Pearson regression line, line of perfect agreement y = x, live Pearson r and Lin's CCC).

Pearson rewards tight clustering around any line.

Lin's CCC rewards clustering around the correct line.
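
The sketch below works through the same scenario in plain Python. It computes Pearson's r and Lin's CCC for a hypothetical sensor that always reports exactly twice the true hourly count. Following Lin's original definition, it uses population (1/n) moments. The helper names are illustrative.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def pearson_r(x, y):
    """Pearson correlation: tightness of clustering around the best-fit line."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient: clustering around y = x.

    Uses population (1/n) moments, as in Lin's original definition.
    """
    n = len(x)
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical sensor that always reports exactly twice the true hourly count.
truth = [1, 2, 3, 4, 5]
sensor = [2 * t for t in truth]

print(round(pearson_r(truth, sensor), 3))  # 1.0: perfect correlation
print(round(lins_ccc(truth, sensor), 3))   # 0.421: poor agreement
```

The denominator of the CCC includes the squared difference in means and the difference in spreads. A constant scale error therefore pulls it well below 1, even though Pearson's r is unaffected.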

03

Agreement at the Hourly Level

Each data point represents one person-hour of continuous recording. Ground truth is established through independent dual annotation and adjudication — not automated labeling.

A high Lin's CCC requires both precision (tight scatter) and minimal systematic bias (points lying on y = x): the two properties that matter for clinical deployment.
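
Lin's coefficient can be written so that this split is explicit. Let x be the ground-truth rate and y the device rate over n paired observations. Let \bar{x} and \bar{y} be their means, s_x and s_y their population (1/n) standard deviations, and s_{xy} their covariance. The standard decomposition is:

```latex
\rho_c = \frac{2 s_{xy}}{s_x^2 + s_y^2 + (\bar{x} - \bar{y})^2}
       = r \cdot C_b,
\qquad
r = \frac{s_{xy}}{s_x s_y},
\qquad
C_b = \frac{2}{v + 1/v + u^2},
\quad v = \frac{s_y}{s_x},
\quad u = \frac{\bar{y} - \bar{x}}{\sqrt{s_x s_y}}
```

Here r captures precision (scatter about the best-fit line). C_b lies in (0, 1] and captures accuracy. It equals 1 only when the two measurements have the same mean and standard deviation, so \rho_c = 1 only when every point lies on y = x.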

  • Lin's CCC: concordance correlation coefficient
  • Pearson r: linear association
  • Slope: regression coefficient
  • Intercept: systematic offset
  • Mean Bias: Bland–Altman mean difference
  • 95% LoA: limits of agreement
Figure: Hour-by-hour scatterplot of person-hours (regression line, y = x, residuals), with a secondary Bland–Altman view.
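
The metrics listed above can be computed from paired person-hour values in a few lines. This is a sketch under stated assumptions, not Hyfe's analysis code. Slope and intercept come from ordinary least squares of device on ground truth, and the CCC uses population moments. The 95% limits of agreement follow the usual Bland–Altman form: mean difference ± 1.96 × the SD of the differences. The data in the usage lines are hypothetical.

```python
import math
import statistics


def agreement_panel(truth, device):
    """Rate-agreement metrics for paired observations, e.g. coughs per person-hour.

    `truth` holds adjudicated reference values and `device` the automated
    estimates for the same units. Both need non-zero variance.
    """
    n = len(truth)
    mx, my = statistics.fmean(truth), statistics.fmean(device)
    # Population (1/n) moments, as in Lin's definition of the CCC.
    sxx = sum((x - mx) ** 2 for x in truth) / n
    syy = sum((y - my) ** 2 for y in device) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(truth, device)) / n

    slope = sxy / sxx                   # OLS regression of device on truth
    diffs = [y - x for x, y in zip(truth, device)]
    bias = statistics.fmean(diffs)      # Bland–Altman mean difference
    sd = statistics.stdev(diffs)        # sample SD of the differences
    return {
        "ccc": 2 * sxy / (sxx + syy + (mx - my) ** 2),
        "pearson_r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
        "mean_bias": bias,
        "loa_95": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical person-hour counts, for illustration only.
truth = [0, 3, 7, 12, 20, 31]
device = [1, 3, 6, 13, 18, 30]
for name, value in agreement_panel(truth, device).items():
    print(name, value)
```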

04

Agreement at the Daily Level

Clinical decisions and trial endpoints often operate at daily aggregation. Each point below represents one person-day.

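Aggregating to daily totals reduces the relative weight of hour-to-hour random variation, but any systematic bias carries through into the daily totals. Agreement that holds at both hourly and daily resolution therefore shows that concordance is not an artifact of one aggregation level, which reinforces credibility.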
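
Concordance can be checked at both resolutions from the same annotated recordings. In this sketch, the record layout (a dictionary keyed by person, day and hour) and all function names are hypothetical. It also assumes every person-day covers the same number of recorded hours, so daily totals are directly comparable as daily rates.

```python
from collections import defaultdict


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def daily_totals(hourly):
    """Sum person-hour counts into person-day totals.

    `hourly` maps (person_id, day, hour) -> (truth_count, device_count).
    Assumes every person-day has the same number of recorded hours.
    """
    days = defaultdict(lambda: [0, 0])
    for (person, day, _hour), (t, d) in hourly.items():
        days[(person, day)][0] += t
        days[(person, day)][1] += d
    return {key: tuple(v) for key, v in days.items()}


def ccc_by_resolution(hourly):
    """Lin's CCC computed on person-hours and on person-days of the same data."""
    h_truth, h_device = zip(*hourly.values())
    d_truth, d_device = zip(*daily_totals(hourly).values())
    return {"hourly": lins_ccc(h_truth, h_device), "daily": lins_ccc(d_truth, d_device)}
```

With real data, `hourly` would hold one entry per annotated person-hour. A large gap between the two coefficients would suggest that agreement depends on the aggregation level.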

  • Lin's CCC: daily concordance
  • Pearson r: linear association
  • Mean Bias: systematic offset
  • 95% LoA: limits of agreement
Figure: Day-by-day scatterplot of person-days, in the same format as the hourly view (regression line, y = x, residuals, Bland–Altman tab).

05

Event-Level Context

For completeness, event detection metrics are reported here. These describe model behavior at the level of individual cough events rather than cough rates.

Event metrics inform model behavior. Rate-level concordance informs clinical utility.

  • Sensitivity: proportion of true coughs detected
  • Precision: proportion of detections that are true coughs
  • False Positives / Hour: non-cough events classified as coughs, per hour
  • Positive Predictive Value: equivalent to precision in binary classification
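
Once each detection has been matched to an annotated cough (or found to have no match), these definitions reduce to simple ratios. The sketch assumes that matching step is already done. How detections are matched to annotations, for example the time tolerance used, is not specified here. The counts in the usage line are hypothetical.

```python
def event_metrics(tp: int, fp: int, fn: int, hours: float) -> dict:
    """Event-level detection metrics from already-matched event counts.

    tp: detections matched to an annotated cough (true positives)
    fp: detections with no matching annotated cough (false positives)
    fn: annotated coughs with no matching detection (false negatives)
    hours: total recording time analysed
    """
    return {
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),  # same quantity as positive predictive value
        "false_positives_per_hour": fp / hours,
    }


# Hypothetical counts from one 24-hour recording.
print(event_metrics(tp=90, fp=12, fn=10, hours=24.0))
```
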
06

How "Reality" Was Defined

Ground truth is not assumed — it is constructed through a rigorous annotation protocol. Human annotation establishes the reference standard, but also defines the ceiling of agreement any automated system can theoretically achieve.

STEP 01
Continuous Recording
24-hour acoustic recordings capturing ambient and subject-generated sound in real-world conditions.
STEP 02
Independent Dual Annotation
Two independent human annotators review recordings blind to each other's labels.
STEP 03
Adjudication
Discrepancies between annotators are resolved by a third expert reviewer to produce a single adjudicated ground truth.
STEP 04
Cough Seconds
Cough seconds serve as the unit of analysis, enabling continuous rate quantification rather than binary event detection.
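
This sketch turns annotated cough onset times into cough seconds. The definition it uses is an assumption for illustration: the number of distinct one-second windows that contain at least one cough onset. The page names the unit but does not define it precisely. Function names and onset times are hypothetical.

```python
import math


def cough_seconds(onsets_s):
    """Count distinct one-second windows containing at least one cough onset.

    `onsets_s` are onset times in seconds from the start of the recording.
    Assumed definition: several coughs within the same second count once.
    """
    return len({math.floor(t) for t in onsets_s})


def hourly_cough_seconds(onsets_s, n_hours):
    """Cough seconds per clock hour of an n-hour recording.

    Onsets must fall inside the recording, i.e. 0 <= t < n_hours * 3600.
    """
    per_hour = [set() for _ in range(n_hours)]
    for t in onsets_s:
        second = math.floor(t)
        per_hour[second // 3600].add(second)
    return [len(s) for s in per_hour]


onsets = [10.1, 10.4, 10.9, 11.2, 3605.0]  # hypothetical onset times (s)
print(cough_seconds(onsets))               # 3
print(hourly_cough_seconds(onsets, 2))     # [2, 1]
```

In the example, four onsets packed into roughly one second of a bout yield 2 cough seconds. That value stays the same however an annotator splits the bout into individual coughs.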

The annotation protocol is designed to minimize inter-rater variability and ensure that the reference standard reflects genuine cough burden rather than labeling artifacts.

  • Annotators operate under standardized labeling guidelines developed from prior clinical audio annotation work.
  • Inter-rater agreement is measured using Cohen's κ prior to adjudication; sessions with low agreement are flagged for re-review.
  • Recordings span a representative range of acoustic environments — home, workplace, and public settings — to ensure that ground truth reflects the real-world distribution Hyfe's system encounters at deployment.
  • Cough seconds, rather than cough counts, are used as the primary unit because they are more robust to the boundary ambiguity inherent in longer cough bouts.
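
This sketch shows the inter-rater check. It assumes both annotators' labels are compared on a common grid of one-second windows (1 = cough present). Cohen's κ corrects raw agreement for the agreement expected by chance. The seconds where the annotators disagree are the ones a third reviewer would adjudicate. The page does not state the re-review cut-off, so the 0.6 threshold and all names here are hypothetical placeholders.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' binary labels over the same units.

    `a` and `b` are equal-length sequences of 0/1 labels. Undefined (division
    by zero) if chance agreement is 1, e.g. both annotators label every unit 0.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


def review_session(a, b, kappa_threshold=0.6):
    """Flag a session for re-review and list the units needing adjudication.

    `kappa_threshold` is a hypothetical placeholder, not a documented cut-off.
    """
    kappa = cohens_kappa(a, b)
    disagreements = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return {"kappa": kappa,
            "flag_for_rereview": kappa < kappa_threshold,
            "to_adjudicate": disagreements}


annotator_1 = [1, 1, 0, 0, 0, 0, 1, 0]  # hypothetical per-second labels
annotator_2 = [1, 0, 0, 0, 0, 0, 1, 0]
result = review_session(annotator_1, annotator_2)
print(round(result["kappa"], 3), result["flag_for_rereview"], result["to_adjudicate"])
# 0.714 False [1]
```
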
07

Implications for Drug Development

⚖️
Endpoint Validity
Rate-based agreement — not event-level detection — supports the validity of cough rate as a primary or secondary trial endpoint.
🎯
Bias Reduction
Concordance metrics quantify and bound systematic bias, reducing the risk of confounded treatment effect estimates.
📈
Longitudinal Stability
Stable hourly and daily agreement across recording sessions supports the use of Hyfe in longitudinal monitoring arms.
🌍
Real-World Dataset
Acoustic variability from real-world recording environments is reflected in the validation dataset — not controlled studio conditions.

Hyfe's validation approach is designed to meet the evidentiary standards expected by regulatory reviewers and pharma endpoint committees. The metrics reported here are those that determine whether a measurement instrument is fit for purpose, not merely whether it functions.