Validation & Methodology

Cough Monitoring Accuracy Designed for the Real World

A deep dive into how Hyfe quantifies agreement with ground truth at the level decisions are actually made: hourly and daily cough rates.

24h Continuous Recording
Dual Independent Annotation
0.90 Target Concordance

01

What Does "Accurate" Mean?

Accuracy depends on what you are measuring. Event-level metrics such as sensitivity, precision (positive predictive value), and false positives per hour describe detection performance at the level of individual cough events.

But clinicians and researchers rely on rates: hourly and daily cough burden. A system can detect events well and still misrepresent the rate that matters.

This page focuses on agreement in rates.

Interactive module (Phase 2): Metric Selector. A toggle reveals the appropriate metrics view; the default is Hourly Agreement.

02

Correlation Is Not Enough

A sensor can be perfectly correlated with reality and still be systematically wrong.

Pearson's r measures linear association — how tightly values move together. It does not measure agreement with the line of perfect equality (y = x).

For measurement systems deployed in clinical contexts, the relevant question is not whether readings co-vary. It is whether they match.
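The difference can be illustrated with a minimal sketch. The data below are hypothetical: a made-up device that reports exactly twice the true count plus three. The function names are illustrative and are not part of any Hyfe tooling.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: strength of linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient: agreement with y = x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    # The (mx - my)^2 term penalizes any shift away from the identity line.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical hourly cough counts (not study data).
truth = [2, 5, 10, 20, 40]
biased = [2 * t + 3 for t in truth]  # perfectly linear, systematically wrong

r = pearson_r(truth, biased)
ccc = lins_ccc(truth, biased)
print(f"Pearson r = {r:.3f}, Lin's CCC = {ccc:.3f}")
```

Because the hypothetical readings fall exactly on a straight line, Pearson's r is 1, while Lin's CCC drops well below 1 because that line is not y = x.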

"Do readings match reality, or just move in the same direction?"

Pearson r: rewards clustering around any line. Insufficient.
Lin's CCC: rewards clustering around y = x. Clinically valid.
Interactive figure (Phase 2): Correlation vs Agreement Scatterplot. Toggles: Pearson line, line of perfect agreement (y = x). Pearson r and Lin's CCC are displayed live.

Pearson rewards tight clustering around any line.

Lin's CCC rewards clustering around the correct line.

03

Agreement at the Hourly Level

Each data point represents one person-hour of continuous recording. Ground truth is established through independent dual annotation and adjudication — not automated labeling.
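As a rough illustration of what a person-hour data point is, the sketch below bins cough timestamps into hourly counts for a reference track and a device track. The timestamps, the helper name `hourly_counts`, and the choice of seconds since recording start as the time base are assumptions for illustration, not a description of Hyfe's pipeline.

```python
from collections import Counter

def hourly_counts(cough_times_s, n_hours):
    """Count coughs in each hour, given onsets in seconds since recording start."""
    counts = Counter(int(t // 3600) for t in cough_times_s)
    return [counts.get(h, 0) for h in range(n_hours)]

# Hypothetical cough onsets for one participant over three hours.
reference_times = [12.0, 45.5, 3700.2, 3710.9, 7300.0]  # adjudicated labels
device_times = [12.1, 3700.3, 3711.0, 5000.0, 7300.2]   # automated detections

reference_rate = hourly_counts(reference_times, n_hours=3)
device_rate = hourly_counts(device_times, n_hours=3)

# Each (reference, device) pair is one point on an hourly agreement plot.
pairs = list(zip(reference_rate, device_rate))
print(pairs)
```

In this toy example both tracks contain five coughs in total, yet two of the three hourly pairs disagree, which is why agreement is assessed at the resolution at which rates are used.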

High concordance indicates both precision and minimal systematic bias: the two properties that matter for clinical deployment.
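The link between concordance, precision, and bias can be made explicit with the standard definition of Lin's concordance correlation coefficient. The symbols below are introduced here for illustration: x is the reference rate, y is the device rate, and μ, σ², and σ_xy denote their means, variances, and covariance.

```latex
\rho_c = \frac{2\,\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
       = \rho \cdot C_b,
\qquad
C_b = \frac{2}{v + 1/v + u^2},
\quad
v = \frac{\sigma_x}{\sigma_y},
\quad
u = \frac{\mu_x - \mu_y}{\sqrt{\sigma_x \sigma_y}}
```

Here ρ is Pearson's correlation (precision) and C_b is a bias correction factor between 0 and 1 that equals 1 only when the two means and the two variances coincide. A high ρ_c therefore requires both terms to be high.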

Pearson r: 0.99. Linear association (477 person-hours).
95% CI: 0.962–0.996. Clustered bootstrap.
Slope: 0.94. OLS regression (95% CI 0.91–0.97).
Intercept: 0.74. OLS intercept (95% CI 0.50–0.99).
Mean Bias: 0.23 coughs/hr. Bland–Altman (95% CI −0.04 to 0.51).
95% LoA: −3.7 to +4.8 coughs/hr.
Lin's CCC, Hourly Coughs: 0.9748
Lin's CCC, Hourly Cough-Seconds: 0.9683
Lin's CCC, Daily Coughs: 0.971
Lin's CCC, Daily Cough-Seconds: 0.956
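For readers who want to see how statistics of this kind are typically computed, here is a minimal sketch of a Bland–Altman summary and an ordinary least squares fit. The hourly counts are hypothetical, the sign convention (device minus reference) is an assumption, and the simple limits of agreement shown here ignore the clustering of hours within participants, so this is not necessarily the procedure used to produce the figures above.

```python
import statistics

def bland_altman(reference, device):
    """Mean bias and 95% limits of agreement for paired measurements."""
    diffs = [d - r for r, d in zip(reference, device)]  # assumed: device - reference
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def ols_fit(reference, device):
    """Slope and intercept of device regressed on reference."""
    mx, my = statistics.mean(reference), statistics.mean(device)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, device))
    sxx = sum((x - mx) ** 2 for x in reference)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical hourly cough counts (not study data).
reference = [0, 4, 10, 20, 30]
device = [1, 4, 9, 19, 29]

bias, loa_lower, loa_upper = bland_altman(reference, device)
slope, intercept = ols_fit(reference, device)
print(f"bias={bias:.2f}, LoA=({loa_lower:.2f}, {loa_upper:.2f})")
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A slope near 1 and an intercept near 0 indicate that the fitted line lies close to y = x, while the limits of agreement describe how far an individual hour may deviate from the reference.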
Interactive figure (Phase 2): Hour-by-Hour Scatterplot. Hover over individual person-hours. Toggles: regression line, y = x, residuals. Secondary tab: Bland–Altman view.
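Person-hours from the same participant are not independent, which is the usual motivation for a clustered bootstrap: whole participants are resampled rather than individual hours. The sketch below shows one common percentile-interval form of that idea. The data, the participant IDs, the function names, and the choice of mean bias as the statistic are all assumptions for illustration; the exact resampling scheme behind the reported intervals is not specified on this page.

```python
import random
import statistics

def mean_bias(reference, device):
    """Mean of device minus reference (sign convention assumed)."""
    return statistics.mean(d - r for r, d in zip(reference, device))

def clustered_bootstrap_ci(hours_by_participant, stat_fn,
                           n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI that resamples participants with replacement."""
    rng = random.Random(seed)
    participant_ids = list(hours_by_participant)
    estimates = []
    for _ in range(n_boot):
        # Draw whole participants so within-person correlation is preserved.
        resampled_ids = rng.choices(participant_ids, k=len(participant_ids))
        reference, device = [], []
        for pid in resampled_ids:
            for ref_count, dev_count in hours_by_participant[pid]:
                reference.append(ref_count)
                device.append(dev_count)
        estimates.append(stat_fn(reference, device))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical (reference, device) hourly counts per participant.
hours_by_participant = {
    "P01": [(0, 0), (3, 4), (10, 9)],
    "P02": [(25, 27), (40, 41)],
    "P03": [(1, 1), (2, 2), (0, 1)],
    "P04": [(12, 11), (8, 8)],
}

lower, upper = clustered_bootstrap_ci(hours_by_participant, mean_bias)
print(f"95% CI for mean bias: ({lower:.2f}, {upper:.2f})")
```

Any paired statistic with the same signature, such as a concordance coefficient, could be passed in place of `mean_bias`.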

04

Performance Across Individuals

Aggregate statistics summarize 23 participants into a single number. This chart shows every participant individually — ground truth count and Hyfe's count side by side, sorted by cough burden.

Agreement holds across the full range: from participants with fewer than 100 coughs to those with nearly 1,000.

05

Event-Level Context

For completeness, event detection metrics are reported here. These describe model behavior at the level of individual cough events rather than cough rates.

Event metrics inform model behavior. Rate-level concordance informs clinical utility.

Metric | Value | Definition
Sensitivity | 90.4% | Proportion of true coughs detected (95% CI 88.3–92.2%)
Positive Predictive Value | 87.5% | Proportion of detections that are true coughs (95% CI 81.9–91.6%)
False Positives / Hour | 1.03 / hr | Non-cough events classified as coughs per hour (95% CI 0.84–1.24)
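The three definitions above reduce to simple ratios once detections have been matched against the reference labels. The sketch below uses hypothetical counts, not the study's, and assumes the matching step has already produced true positive, false positive, and false negative totals.

```python
def event_metrics(tp, fp, fn, hours):
    """Event-level detection metrics from matched counts."""
    sensitivity = tp / (tp + fn)  # share of true coughs that were detected
    ppv = tp / (tp + fp)          # share of detections that were true coughs
    fp_per_hour = fp / hours      # false detections per hour of recording
    return sensitivity, ppv, fp_per_hour

# Hypothetical counts over a 24-hour recording.
sens, ppv, fph = event_metrics(tp=180, fp=20, fn=20, hours=24)
print(f"sensitivity={sens:.1%}, PPV={ppv:.1%}, FP/hr={fph:.2f}")
```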
06

How "Reality" Was Defined

Ground truth is not assumed — it is constructed through a rigorous annotation protocol. Human annotation establishes the reference standard, but also defines the ceiling of agreement any automated system can theoretically achieve.

STEP 01
Continuous Recording
24-hour acoustic recordings capturing ambient and subject-generated sound in real-world conditions.
STEP 02
Independent Dual Annotation
Two independent human annotators review recordings blind to each other's labels.
STEP 03
Adjudication
Discrepancies between annotators are resolved by a third expert reviewer to produce a single adjudicated ground truth.
STEP 04
Cough Seconds
Cough seconds serve as the unit of analysis, enabling continuous rate quantification rather than binary event detection.
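The page does not spell out how a cough second is computed. The sketch below assumes one plausible reading, namely that a cough second is any one-second interval containing at least one cough onset. Both that definition and the helper name `cough_seconds` are assumptions for illustration.

```python
import math

def cough_seconds(cough_times_s):
    """Distinct one-second intervals that contain at least one cough onset."""
    return {math.floor(t) for t in cough_times_s}

# Hypothetical cough onsets in seconds since recording start.
times = [10.1, 10.6, 11.2, 95.0]

seconds = cough_seconds(times)
n_coughs = len(times)
n_cough_seconds = len(seconds)
print(n_coughs, n_cough_seconds, sorted(seconds))
```

Under this assumed definition, two coughs that fall within the same second count once, so exactly where one cough ends and the next begins within a bout has less influence on the total.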
Methodology Details

The annotation protocol is designed to minimize inter-rater variability and ensure that the reference standard reflects genuine cough burden rather than labeling artifacts.

  • Annotators operate under standardized labeling guidelines developed from prior clinical audio annotation work.
  • Inter-rater agreement is measured using Cohen's κ prior to adjudication; sessions with low agreement are flagged for re-review.
  • Recordings span a representative range of acoustic environments — home, workplace, and public settings — to ensure that ground truth reflects the real-world distribution Hyfe's system encounters at deployment.
  • Cough seconds, rather than cough counts, are used as the primary unit because they are more robust to the boundary ambiguity inherent in longer cough bouts.
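The inter-rater check mentioned above can be sketched as follows. Cohen's κ compares the observed agreement between two annotators with the agreement expected by chance. The labels here are hypothetical, and treating each second as one binary item (1 = cough second, 0 = no cough) is an assumption about the labeling unit.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical per-second labels from two annotators.
annotator_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

In this example the annotators agree on 8 of 10 seconds, but because chance alone would produce 52% agreement, κ is noticeably lower than the raw 80%.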
07

Implications for Drug Development

Endpoint Validity
Rate-based agreement — not event-level detection — supports the validity of cough rate as a primary or secondary trial endpoint.
Bias Reduction
Concordance metrics quantify and bound systematic bias, reducing the risk of confounded treatment effect estimates.
Longitudinal Stability
Stable hourly and daily agreement across recording sessions supports the use of Hyfe in longitudinal monitoring arms.
Real-World Dataset
Acoustic variability from real-world recording environments is reflected in the validation dataset — not controlled studio conditions.

Hyfe's validation approach is designed to meet the evidentiary standards expected by regulatory reviewers and pharmaceutical endpoint committees. The metrics reported here are those that determine whether a measurement instrument is fit for purpose, not merely whether it functions.