What Does "Accurate" Mean?
Accuracy depends on what you are measuring. Event-level metrics — sensitivity, precision, false positives per hour — describe detection performance at the level of individual cough events.
But clinicians and researchers rely on rates: hourly and daily cough burden. A system can detect events well and still misrepresent the rate that matters.
This page focuses on agreement in rates.
Interactive toggle revealing the appropriate metrics view. Default: Hourly Agreement.
Correlation Is Not Enough
A sensor can be perfectly correlated with reality and still be systematically wrong.
Pearson's r measures linear association — how tightly values move together. It does not measure agreement with the line of perfect equality (y = x).
For measurement systems deployed in clinical contexts, the relevant question is not whether readings co-vary. It is whether they match.
"Do readings match reality, or just move in the same direction?"
Toggle: Pearson line · Line of perfect agreement (y = x) · Display Pearson r and Lin's CCC live.
Pearson rewards tight clustering around any line.
Lin's CCC rewards clustering around the correct line.
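The contrast can be sketched numerically. The following is a minimal illustration (not Hyfe's implementation): a sensor with a constant offset tracks the truth perfectly, so Pearson's r is 1.0, but Lin's CCC penalizes the systematic bias.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient: agreement with y = x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    # Scale/location shifts between x and y reduce CCC even when r = 1.
    return 2 * cov / (vx + vy + (mx - my) ** 2)

truth = np.array([10, 20, 30, 40, 50], float)
biased = truth + 15   # constant offset: moves together, never matches

r = np.corrcoef(truth, biased)[0, 1]
ccc = lins_ccc(truth, biased)
print(f"{r:.3f} {ccc:.3f}")   # → 1.000 0.640
```

The offset leaves the linear association untouched, so Pearson rewards it; the `(mx - my) ** 2` term in the CCC denominator is exactly what charges the sensor for missing the line y = x.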
Agreement at the Hourly Level
Each data point represents one person-hour of continuous recording. Ground truth is established through independent dual annotation and adjudication — not automated labeling.
High concordance indicates both precision and minimal systematic bias: the two properties that matter for clinical deployment.
Hover individual person-hours · Toggle: regression line, y = x, residuals · Secondary tab: Bland–Altman view
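The Bland–Altman view mentioned above summarizes agreement through the bias (mean difference) and 95% limits of agreement. A minimal sketch with hypothetical person-hour counts, not the study's data:

```python
import numpy as np

def bland_altman(truth, measured):
    """Bias and 95% limits of agreement for a Bland-Altman view."""
    d = np.asarray(measured, float) - np.asarray(truth, float)
    bias = d.mean()                 # systematic over- or under-count
    sd = d.std(ddof=1)              # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical person-hours: annotated count vs. system count.
truth    = [12, 45, 3, 60, 22, 9, 31]
measured = [11, 47, 3, 57, 24, 9, 30]
bias, lo, hi = bland_altman(truth, measured)
```

A bias near zero indicates no systematic drift, and narrow limits of agreement indicate the per-hour scatter is small — the two properties the scatter plot and residual toggle are meant to surface.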
Performance Across Individuals
Aggregate statistics summarize 23 participants into a single number. This chart shows every participant individually — ground truth count and Hyfe's count side by side, sorted by cough burden.
Agreement holds across the full range: from participants with fewer than 100 coughs to those with nearly 1,000.
Event-Level Context
For completeness, event detection metrics are reported here. These describe model behavior at the level of individual cough events rather than cough rates.
Event metrics inform model behavior. Rate-level concordance informs clinical utility.
| Metric | Value | Definition |
|---|---|---|
| Sensitivity | 95% CI 88.3–92.2% | Proportion of true coughs detected |
| Positive Predictive Value | 95% CI 81.9–91.6% | Proportion of detections that are true coughs |
| False Positives / Hour | 95% CI 0.84–1.24 | Non-cough events classified as coughs per hour |
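These definitions reduce to simple ratios over confusion counts. A minimal sketch, using illustrative counts that are not the study's actual data:

```python
def event_metrics(tp, fn, fp, hours):
    """Point estimates for the event-level metrics above."""
    sensitivity = tp / (tp + fn)   # true coughs detected / all true coughs
    ppv = tp / (tp + fp)           # true coughs / all detections
    fp_per_hour = fp / hours       # false detections per recorded hour
    return sensitivity, ppv, fp_per_hour

# Hypothetical counts for illustration only.
sens, ppv, fph = event_metrics(tp=902, fn=98, fp=120, hours=115.0)
```

Note that sensitivity and PPV trade off through the detection threshold: lowering it recovers more true coughs at the cost of more false positives per hour.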
How "Reality" Was Defined
Ground truth is not assumed — it is constructed through a rigorous annotation protocol. Human annotation establishes the reference standard, but also defines the ceiling of agreement any automated system can theoretically achieve.
View Methodology Details
The annotation protocol is designed to minimize inter-rater variability and ensure that the reference standard reflects genuine cough burden rather than labeling artifacts.
- Annotators operate under standardized labeling guidelines developed from prior clinical audio annotation work.
- Inter-rater agreement is measured using Cohen's κ prior to adjudication; sessions with low agreement are flagged for re-review.
- Recordings span a representative range of acoustic environments — home, workplace, and public settings — to ensure that ground truth reflects the real-world distribution Hyfe's system encounters at deployment.
- Cough seconds, rather than cough counts, are used as the primary unit because they are more robust to the boundary ambiguity inherent in longer cough bouts.
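The inter-rater check above can be sketched for binary per-second labels. This is a generic Cohen's κ computation under assumed binary labels, not the protocol's actual tooling:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two binary annotators (e.g. per-second cough labels)."""
    a, b = np.asarray(a), np.asarray(b)
    po = (a == b).mean()                         # observed agreement
    p_a1, p_b1 = a.mean(), b.mean()              # marginal rates of label 1
    pe = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # agreement expected by chance
    return (po - pe) / (1 - pe)

# Two annotators labeling the same eight seconds (illustrative).
ann_a = [1, 1, 0, 0, 1, 0, 1, 0]
ann_b = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(ann_a, ann_b)
```

κ corrects raw percent agreement for the agreement two annotators would reach by chance alone, which matters for cough labels because most seconds in a recording contain no cough.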
Implications for Drug Development
Hyfe's validation approach is designed to meet the evidentiary standards expected by regulatory reviewers and pharma endpoints committees. The metrics reported here are those that determine whether a measurement instrument is fit for purpose — not merely whether it functions.