A deep dive into how Hyfe quantifies agreement with ground truth at the level where decisions are actually made: hourly and daily cough rates.
Accuracy depends on what you are measuring. Event-level metrics — sensitivity, precision, false positives per hour — describe detection performance at the level of individual cough events.
But clinicians and researchers rely on rates: hourly and daily cough burden. A system can detect events well and still misrepresent the rate that matters.
This page focuses on agreement in rates.
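A sensor can be perfectly correlated with reality and still be systematically wrong. A device that reports exactly twice the true cough rate, for example, has a Pearson correlation of 1 with the reference, yet every nonzero reading is off by a factor of two.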
Pearson's r measures linear association — how tightly values move together. It does not measure agreement with the line of perfect equality (y = x).
For measurement systems deployed in clinical contexts, the relevant question is not whether readings co-vary. It is whether they match.
"Do readings match reality, or just move in the same direction?"
Interactive figure: toggle between the Pearson line and the line of perfect agreement (y = x); Pearson's r and Lin's CCC are displayed live.
Pearson rewards tight clustering around any line.
Lin's CCC rewards clustering around the correct line.
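To make the distinction concrete, here is a minimal Python sketch that computes both statistics on made-up data. The cough counts are hypothetical, the helper names `pearson_r` and `lins_ccc` are illustrative and not part of any Hyfe tooling, and the CCC uses the population-variance form of Lin's definition.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: strength of linear association around any line."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient: agreement with y = x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()  # population variance (ddof=0)
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return float(2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2))

# Hypothetical coughs per hour: annotated reference vs. a sensor that
# reports exactly double the true rate (an assumption for illustration).
reference = np.array([2, 5, 9, 14, 20, 31])
sensor = 2 * reference

r = pearson_r(reference, sensor)
ccc = lins_ccc(reference, sensor)
print(f"Pearson r = {r:.3f}, Lin's CCC = {ccc:.3f}")
```

With the sensor reading double the reference, Pearson's r is 1 while the CCC is far lower, because the points lie on y = 2x rather than y = x.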
Each data point represents one person-hour of continuous recording. Ground truth is established through independent dual annotation and adjudication — not automated labeling.
High concordance requires both precision (points cluster tightly around a line) and minimal systematic bias (that line lies close to y = x): the two properties that matter for clinical deployment.
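This pairing of precision and bias can be read directly from the standard decomposition of Lin's CCC, sketched below. The symbols are assumptions introduced here for illustration: x is the reference rate, y is the system's rate, μ and σ are their means and standard deviations, σ_xy is their covariance, and ρ is Pearson's r.

```latex
\rho_c
  = \frac{2\,\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
  = \rho \cdot C_b,
\qquad
C_b = \frac{2}{v + 1/v + u^2},
\quad
v = \frac{\sigma_x}{\sigma_y},
\quad
u = \frac{\mu_x - \mu_y}{\sqrt{\sigma_x \sigma_y}}
```

Here ρ captures precision, and the bias correction factor C_b (at most 1) shrinks the coefficient whenever the two sets of rates differ in scale (v ≠ 1) or in location (u ≠ 0).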
Interactive figure (hourly): hover over individual person-hours; toggle the regression line, the y = x line, and residuals. A secondary tab shows the Bland–Altman view.
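For readers unfamiliar with the Bland–Altman view, the following is a minimal sketch of the quantities such a plot displays: the mean difference (bias) and the conventional 95% limits of agreement. The function name `bland_altman` and the hourly counts are hypothetical, and this is not a description of how the page's figure is actually computed.

```python
import numpy as np

def bland_altman(reference, device):
    """Return per-pair means, per-pair differences, bias, and 95% limits."""
    reference = np.asarray(reference, dtype=float)
    device = np.asarray(device, dtype=float)
    diff = device - reference           # y-axis of the plot
    pair_mean = (device + reference) / 2  # x-axis of the plot
    bias = diff.mean()                  # systematic over- or under-counting
    sd = diff.std(ddof=1)               # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return pair_mean, diff, bias, limits

# Hypothetical coughs per person-hour (illustrative values only).
reference = [2, 5, 9, 14, 20, 31]
device = [3, 5, 10, 13, 22, 33]

pair_mean, diff, bias, (lower, upper) = bland_altman(reference, device)
print(f"bias = {bias:.2f} coughs/hour, limits = ({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits indicates that individual readings, not just the overall trend, stay close to the reference.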
Clinical decisions and trial endpoints often operate at daily aggregation. Each point below represents one person-day.
Daily aggregation averages out random hour-to-hour error but does not remove systematic bias, so consistent concordance at both hourly and daily resolution strengthens confidence in the result.
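The relationship between the two resolutions can be sketched on synthetic data. Everything in this example is an assumption made for illustration: the simulated cohort, the 90% detection probability, the false-positive rate of 0.5 per hour, and the helper `lins_ccc`. None of it reflects Hyfe's actual data or performance.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's CCC on flattened arrays (population-variance form)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
    return float(2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2))

rng = np.random.default_rng(seed=0)
n_people, n_days, hours_per_day = 6, 5, 24

# Synthetic reference: each person has their own mean coughs per hour.
mean_rate = rng.uniform(1, 15, size=(n_people, 1, 1))
reference_hourly = rng.poisson(mean_rate, size=(n_people, n_days, hours_per_day))

# Synthetic device: misses some coughs and adds a few false positives.
detected = rng.binomial(reference_hourly, 0.9)
false_positives = rng.poisson(0.5, size=reference_hourly.shape)
device_hourly = detected + false_positives

# One point per person-day: sum the 24 hourly counts.
reference_daily = reference_hourly.sum(axis=2)
device_daily = device_hourly.sum(axis=2)

ccc_hourly = lins_ccc(reference_hourly, device_hourly)
ccc_daily = lins_ccc(reference_daily, device_daily)
print(f"hourly CCC = {ccc_hourly:.3f}, daily CCC = {ccc_daily:.3f}")
```

Summing over the hour axis preserves total counts, so any gap between the hourly and daily coefficients comes from how error behaves under aggregation rather than from lost data.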
Interactive figure (daily): same interface as the hourly view. Hover over individual person-days; toggle the regression line, the y = x line, and residuals. A secondary tab shows the Bland–Altman view.
For completeness, event detection metrics are reported here. These describe model behavior at the level of individual cough events rather than cough rates.
Event metrics inform model behavior. Rate-level concordance informs clinical utility.
| Metric | Value | Definition |
|---|---|---|
| Sensitivity | — | Proportion of true coughs detected |
| Precision | — | Proportion of detections that are true coughs |
| False Positives / Hour | — | Non-cough events classified as coughs per hour |
| Positive Predictive Value | — | Equivalent to precision in binary classification |
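The definitions in the table translate directly into arithmetic on matched event counts. The sketch below assumes detections have already been matched to reference coughs, yielding true positives, false positives, and false negatives. The function name `event_metrics` and the counts are hypothetical and are not reported results.

```python
def event_metrics(tp, fp, fn, hours_recorded):
    """Event-level detection metrics from matched cough events.

    tp: reference coughs that were detected
    fp: detections with no matching reference cough
    fn: reference coughs that were missed
    """
    if hours_recorded <= 0:
        raise ValueError("hours_recorded must be positive")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "precision": precision,  # same quantity as positive predictive value
        "false_positives_per_hour": fp / hours_recorded,
    }

# Hypothetical counts from 24 hours of recording (illustrative only).
metrics = event_metrics(tp=90, fp=5, fn=10, hours_recorded=24)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```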
Ground truth is not assumed; it is constructed through a rigorous annotation protocol. Human annotation establishes the reference standard and also defines the ceiling of agreement any automated system can theoretically achieve.
The annotation protocol is designed to minimize inter-rater variability and ensure that the reference standard reflects genuine cough burden rather than labeling artifacts.
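One common way to quantify inter-rater variability is Cohen's kappa, which corrects raw agreement between two annotators for agreement expected by chance. The sketch below is an illustration only: the source does not state which statistic the protocol uses, and the labels (1 = cough, 0 = not a cough, one per candidate sound) are made up.

```python
import numpy as np

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' labels."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    observed = np.mean(a == b)  # raw proportion of matching labels
    categories = np.union1d(a, b)
    # Agreement expected if the annotators labeled independently.
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return float((observed - expected) / (1 - expected))

# Hypothetical labels for ten candidate sounds from two annotators.
annotator_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
annotator_b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa = {kappa:.3f}")
```

Disagreements such as the two in this example are the cases an adjudication step would resolve before the reference standard is finalized.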
Hyfe's validation approach is designed to meet the evidentiary standards expected by regulatory reviewers and pharmaceutical endpoint committees. The metrics reported here are those that determine whether a measurement instrument is fit for purpose, not merely whether it functions.