We tried to beat the best open chord-recognition model. Here's the honest result.
We set out to improve open-source automatic chord recognition (chord detection) — turning audio into a time-stamped chord progression. The strongest open model is BTC (Park et al., ISMIR 2019). We built a Harmonic-CQT (HCQT) variant and a stack of other levers to beat it. Honest result: our best variant ties baseline BTC on held-out public benchmarks — it does not clearly beat it. That is a useful finding, and we've open-sourced everything to reproduce it.
The numbers (held-out, public data)
| Model | GuitarSet root / 7ths (%) | Schubert root / 7ths / mirex (%) |
|---|---|---|
| baseline BTC | 80.9 / 64.6 | 73.1 / 55.3 / 64.1 |
| ours (BTC+HCQT, Beatles-FT) | 80.5 / 63.0 | 73.8 / 55.6 / 65.3 |
A dead heat: baseline BTC is slightly ahead on guitar (GuitarSet), our variant is slightly ahead on classical (Schubert), and every gap falls within the 95% confidence intervals. The whole accessible field (BTC, CREMA, Chordino, and our variants) clusters around 77–82% root accuracy. Nobody is running away with it.
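To make "within the 95% confidence intervals" concrete, here is a minimal sketch of a per-track percentile bootstrap on the paired score difference between two models. It is one common way to read such a comparison, not necessarily how the repository computes it. The function name `bootstrap_ci`, the resample count, and the per-track scores are all illustrative assumptions; the scores are made-up toy values, not our results.

```python
import random
from statistics import mean

def bootstrap_ci(track_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-track scores."""
    rng = random.Random(seed)
    n = len(track_scores)
    # Resample tracks with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(track_scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Made-up per-track root accuracies for two hypothetical models (toy data).
model_a = [0.82, 0.75, 0.91, 0.68, 0.79, 0.85, 0.73, 0.88]
model_b = [0.80, 0.78, 0.89, 0.70, 0.77, 0.86, 0.75, 0.84]

# Paired differences: both models are scored on the same tracks.
diffs = [a - b for a, b in zip(model_a, model_b)]
lo, hi = bootstrap_ci(diffs)
print(f"mean diff = {mean(diffs):+.3f}, 95% CI = [{lo:+.3f}, {hi:+.3f}]")
print("tie" if lo <= 0.0 <= hi else "clear gap")
```

If the interval on the difference straddles zero, the data do not support calling either model ahead, which is the sense in which we call the table a tie.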
What we tried — and the lesson
Starting from baseline BTC, we tested an HCQT front-end, training from scratch on license-clean audio, real-audio fine-tuning, two- and three-model ensembles, and a reimplementation of the published BTC-FDAA-FGF additions (2025). Every lever landed at parity, never clearly past baseline.
The most useful takeaway is a cautionary one: an in-house metric showed HCQT doubling seventh-chord detection (27% → 48%). It was a recall artifact — a 7th-saturated training set taught the model to over-call 7ths. On frame-wise mir_eval the ranking flipped, and baseline BTC, which calls fewer 7ths but gets them right, came out best. Never trust a bespoke recall metric for a "we improved it" claim — use frame-wise mir_eval with confidence intervals.
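A toy illustration of how that kind of artifact arises, assuming a simplified setup: labels are per-frame strings, "recall" is the share of reference 7th frames labeled exactly right, and plain exact-match frame accuracy stands in for the frame-wise mir_eval scores (which are computed differently, with duration weighting and a defined chord vocabulary). The function names and the ten-frame sequences are invented for the example.

```python
def seventh_recall(ref, est):
    """Share of reference 7th-chord frames that the estimate labels exactly right."""
    ref_7th = [i for i, r in enumerate(ref) if r.endswith("7")]
    return sum(1 for i in ref_7th if est[i] == ref[i]) / len(ref_7th)

def frame_accuracy(ref, est):
    """Share of all frames where the estimated label equals the reference."""
    return sum(r == e for r, e in zip(ref, est)) / len(ref)

# Toy 10-frame reference: mostly triads, two 7th-chord frames.
ref = ["C", "C", "C", "G7", "G7", "F", "F", "Am", "Am", "C"]
# Cautious model: finds one of the two 7ths and never invents one.
cautious = ["C", "C", "C", "G7", "G", "F", "F", "Am", "Am", "C"]
# Over-caller: tags 7ths almost everywhere, so it also "finds" both real ones.
overcaller = ["C7", "C7", "C", "G7", "G7", "F7", "F", "Am7", "Am", "C7"]

for name, est in [("cautious", cautious), ("overcaller", overcaller)]:
    print(f"{name}: 7th recall {seventh_recall(ref, est):.2f}, "
          f"frame accuracy {frame_accuracy(ref, est):.2f}")
# cautious: 7th recall 0.50, frame accuracy 0.90
# overcaller: 7th recall 1.00, frame accuracy 0.50
```

Recall only looks at frames where the reference has a 7th, so it cannot see the false 7ths the over-caller sprays across triad frames. A score over all frames does see them.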
A sanity check that the direction was sound: the published 2025 state of the art over BTC (BTC-FDAA-FGF) is itself built on an HCQT front-end — the same representation we chose — adding two further modules for a +1.2–2.2% MIREX gain. HCQT wasn't a wrong turn; closing the last point or two just takes a system, not a front-end swap.
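For readers new to the representation, a rough sketch of what an HCQT stacks: several CQTs, one per harmonic h, each starting at h times a base frequency, so the same bin index in every channel lines up a pitch with its harmonics. The values below (base frequency near C1, 12 bins per octave, 72 bins, harmonics 0.5 and 1 to 5) are assumptions for illustration and not the settings of our model, and `hcqt_center_freqs` is a hypothetical helper that computes only the frequency grid, not the transform itself.

```python
def hcqt_center_freqs(fmin=32.70, n_bins=72, bins_per_octave=12,
                      harmonics=(0.5, 1, 2, 3, 4, 5)):
    """Center frequency of every bin in every harmonic channel.

    Channel h is an ordinary CQT grid whose lowest bin sits at h * fmin,
    so grid[h][k] == h * fmin * 2 ** (k / bins_per_octave).
    """
    return {
        h: [h * fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
        for h in harmonics
    }

grid = hcqt_center_freqs()
k = 33  # with the assumed fmin (about C1), bin 33 of the h=1 channel is near A3
for h, freqs in grid.items():
    # Same bin index, different channels: a note and its (sub)harmonics.
    print(f"h={h}: bin {k} -> {freqs[k]:.1f} Hz")
```

In practice each channel would come from a CQT routine called with that channel's minimum frequency (librosa's `cqt`, for example, exposes an `fmin` parameter), and the channels are stacked so a convolutional front-end can see harmonically related energy at the same position.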
What we're sharing
- A reproducible mir_eval benchmark harness over public datasets.
- The HCQT variant of BTC — code and weights (it ties baseline BTC).
- A concrete extension guide for taking the HCQT base to melody, bass, and transcription.
- A small, license-clean CC0 chord dataset sample (auto-annotated; coming alongside the repo).
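As a rough picture of what the benchmark harness in the first item measures, the sketch below hand-rolls two of the scores it reports, root and mirex, on a toy example. It is a simplification and not mir_eval's code: mir_eval parses a much richer chord vocabulary and handles no-chord segments, while this version assumes natural-note roots, five chord qualities, and pre-aligned segments. All names and the three segments are invented for illustration.

```python
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
QUALITY_INTERVALS = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "7": (0, 4, 7, 10),
    "maj7": (0, 4, 7, 11),
    "min7": (0, 3, 7, 10),
}

def parse(label):
    """Split a 'Root:quality' label into (root pitch class, set of pitch classes)."""
    root_name, quality = label.split(":")
    root = PITCH_CLASS[root_name]
    return root, {(root + i) % 12 for i in QUALITY_INTERVALS[quality]}

def root_match(ref, est):
    """Root metric: only the root pitch class has to agree."""
    return parse(ref)[0] == parse(est)[0]

def mirex_match(ref, est):
    """MIREX-style rule: correct if the chords share at least three pitch classes."""
    return len(parse(ref)[1] & parse(est)[1]) >= 3

def weighted_score(segments, match):
    """Duration-weighted share of time where match(ref, est) holds."""
    total = sum(dur for dur, _, _ in segments)
    return sum(dur for dur, ref, est in segments if match(ref, est)) / total

# (duration in seconds, reference label, estimated label): toy data
segments = [
    (2.0, "C:maj", "C:maj7"),  # same root; shares C, E, G
    (1.0, "A:min", "C:maj"),   # different root; shares only C and E
    (1.0, "A:min7", "C:maj"),  # different root; shares C, E, G
]
print("root :", weighted_score(segments, root_match))   # root : 0.5
print("mirex:", weighted_score(segments, mirex_match))  # mirex: 0.75
```

The two metrics can disagree on the same segment, which is one reason the table reports several of them side by side.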
Why we did this
We're Selekt — we build cleared-sample and music-analysis tools for producers and composers, and chord recognition powers features like our chord-progression search. We needed good chord analysis, so we went deep — and we're sharing the honest result because a reproducible "here's where the field actually stands" is more useful than another unverified state-of-the-art claim.
FAQ
- What is the best open-source chord recognition model?
  - On our reproducible mir_eval benchmark (GuitarSet and Schubert, held out from training), BTC (Park et al., ISMIR 2019) is the strongest open chord-recognition model, with CREMA close behind. Our HCQT variant of BTC ties it but does not clearly beat it; the accessible field has plateaued around 77–82% root accuracy.
- Does HCQT (Harmonic CQT) improve chord recognition?
  - In our tests an HCQT front-end on BTC ties baseline BTC on held-out public data; it does not clearly beat it. An apparent early gain on seventh chords turned out to be a recall-metric artifact (the model over-calling 7ths). HCQT is, however, a strong substrate for extending a chord model to melody, bass, and note transcription.
- Is there a reproducible open chord-recognition benchmark?
  - Yes. Our open repository ships a mir_eval harness (root, thirds, triads, sevenths, majmin, mirex, with per-track bootstrap confidence intervals) over public datasets (GuitarSet, Schubert Winterreise), plus the model and weights. See github.com/marcusfkelley/btc-hcqt.
- How accurate is automatic chord recognition?
  - The best open models (BTC, CREMA, Chordino, and our HCQT variant) all plateau around 77–82% root accuracy on held-out audio. Seventh and extended chords remain the hard part.
