We tried to beat the best open chord-recognition model. Here's the honest result.

We set out to improve open-source automatic chord recognition (chord detection) — turning audio into a time-stamped chord progression. The strongest open model is BTC (Park et al., ISMIR 2019). We built a Harmonic-CQT (HCQT) variant and a stack of other levers to beat it. Honest result: our best variant ties baseline BTC on held-out public benchmarks — it does not clearly beat it. That is a useful finding, and we've open-sourced everything to reproduce it.

View the open model + benchmark on GitHub: github.com/marcusfkelley/btc-hcqt

The numbers (held-out, public data)

Model	GuitarSet root / 7ths (%)	Schubert root / 7ths / mirex (%)
baseline BTC	80.9 / 64.6	73.1 / 55.3 / 64.1
ours (BTC+HCQT, Beatles-FT)	80.5 / 63.0	73.8 / 55.6 / 65.3

A dead heat — BTC noses ahead on guitar, we nose ahead on classical, every gap within the 95% confidence intervals. The whole accessible field (BTC, CREMA, Chordino, and our variants) clusters around 77–82% root accuracy. Nobody is running away with it.
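
To make "within the 95% confidence intervals" concrete, here is a minimal sketch of a paired, per-track percentile bootstrap. It is an illustration under stated assumptions, not the repository's harness code: the function name, the resample count, and the six per-track scores are all hypothetical, and the scores are made up rather than taken from the benchmark.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-track difference (a - b).

    Tracks are resampled with replacement; both models are scored on the
    same tracks, so the per-track differences are resampled as pairs.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty per-track score lists")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi


# Made-up per-track root scores for two models on the same six tracks.
model_a = [0.81, 0.78, 0.84, 0.76, 0.80, 0.83]
model_b = [0.80, 0.79, 0.83, 0.77, 0.81, 0.82]

point, lo, hi = paired_bootstrap_ci(model_a, model_b)
tie = lo <= 0.0 <= hi  # an interval that straddles zero means no clear winner
print(f"mean diff {point:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}], tie={tie}")
```

Resampling whole tracks rather than individual frames keeps the interval honest, because frames within one recording are strongly correlated.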

What we tried — and the lesson

Starting from baseline BTC, we tested an HCQT front-end, training from scratch on license-clean audio, real-audio fine-tuning, two- and three-model ensembles, and a reimplementation of the published BTC-FDAA-FGF additions (2025). Every lever landed at parity, never clearly past baseline.

The most useful takeaway is a cautionary one: an in-house metric showed HCQT nearly doubling seventh-chord detection (27% → 48%). It was a recall artifact: a 7th-saturated training set taught the model to over-call 7ths, and recall alone rewards over-calling because it never counts the false alarms. On frame-wise mir_eval the ranking flipped, and baseline BTC, which calls fewer 7ths but gets them right, came out best. Never trust a bespoke recall metric for a "we improved it" claim; use frame-wise mir_eval with confidence intervals.
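
The recall artifact is easy to reproduce on a toy example. The sketch below is an assumption-laden illustration: `seventh_recall` is a hypothetical stand-in for a bespoke recall metric (the in-house metric itself is not specified here), the ten frames are invented, and exact label matching is a simplification of what mir_eval does.

```python
def seventh_recall(ref, est):
    """Share of reference 7th frames that the estimate labels correctly."""
    hits = [r == e for r, e in zip(ref, est) if r.endswith(":7")]
    return sum(hits) / len(hits)


def seventh_precision(ref, est):
    """Share of frames the estimate calls a 7th that really are that chord."""
    called = [r == e for r, e in zip(ref, est) if e.endswith(":7")]
    return sum(called) / len(called)


def frame_accuracy(ref, est):
    """Share of all frames where the estimate matches the reference exactly."""
    return sum(r == e for r, e in zip(ref, est)) / len(ref)


# Invented 10-frame reference: seven major frames, three dominant-7th frames.
ref = ["C:maj"] * 7 + ["C:7"] * 3
over_caller = ["C:7"] * 10                      # calls a 7th everywhere
conservative = ["C:maj"] * 8 + ["C:7"] * 2      # misses one 7th, no false alarms

for name, est in [("over-caller", over_caller), ("conservative", conservative)]:
    print(
        name,
        round(seventh_recall(ref, est), 2),
        round(seventh_precision(ref, est), 2),
        round(frame_accuracy(ref, est), 2),
    )
# over-caller 1.0 0.3 0.3
# conservative 0.67 1.0 0.9
```

The over-caller wins on recall and loses badly on every metric that counts its false alarms, which is the same flip described above.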

A sanity check that the direction was sound: the published 2025 state of the art over BTC (BTC-FDAA-FGF) is itself built on an HCQT front-end — the same representation we chose — adding two further modules for a +1.2–2.2% MIREX gain. HCQT wasn't a wrong turn; closing the last point or two just takes a system, not a front-end swap.
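
For readers new to the representation, an HCQT stacks several constant-Q transforms, one per harmonic multiple h, each starting at h × fmin, so that bin k in every channel refers to the same fundamental. The sketch below only builds that frequency grid and computes no audio features. The parameter values (fmin, harmonic set, bins per octave, octave count) are illustrative assumptions and are not taken from the repository.

```python
def hcqt_frequency_grid(
    fmin=32.7,                         # assumed lowest fundamental in Hz (about C1)
    harmonics=(0.5, 1, 2, 3, 4, 5),    # assumed harmonic multiples, one CQT each
    bins_per_octave=60,                # assumed resolution
    n_octaves=6,                       # assumed range
):
    """Center frequencies per harmonic channel: f[h][k] = h * fmin * 2**(k / B)."""
    n_bins = bins_per_octave * n_octaves
    return {
        h: [h * fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
        for h in harmonics
    }


grid = hcqt_frequency_grid()

# Bin k of channel h sits at h times the frequency of bin k in channel 1,
# so a note's harmonics line up along the channel axis at the same bin index.
k = 120
for h in (1, 2, 3):
    print(h, round(grid[h][k], 2))
```

That alignment is what lets a small convolution across channels see a fundamental and its overtones together, and it is why the same base is attractive for melody, bass, and note transcription.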

What we're sharing

A reproducible mir_eval benchmark harness over public datasets.
The HCQT variant of BTC — code and weights (it ties baseline BTC).
A concrete extension guide for taking the HCQT base to melody, bass, and transcription.
A small, license-clean CC0 chord dataset sample — 400 public-domain tracks auto-annotated with chord progressions (see below).

github.com/marcusfkelley/btc-hcqt
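
To show what a duration-weighted root score measures, here is a simplified, standard-library sketch. It is not mir_eval and not the harness in the repository: the helper names are hypothetical, the annotations are invented, and it ignores most of the chord grammar that mir_eval handles (only `Root:quality` labels, `/bass` suffixes, and `N` for no-chord are recognized).

```python
NATURAL_PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"#": 1, "b": -1}


def root_pitch_class(label):
    """Pitch class (0-11) of a label such as 'Bb:min7'; None for no-chord 'N'."""
    root = label.split(":")[0].split("/")[0]
    if root == "N":
        return None
    pc = NATURAL_PITCH_CLASSES[root[0]]
    for accidental in root[1:]:
        pc += ACCIDENTALS[accidental]
    return pc % 12


def weighted_root_accuracy(ref, est):
    """Fraction of reference time whose root matches the estimate.

    ref and est are lists of (start, end, label), sorted and non-overlapping.
    """
    matched = total = 0.0
    for r_start, r_end, r_label in ref:
        total += r_end - r_start
        for e_start, e_end, e_label in est:
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and root_pitch_class(r_label) == root_pitch_class(e_label):
                matched += overlap
    return matched / total


# Invented annotations: the estimate gets one second of the root wrong.
ref = [(0.0, 2.0, "C:maj"), (2.0, 4.0, "G:7")]
est = [(0.0, 1.0, "C:maj7"), (1.0, 3.0, "G:maj"), (3.0, 4.0, "G:7")]

print(weighted_root_accuracy(ref, est))  # 0.75
```

Note that the root score forgives the wrong chord qualities in this example (C:maj7 for C:maj, G:maj for G:7), which is why the stricter sevenths and mirex scores are reported alongside it.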

Open dataset: 400 CC0 chord progressions

Alongside the model we're releasing a small, public chord-progression dataset: 400 public-domain recordings from the U.S. Library of Congress (Citizen DJ), each auto-annotated with a time-stamped chord progression plus detected key and tempo. It's a deliberately random, representative slice — not a curated best-of — and it ships as pointers and metadata only (no audio is redistributed; you fetch each recording from its source URL).

Read this first: the chord labels are BTC-predicted, not human-verified. That makes this useful for training, weak supervision, and exploration — not a gold-standard benchmark (you can't fairly score BTC against BTC's own labels). The annotations are released CC0; the underlying audio is public domain via the Library of Congress.

Download the dataset (JSON, CC0) →
Explore the full, live catalog in the Chord Finder →

Why we did this

We're Selekt — we build cleared-sample and music-analysis tools for producers and composers, and chord recognition powers features like our chord-progression search. We needed good chord analysis, so we went deep — and we're sharing the honest result because a reproducible "here's where the field actually stands" is more useful than another unverified state-of-the-art claim.

FAQ

What is the best open-source chord recognition model?

On our reproducible mir_eval benchmark (GuitarSet and Schubert, held out from training), BTC (Park et al., ISMIR 2019) is the strongest open chord-recognition model, with CREMA close behind. Our HCQT variant of BTC ties it but does not clearly beat it; the accessible field has plateaued around 77–82% root accuracy.

Does HCQT (Harmonic CQT) improve chord recognition?

In our tests an HCQT front-end on BTC ties baseline BTC on held-out public data; it does not clearly beat it. An apparent early gain on seventh chords turned out to be a recall-metric artifact (the model over-calling 7ths). HCQT is, however, a strong substrate for extending a chord model to melody, bass, and note transcription.

Is there a reproducible open chord-recognition benchmark?

Yes. Our open repository ships a mir_eval harness (root, thirds, triads, sevenths, majmin, mirex, with per-track bootstrap confidence intervals) over public datasets (GuitarSet, Schubert Winterreise), plus the model and weights. See github.com/marcusfkelley/btc-hcqt.

How accurate is automatic chord recognition?

The best open models (BTC, CREMA, Chordino, and our HCQT variant) all cluster around 77–82% root accuracy on held-out audio, a plateau. Seventh and extended chords remain the hard part.