Benchmarks¶
All benchmarks on Apple Silicon. Reproducible via just bench-fm-all and just bench-* (see Running the benchmarks yourself).
Methodology¶
Hardware & software¶
- CPU: Apple M2 Pro
- RAM: 16 GB
- OS: macOS 15.5 (arm64)
- Rust: 1.92.0
- Python: 3.12.8
- Node.js: 25.8.0
Package versions (canonical run, 2026-04-06)¶
| Package | Version |
|---|---|
| fast-suffix-array | 0.1.0 |
| pyahocorasick | 2.3.0 |
| ahocorasick-rs | 1.0.3 |
| pylcs | 0.1.1 |
| pydivsufsort | 0.0.20 |
| fm-index | 2.3.4 |
Competitor library versions are recorded automatically by each benchmark script. See benchmark-results/*.json for exact versions from each run.
Measurement protocol¶
- Python: 3 warmup iterations (discarded), 3–20 measurement iterations (varies by input size), median selected.
- JavaScript: 3 warmup iterations (discarded), 3–5 measurement iterations, median selected.
- All timings are query-only unless noted — the index is pre-built before timing starts.
- GC is not explicitly controlled in Python or Node.js. Median selection mitigates GC spikes.
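The protocol above amounts to a small harness like this (an illustrative sketch, not the benchmark scripts' actual code):

```python
import statistics
import time

def bench(fn, warmup=3, iters=5):
    """Run fn with discarded warmups, then return the median of timed iterations."""
    for _ in range(warmup):
        fn()  # warmup iterations are discarded
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    # Median selection mitigates GC spikes and other one-off outliers
    return statistics.median(samples)
```

Taking the median rather than the mean is what makes uncontrolled GC tolerable: a single slow iteration shifts the mean but not the median.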
Data generation¶
- English text: 52-word vocabulary, uniform random selection, seeded PRNG (seed=42). This is simpler than natural language — no Zipfian distribution, no punctuation variety, no paragraph structure. Pattern extraction: random 2–3 word phrases from the generated text (seed=99).
- DNA: Uniform ACGT (seed=42). Real genomes have GC-content bias, repeat regions, and tandem repeats. Uniform random is a conservative test case — real genomes may produce more motif hits.
- Cross-language note: Python uses random.Random (Mersenne Twister); JavaScript uses a custom LCG. The same seed does not produce identical text across languages.
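A minimal sketch of the English-text generation on the Python side, using a stand-in vocabulary (the real 52-word list lives in the benchmark scripts):

```python
import random

rng = random.Random(42)  # seed=42, as in the benchmarks (Mersenne Twister)
vocab = [f"word{i}" for i in range(52)]  # stand-in for the 52-word vocabulary
text = " ".join(rng.choice(vocab) for _ in range(10_000))

# Pattern extraction: random 2-3 word phrases taken from the generated text (seed=99),
# which guarantees every extracted pattern actually occurs in the corpus
pat_rng = random.Random(99)
words = text.split()
start = pat_rng.randrange(len(words) - 3)
pattern = " ".join(words[start:start + pat_rng.choice([2, 3])])
```

Because patterns are cut from the text itself, every query has at least one match; this keeps the benchmark from degenerating into measuring early-exit behavior on misses.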
Known caveats¶
Break-even extrapolation
The break-even total-time sweep (Table 3) is extrapolated from per-query medians multiplied by N, not measured end-to-end. Actual performance may differ due to cache effects and GC pressure at scale.
Pandas comparison
The pandas benchmarks compare different data models: FM-index operates on concatenated text and returns a single total, while pandas returns per-row results (a Series) and uses regex by default. Speedups reflect both the algorithmic advantage and the overhead of pandas' per-row Python object model.
Raw data & reproducibility¶
- Python benchmark scripts: bindings/python/benchmarks/bench_*.py
- JavaScript benchmark scripts: benchmark/bench-*.mjs
- Raw results: benchmark-results/*.json (saved automatically when running benchmarks; includes environment metadata)
- Benchmark index: benchmark/README.md lists every script with a 1-line description
FM-index: O(p log sigma) scaling¶
Source: bench_scaling.py · Raw data: bench_scaling.json
100 patterns, varying text size, query time only (index pre-built):
| Text size | str.count() | SA (rebuild each call) | fsa (ours) | fsa vs str.count |
|---|---|---|---|---|
| 1 KB | 0.06ms | 9.6ms | 0.16ms | 0.4x |
| 10 KB | 0.5ms | 28.1ms | 0.19ms | 2.8x |
| 100 KB | 4.2ms | 287ms | 0.22ms | 19x |
| 1 MB | 44.8ms | — | 0.23ms | 198x |
| 10 MB | 420.6ms | — | 0.25ms | 1,650x |
fsa (ours) goes from 0.16ms to 0.25ms across a 10,000x increase in text size. str.count() scales linearly to 421ms at 10MB — 1,650x slower than fsa.
FM-index vs Aho-Corasick (Python)¶
Source: bench_multipattern.py · Raw data: bench_multipattern.json
Aho-Corasick scans text once for all patterns simultaneously (O(n + matches)). FM-index queries each pattern independently (O(p log sigma) per pattern). Result: FM-index wins decisively at all scales.
Vary pattern count on 1MB corpus¶
Query time only, all indices pre-built.
| Patterns | str.count() | pyahocorasick | ahocorasick-rs | fsa (ours) | Fastest |
|---|---|---|---|---|---|
| 10 | 4.4ms | 6.3ms | 5.1ms | 0.03ms | fsa |
| 50 | 22.6ms | 8.4ms | 5.5ms | 0.1ms | fsa |
| 200 | 91.4ms | 9.7ms | 9.9ms | 0.5ms | fsa |
| 1000 | 447.1ms | 14.7ms | 15.5ms | 2.5ms | fsa |
Vary corpus size with 200 patterns¶
Query time only, all indices pre-built.
| Corpus | str.count() | pyahocorasick | ahocorasick-rs | fsa (ours) | Fastest |
|---|---|---|---|---|---|
| 100 KB | 9.0ms | 0.9ms | 0.9ms | 0.5ms | fsa |
| 1 MB | 88.7ms | 9.5ms | 11.3ms | 0.6ms | fsa |
| 10 MB | 914.0ms | 103.1ms | 100.1ms | 0.9ms | fsa |
At 10MB with 200 patterns: fsa (ours) is 111x faster than ahocorasick-rs and 1,016x faster than str.count().
Build time (1MB corpus)¶
| Method | Build time |
|---|---|
| pyahocorasick | 0.07ms |
| ahocorasick-rs | 0.12ms |
| fsa (ours) | 55.4ms |
Aho-Corasick automata build in microseconds (they only index the patterns, not the text). FM-index builds are slower because they index the full text -- but the payoff is O(p log sigma) queries that don't depend on text size.
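For readers unfamiliar with why Aho-Corasick builds cheaply and scans in one pass, here is a textbook implementation sketch in pure Python (illustrative only -- not pyahocorasick's internals):

```python
from collections import deque

def build_automaton(patterns):
    # goto: per-node transition dicts; fail: failure links; out: patterns ending at node
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # BFS to set failure links; build cost depends only on the patterns, not the text
    queue = deque(goto[0].values())
    while queue:
        u = queue.popleft()
        for ch, v in goto[u].items():
            queue.append(v)
            f = fail[u]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[v] = goto[f].get(ch, 0)
            out[v] += out[fail[v]]
    return goto, fail, out

def count_all(text, patterns):
    """Count occurrences of every pattern in a single O(n + matches) pass."""
    goto, fail, out = build_automaton(patterns)
    counts = {p: 0 for p in patterns}
    node = 0
    for ch in text:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            counts[p] += 1
    return counts

counts = count_all("ushers", ["he", "she", "his", "hers"])
# {'he': 1, 'she': 1, 'his': 0, 'hers': 1}
```

The automaton only ever touches the patterns at build time, which is why the build is microsecond-scale; the trade-off is that every query pass must still walk the entire text.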
FM-index vs fm-index (math-hiyoko)¶
Source: bench_fmindex.py · Raw data: bench_fmindex.json
Both are Rust-backed FM-indexes with PyO3 bindings. Head-to-head on 100 patterns, English-like text.
Build time (ms, median)¶
| Text size | fsa (ours) | fm-index (math-hiyoko) | fsa faster by |
|---|---|---|---|
| 100 KB | 5.4ms | 8.9ms | 1.6x |
| 1 MB | 59.4ms | 71.5ms | 1.2x |
| 10 MB | 819ms | 1,285ms | 1.6x |
Count query time (ms, median, both pre-built)¶
| Text size | fsa (ours) | fm-index (math-hiyoko) | fm-index faster by |
|---|---|---|---|
| 100 KB | 0.3ms | 0.2ms | 1.7x |
| 1 MB | 0.2ms | 0.1ms | 1.7x |
| 10 MB | 0.3ms | 0.2ms | 1.7x |
Memory¶
| Text size | fsa heap (ours) | fsa ratio | fm-index (pickle) | fm-index ratio |
|---|---|---|---|---|
| 100 KB | 161 KB | 1.6x | 101 KB | 1.0x |
| 1 MB | 1.68 MB | 1.7x | 1.02 MB | 1.0x |
| 10 MB | 18.0 MB | 1.8x | 11.0 MB | 1.1x |
Summary
fm-index (math-hiyoko) wins on count query speed (~1.7x) and memory (~1.0x vs ~1.8x text size). We win on build time (~1.2-1.6x). Our differentiators are the broader feature set (SA, LCP, BWT, LCS, LPF, duplicate detection, serialization, sentence splitting), WASM/browser support, and the CLI -- none of which fm-index offers.
FM-index in the browser (JS/WASM)¶
Source: bench-fm-index.mjs · Raw data: bench_fm_index_js.json
The WASM FM-index is the only actively maintained JavaScript library that does true substring search with an index. The only other FM-index on npm was last published in 2016 and sees about 31 downloads/week.
vs indexOf and modern-ahocorasick (100 patterns, varying text size)¶
| Text size | indexOf loop | modern-ahocorasick | fsa (ours) | fsa vs indexOf |
|---|---|---|---|---|
| 10 KB | 0.5ms | 0.4ms | 0.3ms | 1.5x |
| 100 KB | 4.6ms | 3.1ms | 0.33ms | 14x |
| 1 MB | 45.9ms | 31.0ms | 0.36ms | 127x |
| 10 MB | 445ms | 299ms | 0.41ms | 1,074x |
FM-index query time stays flat at ~0.4ms regardless of text size. indexOf and Aho-Corasick scale linearly.
Vary pattern count on 1MB text¶
| Patterns | indexOf loop | modern-ahocorasick | fsa (ours) | fsa vs indexOf |
|---|---|---|---|---|
| 10 | 4.9ms | 22.1ms | 0.06ms | 81x |
| 50 | 23.3ms | 31.5ms | 0.18ms | 130x |
| 200 | 91.4ms | 31.2ms | 0.74ms | 124x |
| 1000 | 457ms | 40.7ms | 4.0ms | 115x |
At 10MB with 100 patterns: fsa (ours) is 1,074x faster than indexOf, 721x faster than Aho-Corasick. Build cost: 61ms for 1MB text, 1.6 MB memory (1.7x text size).
FM-index memory compression¶
Source: bench_memory.py · Raw data: bench_memory.json
FM-index replaces both the suffix array AND the original text with a compressed self-index.
English-like text¶
| Text size | Raw text | SA + text (pydivsufsort) | SA/Text | fsa (ours) | fsa/Text | Savings vs SA |
|---|---|---|---|---|---|---|
| 1 KB | 999 B | 4.9 KB | 5.0x | 3.7 KB | 3.8x | 25% |
| 10 KB | 9.8 KB | 48.8 KB | 5.0x | 16.8 KB | 1.7x | 66% |
| 100 KB | 97.7 KB | 488.3 KB | 5.0x | 156.8 KB | 1.6x | 68% |
| 1 MB | 976.6 KB | 4.8 MB | 5.0x | 1.6 MB | 1.7x | 66% |
| 10 MB | 9.5 MB | 47.7 MB | 5.0x | 17.2 MB | 1.8x | 64% |
Random bytes (high-entropy data)¶
| Text size | Raw text | SA + text (pydivsufsort) | SA/Text | fsa (ours) | fsa/Text | Savings vs SA |
|---|---|---|---|---|---|---|
| 1 KB | 1,000 B | 4.9 KB | 5.0x | 3.7 KB | 3.7x | 25% |
| 10 KB | 9.8 KB | 48.8 KB | 5.0x | 16.8 KB | 1.7x | 66% |
| 100 KB | 97.7 KB | 488.3 KB | 5.0x | 156.4 KB | 1.6x | 68% |
| 1 MB | 976.6 KB | 4.8 MB | 5.0x | 1.6 MB | 1.7x | 67% |
| 10 MB | 9.5 MB | 47.7 MB | 5.0x | 17.1 MB | 1.8x | 64% |
Suffix arrays require ~5x the raw text size (int32 SA + text). The FM-index compresses this to ~1.7-1.8x text size — consistently 64-68% smaller than SA + text.
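The ~5.0x ratio follows directly from the layout, assuming 4-byte (32-bit) suffix-array entries:

```python
def index_bytes(n_text, sa_entry_bytes=4):
    # one suffix-array entry per text byte, plus the original text itself
    return n_text * sa_entry_bytes + n_text

ratio = index_bytes(1_000_000) / 1_000_000  # 5.0, matching the SA/Text column
```

A 64-bit suffix array would push this to 9x, which is why the entry width matters for the comparison.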
FM-index break-even analysis¶
Source: bench_breakeven.py · Raw data: bench_breakeven.json
FM-index construction is an investment. Here is exactly when it pays off vs str.count().
Per-query cost¶
| Text size | str.count() per query | fsa per query (ours) | fsa build time |
|---|---|---|---|
| 100 KB | 0.043ms | 0.002ms | 5.2ms |
| 1 MB | 0.442ms | 0.002ms | 55.1ms |
| 10 MB | 4.264ms | 0.003ms | 680.2ms |
Queries to amortize build cost¶
| Text size | fsa vs str.count() (ours) |
|---|---|
| 100 KB | 126 queries |
| 1 MB | 125 queries |
| 10 MB | 160 queries |
Total time (build + N queries) on 1MB text¶
| Queries | str.count() | fsa (ours) | Winner |
|---|---|---|---|
| 1 | 0.4ms | 55.1ms | str.count() |
| 10 | 4.4ms | 55.1ms | str.count() |
| 100 | 44.2ms | 55.3ms | str.count() |
| 250 | 110.5ms | 55.7ms | fsa |
| 1000 | 442.0ms | 57.4ms | fsa |
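The break-even numbers above follow from two simple formulas (per the extrapolation caveat: totals are build plus N times the per-query median, not end-to-end measurements). Using the 1 MB row from the per-query table:

```python
# 1 MB text: build 55.1 ms, 0.002 ms/query (fsa), 0.442 ms/query (str.count)
build_ms = 55.1
fsa_ms, count_ms = 0.002, 0.442

def total_fsa_ms(n):
    # build once, then n indexed queries
    return build_ms + n * fsa_ms

def total_count_ms(n):
    # n full-text scans, no build step
    return n * count_ms

# queries needed before the build cost is amortized (~125 on 1 MB text)
break_even = build_ms / (count_ms - fsa_ms)
```

Below the break-even point the build dominates and str.count() wins; above it, the near-constant per-query cost of the index takes over.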
FM-index vs pandas¶
Source: bench_pandas.py · Raw data: bench_pandas.json
Build the index once, then each pattern query runs in O(p log sigma) time. Pandas str.contains() and str.count() rescan the full text every time.
Caveat
These comparisons involve different data models. pandas returns per-row results (a Series) and uses regex by default, while the FM-index operates on concatenated text and returns a single total. The speedups reflect both the algorithmic advantage and the overhead of pandas' per-row Python object model.
vs pandas str.count() -- 50 patterns, varying DataFrame size (query time only)¶
| Rows | pandas | fsa (ours) | Speedup |
|---|---|---|---|
| 10K | 103ms | 0.1ms | 706x |
| 100K | 1,000ms | 0.1ms | 7,343x |
| 500K | 5,079ms | 0.2ms | 27,008x |
vs pandas str.contains() -- 100K rows (query time only)¶
| Patterns | pandas | fsa (ours) | Speedup |
|---|---|---|---|
| 5 | 71ms | 3ms | 26x |
| 10 | 147ms | 6ms | 26x |
| 50 | 724ms | 32ms | 23x |
| 100 | 1,410ms | 48ms | 29x |
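The data-model difference described in the caveat can be seen in a few lines of pandas (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"text": ["the quick brown fox", "lazy dog", "quick quick"]})

# pandas returns per-row results: a Series with one count per row,
# interpreting the pattern as a regex by default
per_row = df["text"].str.count("quick")

# the FM-index benchmark measures a single corpus-wide total instead
total = int(per_row.sum())
```

The speedup numbers therefore fold together two effects: the indexed query itself, and skipping pandas' per-row object handling.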
FM-index: DNA motif search¶
Source: bench_dna.py · Raw data: bench_dna.json
The classic bioinformatics use case: index a genome once, search for motifs in O(p log sigma) time per query.
19 biological motifs (EcoRI, BamHI, HindIII, TATA box, E-box, etc.) on synthetic genomes:
| Genome size | bytes.count() | fsa (ours) | fsa vs bytes.count() |
|---|---|---|---|
| 1 MB | 35.2ms | 0.024ms | 1,450x |
| 10 MB | 347.5ms | 0.029ms | 11,795x |
| 50 MB | 1,770ms | 0.032ms | 55,040x |
On a 50MB genome: bytes.count() takes 1.8 seconds -- fsa (ours) takes 0.032 milliseconds. In practice, Python bioinformaticians shell out to C tools like BWA and Bowtie2 for indexed genome search. This library brings FM-index capability natively into Python without external tool dependencies.
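A minimal version of the baseline side of this benchmark -- counting known restriction-site motifs with bytes.count() on a synthetic uniform-ACGT genome:

```python
import random

rng = random.Random(42)  # seed=42, matching the data-generation methodology
genome = bytes(rng.choice(b"ACGT") for _ in range(100_000))  # small stand-in genome

# A few of the motifs from the benchmark: EcoRI, BamHI, HindIII, TATA box
motifs = [b"GAATTC", b"GGATCC", b"AAGCTT", b"TATAAA"]

# The baseline rescans the whole genome once per motif -- O(n) per query
hits = {m: genome.count(m) for m in motifs}
```

Each bytes.count() call walks the full genome, so the baseline's cost grows with both genome size and motif count; the FM-index pays that scan cost once at build time.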
Suffix array construction (WASM vs mnemonist)¶
Source: bench-vs-mnemonist.mjs · Raw data: bench_vs_mnemonist.json
vs mnemonist (the standard JS suffix array, 8.7M downloads/week, pure JS):
| Input size | fsa (ours) | mnemonist | Speedup |
|---|---|---|---|
| 10K chars | 0.6ms | 6.9ms | 12x |
| 100K chars | 4.9ms | 77.0ms | 16x |
| 500K chars | 24.0ms | 422ms | 18x |
| 1M chars | 51.4ms | 1,050ms | 20x |
There is no other WASM-based suffix array on npm. The alternatives are pure JavaScript (slow) or native C++ addons (not portable to browsers).
SA construction backends (Rust native)¶
Source: bench_sa_construction.py · Raw data: bench_sa_construction.json
The default SA-IS backend is pure Rust with no size limit. Optional C backends trade portability for speed:
| Input size | SA-IS (default) | divsufsort | libsais | Fastest |
|---|---|---|---|---|
| 1-5K | 0.02-0.08ms | 0.13-0.18ms | 0.02-0.10ms | SA-IS |
| 50K | 1.13ms | 0.91ms | 0.72ms | libsais (1.6x) |
| 500K | 10.9ms | 9.7ms | 6.9ms | libsais (1.6x) |
| 1M | 26.0ms | 20.6ms | 13.9ms | libsais (1.9x) |
| 5M | 257ms | 110ms | 66ms | libsais (3.9x) |
Longest common substring vs pylcs¶
Source: bench_sa_construction.py · Raw data: bench_sa_construction.json
vs pylcs (the most popular Python LCS library, C++ DP backend):
| Input size | fsa (ours) | pylcs | Speedup |
|---|---|---|---|
| 1 KB | 0.31ms | 4.02ms | 13x |
| 5 KB | 0.55ms | 109ms | 197x |
| 10 KB | 0.91ms | 444ms | 489x |
O(n+m) suffix array vs O(nm) dynamic programming -- the gap grows with input size.
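For reference, the O(nm) dynamic-programming approach being compared against looks like this (the textbook longest-common-substring DP, not pylcs's exact code):

```python
def longest_common_substring(a, b):
    # Classic O(len(a) * len(b)) DP: cell (i, j) holds the length of the common
    # suffix of a[:i] and b[:j]; the maximum over all cells is the LCS length.
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]
```

Doubling both inputs quadruples the DP table, which is exactly the super-linear growth visible in the pylcs column above.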
Duplicate phrase detection performance¶
Source: bench_sa_construction.py · Raw data: bench_sa_construction.json
No other library provides word-boundary-aware duplicate phrase detection:
| Input size | Time | Duplicates found |
|---|---|---|
| 1 KB | 7ms | 41 |
| 5 KB | 60ms | 416 |
| 10 KB | 139ms | 875 |
King James Bible (824K words): 1.3 seconds. Pride and Prejudice (130K words): 95ms.
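To illustrate what word-boundary-aware duplicate phrase detection means, here is a naive n-gram sketch (illustrative only -- the library's implementation is different, which is what makes the timings above possible on book-length inputs):

```python
from collections import Counter

def duplicate_phrases(text, min_words=3):
    # Count every consecutive run of min_words words; phrases always start and
    # end on word boundaries, never mid-word.
    words = text.lower().split()
    grams = Counter(
        tuple(words[i:i + min_words]) for i in range(len(words) - min_words + 1)
    )
    return {" ".join(g): c for g, c in grams.items() if c > 1}
```

This naive version enumerates every n-gram for one fixed phrase length; detecting repeats of all lengths this way quickly becomes expensive, which is where a suffix-array-based approach pays off.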
Verifying the substring gap¶
The comparison table on the home page claims fuse.js, lunr, and minisearch miss certain substring queries. To verify, run just bench-fm-js-demo.
This runs benchmark/demo-substring-gap.mjs, which installs all three libraries from npm, tests each of them (alongside the WASM FM-index) against 6 substring queries, and prints a pass/fail matrix.
Running the benchmarks yourself¶
```sh
# Setup
cd bindings/python
uv run maturin develop --release
uv pip install pydivsufsort pylcs pandas rich biopython pyahocorasick ahocorasick_rs
```

All commands below run from the project root via `just`:

```sh
# Python: FM-index vs top competitors
just bench-fm-scaling        # O(p log sigma) scaling independence (the money chart)
just bench-fm-multipattern   # FM-index vs Aho-Corasick
just bench-fm-memory         # Memory/compression profile
just bench-fm-breakeven      # Break-even analysis
just bench-fm-dna            # DNA motif search (bioinformatics)
just bench-fm-all            # Run all 5 above

# Python: original benchmarks
just bench-pandas            # FM-index vs pandas
just bench-fmindex           # FM-index vs str.count()
just bench-sa                # SA/LCP/BWT vs pydivsufsort

# JavaScript / WASM
just bench-fm-js             # FM-index vs modern-ahocorasick vs indexOf
just bench-fm-js-demo        # What search libraries can't do (substring gap)
```
Regenerating charts¶
The SVG charts in docs/assets/ were generated from benchmark output. After running benchmarks, the raw JSON in benchmark-results/ contains all the data needed to regenerate them.
Note
Chart generation scripts are not yet automated — the current SVGs were created manually from benchmark output. This is a known gap.