
Benchmarks

All benchmarks on Apple Silicon. Reproducible via `just bench-fm-all` and `just bench-*` (see Running the benchmarks yourself).

Methodology

Hardware & software

  • CPU: Apple M2 Pro
  • RAM: 16 GB
  • OS: macOS 15.5 (arm64)
  • Rust: 1.92.0
  • Python: 3.12.8
  • Node.js: 25.8.0

Package versions (canonical run, 2026-04-06)

| Package | Version |
| --- | --- |
| fast-suffix-array | 0.1.0 |
| pyahocorasick | 2.3.0 |
| ahocorasick-rs | 1.0.3 |
| pylcs | 0.1.1 |
| pydivsufsort | 0.0.20 |
| fm-index | 2.3.4 |

Competitor library versions are recorded automatically by each benchmark script. See benchmark-results/*.json for exact versions from each run.

Measurement protocol

  • Python: 3 warmup iterations (discarded), 3–20 measurement iterations (varies by input size), median selected.
  • JavaScript: 3 warmup iterations (discarded), 3–5 measurement iterations, median selected.
  • All timings are query-only unless noted — the index is pre-built before timing starts.
  • GC is not explicitly controlled in Python or Node.js. Median selection mitigates GC spikes.
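
As a sketch, the warmup-then-median protocol looks like this (an illustrative helper, not the benchmark scripts' actual code):

```python
import statistics
import time

def bench(fn, warmup=3, iters=5):
    """Median-of-N timing: warmup runs are discarded, then the median
    of the measured runs is kept, which damps GC spikes and other outliers."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return statistics.median(samples)
```

Medians rather than means are used because a single GC pause can skew a mean badly while leaving the median untouched.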

Data generation

  • English text: 52-word vocabulary, uniform random selection, seeded PRNG (seed=42). This is simpler than natural language — no Zipfian distribution, no punctuation variety, no paragraph structure. Pattern extraction: random 2–3 word phrases from the generated text (seed=99).
  • DNA: Uniform ACGT (seed=42). Real genomes have GC-content bias, repeat regions, and tandem repeats. Uniform random is a conservative test case — real genomes may produce more motif hits.
  • Cross-language note: Python uses random.Random (Mersenne Twister); JavaScript uses a custom LCG. Same seed does not produce identical text across languages.
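
A minimal sketch of the generation scheme (the vocabulary here is a placeholder; the benchmark scripts define their own 52-word English list):

```python
import random

VOCAB = [f"word{i:02d}" for i in range(52)]  # placeholder 52-word vocabulary

def make_text(n_words, seed=42):
    # random.Random is the Mersenne Twister, as in the Python benchmarks
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCAB) for _ in range(n_words))

def extract_patterns(text, k, seed=99):
    rng = random.Random(seed)
    words = text.split()
    phrases = []
    for _ in range(k):
        n = rng.randint(2, 3)                     # random 2-3 word phrase
        i = rng.randrange(len(words) - n + 1)
        phrases.append(" ".join(words[i:i + n]))
    return phrases
```

Because every phrase is cut from the generated text itself, each pattern is guaranteed to occur at least once.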

Known caveats

Break-even extrapolation

The break-even total-time sweep (Table 3) is extrapolated from per-query medians multiplied by N, not measured end-to-end. Actual performance may differ due to cache effects and GC pressure at scale.

Pandas comparison

The pandas benchmarks compare different data models: FM-index operates on concatenated text and returns a single total, while pandas returns per-row results (a Series) and uses regex by default. Speedups reflect both the algorithmic advantage and the overhead of pandas' per-row Python object model.

Raw data & reproducibility

  • Python benchmark scripts: bindings/python/benchmarks/bench_*.py
  • JavaScript benchmark scripts: benchmark/bench-*.mjs
  • Raw results: benchmark-results/*.json (saved automatically when running benchmarks; includes environment metadata)
  • Benchmark index: benchmark/README.md lists every script with a 1-line description

FM-index: O(p log sigma) scaling

Source: bench_scaling.py · Raw data: bench_scaling.json

100 patterns, varying text size, query time only (index pre-built):

| Text size | str.count() | SA (rebuild each call) | fsa (ours) | fsa vs str.count() |
| --- | --- | --- | --- | --- |
| 1 KB | 0.06ms | 9.6ms | 0.16ms | 0.4x |
| 10 KB | 0.5ms | 28.1ms | 0.19ms | 2.8x |
| 100 KB | 4.2ms | 287ms | 0.22ms | 19x |
| 1 MB | 44.8ms | — | 0.23ms | 198x |
| 10 MB | 420.6ms | — | 0.25ms | 1,650x |

fsa (ours) goes from 0.16ms to 0.25ms across a 10,000x increase in text size. str.count() scales linearly to 421ms at 10MB — 1,650x slower than fsa.
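
The flat query-time column comes from backward search over the Burrows-Wheeler transform: each pattern character costs two rank lookups, so query time depends on pattern length, never text length. A pure-Python sketch of FM-index counting (naive construction, for illustration only; the actual library uses SA-IS construction and compressed rank structures):

```python
def build_fm(text):
    # Naive O(n^2 log n) construction, dict-of-lists rank table: fine for
    # illustration, nothing like the library's real data structures.
    text += "\0"  # sentinel, smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] wraps to the sentinel
    chars = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in chars:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in chars}
    for i, ch in enumerate(bwt):
        for c in chars:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(text)

def fm_count(index, pattern):
    # Backward search: one pair of rank lookups per pattern character,
    # independent of text length -- the flat column in the table above.
    C, occ, n = index
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `fm_count(build_fm("banana"), "ana")` returns 2: backward search counts all (overlapping) occurrences, unlike `str.count`, which would report 1.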

FM-index vs Aho-Corasick (Python)

Source: bench_multipattern.py · Raw data: bench_multipattern.json

Aho-Corasick scans text once for all patterns simultaneously (O(n + matches)). FM-index queries each pattern independently (O(p log sigma) per pattern). Result: FM-index wins decisively at all scales.

Vary pattern count on 1MB corpus

Query time only, all indices pre-built.

| Patterns | str.count() | pyahocorasick | ahocorasick-rs | fsa (ours) | Fastest |
| --- | --- | --- | --- | --- | --- |
| 10 | 4.4ms | 6.3ms | 5.1ms | 0.03ms | fsa |
| 50 | 22.6ms | 8.4ms | 5.5ms | 0.1ms | fsa |
| 200 | 91.4ms | 9.7ms | 9.9ms | 0.5ms | fsa |
| 1000 | 447.1ms | 14.7ms | 15.5ms | 2.5ms | fsa |

Vary corpus size with 200 patterns

Query time only, all indices pre-built.

| Corpus | str.count() | pyahocorasick | ahocorasick-rs | fsa (ours) | Fastest |
| --- | --- | --- | --- | --- | --- |
| 100 KB | 9.0ms | 0.9ms | 0.9ms | 0.5ms | fsa |
| 1 MB | 88.7ms | 9.5ms | 11.3ms | 0.6ms | fsa |
| 10 MB | 914.0ms | 103.1ms | 100.1ms | 0.9ms | fsa |

At 10MB with 200 patterns: fsa (ours) is 111x faster than ahocorasick-rs and 1,016x faster than str.count().

Build time (1MB corpus)

| Method | Build time |
| --- | --- |
| pyahocorasick | 0.07ms |
| ahocorasick-rs | 0.12ms |
| fsa (ours) | 55.4ms |

Aho-Corasick automata build in microseconds (they only index the patterns, not the text). FM-index builds are slower because they index the full text -- but the payoff is O(p log sigma) queries that don't depend on text size.

FM-index vs fm-index (math-hiyoko)

Source: bench_fmindex.py · Raw data: bench_fmindex.json

Both are Rust-backed FM-indexes with PyO3 bindings. Head-to-head on 100 patterns, English-like text.

Build time (ms, median)

| Text size | fsa (ours) | fm-index (math-hiyoko) | fsa faster by |
| --- | --- | --- | --- |
| 100 KB | 5.4ms | 8.9ms | 1.6x |
| 1 MB | 59.4ms | 71.5ms | 1.2x |
| 10 MB | 819ms | 1,285ms | 1.6x |

Count query time (ms, median, both pre-built)

| Text size | fsa (ours) | fm-index (math-hiyoko) | fm-index faster by |
| --- | --- | --- | --- |
| 100 KB | 0.3ms | 0.2ms | 1.7x |
| 1 MB | 0.2ms | 0.1ms | 1.7x |
| 10 MB | 0.3ms | 0.2ms | 1.7x |

Memory

| Text size | fsa heap (ours) | fsa ratio | fm-index (pickle) | fm-index ratio |
| --- | --- | --- | --- | --- |
| 100 KB | 161 KB | 1.6x | 101 KB | 1.0x |
| 1 MB | 1.68 MB | 1.7x | 1.02 MB | 1.0x |
| 10 MB | 18.0 MB | 1.8x | 11.0 MB | 1.1x |

Summary

fm-index (math-hiyoko) wins on count query speed (~1.7x) and memory (~1.0x vs ~1.8x text size). We win on build time (~1.2-1.6x). Our differentiators are the broader feature set (SA, LCP, BWT, LCS, LPF, duplicate detection, serialization, sentence splitting), WASM/browser support, and the CLI -- none of which fm-index offers.

FM-index in the browser (JS/WASM)

Source: bench-fm-index.mjs · Raw data: bench_fm_index_js.json

The WASM FM-index is the only JavaScript library that does true substring search with an index. There is no other FM-index on npm -- the last one was published in 2016 with 31 downloads/week.

vs indexOf and modern-ahocorasick (100 patterns, varying text size)

| Text size | indexOf loop | modern-ahocorasick | fsa (ours) | fsa vs indexOf |
| --- | --- | --- | --- | --- |
| 10 KB | 0.5ms | 0.4ms | 0.3ms | 1.5x |
| 100 KB | 4.6ms | 3.1ms | 0.33ms | 14x |
| 1 MB | 45.9ms | 31.0ms | 0.36ms | 127x |
| 10 MB | 445ms | 299ms | 0.41ms | 1,074x |

FM-index query time stays flat at ~0.4ms regardless of text size. indexOf and Aho-Corasick scale linearly.

Vary pattern count on 1MB text

| Patterns | indexOf loop | modern-ahocorasick | fsa (ours) | fsa vs indexOf |
| --- | --- | --- | --- | --- |
| 10 | 4.9ms | 22.1ms | 0.06ms | 81x |
| 50 | 23.3ms | 31.5ms | 0.18ms | 130x |
| 200 | 91.4ms | 31.2ms | 0.74ms | 124x |
| 1000 | 457ms | 40.7ms | 4.0ms | 115x |

At 10MB with 100 patterns: fsa (ours) is 1,074x faster than indexOf, 721x faster than Aho-Corasick. Build cost: 61ms for 1MB text, 1.6 MB memory (1.7x text size).

FM-index memory compression

Source: bench_memory.py · Raw data: bench_memory.json

FM-index replaces both the suffix array AND the original text with a compressed self-index.

English-like text

| Text size | Raw text | SA + text (pydivsufsort) | SA/Text | fsa (ours) | fsa/Text | Savings vs SA |
| --- | --- | --- | --- | --- | --- | --- |
| 1 KB | 999 B | 4.9 KB | 5.0x | 3.7 KB | 3.8x | 25% |
| 10 KB | 9.8 KB | 48.8 KB | 5.0x | 16.8 KB | 1.7x | 66% |
| 100 KB | 97.7 KB | 488.3 KB | 5.0x | 156.8 KB | 1.6x | 68% |
| 1 MB | 976.6 KB | 4.8 MB | 5.0x | 1.6 MB | 1.7x | 66% |
| 10 MB | 9.5 MB | 47.7 MB | 5.0x | 17.2 MB | 1.8x | 64% |

Random bytes (high-entropy data)

| Text size | Raw text | SA + text (pydivsufsort) | SA/Text | fsa (ours) | fsa/Text | Savings vs SA |
| --- | --- | --- | --- | --- | --- | --- |
| 1 KB | 1,000 B | 4.9 KB | 5.0x | 3.7 KB | 3.7x | 25% |
| 10 KB | 9.8 KB | 48.8 KB | 5.0x | 16.8 KB | 1.7x | 66% |
| 100 KB | 97.7 KB | 488.3 KB | 5.0x | 156.4 KB | 1.6x | 68% |
| 1 MB | 976.6 KB | 4.8 MB | 5.0x | 1.6 MB | 1.7x | 67% |
| 10 MB | 9.5 MB | 47.7 MB | 5.0x | 17.1 MB | 1.8x | 64% |

Suffix arrays require ~5x the raw text size (a 4-byte int32 SA entry per character, plus the text itself). The FM-index compresses this to ~1.7-1.8x text size — consistently 64-68% smaller than SA + text.
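
The 5.0x SA/Text column is straightforward arithmetic, assuming one 4-byte suffix-array entry per character plus the character itself:

```python
n = 10_000_000        # 10 MB of text
sa_bytes = 4 * n      # one int32 suffix-array entry per character
ratio = (sa_bytes + n) / n
assert ratio == 5.0   # matches the SA/Text column above
```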

FM-index break-even analysis

Source: bench_breakeven.py · Raw data: bench_breakeven.json

FM-index construction is an investment. Here is exactly when it pays off vs str.count().

Per-query cost

| Text size | str.count() per query | fsa per query (ours) | fsa build time |
| --- | --- | --- | --- |
| 100 KB | 0.043ms | 0.002ms | 5.2ms |
| 1 MB | 0.442ms | 0.002ms | 55.1ms |
| 10 MB | 4.264ms | 0.003ms | 680.2ms |

Queries to amortize build cost

| Text size | fsa vs str.count() (ours) |
| --- | --- |
| 100 KB | 126 queries |
| 1 MB | 125 queries |
| 10 MB | 160 queries |
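
These counts follow from the per-query medians: divide the build cost by the per-query saving. For the 10 MB row:

```python
import math

build_ms = 680.2   # fsa build time on 10 MB text
slow = 4.264       # str.count() per query, ms
fast = 0.003       # fsa per query, ms

# Queries needed before the build cost is amortized
breakeven = math.ceil(build_ms / (slow - fast))
assert breakeven == 160
```

The 100 KB and 1 MB rows round slightly differently because the published per-query numbers are themselves rounded.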

Total time (build + N queries) on 1MB text

| Queries | str.count() | fsa (ours) | Winner |
| --- | --- | --- | --- |
| 1 | 0.4ms | 55.1ms | str.count() |
| 10 | 4.4ms | 55.1ms | str.count() |
| 100 | 44.2ms | 55.3ms | str.count() |
| 250 | 110.5ms | 55.7ms | fsa |
| 1000 | 442.0ms | 57.4ms | fsa |

FM-index vs pandas

Source: bench_pandas.py · Raw data: bench_pandas.json

Build the index once, then each pattern query runs in O(p log sigma) time. Pandas str.contains() and str.count() rescan the full text every time.

Caveat

These comparisons involve different data models. pandas returns per-row results (a Series) and uses regex by default, while the FM-index operates on concatenated text and returns a single total. The speedups reflect both the algorithmic advantage and the overhead of pandas' per-row Python object model.
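
A stdlib-only illustration of the two data models (no pandas or FM-index required):

```python
rows = ["the cat sat", "a cat and a dog", "no match here"]
pattern = "cat"

# pandas-style: one count per row (the shape Series.str.count returns)
per_row = [r.count(pattern) for r in rows]   # [1, 1, 0]

# FM-index-comparison style: a single total over the concatenated text
total = "\n".join(rows).count(pattern)       # 2

# The two agree whenever the pattern cannot span the row separator.
assert total == sum(per_row)
```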

vs pandas str.count() -- 50 patterns, varying DataFrame size (query time only)

| Rows | pandas | fsa (ours) | Speedup |
| --- | --- | --- | --- |
| 10K | 103ms | 0.1ms | 706x |
| 100K | 1,000ms | 0.1ms | 7,343x |
| 500K | 5,079ms | 0.2ms | 27,008x |

vs pandas str.contains() -- 100K rows (query time only)

| Patterns | pandas | fsa (ours) | Speedup |
| --- | --- | --- | --- |
| 5 | 71ms | 3ms | 26x |
| 10 | 147ms | 6ms | 26x |
| 50 | 724ms | 32ms | 23x |
| 100 | 1,410ms | 48ms | 29x |

DNA motif search

Source: bench_dna.py · Raw data: bench_dna.json

The classic bioinformatics use case: index a genome once, search for motifs in O(p log sigma) time per query.

19 biological motifs (EcoRI, BamHI, HindIII, TATA box, E-box, etc.) on synthetic genomes:

| Genome size | bytes.count() | fsa (ours) | fsa vs bytes.count() |
| --- | --- | --- | --- |
| 1 MB | 35.2ms | 0.024ms | 1,450x |
| 10 MB | 347.5ms | 0.029ms | 11,795x |
| 50 MB | 1,770ms | 0.032ms | 55,040x |

On a 50MB genome: bytes.count() takes 1.8 seconds -- fsa (ours) takes 0.032 milliseconds. In practice, Python bioinformaticians shell out to C tools like BWA and Bowtie2 for indexed genome search. This library brings FM-index capability natively into Python without external tool dependencies.

Suffix array construction (WASM vs mnemonist)

Source: bench-vs-mnemonist.mjs · Raw data: bench_vs_mnemonist.json

vs mnemonist (the standard JS suffix array, 8.7M downloads/week, pure JS):

| Input size | fsa (ours) | mnemonist | Speedup |
| --- | --- | --- | --- |
| 10K chars | 0.6ms | 6.9ms | 12x |
| 100K chars | 4.9ms | 77.0ms | 16x |
| 500K chars | 24.0ms | 422ms | 18x |
| 1M chars | 51.4ms | 1,050ms | 20x |

There is no other WASM-based suffix array on npm. The alternatives are pure JavaScript (slow) or native C++ addons (not portable to browsers).

SA construction backends (Rust native)

Source: bench_sa_construction.py · Raw data: bench_sa_construction.json

The default SA-IS backend is pure Rust with no size limit. Optional C backends trade portability for speed:

| Input size | SA-IS (default) | divsufsort | libsais | Fastest |
| --- | --- | --- | --- | --- |
| 1-5K | 0.02-0.08ms | 0.13-0.18ms | 0.02-0.10ms | SA-IS |
| 50K | 1.13ms | 0.91ms | 0.72ms | libsais (1.6x) |
| 500K | 10.9ms | 9.7ms | 6.9ms | libsais (1.6x) |
| 1M | 26.0ms | 20.6ms | 13.9ms | libsais (1.9x) |
| 5M | 257ms | 110ms | 66ms | libsais (3.9x) |

Longest common substring vs pylcs

Source: bench_sa_construction.py · Raw data: bench_sa_construction.json

vs pylcs (the most popular Python LCS library, C++ DP backend):

| Input size | fsa (ours) | pylcs | Speedup |
| --- | --- | --- | --- |
| 1 KB | 0.31ms | 4.02ms | 13x |
| 5 KB | 0.55ms | 109ms | 197x |
| 10 KB | 0.91ms | 444ms | 489x |

O(n+m) suffix array vs O(nm) dynamic programming -- the gap grows with input size.
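
A pure-Python sketch of the suffix-array approach (naive O(n² log n) construction for clarity; the library builds the SA in linear time with SA-IS):

```python
def lcs_length(s, t):
    """Longest common substring length via a generalized suffix array:
    the answer is the longest common prefix of two lexicographically
    adjacent suffixes that come from different source strings."""
    sep = "\x01"  # separator, assumed absent from both inputs
    u = s + sep + t
    sa = sorted(range(len(u)), key=lambda i: u[i:])  # naive SA construction

    def common_prefix(i, j):
        k = 0
        while (i + k < len(u) and j + k < len(u)
               and u[i + k] == u[j + k] and u[i + k] != sep):
            k += 1
        return k

    best = 0
    for a, b in zip(sa, sa[1:]):
        if (a < len(s)) != (b < len(s)):  # suffixes from different strings
            best = max(best, common_prefix(a, b))
    return best
```

For example, `lcs_length("xabcy", "zabcw")` returns 3 (the shared substring "abc"). The DP approach touches all n·m cell pairs; here the work is bounded by sorting plus one pass over adjacent suffix pairs.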

Duplicate phrase detection performance

Source: bench_sa_construction.py · Raw data: bench_sa_construction.json

No other library provides word-boundary-aware duplicate phrase detection:

| Input size | Time | Duplicates found |
| --- | --- | --- |
| 1 KB | 7ms | 41 |
| 5 KB | 60ms | 416 |
| 10 KB | 139ms | 875 |

King James Bible (824K words): 1.3 seconds. Pride and Prejudice (130K words): 95ms.

Verifying the substring gap

The comparison table on the home page claims fuse.js, lunr, and minisearch miss certain substring queries. To verify:

```shell
just bench-fm-js-demo
```

This runs benchmark/demo-substring-gap.mjs, which tests each library against 6 substring queries and prints a pass/fail matrix. The script installs all three libraries from npm and tests them against the WASM FM-index.


Running the benchmarks yourself

```shell
# Setup
cd bindings/python
uv run maturin develop --release
uv pip install pydivsufsort pylcs pandas rich biopython pyahocorasick ahocorasick_rs

# All commands below run from the project root via `just`:

# Python: FM-index vs top competitors
just bench-fm-scaling       # O(p log sigma) scaling independence (the money chart)
just bench-fm-multipattern  # FM-index vs Aho-Corasick
just bench-fm-memory        # Memory/compression profile
just bench-fm-breakeven     # Break-even analysis
just bench-fm-dna           # DNA motif search (bioinformatics)
just bench-fm-all           # Run all 5 above

# Python: original benchmarks
just bench-pandas           # FM-index vs pandas
just bench-fmindex          # FM-index vs str.count()
just bench-sa               # SA/LCP/BWT vs pydivsufsort

# JavaScript / WASM
just bench-fm-js            # FM-index vs modern-ahocorasick vs indexOf
just bench-fm-js-demo       # What search libraries can't do (substring gap)
```

Regenerating charts

The SVG charts in docs/assets/ were generated from benchmark output. After running benchmarks, the raw JSON in benchmark-results/ contains all the data needed to regenerate them.

Note

Chart generation scripts are not yet automated — the current SVGs were created manually from benchmark output. This is a known gap.