
Our Benchmarking Approach

Benchmarks should help clarify performance comparisons, but in vector databases, they often create more confusion than clarity. At Cosdata, we believe in a different approach focused on transparency, real-world relevance, and understandable results.

Our benchmarks are designed with three core principles:

  • Understandable by Design - We provide context for each metric and explain their real-world implications, not just isolated numbers.
  • Open and Transparent - Our methodology, configurations, and testing environment are fully documented to enable reproducible results.
  • Real-World Relevance - We test both vector search and full-text search capabilities with scenarios that mirror actual production usage.

No benchmark can perfectly simulate every use case, but we strive to make performance trade-offs more transparent and easier to reason about. The benchmarks on this page represent our ongoing commitment to honest, meaningful performance reporting.

Dense Vector Search Benchmarks

We benchmarked Cosdata against leading vector databases to evaluate performance across key metrics, including indexing time, query throughput, precision, and latency.

Benchmark Methodology

We conducted dense vector search benchmarks using a standardized approach to ensure fair and meaningful comparisons between different vector database solutions.

Dataset

  • Source: DBpedia dataset
  • Size: 1 million records
  • Dimensions: 1536-dimensional vectors
  • Format: Embedding vectors generated from titles and descriptions

Metrics

  • Indexing Time: Time to build the index (minutes)
  • QPS: Queries per second (higher is better)
  • Precision: Fraction of returned results that match the ground-truth nearest neighbors
  • Latency: Response time at p50 (median) and p95 percentiles
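The latency and throughput figures in the tables below can be derived from raw per-query timings. A minimal sketch, using nearest-rank percentiles and a single back-to-back client (our actual harness runs concurrent clients, so treat this as illustrative rather than a description of the test rig):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def single_client_qps(samples_ms):
    """Queries per second for one client issuing queries back to back."""
    return len(samples_ms) / (sum(samples_ms) / 1000.0)

latencies = [4.0, 5.0, 5.0, 6.0, 7.0, 7.0, 8.0, 9.0, 12.0, 30.0]
p50 = percentile(latencies, 50)     # median: 7.0 ms
p95 = percentile(latencies, 95)     # tail: 30.0 ms
qps = single_client_qps(latencies)  # ~107.5 queries/s
```

Note how a single slow outlier dominates p95 while barely moving p50, which is why we report both.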

Note: All benchmarks were conducted on identical hardware (8 vCPUs, 32GB RAM) with default configurations optimized for each database system. Each test was run 5 times and averaged to ensure consistency.

Performance Highlights

Cosdata demonstrates industry-leading performance across all key metrics:

  • 1758+ QPS on 1M-record datasets with 1536-dimensional vectors
  • 42% higher query throughput than Qdrant
  • 54% higher than Weaviate
  • 146% higher than Elasticsearch
  • Consistent 97% precision across challenging search tasks
  • Significantly faster indexing than Elasticsearch while maintaining superior query performance
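The percentage figures above are straightforward arithmetic on the QPS column of the benchmark table; a quick check:

```python
def percent_faster(qps_a, qps_b):
    """How much higher system A's throughput is than system B's, in percent."""
    return (qps_a / qps_b - 1) * 100

cosdata_qps = 1758
vs_qdrant = percent_faster(cosdata_qps, 1238)    # ~42%
vs_weaviate = percent_faster(cosdata_qps, 1142)  # ~54%
vs_elastic = percent_faster(cosdata_qps, 716)    # ~146%
```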

Benchmark Data

Vector DB      Indexing Time (m)   QPS    Precision   p50 (ms)   p95 (ms)
Cosdata        16.32               1758   0.97        7          8
Qdrant         24.43               1238   0.99        4          5
Weaviate       13.94               1142   0.97        5          7
Elasticsearch  83.72               716    0.98        22         73

Full-Text Search Benchmarks

We benchmarked Cosdata's full-text search capabilities against Elasticsearch across multiple datasets to evaluate performance, accuracy, and efficiency.

Benchmark Methodology

Our full-text search benchmarks were conducted using the BEIR benchmark suite, which provides a diverse set of information retrieval datasets. All benchmarks were run on identical hardware configurations to ensure a fair comparison. These benchmarks use Cosdata's custom BM25 implementation, showcasing our optimized approach to lexical search. Read our technical blog post to learn how our BM25 implementation outperforms Elasticsearch.
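For readers unfamiliar with BM25: both systems score documents with the same underlying formula, so the comparison is about implementation efficiency, not ranking logic. A minimal sketch of textbook BM25 scoring over tokenized documents (this is the standard formula only, not Cosdata's optimized implementation; all names here are illustrative):

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Textbook BM25 score of one tokenized document for a tokenized query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)       # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        freq = tf[term]
        # Length normalization: longer-than-average docs are penalized.
        norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * freq * (k1 + 1) / norm
    return score

docs = [["fast", "vector", "search"],
        ["slow", "linear", "scan"],
        ["fast", "text", "search"]]
query = ["fast", "search"]
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
```

A production engine never scores documents one by one like this; it walks inverted-index postings lists so that only documents containing a query term are touched at all.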

Metrics

  • QPS: Queries per second (higher is better)
  • NDCG@10: Normalized Discounted Cumulative Gain at 10 results
  • Latency: Response time at p50 (median) and p95 percentiles (lower is better)
  • Insertion Time: Time taken to index the entire corpus
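NDCG@10 rewards placing the most relevant documents nearest the top of the first ten results. A minimal sketch of the computation, given graded relevance labels in ranked order (function names here are illustrative):

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain: each result's relevance, discounted
    logarithmically by its rank position."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """DCG of the system's ranking divided by the DCG of the ideal
    (relevance-sorted) ranking; 1.0 means a perfect top-k ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0])  # ideal order -> 1.0
swapped = ndcg_at_k([2, 3, 1, 0])  # top two swapped -> below 1.0
```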

Datasets

  • arguana: Argument quality and convincingness assessment dataset
  • climate-fever: Climate fact-checking dataset based on FEVER methodology
  • fever: Fact Extraction and Verification dataset
  • fiqa: Financial opinion mining and question answering dataset
  • msmarco: Microsoft Machine Reading Comprehension dataset
  • nq: Natural Questions dataset for open-domain question answering
  • quora: Quora duplicate questions dataset
  • scidocs: Scientific document classification and recommendation dataset
  • scifact: Scientific fact checking dataset
  • trec-covid: COVID-19 information retrieval dataset
  • webis-touche2020: Argument retrieval dataset on controversial topics

Performance Highlights

  • Cosdata's custom BM25 implementation achieves up to 151x the QPS of Elasticsearch on the scifact dataset, averaging roughly 44x across the smaller (<500k-document) datasets
  • Cosdata maintains comparable ranking quality (NDCG@10) to Elasticsearch while delivering far higher throughput
  • Index creation is significantly faster with Cosdata, up to 12x faster on large datasets such as msmarco
  • Cosdata shows lower p50 latency on every tested dataset and lower p95 latency on most; on the largest corpora (climate-fever, fever, msmarco) Elasticsearch's p95 is lower

Benchmark Data


Across the smaller datasets (under 500k documents), Cosdata delivers roughly 44.5x faster query throughput than Elasticsearch.

*Dataset labels include corpus size in documents
Dataset (Corpus Size)        System         Insertion Time (s)   QPS     NDCG@10   p50 (ms)   p95 (ms)
arguana (8,674)              Cosdata        0.1                  2167    0.40      9          15
arguana (8,674)              Elasticsearch  1.4                  263     0.48      44         74
climate-fever (5,416,593)    Cosdata        40.6                 135     0.13      106        379
climate-fever (5,416,593)    Elasticsearch  522.8                84      0.14      162        263
fever (5,416,568)            Cosdata        40.3                 314     0.47      52         157
fever (5,416,568)            Elasticsearch  525.7                154     0.52      80         138
fiqa (57,638)                Cosdata        0.5                  4942    0.25      7          12
fiqa (57,638)                Elasticsearch  6.7                  251     0.25      39         60
msmarco (8,841,823)          Cosdata        57.7                 315     0.23      46         162
msmarco (8,841,823)          Elasticsearch  714.7                166     0.23      73         129
nq (2,681,468)               Cosdata        19.3                 483     0.29      30         81
nq (2,681,468)               Elasticsearch  243.2                197     0.29      59         100
quora (522,931)              Cosdata        2.7                  1425    0.81      11         36
quora (522,931)              Elasticsearch  30.2                 323     0.81      39         55
scidocs (25,657)             Cosdata        0.3                  13338   0.16      7          12
scidocs (25,657)             Elasticsearch  3.6                  319     0.15      33         48
scifact (5,183)              Cosdata        0.1                  40909   0.69      7          13
scifact (5,183)              Elasticsearch  1.0                  271     0.68      34         51
trec-covid (171,332)         Cosdata        1.7                  2219    0.61      10         18
trec-covid (171,332)         Elasticsearch  22.1                 110     0.62      57         88
webis-touche2020 (382,545)   Cosdata        5.5                  2789    0.34      10         18
webis-touche2020 (382,545)   Elasticsearch  63.1                 108     0.34      62         99
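The headline multipliers can be sanity-checked directly from the QPS columns above. Taking the datasets under 500k documents (quora, at 522,931, falls just above the cutoff):

```python
# (cosdata_qps, elasticsearch_qps) for datasets under 500k documents,
# copied from the table above
small_datasets = {
    "arguana":          (2167, 263),
    "fiqa":             (4942, 251),
    "scidocs":          (13338, 319),
    "scifact":          (40909, 271),
    "trec-covid":       (2219, 110),
    "webis-touche2020": (2789, 108),
}
speedups = {name: cos / es for name, (cos, es) in small_datasets.items()}
max_speedup = max(speedups.values())                  # ~151x, on scifact
avg_speedup = sum(speedups.values()) / len(speedups)  # ~44.4x
```

This is a plain arithmetic mean of per-dataset ratios; a geometric mean would damp the scifact outlier and is the more conservative way to average speedups.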

Try Cosdata

Experience the performance benefits of Cosdata in your own applications. Our open-source implementation is available on GitHub.