
Our Benchmarking Approach

Benchmarks should help clarify performance comparisons, but in vector databases, they often create more confusion than clarity. At Cosdata, we believe in a different approach focused on transparency, real-world relevance, and understandable results.

Our benchmarks are designed with three core principles:

  • Understandable by Design - We provide context for each metric and explain their real-world implications, not just isolated numbers.
  • Open and Transparent - Our methodology, configurations, and testing environment are fully documented to enable reproducible results.
  • Real-World Relevance - We test both vector search and full-text search capabilities with scenarios that mirror actual production usage.

No benchmark can perfectly simulate every use case, but we strive to make performance trade-offs more transparent and easier to reason about. The benchmarks on this page represent our ongoing commitment to honest, meaningful performance reporting.

Dense Vector Search Benchmarks

We benchmarked Cosdata against leading vector databases to evaluate performance across key metrics, including indexing time, query throughput, precision, and latency.

Benchmark Methodology

We conducted dense vector search benchmarks using a standardized approach to ensure fair and meaningful comparisons between different vector database solutions.

Dataset

  • Source: DBpedia dataset
  • Size: 1 million records
  • Dimensions: 1536-dimensional vectors
  • Format: Embedding vectors generated from titles and descriptions

Metrics

  • Indexing Time: Time to build the index (minutes)
  • QPS: Queries per second (higher is better)
  • Precision: Fraction of returned results that match the ground-truth nearest neighbors
  • Latency: Response time at p50 (median) and p95 percentiles
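The latency and throughput figures in the tables below can be derived from raw per-query timings. A minimal sketch, using nearest-rank percentiles and a single back-to-back client (our actual harness runs concurrent clients, so treat this as illustrative rather than a description of the test rig):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def single_client_qps(samples_ms):
    """Queries per second for one client issuing queries back to back."""
    return len(samples_ms) / (sum(samples_ms) / 1000.0)

latencies = [4.0, 5.0, 5.0, 6.0, 7.0, 7.0, 8.0, 9.0, 12.0, 30.0]
p50 = percentile(latencies, 50)     # median: 7.0 ms
p95 = percentile(latencies, 95)     # tail: 30.0 ms
qps = single_client_qps(latencies)  # ~107.5 queries/s
```

Note how a single slow outlier dominates p95 while barely moving p50, which is why we report both.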

Note: All benchmarks were conducted on identical hardware (8 vCPUs, 32GB RAM) with default configurations optimized for each database system. Each test was run 5 times and averaged to ensure consistency.

Performance Highlights

Cosdata demonstrates industry-leading performance across all key metrics:

  • 1758+ QPS on 1M-record datasets with 1536-dimensional vectors
  • 42% higher query throughput than Qdrant
  • 54% higher than Weaviate
  • 146% higher than Elasticsearch
  • Consistent 97% precision across challenging search tasks
  • Significantly faster indexing than Elasticsearch while maintaining superior query performance
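The percentage figures above are straightforward arithmetic on the QPS column of the benchmark table; a quick check:

```python
def percent_faster(qps_a, qps_b):
    """How much higher system A's throughput is than system B's, in percent."""
    return (qps_a / qps_b - 1) * 100

cosdata_qps = 1758
vs_qdrant = percent_faster(cosdata_qps, 1238)    # ~42%
vs_weaviate = percent_faster(cosdata_qps, 1142)  # ~54%
vs_elastic = percent_faster(cosdata_qps, 716)    # ~146%
```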

Benchmark Data

Vector DB      Indexing Time (m)   QPS    Precision   p50 (ms)   p95 (ms)
Cosdata        16.32               1758   0.97        7          8
Qdrant         24.43               1238   0.99        4          5
Weaviate       13.94               1142   0.97        5          7
Elasticsearch  83.72               716    0.98        22         73

Full-Text Search Benchmarks

We benchmarked Cosdata's full-text search capabilities against Elasticsearch across multiple datasets to evaluate performance, accuracy, and efficiency.

Benchmark Methodology

Our full-text search benchmarks were conducted using the BEIR benchmark suite, which provides a diverse set of information retrieval datasets. All benchmarks were run on identical hardware configurations to ensure a fair comparison. These benchmarks use Cosdata's custom BM25 implementation, showcasing our optimized approach to lexical search. Read our technical blog post to learn how our BM25 implementation outperforms Elasticsearch.
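For readers unfamiliar with BM25: both systems score documents with the same underlying formula, so the comparison is about implementation efficiency, not ranking logic. A minimal sketch of textbook BM25 scoring over tokenized documents (this is the standard formula only, not Cosdata's optimized implementation; all names here are illustrative):

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Textbook BM25 score of one tokenized document for a tokenized query."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)       # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        freq = tf[term]
        # Length normalization: longer-than-average docs are penalized.
        norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * freq * (k1 + 1) / norm
    return score

docs = [["fast", "vector", "search"],
        ["slow", "linear", "scan"],
        ["fast", "text", "search"]]
query = ["fast", "search"]
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
```

A production engine never scores documents one by one like this; it walks inverted-index postings lists so that only documents containing a query term are touched at all.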

Metrics

  • QPS: Queries per second (higher is better)
  • NDCG@10: Normalized Discounted Cumulative Gain at 10 results
  • Latency: Response time at p50 (median) and p95 percentiles (lower is better)
  • Insertion Time: Time taken to index the entire corpus
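NDCG@10 rewards placing the most relevant documents nearest the top of the first ten results. A minimal sketch of the computation, given graded relevance labels in ranked order (function names here are illustrative):

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain: each result's relevance, discounted
    logarithmically by its rank position."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """DCG of the system's ranking divided by the DCG of the ideal
    (relevance-sorted) ranking; 1.0 means a perfect top-k ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0])  # ideal order -> 1.0
swapped = ndcg_at_k([2, 3, 1, 0])  # top two swapped -> below 1.0
```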

Datasets

  • arguana: Argument quality and convincingness assessment dataset
  • climate-fever: Climate fact-checking dataset based on FEVER methodology
  • fever: Fact Extraction and Verification dataset
  • fiqa: Financial opinion mining and question answering dataset
  • msmarco: Microsoft Machine Reading Comprehension dataset
  • nq: Natural Questions dataset for open-domain question answering
  • quora: Quora duplicate questions dataset
  • scidocs: Scientific document classification and recommendation dataset
  • scifact: Scientific fact checking dataset
  • trec-covid: COVID-19 information retrieval dataset
  • webis-touche2020: Argument retrieval dataset on controversial topics

Performance Highlights

  • Cosdata's custom BM25 implementation achieves up to 151x the QPS of Elasticsearch on the scifact dataset, averaging roughly 44x across the smaller (<500k-document) datasets
  • Cosdata maintains comparable ranking quality (NDCG@10) to Elasticsearch while delivering far higher throughput
  • Index creation is significantly faster with Cosdata, up to 12x faster on large datasets such as msmarco
  • Cosdata shows lower p50 latency on every tested dataset and lower p95 latency on most; on the largest corpora (climate-fever, fever, msmarco) Elasticsearch's p95 is lower

Benchmark Data


Across the smaller datasets (under 500k documents), Cosdata delivers roughly 44.5x faster query throughput than Elasticsearch.

*Dataset labels include corpus size in documents
Dataset (Corpus Size)        System         Insertion Time (s)   QPS     NDCG@10   p50 (ms)   p95 (ms)
arguana (8,674)              Cosdata        0.1                  2167    0.40      9          15
arguana (8,674)              Elasticsearch  1.4                  263     0.48      44         74
climate-fever (5,416,593)    Cosdata        40.6                 135     0.13      106        379
climate-fever (5,416,593)    Elasticsearch  522.8                84      0.14      162        263
fever (5,416,568)            Cosdata        40.3                 314     0.47      52         157
fever (5,416,568)            Elasticsearch  525.7                154     0.52      80         138
fiqa (57,638)                Cosdata        0.5                  4942    0.25      7          12
fiqa (57,638)                Elasticsearch  6.7                  251     0.25      39         60
msmarco (8,841,823)          Cosdata        57.7                 315     0.23      46         162
msmarco (8,841,823)          Elasticsearch  714.7                166     0.23      73         129
nq (2,681,468)               Cosdata        19.3                 483     0.29      30         81
nq (2,681,468)               Elasticsearch  243.2                197     0.29      59         100
quora (522,931)              Cosdata        2.7                  1425    0.81      11         36
quora (522,931)              Elasticsearch  30.2                 323     0.81      39         55
scidocs (25,657)             Cosdata        0.3                  13338   0.16      7          12
scidocs (25,657)             Elasticsearch  3.6                  319     0.15      33         48
scifact (5,183)              Cosdata        0.1                  40909   0.69      7          13
scifact (5,183)              Elasticsearch  1.0                  271     0.68      34         51
trec-covid (171,332)         Cosdata        1.7                  2219    0.61      10         18
trec-covid (171,332)         Elasticsearch  22.1                 110     0.62      57         88
webis-touche2020 (382,545)   Cosdata        5.5                  2789    0.34      10         18
webis-touche2020 (382,545)   Elasticsearch  63.1                 108     0.34      62         99
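The headline multipliers can be sanity-checked directly from the QPS columns above. Taking the datasets under 500k documents (quora, at 522,931, falls just above the cutoff):

```python
# (cosdata_qps, elasticsearch_qps) for datasets under 500k documents,
# copied from the table above
small_datasets = {
    "arguana":          (2167, 263),
    "fiqa":             (4942, 251),
    "scidocs":          (13338, 319),
    "scifact":          (40909, 271),
    "trec-covid":       (2219, 110),
    "webis-touche2020": (2789, 108),
}
speedups = {name: cos / es for name, (cos, es) in small_datasets.items()}
max_speedup = max(speedups.values())                  # ~151x, on scifact
avg_speedup = sum(speedups.values()) / len(speedups)  # ~44.4x
```

This is a plain arithmetic mean of per-dataset ratios; a geometric mean would damp the scifact outlier and is the more conservative way to average speedups.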

Try Cosdata

Experience the performance benefits of Cosdata in your own applications. Our open-source implementation is available on GitHub.