Our Benchmarking Approach
Benchmarks should help clarify performance comparisons, but in vector databases, they often create more confusion than clarity. At Cosdata, we believe in a different approach focused on transparency, real-world relevance, and understandable results.
Our benchmarks are designed with three core principles:
- Understandable by Design - We provide context for each metric and explain its real-world implications rather than presenting isolated numbers.
- Open and Transparent - Our methodology, configurations, and testing environment are fully documented to enable reproducible results.
- Real-World Relevance - We test both vector search and full-text search capabilities with scenarios that mirror actual production usage.
No benchmark can perfectly simulate every use case, but we strive to make performance trade-offs more transparent and easier to reason about. The benchmarks on this page represent our ongoing commitment to honest, meaningful performance reporting.
Dense Vector Search Benchmarks
We benchmarked Cosdata against leading vector databases to evaluate performance across key metrics, including indexing time, query throughput, precision, and latency.
Benchmark Methodology
We conducted dense vector search benchmarks using a standardized approach to ensure fair and meaningful comparisons between different vector database solutions.
Dataset
- Source: DbPedia dataset
- Size: 1 million records
- Dimensions: 1536-dimensional vectors
- Format: Embedding vectors generated from titles and descriptions
Metrics
- Indexing Time: Time to build the index (minutes)
- QPS: Queries per second (higher is better)
- Precision: Accuracy of results compared to ground truth
- Latency: Response time at p50 (median) and p95 percentiles
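As a rough illustration of how these metrics can be derived from raw measurements, here is a minimal sketch. It is not the harness used for this page: the function names, the nearest-rank percentile method, the set-overlap definition of precision, and the sample numbers are all assumptions made for the example.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def precision(retrieved_ids, ground_truth_ids):
    """Fraction of returned IDs that also appear in the ground-truth result set."""
    if not retrieved_ids:
        return 0.0
    truth = set(ground_truth_ids)
    return sum(1 for doc_id in retrieved_ids if doc_id in truth) / len(retrieved_ids)

def summarize(latencies_ms, wall_clock_seconds):
    """QPS is completed queries divided by total wall-clock time for the run."""
    return {
        "qps": len(latencies_ms) / wall_clock_seconds,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
    }

# Hypothetical per-query latencies (ms) from one run that took 0.5 s in total.
latencies = [4, 5, 5, 6, 6, 7, 7, 8, 9, 30]
stats = summarize(latencies, wall_clock_seconds=0.5)
hit_rate = precision([1, 2, 3, 9], [1, 2, 3, 4])
```

A single slow query (30 ms here) leaves p50 untouched but dominates p95, which is why both percentiles are reported.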
Note: All benchmarks were conducted on identical hardware (8 vCPUs, 32GB RAM) with default configurations optimized for each database system. Each test was run 5 times and averaged to ensure consistency.
Performance Highlights
Cosdata delivers the highest query throughput of the four systems tested, with competitive precision and single-digit-millisecond latency:
- Highest throughput in the comparison: 1758 QPS on a 1M-record dataset with 1536-dimensional vectors
- 42% higher QPS than Qdrant
- 54% higher QPS than Weaviate
- 146% higher QPS than Elasticsearch
- 97% precision, on par with Weaviate and within two points of Qdrant (0.99) and Elasticsearch (0.98)
- Indexing about 5x faster than Elasticsearch (16.32 vs. 83.72 minutes) while also delivering higher query throughput
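The percentage figures can be reproduced from the QPS column of the Benchmark Data table. The sketch below assumes "X% faster" means the relative QPS gain over the baseline; the helper name `percent_faster` is introduced here for illustration only.

```python
# QPS values copied from the dense vector Benchmark Data table.
qps = {"Cosdata": 1758, "Qdrant": 1238, "Weaviate": 1142, "Elasticsearch": 716}

def percent_faster(candidate, baseline):
    """Relative throughput gain of candidate over baseline, in percent."""
    return (candidate / baseline - 1) * 100

gains = {
    name: round(percent_faster(qps["Cosdata"], value))
    for name, value in qps.items()
    if name != "Cosdata"
}
print(gains)  # {'Qdrant': 42, 'Weaviate': 54, 'Elasticsearch': 146}
```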
Benchmark Data
Vector DB | Indexing Time (min) | QPS | Precision | p50 (ms) | p95 (ms) |
---|---|---|---|---|---|
Cosdata | 16.32 | 1758 | 0.97 | 7 | 8 |
Qdrant | 24.43 | 1238 | 0.99 | 4 | 5 |
Weaviate | 13.94 | 1142 | 0.97 | 5 | 7 |
Elasticsearch | 83.72 | 716 | 0.98 | 22 | 73 |
Full-Text Search Benchmarks
We benchmarked Cosdata's full-text search capabilities against Elasticsearch across multiple datasets to evaluate performance, accuracy, and efficiency.
Benchmark Methodology
Our full-text search benchmarks were conducted using the BEIR benchmark suite, which provides a diverse set of information retrieval datasets. All benchmarks were run on identical hardware configurations to ensure fair comparison. Cosdata's custom BM25 implementation was used for these benchmarks, showcasing our optimized approach to lexical search. Read our technical blog post to learn how our BM25 implementation outperforms Elasticsearch.
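For readers unfamiliar with BM25, the sketch below shows the textbook Okapi BM25 scoring function. It is not Cosdata's implementation, which is described in the blog post; the `bm25_scores` name, the pre-tokenized input, the smoothed IDF form, and the `k1` and `b` values (common defaults) are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, already tokenized.
docs = [
    ["vector", "search", "engine"],
    ["full", "text", "search"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["vector", "search"], docs)
```

In this toy corpus the first document matches both query terms, the second matches one, and the third matches none, so the scores come out in that order.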
Metrics
- QPS: Queries per second (higher is better)
- NDCG@10: Normalized Discounted Cumulative Gain at 10 results
- Latency: Response time at p50 (median) and p95 percentiles (lower is better)
- Insertion Time: Time taken to index the entire corpus
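NDCG@10 is the least self-explanatory of these metrics, so a small worked sketch may help. It assumes the linear-gain form (gain equals the relevance label, discount is log2 of rank + 1); the function names and the sample judgments are hypothetical.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results, in ranked order."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical query with two relevant documents, returned at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1, 0, 0], all_relevances=[1, 1])
# A ranking that places both relevant documents first scores 1.0.
perfect = ndcg_at_k([1, 1, 0, 0, 0], all_relevances=[1, 1])
```

A score of 1.0 means the top 10 results are in the best possible order; pushing a relevant document further down lowers the score.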
Datasets
- arguana: Counterargument retrieval dataset
- climate-fever: Climate fact-checking dataset based on FEVER methodology
- fever: Fact Extraction and Verification dataset
- fiqa: Financial opinion mining and question answering dataset
- msmarco: Microsoft Machine Reading Comprehension dataset
- nq: Natural Questions dataset for open-domain question answering
- quora: Quora duplicate questions dataset
- scidocs: Scientific document classification and recommendation dataset
- scifact: Scientific fact checking dataset
- trec-covid: COVID-19 information retrieval dataset
- webis-touche2020: Argument retrieval dataset on controversial topics
Performance Highlights
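- Cosdata's custom BM25 implementation achieves up to 151x the QPS of Elasticsearch (on the scifact dataset) and averages about 44x across the small datasets (<500k documents). On the five larger corpora the gain is roughly 1.6x to 4.4x
- Cosdata's ranking quality (NDCG@10) is within 0.01 of Elasticsearch on 9 of 11 datasets; Elasticsearch scores higher on arguana (0.48 vs. 0.40) and fever (0.52 vs. 0.47)
- Index creation is consistently faster with Cosdata, by about 12x to 13x on the corpora with more than a million documents
- Cosdata shows lower p50 latency on every dataset and lower p95 latency on 8 of 11; Elasticsearch has the lower p95 on climate-fever, fever, and msmarco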
Benchmark Data
Cosdata delivers 44.5x faster query throughput across small datasets (<500k documents).
Dataset (Corpus Size) | System | Insertion Time (s) | QPS | NDCG@10 | p50 Latency (ms) | p95 Latency (ms) |
---|---|---|---|---|---|---|
arguana (8,674) | Cosdata | 0.1 | 2167 | 0.40 | 9 | 15 |
arguana (8,674) | Elasticsearch | 1.4 | 263 | 0.48 | 44 | 74 |
climate-fever (5,416,593) | Cosdata | 40.6 | 135 | 0.13 | 106 | 379 |
climate-fever (5,416,593) | Elasticsearch | 522.8 | 84 | 0.14 | 162 | 263 |
fever (5,416,568) | Cosdata | 40.3 | 314 | 0.47 | 52 | 157 |
fever (5,416,568) | Elasticsearch | 525.7 | 154 | 0.52 | 80 | 138 |
fiqa (57,638) | Cosdata | 0.5 | 4942 | 0.25 | 7 | 12 |
fiqa (57,638) | Elasticsearch | 6.7 | 251 | 0.25 | 39 | 60 |
msmarco (8,841,823) | Cosdata | 57.7 | 315 | 0.23 | 46 | 162 |
msmarco (8,841,823) | Elasticsearch | 714.7 | 166 | 0.23 | 73 | 129 |
nq (2,681,468) | Cosdata | 19.3 | 483 | 0.29 | 30 | 81 |
nq (2,681,468) | Elasticsearch | 243.2 | 197 | 0.29 | 59 | 100 |
quora (522,931) | Cosdata | 2.7 | 1425 | 0.81 | 11 | 36 |
quora (522,931) | Elasticsearch | 30.2 | 323 | 0.81 | 39 | 55 |
scidocs (25,657) | Cosdata | 0.3 | 13338 | 0.16 | 7 | 12 |
scidocs (25,657) | Elasticsearch | 3.6 | 319 | 0.15 | 33 | 48 |
scifact (5,183) | Cosdata | 0.1 | 40909 | 0.69 | 7 | 13 |
scifact (5,183) | Elasticsearch | 1.0 | 271 | 0.68 | 34 | 51 |
trec-covid (171,332) | Cosdata | 1.7 | 2219 | 0.61 | 10 | 18 |
trec-covid (171,332) | Elasticsearch | 22.1 | 110 | 0.62 | 57 | 88 |
webis-touche2020 (382,545) | Cosdata | 5.5 | 2789 | 0.34 | 10 | 18 |
webis-touche2020 (382,545) | Elasticsearch | 63.1 | 108 | 0.34 | 62 | 99 |
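The throughput ratios can be recomputed from the table above. The sketch below assumes the "small datasets" average is the unweighted mean of per-dataset QPS ratios for corpora under 500k documents. Because the table shows rounded QPS values, the result lands near, not exactly on, the 44.5x figure.

```python
# (corpus size, Cosdata QPS, Elasticsearch QPS), copied from the table above.
results = {
    "arguana": (8_674, 2167, 263),
    "climate-fever": (5_416_593, 135, 84),
    "fever": (5_416_568, 314, 154),
    "fiqa": (57_638, 4942, 251),
    "msmarco": (8_841_823, 315, 166),
    "nq": (2_681_468, 483, 197),
    "quora": (522_931, 1425, 323),
    "scidocs": (25_657, 13338, 319),
    "scifact": (5_183, 40909, 271),
    "trec-covid": (171_332, 2219, 110),
    "webis-touche2020": (382_545, 2789, 108),
}

# Per-dataset throughput ratio: Cosdata QPS divided by Elasticsearch QPS.
speedup = {name: cos / es for name, (_, cos, es) in results.items()}

# Unweighted mean over corpora with fewer than 500k documents.
small = [speedup[name] for name, (size, _, _) in results.items() if size < 500_000]
avg_small = sum(small) / len(small)

best = max(speedup, key=speedup.get)
print(f"small-dataset average: {avg_small:.1f}x, peak: {best} at {speedup[best]:.0f}x")
```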
Try Cosdata
Experience the performance benefits of Cosdata in your own applications. Our open-source implementation is available on GitHub.