Optimize Probe for large codebases and demanding workloads.
# Optimized search
PROBE_PARSER_POOL_SIZE=8 probe search "query" ./ \
--language rust \
--max-results 50 \
--session my-search
# Skip warmup for scripts
PROBE_NO_PARSER_WARMUP=1 probe search "query" ./
# Debug performance
DEBUG=1 probe search "query" ./Probe uses multiple optimization techniques:
┌─────────────────────────────────────────────────────┐
│ Search Pipeline │
├─────────────────────────────────────────────────────┤
│ 1. File Scanning (ripgrep) │ Parallel │
│ 2. File Filtering (.gitignore) │ O(1) lookup │
│ 3. Content Matching (SIMD) │ Vectorized │
│ 4. AST Parsing (tree-sitter) │ Cached │
│ 5. Block Extraction │ Streamed │
│ 6. Ranking (BM25/SIMD) │ Vectorized │
│ 7. Result Assembly │ Lazy │
└─────────────────────────────────────────────────────┘
Control parallel parsing threads:
# Default: number of CPU cores
PROBE_PARSER_POOL_SIZE=8 probe search "query" ./
# For I/O bound workloads (network drives, SSDs)
PROBE_PARSER_POOL_SIZE=16 probe search "query" ./
# For CPU bound workloads (complex AST patterns)
PROBE_PARSER_POOL_SIZE=4 probe search "query" ./Recommendation:
- SSD: 2x CPU cores
- HDD: 1x CPU cores
- Network: 4x CPU cores (hide latency)
Cache parsed ASTs for repeated queries:
# Increase cache for large codebases
PROBE_TREE_CACHE_SIZE=1000 probe search "query" ./
# Disable caching (low memory environments)
PROBE_TREE_CACHE_SIZE=0 probe search "query" ./Default: Automatic based on available memory
Skip warmup for faster cold starts:
# Skip warmup (faster first query, slower subsequent)
PROBE_NO_PARSER_WARMUP=1 probe search "query" ./
# Default behavior: warm up parsers on first use
probe search "query" ./Use cases:
- CI/CD pipelines (single query)
- Script automation
- Quick one-off searches
Probe uses SIMD (Single Instruction, Multiple Data) for:
- Tokenization: memchr, aho-corasick
- Ranking: simsimd for dot products
- Pattern Matching: Vectorized string operations
For debugging or compatibility:
# Disable all SIMD
DISABLE_SIMD_TOKENIZATION=1 \
DISABLE_SIMD_RANKING=1 \
DISABLE_SIMD_PATTERN_MATCHING=1 \
probe search "query" ./
# Disable specific features
DISABLE_SIMD_RANKING=1 probe search "query" ./Note: Disabling SIMD reduces performance by 2-10x.
# Stop after 20 results (fastest)
probe search "function" ./ --max-results 20
# Limit by token count (for AI)
probe search "function" ./ --max-tokens 8000
# Limit by bytes
probe search "function" ./ --max-bytes 50000# Search only Rust files (skip parsing others)
probe search "struct" ./ --language rust
# Use search hints
probe search "interface AND lang:typescript" ./
# Filter by extension
probe search "class AND ext:py" ./# First batch
probe search "api" ./ --session api-search --max-results 100
# Subsequent batches (deduplicated)
probe search "api" ./ --session api-search --max-results 100Sessions cache:
- Parsed file list
- MD5 hashes for invalidation
- Previously returned blocks
For maximum speed, disable block merging:
probe search "function" ./ --no-merge# Limit results to prevent memory issues
probe search "query" ./ --max-results 100
# Use pagination
probe search "query" ./ --session my-search --max-results 50
# Process files sequentially
PROBE_PARSER_POOL_SIZE=1 probe search "query" ./# On Linux
/usr/bin/time -v probe search "query" ./
# On macOS
/usr/bin/time -l probe search "query" ./# Run all benchmarks
probe benchmark
# Run specific benchmark
probe benchmark --bench search
# Quick benchmarks only
probe benchmark --fast
# Save results
probe benchmark --format json --output results.json
# Compare with baseline
probe benchmark --compare --baseline previous.json# Verbose output with timing
DEBUG=1 probe search "query" ./
# Time multiple runs
for i in {1..5}; do
time probe search "query" ./ --max-results 10 2>&1 | tail -1
done# With cargo flamegraph (requires installation)
cargo flamegraph -- search "query" ./path
# With perf (Linux)
perf record probe search "query" ./
perf report| Reranker | Speed | Quality | Use Case |
|---|---|---|---|
bm25 |
Fast | Good | Default, most queries |
tfidf |
Fast | Good | Exact term matching |
hybrid |
Medium | Better | Balanced |
hybrid2 |
Medium | Better | File-aware ranking |
ms-marco-* |
Slow | Best | When quality matters |
# Default (fast)
probe search "auth" ./
# High quality (slower)
probe search "auth" ./ --reranker ms-marco-tinybert# Bad: search everything
probe search "function" ./
# Good: search specific directories
probe search "function" ./src
# Better: multiple specific paths
probe search "function" ./src/core ./src/api# Exclude large directories
probe search "config" ./ \
--ignore "node_modules/*" \
--ignore "vendor/*" \
--ignore "*.min.js"| Format | Speed | Use Case |
|---|---|---|
outline |
Fastest | Interactive use |
plain |
Fast | Piping to tools |
json |
Medium | Machine processing |
color |
Medium | Terminal display |
markdown |
Slow | Documentation |
# Fast output for scripts
probe search "query" ./ --format plain
# Machine readable
probe search "query" ./ --format jsonTypical performance on a medium codebase (100k lines):
| Operation | Cold Start | Warm |
|---|---|---|
| Simple search | 200ms | 50ms |
| With AST parsing | 500ms | 100ms |
| Full ranking | 800ms | 200ms |
| BERT reranking | 2000ms | 1500ms |
| Tool | 100k files | Memory |
|---|---|---|
| grep | 2s | 10MB |
| ripgrep | 0.5s | 50MB |
| Probe (search) | 0.8s | 100MB |
| Probe (query) | 1.5s | 200MB |
DEBUG=1 probe search "query" ./Output shows:
- File scanning time
- Parsing time per file type
- Ranking time
- Total result count
- Memory usage
[DEBUG] Scan: 150ms (1234 files)
[DEBUG] Filter: 20ms (890 files)
[DEBUG] Parse: 400ms (890 files) ← Parser pool too small?
[DEBUG] Rank: 50ms (234 blocks)
[DEBUG] Output: 30ms
[DEBUG] Total: 650ms
| Bottleneck | Symptom | Solution |
|---|---|---|
| File I/O | Slow scan phase | Use SSD, increase pool |
| Parsing | Slow parse phase | Increase PARSER_POOL_SIZE |
| Memory | OOM errors | Limit results, use sessions |
| Network | Very slow scan | Cache locally, increase pool |
- Search Command - Search options
- Query Command - Query optimization
- Environment Variables - All variables
- Troubleshooting - Common issues