A high-performance command-line interface for working with attention mechanisms.
## Features

- **Multiple attention types**: scaled dot-product, multi-head, hyperbolic, flash, linear, and MoE
- **Compute**: process attention on input data with various configurations
- **Benchmark**: performance testing across different dimensions and attention types
- **Convert**: transform data between JSON, binary, MessagePack, and CSV formats
- **Serve**: HTTP server with a REST API for attention computation
- **REPL**: interactive shell for exploratory analysis
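For reference, the default `scaled_dot` type computes standard scaled dot-product attention, `softmax(QKᵀ/√d)·V`. A minimal NumPy sketch of that formula (toy shapes, not the tool's actual implementation):

```python
import numpy as np

def scaled_dot_attention(query, keys, values):
    """softmax(Q K^T / sqrt(d)) V, the standard formulation."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)          # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values                       # (n_q, d_v)

q = np.array([[0.1, 0.2, 0.3]])
k = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
v = np.array([[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]])
out = scaled_dot_attention(q, k, v)
print(out.shape)  # (1, 3)
```

Each output row is a softmax-weighted mix of the value rows, which is why the attention types below all accept the same query/keys/values input.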
## Installation

```bash
cargo install --path .
```

## Usage

### Compute

```bash
# Scaled dot-product attention
ruvector-attention compute -i input.json -o output.json -a scaled_dot
# Multi-head attention with 16 heads
ruvector-attention compute -i input.json -a multi_head --num-heads 16
# Hyperbolic attention with custom curvature
ruvector-attention compute -i input.json -a hyperbolic --curvature 2.0
# Flash attention (memory-efficient)
ruvector-attention compute -i input.json -a flash
# Mixture of Experts attention
ruvector-attention compute -i input.json -a moe --num-experts 8 --top-k 2
```

### Benchmark

```bash
# Benchmark all attention types
ruvector-attention benchmark
# Benchmark specific types
ruvector-attention benchmark -a scaled_dot,multi_head,flash
# Custom dimensions
ruvector-attention benchmark -d 256,512,1024 -i 1000
# Output to CSV
ruvector-attention benchmark -o results.csv -f csv
```

### Convert

```bash
# JSON to MessagePack
ruvector-attention convert -i data.json -o data.msgpack --to msgpack
# Binary to JSON (pretty-printed)
ruvector-attention convert -i data.bin -o data.json --to json --pretty
# Auto-detect input format
ruvector-attention convert -i input.dat -o output.json --to json
```

### Serve

```bash
# Default (localhost:8080)
ruvector-attention serve
# Custom host and port
ruvector-attention serve -H 0.0.0.0 -p 3000
# With CORS enabled
ruvector-attention serve --cors
```

### REPL

```bash
# Start REPL
ruvector-attention repl
# Commands within REPL:
attention> help
attention> load data.json
attention> type multi_head
attention> compute
attention> config
attention> quit
```

## HTTP API

When running the server, the following endpoints are available:
- `GET /health` - Health check
- `POST /attention/scaled_dot` - Scaled dot-product attention
- `POST /attention/multi_head` - Multi-head attention
- `POST /attention/hyperbolic` - Hyperbolic attention
- `POST /attention/flash` - Flash attention
- `POST /attention/linear` - Linear attention
- `POST /attention/moe` - Mixture of Experts attention
- `POST /batch` - Batch computation
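The attention endpoints can also be called from Python. A minimal client sketch using only the standard library (assumes a server started with the default host and port; the response schema is not documented here, so the reply is simply returned as parsed JSON):

```python
import json
import urllib.request

# Toy payload matching the input schema: query/keys/values as 2-D lists.
payload = {
    "query": [[0.1, 0.2, 0.3]],
    "keys": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "values": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
}

def post_attention(kind="scaled_dot", host="http://localhost:8080"):
    """POST the payload to /attention/<kind> and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{host}/attention/{kind}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# post_attention()  # requires a running `ruvector-attention serve`
```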
Example request:

```bash
curl -X POST http://localhost:8080/attention/scaled_dot \
  -H "Content-Type: application/json" \
  -d '{
    "query": [[0.1, 0.2, 0.3]],
    "keys": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "values": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
  }'
```

## Configuration

Create a `ruvector-attention.toml` file:

```toml
[attention]
default_dim = 512
default_heads = 8
default_type = "scaled_dot"
[server]
host = "0.0.0.0"
port = 8080
max_batch_size = 32
[output]
format = "json"
pretty = true
[benchmark]
iterations = 100
dimensions = [128, 256, 512, 1024]
```

## Input Format

Input files should contain:

```json
{
"query": [[...], [...], ...],
"keys": [[...], [...], ...],
"values": [[...], [...], ...],
"dim": 512
}
```

## Performance

Benchmark results on typical hardware:
| Attention Type | 512-dim | 1024-dim | 2048-dim |
|---|---|---|---|
| Scaled Dot | 0.5ms | 1.2ms | 4.8ms |
| Multi-Head | 1.2ms | 3.5ms | 14.2ms |
| Flash | 0.3ms | 0.8ms | 3.1ms |
| Linear | 0.4ms | 1.0ms | 3.9ms |
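Benchmark CSVs written with `-o results.csv -f csv` can be post-processed programmatically. A sketch (the column names below are assumptions for illustration; check the actual header of your output file):

```python
import csv
import io

# Stand-in for a results file; real column names may differ.
sample = """attention_type,dim,mean_ms
scaled_dot,512,0.5
flash,512,0.3
"""

rows = list(csv.DictReader(io.StringIO(sample)))
by_type = {r["attention_type"]: float(r["mean_ms"]) for r in rows}
speedup = by_type["scaled_dot"] / by_type["flash"]
print(f"flash speedup at 512 dims: {speedup:.2f}x")  # flash speedup at 512 dims: 1.67x
```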
## License

MIT OR Apache-2.0