git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
174 lines
3.8 KiB
Markdown
174 lines
3.8 KiB
Markdown
# RuVector Attention CLI
|
|
|
|
A high-performance command-line interface for working with attention mechanisms.
|
|
|
|
## Features
|
|
|
|
- **Multiple Attention Types**: Scaled dot-product, multi-head, hyperbolic, flash, linear, and MoE
|
|
- **Compute**: Process attention on input data with various configurations
|
|
- **Benchmark**: Performance testing across different dimensions and attention types
|
|
- **Convert**: Transform data between JSON, binary, MessagePack, CSV formats
|
|
- **Serve**: HTTP server with REST API for attention computation
|
|
- **REPL**: Interactive shell for exploratory analysis
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
cargo install --path .
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Compute Attention
|
|
|
|
```bash
|
|
# Scaled dot-product attention
|
|
ruvector-attention compute -i input.json -o output.json -a scaled_dot
|
|
|
|
# Multi-head attention with 16 heads
|
|
ruvector-attention compute -i input.json -a multi_head --num-heads 16
|
|
|
|
# Hyperbolic attention with custom curvature
|
|
ruvector-attention compute -i input.json -a hyperbolic --curvature 2.0
|
|
|
|
# Flash attention (memory-efficient)
|
|
ruvector-attention compute -i input.json -a flash
|
|
|
|
# Mixture of Experts attention
|
|
ruvector-attention compute -i input.json -a moe --num-experts 8 --top-k 2
|
|
```
|
|
|
|
### Run Benchmarks
|
|
|
|
```bash
|
|
# Benchmark all attention types
|
|
ruvector-attention benchmark
|
|
|
|
# Benchmark specific types
|
|
ruvector-attention benchmark -a scaled_dot,multi_head,flash
|
|
|
|
# Custom dimensions
|
|
ruvector-attention benchmark -d 256,512,1024 -i 1000
|
|
|
|
# Output to CSV
|
|
ruvector-attention benchmark -o results.csv -f csv
|
|
```
|
|
|
|
### Convert Data
|
|
|
|
```bash
|
|
# JSON to MessagePack
|
|
ruvector-attention convert -i data.json -o data.msgpack --to msgpack
|
|
|
|
# Binary to JSON (pretty-printed)
|
|
ruvector-attention convert -i data.bin -o data.json --to json --pretty
|
|
|
|
# Auto-detect input format
|
|
ruvector-attention convert -i input.dat -o output.json --to json
|
|
```
|
|
|
|
### Start HTTP Server
|
|
|
|
```bash
|
|
# Default (localhost:8080)
|
|
ruvector-attention serve
|
|
|
|
# Custom host and port
|
|
ruvector-attention serve -H 0.0.0.0 -p 3000
|
|
|
|
# With CORS enabled
|
|
ruvector-attention serve --cors
|
|
```
|
|
|
|
### Interactive REPL
|
|
|
|
```bash
|
|
# Start REPL
|
|
ruvector-attention repl
|
|
|
|
# Commands within REPL:
|
|
attention> help
|
|
attention> load data.json
|
|
attention> type multi_head
|
|
attention> compute
|
|
attention> config
|
|
attention> quit
|
|
```
|
|
|
|
## API Endpoints
|
|
|
|
When running the server, the following endpoints are available:
|
|
|
|
- `GET /health` - Health check
|
|
- `POST /attention/scaled_dot` - Scaled dot-product attention
|
|
- `POST /attention/multi_head` - Multi-head attention
|
|
- `POST /attention/hyperbolic` - Hyperbolic attention
|
|
- `POST /attention/flash` - Flash attention
|
|
- `POST /attention/linear` - Linear attention
|
|
- `POST /attention/moe` - Mixture of Experts attention
|
|
- `POST /batch` - Batch computation
|
|
|
|
### Example Request
|
|
|
|
```bash
|
|
curl -X POST http://localhost:8080/attention/scaled_dot \
|
|
-H "Content-Type: application/json" \
|
|
-d '{
|
|
"query": [[0.1, 0.2, 0.3]],
|
|
"keys": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
|
|
"values": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
|
|
}'
|
|
```
|
|
|
|
## Configuration
|
|
|
|
Create a `ruvector-attention.toml` file:
|
|
|
|
```toml
|
|
[attention]
|
|
default_dim = 512
|
|
default_heads = 8
|
|
default_type = "scaled_dot"
|
|
|
|
[server]
|
|
host = "0.0.0.0"
|
|
port = 8080
|
|
max_batch_size = 32
|
|
|
|
[output]
|
|
format = "json"
|
|
pretty = true
|
|
|
|
[benchmark]
|
|
iterations = 100
|
|
dimensions = [128, 256, 512, 1024]
|
|
```
|
|
|
|
## Input Format
|
|
|
|
Input files should contain:
|
|
|
|
```json
|
|
{
|
|
"query": [[...], [...], ...],
|
|
"keys": [[...], [...], ...],
|
|
"values": [[...], [...], ...],
|
|
"dim": 512
|
|
}
|
|
```
|
|
|
|
## Performance
|
|
|
|
Benchmark results on typical hardware:
|
|
|
|
| Attention Type | 512-dim | 1024-dim | 2048-dim |
|
|
|---------------|---------|----------|----------|
|
|
| Scaled Dot | 0.5ms | 1.2ms | 4.8ms |
|
|
| Multi-Head | 1.2ms | 3.5ms | 14.2ms |
|
|
| Flash | 0.3ms | 0.8ms | 3.1ms |
|
|
| Linear | 0.4ms | 1.0ms | 3.9ms |
|
|
|
|
## License
|
|
|
|
MIT OR Apache-2.0
|