# RuVector Attention CLI

A high-performance command-line interface for working with attention mechanisms.

## Features

- **Multiple Attention Types**: Scaled dot-product, multi-head, hyperbolic, flash, linear, and MoE
- **Compute**: Process attention on input data with various configurations
- **Benchmark**: Performance testing across different dimensions and attention types
- **Convert**: Transform data between JSON, binary, MessagePack, and CSV formats
- **Serve**: HTTP server with a REST API for attention computation
- **REPL**: Interactive shell for exploratory analysis

## Installation

```bash
cargo install --path .
```

## Usage

### Compute Attention

```bash
# Scaled dot-product attention
ruvector-attention compute -i input.json -o output.json -a scaled_dot

# Multi-head attention with 16 heads
ruvector-attention compute -i input.json -a multi_head --num-heads 16

# Hyperbolic attention with custom curvature
ruvector-attention compute -i input.json -a hyperbolic --curvature 2.0

# Flash attention (memory-efficient)
ruvector-attention compute -i input.json -a flash

# Mixture of Experts attention
ruvector-attention compute -i input.json -a moe --num-experts 8 --top-k 2
```

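The scaled dot-product variant computes `softmax(QK^T / sqrt(d)) V`. As a rough sketch of what `compute -a scaled_dot` does for a single query vector (a pure-Python illustration, not the crate's actual implementation):

```python
import math

def scaled_dot_attention(query, keys, values):
    """softmax(q.k / sqrt(d))-weighted sum of values, for one query vector."""
    d = len(query)
    # Dot-product scores, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted combination of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = scaled_dot_attention([0.1, 0.2, 0.3],
                           [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                           [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]])
```

Because softmax weights sum to 1, each output component is a convex combination of the corresponding value components.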
### Run Benchmarks

```bash
# Benchmark all attention types
ruvector-attention benchmark

# Benchmark specific types
ruvector-attention benchmark -a scaled_dot,multi_head,flash

# Custom dimensions
ruvector-attention benchmark -d 256,512,1024 -i 1000

# Output to CSV
ruvector-attention benchmark -o results.csv -f csv
```

### Convert Data

```bash
# JSON to MessagePack
ruvector-attention convert -i data.json -o data.msgpack --to msgpack

# Binary to JSON (pretty-printed)
ruvector-attention convert -i data.bin -o data.json --to json --pretty

# Auto-detect input format
ruvector-attention convert -i input.dat -o output.json --to json
```

### Start HTTP Server

```bash
# Default (localhost:8080)
ruvector-attention serve

# Custom host and port
ruvector-attention serve -H 0.0.0.0 -p 3000

# With CORS enabled
ruvector-attention serve --cors
```

### Interactive REPL

```bash
# Start REPL
ruvector-attention repl

# Commands within the REPL:
attention> help
attention> load data.json
attention> type multi_head
attention> compute
attention> config
attention> quit
```

## API Endpoints

When running the server, the following endpoints are available:

- `GET /health` - Health check
- `POST /attention/scaled_dot` - Scaled dot-product attention
- `POST /attention/multi_head` - Multi-head attention
- `POST /attention/hyperbolic` - Hyperbolic attention
- `POST /attention/flash` - Flash attention
- `POST /attention/linear` - Linear attention
- `POST /attention/moe` - Mixture of Experts attention
- `POST /batch` - Batch computation

### Example Request

```bash
curl -X POST http://localhost:8080/attention/scaled_dot \
  -H "Content-Type: application/json" \
  -d '{
    "query": [[0.1, 0.2, 0.3]],
    "keys": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "values": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
  }'
```

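The same request can be made from Python with only the standard library. This is a minimal client sketch assuming the server replies with a JSON body (the exact response schema is not documented here):

```python
import json
from urllib import request

def attention_request(url, query, keys, values):
    """POST an attention payload to a running ruvector-attention server
    and return the decoded JSON reply."""
    payload = json.dumps({"query": query,
                          "keys": keys,
                          "values": values}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

# With the server running locally:
# attention_request("http://localhost:8080/attention/scaled_dot",
#                   [[0.1, 0.2, 0.3]],
#                   [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
#                   [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]])
```
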
## Configuration

Create a `ruvector-attention.toml` file:

```toml
[attention]
default_dim = 512
default_heads = 8
default_type = "scaled_dot"

[server]
host = "0.0.0.0"
port = 8080
max_batch_size = 32

[output]
format = "json"
pretty = true

[benchmark]
iterations = 100
dimensions = [128, 256, 512, 1024]
```

## Input Format

Input files should contain:

```json
{
  "query": [[...], [...], ...],
  "keys": [[...], [...], ...],
  "values": [[...], [...], ...],
  "dim": 512
}
```

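A quick way to produce a well-formed input file is to generate one programmatically. A sketch using the field names above (the helper name and dimensions are illustrative; `dim` is assumed to match the vector length):

```python
import json
import random

def make_input(path, n_keys=4, dim=8):
    """Write a randomly generated attention input file in the format above."""
    vec = lambda: [random.uniform(-1.0, 1.0) for _ in range(dim)]
    data = {
        "query": [vec()],                        # one query vector
        "keys": [vec() for _ in range(n_keys)],  # n_keys key vectors
        "values": [vec() for _ in range(n_keys)],
        "dim": dim,
    }
    with open(path, "w") as f:
        json.dump(data, f)
    return data

data = make_input("input.json", n_keys=2, dim=4)
```

The resulting `input.json` can then be fed to `ruvector-attention compute -i input.json`.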
## Performance

Benchmark results on typical hardware:

| Attention Type | 512-dim | 1024-dim | 2048-dim |
|----------------|---------|----------|----------|
| Scaled Dot     | 0.5ms   | 1.2ms    | 4.8ms    |
| Multi-Head     | 1.2ms   | 3.5ms    | 14.2ms   |
| Flash          | 0.3ms   | 0.8ms    | 3.1ms    |
| Linear         | 0.4ms   | 1.0ms    | 3.9ms    |

## License

MIT OR Apache-2.0