wifi-densepose/crates/ruvector-attention-cli/README.md

# RuVector Attention CLI

A high-performance command-line interface for working with attention mechanisms.

## Features

- **Multiple Attention Types**: Scaled dot-product, multi-head, hyperbolic, flash, linear, and MoE
- **Compute**: Process attention on input data with various configurations
- **Benchmark**: Performance testing across different dimensions and attention types
- **Convert**: Transform data between JSON, binary, MessagePack, CSV formats
- **Serve**: HTTP server with REST API for attention computation
- **REPL**: Interactive shell for exploratory analysis

## Installation

```bash
cargo install --path .
```

## Usage

### Compute Attention

```bash
# Scaled dot-product attention
ruvector-attention compute -i input.json -o output.json -a scaled_dot

# Multi-head attention with 16 heads
ruvector-attention compute -i input.json -a multi_head --num-heads 16

# Hyperbolic attention with custom curvature
ruvector-attention compute -i input.json -a hyperbolic --curvature 2.0

# Flash attention (memory-efficient)
ruvector-attention compute -i input.json -a flash

# Mixture of Experts attention
ruvector-attention compute -i input.json -a moe --num-experts 8 --top-k 2
```

### Run Benchmarks

```bash
# Benchmark all attention types
ruvector-attention benchmark

# Benchmark specific types
ruvector-attention benchmark -a scaled_dot,multi_head,flash

# Custom dimensions
ruvector-attention benchmark -d 256,512,1024 -i 1000

# Output to CSV
ruvector-attention benchmark -o results.csv -f csv
```

### Convert Data

```bash
# JSON to MessagePack
ruvector-attention convert -i data.json -o data.msgpack --to msgpack

# Binary to JSON (pretty-printed)
ruvector-attention convert -i data.bin -o data.json --to json --pretty

# Auto-detect input format
ruvector-attention convert -i input.dat -o output.json --to json
```

### Start HTTP Server

```bash
# Default (localhost:8080)
ruvector-attention serve

# Custom host and port
ruvector-attention serve -H 0.0.0.0 -p 3000

# With CORS enabled
ruvector-attention serve --cors
```

### Interactive REPL

```bash
# Start REPL
ruvector-attention repl

# Commands within REPL:
attention> help
attention> load data.json
attention> type multi_head
attention> compute
attention> config
attention> quit
```

## API Endpoints

When running the server, the following endpoints are available:

- `GET /health` - Health check
- `POST /attention/scaled_dot` - Scaled dot-product attention
- `POST /attention/multi_head` - Multi-head attention
- `POST /attention/hyperbolic` - Hyperbolic attention
- `POST /attention/flash` - Flash attention
- `POST /attention/linear` - Linear attention
- `POST /attention/moe` - Mixture of Experts attention
- `POST /batch` - Batch computation

### Example Request

```bash
curl -X POST http://localhost:8080/attention/scaled_dot \
  -H "Content-Type: application/json" \
  -d '{
    "query": [[0.1, 0.2, 0.3]],
    "keys": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "values": [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
  }'
```

## Configuration

Create a `ruvector-attention.toml` file:

```toml
[attention]
default_dim = 512
default_heads = 8
default_type = "scaled_dot"

[server]
host = "0.0.0.0"
port = 8080
max_batch_size = 32

[output]
format = "json"
pretty = true

[benchmark]
iterations = 100
dimensions = [128, 256, 512, 1024]
```

## Input Format

Input files should contain:

```json
{
  "query": [[...], [...], ...],
  "keys": [[...], [...], ...],
  "values": [[...], [...], ...],
  "dim": 512
}
```

## Performance

Benchmark results on typical hardware:

| Attention Type | 512-dim | 1024-dim | 2048-dim |
|---------------|---------|----------|----------|
| Scaled Dot    | 0.5ms   | 1.2ms    | 4.8ms    |
| Multi-Head    | 1.2ms   | 3.5ms    | 14.2ms   |
| Flash         | 0.3ms   | 0.8ms    | 3.1ms    |
| Linear        | 0.4ms   | 1.0ms    | 3.9ms    |

## License

MIT OR Apache-2.0