# ADR-013: HuggingFace Model Publishing Strategy

## Status
**Accepted** - 2026-01-20

## Context

RuvLTRA models need to be distributed to users efficiently. HuggingFace Hub is the de facto standard for model hosting and provides:
- High-speed CDN for global distribution
- Git-based versioning
- Model cards for documentation
- API for programmatic access
- Integration with major ML frameworks

## Decision

### 1. Repository Structure

All models are consolidated under a single HuggingFace repository:

| Repository | Purpose | Models |
|------------|---------|--------|
| **`ruv/ruvltra`** | All RuvLTRA models | Claude Code, Small, Medium, Large |

**URL**: https://huggingface.co/ruv/ruvltra

### 2. File Naming Convention

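```
ruvltra-[{variant}-]{size}-{quant}.gguf
```

The `{variant}` segment is optional and names a specialized model (for example `claude-code`). `{size}` is the parameter count and `{quant}` is the GGUF quantization type in lowercase.

Examples: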
- `ruvltra-0.5b-q4_k_m.gguf`
- `ruvltra-3b-q8_0.gguf`
- `ruvltra-claude-code-0.5b-q4_k_m.gguf`
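
As a minimal sketch of how these names could be generated consistently, the helper below assembles a file name from its parts. The function name `gguf_file_name` and its signature are hypothetical and not part of the RuvLTRA crates; the optional variant segment is inferred from the `claude-code` example above.

```rust
/// Builds a GGUF file name following the naming convention.
/// Hypothetical helper for illustration; not an existing crate API.
pub fn gguf_file_name(variant: Option<&str>, size: &str, quant: &str) -> String {
    // Lowercase every component so that "Q4_K_M" becomes "q4_k_m".
    let size = size.to_lowercase();
    let quant = quant.to_lowercase();
    match variant {
        // A non-empty variant is placed between the prefix and the size.
        Some(v) if !v.is_empty() => {
            format!("ruvltra-{}-{}-{}.gguf", v.to_lowercase(), size, quant)
        }
        // No variant: fall back to the plain size/quant form.
        _ => format!("ruvltra-{}-{}.gguf", size, quant),
    }
}
```

Calling `gguf_file_name(Some("claude-code"), "0.5b", "Q4_K_M")` would reproduce the third example name in the list.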

### 3. Authentication

Support multiple environment variable names for the HuggingFace token:
- `HF_TOKEN` (primary)
- `HUGGING_FACE_HUB_TOKEN` (legacy)
- `HUGGINGFACE_API_KEY` (common alternative)
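
One way to implement this lookup is sketched below. It assumes the variables are checked in the order listed and that blank values are treated as unset; neither rule is stated in this ADR. The names `TOKEN_VARS`, `resolve_token_with`, and `resolve_token` are hypothetical and may differ from the crate's actual token helper.

```rust
use std::env;

/// Environment variables checked for a token, in assumed priority order.
const TOKEN_VARS: [&str; 3] = ["HF_TOKEN", "HUGGING_FACE_HUB_TOKEN", "HUGGINGFACE_API_KEY"];

/// Returns the first non-blank token produced by `lookup`.
/// Taking the lookup as a closure keeps the logic testable without
/// mutating the process environment.
pub fn resolve_token_with<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    TOKEN_VARS
        .iter()
        .filter_map(|&name| lookup(name))
        // Trim whitespace so that a blank value counts as unset.
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}

/// Convenience wrapper that reads the real process environment.
pub fn resolve_token() -> Option<String> {
    resolve_token_with(|name| env::var(name).ok())
}
```

Returning `Option<String>` leaves the caller free to decide whether a missing token is an error (uploads) or acceptable (anonymous downloads from a public repository).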

### 4. Upload Workflow

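```rust
// Using ModelUploader
// Resolve the token from the environment variables listed in section 3.
let token = get_hf_token().expect("no HuggingFace token found in the environment");
let uploader = ModelUploader::new(token);
uploader.upload(
    "./model.gguf",  // local GGUF file to publish
    "ruv/ruvltra",   // consolidated repository (section 1)
    Some(metadata),  // model metadata
)?;
```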

### 5. Model Card Requirements

The repository's model card must include:
- YAML frontmatter with tags, license, language
- Model description and capabilities
- Hardware requirements table
- Usage examples (Rust, Python, CLI)
- Benchmark results (when available)
- License information
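
The front-matter requirement lends itself to an automated check before publishing. The sketch below is one possible approach, assuming the three keys are spelled `tags`, `license`, and `language` at the top level of the YAML block. The function name `missing_frontmatter_keys` is hypothetical, and the check is deliberately line-based rather than a full YAML parse.

```rust
/// Front-matter keys required on the model card.
const REQUIRED_KEYS: [&str; 3] = ["tags", "license", "language"];

/// Returns the required keys missing from the card's YAML front matter.
/// This is a line-based check, not a YAML parser: it only looks for
/// top-level `key:` lines between the opening and closing `---` markers.
pub fn missing_frontmatter_keys(card: &str) -> Vec<&'static str> {
    let mut lines = card.lines();
    // A card without an opening `---` has no front matter at all.
    if lines.next().map(str::trim) != Some("---") {
        return REQUIRED_KEYS.to_vec();
    }
    // Collect front-matter lines up to the closing `---`.
    let front: Vec<&str> = lines.take_while(|line| line.trim() != "---").collect();
    REQUIRED_KEYS
        .iter()
        .copied()
        .filter(|key| {
            let prefix = format!("{}:", key);
            // Indented lines are nested values, so only column-zero keys match.
            !front.iter().any(|line| line.starts_with(&prefix))
        })
        .collect()
}
```

A publishing script could refuse to upload when the returned list is non-empty. The remaining requirements (hardware table, usage examples, benchmarks) are prose sections and would need a separate review step.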

### 6. Versioning Strategy

- Use HuggingFace's built-in Git versioning
- Tag major releases (e.g., `v1.0.0`)
- Maintain `main` branch for latest stable
- Use branches for experimental variants
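
Because branches and tags are ordinary Git revisions, a consumer can pin a download to a specific release. The sketch below builds a direct download URL using the Hub's `resolve` URL pattern. The function name `resolve_url` is hypothetical, and the tag `v1.0.0` is only the example from the list above, not a confirmed release.

```rust
/// Builds a direct download URL for a file at a given revision.
/// `revision` may be a branch ("main"), a tag ("v1.0.0"), or a commit hash.
/// Revisions containing `/` would need URL encoding, which this sketch omits.
pub fn resolve_url(repo_id: &str, revision: &str, file_name: &str) -> String {
    format!(
        "https://huggingface.co/{}/resolve/{}/{}",
        repo_id, revision, file_name
    )
}
```

Pinning to a tag or commit hash gives reproducible downloads, whereas `main` always tracks the latest stable upload. The same URL form can also serve as a plain HTTP download path when a Hub client library is unavailable.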

## Consequences

### Positive
- **Accessibility**: Models available via standard HuggingFace APIs
- **Discoverability**: Indexed in HuggingFace model search
- **Versioning**: Full Git history for model evolution
- **CDN**: Fast global downloads via HuggingFace's CDN
- **Documentation**: Model cards provide user guidance

### Negative
- **Storage Costs**: Hosting large models in private repos requires a paid HuggingFace plan (Pro)
- **Dependency**: Reliance on external service availability
- **Sync Complexity**: `registry.rs` must be kept in sync with the files published on HuggingFace

### Mitigations
- Use public repos, which HuggingFace hosts free of charge
- Implement fallback to direct URL downloads
- Automate registry updates via CI/CD

## Implementation

### Phase 1: Initial Publishing (Complete)
- [x] Create consolidated `ruv/ruvltra` repository
- [x] Upload Claude Code, Small, and Medium models
- [x] Upload Q4_K_M quantized models
- [x] Add comprehensive model card with badges, tutorials, architecture

### Phase 2: Enhanced Distribution
- [ ] Add Q8 quantization variants
- [ ] Add FP16 variants for fine-tuning
- [ ] Implement automated CI/CD publishing
- [ ] Add SONA weight exports

### Phase 3: Ecosystem Integration
- [ ] Add to llama.cpp model zoo
- [ ] Create Ollama modelfile
- [ ] Publish to alternative registries (ModelScope)

## References

- HuggingFace Hub Documentation: https://huggingface.co/docs/hub
- GGUF Format Specification: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
- RuvLTRA Registry: `crates/ruvllm/src/hub/registry.rs`
- Related Issue: #121