ADR-013: HuggingFace Model Publishing Strategy
Status
Accepted - 2026-01-20
Context
RuvLTRA models need to be distributed to users efficiently. HuggingFace Hub is the industry standard for model hosting with:
- High-speed CDN for global distribution
- Git-based versioning
- Model cards for documentation
- API for programmatic access
- Integration with major ML frameworks
Decision
1. Repository Structure
All models consolidated under a single HuggingFace repository:
| Repository | Purpose | Models |
|---|---|---|
| ruv/ruvltra | All RuvLTRA models | Claude Code, Small, Medium, Large |
URL: https://huggingface.co/ruv/ruvltra
2. File Naming Convention
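ruvltra-{size}-{quant}.gguf
Specialized variants insert the variant name before the size: ruvltra-{variant}-{size}-{quant}.gguf.
Examples:
- ruvltra-0.5b-q4_k_m.gguf
- ruvltra-3b-q8_0.gguf
- ruvltra-claude-code-0.5b-q4_k_m.gguf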
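The convention can be captured in a small helper so that upload scripts and the registry build file names the same way. This is a minimal sketch, assuming sizes and quantization tags are already lowercase strings such as 0.5b and q4_k_m; the function name ruvltra_gguf_filename and its optional variant parameter are hypothetical, inferred from the claude-code example rather than taken from the crate.

```rust
/// Build a GGUF file name following ruvltra-[{variant}-]{size}-{quant}.gguf.
/// Hypothetical helper: `variant` covers specialized models such as "claude-code".
fn ruvltra_gguf_filename(variant: Option<&str>, size: &str, quant: &str) -> String {
    match variant {
        // Specialized model: variant name goes before the size.
        Some(v) => format!("ruvltra-{}-{}-{}.gguf", v, size, quant),
        // Base model: size and quantization only.
        None => format!("ruvltra-{}-{}.gguf", size, quant),
    }
}
```

For example, ruvltra_gguf_filename(Some("claude-code"), "0.5b", "q4_k_m") yields the third example name listed above.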
3. Authentication
Support multiple environment variable names for the HuggingFace token:
- HF_TOKEN (primary)
- HUGGING_FACE_HUB_TOKEN (legacy)
- HUGGINGFACE_API_KEY (common alternative)
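A minimal sketch of the token lookup, assuming the variables are consulted in the order listed (primary first) and that an empty value counts as unset. Both function names are hypothetical and may differ from the crate's actual get_hf_token(); the lookup is injected as a closure so the precedence logic can be tested without mutating the process environment.

```rust
use std::env;

/// Candidate variables, highest priority first (assumed order: primary, legacy, alternative).
const HF_TOKEN_VARS: [&str; 3] = ["HF_TOKEN", "HUGGING_FACE_HUB_TOKEN", "HUGGINGFACE_API_KEY"];

/// Return the first non-empty value that `lookup` yields for the candidate names.
fn resolve_hf_token<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    HF_TOKEN_VARS
        .iter()
        .filter_map(|&name| lookup(name)) // skip variables that are not set
        .find(|value| !value.is_empty()) // treat an empty value as unset (assumption)
}

/// Hypothetical stand-in for get_hf_token(): reads the real process environment.
fn get_hf_token() -> Option<String> {
    resolve_hf_token(|name| env::var(name).ok())
}
```

Returning Option rather than panicking leaves the caller free to decide whether a missing token is fatal, which matters for read-only downloads of public models.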
4. Upload Workflow
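// Using ModelUploader to publish to the consolidated repository
let token = get_hf_token().expect("no HuggingFace token found in the environment");
let uploader = ModelUploader::new(token);
uploader.upload(
    "./model.gguf",  // local GGUF file to publish
    "ruv/ruvltra",   // target repository: all models live in ruv/ruvltra
    Some(metadata),  // optional model metadata
)?;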
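Before a file is handed to the uploader, a cheap sanity check can confirm that it really is GGUF, which guards against publishing a wrong or truncated artifact. The sketch relies only on the GGUF specification's 4-byte magic (the ASCII bytes GGUF at offset 0); the function names are hypothetical and not part of ModelUploader.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Magic bytes at the start of every GGUF file, per the GGUF specification.
const GGUF_MAGIC: &[u8; 4] = b"GGUF";

/// Ok(true) if the reader starts with the GGUF magic, Ok(false) if it does not
/// (including inputs shorter than four bytes), Err for any other I/O failure.
fn has_gguf_magic<R: Read>(mut reader: R) -> io::Result<bool> {
    let mut header = [0u8; 4];
    match reader.read_exact(&mut header) {
        Ok(()) => Ok(&header == GGUF_MAGIC),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Hypothetical pre-upload check for a file on disk.
fn is_gguf_file(path: &Path) -> io::Result<bool> {
    has_gguf_magic(File::open(path)?)
}
```

Taking any Read implementation keeps the check usable on in-memory buffers as well as files, so it can be unit tested without fixtures.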
5. Model Card Requirements
Each repository's model card must include:
- YAML frontmatter with tags, license, language
- Model description and capabilities
- Hardware requirements table
- Usage examples (Rust, Python, CLI)
- Benchmark results (when available)
- License information
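Of these requirements, the frontmatter keys are the easiest to enforce automatically, for example in a CI job that rejects a model card missing them. The following is a hypothetical, deliberately shallow check rather than a YAML parser: it only looks for top-level key: lines between the opening and closing --- markers of the card.

```rust
/// Keys the ADR requires in the model card's YAML frontmatter.
const REQUIRED_KEYS: [&str; 3] = ["tags", "license", "language"];

/// Return the required frontmatter keys missing from a model card (README.md contents).
fn missing_frontmatter_keys(card: &str) -> Vec<&'static str> {
    let mut lines = card.lines();
    if lines.next().map(str::trim_end) != Some("---") {
        // No frontmatter block at all: every required key is missing.
        return REQUIRED_KEYS.to_vec();
    }
    // Collect lines up to (not including) the closing "---" marker.
    let frontmatter: Vec<&str> = lines.take_while(|l| l.trim_end() != "---").collect();
    REQUIRED_KEYS
        .iter()
        .copied()
        .filter(|key| {
            // A key counts as present only if a line starts with exactly "key:".
            !frontmatter
                .iter()
                .any(|line| line.strip_prefix(*key).map_or(false, |rest| rest.starts_with(':')))
        })
        .collect()
}
```

Requiring the colon right after the key avoids false matches such as license_name: satisfying license.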
6. Versioning Strategy
- Use HuggingFace's built-in Git versioning
- Tag major releases (e.g., v1.0.0)
- Maintain the main branch as the latest stable release
- Use branches for experimental variants
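Tags and branches are both Git revisions, so a client can pin a download to a release tag such as v1.0.0 or to an experimental branch by choosing the revision segment of the Hub's file URL. The sketch assumes the public https://huggingface.co/{repo}/resolve/{revision}/{filename} download pattern and revision and file names that need no percent-encoding; the helper name is hypothetical.

```rust
/// Hub host used for direct file downloads.
const HF_ENDPOINT: &str = "https://huggingface.co";

/// Build a direct download URL for `filename` at a Git revision
/// (a branch such as "main" or a tag such as "v1.0.0").
/// Assumes all inputs contain only URL-safe characters.
fn hf_resolve_url(repo_id: &str, revision: &str, filename: &str) -> String {
    format!("{}/{}/resolve/{}/{}", HF_ENDPOINT, repo_id, revision, filename)
}
```

The same URL form is a natural target for the direct-URL download fallback listed under Mitigations, since it needs no HuggingFace client library.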
Consequences
Positive
- Accessibility: Models available via standard HuggingFace APIs
- Discoverability: Indexed in HuggingFace model search
- Versioning: Full Git history for model evolution
- CDN: Fast global downloads via HuggingFace's CDN
- Documentation: Model cards provide user guidance
Negative
- Storage Costs: Large models require HuggingFace Pro for private repos
- Dependency: Reliance on external service availability
- Sync Complexity: Must keep registry.rs in sync with HuggingFace
Mitigations
- Use public repos (free unlimited storage)
- Implement fallback to direct URL downloads
- Automate registry updates via CI/CD
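One way to automate the registry sync in CI is to diff the file names referenced by registry.rs against the files present in the repository and fail the job on any difference. The sketch takes both lists as plain string slices; how they are obtained (parsing registry.rs, listing the Hub repository) is left out, and the function and type names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Differences between registry entries and files published on the Hub.
#[derive(Debug, PartialEq)]
struct RegistryDrift {
    /// Referenced by the registry but absent from the repository (broken downloads).
    missing_remote: Vec<String>,
    /// Present in the repository but not referenced by the registry (undiscoverable models).
    unregistered: Vec<String>,
}

impl RegistryDrift {
    fn is_in_sync(&self) -> bool {
        self.missing_remote.is_empty() && self.unregistered.is_empty()
    }
}

/// Compare registry file names with repository file names, considering only .gguf files.
fn registry_drift(registry: &[&str], repo_files: &[&str]) -> RegistryDrift {
    let registered: BTreeSet<&str> = registry.iter().copied().collect();
    let published: BTreeSet<&str> = repo_files
        .iter()
        .copied()
        .filter(|name| name.ends_with(".gguf")) // ignore README.md, .gitattributes, etc.
        .collect();
    // BTreeSet keeps both result lists sorted, so CI output is deterministic.
    RegistryDrift {
        missing_remote: registered.difference(&published).map(|s| s.to_string()).collect(),
        unregistered: published.difference(&registered).map(|s| s.to_string()).collect(),
    }
}
```

A CI step could print the drift and exit non-zero whenever is_in_sync() is false.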
Implementation
Phase 1: Initial Publishing (Complete)
- Create the consolidated ruv/ruvltra repository
- Upload Claude Code, Small, and Medium models
- Upload Q4_K_M quantized models
- Add comprehensive model card with badges, tutorials, architecture
Phase 2: Enhanced Distribution
- Add Q8 quantization variants
- Add FP16 variants for fine-tuning
- Implement automated CI/CD publishing
- Add SONA weight exports
Phase 3: Ecosystem Integration
- Add to llama.cpp model zoo
- Create an Ollama Modelfile
- Publish to alternative registries (ModelScope)
References
- HuggingFace Hub Documentation: https://huggingface.co/docs/hub
- GGUF Format Specification: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
- RuvLTRA Registry: crates/ruvllm/src/hub/registry.rs
- Related Issue: #121