Squashed 'vendor/ruvector/' content from commit b64c2172
git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
This commit is contained in:
348
examples/data/framework/EXPORT_GUIDE.md
Normal file
348
examples/data/framework/EXPORT_GUIDE.md
Normal file
@@ -0,0 +1,348 @@
|
||||
# RuVector Discovery Framework - Export Guide
|
||||
|
||||
## Overview
|
||||
|
||||
The export module provides comprehensive export functionality for RuVector Discovery Framework results. Export graphs, patterns, and coherence data in multiple industry-standard formats.
|
||||
|
||||
## Supported Formats
|
||||
|
||||
### 1. GraphML (`.graphml`)
|
||||
- **Use Case**: Import into Gephi, Cytoscape, yEd
|
||||
- **Features**: Full graph structure with node/edge attributes
|
||||
- **Best For**: Visual network analysis, community detection
|
||||
|
||||
### 2. DOT (`.dot`)
|
||||
- **Use Case**: Render with Graphviz (dot, neato, fdp, sfdp)
|
||||
- **Features**: Hierarchical or force-directed layouts
|
||||
- **Best For**: Publication-quality graph visualizations
|
||||
|
||||
### 3. CSV (`.csv`)
|
||||
- **Use Case**: Analysis in Excel, R, Python, Julia
|
||||
- **Features**: Tabular data with full pattern/coherence details
|
||||
- **Best For**: Statistical analysis, time-series analysis
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Basic Export
|
||||
|
||||
```rust
|
||||
use ruvector_data_framework::export::{export_graphml, export_dot, export_patterns_csv};
|
||||
|
||||
// Export graph to GraphML (for Gephi)
|
||||
export_graphml(&engine, "graph.graphml", None)?;
|
||||
|
||||
// Export graph to DOT (for Graphviz)
|
||||
export_dot(&engine, "graph.dot", None)?;
|
||||
|
||||
// Export patterns to CSV
|
||||
export_patterns_csv(&patterns, "patterns.csv")?;
|
||||
```
|
||||
|
||||
### Filtered Export
|
||||
|
||||
```rust
|
||||
use ruvector_data_framework::export::ExportFilter;
|
||||
use ruvector_data_framework::ruvector_native::Domain;
|
||||
|
||||
// Export only climate domain
|
||||
let filter = ExportFilter::domain(Domain::Climate);
|
||||
export_graphml(&engine, "climate.graphml", Some(filter))?;
|
||||
|
||||
// Export only strong edges
|
||||
let filter = ExportFilter::min_weight(0.8);
|
||||
export_graphml(&engine, "strong_edges.graphml", Some(filter))?;
|
||||
|
||||
// Combine filters
|
||||
let filter = ExportFilter::domain(Domain::Finance)
|
||||
.and(ExportFilter::min_weight(0.7));
|
||||
export_graphml(&engine, "finance_strong.graphml", Some(filter))?;
|
||||
```
|
||||
|
||||
### Export Everything
|
||||
|
||||
```rust
|
||||
use ruvector_data_framework::export::export_all;
|
||||
|
||||
// Export all data to a directory
|
||||
export_all(&engine, &patterns, &coherence_history, "output")?;
|
||||
```
|
||||
|
||||
## Export Functions
|
||||
|
||||
### Graph Export
|
||||
|
||||
#### `export_graphml(engine, path, filter)`
|
||||
Exports graph in GraphML format (XML-based).
|
||||
|
||||
**Node Attributes:**
|
||||
- `domain`: Climate, Finance, Research, CrossDomain
|
||||
- `external_id`: External identifier
|
||||
- `weight`: Node weight
|
||||
- `timestamp`: When node was created
|
||||
|
||||
**Edge Attributes:**
|
||||
- `weight`: Edge weight (similarity/correlation)
|
||||
- `type`: EdgeType (similarity, correlation, citation, causal, cross_domain)
|
||||
- `timestamp`: When edge was created
|
||||
- `cross_domain`: Boolean indicating cross-domain connection
|
||||
|
||||
#### `export_dot(engine, path, filter)`
|
||||
Exports graph in DOT format (text-based).
|
||||
|
||||
**Features:**
|
||||
- Domain-specific colors
|
||||
- Layout hints for Graphviz
|
||||
- Edge weights as labels
|
||||
- Node shapes by domain
|
||||
|
||||
### Pattern Export
|
||||
|
||||
#### `export_patterns_csv(patterns, path)`
|
||||
Exports detected patterns to CSV.
|
||||
|
||||
**Columns:**
|
||||
- `id`: Pattern identifier
|
||||
- `pattern_type`: Type (consolidation, coherence_break, etc.)
|
||||
- `confidence`: Confidence score (0-1)
|
||||
- `p_value`: Statistical significance
|
||||
- `effect_size`: Effect size (Cohen's d)
|
||||
- `ci_lower`, `ci_upper`: 95% confidence interval
|
||||
- `is_significant`: Boolean
|
||||
- `detected_at`: ISO 8601 timestamp
|
||||
- `description`: Human-readable description
|
||||
- `affected_nodes_count`: Number of affected nodes
|
||||
- `evidence_count`: Number of evidence items
|
||||
|
||||
#### `export_patterns_with_evidence_csv(patterns, path)`
|
||||
Exports patterns with detailed evidence.
|
||||
|
||||
**Columns:**
|
||||
- `pattern_id`: Pattern identifier
|
||||
- `pattern_type`: Type of pattern
|
||||
- `evidence_type`: Type of evidence
|
||||
- `evidence_value`: Numeric value
|
||||
- `evidence_description`: Description
|
||||
- `detected_at`: ISO 8601 timestamp
|
||||
|
||||
### Coherence Export
|
||||
|
||||
#### `export_coherence_csv(history, path)`
|
||||
Exports coherence history over time.
|
||||
|
||||
**Columns:**
|
||||
- `timestamp`: ISO 8601 timestamp
|
||||
- `mincut_value`: Minimum cut value (coherence measure)
|
||||
- `node_count`: Number of nodes
|
||||
- `edge_count`: Number of edges
|
||||
- `avg_edge_weight`: Average edge weight
|
||||
- `partition_size_a`, `partition_size_b`: Partition sizes
|
||||
- `boundary_nodes_count`: Nodes on cut boundary
|
||||
|
||||
## Visualization Workflows
|
||||
|
||||
### Gephi (Network Visualization)
|
||||
|
||||
1. **Import GraphML:**
|
||||
```
|
||||
File → Open → graph.graphml
|
||||
```
|
||||
|
||||
2. **Apply Layout:**
|
||||
- Force Atlas 2 (recommended)
|
||||
- Fruchterman Reingold
|
||||
- OpenORD (for large graphs)
|
||||
|
||||
3. **Color by Domain:**
|
||||
- Appearance → Nodes → Color → Partition
|
||||
- Select "domain" attribute
|
||||
- Apply
|
||||
|
||||
4. **Size by Centrality:**
|
||||
- Statistics → Network Diameter
|
||||
- Appearance → Nodes → Size → Ranking
|
||||
- Select betweenness centrality
|
||||
|
||||
### Graphviz (Publication Graphics)
|
||||
|
||||
```bash
|
||||
# Force-directed layout
|
||||
neato -Tpng graph.dot -o graph.png
|
||||
|
||||
# Hierarchical layout
|
||||
dot -Tsvg graph.dot -o graph.svg
|
||||
|
||||
# Spring-electric layout (large graphs)
|
||||
sfdp -Tpdf graph.dot -o graph.pdf
|
||||
|
||||
# Radial layout
|
||||
twopi -Tsvg graph.dot -o graph.svg
|
||||
```
|
||||
|
||||
### Python Analysis
|
||||
|
||||
```python
|
||||
import pandas as pd
|
||||
import networkx as nx
|
||||
|
||||
# Load patterns
|
||||
patterns = pd.read_csv('patterns.csv')
|
||||
significant = patterns[patterns['is_significant'] == True]
|
||||
|
||||
# Load coherence
|
||||
coherence = pd.read_csv('coherence.csv')
|
||||
coherence['timestamp'] = pd.to_datetime(coherence['timestamp'])
|
||||
|
||||
# Plot coherence over time
|
||||
import matplotlib.pyplot as plt
|
||||
plt.plot(coherence['timestamp'], coherence['mincut_value'])
|
||||
plt.xlabel('Time')
|
||||
plt.ylabel('Min-Cut Value')
|
||||
plt.title('Network Coherence Over Time')
|
||||
plt.show()
|
||||
|
||||
# Load GraphML
|
||||
G = nx.read_graphml('graph.graphml')
|
||||
print(f"Nodes: {G.number_of_nodes()}")
|
||||
print(f"Edges: {G.number_of_edges()}")
|
||||
```
|
||||
|
||||
### R Analysis
|
||||
|
||||
```r
|
||||
library(tidyverse)
|
||||
library(igraph)
|
||||
|
||||
# Load patterns
|
||||
patterns <- read_csv('patterns.csv')
|
||||
significant <- filter(patterns, is_significant == TRUE)
|
||||
|
||||
# Load coherence
|
||||
coherence <- read_csv('coherence.csv') %>%
|
||||
mutate(timestamp = as.POSIXct(timestamp))
|
||||
|
||||
# Plot
|
||||
ggplot(coherence, aes(x=timestamp, y=mincut_value)) +
|
||||
geom_line() +
|
||||
labs(title="Network Coherence Over Time",
|
||||
x="Time", y="Min-Cut Value")
|
||||
|
||||
# Load graph
|
||||
g <- read_graph('graph.graphml', format='graphml')
|
||||
summary(g)
|
||||
```
|
||||
|
||||
## Export Filter Options
|
||||
|
||||
### Domain Filter
|
||||
```rust
|
||||
ExportFilter::domain(Domain::Climate)
|
||||
```
|
||||
|
||||
### Weight Filter
|
||||
```rust
|
||||
ExportFilter::min_weight(0.7)
|
||||
```
|
||||
|
||||
### Time Range Filter
|
||||
```rust
|
||||
use chrono::Utc;
|
||||
|
||||
let start = Utc::now() - chrono::Duration::days(30);
|
||||
let end = Utc::now();
|
||||
ExportFilter::time_range(start, end)
|
||||
```
|
||||
|
||||
### Combined Filters
|
||||
```rust
|
||||
ExportFilter::domain(Domain::Finance)
|
||||
.and(ExportFilter::min_weight(0.8))
|
||||
.and(ExportFilter::time_range(start, end))
|
||||
```
|
||||
|
||||
## Example Output
|
||||
|
||||
Running the export demo:
|
||||
|
||||
```bash
|
||||
cargo run --example export_demo --features parallel
|
||||
```
|
||||
|
||||
Creates:
|
||||
```
|
||||
discovery_exports/
|
||||
├── graph.graphml # Full graph (Gephi)
|
||||
├── graph.dot # Full graph (Graphviz)
|
||||
├── climate_only.graphml # Climate domain only
|
||||
└── full_export/
|
||||
├── README.md # Documentation
|
||||
├── graph.graphml # Full graph
|
||||
├── graph.dot # Full graph
|
||||
├── patterns.csv # Detected patterns
|
||||
├── patterns_evidence.csv # Pattern evidence
|
||||
└── coherence.csv # Coherence history
|
||||
```
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Custom Export Pipeline
|
||||
|
||||
```rust
|
||||
use ruvector_data_framework::export::*;
|
||||
|
||||
// 1. Export full graph
|
||||
export_graphml(&engine, "full_graph.graphml", None)?;
|
||||
|
||||
// 2. Export each domain separately
|
||||
for domain in [Domain::Climate, Domain::Finance, Domain::Research] {
|
||||
let filter = ExportFilter::domain(domain);
|
||||
let filename = format!("{:?}_graph.graphml", domain);
|
||||
export_graphml(&engine, &filename, Some(filter))?;
|
||||
}
|
||||
|
||||
// 3. Export significant patterns only
|
||||
let significant_patterns: Vec<_> = patterns.iter()
|
||||
.filter(|p| p.is_significant)
|
||||
.cloned()
|
||||
.collect();
|
||||
export_patterns_csv(&significant_patterns, "significant_patterns.csv")?;
|
||||
|
||||
// 4. Export time-windowed coherence
|
||||
let recent_history: Vec<_> = coherence_history.iter()
|
||||
.rev()
|
||||
.take(100)
|
||||
.cloned()
|
||||
.collect();
|
||||
export_coherence_csv(&recent_history, "recent_coherence.csv")?;
|
||||
```
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
- **Large Graphs**: Use filters to reduce export size
|
||||
- **GraphML**: XML parsing can be slow for >100K nodes
|
||||
- **DOT**: Graphviz rendering slows down at >10K nodes
|
||||
- **CSV**: Very efficient for patterns and coherence data
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
The export module currently provides a foundation. To access the full graph data (nodes and edges), the `OptimizedDiscoveryEngine` will need to expose:
|
||||
|
||||
```rust
|
||||
pub fn nodes(&self) -> &HashMap<u32, GraphNode>
|
||||
pub fn edges(&self) -> &[GraphEdge]
|
||||
pub fn get_node(&self, id: u32) -> Option<&GraphNode>
|
||||
```
|
||||
|
||||
Once these methods are added, the GraphML and DOT exports will include actual node and edge data.
|
||||
|
||||
## Related Examples
|
||||
|
||||
- `examples/export_demo.rs` - Basic export demonstration
|
||||
- `examples/cross_domain_discovery.rs` - Cross-domain pattern detection
|
||||
- `examples/discovery_hunter.rs` - Advanced pattern hunting
|
||||
- `examples/optimized_benchmark.rs` - Performance testing
|
||||
|
||||
## Support
|
||||
|
||||
For issues or questions:
|
||||
- GitHub: https://github.com/ruvnet/ruvector
|
||||
- Documentation: See framework README
|
||||
Reference in New Issue
Block a user