
ruv d803bfe2b1 Squashed 'vendor/ruvector/' content from commit b64c2172

git-subtree-dir: vendor/ruvector
git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900

2026-02-28 14:39:40 -05:00

8.9 KiB


RuVector Discovery Framework - Export Guide

Overview

The export module writes RuVector Discovery Framework results to disk: graphs, detected patterns, and coherence history, in three industry-standard formats (GraphML, DOT, and CSV).

Supported Formats

1. GraphML (`.graphml`)

- Use Case: Import into Gephi, Cytoscape, yEd
- Features: Full graph structure with node/edge attributes
- Best For: Visual network analysis, community detection

2. DOT (`.dot`)

- Use Case: Render with Graphviz (dot, neato, fdp, sfdp)
- Features: Hierarchical or force-directed layouts
- Best For: Publication-quality graph visualizations

3. CSV (`.csv`)

- Use Case: Analysis in Excel, R, Python, Julia
- Features: Tabular data with full pattern/coherence details
- Best For: Statistical analysis, time-series analysis

Quick Start

Basic Export

```rust
use ruvector_data_framework::export::{export_graphml, export_dot, export_patterns_csv};

// Assumes `engine` and `patterns` are already in scope, inside a function
// that returns a Result so `?` can propagate export errors.

// Export graph to GraphML (for Gephi)
export_graphml(&engine, "graph.graphml", None)?;

// Export graph to DOT (for Graphviz)
export_dot(&engine, "graph.dot", None)?;

// Export patterns to CSV
export_patterns_csv(&patterns, "patterns.csv")?;
```

Filtered Export

```rust
use ruvector_data_framework::export::{export_graphml, ExportFilter};
use ruvector_data_framework::ruvector_native::Domain;

// Export only the climate domain
let filter = ExportFilter::domain(Domain::Climate);
export_graphml(&engine, "climate.graphml", Some(filter))?;

// Export only strong edges
let filter = ExportFilter::min_weight(0.8);
export_graphml(&engine, "strong_edges.graphml", Some(filter))?;

// Combine filters
let filter = ExportFilter::domain(Domain::Finance)
    .and(ExportFilter::min_weight(0.7));
export_graphml(&engine, "finance_strong.graphml", Some(filter))?;
```

Export Everything

```rust
use ruvector_data_framework::export::export_all;

// Export all data to a directory
export_all(&engine, &patterns, &coherence_history, "output")?;
```

Export Functions

Graph Export

`export_graphml(engine, path, filter)`

Exports the graph in GraphML, an XML-based format.

Node Attributes:

- `domain`: Climate, Finance, Research, or CrossDomain
- `external_id`: External identifier
- `weight`: Node weight
- `timestamp`: When the node was created

Edge Attributes:

- `weight`: Edge weight (similarity/correlation)
- `type`: EdgeType (similarity, correlation, citation, causal, cross_domain)
- `timestamp`: When the edge was created
- `cross_domain`: Boolean indicating a cross-domain connection
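To check which attributes actually landed in an export, the file can be read with Python's standard library alone. The sketch below is a minimal reader, not part of the framework: `parse_graphml` and the `SAMPLE` document are hypothetical, and it assumes the exporter declares each attribute with a GraphML `<key>` element carrying `attr.name` and `attr.type`, as the GraphML specification defines. Key ids such as `d0` in the sample are made up.

```python
import xml.etree.ElementTree as ET

# GraphML attr.type values converted to Python types; anything else stays str.
_CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}


def _local(tag):
    # "{http://graphml.graphdrawing.org/xmlns}node" -> "node"
    return tag.rsplit("}", 1)[-1]


def parse_graphml(root):
    """Return ({node_id: attrs}, [(source, target, attrs)]) from a GraphML root."""
    # Map each <key id="..."> to its readable attr.name and a converter.
    keys = {}
    for el in root.iter():
        if _local(el.tag) == "key":
            cast = _CASTS.get(el.get("attr.type"), str)
            keys[el.get("id")] = (el.get("attr.name") or el.get("id"), cast)

    def attrs(elem):
        out = {}
        for child in elem:
            if _local(child.tag) == "data":
                name, cast = keys.get(child.get("key"), (child.get("key"), str))
                out[name] = cast(child.text) if child.text is not None else None
        return out

    nodes = {el.get("id"): attrs(el) for el in root.iter() if _local(el.tag) == "node"}
    edges = [
        (el.get("source"), el.get("target"), attrs(el))
        for el in root.iter()
        if _local(el.tag) == "edge"
    ]
    return nodes, edges


# Hypothetical two-node sample; a real export's key ids may differ.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="domain" attr.type="string"/>
  <key id="d1" for="edge" attr.name="weight" attr.type="double"/>
  <key id="d2" for="edge" attr.name="cross_domain" attr.type="boolean"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"><data key="d0">Climate</data></node>
    <node id="n1"><data key="d0">Finance</data></node>
    <edge source="n0" target="n1"><data key="d1">0.82</data><data key="d2">true</data></edge>
  </graph>
</graphml>"""
```

For a real file, pass `ET.parse("graph.graphml").getroot()` instead of the sample. Looking keys up by `attr.name` rather than by id keeps the reader independent of how the exporter numbers its keys.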

`export_dot(engine, path, filter)`

Exports the graph in DOT, the plain-text format read by Graphviz.

Features:

- Domain-specific colors
- Layout hints for Graphviz
- Edge weights as labels
- Node shapes by domain

Pattern Export

`export_patterns_csv(patterns, path)`

Exports detected patterns to CSV.

Columns:

- `id`: Pattern identifier
- `pattern_type`: Type (consolidation, coherence_break, etc.)
- `confidence`: Confidence score (0-1)
- `p_value`: Statistical significance
- `effect_size`: Effect size (Cohen's d)
- `ci_lower`, `ci_upper`: Bounds of the 95% confidence interval
- `is_significant`: Boolean
- `detected_at`: ISO 8601 timestamp
- `description`: Human-readable description
- `affected_nodes_count`: Number of affected nodes
- `evidence_count`: Number of evidence items
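The patterns CSV can also be loaded without pandas. A minimal sketch, assuming booleans are written as `true`/`false` and numeric columns hold plain decimal numbers; `load_patterns`, `significant_sorted`, and the `SAMPLE` rows are illustrative, not framework functions or real results.

```python
import csv
import io

FLOAT_COLS = ("confidence", "p_value", "effect_size", "ci_lower", "ci_upper")
INT_COLS = ("affected_nodes_count", "evidence_count")


def load_patterns(f):
    """Parse patterns.csv rows into dicts with numeric and boolean values."""
    rows = []
    for row in csv.DictReader(f):
        for col in FLOAT_COLS:
            row[col] = float(row[col])
        for col in INT_COLS:
            row[col] = int(row[col])
        # Assumption: booleans are written as "true"/"false" (any case).
        row["is_significant"] = row["is_significant"].strip().lower() == "true"
        rows.append(row)
    return rows


def significant_sorted(rows):
    """Significant patterns only, lowest p_value first."""
    return sorted((r for r in rows if r["is_significant"]), key=lambda r: r["p_value"])


# Hypothetical rows for illustration only.
SAMPLE = (
    "id,pattern_type,confidence,p_value,effect_size,ci_lower,ci_upper,"
    "is_significant,detected_at,description,affected_nodes_count,evidence_count\n"
    'p1,consolidation,0.9,0.01,0.8,0.5,1.1,true,2026-02-01T00:00:00Z,"Climate, finance link",12,3\n'
    "p2,coherence_break,0.6,0.20,0.1,-0.2,0.4,false,2026-02-02T00:00:00Z,Weak signal,4,1\n"
    "p3,consolidation,0.8,0.003,0.5,0.2,0.8,true,2026-02-03T00:00:00Z,Research cluster,7,2\n"
)
```

With a real file, open it as `open("patterns.csv", newline="")` and pass the handle to `load_patterns`; the `csv` module then handles quoted descriptions that contain commas.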

`export_patterns_with_evidence_csv(patterns, path)`

Exports the evidence behind each pattern, one row per evidence item.

Columns:

- `pattern_id`: Pattern identifier
- `pattern_type`: Type of pattern
- `evidence_type`: Type of evidence
- `evidence_value`: Numeric value
- `evidence_description`: Description
- `detected_at`: ISO 8601 timestamp
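Because each evidence row carries its `pattern_id`, the two pattern exports can be cross-checked. The sketch below groups evidence per pattern and reports patterns whose row count disagrees with `evidence_count` from `patterns.csv`. It assumes both files come from the same export run; `evidence_by_pattern` and `mismatched_counts` are hypothetical helpers, and the evidence type names in `SAMPLE` are made up.

```python
import csv
import io
from collections import defaultdict


def evidence_by_pattern(f):
    """Group patterns_evidence.csv rows by pattern_id."""
    grouped = defaultdict(list)
    for row in csv.DictReader(f):
        grouped[row["pattern_id"]].append(
            (row["evidence_type"], float(row["evidence_value"]), row["evidence_description"])
        )
    return dict(grouped)


def mismatched_counts(grouped, expected_counts):
    """Pattern ids whose evidence row count differs from evidence_count."""
    return sorted(
        pid for pid, n in expected_counts.items() if len(grouped.get(pid, [])) != n
    )


# Hypothetical rows for illustration only.
SAMPLE = (
    "pattern_id,pattern_type,evidence_type,evidence_value,evidence_description,detected_at\n"
    "p1,consolidation,mincut_delta,0.35,Min-cut rose,2026-02-01T00:00:00Z\n"
    "p1,consolidation,edge_growth,12,New edges formed,2026-02-01T00:00:00Z\n"
    "p2,coherence_break,mincut_delta,-0.4,Min-cut fell,2026-02-02T00:00:00Z\n"
)
```

The `expected_counts` mapping would come from the `id` and `evidence_count` columns of `patterns.csv`.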

Coherence Export

`export_coherence_csv(history, path)`

Exports the coherence history, one row per recorded snapshot.

Columns:

- `timestamp`: ISO 8601 timestamp
- `mincut_value`: Minimum cut value (the coherence measure)
- `node_count`: Number of nodes
- `edge_count`: Number of edges
- `avg_edge_weight`: Average edge weight
- `partition_size_a`, `partition_size_b`: Sizes of the two partitions on either side of the cut
- `boundary_nodes_count`: Number of nodes on the cut boundary
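One common use of the coherence history is spotting sharp falls in the min-cut value between consecutive snapshots. A minimal sketch with hypothetical helpers `parse_ts` and `coherence_drops`; the `min_drop` threshold is an arbitrary parameter rather than a framework setting, and timestamps are assumed to be ISO 8601 strings that `datetime.fromisoformat` accepts once a trailing `Z` is normalized.

```python
import csv
import io
from datetime import datetime


def parse_ts(text):
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on, so
    # normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def coherence_drops(f, min_drop):
    """Return (timestamp, before, after) for each fall of at least min_drop."""
    rows = sorted(csv.DictReader(f), key=lambda r: parse_ts(r["timestamp"]))
    drops = []
    for prev, cur in zip(rows, rows[1:]):
        before = float(prev["mincut_value"])
        after = float(cur["mincut_value"])
        if before - after >= min_drop:
            drops.append((parse_ts(cur["timestamp"]), before, after))
    return drops


# Hypothetical snapshots for illustration only.
SAMPLE = (
    "timestamp,mincut_value,node_count,edge_count,avg_edge_weight,"
    "partition_size_a,partition_size_b,boundary_nodes_count\n"
    "2026-02-01T00:00:00Z,5.0,100,400,0.7,60,40,8\n"
    "2026-02-02T00:00:00Z,2.0,100,380,0.6,70,30,5\n"
    "2026-02-03T00:00:00Z,2.5,101,390,0.6,70,31,5\n"
)
```

Sorting by timestamp first means the comparison stays correct even if the CSV was written newest-first.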

Visualization Workflows

Gephi (Network Visualization)

1. Import GraphML:
   ```
   File → Open → graph.graphml
   ```
2. Apply Layout:
   - Force Atlas 2 (recommended)
   - Fruchterman Reingold
   - OpenOrd (for large graphs)
3. Color by Domain:
   - Appearance → Nodes → Color → Partition
   - Select the "domain" attribute
   - Apply
4. Size by Centrality:
   - Statistics → Network Diameter (this also computes betweenness centrality)
   - Appearance → Nodes → Size → Ranking
   - Select "Betweenness Centrality"

Graphviz (Publication Graphics)

```bash
# Force-directed layout
neato -Tpng graph.dot -o graph.png

# Hierarchical layout
dot -Tsvg graph.dot -o graph.svg

# Spring-electric layout (large graphs)
sfdp -Tpdf graph.dot -o graph.pdf

# Radial layout (separate file name so it does not overwrite graph.svg)
twopi -Tsvg graph.dot -o graph_radial.svg
```
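When rendering one DOT file with several engines, giving each output its own name keeps the renders from overwriting one another. A small sketch using `subprocess`; it assumes the Graphviz binaries are on `PATH`, and `ENGINES`, `render_commands`, and `render_all` are illustrative names, not part of the framework.

```python
import subprocess
from pathlib import Path

# Engine -> output format, following the commands above.
ENGINES = {"neato": "png", "dot": "svg", "sfdp": "pdf", "twopi": "svg"}


def render_commands(dot_file, engines=ENGINES):
    """Build one Graphviz command per engine, e.g. graph_twopi.svg."""
    path = Path(dot_file)
    return [
        [engine, f"-T{fmt}", str(path), "-o", str(path.with_name(f"{path.stem}_{engine}.{fmt}"))]
        for engine, fmt in engines.items()
    ]


def render_all(dot_file):
    # check=True raises CalledProcessError if any engine fails.
    for cmd in render_commands(dot_file):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print or log the commands before running them.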

Python Analysis

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Load patterns and keep only the significant ones
patterns = pd.read_csv('patterns.csv')
significant = patterns[patterns['is_significant']]

# Load coherence and parse timestamps
coherence = pd.read_csv('coherence.csv')
coherence['timestamp'] = pd.to_datetime(coherence['timestamp'])

# Plot coherence over time
plt.plot(coherence['timestamp'], coherence['mincut_value'])
plt.xlabel('Time')
plt.ylabel('Min-Cut Value')
plt.title('Network Coherence Over Time')
plt.show()

# Load GraphML
G = nx.read_graphml('graph.graphml')
print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
```

R Analysis

```r
library(tidyverse)
library(igraph)

# Load patterns
patterns <- read_csv('patterns.csv')
significant <- filter(patterns, is_significant == TRUE)

# Load coherence
coherence <- read_csv('coherence.csv') %>%
  mutate(timestamp = as.POSIXct(timestamp))

# Plot
ggplot(coherence, aes(x=timestamp, y=mincut_value)) +
  geom_line() +
  labs(title="Network Coherence Over Time",
       x="Time", y="Min-Cut Value")

# Load graph
g <- read_graph('graph.graphml', format='graphml')
summary(g)
```

Export Filter Options

Domain Filter

```rust
let filter = ExportFilter::domain(Domain::Climate);
```

Weight Filter

```rust
let filter = ExportFilter::min_weight(0.7);
```

Time Range Filter

```rust
use chrono::{Duration, Utc};

// A 30-day window ending now
let end = Utc::now();
let start = end - Duration::days(30);
let filter = ExportFilter::time_range(start, end);
```

Combined Filters

```rust
// `start` and `end` as defined in the time range example
let filter = ExportFilter::domain(Domain::Finance)
    .and(ExportFilter::min_weight(0.8))
    .and(ExportFilter::time_range(start, end));
```
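The way `.and()` composes filters can be modelled as predicates. This Python sketch only illustrates the presumed semantics; `Record`, `Filter`, `and_` (named with a trailing underscore because `and` is a Python keyword), and the helper constructors are hypothetical. It assumes a combined filter keeps an item only when every part matches, and that `min_weight` and the time bounds are inclusive; the crate's source is the authority on the exact behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class Record:
    """Hypothetical stand-in for an exported node or edge."""
    domain: str
    weight: float
    timestamp: datetime


class Filter:
    def __init__(self, pred: Callable[[Record], bool]):
        self.pred = pred

    def and_(self, other: "Filter") -> "Filter":
        # Assumed semantics of ExportFilter::and: both filters must match.
        return Filter(lambda r: self.pred(r) and other.pred(r))

    def __call__(self, record: Record) -> bool:
        return self.pred(record)


def domain(name: str) -> Filter:
    return Filter(lambda r: r.domain == name)


def min_weight(w: float) -> Filter:
    return Filter(lambda r: r.weight >= w)  # inclusive bound (assumption)


def time_range(start: datetime, end: datetime) -> Filter:
    return Filter(lambda r: start <= r.timestamp <= end)  # inclusive (assumption)
```

Because each combination returns a new `Filter`, chains of any length read the same way as the Rust builder calls above.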

Example Output

Running the export demo:

```bash
cargo run --example export_demo --features parallel
```

Creates:

```
discovery_exports/
├── graph.graphml          # Full graph (Gephi)
├── graph.dot              # Full graph (Graphviz)
├── climate_only.graphml   # Climate domain only
└── full_export/
    ├── README.md          # Documentation
    ├── graph.graphml      # Full graph
    ├── graph.dot          # Full graph
    ├── patterns.csv       # Detected patterns
    ├── patterns_evidence.csv  # Pattern evidence
    └── coherence.csv      # Coherence history
```
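After running the demo, a quick check confirms that the expected files were written. A minimal sketch: the file list is copied from the `full_export/` tree above, `missing_exports` is a hypothetical helper, and applying it to an `export_all` output directory assumes `export_all` uses the same file names.

```python
from pathlib import Path

# File names taken from the full_export/ tree above.
EXPECTED_FULL_EXPORT = [
    "README.md",
    "graph.graphml",
    "graph.dot",
    "patterns.csv",
    "patterns_evidence.csv",
    "coherence.csv",
]


def missing_exports(directory, expected=EXPECTED_FULL_EXPORT):
    """Return the expected file names that are absent from directory."""
    d = Path(directory)
    return [name for name in expected if not (d / name).is_file()]
```

For example, `missing_exports("discovery_exports/full_export")` returns an empty list when every file is present.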

Advanced Usage

Custom Export Pipeline

```rust
use ruvector_data_framework::export::*;
use ruvector_data_framework::ruvector_native::Domain;

// 1. Export full graph
export_graphml(&engine, "full_graph.graphml", None)?;

// 2. Export each domain separately
for domain in [Domain::Climate, Domain::Finance, Domain::Research] {
    // Build the file name before `domain` is moved into the filter
    let filename = format!("{:?}_graph.graphml", domain);
    let filter = ExportFilter::domain(domain);
    export_graphml(&engine, &filename, Some(filter))?;
}

// 3. Export significant patterns only
let significant_patterns: Vec<_> = patterns.iter()
    .filter(|p| p.is_significant)
    .cloned()
    .collect();
export_patterns_csv(&significant_patterns, "significant_patterns.csv")?;

// 4. Export the last 100 coherence snapshots, keeping chronological order
let skip = coherence_history.len().saturating_sub(100);
let recent_history: Vec<_> = coherence_history.iter()
    .skip(skip)
    .cloned()
    .collect();
export_coherence_csv(&recent_history, "recent_coherence.csv")?;
```

Performance Considerations

- Large Graphs: Use filters to reduce export size
- GraphML: XML parsing can be slow for >100K nodes
- DOT: Graphviz rendering slows down at >10K nodes
- CSV: Very efficient for pattern and coherence data
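Before loading a large GraphML file into Gephi or networkx, it can help to count its nodes and edges first. `ElementTree.iterparse` reads the file incrementally instead of parsing it into one tree up front; `count_graphml` is an illustrative helper, and it assumes nodes and edges appear as standard GraphML `<node>` and `<edge>` elements.

```python
import xml.etree.ElementTree as ET


def count_graphml(source):
    """Count <node> and <edge> elements in a GraphML file or file object."""
    nodes = edges = 0
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the GraphML namespace, if any
        if tag == "node":
            nodes += 1
            elem.clear()  # release the element's <data> children once counted
        elif tag == "edge":
            edges += 1
            elem.clear()
    return nodes, edges
```

For example, if `count_graphml("graph.graphml")` reports well over 100K nodes, re-exporting with an `ExportFilter` is likely to make the file easier to work with.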

Future Enhancements

The export module currently provides only a foundation. To access the full graph data (nodes and edges), `OptimizedDiscoveryEngine` will need to expose:

```rust
pub fn nodes(&self) -> &HashMap<u32, GraphNode>
pub fn edges(&self) -> &[GraphEdge]
pub fn get_node(&self, id: u32) -> Option<&GraphNode>
```

Once these methods are added, the GraphML and DOT exports will include actual node and edge data.

Related Examples

- `examples/export_demo.rs`: Basic export demonstration
- `examples/cross_domain_discovery.rs`: Cross-domain pattern detection
- `examples/discovery_hunter.rs`: Advanced pattern hunting
- `examples/optimized_benchmark.rs`: Performance testing

Support

For issues or questions:

- GitHub: https://github.com/ruvnet/ruvector
- Documentation: See the framework README
