Graph Export Module - Complete Guide

Overview

The Graph Export module exports vector similarity graphs to five formats for visualization, analysis, and graph database integration.

Supported Formats

| Format | Description | Use Cases |
|--------|-------------|-----------|
| GraphML | XML-based graph format | Gephi, yEd, NetworkX, igraph, Cytoscape |
| GEXF | Graph Exchange XML Format | Gephi visualization (recommended) |
| Neo4j | Cypher queries | Graph database import and queries |
| D3.js | JSON for web visualization | Interactive web-based force graphs |
| NetworkX | Python graph library format | Network analysis in Python |

Quick Examples

1. Basic Export to All Formats

import { buildGraphFromEntries, exportGraph } from 'ruvector-extensions';

const entries = [
  { id: 'doc1', vector: [0.1, 0.2, 0.3], metadata: { title: 'AI' } },
  { id: 'doc2', vector: [0.15, 0.25, 0.35], metadata: { title: 'ML' } },
  { id: 'doc3', vector: [0.8, 0.1, 0.05], metadata: { title: 'History' } }
];

const graph = buildGraphFromEntries(entries, {
  maxNeighbors: 5,
  threshold: 0.7
});

// Export to different formats
const graphml = exportGraph(graph, 'graphml');
const gexf = exportGraph(graph, 'gexf');
const neo4j = exportGraph(graph, 'neo4j');
const d3 = exportGraph(graph, 'd3');
const networkx = exportGraph(graph, 'networkx');

2. GraphML Export for Gephi

import { exportToGraphML } from 'ruvector-extensions';
import { writeFile } from 'fs/promises';

const graphml = exportToGraphML(graph, {
  graphName: 'Document Similarity Network',
  includeMetadata: true,
  includeVectors: false
});

await writeFile('network.graphml', graphml);

Import into Gephi:

  1. Open Gephi
  2. File → Open → Select network.graphml
  3. Choose "Undirected" or "Directed" graph
  4. Apply layout (ForceAtlas2 recommended)
  5. Analyze with built-in metrics

3. GEXF Export for Advanced Gephi Features

import { exportToGEXF } from 'ruvector-extensions';
import { writeFile } from 'fs/promises';

const gexf = exportToGEXF(graph, {
  graphName: 'Knowledge Graph',
  graphDescription: 'Vector embeddings similarity network',
  includeMetadata: true
});

await writeFile('network.gexf', gexf);

Gephi Workflow:

  • Import the GEXF file
  • Use Statistics panel for centrality measures
  • Apply community detection (Modularity)
  • Color nodes by cluster
  • Size nodes by degree centrality
  • Export as PNG/SVG for publications

4. Neo4j Graph Database

import { exportToNeo4j } from 'ruvector-extensions';
import { writeFile } from 'fs/promises';

const cypher = exportToNeo4j(graph, {
  includeVectors: true,
  includeMetadata: true
});

await writeFile('import.cypher', cypher);

Import into Neo4j:

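# Option 1: Neo4j Browser
# Copy and paste the Cypher queries

# Option 2: cypher-shell (add -u/-p if authentication is enabled)
cypher-shell -f import.cypher

# Option 3: Node.js driver
import neo4j from 'neo4j-driver';

const driver = neo4j.driver('bolt://localhost:7687');
const session = driver.session();

// session.run() executes a single statement per call, so run them one by one.
// Assumes statements are separated by ';' and no property value contains ';'.
for (const statement of cypher.split(';').map(s => s.trim()).filter(Boolean)) {
  await session.run(statement);
}

await session.close();
await driver.close();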

Query Examples:

// Find most similar vectors
MATCH (v:Vector)-[r:SIMILAR_TO]->(other:Vector)
WHERE v.id = 'doc1'
RETURN other.label, r.weight
ORDER BY r.weight DESC
LIMIT 5;

// Find communities (requires the Graph Data Science plugin and a projected graph named 'myGraph')
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).label AS node, communityId;

// Path finding
MATCH path = shortestPath(
  (a:Vector {id: 'doc1'})-[*]-(b:Vector {id: 'doc10'})
)
RETURN path;

5. D3.js Web Visualization

import { exportToD3 } from 'ruvector-extensions';
import { writeFile } from 'fs/promises';

const d3Data = exportToD3(graph, {
  includeMetadata: true
});

// Save for web app
await writeFile('public/graph-data.json', JSON.stringify(d3Data));

HTML Visualization:

<!DOCTYPE html>
<html>
<head>
  <script src="https://d3js.org/d3.v7.min.js"></script>
  <style>
    .links line { stroke: #999; stroke-opacity: 0.6; }
    .nodes circle { stroke: #fff; stroke-width: 1.5px; }
  </style>
</head>
<body>
  <svg width="960" height="600"></svg>
  <script>
    d3.json('graph-data.json').then(data => {
      const svg = d3.select("svg");
      const width = +svg.attr("width");
      const height = +svg.attr("height");

      const simulation = d3.forceSimulation(data.nodes)
        .force("link", d3.forceLink(data.links).id(d => d.id))
        .force("charge", d3.forceManyBody().strength(-300))
        .force("center", d3.forceCenter(width / 2, height / 2));

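      // The class names match the CSS selectors above; without them the
      // lines have no stroke and are invisible.
      const link = svg.append("g")
        .attr("class", "links")
        .selectAll("line")
        .data(data.links)
        .enter().append("line")
        .attr("stroke-width", d => Math.sqrt(d.value));

      const node = svg.append("g")
        .attr("class", "nodes")
        .selectAll("circle")
        .data(data.nodes)
        .enter().append("circle")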
        .attr("r", 5)
        .call(d3.drag()
          .on("start", dragstarted)
          .on("drag", dragged)
          .on("end", dragended));

      node.append("title")
        .text(d => d.name);

      simulation.on("tick", () => {
        link
          .attr("x1", d => d.source.x)
          .attr("y1", d => d.source.y)
          .attr("x2", d => d.target.x)
          .attr("y2", d => d.target.y);

        node
          .attr("cx", d => d.x)
          .attr("cy", d => d.y);
      });

      function dragstarted(event) {
        if (!event.active) simulation.alphaTarget(0.3).restart();
        event.subject.fx = event.subject.x;
        event.subject.fy = event.subject.y;
      }

      function dragged(event) {
        event.subject.fx = event.x;
        event.subject.fy = event.y;
      }

      function dragended(event) {
        if (!event.active) simulation.alphaTarget(0);
        event.subject.fx = null;
        event.subject.fy = null;
      }
    });
  </script>
</body>
</html>
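
The page above reads `id` and `name` from each node and `source`, `target`, and `value` from each link. The following is a minimal sketch of a pre-flight check for that shape. The interfaces are inferred from the HTML example, not from the exporter itself, and `checkD3Graph` is a hypothetical helper. It is useful because `d3.forceLink` throws when a link refers to a node id that is not in the node list.

```typescript
// Shape inferred from the fields the page above reads; the exporter's
// actual output may contain more fields or use different names.
interface D3Node { id: string; name?: string }
interface D3Link { source: string; target: string; value?: number }
interface D3Graph { nodes: D3Node[]; links: D3Link[] }

// Returns a list of human-readable problems. An empty list means every
// link endpoint resolves to exactly one node.
function checkD3Graph(data: D3Graph): string[] {
  const problems: string[] = [];
  const ids = new Set<string>();
  for (const node of data.nodes) {
    if (ids.has(node.id)) problems.push(`duplicate node id: ${node.id}`);
    ids.add(node.id);
  }
  data.links.forEach((link, i) => {
    if (!ids.has(link.source)) problems.push(`link ${i}: unknown source ${link.source}`);
    if (!ids.has(link.target)) problems.push(`link ${i}: unknown target ${link.target}`);
  });
  return problems;
}
```

Running the check before writing `graph-data.json` turns a blank page in the browser into an explicit error at export time.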

6. NetworkX Python Analysis

import { exportToNetworkX } from 'ruvector-extensions';
import { writeFile } from 'fs/promises';

const nxData = exportToNetworkX(graph);
await writeFile('graph.json', JSON.stringify(nxData, null, 2));
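
For orientation, `nx.node_link_graph` reads "node-link" JSON: a `nodes` array whose items carry an `id`, a `links` array whose items carry `source` and `target`, and the top-level keys `directed`, `multigraph`, and `graph`. The sketch below builds that shape from a plain node/edge list. `SimpleGraph` and `toNodeLink` are hypothetical names for illustration, and the real `exportToNetworkX` output may differ in detail.

```typescript
// Hypothetical input shape for illustration only.
interface SimpleGraph {
  nodes: { id: string; label?: string }[];
  edges: { source: string; target: string; weight: number }[];
}

// Build a node-link document that networkx.node_link_graph can load.
function toNodeLink(graph: SimpleGraph) {
  return {
    directed: false,   // similarity is treated as symmetric in this sketch
    multigraph: false, // at most one edge per node pair
    graph: {},         // graph-level attributes, empty here
    nodes: graph.nodes.map(n => ({ ...n })),
    links: graph.edges.map(e => ({ source: e.source, target: e.target, weight: e.weight }))
  };
}

const nodeLink = toNodeLink({
  nodes: [{ id: 'doc1', label: 'AI' }, { id: 'doc2', label: 'ML' }],
  edges: [{ source: 'doc1', target: 'doc2', weight: 0.99 }]
});
```

Recent NetworkX releases let the caller choose the key used for the edge list, so if loading fails, check the `node_link_graph` documentation for the version you have installed.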

Python Analysis:

import json
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

# Load graph
with open('graph.json', 'r') as f:
    data = json.load(f)

G = nx.node_link_graph(data)

print(f"Nodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")

# Centrality analysis
degree_cent = nx.degree_centrality(G)
between_cent = nx.betweenness_centrality(G)
close_cent = nx.closeness_centrality(G)
eigen_cent = nx.eigenvector_centrality(G)

# Community detection
communities = nx.community.louvain_communities(G)
print(f"\nFound {len(communities)} communities")

# Visualize
plt.figure(figsize=(12, 8))
pos = nx.spring_layout(G, k=0.5, iterations=50)

# Color by community
color_map = []
for node in G:
    for i, comm in enumerate(communities):
        if node in comm:
            color_map.append(i)
            break

nx.draw(G, pos,
        node_color=color_map,
        node_size=[v * 1000 for v in degree_cent.values()],
        cmap=plt.cm.rainbow,
        with_labels=True,
        font_size=8,
        edge_color='gray',
        alpha=0.7)

plt.title('Network Graph with Communities')
plt.savefig('network.png', dpi=300, bbox_inches='tight')

# Export metrics
metrics = {
    'node': list(G.nodes()),
    'degree_centrality': [degree_cent[n] for n in G.nodes()],
    'betweenness_centrality': [between_cent[n] for n in G.nodes()],
    'closeness_centrality': [close_cent[n] for n in G.nodes()],
    'eigenvector_centrality': [eigen_cent[n] for n in G.nodes()]
}

import pandas as pd
df = pd.DataFrame(metrics)
df.to_csv('network_metrics.csv', index=False)
print("\nMetrics exported to network_metrics.csv")

Streaming Exports for Large Graphs

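For graphs too large to hold in memory as a single output string (the Performance Tips below suggest roughly 100K nodes and up), use the streaming exporters, which write nodes and edges incrementally: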

GraphML Streaming

import { GraphMLStreamExporter } from 'ruvector-extensions';
import { createWriteStream } from 'fs';

const stream = createWriteStream('large-graph.graphml');
const exporter = new GraphMLStreamExporter(stream, {
  graphName: 'Large Network'
});

await exporter.start();

// Add nodes in batches
for (const batch of nodeBatches) {
  for (const node of batch) {
    await exporter.addNode(node);
  }
  console.log(`Processed ${batch.length} nodes`);
}

// Add edges
for (const batch of edgeBatches) {
  for (const edge of batch) {
    await exporter.addEdge(edge);
  }
}

await exporter.end();
stream.close();
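
To make the start / add / end lifecycle concrete, here is a minimal sketch of what an incremental GraphML writer might do internally. It is not the library's `GraphMLStreamExporter`: `MiniGraphMLWriter` and `escapeXml` are hypothetical, the writer emits only ids and an edge weight, and it takes a plain callback instead of a Node stream so the example stays self-contained.

```typescript
// Escape the five XML special characters so ids cannot break the markup.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical minimal writer. `write` receives each chunk as soon as it
// is produced, so the full document never has to exist in memory.
class MiniGraphMLWriter {
  private write: (chunk: string) => void;

  constructor(write: (chunk: string) => void) {
    this.write = write;
  }

  start(): void {
    this.write('<?xml version="1.0" encoding="UTF-8"?>\n');
    this.write('<graphml xmlns="http://graphml.graphdrawing.org/xmlns">\n');
    this.write('  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>\n');
    this.write('  <graph id="G" edgedefault="undirected">\n');
  }

  addNode(id: string): void {
    this.write(`    <node id="${escapeXml(id)}"/>\n`);
  }

  addEdge(source: string, target: string, weight: number): void {
    this.write(
      `    <edge source="${escapeXml(source)}" target="${escapeXml(target)}">` +
      `<data key="weight">${weight}</data></edge>\n`
    );
  }

  end(): void {
    this.write('  </graph>\n</graphml>\n');
  }
}

// Collect chunks in an array here; a real caller would forward them to a file.
const chunks: string[] = [];
const writer = new MiniGraphMLWriter(chunk => chunks.push(chunk));
writer.start();
writer.addNode('doc1');
writer.addNode('doc<2>');
writer.addEdge('doc1', 'doc<2>', 0.93);
writer.end();
const xml = chunks.join('');
```

With a Node write stream the callback would be `chunk => stream.write(chunk)`. A production writer would also respect backpressure by waiting for the stream's `drain` event whenever `write` returns `false`.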

D3.js Streaming

import { D3StreamExporter } from 'ruvector-extensions';
import { createWriteStream } from 'fs';

const stream = createWriteStream('large-d3-graph.json');
const exporter = new D3StreamExporter(stream);

await exporter.start();

// Process in chunks
for await (const node of nodeIterator) {
  await exporter.addNode(node);
}

for await (const edge of edgeIterator) {
  await exporter.addEdge(edge);
}

await exporter.end();

Configuration Options

Export Options

interface ExportOptions {
  includeVectors?: boolean;      // Include embeddings (default: false)
  includeMetadata?: boolean;     // Include node attributes (default: true)
  maxNeighbors?: number;         // Max edges per node (default: 10)
  threshold?: number;            // Min similarity (default: 0.0)
  graphName?: string;            // Graph title
  graphDescription?: string;     // Graph description
  streaming?: boolean;           // Enable streaming mode
  attributeMapping?: Record<string, string>; // Custom attribute names
}
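
As a rough illustration of how the documented defaults combine with caller-supplied values, the sketch below merges a partial options object over a defaults object. `ExportOptionsSketch`, `DEFAULTS`, and `resolveOptions` are hypothetical names; the default values are copied from the comments in the interface above, and the library may resolve options differently.

```typescript
// Subset of the options above that have a documented default.
interface ExportOptionsSketch {
  includeVectors?: boolean;
  includeMetadata?: boolean;
  maxNeighbors?: number;
  threshold?: number;
}

// Defaults copied from the comments in the ExportOptions interface.
const DEFAULTS: Required<ExportOptionsSketch> = {
  includeVectors: false,
  includeMetadata: true,
  maxNeighbors: 10,
  threshold: 0.0
};

// Later spreads win, so any key the caller sets overrides its default.
function resolveOptions(options: ExportOptionsSketch = {}): Required<ExportOptionsSketch> {
  return { ...DEFAULTS, ...options };
}

const resolved = resolveOptions({ threshold: 0.7 });
```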

Graph Building Options

const graph = buildGraphFromEntries(entries, {
  maxNeighbors: 5,        // Create at most 5 edges per node
  threshold: 0.7,         // Only connect if similarity > 0.7
  includeVectors: false,  // Don't export raw embeddings
  includeMetadata: true   // Export all metadata fields
});
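
To show how `maxNeighbors` and `threshold` interact, here is a brute-force sketch of neighbor selection. It assumes cosine similarity as the metric, which this guide does not state, and `selectEdges` is a hypothetical stand-in rather than the logic inside `buildGraphFromEntries`.

```typescript
// Hypothetical types for illustration; the library's own types may differ.
interface Entry { id: string; vector: number[] }
interface Edge { source: string; target: string; weight: number }

// Cosine similarity (assumed metric). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// O(n^2) selection: for each node keep the `maxNeighbors` most similar
// other nodes whose similarity is strictly greater than `threshold`.
function selectEdges(entries: Entry[], maxNeighbors: number, threshold: number): Edge[] {
  const edges: Edge[] = [];
  for (const source of entries) {
    const candidates = entries
      .filter(other => other.id !== source.id)
      .map(other => ({ target: other.id, weight: cosineSimilarity(source.vector, other.vector) }))
      .filter(c => c.weight > threshold)
      .sort((x, y) => y.weight - x.weight)
      .slice(0, maxNeighbors);
    for (const c of candidates) edges.push({ source: source.id, ...c });
  }
  return edges;
}

// The three entries from the Quick Examples section.
const sampleEdges = selectEdges(
  [
    { id: 'doc1', vector: [0.1, 0.2, 0.3] },
    { id: 'doc2', vector: [0.15, 0.25, 0.35] },
    { id: 'doc3', vector: [0.8, 0.1, 0.05] }
  ],
  5,
  0.7
);
```

Under these assumptions only `doc1` and `doc2` are linked (once in each direction), because `doc3` scores below 0.7 against both. The threshold removes weak links, and `maxNeighbors` caps how many of the remaining links each node keeps.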

Performance Tips

  1. Threshold Selection: Higher thresholds = fewer edges = smaller files
  2. maxNeighbors: Limit connections per node for cleaner graphs
  3. Streaming: Use for graphs > 100K nodes
  4. Compression: Compress output files (gzip recommended)
  5. Batch Processing: Process nodes/edges in batches
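
For the compression tip, a minimal sketch using Node's built-in `zlib` module is shown below. The `sample` string is a stand-in for exporter output, not real export data.

```typescript
import { gzipSync, gunzipSync } from 'zlib';

// Stand-in for exporter output. GraphML and GEXF are verbose XML, so the
// repeated tag names compress well.
const sample = '<node id="n"/>\n'.repeat(1000);

// Compress in memory, then decompress to confirm the round trip is lossless.
const compressed = gzipSync(sample);
const restored = gunzipSync(compressed).toString('utf8');

console.log(`raw: ${Buffer.byteLength(sample)} bytes, gzip: ${compressed.length} bytes`);
```

For files too large to hold in memory, the same module offers `createGzip()`, which can be placed between a read stream and a write stream with `pipeline` from `stream/promises`.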

Use Cases

1. Document Similarity Network

const docs = await embedDocuments(documents);
const graph = buildGraphFromEntries(docs, {
  threshold: 0.8,
  maxNeighbors: 5
});

const gexf = exportToGEXF(graph);
// Visualize in Gephi to find document clusters

2. Knowledge Graph

const concepts = await embedConcepts(knowledgeBase);
const graph = buildGraphFromEntries(concepts, {
  threshold: 0.6,
  includeMetadata: true
});

const cypher = exportToNeo4j(graph);
// Import into Neo4j for graph queries

3. Semantic Search Visualization

const results = db.search({ vector: queryVector, k: 50 });
const graph = buildGraphFromEntries(results, {
  maxNeighbors: 3,
  threshold: 0.5
});

const d3Data = exportToD3(graph);
// Show interactive graph in web app

4. Research Network Analysis

const papers = await embedPapers(corpus);
const graph = buildGraphFromEntries(papers, {
  threshold: 0.75,
  includeMetadata: true
});

const nxData = exportToNetworkX(graph);
// Analyze citation patterns, communities, and influence in Python

Troubleshooting

Large Graphs Won't Export

Problem: Out of memory errors with large graphs.

Solution: Use streaming exporters:

const exporter = new GraphMLStreamExporter(stream);
await exporter.start();
// Process in batches
await exporter.end();

Neo4j Import Fails

Problem: Cypher queries fail or timeout.

Solution: Break into batches:

// Export in batches of 1000 nodes (chunkArray and filterEdges are helpers you supply)
const batches = chunkArray(graph.nodes, 1000);
for (const batch of batches) {
  const batchGraph = { nodes: batch, edges: filterEdges(batch) };
  const cypher = exportToNeo4j(batchGraph);
  await neo4jSession.run(cypher);
}
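
The batching snippet relies on two helpers that this guide does not define. The sketch below shows one possible version of each. `edgesWithinBatch` is a hypothetical counterpart to `filterEdges` that takes the full edge list explicitly, and the node and edge shapes are assumptions.

```typescript
// Assumed minimal shapes; real graph nodes and edges carry more fields.
interface GraphNode { id: string }
interface GraphEdge { source: string; target: string }

// Split an array into consecutive slices of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Keep only edges whose two endpoints are both in the given batch.
function edgesWithinBatch(edges: GraphEdge[], batch: GraphNode[]): GraphEdge[] {
  const ids = new Set(batch.map(n => n.id));
  return edges.filter(e => ids.has(e.source) && ids.has(e.target));
}
```

Filtering edges per batch drops every edge that connects nodes in different batches. If those edges matter, one option is to import all nodes in batches first and then import the edges in a second batched pass.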

Gephi Import Issues

Problem: Attributes not showing correctly.

Solution: Ensure metadata is included:

const gexf = exportToGEXF(graph, {
  includeMetadata: true,  // ✓ Include all attributes
  graphName: 'My Network'
});

D3.js Performance

Problem: Web visualization lags with many nodes.

Solution: Limit nodes or use clustering:

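// Keep the first 100 nodes (sort beforehand to keep the most important ones)
const topNodes = graph.nodes.slice(0, 100);
const topIds = new Set(topNodes.map(n => n.id));
const filteredGraph = {
  nodes: topNodes,
  // Keep an edge only if both endpoints were retained; an edge that points
  // at a removed node makes d3.forceLink fail.
  edges: graph.edges.filter(e => topIds.has(e.source) && topIds.has(e.target))
};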

const d3Data = exportToD3(filteredGraph);
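
Taking the first 100 nodes keeps whatever happens to come first in the array. One way to keep the most connected nodes instead is to rank them by degree, as in this sketch. `topNodesByDegree` is a hypothetical helper, and it assumes edges store node ids in `source` and `target`.

```typescript
// Assumed minimal shapes; real graph nodes and edges carry more fields.
interface GNode { id: string }
interface GEdge { source: string; target: string }

// Return the `limit` nodes with the highest degree (number of incident edges).
function topNodesByDegree<N extends GNode>(nodes: N[], edges: GEdge[], limit: number): N[] {
  const degree = new Map<string, number>();
  for (const e of edges) {
    degree.set(e.source, (degree.get(e.source) ?? 0) + 1);
    degree.set(e.target, (degree.get(e.target) ?? 0) + 1);
  }
  // Copy before sorting so the caller's array is left untouched.
  return [...nodes]
    .sort((a, b) => (degree.get(b.id) ?? 0) - (degree.get(a.id) ?? 0))
    .slice(0, limit);
}
```

The result can replace `graph.nodes.slice(0, 100)` when building the filtered graph.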

Best Practices

  1. Choose the Right Format:

    • GraphML: General purpose, wide tool support
    • GEXF: Best for Gephi visualization
    • Neo4j: For graph database queries
    • D3.js: Interactive web visualization
    • NetworkX: Python analysis
  2. Optimize Graph Size:

    • Use threshold to reduce edges
    • Limit maxNeighbors
    • Filter out low-quality connections
  3. Preserve Metadata:

    • Always include relevant metadata
    • Use descriptive labels
    • Add timestamps for temporal analysis
  4. Test with Small Samples:

    • Export a subset first
    • Verify format compatibility
    • Check visualization quality
  5. Document Your Process:

    • Record threshold and parameters
    • Save graph statistics
    • Version your exports
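
One lightweight way to follow the last practice is to save a small manifest next to each export. The sketch below is only a suggestion: `ExportManifest` and `buildManifest` are hypothetical, and the density formula assumes an undirected graph without self-loops.

```typescript
// Hypothetical manifest shape; adjust the fields to your own process.
interface ExportManifest {
  createdAt: string;
  format: string;
  parameters: { threshold: number; maxNeighbors: number };
  stats: { nodes: number; edges: number; density: number };
}

function buildManifest(
  format: string,
  parameters: { threshold: number; maxNeighbors: number },
  nodeCount: number,
  edgeCount: number,
  createdAt: Date = new Date()
): ExportManifest {
  // Density of an undirected simple graph: 2E / (N * (N - 1)).
  const density = nodeCount > 1 ? (2 * edgeCount) / (nodeCount * (nodeCount - 1)) : 0;
  return {
    createdAt: createdAt.toISOString(),
    format,
    parameters,
    stats: { nodes: nodeCount, edges: edgeCount, density }
  };
}

const manifest = buildManifest(
  'gexf',
  { threshold: 0.8, maxNeighbors: 5 },
  4,
  3,
  new Date('2024-01-01T00:00:00Z')
);
```

Writing `JSON.stringify(manifest, null, 2)` to a file such as `network.gexf.manifest.json` keeps the parameters and statistics together with the export they describe.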

Additional Resources

Support

For issues and questions: