Physics, Seismic, and Ocean Data Clients
Overview
This module provides async API clients for physics, seismic, and ocean data sources, enabling cross-disciplinary discoveries through RuVector's semantic vector search and graph coherence analysis.
New Domains
Three new domains have been added to the Domain enum in ruvector_native.rs:
- Domain::Physics - Particle physics, materials science
- Domain::Seismic - Earthquake data, seismic activity
- Domain::Ocean - Ocean temperature, salinity, depth profiles
Clients
1. UsgsEarthquakeClient
USGS Earthquake Hazards Program - Real-time and historical earthquake data worldwide.
Features
- No API key required (public data)
- Global earthquake coverage
- Magnitude, location, depth, tsunami warnings
- ~5 requests/second rate limit
Methods
use ruvector_data_framework::UsgsEarthquakeClient;
let client = UsgsEarthquakeClient::new()?;
// Get recent earthquakes above minimum magnitude
let recent = client.get_recent(4.5, 7).await?; // Mag 4.5+, last 7 days
// Search by geographic region
let la_quakes = client.search_by_region(
    34.05,   // latitude
    -118.25, // longitude
    200.0,   // radius in km
    30       // days back
).await?;
// Get significant earthquakes only
let significant = client.get_significant(30).await?;
// Filter by magnitude range
let moderate = client.get_by_magnitude_range(4.0, 6.0, 7).await?;
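For orientation, a regional search like the one above could map onto the public USGS FDSN event endpoint roughly as sketched below. The query parameters (format, starttime, latitude, longitude, maxradiuskm) are part of the USGS API, but the helper name region_query_url is hypothetical, and how UsgsEarthquakeClient builds its requests internally is an assumption.

```rust
/// Hypothetical helper (not part of the crate): builds an FDSN event query
/// URL for a circular region. `start_time` is an ISO 8601 date such as
/// "2024-01-01"; deriving it from "days back" is left to the caller.
fn region_query_url(lat: f64, lon: f64, radius_km: f64, start_time: &str) -> String {
    format!(
        "https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson\
         &starttime={start_time}\
         &latitude={lat}&longitude={lon}&maxradiuskm={radius_km}"
    )
}
```

Calling region_query_url(34.05, -118.25, 200.0, "2024-01-01") yields a URL that returns GeoJSON for events within 200 km of Los Angeles since that date.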
SemanticVector Metadata
Each earthquake is converted to a SemanticVector with:
metadata: {
"magnitude": "5.4",
"place": "Southern California",
"latitude": "34.05",
"longitude": "-118.25",
"depth_km": "10.5",
"tsunami": "0",
"significance": "450",
"status": "reviewed",
"alert": "green",
"source": "usgs"
}
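Because every metadata value is stored as a string, numeric fields need to be parsed before use. The sketch below assumes the metadata is exposed as a HashMap<String, String>; the QuakeSummary type and the summarize function are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical typed view over a few of the string-valued metadata fields.
#[derive(Debug, PartialEq)]
struct QuakeSummary {
    magnitude: f64,
    depth_km: f64,
    tsunami: bool,
}

/// Parses selected fields; returns None if a key is missing or not numeric.
fn summarize(metadata: &HashMap<String, String>) -> Option<QuakeSummary> {
    Some(QuakeSummary {
        magnitude: metadata.get("magnitude")?.parse().ok()?,
        depth_km: metadata.get("depth_km")?.parse().ok()?,
        // The tsunami flag is "1" when set and "0" otherwise.
        tsunami: metadata.get("tsunami")?.as_str() == "1",
    })
}
```

Returning Option keeps malformed or incomplete records from panicking; callers can skip them or log them as needed.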
2. CernOpenDataClient
CERN Open Data Portal - LHC experiment data, particle physics datasets.
Features
- No API key required
- CMS, ATLAS, LHCb, ALICE experiments
- Collision events, particle physics data
- Educational and research datasets
Methods
use ruvector_data_framework::CernOpenDataClient;
let client = CernOpenDataClient::new()?;
// Search datasets by keywords
let higgs = client.search_datasets("Higgs").await?;
let top_quark = client.search_datasets("top quark").await?;
// Get specific dataset by record ID
let dataset = client.get_dataset(5500).await?;
// Search by experiment
let cms_data = client.search_by_experiment("CMS").await?;
let atlas_data = client.search_by_experiment("ATLAS").await?;
Available Experiments
- "CMS" - Compact Muon Solenoid
- "ATLAS" - A Toroidal LHC ApparatuS
- "LHCb" - Large Hadron Collider beauty
- "ALICE" - A Large Ion Collider Experiment
SemanticVector Metadata
metadata: {
"recid": "12345",
"title": "CMS 2011 Higgs to two photons dataset",
"experiment": "CMS",
"collision_energy": "7TeV",
"collision_type": "pp",
"data_type": "Dataset",
"source": "cern"
}
3. ArgoClient
Argo Float Ocean Data - Global ocean temperature, salinity, pressure profiles.
Features
- Global ocean coverage (4000+ floats)
- Temperature and salinity profiles
- Depth measurements (0-2000m typical)
- Free public data
Methods
use ruvector_data_framework::ArgoClient;
let client = ArgoClient::new()?;
// Get recent profiles (placeholder - requires Argo GDAC integration)
let recent = client.get_recent_profiles(30).await?;
// Search by region
let atlantic = client.search_by_region(
    0.0,   // latitude
    -30.0, // longitude
    500.0  // radius in km
).await?;
// Temperature-focused profiles
let temp_data = client.get_temperature_profiles().await?;
// Create sample data for testing
let samples = client.create_sample_profiles(50)?;
Note on Implementation
The current Argo client includes a create_sample_profiles() method for demonstration. For production use, integrate with:
- Argo GDAC (Global Data Assembly Center): https://data-argo.ifremer.fr
- ArgoVis API: https://argovis-api.colorado.edu
- Direct netCDF file parsing
SemanticVector Metadata
metadata: {
"platform_number": "1900001",
"latitude": "35.5",
"longitude": "-45.2",
"temperature": "18.3",
"salinity": "35.1",
"depth_m": "500.0",
"source": "argo"
}
4. MaterialsProjectClient
Materials Project - Computational materials science database (150,000+ materials).
Features
- Crystal structures and properties
- Band gaps, formation energies
- Electronic and mechanical properties
- Requires free API key from https://materialsproject.org
Methods
use ruvector_data_framework::MaterialsProjectClient;
// API key required
let api_key = std::env::var("MATERIALS_PROJECT_API_KEY")?;
let client = MaterialsProjectClient::new(api_key)?;
// Search by chemical formula
let silicon = client.search_materials("Si").await?;
let iron_oxide = client.search_materials("Fe2O3").await?;
let battery = client.search_materials("LiFePO4").await?;
// Get specific material by ID
let mp_149 = client.get_material("mp-149").await?; // Silicon
// Search by property range
let semiconductors = client.search_by_property(
    "band_gap",
    1.0, // min eV
    3.0  // max eV
).await?;
let stable = client.search_by_property(
    "formation_energy_per_atom",
    -2.0, // min eV/atom
    0.0   // max eV/atom
).await?;
Common Properties
- "band_gap" - Electronic band gap (eV)
- "formation_energy_per_atom" - Formation energy (eV/atom)
- "energy_per_atom" - Total energy per atom
- "density" - Density (g/cm³)
- "volume" - Volume per atom
SemanticVector Metadata
metadata: {
"material_id": "mp-149",
"formula": "Si",
"band_gap": "1.14",
"density": "2.33",
"formation_energy": "0.0",
"crystal_system": "cubic",
"elements": "Si",
"source": "materials_project"
}
Geographic Utilities
The GeoUtils helper provides geographic calculations:
use ruvector_data_framework::GeoUtils;
// Calculate distance between two points (Haversine formula)
let distance_km = GeoUtils::distance_km(
    40.7128, -74.0060, // NYC
    34.0522, -118.2437 // LA
);
// Returns: ~3936 km
// Check if point is within radius
let within = GeoUtils::within_radius(
    34.05, -118.25, // Center (LA)
    32.72, -117.16, // Point (San Diego)
    200.0           // Radius in km
);
// Returns: true
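For readers unfamiliar with the Haversine formula, a minimal standalone sketch follows. It is not the GeoUtils source; the function names and the mean Earth radius of 6371 km are assumptions chosen for illustration.

```rust
/// Mean Earth radius in kilometres (assumed; GeoUtils may use another value).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance between two points given in decimal degrees.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    // a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2)
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * a.sqrt().asin()
}

/// True when the point lies within `radius_km` of the center.
fn within_radius_km(center: (f64, f64), point: (f64, f64), radius_km: f64) -> bool {
    haversine_km(center.0, center.1, point.0, point.1) <= radius_km
}
```

The formula treats the Earth as a sphere, which is normally accurate enough for radius filters of a few hundred kilometres.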
Rate Limiting
All clients implement automatic rate limiting and retry logic:
| Client | Min. Request Interval | Max Retries | Retry Delay |
|---|---|---|---|
| USGS | 200ms (~5 req/s) | 3 | 1s base, exponential backoff |
| CERN | 500ms (~2 req/s) | 3 | 1s base, exponential backoff |
| Argo | 300ms (~3 req/s) | 3 | 1s base, exponential backoff |
| Materials Project | 1000ms (1 req/s) | 3 | 1s base, exponential backoff |
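The retry behaviour in the table can be pictured with the following sketch. It assumes the delay doubles on each retry starting from the 1s base (1s, 2s, 4s), which is one common reading of "exponential"; the function names are hypothetical, and the sketch is synchronous for simplicity, whereas the actual clients are async.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt.
fn backoff_delay(base: Duration, attempt: u32) -> Duration {
    base * 2u32.pow(attempt)
}

/// Calls `op` until it succeeds or `max_retries` retries have been used.
/// `sleep` is injected so callers can pass `std::thread::sleep`
/// or a recording closure in tests.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retries exhausted: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(base, attempt));
                attempt += 1;
            }
        }
    }
}
```

With max_retries set to 3, an operation is attempted at most four times before the last error is returned.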
Cross-Domain Discovery Examples
1. Earthquake-Climate Correlations
use ruvector_data_framework::{
    UsgsEarthquakeClient, NoaaClient,
    NativeDiscoveryEngine, NativeEngineConfig
};
let mut engine = NativeDiscoveryEngine::new(NativeEngineConfig::default());
// Add earthquake data
let usgs = UsgsEarthquakeClient::new()?;
let earthquakes = usgs.get_recent(5.0, 30).await?;
for eq in earthquakes {
    engine.add_vector(eq);
}
// Add climate data
let noaa = NoaaClient::new(None)?;
let climate = noaa.get_climate_data("GHCND:USW00023174", 30).await?;
for record in climate {
    engine.add_vector(record);
}
// Discover patterns
let patterns = engine.detect_patterns();
for pattern in patterns {
    if !pattern.cross_domain_links.is_empty() {
        println!("Found cross-domain pattern: {}", pattern.description);
    }
}
2. Materials for Particle Detectors
use ruvector_data_framework::{
    CernOpenDataClient, MaterialsProjectClient,
    NativeDiscoveryEngine, NativeEngineConfig
};
// API key read from the environment, as in the MaterialsProjectClient section
let api_key = std::env::var("MATERIALS_PROJECT_API_KEY")?;
let cern = CernOpenDataClient::new()?;
let materials = MaterialsProjectClient::new(api_key)?;
// Get particle physics requirements
let detector_data = cern.search_datasets("detector").await?;
// Find materials with suitable properties
let semiconductors = materials.search_by_property("band_gap", 1.0, 3.0).await?;
// Add to discovery engine to find correlations
let mut engine = NativeDiscoveryEngine::new(NativeEngineConfig::default());
for data in detector_data {
    engine.add_vector(data);
}
for material in semiconductors {
    engine.add_vector(material);
}
let patterns = engine.detect_patterns();
3. Ocean Temperature & Seismic Activity
use ruvector_data_framework::{
    ArgoClient, UsgsEarthquakeClient, Domain,
    NativeDiscoveryEngine, NativeEngineConfig
};
let argo = ArgoClient::new()?;
let usgs = UsgsEarthquakeClient::new()?;
// Get ocean data for a region
let ocean = argo.search_by_region(0.0, -30.0, 1000.0).await?;
// Get earthquakes in same region
let quakes = usgs.search_by_region(0.0, -30.0, 1000.0, 90).await?;
// Discover correlations
let mut engine = NativeDiscoveryEngine::new(NativeEngineConfig::default());
for profile in ocean {
    engine.add_vector(profile);
}
for eq in quakes {
    engine.add_vector(eq);
}
// Look for cross-domain patterns linking Ocean and Seismic in either direction
let patterns = engine.detect_patterns();
for pattern in patterns.iter().filter(|p| {
    p.cross_domain_links.iter().any(|l|
        (l.source_domain == Domain::Ocean && l.target_domain == Domain::Seismic) ||
        (l.source_domain == Domain::Seismic && l.target_domain == Domain::Ocean)
    )
}) {
    println!("Ocean-Seismic correlation: {}", pattern.description);
}
Running the Example
# Basic example (no API keys required)
cargo run --example physics_discovery
# With Materials Project API key
export MATERIALS_PROJECT_API_KEY="your_key_here"
cargo run --example physics_discovery
Integration with RuVector
All clients convert data to SemanticVector format, enabling:
- Vector Similarity Search - Find similar earthquakes, materials, experiments
- Graph Coherence Analysis - Detect network fragmentation/consolidation
- Cross-Domain Pattern Discovery - Bridge physics, seismic, ocean domains
- Temporal Analysis - Track changes over time
- Spatial Analysis - Geographic clustering and correlation
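As a rough illustration of what vector similarity search compares, the sketch below computes cosine similarity between two embeddings. Cosine similarity is a common choice for this kind of search, but whether RuVector uses this exact metric is an assumption, and the function name is hypothetical.

```rust
/// Cosine similarity of two equal-length embeddings.
/// Returns None for mismatched lengths, empty input, or zero-norm vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Values near 1.0 indicate embeddings pointing in nearly the same direction, for example two earthquakes with similar magnitude, depth, and location descriptions.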
Testing
# Run all physics client tests
cargo test physics_clients
# Run specific client tests
cargo test usgs_client
cargo test cern_client
cargo test argo_client
cargo test materials_project_client
# Run geographic utilities tests
cargo test geo_utils
API Documentation
USGS Earthquake API
- Docs: https://earthquake.usgs.gov/fdsnws/event/1/
- No registration required
- Global coverage
- Real-time updates
CERN Open Data Portal
- Portal: https://opendata.cern.ch
- API: https://opendata.cern.ch/docs/api
- No registration required
- Datasets from LHC experiments
Argo Data
- GDAC: https://data-argo.ifremer.fr
- ArgoVis: https://argovis.colorado.edu
- Free public access
- NetCDF and JSON formats
Materials Project
- Website: https://materialsproject.org
- API Docs: https://materialsproject.org/api
- Free API key required (easy registration)
- 150,000+ computed materials
Future Enhancements
- Full Argo GDAC Integration - Parse netCDF files directly
- CERN Data Caching - Local cache for large datasets
- USGS Historical Data - Access to complete historical catalog
- Materials Project Batch Queries - Optimize multi-material searches
- Real-time Earthquake Streaming - WebSocket for live data
- Ocean Current Prediction - ML models for temperature forecasting
License
Part of RuVector Data Discovery Framework. See main LICENSE file.