git-subtree-dir: vendor/ruvector git-subtree-split: b64c21726f2bb37286d9ee36a7869fef60cc6900
5.8 KiB
Patent Database API Client
A comprehensive patent data discovery client for the RuVector framework, providing access to USPTO and EPO patent databases.
Features
USPTO PatentsView API Client
The UsptoPatentClient provides free, unauthenticated access to the USPTO PatentsView API with the following capabilities:
Search Methods
-
Keyword Search -
search_patents(query, max_results)- Search patents by keywords in title and abstract
- Example:
client.search_patents("quantum computing", 100).await?
-
Assignee Search -
search_by_assignee(company_name, max_results)- Find all patents by a specific company or organization
- Example:
client.search_by_assignee("IBM", 50).await?
-
CPC Classification Search -
search_by_cpc(cpc_class, max_results)- Search by Cooperative Patent Classification code
- Example:
client.search_by_cpc("Y02E", 200).await?
-
Patent Details -
get_patent(patent_number)- Get detailed information for a specific patent
- Example:
client.get_patent("10000000").await?
-
Citation Analysis -
get_citations(patent_number)- Get both citing and cited patents for citation network analysis
- Returns:
(citing_patents, cited_patents)
CPC Classification Codes of Interest
-
Y02 - Climate Change Mitigation Technologies
Y02E- Energy generation, transmission, distributionY02T- Climate change mitigation technologies related to transportationY02P- Climate change mitigation technologies in production processes
-
G06N - Computing Arrangements Based on AI/ML
G06N3- Computing based on biological models (neural networks)G06N5- Computing based on knowledge-based modelsG06N20- Machine learning
-
A61 - Medical or Veterinary Science
A61K- Preparations for medical, dental, or toilet purposesA61P- Specific therapeutic activity of chemical compounds
-
H01 - Electric Elements
H01L- Semiconductor devicesH01M- Batteries, fuel cells, capacitors
Data Format
All patent data is converted to SemanticVector format:
SemanticVector {
id: "US10123456", // Patent number with US prefix
embedding: Vec<f32>, // 512-dimension embedding from title + abstract
domain: Domain::Research, // Could be Domain::Innovation if added
timestamp: DateTime<Utc>, // Grant date or filing date
metadata: HashMap {
"patent_number": "10123456",
"title": "Quantum computing system...",
"abstract": "A quantum computing system comprising...",
"assignee": "IBM Corporation",
"inventors": "John Doe, Jane Smith",
"cpc_codes": "G06N10/00, G06N99/00",
"citations_count": "42", // Number of patents citing this one
"cited_count": "15", // Number of patents cited by this one
"source": "uspto"
}
}
Rate Limiting
- USPTO: 200ms between requests (~5 req/sec) - follows PatentsView API guidelines
- EPO: 1000ms between requests (~1 req/sec) - conservative rate limiting
- Automatic retry with exponential backoff (max 3 retries)
Usage Example
use ruvector_data_framework::{UsptoPatentClient, Result};
#[tokio::main]
async fn main() -> Result<()> {
// Create client (no authentication required)
let client = UsptoPatentClient::new()?;
// Search for climate tech patents
let patents = client.search_by_cpc("Y02E", 100).await?;
for patent in patents {
println!("Patent: {} - {}",
patent.id,
patent.metadata.get("title").unwrap_or(&"Untitled".to_string())
);
}
// Search by company
let tesla_patents = client.search_by_assignee("Tesla", 50).await?;
// Get specific patent with citations
if let Some(patent) = client.get_patent("10000000").await? {
let (citing, cited) = client.get_citations(&patent.id[2..]).await?;
println!("Cited by {} patents, cites {} patents", citing.len(), cited.len());
}
Ok(())
}
Run the Example
cargo run --example patent_discovery
EPO Client (Future Development)
The EpoClient is a placeholder for European Patent Office integration. Implementation requires:
- OAuth authentication flow
- EPO developer registration at https://developers.epo.org/
- Consumer key and secret
API Documentation
- USPTO PatentsView: https://patentsview.org/apis/api-endpoints
- EPO Open Patent Services: https://developers.epo.org/
Testing
# Run all tests
cargo test --lib patent_clients
# Run integration tests (requires network)
cargo test --lib patent_clients -- --ignored
# Test specific functionality
cargo test --lib patent_clients::tests::test_cpc_classification_mapping
Integration with Discovery Framework
The patent client integrates seamlessly with the RuVector discovery framework:
use ruvector_data_framework::{
UsptoPatentClient,
DiscoveryPipeline,
PipelineConfig,
};
// 1. Fetch patent data
let client = UsptoPatentClient::new()?;
let vectors = client.search_by_cpc("G06N", 1000).await?;
// 2. Add to discovery engine
let mut pipeline = DiscoveryPipeline::new(PipelineConfig::default());
// ... add vectors to pipeline ...
// 3. Discover patterns across patent citation networks
let patterns = pipeline.run(source).await?;
Features
- ✅ Free API access (no authentication)
- ✅ Comprehensive patent metadata
- ✅ Citation network analysis
- ✅ CPC classification search
- ✅ Rate limiting and retry logic
- ✅ SemanticVector conversion with embeddings
- ✅ Unit and integration tests
- 🔄 EPO integration (planned)
Contributing
To add new patent sources:
- Implement the client following the pattern in
patent_clients.rs - Add conversion to
SemanticVectorformat - Implement rate limiting and error handling
- Add comprehensive tests
- Update this documentation