# Patent Database API Client A comprehensive patent data discovery client for the RuVector framework, providing access to USPTO and EPO patent databases. ## Features ### USPTO PatentsView API Client The `UsptoPatentClient` provides free, unauthenticated access to the USPTO PatentsView API with the following capabilities: #### Search Methods 1. **Keyword Search** - `search_patents(query, max_results)` - Search patents by keywords in title and abstract - Example: `client.search_patents("quantum computing", 100).await?` 2. **Assignee Search** - `search_by_assignee(company_name, max_results)` - Find all patents by a specific company or organization - Example: `client.search_by_assignee("IBM", 50).await?` 3. **CPC Classification Search** - `search_by_cpc(cpc_class, max_results)` - Search by Cooperative Patent Classification code - Example: `client.search_by_cpc("Y02E", 200).await?` 4. **Patent Details** - `get_patent(patent_number)` - Get detailed information for a specific patent - Example: `client.get_patent("10000000").await?` 5. **Citation Analysis** - `get_citations(patent_number)` - Get both citing and cited patents for citation network analysis - Returns: `(citing_patents, cited_patents)` #### CPC Classification Codes of Interest - **Y02** - Climate Change Mitigation Technologies - `Y02E` - Energy generation, transmission, distribution - `Y02T` - Climate change mitigation technologies related to transportation - `Y02P` - Climate change mitigation technologies in production processes - **G06N** - Computing Arrangements Based on AI/ML - `G06N3` - Computing based on biological models (neural networks) - `G06N5` - Computing based on knowledge-based models - `G06N20` - Machine learning - **A61** - Medical or Veterinary Science - `A61K` - Preparations for medical, dental, or toilet purposes - `A61P` - Specific therapeutic activity of chemical compounds - **H01** - Electric Elements - `H01L` - Semiconductor devices - `H01M` - Batteries, fuel cells, capacitors ## Data Format All patent data is converted to `SemanticVector` format: ```rust SemanticVector { id: "US10123456", // Patent number with US prefix embedding: Vec, // 512-dimension embedding from title + abstract domain: Domain::Research, // Could be Domain::Innovation if added timestamp: DateTime, // Grant date or filing date metadata: HashMap { "patent_number": "10123456", "title": "Quantum computing system...", "abstract": "A quantum computing system comprising...", "assignee": "IBM Corporation", "inventors": "John Doe, Jane Smith", "cpc_codes": "G06N10/00, G06N99/00", "citations_count": "42", // Number of patents citing this one "cited_count": "15", // Number of patents cited by this one "source": "uspto" } } ``` ## Rate Limiting - **USPTO**: 200ms between requests (~5 req/sec) - follows PatentsView API guidelines - **EPO**: 1000ms between requests (~1 req/sec) - conservative rate limiting - Automatic retry with exponential backoff (max 3 retries) ## Usage Example ```rust use ruvector_data_framework::{UsptoPatentClient, Result}; #[tokio::main] async fn main() -> Result<()> { // Create client (no authentication required) let client = UsptoPatentClient::new()?; // Search for climate tech patents let patents = client.search_by_cpc("Y02E", 100).await?; for patent in patents { println!("Patent: {} - {}", patent.id, patent.metadata.get("title").unwrap_or(&"Untitled".to_string()) ); } // Search by company let tesla_patents = client.search_by_assignee("Tesla", 50).await?; // Get specific patent with citations if let Some(patent) = client.get_patent("10000000").await? { let (citing, cited) = client.get_citations(&patent.id[2..]).await?; println!("Cited by {} patents, cites {} patents", citing.len(), cited.len()); } Ok(()) } ``` ## Run the Example ```bash cargo run --example patent_discovery ``` ## EPO Client (Future Development) The `EpoClient` is a placeholder for European Patent Office integration. Implementation requires: - OAuth authentication flow - EPO developer registration at https://developers.epo.org/ - Consumer key and secret ## API Documentation - **USPTO PatentsView**: https://patentsview.org/apis/api-endpoints - **EPO Open Patent Services**: https://developers.epo.org/ ## Testing ```bash # Run all tests cargo test --lib patent_clients # Run integration tests (requires network) cargo test --lib patent_clients -- --ignored # Test specific functionality cargo test --lib patent_clients::tests::test_cpc_classification_mapping ``` ## Integration with Discovery Framework The patent client integrates seamlessly with the RuVector discovery framework: ```rust use ruvector_data_framework::{ UsptoPatentClient, DiscoveryPipeline, PipelineConfig, }; // 1. Fetch patent data let client = UsptoPatentClient::new()?; let vectors = client.search_by_cpc("G06N", 1000).await?; // 2. Add to discovery engine let mut pipeline = DiscoveryPipeline::new(PipelineConfig::default()); // ... add vectors to pipeline ... // 3. Discover patterns across patent citation networks let patterns = pipeline.run(source).await?; ``` ## Features - ✅ Free API access (no authentication) - ✅ Comprehensive patent metadata - ✅ Citation network analysis - ✅ CPC classification search - ✅ Rate limiting and retry logic - ✅ SemanticVector conversion with embeddings - ✅ Unit and integration tests - 🔄 EPO integration (planned) ## Contributing To add new patent sources: 1. Implement the client following the pattern in `patent_clients.rs` 2. Add conversion to `SemanticVector` format 3. Implement rate limiting and error handling 4. Add comprehensive tests 5. Update this documentation