Merge commit 'd803bfe2b1fe7f5e219e50ac20d6801a0a58ac75' as 'vendor/ruvector'

This commit is contained in:
ruv
2026-02-28 14:39:40 -05:00
7854 changed files with 3522914 additions and 0 deletions

# @ruvector/rvdna
**DNA analysis in JavaScript.** Encode sequences, translate proteins, search genomes by similarity, and read the `.rvdna` AI-native file format — all from Node.js or the browser.
Built on Rust via NAPI-RS for native speed. Falls back to pure JavaScript when native bindings aren't available.
```bash
npm install @ruvector/rvdna
```
## What It Does
| Function | What It Does | Native Required? |
|---|---|---|
| `encode2bit(seq)` | Pack DNA into 2-bit bytes (4 bases per byte) | No (JS fallback) |
| `decode2bit(buf, len)` | Unpack 2-bit bytes back to DNA string | No (JS fallback) |
| `translateDna(seq)` | Translate DNA to protein amino acids | No (JS fallback) |
| `cosineSimilarity(a, b)` | Cosine similarity between two vectors | No (JS fallback) |
| `fastaToRvdna(seq, opts)` | Convert FASTA to `.rvdna` binary format | Yes |
| `readRvdna(buf)` | Parse a `.rvdna` file from a Buffer | Yes |
| `isNativeAvailable()` | Check if native Rust bindings are loaded | No |
## Quick Start
```js
const { encode2bit, decode2bit, translateDna, cosineSimilarity } = require('@ruvector/rvdna');
// Encode DNA to compact 2-bit format (4 bases per byte)
const packed = encode2bit('ACGTACGTACGT');
console.log(packed); // <Buffer 1b 1b 1b>
// Decode it back — lossless round-trip
const dna = decode2bit(packed, 12);
console.log(dna); // 'ACGTACGTACGT'
// Translate DNA to protein (standard genetic code)
const protein = translateDna('ATGGCCATTGTAATG');
console.log(protein); // 'MAIVM'
// Compare two k-mer vectors
const sim = cosineSimilarity([1, 2, 3], [1, 2, 3]);
console.log(sim); // 1.0 (identical)
```
## API Reference
### `encode2bit(sequence: string): Buffer`
Packs a DNA string into 2-bit bytes. Each byte holds 4 bases: A=00, C=01, G=10, T=11. Ambiguous bases (N) map to A.
```js
encode2bit('ACGT') // <Buffer 1b> — one byte for 4 bases
encode2bit('AAAA') // <Buffer 00>
encode2bit('TTTT') // <Buffer ff>
```
### `decode2bit(buffer: Buffer, length: number): string`
Decodes 2-bit packed bytes back to a DNA string. You must pass the original sequence length since the last byte may have padding.
```js
decode2bit(Buffer.from([0x1b]), 4) // 'ACGT'
```
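Why the length matters: the final byte is zero-padded, and zero bits decode as `A`. Decoding with the wrong length silently appends phantom bases. A standalone sketch of the same bit layout (the `pack`/`unpack` helpers here are illustrative stand-ins mirroring the pure-JS fallback, not part of the API):

```javascript
// Standalone sketch of the 2-bit layout — first base occupies the high bits.
const MAP = { A: 0, C: 1, G: 2, T: 3 };
const BASES = ['A', 'C', 'G', 'T'];

function pack(seq) {
  const buf = Buffer.alloc(Math.ceil(seq.length / 4));
  for (let i = 0; i < seq.length; i++) {
    buf[i >> 2] |= MAP[seq[i]] << (6 - (i & 3) * 2);
  }
  return buf;
}

function unpack(buf, length) {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += BASES[(buf[i >> 2] >> (6 - (i & 3) * 2)) & 3];
  }
  return out;
}

const packed = pack('ACGTA');    // 5 bases -> 2 bytes; last byte is zero-padded
console.log(unpack(packed, 5));  // 'ACGTA' — correct with the true length
console.log(unpack(packed, 8));  // 'ACGTAAAA' — padding decodes as phantom A's
```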
### `translateDna(sequence: string): string`
Translates a DNA string to a protein amino acid string using the standard genetic code. Stops at the first stop codon (TAA, TAG, TGA).
```js
translateDna('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA')
// 'MAIVMGR' — stops at TGA stop codon
```
### `cosineSimilarity(a: number[], b: number[]): number`
Returns cosine similarity between two numeric arrays. Result is between -1 and 1.
```js
cosineSimilarity([1, 0, 0], [0, 1, 0]) // 0 (orthogonal)
cosineSimilarity([1, 2, 3], [2, 4, 6]) // 1 (parallel)
```
### `fastaToRvdna(sequence: string, options?: RvdnaOptions): Buffer`
Converts a raw DNA sequence to the `.rvdna` binary format with pre-computed k-mer vectors. **Requires native bindings.**
```js
const { fastaToRvdna, isNativeAvailable } = require('@ruvector/rvdna');
if (isNativeAvailable()) {
const rvdna = fastaToRvdna('ACGTACGT...', { k: 11, dims: 512, blockSize: 500 });
require('fs').writeFileSync('output.rvdna', rvdna);
}
```
| Option | Default | Description |
|---|---|---|
| `k` | 11 | K-mer size for vector encoding |
| `dims` | 512 | Vector dimensions per block |
| `blockSize` | 500 | Bases per vector block |
### `readRvdna(buffer: Buffer): RvdnaFile`
Parses a `.rvdna` file. Returns the decoded sequence, k-mer vectors, variants, metadata, and file statistics. **Requires native bindings.**
```js
const fs = require('fs');
const { readRvdna } = require('@ruvector/rvdna');
const file = readRvdna(fs.readFileSync('sample.rvdna'));
console.log(file.sequenceLength); // 430
console.log(file.sequence.slice(0, 20)); // 'ATGGTGCATCTGACTCCTGA'
console.log(file.kmerVectors.length); // number of vector blocks
console.log(file.stats.bitsPerBase); // ~3.2
console.log(file.stats.compressionRatio); // vs raw FASTA
```
**RvdnaFile fields:**
| Field | Type | Description |
|---|---|---|
| `version` | `number` | Format version |
| `sequenceLength` | `number` | Number of bases |
| `sequence` | `string` | Decoded DNA string |
| `kmerVectors` | `Array` | Pre-computed k-mer vector blocks |
| `variants` | `Array \| null` | Variant positions with genotype likelihoods |
| `metadata` | `Record \| null` | Key-value metadata |
| `stats.totalSize` | `number` | File size in bytes |
| `stats.bitsPerBase` | `number` | Storage efficiency |
| `stats.compressionRatio` | `number` | Compression vs raw |
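Because each block carries its genomic coordinates alongside its vector, similarity search over a file reduces to ranking blocks by cosine similarity against a query vector. A minimal sketch in the `kmerVectors` shape from the table above (the vectors and query here are hypothetical stand-ins, not real `readRvdna` output):

```javascript
// Hypothetical blocks in the RvdnaFile.kmerVectors shape (3 dims for brevity).
const blocks = [
  { startPos: 0,   regionLen: 500, vector: [0.9, 0.1, 0.0] },
  { startPos: 500, regionLen: 500, vector: [0.1, 0.9, 0.1] },
];

function cosine(a, b) {
  let dot = 0, ma = 0, mb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; ma += a[i] * a[i]; mb += b[i] * b[i]; }
  return (ma && mb) ? dot / Math.sqrt(ma * mb) : 0;
}

// Rank blocks against a query vector; the best hit names a genomic region.
const query = [1, 0, 0];
const best = blocks
  .map(b => ({ ...b, score: cosine(query, b.vector) }))
  .sort((x, y) => y.score - x.score)[0];
console.log(best.startPos, best.score); // best-matching region and its score
```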
## The `.rvdna` File Format
Traditional genomic formats (FASTA, FASTQ, BAM) store raw sequences. Every time an AI model needs that data, it re-encodes everything from scratch — vectors, attention matrices, features. This takes 30-120 seconds per file.
`.rvdna` stores the sequence **and** pre-computed AI features together. Open the file and everything is ready — no re-encoding.
```
.rvdna file layout:
[Magic: "RVDNA\x01\x00\x00"] 8 bytes — file identifier
[Header] 64 bytes — version, flags, offsets
[Section 0: Sequence] 2-bit packed DNA (4 bases/byte)
[Section 1: K-mer Vectors] HNSW-ready embeddings
[Section 2: Attention Weights] Sparse COO matrices
[Section 3: Variant Tensor] f16 genotype likelihoods
[Section 4: Protein Embeddings] GNN features + contact graphs
[Section 5: Epigenomic Tracks] Methylation + clock data
[Section 6: Metadata] JSON provenance + checksums
```
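Given the layout above, a file can be sniffed by its 8-byte magic before handing it to the native reader. A minimal sketch, assuming only the magic bytes shown in the layout (real section parsing requires the native `readRvdna`):

```javascript
// Sniff the 8-byte .rvdna magic per the layout above: "RVDNA" + 0x01 0x00 0x00.
const MAGIC = Buffer.from('RVDNA\x01\x00\x00');

function looksLikeRvdna(buf) {
  return buf.length >= MAGIC.length && buf.subarray(0, MAGIC.length).equals(MAGIC);
}

// A fabricated buffer with the right magic, for illustration only.
const fake = Buffer.concat([MAGIC, Buffer.alloc(64)]);   // magic + empty header
console.log(looksLikeRvdna(fake));                       // true
console.log(looksLikeRvdna(Buffer.from('>seq1\nACGT'))); // false — plain FASTA
```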
### Format Comparison
| | FASTA | FASTQ | BAM | CRAM | **.rvdna** |
|---|---|---|---|---|---|
| **Encoding** | ASCII (1 char/base) | ASCII + Phred | Binary + ref | Ref-compressed | 2-bit packed |
| **Bits per base** | 8 | 16 | 2-4 | 0.5-2 | **3.2** (seq only) |
| **Random access** | Scan from start | Scan from start | Index ~10 µs | Decode ~50 µs | **mmap <1 µs** |
| **AI features included** | No | No | No | No | **Yes** |
| **Vector search ready** | No | No | No | No | **HNSW built-in** |
| **Zero-copy mmap** | No | No | Partial | No | **Full** |
| **Single file** | Yes | Yes | Needs .bai | Needs .crai | **Yes** |
## Platform Support
Native NAPI-RS bindings are available for these platforms:
| Platform | Architecture | Package |
|---|---|---|
| Linux | x64 (glibc) | `@ruvector/rvdna-linux-x64-gnu` |
| Linux | ARM64 (glibc) | `@ruvector/rvdna-linux-arm64-gnu` |
| macOS | x64 (Intel) | `@ruvector/rvdna-darwin-x64` |
| macOS | ARM64 (Apple Silicon) | `@ruvector/rvdna-darwin-arm64` |
| Windows | x64 | `@ruvector/rvdna-win32-x64-msvc` |
These install automatically as optional dependencies. On unsupported platforms, basic functions (`encode2bit`, `decode2bit`, `translateDna`, `cosineSimilarity`) still work via pure JavaScript fallbacks.
## WASM (WebAssembly)
rvDNA can run entirely in the browser via WebAssembly. No server needed, no data leaves the user's device.
### Browser Setup
```bash
# Build from the Rust source
cd examples/dna
wasm-pack build --target web --release
```
This produces a `pkg/` directory with `.wasm` and `.js` glue code.
### Using in HTML
```html
<script type="module">
import init, { encode2bit, translateDna } from './pkg/rvdna.js';
await init(); // Load the WASM module
// Encode DNA
const packed = encode2bit('ACGTACGTACGT');
console.log('Packed bytes:', packed);
// Translate to protein
const protein = translateDna('ATGGCCATTGTAATG');
console.log('Protein:', protein); // 'MAIVM'
</script>
```
### Using with Bundlers (Webpack, Vite)
```bash
# For bundler targets
wasm-pack build --target bundler --release
```
```js
// In your app
import { encode2bit, translateDna, fastaToRvdna } from '@ruvector/rvdna-wasm';
const packed = encode2bit('ACGTACGT');
const protein = translateDna('ATGGCCATT');
```
### WASM Features
| Feature | Status | Description |
|---|---|---|
| 2-bit encode/decode | Available | Pack/unpack DNA sequences |
| Protein translation | Available | Standard genetic code |
| Cosine similarity | Available | Vector comparison |
| `.rvdna` read/write | Planned | Full format support in browser |
| HNSW search | Planned | K-mer similarity search |
| Variant calling | Planned | Client-side mutation detection |
**Target WASM binary size:** <2 MB gzipped
### Privacy
WASM runs entirely client-side. DNA data never leaves the browser. This makes it suitable for:
- Clinical genomics dashboards
- Patient-facing genetic reports
- Educational tools
- Offline/edge analysis on devices with no internet
## TypeScript
Full TypeScript definitions are included. Import types directly:
```ts
import {
encode2bit,
decode2bit,
translateDna,
cosineSimilarity,
fastaToRvdna,
readRvdna,
isNativeAvailable,
RvdnaOptions,
RvdnaFile,
} from '@ruvector/rvdna';
```
## Speed
The native (Rust) backend handles these operations on real human gene data:
| Operation | Time | What It Does |
|---|---|---|
| Single SNP call | **155 ns** | Bayesian genotyping at one position |
| Protein translation (1 kb) | **23 ns** | DNA to amino acids |
| K-mer vector (1 kb) | **591 µs** | Full pipeline with HNSW indexing |
| Complete analysis (5 genes) | **12 ms** | All stages including `.rvdna` output |
### vs Traditional Tools
| Task | Traditional Tool | Their Time | rvDNA | Speedup |
|---|---|---|---|---|
| K-mer counting | Jellyfish | 15-30 min | 2-5 sec | **180-900x** |
| Sequence similarity | BLAST | 1-5 min | 5-50 ms | **1,200-60,000x** |
| Variant calling | GATK | 30-90 min | 3-10 min | **3-30x** |
| Methylation age | R/Bioconductor | 5-15 min | 0.1-0.5 sec | **600-9,000x** |
## Rust Crate
The full Rust crate with all algorithms is available on crates.io:
```toml
[dependencies]
rvdna = "0.1"
```
See the [Rust documentation](https://docs.rs/rvdna) for the complete API including Smith-Waterman alignment, Horvath clock, CYP2D6 pharmacogenomics, and more.
## Links
- [GitHub](https://github.com/ruvnet/ruvector/tree/main/examples/dna) - Source code
- [crates.io](https://crates.io/crates/rvdna) - Rust crate
- [RuVector](https://github.com/ruvnet/ruvector) - Parent vector computing platform
## License
MIT

/**
* @ruvector/rvdna — AI-native genomic analysis and the .rvdna file format.
*
* Provides variant calling, protein translation, k-mer vector search,
* and the compact .rvdna binary format via Rust NAPI-RS bindings.
*/
/**
* Encode a DNA string to 2-bit packed bytes (4 bases per byte).
* A=00, C=01, G=10, T=11. Ambiguous bases (N) map to A.
*/
export function encode2bit(sequence: string): Buffer;
/**
* Decode 2-bit packed bytes back to a DNA string.
* @param buffer - The 2-bit packed buffer
* @param length - Number of bases to decode
*/
export function decode2bit(buffer: Buffer, length: number): string;
/**
* Translate a DNA string to a protein amino acid string.
* Uses the standard genetic code. Stops at the first stop codon.
*/
export function translateDna(sequence: string): string;
/**
* Compute cosine similarity between two numeric arrays.
* Returns a value between -1 and 1.
*/
export function cosineSimilarity(a: number[], b: number[]): number;
export interface RvdnaOptions {
/** K-mer size (default: 11) */
k?: number;
/** Vector dimensions (default: 512) */
dims?: number;
/** Block size in bases (default: 500) */
blockSize?: number;
}
/**
* Convert a FASTA sequence string to .rvdna binary format.
* Requires native bindings.
*/
export function fastaToRvdna(sequence: string, options?: RvdnaOptions): Buffer;
export interface RvdnaFile {
/** Format version */
version: number;
/** Sequence length in bases */
sequenceLength: number;
/** Decoded DNA sequence */
sequence: string;
/** Pre-computed k-mer vector blocks */
kmerVectors: Array<{
k: number;
dimensions: number;
startPos: number;
regionLen: number;
vector: Float32Array;
}>;
/** Variant positions and genotype likelihoods */
variants: Array<{
position: number;
refAllele: string;
altAllele: string;
likelihoods: [number, number, number];
quality: number;
}> | null;
/** Metadata key-value pairs */
metadata: Record<string, unknown> | null;
/** File statistics */
stats: {
totalSize: number;
bitsPerBase: number;
compressionRatio: number;
};
}
/**
* Read a .rvdna file from a Buffer. Returns parsed sections.
* Requires native bindings.
*/
export function readRvdna(buffer: Buffer): RvdnaFile;
/**
* Check if native bindings are available for the current platform.
*/
export function isNativeAvailable(): boolean;
/**
* Direct access to the native NAPI-RS module (null if not available).
*/
export const native: Record<string, Function> | null;
// -------------------------------------------------------------------
// 23andMe Genotyping Pipeline (v0.2.0)
// -------------------------------------------------------------------
/**
* Normalize a genotype string: uppercase, trim, sort allele pair.
* "ag" → "AG", "TC" → "CT", "DI" → "DI"
*/
export function normalizeGenotype(gt: string): string;
export interface Snp {
rsid: string;
chromosome: string;
position: number;
genotype: string;
}
export interface GenotypeData {
snps: Record<string, Snp>;
totalMarkers: number;
noCalls: number;
chrCounts: Record<string, number>;
build: 'GRCh37' | 'GRCh38' | 'Unknown';
}
/**
* Parse a 23andMe raw data file (v4/v5 tab-separated format).
* Normalizes all genotype strings on load.
*/
export function parse23andMe(text: string): {
snps: Map<string, Snp>;
totalMarkers: number;
noCalls: number;
chrCounts: Map<string, number>;
build: string;
};
export interface CypDiplotype {
gene: string;
allele1: string;
allele2: string;
activity: number;
phenotype: 'UltraRapid' | 'Normal' | 'Intermediate' | 'Poor';
confidence: 'Unsupported' | 'Weak' | 'Moderate' | 'Strong';
rsidsGenotyped: number;
rsidsMatched: number;
rsidsTotal: number;
notes: string[];
details: string[];
}
/** Call CYP2D6 diplotype from a genotype map */
export function callCyp2d6(gts: Map<string, string>): CypDiplotype;
/** Call CYP2C19 diplotype from a genotype map */
export function callCyp2c19(gts: Map<string, string>): CypDiplotype;
export interface ApoeResult {
genotype: string;
rs429358: string;
rs7412: string;
}
/** Determine APOE genotype from rs429358 + rs7412 */
export function determineApoe(gts: Map<string, string>): ApoeResult;
export interface AnalysisResult {
data: GenotypeData;
cyp2d6: CypDiplotype;
cyp2c19: CypDiplotype;
apoe: ApoeResult;
homozygous: number;
heterozygous: number;
indels: number;
hetRatio: number;
}
/**
* Run the full 23andMe analysis pipeline.
* @param text - Raw 23andMe file contents
*/
export function analyze23andMe(text: string): AnalysisResult;
// -------------------------------------------------------------------
// Biomarker Risk Scoring Engine (v0.3.0)
// -------------------------------------------------------------------
/** Clinical reference range for a single biomarker. */
export interface BiomarkerReference {
name: string;
unit: string;
normalLow: number;
normalHigh: number;
criticalLow: number | null;
criticalHigh: number | null;
category: string;
}
/** Classification of a biomarker value relative to its reference range. */
export type BiomarkerClassification = 'CriticalLow' | 'Low' | 'Normal' | 'High' | 'CriticalHigh';
/** Risk score for a single clinical category. */
export interface CategoryScore {
category: string;
score: number;
confidence: number;
contributingVariants: string[];
}
/** Full biomarker + genotype risk profile for one subject. */
export interface BiomarkerProfile {
subjectId: string;
timestamp: number;
categoryScores: Record<string, CategoryScore>;
globalRiskScore: number;
profileVector: Float32Array;
biomarkerValues: Record<string, number>;
}
/** SNP risk descriptor. */
export interface SnpDef {
rsid: string;
category: string;
wRef: number;
wHet: number;
wAlt: number;
homRef: string;
het: string;
homAlt: string;
maf: number;
}
/** Gene-gene interaction descriptor. */
export interface InteractionDef {
rsidA: string;
rsidB: string;
modifier: number;
category: string;
}
/** 13 clinical biomarker reference ranges. */
export const BIOMARKER_REFERENCES: readonly BiomarkerReference[];
/** 20-SNP risk table (mirrors Rust biomarker.rs). */
export const SNPS: readonly SnpDef[];
/** 6 gene-gene interaction modifiers. */
export const INTERACTIONS: readonly InteractionDef[];
/** Category ordering: Cancer Risk, Cardiovascular, Neurological, Metabolism. */
export const CAT_ORDER: readonly string[];
/** Return the static biomarker reference table. */
export function biomarkerReferences(): readonly BiomarkerReference[];
/** Compute a z-score for a value relative to a reference range. */
export function zScore(value: number, ref: BiomarkerReference): number;
/** Classify a biomarker value against its reference range. */
export function classifyBiomarker(value: number, ref: BiomarkerReference): BiomarkerClassification;
/** Compute composite risk scores from genotype data (20 SNPs, 6 interactions). */
export function computeRiskScores(genotypes: Map<string, string>): BiomarkerProfile;
/** Encode a profile into a 64-dim L2-normalized Float32Array. */
export function encodeProfileVector(profile: BiomarkerProfile): Float32Array;
/** Generate a deterministic synthetic population of biomarker profiles. */
export function generateSyntheticPopulation(count: number, seed: number): BiomarkerProfile[];
// -------------------------------------------------------------------
// Streaming Biomarker Processor (v0.3.0)
// -------------------------------------------------------------------
/** Biomarker stream definition. */
export interface BiomarkerDef {
id: string;
low: number;
high: number;
}
/** 6 streaming biomarker definitions. */
export const BIOMARKER_DEFS: readonly BiomarkerDef[];
/** Configuration for the streaming biomarker simulator. */
export interface StreamConfig {
baseIntervalMs: number;
noiseAmplitude: number;
driftRate: number;
anomalyProbability: number;
anomalyMagnitude: number;
numBiomarkers: number;
windowSize: number;
}
/** A single timestamped biomarker data point. */
export interface BiomarkerReading {
timestampMs: number;
biomarkerId: string;
value: number;
referenceLow: number;
referenceHigh: number;
isAnomaly: boolean;
zScore: number;
}
/** Rolling statistics for a single biomarker stream. */
export interface StreamStats {
mean: number;
variance: number;
min: number;
max: number;
count: number;
anomalyRate: number;
trendSlope: number;
ema: number;
cusumPos: number;
cusumNeg: number;
changepointDetected: boolean;
}
/** Result of processing a single reading. */
export interface ProcessingResult {
accepted: boolean;
zScore: number;
isAnomaly: boolean;
currentTrend: number;
}
/** Aggregate summary across all biomarker streams. */
export interface StreamSummary {
totalReadings: number;
anomalyCount: number;
anomalyRate: number;
biomarkerStats: Record<string, StreamStats>;
throughputReadingsPerSec: number;
}
/** Fixed-capacity circular buffer backed by Float64Array. */
export class RingBuffer {
constructor(capacity: number);
push(item: number): void;
toArray(): number[];
readonly length: number;
readonly capacity: number;
isFull(): boolean;
clear(): void;
[Symbol.iterator](): IterableIterator<number>;
}
/** Streaming biomarker processor with per-stream ring buffers, z-score anomaly detection, CUSUM changepoint detection, and trend analysis. */
export class StreamProcessor {
constructor(config?: StreamConfig);
processReading(reading: BiomarkerReading): ProcessingResult;
getStats(biomarkerId: string): StreamStats | null;
summary(): StreamSummary;
}
/** Return default stream configuration. */
export function defaultStreamConfig(): StreamConfig;
/** Generate batch of synthetic biomarker readings. */
export function generateReadings(config: StreamConfig, count: number, seed: number): BiomarkerReading[];

const { platform, arch } = process;
// Platform-specific native binary packages
const platformMap = {
'linux': {
'x64': '@ruvector/rvdna-linux-x64-gnu',
'arm64': '@ruvector/rvdna-linux-arm64-gnu'
},
'darwin': {
'x64': '@ruvector/rvdna-darwin-x64',
'arm64': '@ruvector/rvdna-darwin-arm64'
},
'win32': {
'x64': '@ruvector/rvdna-win32-x64-msvc'
}
};
function loadNativeModule() {
const platformPackage = platformMap[platform]?.[arch];
if (!platformPackage) {
throw new Error(
`Unsupported platform: ${platform}-${arch}\n` +
`@ruvector/rvdna native bindings are available for:\n` +
`- Linux (x64, ARM64)\n` +
`- macOS (x64, ARM64)\n` +
`- Windows (x64)\n\n` +
`For other platforms, use the WASM build: npm install @ruvector/rvdna-wasm`
);
}
try {
return require(platformPackage);
} catch (error) {
if (error.code === 'MODULE_NOT_FOUND') {
throw new Error(
`Native module not found for ${platform}-${arch}\n` +
`Please install: npm install ${platformPackage}\n` +
`Or reinstall @ruvector/rvdna to get optional dependencies`
);
}
throw error;
}
}
// Try native first, fall back to pure JS shim with basic functionality
let nativeModule;
try {
nativeModule = loadNativeModule();
} catch (e) {
// Native bindings unavailable — the wrappers below fall back to pure JS where possible
nativeModule = null;
}
// -------------------------------------------------------------------
// Public API — wraps native bindings or provides JS fallbacks
// -------------------------------------------------------------------
/**
* Encode a DNA string to 2-bit packed bytes (4 bases per byte).
* A=00, C=01, G=10, T=11. Returns a Buffer.
*/
function encode2bit(sequence) {
if (nativeModule?.encode2bit) return nativeModule.encode2bit(sequence);
// JS fallback — first base occupies the high bits of each byte
const map = { A: 0, C: 1, G: 2, T: 3 };
const len = sequence.length;
const buf = Buffer.alloc(Math.ceil(len / 4));
for (let i = 0; i < len; i++) {
const byteIdx = i >> 2;
const bitOff = 6 - (i & 3) * 2;
// Uppercase so soft-masked (lowercase) bases encode correctly; non-ACGT (e.g. N) maps to A
buf[byteIdx] |= (map[sequence[i].toUpperCase()] ?? 0) << bitOff;
}
return buf;
}
/**
* Decode 2-bit packed bytes back to a DNA string.
*/
function decode2bit(buffer, length) {
if (nativeModule?.decode2bit) return nativeModule.decode2bit(buffer, length);
const bases = ['A', 'C', 'G', 'T'];
let result = '';
for (let i = 0; i < length; i++) {
const byteIdx = i >> 2;
const bitOff = 6 - (i & 3) * 2;
result += bases[(buffer[byteIdx] >> bitOff) & 3];
}
return result;
}
/**
* Translate a DNA string to a protein amino acid string.
*/
function translateDna(sequence) {
if (nativeModule?.translateDna) return nativeModule.translateDna(sequence);
// JS fallback — standard genetic code
const codons = {
'TTT':'F','TTC':'F','TTA':'L','TTG':'L','CTT':'L','CTC':'L','CTA':'L','CTG':'L',
'ATT':'I','ATC':'I','ATA':'I','ATG':'M','GTT':'V','GTC':'V','GTA':'V','GTG':'V',
'TCT':'S','TCC':'S','TCA':'S','TCG':'S','CCT':'P','CCC':'P','CCA':'P','CCG':'P',
'ACT':'T','ACC':'T','ACA':'T','ACG':'T','GCT':'A','GCC':'A','GCA':'A','GCG':'A',
'TAT':'Y','TAC':'Y','TAA':'*','TAG':'*','CAT':'H','CAC':'H','CAA':'Q','CAG':'Q',
'AAT':'N','AAC':'N','AAA':'K','AAG':'K','GAT':'D','GAC':'D','GAA':'E','GAG':'E',
'TGT':'C','TGC':'C','TGA':'*','TGG':'W','CGT':'R','CGC':'R','CGA':'R','CGG':'R',
'AGT':'S','AGC':'S','AGA':'R','AGG':'R','GGT':'G','GGC':'G','GGA':'G','GGG':'G',
};
let protein = '';
for (let i = 0; i + 2 < sequence.length; i += 3) {
const codon = sequence.slice(i, i + 3).toUpperCase();
const aa = codons[codon] || 'X';
if (aa === '*') break;
protein += aa;
}
return protein;
}
/**
* Compute cosine similarity between two numeric arrays.
*/
function cosineSimilarity(a, b) {
if (nativeModule?.cosineSimilarity) return nativeModule.cosineSimilarity(a, b);
let dot = 0, magA = 0, magB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
magA += a[i] * a[i];
magB += b[i] * b[i];
}
magA = Math.sqrt(magA);
magB = Math.sqrt(magB);
return (magA && magB) ? dot / (magA * magB) : 0;
}
/**
* Convert a FASTA sequence string to .rvdna binary format.
* Returns a Buffer with the complete .rvdna file contents.
*/
function fastaToRvdna(sequence, options = {}) {
if (nativeModule?.fastaToRvdna) {
return nativeModule.fastaToRvdna(sequence, options.k || 11, options.dims || 512, options.blockSize || 500);
}
throw new Error('fastaToRvdna requires native bindings. Install the platform-specific package.');
}
/**
* Read a .rvdna file from a Buffer. Returns parsed sections.
*/
function readRvdna(buffer) {
if (nativeModule?.readRvdna) return nativeModule.readRvdna(buffer);
throw new Error('readRvdna requires native bindings. Install the platform-specific package.');
}
/**
* Check if native bindings are available.
*/
function isNativeAvailable() {
return nativeModule !== null;
}
// -------------------------------------------------------------------
// 23andMe Genotyping Pipeline (pure JS — mirrors Rust rvdna::genotyping)
// -------------------------------------------------------------------
/**
* Normalize a genotype string: uppercase, trim, sort allele pair.
* "ag" → "AG", "TC" → "CT", "DI" → "DI"
*/
function normalizeGenotype(gt) {
gt = gt.trim().toUpperCase();
if (gt.length === 2 && gt[0] > gt[1]) {
return gt[1] + gt[0];
}
return gt;
}
/**
* Parse a 23andMe raw data file (v4/v5 tab-separated format).
* @param {string} text - Raw file contents
* @returns {{ snps: Map<string,object>, totalMarkers: number, noCalls: number, chrCounts: Map<string,number>, build: string }}
*/
function parse23andMe(text) {
const snps = new Map();
const chrCounts = new Map();
let total = 0, noCalls = 0;
let build = 'Unknown';
for (const line of text.split('\n')) {
if (line.startsWith('#')) {
const lower = line.toLowerCase();
if (lower.includes('build 37') || lower.includes('grch37') || lower.includes('hg19')) build = 'GRCh37';
else if (lower.includes('build 38') || lower.includes('grch38') || lower.includes('hg38')) build = 'GRCh38';
continue;
}
if (!line.trim()) continue;
const parts = line.split('\t');
if (parts.length < 4) continue;
const [rsid, chrom, posStr, genotype] = parts;
total++;
if (genotype === '--') { noCalls++; continue; }
const pos = parseInt(posStr, 10) || 0;
const normGt = normalizeGenotype(genotype);
chrCounts.set(chrom, (chrCounts.get(chrom) || 0) + 1);
snps.set(rsid, { rsid, chromosome: chrom, position: pos, genotype: normGt });
}
if (total === 0) throw new Error('No markers found in file');
return { snps, totalMarkers: total, noCalls, chrCounts, build };
}
// CYP defining variant tables
const CYP2D6_DEFS = [
{ rsid: 'rs3892097', allele: '*4', alt: 'T', isDel: false, activity: 0.0, fn: 'No function (splicing defect)' },
{ rsid: 'rs35742686', allele: '*3', alt: '-', isDel: true, activity: 0.0, fn: 'No function (frameshift)' },
{ rsid: 'rs5030655', allele: '*6', alt: '-', isDel: true, activity: 0.0, fn: 'No function (frameshift)' },
{ rsid: 'rs1065852', allele: '*10', alt: 'T', isDel: false, activity: 0.5, fn: 'Decreased function' },
{ rsid: 'rs28371725', allele: '*41', alt: 'T', isDel: false, activity: 0.5, fn: 'Decreased function' },
{ rsid: 'rs28371706', allele: '*17', alt: 'T', isDel: false, activity: 0.5, fn: 'Decreased function' },
];
const CYP2C19_DEFS = [
{ rsid: 'rs4244285', allele: '*2', alt: 'A', isDel: false, activity: 0.0, fn: 'No function (splicing defect)' },
{ rsid: 'rs4986893', allele: '*3', alt: 'A', isDel: false, activity: 0.0, fn: 'No function (premature stop)' },
{ rsid: 'rs12248560', allele: '*17', alt: 'T', isDel: false, activity: 1.5, fn: 'Increased function' },
];
/**
* Call a CYP diplotype from a genotype map.
* @param {string} gene - Gene name (e.g., "CYP2D6")
* @param {object[]} defs - Defining variant table
* @param {Map<string,string>} gts - rsid → genotype map
*/
function callCypDiplotype(gene, defs, gts) {
const alleles = [];
const details = [];
const notes = [];
let genotyped = 0, matched = 0;
for (const def of defs) {
const gt = gts.get(def.rsid);
if (gt !== undefined) {
genotyped++;
if (def.isDel) {
if (gt === 'DD') { matched++; alleles.push([def.allele, def.activity], [def.allele, def.activity]); details.push(` ${def.rsid}: ${gt} -> homozygous ${def.allele} (${def.fn})`); }
else if (gt === 'DI') { matched++; alleles.push([def.allele, def.activity]); details.push(` ${def.rsid}: ${gt} -> heterozygous ${def.allele} (${def.fn})`); }
else { details.push(` ${def.rsid}: ${gt} -> reference (no ${def.allele})`); }
} else {
const hom = def.alt + def.alt;
if (gt === hom) { matched++; alleles.push([def.allele, def.activity], [def.allele, def.activity]); details.push(` ${def.rsid}: ${gt} -> homozygous ${def.allele} (${def.fn})`); }
else if (gt.includes(def.alt)) { matched++; alleles.push([def.allele, def.activity]); details.push(` ${def.rsid}: ${gt} -> heterozygous ${def.allele} (${def.fn})`); }
else { details.push(` ${def.rsid}: ${gt} -> reference (no ${def.allele})`); }
}
} else {
details.push(` ${def.rsid}: not genotyped`);
}
}
let confidence;
if (genotyped === 0) confidence = 'Unsupported';
else if (matched >= 2 && genotyped * 2 >= defs.length) confidence = 'Strong';
else if ((matched >= 1 && genotyped >= 2) || genotyped * 2 >= defs.length) confidence = 'Moderate';
else confidence = 'Weak';
if (confidence === 'Unsupported') notes.push('Panel lacks all defining variants for this gene.');
if (confidence === 'Weak') notes.push(`Only ${genotyped}/${defs.length} defining rsids genotyped; call unreliable.`);
notes.push('No phase or CNV resolution from genotyping array.');
while (alleles.length < 2) alleles.push(['*1', 1.0]);
const activity = alleles[0][1] + alleles[1][1];
let phenotype;
if (activity > 2.0) phenotype = 'UltraRapid';
else if (activity >= 1.0) phenotype = 'Normal';
else if (activity >= 0.5) phenotype = 'Intermediate';
else phenotype = 'Poor';
return {
gene, allele1: alleles[0][0], allele2: alleles[1][0],
activity, phenotype, confidence,
rsidsGenotyped: genotyped, rsidsMatched: matched, rsidsTotal: defs.length,
notes, details,
};
}
/** Call CYP2D6 diplotype */
function callCyp2d6(gts) { return callCypDiplotype('CYP2D6', CYP2D6_DEFS, gts); }
/** Call CYP2C19 diplotype */
function callCyp2c19(gts) { return callCypDiplotype('CYP2C19', CYP2C19_DEFS, gts); }
/**
* Determine APOE genotype from rs429358 + rs7412.
* @param {Map<string,string>} gts
*/
function determineApoe(gts) {
const gt1 = gts.get('rs429358') || '';
const gt2 = gts.get('rs7412') || '';
if (!gt1 || !gt2) return { genotype: 'Unable to determine (missing data)', rs429358: gt1, rs7412: gt2 };
const e4 = (gt1.match(/C/g) || []).length;
const e2 = (gt2.match(/T/g) || []).length;
const geno = {
'0,0': 'e3/e3 (most common, baseline risk)',
'0,1': 'e2/e3 (PROTECTIVE - reduced Alzheimer\'s risk)',
'0,2': 'e2/e2 (protective; monitor for type III hyperlipoproteinemia)',
'1,0': 'e3/e4 (increased Alzheimer\'s risk ~3x)',
'1,1': 'e2/e4 (mixed - e2 partially offsets e4 risk)',
}[`${e4},${e2}`] || (e4 >= 2 ? 'e4/e4 (significantly increased Alzheimer\'s risk ~12x)' : `Unusual: rs429358=${gt1}, rs7412=${gt2}`);
return { genotype: geno, rs429358: gt1, rs7412: gt2 };
}
/**
* Run the full 23andMe analysis pipeline.
* @param {string} text - Raw 23andMe file contents
* @returns {object} Full analysis result
*/
function analyze23andMe(text) {
const data = parse23andMe(text);
const gts = new Map();
for (const [rsid, snp] of data.snps) gts.set(rsid, snp.genotype);
const cyp2d6 = callCyp2d6(gts);
const cyp2c19 = callCyp2c19(gts);
const apoe = determineApoe(gts);
// Variant classification
let homozygous = 0, heterozygous = 0, indels = 0;
const isNuc = c => 'ACGT'.includes(c);
for (const snp of data.snps.values()) {
const g = snp.genotype;
if (g.length === 2) {
if (isNuc(g[0]) && isNuc(g[1])) { if (g[0] === g[1]) homozygous++; else heterozygous++; }
else indels++;
}
}
return {
data: { ...data, snps: Object.fromEntries(data.snps), chrCounts: Object.fromEntries(data.chrCounts) },
cyp2d6, cyp2c19, apoe,
homozygous, heterozygous, indels,
hetRatio: data.totalMarkers - data.noCalls > 0 ? heterozygous / (data.totalMarkers - data.noCalls) * 100 : 0,
};
}
// -------------------------------------------------------------------
// Biomarker Analysis Engine (v0.3.0 — mirrors biomarker.rs + biomarker_stream.rs)
// -------------------------------------------------------------------
const biomarkerModule = require('./src/biomarker');
const streamModule = require('./src/stream');
module.exports = {
// Original API
encode2bit,
decode2bit,
translateDna,
cosineSimilarity,
fastaToRvdna,
readRvdna,
isNativeAvailable,
// 23andMe Genotyping API (v0.2.0)
normalizeGenotype,
parse23andMe,
callCyp2d6,
callCyp2c19,
determineApoe,
analyze23andMe,
// Biomarker Risk Scoring Engine (v0.3.0)
biomarkerReferences: biomarkerModule.biomarkerReferences,
zScore: biomarkerModule.zScore,
classifyBiomarker: biomarkerModule.classifyBiomarker,
computeRiskScores: biomarkerModule.computeRiskScores,
encodeProfileVector: biomarkerModule.encodeProfileVector,
generateSyntheticPopulation: biomarkerModule.generateSyntheticPopulation,
BIOMARKER_REFERENCES: biomarkerModule.BIOMARKER_REFERENCES,
SNPS: biomarkerModule.SNPS,
INTERACTIONS: biomarkerModule.INTERACTIONS,
CAT_ORDER: biomarkerModule.CAT_ORDER,
// Streaming Biomarker Processor (v0.3.0)
RingBuffer: streamModule.RingBuffer,
StreamProcessor: streamModule.StreamProcessor,
generateReadings: streamModule.generateReadings,
defaultStreamConfig: streamModule.defaultStreamConfig,
BIOMARKER_DEFS: streamModule.BIOMARKER_DEFS,
// Re-export native module for advanced use
native: nativeModule,
};

View File

@@ -0,0 +1,65 @@
{
"name": "@ruvector/rvdna",
"version": "0.3.0",
"description": "rvDNA — AI-native genomic analysis. 20-SNP biomarker risk scoring, streaming anomaly detection, 64-dim profile vectors, 23andMe genotyping, CYP2D6/CYP2C19 pharmacogenomics, variant calling, protein prediction, and HNSW vector search.",
"main": "index.js",
"types": "index.d.ts",
"author": "rUv <info@ruv.io> (https://ruv.io)",
"homepage": "https://github.com/ruvnet/ruvector/tree/main/examples/dna",
"repository": {
"type": "git",
"url": "https://github.com/ruvnet/ruvector.git",
"directory": "npm/packages/rvdna"
},
"bugs": {
"url": "https://github.com/ruvnet/ruvector/issues"
},
"license": "MIT",
"engines": {
"node": ">=18.0.0"
},
"files": [
"index.js",
"index.d.ts",
"src/",
"README.md"
],
"scripts": {
"build:napi": "napi build --platform --release --cargo-cwd ../../../examples/dna",
"test": "node tests/test-biomarker.js"
},
"devDependencies": {
"@napi-rs/cli": "^2.18.0"
},
"optionalDependencies": {
"@ruvector/rvdna-linux-x64-gnu": "0.1.0",
"@ruvector/rvdna-linux-arm64-gnu": "0.1.0",
"@ruvector/rvdna-darwin-x64": "0.1.0",
"@ruvector/rvdna-darwin-arm64": "0.1.0",
"@ruvector/rvdna-win32-x64-msvc": "0.1.0"
},
"publishConfig": {
"access": "public"
},
"keywords": [
"genomics",
"bioinformatics",
"dna",
"rvdna",
"biomarker",
"health",
"risk-score",
"streaming",
"anomaly-detection",
"23andme",
"pharmacogenomics",
"variant-calling",
"protein",
"hnsw",
"vector-search",
"napi",
"rust",
"ai",
"wasm"
]
}

View File

@@ -0,0 +1,351 @@
'use strict';
// ── Clinical reference ranges (mirrors REFERENCES in biomarker.rs) ──────────
const BIOMARKER_REFERENCES = Object.freeze([
{ name: 'Total Cholesterol', unit: 'mg/dL', normalLow: 125, normalHigh: 200, criticalLow: 100, criticalHigh: 300, category: 'Lipid' },
{ name: 'LDL', unit: 'mg/dL', normalLow: 50, normalHigh: 100, criticalLow: 25, criticalHigh: 190, category: 'Lipid' },
{ name: 'HDL', unit: 'mg/dL', normalLow: 40, normalHigh: 90, criticalLow: 20, criticalHigh: null, category: 'Lipid' },
{ name: 'Triglycerides', unit: 'mg/dL', normalLow: 35, normalHigh: 150, criticalLow: 20, criticalHigh: 500, category: 'Lipid' },
{ name: 'Fasting Glucose', unit: 'mg/dL', normalLow: 70, normalHigh: 100, criticalLow: 50, criticalHigh: 250, category: 'Metabolic' },
{ name: 'HbA1c', unit: '%', normalLow: 4, normalHigh: 5.7, criticalLow: null, criticalHigh: 9, category: 'Metabolic' },
{ name: 'Homocysteine', unit: 'umol/L', normalLow: 5, normalHigh: 15, criticalLow: null, criticalHigh: 30, category: 'Metabolic' },
{ name: 'Vitamin D', unit: 'ng/mL', normalLow: 30, normalHigh: 80, criticalLow: 10, criticalHigh: 150, category: 'Nutritional' },
{ name: 'CRP', unit: 'mg/L', normalLow: 0, normalHigh: 3, criticalLow: null, criticalHigh: 10, category: 'Inflammatory' },
{ name: 'TSH', unit: 'mIU/L', normalLow: 0.4, normalHigh: 4, criticalLow: 0.1, criticalHigh: 10, category: 'Thyroid' },
{ name: 'Ferritin', unit: 'ng/mL', normalLow: 20, normalHigh: 250, criticalLow: 10, criticalHigh: 1000, category: 'Iron' },
{ name: 'Vitamin B12', unit: 'pg/mL', normalLow: 200, normalHigh: 900, criticalLow: 150, criticalHigh: null, category: 'Nutritional' },
{ name: 'Lp(a)', unit: 'nmol/L', normalLow: 0, normalHigh: 75, criticalLow: null, criticalHigh: 200, category: 'Lipid' },
]);
// ── 20-SNP risk table (mirrors SNPS in biomarker.rs) ────────────────────────
const SNPS = Object.freeze([
{ rsid: 'rs429358', category: 'Neurological', wRef: 0, wHet: 0.4, wAlt: 0.9, homRef: 'TT', het: 'CT', homAlt: 'CC', maf: 0.14 },
{ rsid: 'rs7412', category: 'Neurological', wRef: 0, wHet: -0.15, wAlt: -0.3, homRef: 'CC', het: 'CT', homAlt: 'TT', maf: 0.08 },
{ rsid: 'rs1042522', category: 'Cancer Risk', wRef: 0, wHet: 0.25, wAlt: 0.5, homRef: 'CC', het: 'CG', homAlt: 'GG', maf: 0.40 },
{ rsid: 'rs80357906', category: 'Cancer Risk', wRef: 0, wHet: 0.7, wAlt: 0.95, homRef: 'DD', het: 'DI', homAlt: 'II', maf: 0.003 },
{ rsid: 'rs28897696', category: 'Cancer Risk', wRef: 0, wHet: 0.3, wAlt: 0.6, homRef: 'GG', het: 'AG', homAlt: 'AA', maf: 0.005 },
{ rsid: 'rs11571833', category: 'Cancer Risk', wRef: 0, wHet: 0.20, wAlt: 0.5, homRef: 'AA', het: 'AT', homAlt: 'TT', maf: 0.01 },
{ rsid: 'rs1801133', category: 'Metabolism', wRef: 0, wHet: 0.35, wAlt: 0.7, homRef: 'GG', het: 'AG', homAlt: 'AA', maf: 0.32 },
{ rsid: 'rs1801131', category: 'Metabolism', wRef: 0, wHet: 0.10, wAlt: 0.25, homRef: 'TT', het: 'GT', homAlt: 'GG', maf: 0.30 },
{ rsid: 'rs4680', category: 'Neurological', wRef: 0, wHet: 0.2, wAlt: 0.45, homRef: 'GG', het: 'AG', homAlt: 'AA', maf: 0.50 },
{ rsid: 'rs1799971', category: 'Neurological', wRef: 0, wHet: 0.2, wAlt: 0.4, homRef: 'AA', het: 'AG', homAlt: 'GG', maf: 0.15 },
{ rsid: 'rs762551', category: 'Metabolism', wRef: 0, wHet: 0.15, wAlt: 0.35, homRef: 'AA', het: 'AC', homAlt: 'CC', maf: 0.37 },
{ rsid: 'rs4988235', category: 'Metabolism', wRef: 0, wHet: 0.05, wAlt: 0.15, homRef: 'AA', het: 'AG', homAlt: 'GG', maf: 0.24 },
{ rsid: 'rs53576', category: 'Neurological', wRef: 0, wHet: 0.1, wAlt: 0.25, homRef: 'GG', het: 'AG', homAlt: 'AA', maf: 0.35 },
{ rsid: 'rs6311', category: 'Neurological', wRef: 0, wHet: 0.15, wAlt: 0.3, homRef: 'CC', het: 'CT', homAlt: 'TT', maf: 0.45 },
{ rsid: 'rs1800497', category: 'Neurological', wRef: 0, wHet: 0.25, wAlt: 0.5, homRef: 'GG', het: 'AG', homAlt: 'AA', maf: 0.20 },
{ rsid: 'rs4363657', category: 'Cardiovascular', wRef: 0, wHet: 0.35, wAlt: 0.7, homRef: 'TT', het: 'CT', homAlt: 'CC', maf: 0.15 },
{ rsid: 'rs1800566', category: 'Cancer Risk', wRef: 0, wHet: 0.15, wAlt: 0.30, homRef: 'CC', het: 'CT', homAlt: 'TT', maf: 0.22 },
{ rsid: 'rs10455872', category: 'Cardiovascular', wRef: 0, wHet: 0.40, wAlt: 0.75, homRef: 'AA', het: 'AG', homAlt: 'GG', maf: 0.07 },
{ rsid: 'rs3798220', category: 'Cardiovascular', wRef: 0, wHet: 0.35, wAlt: 0.65, homRef: 'TT', het: 'CT', homAlt: 'CC', maf: 0.02 },
{ rsid: 'rs11591147', category: 'Cardiovascular', wRef: 0, wHet: -0.30, wAlt: -0.55, homRef: 'GG', het: 'GT', homAlt: 'TT', maf: 0.024 },
]);
// ── Gene-gene interactions (mirrors INTERACTIONS in biomarker.rs) ────────────
const INTERACTIONS = Object.freeze([
{ rsidA: 'rs4680', rsidB: 'rs1799971', modifier: 1.4, category: 'Neurological' },
{ rsidA: 'rs1801133', rsidB: 'rs1801131', modifier: 1.3, category: 'Metabolism' },
{ rsidA: 'rs429358', rsidB: 'rs1042522', modifier: 1.2, category: 'Cancer Risk' },
{ rsidA: 'rs80357906',rsidB: 'rs1042522', modifier: 1.5, category: 'Cancer Risk' },
{ rsidA: 'rs1801131', rsidB: 'rs4680', modifier: 1.25, category: 'Neurological' },
{ rsidA: 'rs1800497', rsidB: 'rs4680', modifier: 1.2, category: 'Neurological' },
]);
const CAT_ORDER = ['Cancer Risk', 'Cardiovascular', 'Neurological', 'Metabolism'];
const NUM_ONEHOT_SNPS = 17;
// ── Helpers ──────────────────────────────────────────────────────────────────
// Map a genotype string to 0 (hom-ref), 1 (het), or 2 (hom-alt).
// Accepts reversed heterozygotes (e.g. 'TC' for a 'CT' het); anything
// that is neither hom-ref nor a two-character het falls through to 2.
function genotypeCode(snp, gt) {
  if (gt === snp.homRef) return 0;
  if (gt.length === 2 && gt[0] !== gt[1]) return 1;
  return 2;
}
function snpWeight(snp, code) {
return code === 0 ? snp.wRef : code === 1 ? snp.wHet : snp.wAlt;
}
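Applied to the rs429358 row from the table above, the two helpers compose like this (a standalone copy for illustration):

```javascript
// genotypeCode / snpWeight as defined in this module, on the rs429358 row.
function genotypeCode(snp, gt) {
  if (gt === snp.homRef) return 0;
  if (gt.length === 2 && gt[0] !== gt[1]) return 1;
  return 2;
}
function snpWeight(snp, code) {
  return code === 0 ? snp.wRef : code === 1 ? snp.wHet : snp.wAlt;
}
const apoe = { rsid: 'rs429358', wRef: 0, wHet: 0.4, wAlt: 0.9, homRef: 'TT', het: 'CT', homAlt: 'CC' };

console.log(snpWeight(apoe, genotypeCode(apoe, 'TT'))); // 0   (hom-ref)
console.log(snpWeight(apoe, genotypeCode(apoe, 'TC'))); // 0.4 (reversed het still scores as het)
console.log(snpWeight(apoe, genotypeCode(apoe, 'CC'))); // 0.9 (hom-alt)
```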
// Pre-built rsid -> index lookup (O(1) instead of O(n) findIndex)
const RSID_INDEX = new Map();
for (let i = 0; i < SNPS.length; i++) RSID_INDEX.set(SNPS[i].rsid, i);
// Pre-cache LPA SNP references to avoid repeated iteration
const LPA_SNPS = SNPS.filter(s => s.rsid === 'rs10455872' || s.rsid === 'rs3798220');
function snpIndex(rsid) {
const idx = RSID_INDEX.get(rsid);
return idx !== undefined ? idx : -1;
}
function isNonRef(genotypes, rsid) {
const idx = RSID_INDEX.get(rsid);
if (idx === undefined) return false;
const gt = genotypes.get(rsid);
return gt !== undefined && gt !== SNPS[idx].homRef;
}
function interactionMod(genotypes, ix) {
return (isNonRef(genotypes, ix.rsidA) && isNonRef(genotypes, ix.rsidB)) ? ix.modifier : 1.0;
}
// Pre-compute category metadata (mirrors category_meta() in Rust)
const CATEGORY_META = CAT_ORDER.map(cat => {
let maxPossible = 0;
let expectedCount = 0;
for (const snp of SNPS) {
if (snp.category === cat) {
maxPossible += Math.max(snp.wAlt, 0);
expectedCount++;
}
}
return { name: cat, maxPossible: Math.max(maxPossible, 1), expectedCount };
});
// Mulberry32 PRNG — deterministic, fast, no dependencies
function mulberry32(seed) {
let t = (seed + 0x6D2B79F5) | 0;
return function () {
t = (t + 0x6D2B79F5) | 0;
let z = t ^ (t >>> 15);
z = Math.imul(z | 1, z);
z ^= z + Math.imul(z ^ (z >>> 7), z | 61);
return ((z ^ (z >>> 14)) >>> 0) / 4294967296;
};
}
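Two generators seeded identically emit the same stream, which is what makes `generateSyntheticPopulation` reproducible (mulberry32 copied here to keep the sketch self-contained):

```javascript
function mulberry32(seed) {
  let t = (seed + 0x6D2B79F5) | 0;
  return function () {
    t = (t + 0x6D2B79F5) | 0;
    let z = t ^ (t >>> 15);
    z = Math.imul(z | 1, z);
    z ^= z + Math.imul(z ^ (z >>> 7), z | 61);
    return ((z ^ (z >>> 14)) >>> 0) / 4294967296;
  };
}
const a = mulberry32(42);
const b = mulberry32(42);
let identical = true, inRange = true;
for (let i = 0; i < 1000; i++) {
  const x = a(), y = b();
  if (x !== y) identical = false;
  if (x < 0 || x >= 1) inRange = false;
}
console.log(identical, inRange); // true true
```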
// ── Simplified MTHFR/pain scoring (mirrors health.rs analysis functions) ────
function analyzeMthfr(genotypes) {
let score = 0;
const gt677 = genotypes.get('rs1801133');
const gt1298 = genotypes.get('rs1801131');
if (gt677) {
const code = genotypeCode(SNPS[6], gt677);
score += code;
}
if (gt1298) {
const code = genotypeCode(SNPS[7], gt1298);
score += code;
}
return { score };
}
function analyzePain(genotypes) {
const gtComt = genotypes.get('rs4680');
const gtOprm1 = genotypes.get('rs1799971');
if (!gtComt || !gtOprm1) return null;
const comtCode = genotypeCode(SNPS[8], gtComt);
const oprm1Code = genotypeCode(SNPS[9], gtOprm1);
return { score: comtCode + oprm1Code };
}
// ── Public API ───────────────────────────────────────────────────────────────
function biomarkerReferences() {
return BIOMARKER_REFERENCES;
}
// Range-normalized deviation: 0 at the midpoint of the normal range and
// ±1 at its bounds (a reference-range score, not a population z-score).
function zScore(value, ref_) {
  const mid = (ref_.normalLow + ref_.normalHigh) / 2;
  const halfRange = (ref_.normalHigh - ref_.normalLow) / 2;
  if (halfRange === 0) return 0;
  return (value - mid) / halfRange;
}
function classifyBiomarker(value, ref_) {
if (ref_.criticalLow !== null && value < ref_.criticalLow) return 'CriticalLow';
if (value < ref_.normalLow) return 'Low';
if (ref_.criticalHigh !== null && value > ref_.criticalHigh) return 'CriticalHigh';
if (value > ref_.normalHigh) return 'High';
return 'Normal';
}
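Both helpers against the Total Cholesterol reference row (normal 125-200, critical bounds 100/300), copied into a standalone sketch:

```javascript
// zScore / classifyBiomarker as defined in this module.
function zScore(value, ref_) {
  const mid = (ref_.normalLow + ref_.normalHigh) / 2;
  const halfRange = (ref_.normalHigh - ref_.normalLow) / 2;
  if (halfRange === 0) return 0;
  return (value - mid) / halfRange;
}
function classifyBiomarker(value, ref_) {
  if (ref_.criticalLow !== null && value < ref_.criticalLow) return 'CriticalLow';
  if (value < ref_.normalLow) return 'Low';
  if (ref_.criticalHigh !== null && value > ref_.criticalHigh) return 'CriticalHigh';
  if (value > ref_.normalHigh) return 'High';
  return 'Normal';
}
const tc = { normalLow: 125, normalHigh: 200, criticalLow: 100, criticalHigh: 300 };

console.log(zScore(162.5, tc));          // 0 (midpoint of the normal range)
console.log(zScore(200, tc));            // 1 (upper normal bound)
console.log(classifyBiomarker(150, tc)); // 'Normal'
console.log(classifyBiomarker(250, tc)); // 'High'
console.log(classifyBiomarker(90, tc));  // 'CriticalLow'
```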
function computeRiskScores(genotypes) {
const catScores = new Map(); // category -> { raw, variants, count }
for (const snp of SNPS) {
const gt = genotypes.get(snp.rsid);
if (gt === undefined) continue;
const code = genotypeCode(snp, gt);
const w = snpWeight(snp, code);
if (!catScores.has(snp.category)) {
catScores.set(snp.category, { raw: 0, variants: [], count: 0 });
}
const entry = catScores.get(snp.category);
entry.raw += w;
entry.count++;
if (code > 0) entry.variants.push(snp.rsid);
}
for (const inter of INTERACTIONS) {
const m = interactionMod(genotypes, inter);
if (m > 1.0 && catScores.has(inter.category)) {
catScores.get(inter.category).raw *= m;
}
}
const categoryScores = {};
for (const cm of CATEGORY_META) {
const entry = catScores.get(cm.name) || { raw: 0, variants: [], count: 0 };
const score = Math.min(Math.max(entry.raw / cm.maxPossible, 0), 1);
const confidence = entry.count > 0 ? Math.min(entry.count / Math.max(cm.expectedCount, 1), 1) : 0;
categoryScores[cm.name] = {
category: cm.name,
score,
confidence,
contributingVariants: entry.variants,
};
}
let ws = 0, cs = 0;
for (const c of Object.values(categoryScores)) {
ws += c.score * c.confidence;
cs += c.confidence;
}
const globalRiskScore = cs > 0 ? ws / cs : 0;
const profile = {
subjectId: '',
timestamp: 0,
categoryScores,
globalRiskScore,
profileVector: null,
biomarkerValues: {},
};
profile.profileVector = encodeProfileVectorWithGenotypes(profile, genotypes);
return profile;
}
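The normalization at the end (raw weight sum divided by the category's maximum positive weight, clamped to [0, 1]) on a toy two-SNP category — rsA/rsB and their weights are hypothetical:

```javascript
// Toy category scoring: one het SNP + one hom-ref SNP (hypothetical weights).
const snps = [
  { rsid: 'rsA', wRef: 0, wHet: 0.4, wAlt: 0.9, homRef: 'TT' },
  { rsid: 'rsB', wRef: 0, wHet: 0.2, wAlt: 0.5, homRef: 'CC' },
];
const maxPossible = 0.9 + 0.5; // sum of positive wAlt values
const gts = new Map([['rsA', 'CT'], ['rsB', 'CC']]);
let raw = 0;
for (const s of snps) {
  const gt = gts.get(s.rsid);
  const code = gt === s.homRef ? 0 : (gt.length === 2 && gt[0] !== gt[1] ? 1 : 2);
  raw += code === 0 ? s.wRef : code === 1 ? s.wHet : s.wAlt;
}
const score = Math.min(Math.max(raw / maxPossible, 0), 1);
console.log(score.toFixed(4)); // 0.2857 — only the het contributes (0.4 / 1.4)
```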
function encodeProfileVector(profile) {
return encodeProfileVectorWithGenotypes(profile, new Map());
}
function encodeProfileVectorWithGenotypes(profile, genotypes) {
const v = new Float32Array(64);
// Dims 0..50: one-hot genotype encoding (first 17 SNPs x 3 = 51 dims)
for (let i = 0; i < NUM_ONEHOT_SNPS; i++) {
const snp = SNPS[i];
const gt = genotypes.get(snp.rsid);
const code = gt !== undefined ? genotypeCode(snp, gt) : 0;
v[i * 3 + code] = 1.0;
}
// Dims 51..54: category scores
for (let j = 0; j < CAT_ORDER.length; j++) {
const cs = profile.categoryScores[CAT_ORDER[j]];
v[51 + j] = cs ? cs.score : 0;
}
v[55] = profile.globalRiskScore;
// Dims 56..59: first 4 interaction modifiers
for (let j = 0; j < 4; j++) {
const m = interactionMod(genotypes, INTERACTIONS[j]);
v[56 + j] = m > 1 ? m - 1 : 0;
}
// Dims 60..63: derived clinical scores
v[60] = analyzeMthfr(genotypes).score / 4;
const pain = analyzePain(genotypes);
v[61] = pain ? pain.score / 4 : 0;
const apoeGt = genotypes.get('rs429358');
v[62] = apoeGt !== undefined ? genotypeCode(SNPS[0], apoeGt) / 2 : 0;
  // LPA composite: genotype codes (scaled to 0..1) for rs10455872 + rs3798220,
  // summed and halved — equal to their average when both SNPs are present
let lpaSum = 0, lpaCount = 0;
for (const snp of LPA_SNPS) {
const gt = genotypes.get(snp.rsid);
if (gt !== undefined) {
lpaSum += genotypeCode(snp, gt) / 2;
lpaCount++;
}
}
v[63] = lpaCount > 0 ? lpaSum / 2 : 0;
// L2-normalize
let norm = 0;
for (let i = 0; i < 64; i++) norm += v[i] * v[i];
norm = Math.sqrt(norm);
if (norm > 0) for (let i = 0; i < 64; i++) v[i] /= norm;
return v;
}
function randomGenotype(rng, snp) {
const p = snp.maf;
const q = 1 - p;
const r = rng();
if (r < q * q) return snp.homRef;
if (r < q * q + 2 * p * q) return snp.het;
return snp.homAlt;
}
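Genotypes are drawn from Hardy-Weinberg proportions q², 2pq, p². With maf = 0.5 (as for rs4680) the split should be roughly 25/50/25 — a quick check with mulberry32 copied from above and an arbitrary seed:

```javascript
function mulberry32(seed) {
  let t = (seed + 0x6D2B79F5) | 0;
  return function () {
    t = (t + 0x6D2B79F5) | 0;
    let z = t ^ (t >>> 15);
    z = Math.imul(z | 1, z);
    z ^= z + Math.imul(z ^ (z >>> 7), z | 61);
    return ((z ^ (z >>> 14)) >>> 0) / 4294967296;
  };
}
function randomGenotype(rng, snp) {
  const p = snp.maf, q = 1 - p, r = rng();
  if (r < q * q) return snp.homRef;
  if (r < q * q + 2 * p * q) return snp.het;
  return snp.homAlt;
}
const snp = { maf: 0.5, homRef: 'GG', het: 'AG', homAlt: 'AA' }; // mirrors rs4680
const rng = mulberry32(7);
const counts = { GG: 0, AG: 0, AA: 0 };
for (let i = 0; i < 10000; i++) counts[randomGenotype(rng, snp)]++;
console.log(counts); // roughly { GG: 2500, AG: 5000, AA: 2500 }
```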
function generateSyntheticPopulation(count, seed) {
const rng = mulberry32(seed);
const pop = [];
for (let i = 0; i < count; i++) {
const genotypes = new Map();
for (const snp of SNPS) {
genotypes.set(snp.rsid, randomGenotype(rng, snp));
}
const profile = computeRiskScores(genotypes);
profile.subjectId = `SYN-${String(i).padStart(6, '0')}`;
profile.timestamp = 1700000000 + i;
const mthfrScore = analyzeMthfr(genotypes).score;
const apoeCode = genotypes.get('rs429358') ? genotypeCode(SNPS[0], genotypes.get('rs429358')) : 0;
const nqo1Idx = RSID_INDEX.get('rs1800566');
const nqo1Code = genotypes.get('rs1800566') ? genotypeCode(SNPS[nqo1Idx], genotypes.get('rs1800566')) : 0;
let lpaRisk = 0;
for (const snp of LPA_SNPS) {
const gt = genotypes.get(snp.rsid);
if (gt) lpaRisk += genotypeCode(snp, gt);
}
const pcsk9Idx = RSID_INDEX.get('rs11591147');
const pcsk9Code = genotypes.get('rs11591147') ? genotypeCode(SNPS[pcsk9Idx], genotypes.get('rs11591147')) : 0;
for (const bref of BIOMARKER_REFERENCES) {
const mid = (bref.normalLow + bref.normalHigh) / 2;
const sd = (bref.normalHigh - bref.normalLow) / 4;
let val = mid + (rng() * 3 - 1.5) * sd;
// Gene->biomarker correlations (mirrors Rust)
const nm = bref.name;
if (nm === 'Homocysteine' && mthfrScore >= 2) val += sd * (mthfrScore - 1);
if ((nm === 'Total Cholesterol' || nm === 'LDL') && apoeCode > 0) val += sd * 0.5 * apoeCode;
if (nm === 'HDL' && apoeCode > 0) val -= sd * 0.3 * apoeCode;
if (nm === 'Triglycerides' && apoeCode > 0) val += sd * 0.4 * apoeCode;
if (nm === 'Vitamin B12' && mthfrScore >= 2) val -= sd * 0.4;
if (nm === 'CRP' && nqo1Code === 2) val += sd * 0.3;
if (nm === 'Lp(a)' && lpaRisk > 0) val += sd * 1.5 * lpaRisk;
if ((nm === 'LDL' || nm === 'Total Cholesterol') && pcsk9Code > 0) val -= sd * 0.6 * pcsk9Code;
val = Math.max(val, bref.criticalLow || 0, 0);
if (bref.criticalHigh !== null) val = Math.min(val, bref.criticalHigh * 1.2);
profile.biomarkerValues[bref.name] = Math.round(val * 10) / 10;
}
pop.push(profile);
}
return pop;
}
module.exports = {
BIOMARKER_REFERENCES,
SNPS,
INTERACTIONS,
CAT_ORDER,
biomarkerReferences,
zScore,
classifyBiomarker,
computeRiskScores,
encodeProfileVector,
generateSyntheticPopulation,
};

View File

@@ -0,0 +1,312 @@
'use strict';
// ── Constants (identical to biomarker_stream.rs) ─────────────────────────────
const EMA_ALPHA = 0.1;
const Z_SCORE_THRESHOLD = 2.5;
const REF_OVERSHOOT = 0.20;
const CUSUM_THRESHOLD = 4.0;
const CUSUM_DRIFT = 0.5;
// ── Biomarker definitions ────────────────────────────────────────────────────
const BIOMARKER_DEFS = Object.freeze([
{ id: 'glucose', low: 70, high: 100 },
{ id: 'cholesterol_total', low: 150, high: 200 },
{ id: 'hdl', low: 40, high: 60 },
{ id: 'ldl', low: 70, high: 130 },
{ id: 'triglycerides', low: 50, high: 150 },
{ id: 'crp', low: 0.1, high: 3.0 },
]);
// ── RingBuffer ───────────────────────────────────────────────────────────────
class RingBuffer {
constructor(capacity) {
if (capacity <= 0) throw new Error('RingBuffer capacity must be > 0');
this._buffer = new Float64Array(capacity);
this._head = 0;
this._len = 0;
this._capacity = capacity;
}
push(item) {
this._buffer[this._head] = item;
this._head = (this._head + 1) % this._capacity;
if (this._len < this._capacity) this._len++;
}
/** Push item and return evicted value (NaN if buffer wasn't full). */
pushPop(item) {
const wasFull = this._len === this._capacity;
const evicted = wasFull ? this._buffer[this._head] : NaN;
this._buffer[this._head] = item;
this._head = (this._head + 1) % this._capacity;
if (!wasFull) this._len++;
return evicted;
}
/** Iterate in insertion order (oldest to newest). */
*[Symbol.iterator]() {
const start = this._len < this._capacity ? 0 : this._head;
for (let i = 0; i < this._len; i++) {
yield this._buffer[(start + i) % this._capacity];
}
}
/** Return values as a plain array (oldest to newest). */
toArray() {
const arr = new Array(this._len);
const start = this._len < this._capacity ? 0 : this._head;
for (let i = 0; i < this._len; i++) {
arr[i] = this._buffer[(start + i) % this._capacity];
}
return arr;
}
get length() { return this._len; }
get capacity() { return this._capacity; }
isFull() { return this._len === this._capacity; }
clear() {
this._head = 0;
this._len = 0;
}
}
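Eviction order in practice — a condensed reimplementation (pushPop and toArray only, mirroring the class above) to show what comes out once the buffer wraps:

```javascript
// Condensed RingBuffer sketch: same index math as pushPop / toArray above.
class MiniRing {
  constructor(capacity) {
    this.buf = new Float64Array(capacity);
    this.head = 0; this.len = 0; this.cap = capacity;
  }
  pushPop(item) {
    const wasFull = this.len === this.cap;
    const evicted = wasFull ? this.buf[this.head] : NaN;
    this.buf[this.head] = item;
    this.head = (this.head + 1) % this.cap;
    if (!wasFull) this.len++;
    return evicted;
  }
  toArray() {
    const start = this.len < this.cap ? 0 : this.head;
    return Array.from({ length: this.len }, (_, i) => this.buf[(start + i) % this.cap]);
  }
}
const rb = new MiniRing(3);
[1, 2, 3].forEach(v => rb.pushPop(v)); // fills the buffer; nothing evicted yet
const evicted = rb.pushPop(4);
console.log(evicted);      // 1 — the oldest value falls out
console.log(rb.toArray()); // [2, 3, 4] — oldest to newest
```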
// ── Welford's online mean+std (single-pass, mirrors Rust) ────────────────────
function windowMeanStd(buf) {
const n = buf.length;
if (n === 0) return [0, 0];
let mean = 0, m2 = 0, k = 0;
for (const x of buf) {
k++;
const delta = x - mean;
mean += delta / k;
m2 += delta * (x - mean);
}
if (n < 2) return [mean, 0];
return [mean, Math.sqrt(m2 / (n - 1))];
}
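Welford's single pass matches the textbook two-pass sample statistics; for [1, 2, 3, 4] the sample standard deviation is sqrt(5/3):

```javascript
// windowMeanStd copied from this module, checked against the closed form.
function windowMeanStd(buf) {
  const n = buf.length;
  if (n === 0) return [0, 0];
  let mean = 0, m2 = 0, k = 0;
  for (const x of buf) {
    k++;
    const delta = x - mean;
    mean += delta / k;
    m2 += delta * (x - mean);
  }
  if (n < 2) return [mean, 0];
  return [mean, Math.sqrt(m2 / (n - 1))];
}
const [mean, std] = windowMeanStd([1, 2, 3, 4]);
console.log(mean);                                     // 2.5
console.log(Math.abs(std - Math.sqrt(5 / 3)) < 1e-12); // true
```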
// ── Trend slope via simple linear regression (mirrors Rust) ──────────────────
function computeTrendSlope(buf) {
const n = buf.length;
if (n < 2) return 0;
const nf = n;
const xm = (nf - 1) / 2;
let ys = 0, xys = 0, xxs = 0, i = 0;
for (const y of buf) {
ys += y;
xys += i * y;
xxs += i * i;
i++;
}
const ssXy = xys - nf * xm * (ys / nf);
const ssXx = xxs - nf * xm * xm;
return Math.abs(ssXx) < 1e-12 ? 0 : ssXy / ssXx;
}
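For a perfectly linear window the regression recovers the slope exactly, and a flat window yields zero:

```javascript
// computeTrendSlope copied from this module.
function computeTrendSlope(buf) {
  const n = buf.length;
  if (n < 2) return 0;
  const nf = n;
  const xm = (nf - 1) / 2;
  let ys = 0, xys = 0, xxs = 0, i = 0;
  for (const y of buf) {
    ys += y;
    xys += i * y;
    xxs += i * i;
    i++;
  }
  const ssXy = xys - nf * xm * (ys / nf);
  const ssXx = xxs - nf * xm * xm;
  return Math.abs(ssXx) < 1e-12 ? 0 : ssXy / ssXx;
}
console.log(computeTrendSlope([1, 3, 5, 7])); // 2 — rises 2 units per sample
console.log(computeTrendSlope([5, 5, 5, 5])); // 0 — flat window
```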
// ── StreamConfig ─────────────────────────────────────────────────────────────
function defaultStreamConfig() {
return {
baseIntervalMs: 1000,
noiseAmplitude: 0.02,
driftRate: 0.0,
anomalyProbability: 0.02,
anomalyMagnitude: 2.5,
numBiomarkers: 6,
windowSize: 100,
};
}
// ── Mulberry32 PRNG ──────────────────────────────────────────────────────────
function mulberry32(seed) {
let t = (seed + 0x6D2B79F5) | 0;
return function () {
t = (t + 0x6D2B79F5) | 0;
let z = t ^ (t >>> 15);
z = Math.imul(z | 1, z);
z ^= z + Math.imul(z ^ (z >>> 7), z | 61);
return ((z ^ (z >>> 14)) >>> 0) / 4294967296;
};
}
// Box-Muller for normal distribution
function normalSample(rng, mean, stddev) {
const u1 = rng();
const u2 = rng();
return mean + stddev * Math.sqrt(-2 * Math.log(u1 || 1e-12)) * Math.cos(2 * Math.PI * u2);
}
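Box-Muller turns two uniform draws into one normal draw; over many samples the empirical mean and standard deviation converge on the targets (mulberry32 and normalSample copied from this module; seed and tolerances are arbitrary):

```javascript
function mulberry32(seed) {
  let t = (seed + 0x6D2B79F5) | 0;
  return function () {
    t = (t + 0x6D2B79F5) | 0;
    let z = t ^ (t >>> 15);
    z = Math.imul(z | 1, z);
    z ^= z + Math.imul(z ^ (z >>> 7), z | 61);
    return ((z ^ (z >>> 14)) >>> 0) / 4294967296;
  };
}
function normalSample(rng, mean, stddev) {
  const u1 = rng();
  const u2 = rng();
  return mean + stddev * Math.sqrt(-2 * Math.log(u1 || 1e-12)) * Math.cos(2 * Math.PI * u2);
}
const rng = mulberry32(123);
const n = 20000;
let sum = 0, sumSq = 0;
for (let i = 0; i < n; i++) {
  const x = normalSample(rng, 100, 10); // target: mean 100, sd 10
  sum += x; sumSq += x * x;
}
const mean = sum / n;
const sd = Math.sqrt(sumSq / n - mean * mean);
console.log(Math.abs(mean - 100) < 1); // true
console.log(Math.abs(sd - 10) < 1);    // true
```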
// ── Batch generation (mirrors generate_readings in Rust) ─────────────────────
function generateReadings(config, count, seed) {
const rng = mulberry32(seed);
const active = BIOMARKER_DEFS.slice(0, Math.min(config.numBiomarkers, BIOMARKER_DEFS.length));
const readings = [];
// Pre-compute distributions per biomarker
const dists = active.map(def => {
const range = def.high - def.low;
const mid = (def.low + def.high) / 2;
const sigma = Math.max(config.noiseAmplitude * range, 1e-12);
return { mid, range, sigma };
});
let ts = 0;
for (let step = 0; step < count; step++) {
for (let j = 0; j < active.length; j++) {
const def = active[j];
const { mid, range, sigma } = dists[j];
const drift = config.driftRate * range * step;
const isAnomaly = rng() < config.anomalyProbability;
const effectiveSigma = isAnomaly ? sigma * config.anomalyMagnitude : sigma;
const value = Math.max(normalSample(rng, mid + drift, effectiveSigma), 0);
readings.push({
timestampMs: ts,
biomarkerId: def.id,
value,
referenceLow: def.low,
referenceHigh: def.high,
isAnomaly,
zScore: 0,
});
}
ts += config.baseIntervalMs;
}
return readings;
}
// ── StreamProcessor ──────────────────────────────────────────────────────────
class StreamProcessor {
constructor(config) {
this._config = config || defaultStreamConfig();
this._buffers = new Map();
this._stats = new Map();
this._totalReadings = 0;
this._anomalyCount = 0;
this._anomPerBio = new Map();
this._welford = new Map();
this._startTs = null;
this._lastTs = null;
}
_initBiomarker(id) {
this._buffers.set(id, new RingBuffer(this._config.windowSize));
this._stats.set(id, {
mean: 0, variance: 0, min: Infinity, max: -Infinity,
count: 0, anomalyRate: 0, trendSlope: 0, ema: 0,
cusumPos: 0, cusumNeg: 0, changepointDetected: false,
});
// Incremental Welford state for windowed mean/variance (O(1) per reading)
this._welford.set(id, { n: 0, mean: 0, m2: 0 });
}
processReading(reading) {
const id = reading.biomarkerId;
if (this._startTs === null) this._startTs = reading.timestampMs;
this._lastTs = reading.timestampMs;
if (!this._buffers.has(id)) this._initBiomarker(id);
const buf = this._buffers.get(id);
const evicted = buf.pushPop(reading.value);
this._totalReadings++;
// Incremental windowed Welford: O(1) add + O(1) remove
const w = this._welford.get(id);
const val = reading.value;
if (Number.isNaN(evicted)) {
// Buffer wasn't full — just add
w.n++;
const d1 = val - w.mean;
w.mean += d1 / w.n;
w.m2 += d1 * (val - w.mean);
} else {
// Buffer full — remove evicted, add new (n stays the same)
const oldMean = w.mean;
w.mean += (val - evicted) / w.n;
w.m2 += (val - evicted) * ((val - w.mean) + (evicted - oldMean));
if (w.m2 < 0) w.m2 = 0; // numerical guard
}
const wmean = w.mean;
const wstd = w.n > 1 ? Math.sqrt(w.m2 / (w.n - 1)) : 0;
const z = wstd > 1e-12 ? (val - wmean) / wstd : 0;
    const refRange = reading.referenceHigh - reading.referenceLow;
    const overshoot = REF_OVERSHOOT * refRange;
const oor = val < (reading.referenceLow - overshoot) ||
val > (reading.referenceHigh + overshoot);
const isAnomaly = Math.abs(z) > Z_SCORE_THRESHOLD || oor;
if (isAnomaly) {
this._anomalyCount++;
this._anomPerBio.set(id, (this._anomPerBio.get(id) || 0) + 1);
}
const slope = computeTrendSlope(buf);
const bioAnom = this._anomPerBio.get(id) || 0;
const st = this._stats.get(id);
st.count++;
st.mean = wmean;
st.variance = wstd * wstd;
st.trendSlope = slope;
st.anomalyRate = bioAnom / st.count;
if (val < st.min) st.min = val;
if (val > st.max) st.max = val;
st.ema = st.count === 1
? val
: EMA_ALPHA * val + (1 - EMA_ALPHA) * st.ema;
// CUSUM changepoint detection
if (wstd > 1e-12) {
const normDev = (val - wmean) / wstd;
st.cusumPos = Math.max(st.cusumPos + normDev - CUSUM_DRIFT, 0);
st.cusumNeg = Math.max(st.cusumNeg - normDev - CUSUM_DRIFT, 0);
st.changepointDetected = st.cusumPos > CUSUM_THRESHOLD || st.cusumNeg > CUSUM_THRESHOLD;
if (st.changepointDetected) { st.cusumPos = 0; st.cusumNeg = 0; }
}
return { accepted: true, zScore: z, isAnomaly, currentTrend: slope };
}
getStats(biomarkerId) {
return this._stats.get(biomarkerId) || null;
}
summary() {
const elapsed = (this._startTs !== null && this._lastTs !== null && this._lastTs > this._startTs)
? this._lastTs - this._startTs : 1;
const ar = this._totalReadings > 0 ? this._anomalyCount / this._totalReadings : 0;
const statsObj = {};
for (const [k, v] of this._stats) statsObj[k] = { ...v };
return {
totalReadings: this._totalReadings,
anomalyCount: this._anomalyCount,
anomalyRate: ar,
biomarkerStats: statsObj,
throughputReadingsPerSec: this._totalReadings / (elapsed / 1000),
};
}
}
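The O(1) add/remove window update inside processReading is the subtle part; this standalone sketch replays it against a naive per-step recomputation of the same window (values and window size are arbitrary examples):

```javascript
// Sliding-window Welford (mirrors processReading) vs. naive recomputation.
const cap = 3;
const win = [];
const w = { n: 0, mean: 0, m2: 0 };
let allAgree = true;
for (const val of [10, 12, 11, 15, 14, 13]) {
  const evicted = win.length === cap ? win.shift() : NaN;
  win.push(val);
  if (Number.isNaN(evicted)) {
    // Window not yet full: standard Welford add.
    w.n++;
    const d1 = val - w.mean;
    w.mean += d1 / w.n;
    w.m2 += d1 * (val - w.mean);
  } else {
    // Window full: replace evicted with val in one O(1) update.
    const oldMean = w.mean;
    w.mean += (val - evicted) / w.n;
    w.m2 += (val - evicted) * ((val - w.mean) + (evicted - oldMean));
  }
  const batchMean = win.reduce((a, b) => a + b, 0) / win.length;
  const batchM2 = win.reduce((a, x) => a + (x - batchMean) ** 2, 0);
  if (Math.abs(w.mean - batchMean) > 1e-9 || Math.abs(w.m2 - batchM2) > 1e-9) allAgree = false;
}
console.log(allAgree); // true — incremental state matches the batch at every step
```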
module.exports = {
RingBuffer,
StreamProcessor,
BIOMARKER_DEFS,
EMA_ALPHA,
Z_SCORE_THRESHOLD,
REF_OVERSHOOT,
CUSUM_THRESHOLD,
CUSUM_DRIFT,
defaultStreamConfig,
generateReadings,
};
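CUSUM accumulates normalized deviations minus a drift allowance: isolated noise decays back to zero, while a sustained shift pushes the positive sum over the threshold. A standalone sketch with the constants copied from the top of this module (the deviation sequence is a made-up example):

```javascript
const CUSUM_THRESHOLD = 4.0;
const CUSUM_DRIFT = 0.5;
let pos = 0, neg = 0, detectedAt = -1;
// Normalized deviations: noise around 0, then a sustained +2-sigma shift.
const devs = [0.1, -0.2, 0.0, 2.0, 2.0, 2.0, 2.0];
devs.forEach((d, i) => {
  pos = Math.max(pos + d - CUSUM_DRIFT, 0); // upward changepoint accumulator
  neg = Math.max(neg - d - CUSUM_DRIFT, 0); // downward changepoint accumulator
  if (detectedAt < 0 && (pos > CUSUM_THRESHOLD || neg > CUSUM_THRESHOLD)) detectedAt = i;
});
console.log(detectedAt); // 5 — fires on the third sample after the shift begins
```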

View File

@@ -0,0 +1,33 @@
# 23andMe raw data file — Scenario: High-risk cardiovascular + MTHFR compound het
# This file is format version: v5
# Below is a subset of data (build 37, GRCh37/hg19)
# rsid chromosome position genotype
rs429358 19 45411941 CT
rs7412 19 45412079 CC
rs1042522 17 7579472 CG
rs80357906 17 41246537 DD
rs28897696 17 41244999 GG
rs11571833 13 32972626 AA
rs1801133 1 11856378 AA
rs1801131 1 11854476 GT
rs4680 22 19951271 AG
rs1799971 6 154360797 AG
rs762551 15 75041917 AC
rs4988235 2 136608646 AG
rs53576 3 8804371 AG
rs6311 13 47471478 CT
rs1800497 11 113270828 AG
rs4363657 12 21331549 CT
rs1800566 16 69745145 CT
rs10455872 6 161010118 AG
rs3798220 6 160961137 CT
rs11591147 1 55505647 GG
rs3892097 22 42524947 CT
rs35742686 22 42523791 DD
rs5030655 22 42522393 DD
rs1065852 22 42526694 CT
rs28371725 22 42525772 TT
rs28371706 22 42523610 CC
rs4244285 10 96541616 AG
rs4986893 10 96540410 GG
rs12248560 10 96521657 CT

View File

@@ -0,0 +1,33 @@
# 23andMe raw data file — Scenario: Low-risk baseline (all reference genotypes)
# This file is format version: v5
# Below is a subset of data (build 38, GRCh38/hg38)
# rsid chromosome position genotype
rs429358 19 45411941 TT
rs7412 19 45412079 CC
rs1042522 17 7579472 CC
rs80357906 17 41246537 DD
rs28897696 17 41244999 GG
rs11571833 13 32972626 AA
rs1801133 1 11856378 GG
rs1801131 1 11854476 TT
rs4680 22 19951271 GG
rs1799971 6 154360797 AA
rs762551 15 75041917 AA
rs4988235 2 136608646 AA
rs53576 3 8804371 GG
rs6311 13 47471478 CC
rs1800497 11 113270828 GG
rs4363657 12 21331549 TT
rs1800566 16 69745145 CC
rs10455872 6 161010118 AA
rs3798220 6 160961137 TT
rs11591147 1 55505647 GG
rs3892097 22 42524947 CC
rs35742686 22 42523791 DD
rs5030655 22 42522393 DD
rs1065852 22 42526694 CC
rs28371725 22 42525772 CC
rs28371706 22 42523610 CC
rs4244285 10 96541616 GG
rs4986893 10 96540410 GG
rs12248560 10 96521657 CC

View File

@@ -0,0 +1,33 @@
# 23andMe raw data file — Scenario: APOE e4/e4 + BRCA1 carrier + NQO1 null
# This file is format version: v5
# Below is a subset of data (build 37, GRCh37/hg19)
# rsid chromosome position genotype
rs429358 19 45411941 CC
rs7412 19 45412079 CC
rs1042522 17 7579472 GG
rs80357906 17 41246537 DI
rs28897696 17 41244999 AG
rs11571833 13 32972626 AT
rs1801133 1 11856378 AG
rs1801131 1 11854476 TT
rs4680 22 19951271 AA
rs1799971 6 154360797 GG
rs762551 15 75041917 CC
rs4988235 2 136608646 GG
rs53576 3 8804371 AA
rs6311 13 47471478 TT
rs1800497 11 113270828 AA
rs4363657 12 21331549 CC
rs1800566 16 69745145 TT
rs10455872 6 161010118 GG
rs3798220 6 160961137 CC
rs11591147 1 55505647 GG
rs3892097 22 42524947 CC
rs35742686 22 42523791 DD
rs5030655 22 42522393 DD
rs1065852 22 42526694 CC
rs28371725 22 42525772 CC
rs28371706 22 42523610 CC
rs4244285 10 96541616 GG
rs4986893 10 96540410 GG
rs12248560 10 96521657 CC

View File

@@ -0,0 +1,33 @@
# 23andMe raw data file — Scenario: PCSK9 protective + minimal risk
# This file is format version: v5
# Below is a subset of data (build 37, GRCh37/hg19)
# rsid chromosome position genotype
rs429358 19 45411941 TT
rs7412 19 45412079 CT
rs1042522 17 7579472 CC
rs80357906 17 41246537 DD
rs28897696 17 41244999 GG
rs11571833 13 32972626 AA
rs1801133 1 11856378 GG
rs1801131 1 11854476 TT
rs4680 22 19951271 GG
rs1799971 6 154360797 AA
rs762551 15 75041917 AA
rs4988235 2 136608646 AA
rs53576 3 8804371 GG
rs6311 13 47471478 CC
rs1800497 11 113270828 GG
rs4363657 12 21331549 TT
rs1800566 16 69745145 CC
rs10455872 6 161010118 AA
rs3798220 6 160961137 TT
rs11591147 1 55505647 GT
rs3892097 22 42524947 CC
rs35742686 22 42523791 DD
rs5030655 22 42522393 DD
rs1065852 22 42526694 CC
rs28371725 22 42525772 CC
rs28371706 22 42523610 CC
rs4244285 10 96541616 GG
rs4986893 10 96540410 GG
rs12248560 10 96521657 CC

View File

@@ -0,0 +1,457 @@
'use strict';
const {
biomarkerReferences, zScore, classifyBiomarker,
computeRiskScores, encodeProfileVector, generateSyntheticPopulation,
SNPS, INTERACTIONS, CAT_ORDER,
} = require('../src/biomarker');
const {
RingBuffer, StreamProcessor, generateReadings, defaultStreamConfig,
Z_SCORE_THRESHOLD,
} = require('../src/stream');
// ── Test harness ─────────────────────────────────────────────────────────────
let passed = 0, failed = 0, benchResults = [];
function assert(cond, msg) {
if (!cond) throw new Error(`Assertion failed: ${msg}`);
}
function assertClose(a, b, eps, msg) {
if (Math.abs(a - b) > eps) throw new Error(`${msg}: ${a} != ${b} (eps=${eps})`);
}
function test(name, fn) {
try {
fn();
passed++;
process.stdout.write(` PASS ${name}\n`);
} catch (e) {
failed++;
process.stdout.write(` FAIL ${name}: ${e.message}\n`);
}
}
function bench(name, fn, iterations) {
// Warmup
for (let i = 0; i < Math.min(iterations, 1000); i++) fn();
const start = performance.now();
for (let i = 0; i < iterations; i++) fn();
const elapsed = performance.now() - start;
const perOp = (elapsed / iterations * 1000).toFixed(2);
benchResults.push({ name, perOp: `${perOp} us`, total: `${elapsed.toFixed(1)} ms`, iterations });
process.stdout.write(` BENCH ${name}: ${perOp} us/op (${iterations} iters, ${elapsed.toFixed(1)} ms)\n`);
}
// ── Helpers ──────────────────────────────────────────────────────────────────
function fullHomRef() {
const gts = new Map();
for (const snp of SNPS) gts.set(snp.rsid, snp.homRef);
return gts;
}
function reading(ts, id, val, lo, hi) {
return { timestampMs: ts, biomarkerId: id, value: val, referenceLow: lo, referenceHigh: hi, isAnomaly: false, zScore: 0 };
}
function glucose(ts, val) { return reading(ts, 'glucose', val, 70, 100); }
// ═════════════════════════════════════════════════════════════════════════════
// Biomarker Reference Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Biomarker References ---\n');
test('biomarker_references_count', () => {
assert(biomarkerReferences().length === 13, `expected 13, got ${biomarkerReferences().length}`);
});
test('z_score_midpoint_is_zero', () => {
const ref = biomarkerReferences()[0]; // Total Cholesterol
const mid = (ref.normalLow + ref.normalHigh) / 2;
assertClose(zScore(mid, ref), 0, 1e-10, 'midpoint z-score');
});
test('z_score_high_bound_is_one', () => {
const ref = biomarkerReferences()[0];
assertClose(zScore(ref.normalHigh, ref), 1.0, 1e-10, 'high-bound z-score');
});
// ═════════════════════════════════════════════════════════════════════════════
// Classification Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Classification ---\n');
test('classify_normal', () => {
const ref = biomarkerReferences()[0]; // 125-200
assert(classifyBiomarker(150, ref) === 'Normal', 'expected Normal');
});
test('classify_high', () => {
const ref = biomarkerReferences()[0]; // normalHigh=200, criticalHigh=300
assert(classifyBiomarker(250, ref) === 'High', 'expected High');
});
test('classify_critical_high', () => {
const ref = biomarkerReferences()[0]; // criticalHigh=300
assert(classifyBiomarker(350, ref) === 'CriticalHigh', 'expected CriticalHigh');
});
test('classify_low', () => {
const ref = biomarkerReferences()[0]; // normalLow=125, criticalLow=100
assert(classifyBiomarker(110, ref) === 'Low', 'expected Low');
});
test('classify_critical_low', () => {
const ref = biomarkerReferences()[0]; // criticalLow=100
assert(classifyBiomarker(90, ref) === 'CriticalLow', 'expected CriticalLow');
});
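The five cases above imply a four-threshold band classifier. A minimal sketch matching those interior points (inclusivity at the exact bounds is an assumption the tests do not probe; the real `classifyBiomarker` is the authority):

```javascript
// Hypothetical classifier consistent with the tests above, using the
// Total Cholesterol bands: criticalLow=100, normalLow=125,
// normalHigh=200, criticalHigh=300.
function classifySketch(value, ref) {
  if (ref.criticalLow !== null && value < ref.criticalLow) return 'CriticalLow';
  if (value < ref.normalLow) return 'Low';
  if (ref.criticalHigh !== null && value > ref.criticalHigh) return 'CriticalHigh';
  if (value > ref.normalHigh) return 'High';
  return 'Normal';
}

const chol = { criticalLow: 100, normalLow: 125, normalHigh: 200, criticalHigh: 300 };
console.log(classifySketch(150, chol)); // 'Normal'
console.log(classifySketch(250, chol)); // 'High'
```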
// ═════════════════════════════════════════════════════════════════════════════
// Risk Scoring Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Risk Scoring ---\n');
test('all_hom_ref_low_risk', () => {
const gts = fullHomRef();
const profile = computeRiskScores(gts);
assert(profile.globalRiskScore < 0.15, `hom-ref should be low risk, got ${profile.globalRiskScore}`);
});
test('high_cancer_risk', () => {
const gts = fullHomRef();
gts.set('rs80357906', 'DI');
gts.set('rs1042522', 'GG');
gts.set('rs11571833', 'TT');
const profile = computeRiskScores(gts);
const cancer = profile.categoryScores['Cancer Risk'];
assert(cancer.score > 0.3, `should have elevated cancer risk, got ${cancer.score}`);
});
test('interaction_comt_oprm1', () => {
const gts = fullHomRef();
gts.set('rs4680', 'AA');
gts.set('rs1799971', 'GG');
const withInteraction = computeRiskScores(gts);
const neuroInter = withInteraction.categoryScores['Neurological'].score;
const gts2 = fullHomRef();
gts2.set('rs4680', 'AA');
const withoutFull = computeRiskScores(gts2);
const neuroSingle = withoutFull.categoryScores['Neurological'].score;
assert(neuroInter > neuroSingle, `interaction should amplify risk: ${neuroInter} > ${neuroSingle}`);
});
test('interaction_brca1_tp53', () => {
const gts = fullHomRef();
gts.set('rs80357906', 'DI');
gts.set('rs1042522', 'GG');
const profile = computeRiskScores(gts);
const cancer = profile.categoryScores['Cancer Risk'];
assert(cancer.contributingVariants.includes('rs80357906'), 'missing rs80357906');
assert(cancer.contributingVariants.includes('rs1042522'), 'missing rs1042522');
});
// ═════════════════════════════════════════════════════════════════════════════
// Profile Vector Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Profile Vectors ---\n');
test('vector_dimension_is_64', () => {
const gts = fullHomRef();
const profile = computeRiskScores(gts);
assert(profile.profileVector.length === 64, `expected 64, got ${profile.profileVector.length}`);
});
test('vector_is_l2_normalized', () => {
const gts = fullHomRef();
gts.set('rs4680', 'AG');
gts.set('rs1799971', 'AG');
const profile = computeRiskScores(gts);
let norm = 0;
for (let i = 0; i < 64; i++) norm += profile.profileVector[i] ** 2;
norm = Math.sqrt(norm);
assertClose(norm, 1.0, 1e-4, 'L2 norm');
});
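The norm check above follows from standard L2 normalization: divide each component by the Euclidean norm and the result has norm 1 up to float error. A sketch of that step (not the library's actual encoder):

```javascript
// L2-normalize a vector: afterwards sqrt(sum of squares) == 1,
// which is exactly what the 1e-4 assertion above verifies.
function l2Normalize(vec) {
  let norm = 0;
  for (const v of vec) norm += v * v;
  norm = Math.sqrt(norm);
  if (norm === 0) return vec.slice(); // zero vector stays zero
  return vec.map(v => v / norm);
}

const unit = l2Normalize([3, 4]); // norm 5
console.log(unit); // [ 0.6, 0.8 ]
```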
test('vector_deterministic', () => {
const gts = fullHomRef();
gts.set('rs1801133', 'AG');
const a = computeRiskScores(gts);
const b = computeRiskScores(gts);
for (let i = 0; i < 64; i++) {
assertClose(a.profileVector[i], b.profileVector[i], 1e-10, `dim ${i}`);
}
});
// ═════════════════════════════════════════════════════════════════════════════
// Population Generation Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Population Generation ---\n');
test('population_correct_count', () => {
const pop = generateSyntheticPopulation(50, 42);
assert(pop.length === 50, `expected 50, got ${pop.length}`);
for (const p of pop) {
assert(p.profileVector.length === 64, `expected 64-dim vector`);
assert(Object.keys(p.biomarkerValues).length > 0, 'should have biomarker values');
assert(p.globalRiskScore >= 0 && p.globalRiskScore <= 1, 'risk in [0,1]');
}
});
test('population_deterministic', () => {
const a = generateSyntheticPopulation(10, 99);
const b = generateSyntheticPopulation(10, 99);
for (let i = 0; i < 10; i++) {
assert(a[i].subjectId === b[i].subjectId, 'subject IDs must match');
assertClose(a[i].globalRiskScore, b[i].globalRiskScore, 1e-10, `risk score ${i}`);
}
});
test('mthfr_elevates_homocysteine', () => {
const pop = generateSyntheticPopulation(200, 7);
const high = [], low = [];
for (const p of pop) {
const hcy = p.biomarkerValues['Homocysteine'] || 0;
const metaScore = p.categoryScores['Metabolism'] ? p.categoryScores['Metabolism'].score : 0;
if (metaScore > 0.3) high.push(hcy); else low.push(hcy);
}
if (high.length > 0 && low.length > 0) {
const avgHigh = high.reduce((a, b) => a + b, 0) / high.length;
const avgLow = low.reduce((a, b) => a + b, 0) / low.length;
assert(avgHigh > avgLow, `MTHFR should elevate homocysteine: high=${avgHigh}, low=${avgLow}`);
}
});
// ═════════════════════════════════════════════════════════════════════════════
// RingBuffer Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- RingBuffer ---\n');
test('ring_buffer_push_iter_len', () => {
const rb = new RingBuffer(4);
for (const v of [10, 20, 30]) rb.push(v);
const arr = rb.toArray();
assert(arr.length === 3 && arr[0] === 10 && arr[1] === 20 && arr[2] === 30, 'push/iter');
assert(rb.length === 3, 'length');
assert(!rb.isFull(), 'not full');
});
test('ring_buffer_overflow_keeps_newest', () => {
const rb = new RingBuffer(3);
for (let v = 1; v <= 4; v++) rb.push(v);
assert(rb.isFull(), 'should be full');
const arr = rb.toArray();
assert(arr[0] === 2 && arr[1] === 3 && arr[2] === 4, `got [${arr}]`);
});
test('ring_buffer_capacity_one', () => {
const rb = new RingBuffer(1);
rb.push(42); rb.push(99);
const arr = rb.toArray();
assert(arr.length === 1 && arr[0] === 99, `got [${arr}]`);
});
test('ring_buffer_clear_resets', () => {
const rb = new RingBuffer(3);
rb.push(1); rb.push(2); rb.clear();
assert(rb.length === 0, 'length after clear');
assert(!rb.isFull(), 'not full after clear');
assert(rb.toArray().length === 0, 'empty after clear');
});
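The four tests above pin down the ring-buffer contract: fixed capacity, overflow silently drops the oldest element, `clear()` resets. A deliberately naive sketch of that contract (the real `RingBuffer` presumably uses a circular index rather than `shift()`; this is behavioral only):

```javascript
// Minimal ring buffer matching the tested behavior. shift() is O(n) and
// only acceptable here because the sketch optimizes for clarity.
class RingSketch {
  constructor(capacity) { this.cap = capacity; this.buf = []; }
  push(v) {
    this.buf.push(v);
    if (this.buf.length > this.cap) this.buf.shift(); // drop oldest
  }
  get length() { return this.buf.length; }
  isFull() { return this.buf.length === this.cap; }
  clear() { this.buf.length = 0; }
  toArray() { return this.buf.slice(); }
}

const ring = new RingSketch(3);
[1, 2, 3, 4].forEach(v => ring.push(v));
console.log(ring.toArray()); // [ 2, 3, 4 ]
```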
// ═════════════════════════════════════════════════════════════════════════════
// Stream Processor Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Stream Processor ---\n');
test('processor_computes_stats', () => {
const cfg = { ...defaultStreamConfig(), windowSize: 10 };
const p = new StreamProcessor(cfg);
const readings = generateReadings(cfg, 20, 55);
for (const r of readings) p.processReading(r);
const s = p.getStats('glucose');
assert(s !== null, 'should have glucose stats');
assert(s.count > 0 && s.mean > 0 && s.min <= s.max, 'valid stats');
});
test('processor_summary_totals', () => {
const cfg = defaultStreamConfig();
const p = new StreamProcessor(cfg);
const readings = generateReadings(cfg, 30, 77);
for (const r of readings) p.processReading(r);
const s = p.summary();
assert(s.totalReadings === 30 * cfg.numBiomarkers, `expected ${30 * cfg.numBiomarkers}, got ${s.totalReadings}`);
assert(s.anomalyRate >= 0 && s.anomalyRate <= 1, 'anomaly rate in [0,1]');
});
test('processor_throughput_positive', () => {
const cfg = defaultStreamConfig();
const p = new StreamProcessor(cfg);
const readings = generateReadings(cfg, 100, 88);
for (const r of readings) p.processReading(r);
const s = p.summary();
assert(s.throughputReadingsPerSec > 0, 'throughput should be positive');
});
// ═════════════════════════════════════════════════════════════════════════════
// Anomaly Detection Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Anomaly Detection ---\n');
test('detects_z_score_anomaly', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 20 });
for (let i = 0; i < 20; i++) p.processReading(glucose(i * 1000, 85));
const r = p.processReading(glucose(20000, 300));
assert(r.isAnomaly, 'should detect anomaly');
assert(Math.abs(r.zScore) > Z_SCORE_THRESHOLD, `z-score ${r.zScore} should exceed threshold`);
});
test('detects_out_of_range_anomaly', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 5 });
for (const [i, v] of [80, 82, 78, 84, 81].entries()) {
p.processReading(glucose(i * 1000, v));
}
  // 140 far exceeds refHigh(100) + 20% of range(30) = 106
const r = p.processReading(glucose(5000, 140));
assert(r.isAnomaly, 'should detect out-of-range anomaly');
});
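Two detection rules are exercised above: a windowed z-score beyond a threshold, and a value outside the reference band widened by 20% of its range (the "106" worked out in the comment: 100 + 0.2 × (100 − 70)). A combined sketch under those assumptions (the threshold default and exact rule composition inside `StreamProcessor` are assumptions here):

```javascript
// Hypothetical anomaly check combining the two tested rules.
function isAnomalySketch(value, window, refLow, refHigh, zThreshold = 3) {
  const n = window.length;
  const mean = window.reduce((a, b) => a + b, 0) / n;
  const variance = window.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const sd = Math.sqrt(variance);
  const z = sd > 0 ? (value - mean) / sd : 0;
  const margin = 0.2 * (refHigh - refLow); // 20% band widening
  const outOfRange = value > refHigh + margin || value < refLow - margin;
  return Math.abs(z) > zThreshold || outOfRange;
}

console.log(isAnomalySketch(140, [80, 82, 78, 84, 81], 70, 100)); // true
```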
test('zero_anomaly_for_constant_stream', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 50 });
for (let i = 0; i < 10; i++) p.processReading(reading(i * 1000, 'crp', 1.5, 0.1, 3));
const s = p.getStats('crp');
assert(Math.abs(s.anomalyRate) < 1e-9, `expected zero anomaly rate, got ${s.anomalyRate}`);
});
// ═════════════════════════════════════════════════════════════════════════════
// Trend Detection Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Trend Detection ---\n');
test('positive_trend_for_increasing', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 20 });
let r;
for (let i = 0; i < 20; i++) r = p.processReading(glucose(i * 1000, 70 + i));
assert(r.currentTrend > 0, `expected positive trend, got ${r.currentTrend}`);
});
test('negative_trend_for_decreasing', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 20 });
let r;
for (let i = 0; i < 20; i++) r = p.processReading(reading(i * 1000, 'hdl', 60 - i * 0.5, 40, 60));
assert(r.currentTrend < 0, `expected negative trend, got ${r.currentTrend}`);
});
test('exact_slope_for_linear_series', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 10 });
for (let i = 0; i < 10; i++) {
p.processReading(reading(i * 1000, 'ldl', 100 + i * 3, 70, 130));
}
assertClose(p.getStats('ldl').trendSlope, 3.0, 1e-9, 'slope');
});
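The 1e-9 slope assertion above only holds if the trend is an ordinary least-squares fit, which is exact for a linear series. A sketch of an OLS slope over the sample index (fitting against index rather than timestamp is an assumption consistent with the expected slope of 3 per reading):

```javascript
// Least-squares slope of values against their index. For y_i = a + b*i
// the fit recovers b exactly, so a 100 + 3*i series yields exactly 3.
function olsSlope(values) {
  const n = values.length;
  const xMean = (n - 1) / 2;
  const yMean = values.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - xMean) * (values[i] - yMean);
    den += (i - xMean) ** 2;
  }
  return den > 0 ? num / den : 0;
}

const series = Array.from({ length: 10 }, (_, i) => 100 + i * 3);
console.log(olsSlope(series)); // 3
```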
// ═════════════════════════════════════════════════════════════════════════════
// Z-score / EMA Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Z-Score / EMA ---\n');
test('z_score_small_for_near_mean', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 10 });
for (const [i, v] of [80, 82, 78, 84, 76, 86, 81, 79, 83].entries()) {
p.processReading(glucose(i * 1000, v));
}
const mean = p.getStats('glucose').mean;
const r = p.processReading(glucose(9000, mean));
assert(Math.abs(r.zScore) < 1, `z-score for mean value should be small, got ${r.zScore}`);
});
test('ema_converges_to_constant', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 50 });
for (let i = 0; i < 50; i++) p.processReading(reading(i * 1000, 'crp', 2.0, 0.1, 3));
assertClose(p.getStats('crp').ema, 2.0, 1e-6, 'EMA convergence');
});
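The convergence test relies on the standard EMA recurrence, ema′ = α·x + (1 − α)·ema: for a constant stream the error shrinks geometrically by (1 − α) per reading, so 50 readings comfortably reach 1e-6 for any reasonable α. A sketch (the actual smoothing factor is internal to `StreamProcessor`; α = 0.3 here is an assumption):

```javascript
// Exponential moving average: seeded from the first value, then updated
// with ema = alpha * x + (1 - alpha) * ema on each reading.
function emaRun(values, alpha = 0.3) {
  let ema = values[0];
  for (const v of values) ema = alpha * v + (1 - alpha) * ema;
  return ema;
}

console.log(emaRun(Array(50).fill(2.0))); // 2
```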
// ═════════════════════════════════════════════════════════════════════════════
// Batch Generation Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Batch Generation ---\n');
test('generate_correct_count_and_ids', () => {
const cfg = defaultStreamConfig();
const readings = generateReadings(cfg, 50, 42);
assert(readings.length === 50 * cfg.numBiomarkers, `expected ${50 * cfg.numBiomarkers}, got ${readings.length}`);
const validIds = new Set(['glucose', 'cholesterol_total', 'hdl', 'ldl', 'triglycerides', 'crp']);
for (const r of readings) assert(validIds.has(r.biomarkerId), `invalid id: ${r.biomarkerId}`);
});
test('generated_values_non_negative', () => {
const readings = generateReadings(defaultStreamConfig(), 100, 999);
for (const r of readings) assert(r.value >= 0, `negative value: ${r.value}`);
});
// ═════════════════════════════════════════════════════════════════════════════
// Benchmarks
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Benchmarks ---\n');
const benchGts = fullHomRef();
benchGts.set('rs4680', 'AG');
benchGts.set('rs1801133', 'AA');
bench('computeRiskScores (20 SNPs)', () => {
computeRiskScores(benchGts);
}, 10000);
bench('encodeProfileVector (64-dim)', () => {
const p = computeRiskScores(benchGts);
encodeProfileVector(p);
}, 10000);
bench('StreamProcessor.processReading', () => {
const p = new StreamProcessor({ ...defaultStreamConfig(), windowSize: 100 });
const r = glucose(0, 85);
for (let i = 0; i < 100; i++) p.processReading(r);
}, 1000);
bench('generateSyntheticPopulation(100)', () => {
generateSyntheticPopulation(100, 42);
}, 100);
bench('RingBuffer push+iter (100 items)', () => {
const rb = new RingBuffer(100);
for (let i = 0; i < 100; i++) rb.push(i);
let s = 0;
for (const v of rb) s += v;
}, 10000);
// ═════════════════════════════════════════════════════════════════════════════
// Summary
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write(`\n${'='.repeat(60)}\n`);
process.stdout.write(`Results: ${passed} passed, ${failed} failed, ${passed + failed} total\n`);
if (benchResults.length > 0) {
process.stdout.write('\nBenchmark Summary:\n');
for (const b of benchResults) {
process.stdout.write(` ${b.name}: ${b.perOp}/op\n`);
}
}
process.stdout.write(`${'='.repeat(60)}\n`);
process.exit(failed > 0 ? 1 : 0);


@@ -0,0 +1,559 @@
'use strict';
const fs = require('fs');
const path = require('path');
// Import from index.js (the package entry point) to test the full re-export chain
const rvdna = require('../index.js');
// ── Test harness ─────────────────────────────────────────────────────────────
let passed = 0, failed = 0, benchResults = [];
function assert(cond, msg) {
if (!cond) throw new Error(`Assertion failed: ${msg}`);
}
function assertClose(a, b, eps, msg) {
if (Math.abs(a - b) > eps) throw new Error(`${msg}: ${a} != ${b} (eps=${eps})`);
}
function assertGt(a, b, msg) {
if (!(a > b)) throw new Error(`${msg}: expected ${a} > ${b}`);
}
function assertLt(a, b, msg) {
if (!(a < b)) throw new Error(`${msg}: expected ${a} < ${b}`);
}
function test(name, fn) {
try {
fn();
passed++;
process.stdout.write(` PASS ${name}\n`);
} catch (e) {
failed++;
process.stdout.write(` FAIL ${name}: ${e.message}\n`);
}
}
function bench(name, fn, iterations) {
for (let i = 0; i < Math.min(iterations, 1000); i++) fn();
const start = performance.now();
for (let i = 0; i < iterations; i++) fn();
const elapsed = performance.now() - start;
const perOp = (elapsed / iterations * 1000).toFixed(2);
benchResults.push({ name, perOp: `${perOp} us`, total: `${elapsed.toFixed(1)} ms`, iterations });
process.stdout.write(` BENCH ${name}: ${perOp} us/op (${iterations} iters, ${elapsed.toFixed(1)} ms)\n`);
}
// ── Fixture loading ──────────────────────────────────────────────────────────
const FIXTURES = path.join(__dirname, 'fixtures');
function loadFixture(name) {
return fs.readFileSync(path.join(FIXTURES, name), 'utf8');
}
function parseFixtureToGenotypes(name) {
const text = loadFixture(name);
const data = rvdna.parse23andMe(text);
const gts = new Map();
for (const [rsid, snp] of data.snps) gts.set(rsid, snp.genotype);
return { data, gts };
}
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 1: End-to-End Pipeline (parse 23andMe → biomarker scoring → stream)
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- End-to-End Pipeline ---\n');
test('e2e_high_risk_cardio_pipeline', () => {
const { data, gts } = parseFixtureToGenotypes('sample-high-risk-cardio.23andme.txt');
// Stage 1: 23andMe parsing
assert(data.totalMarkers === 29, `expected 29 markers, got ${data.totalMarkers}`);
assert(data.build === 'GRCh37', `expected GRCh37, got ${data.build}`);
assert(data.noCalls === 0, 'no no-calls expected');
// Stage 2: Genotyping analysis
const analysis = rvdna.analyze23andMe(loadFixture('sample-high-risk-cardio.23andme.txt'));
assert(analysis.cyp2d6.phenotype !== undefined, 'CYP2D6 phenotype should be defined');
assert(analysis.cyp2c19.phenotype !== undefined, 'CYP2C19 phenotype should be defined');
// Stage 3: Biomarker risk scoring
const profile = rvdna.computeRiskScores(gts);
assert(profile.profileVector.length === 64, 'profile vector should be 64-dim');
assert(profile.globalRiskScore >= 0 && profile.globalRiskScore <= 1, 'risk in [0,1]');
// High-risk cardiac: MTHFR 677TT + LPA het + SLCO1B1 het → elevated metabolism + cardiovascular
const metab = profile.categoryScores['Metabolism'];
assertGt(metab.score, 0.3, 'MTHFR 677TT should elevate metabolism risk');
assertGt(metab.confidence, 0.5, 'metabolism confidence should be substantial');
const cardio = profile.categoryScores['Cardiovascular'];
assert(cardio.contributingVariants.includes('rs10455872'), 'LPA variant should contribute');
assert(cardio.contributingVariants.includes('rs4363657'), 'SLCO1B1 variant should contribute');
assert(cardio.contributingVariants.includes('rs3798220'), 'LPA rs3798220 should contribute');
// Stage 4: Feed synthetic biomarker readings through streaming processor
const cfg = rvdna.defaultStreamConfig();
const processor = new rvdna.StreamProcessor(cfg);
const readings = rvdna.generateReadings(cfg, 50, 42);
for (const r of readings) processor.processReading(r);
const summary = processor.summary();
assert(summary.totalReadings > 0, 'should have processed readings');
assert(summary.anomalyRate >= 0, 'anomaly rate should be valid');
});
test('e2e_low_risk_baseline_pipeline', () => {
const { data, gts } = parseFixtureToGenotypes('sample-low-risk-baseline.23andme.txt');
// Parse
assert(data.totalMarkers === 29, `expected 29 markers`);
assert(data.build === 'GRCh38', `expected GRCh38, got ${data.build}`);
// Score
const profile = rvdna.computeRiskScores(gts);
assertLt(profile.globalRiskScore, 0.15, 'all-ref should be very low risk');
// All categories should be near-zero
for (const [cat, cs] of Object.entries(profile.categoryScores)) {
assertLt(cs.score, 0.05, `${cat} should be near-zero for all-ref`);
}
// APOE should be e3/e3
const apoe = rvdna.determineApoe(gts);
assert(apoe.genotype.includes('e3/e3'), `expected e3/e3, got ${apoe.genotype}`);
});
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 2: Clinical Scenario Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Clinical Scenarios ---\n');
test('scenario_apoe_e4e4_brca1_carrier', () => {
const { gts } = parseFixtureToGenotypes('sample-multi-risk.23andme.txt');
const profile = rvdna.computeRiskScores(gts);
// APOE e4/e4 → high neurological risk
const neuro = profile.categoryScores['Neurological'];
assertGt(neuro.score, 0.5, `APOE e4/e4 + COMT Met/Met should push neuro >0.5, got ${neuro.score}`);
assert(neuro.contributingVariants.includes('rs429358'), 'APOE should contribute');
assert(neuro.contributingVariants.includes('rs4680'), 'COMT should contribute');
// BRCA1 carrier + TP53 variant → elevated cancer risk with interaction
const cancer = profile.categoryScores['Cancer Risk'];
assertGt(cancer.score, 0.4, `BRCA1 carrier + TP53 should push cancer >0.4, got ${cancer.score}`);
assert(cancer.contributingVariants.includes('rs80357906'), 'BRCA1 should contribute');
assert(cancer.contributingVariants.includes('rs1042522'), 'TP53 should contribute');
// Cardiovascular should be elevated from SLCO1B1 + LPA
const cardio = profile.categoryScores['Cardiovascular'];
assertGt(cardio.score, 0.3, `SLCO1B1 + LPA should push cardio >0.3, got ${cardio.score}`);
// NQO1 null (TT) should contribute to cancer
assert(cancer.contributingVariants.includes('rs1800566'), 'NQO1 should contribute');
// Global risk should be substantial
assertGt(profile.globalRiskScore, 0.4, `multi-risk global should be >0.4, got ${profile.globalRiskScore}`);
// APOE determination
const apoe = rvdna.determineApoe(gts);
assert(apoe.genotype.includes('e4/e4'), `expected e4/e4, got ${apoe.genotype}`);
});
test('scenario_pcsk9_protective', () => {
const { gts } = parseFixtureToGenotypes('sample-pcsk9-protective.23andme.txt');
const profile = rvdna.computeRiskScores(gts);
// PCSK9 R46L het (rs11591147 GT) → negative cardiovascular weight (protective)
const cardio = profile.categoryScores['Cardiovascular'];
// With only PCSK9 protective allele and no risk alleles, cardio score should be very low
assertLt(cardio.score, 0.05, `PCSK9 protective should keep cardio very low, got ${cardio.score}`);
// APOE e2/e3 protective
const apoe = rvdna.determineApoe(gts);
assert(apoe.genotype.includes('e2/e3'), `expected e2/e3, got ${apoe.genotype}`);
});
test('scenario_mthfr_compound_heterozygote', () => {
const { gts } = parseFixtureToGenotypes('sample-high-risk-cardio.23andme.txt');
// This file has rs1801133=AA (677TT hom) + rs1801131=GT (1298AC het) → compound score 3
const profile = rvdna.computeRiskScores(gts);
const metab = profile.categoryScores['Metabolism'];
// MTHFR compound should push metabolism risk up
assertGt(metab.score, 0.3, `MTHFR compound should elevate metabolism, got ${metab.score}`);
assert(metab.contributingVariants.includes('rs1801133'), 'rs1801133 (C677T) should contribute');
assert(metab.contributingVariants.includes('rs1801131'), 'rs1801131 (A1298C) should contribute');
  // The rs1801133×rs1801131 interaction (modifier 1.3) should amplify the
  // compound-heterozygote score beyond the two variants' individual weights
});
test('scenario_comt_oprm1_pain_interaction', () => {
// Use controlled genotypes that don't saturate the category at 1.0
const gts = new Map();
for (const snp of rvdna.SNPS) gts.set(snp.rsid, snp.homRef);
gts.set('rs4680', 'AA'); // COMT Met/Met
gts.set('rs1799971', 'GG'); // OPRM1 Asp/Asp
const profile = rvdna.computeRiskScores(gts);
const neuro = profile.categoryScores['Neurological'];
// Without OPRM1 variant → no interaction modifier
const gts2 = new Map(gts);
gts2.set('rs1799971', 'AA'); // reference
const profile2 = rvdna.computeRiskScores(gts2);
const neuro2 = profile2.categoryScores['Neurological'];
assertGt(neuro.score, neuro2.score, 'COMT×OPRM1 interaction should amplify neurological risk');
});
test('scenario_drd2_comt_interaction', () => {
// Use controlled genotypes that don't saturate the category at 1.0
const gts = new Map();
for (const snp of rvdna.SNPS) gts.set(snp.rsid, snp.homRef);
gts.set('rs1800497', 'AA'); // DRD2 A1/A1
gts.set('rs4680', 'AA'); // COMT Met/Met
const profile = rvdna.computeRiskScores(gts);
// Without DRD2 variant → no DRD2×COMT interaction
const gts2 = new Map(gts);
gts2.set('rs1800497', 'GG'); // reference
const profile2 = rvdna.computeRiskScores(gts2);
assertGt(
profile.categoryScores['Neurological'].score,
profile2.categoryScores['Neurological'].score,
'DRD2×COMT interaction should amplify'
);
});
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 3: Cross-Validation (JS matches Rust expectations)
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Cross-Validation (JS ↔ Rust parity) ---\n');
test('parity_reference_count_matches_rust', () => {
assert(rvdna.BIOMARKER_REFERENCES.length === 13, 'should have 13 references (matches Rust)');
assert(rvdna.SNPS.length === 20, 'should have 20 SNPs (matches Rust)');
assert(rvdna.INTERACTIONS.length === 6, 'should have 6 interactions (matches Rust)');
assert(rvdna.CAT_ORDER.length === 4, 'should have 4 categories (matches Rust)');
});
test('parity_snp_table_exact_match', () => {
// Verify first and last SNP match Rust exactly
const first = rvdna.SNPS[0];
assert(first.rsid === 'rs429358', 'first SNP rsid');
assertClose(first.wHet, 0.4, 1e-10, 'first SNP wHet');
assertClose(first.wAlt, 0.9, 1e-10, 'first SNP wAlt');
assert(first.homRef === 'TT', 'first SNP homRef');
assert(first.category === 'Neurological', 'first SNP category');
const last = rvdna.SNPS[19];
assert(last.rsid === 'rs11591147', 'last SNP rsid');
assertClose(last.wHet, -0.30, 1e-10, 'PCSK9 wHet (negative = protective)');
assertClose(last.wAlt, -0.55, 1e-10, 'PCSK9 wAlt (negative = protective)');
});
test('parity_interaction_table_exact_match', () => {
const i0 = rvdna.INTERACTIONS[0];
assert(i0.rsidA === 'rs4680' && i0.rsidB === 'rs1799971', 'first interaction pair');
assertClose(i0.modifier, 1.4, 1e-10, 'COMT×OPRM1 modifier');
const i3 = rvdna.INTERACTIONS[3];
assert(i3.rsidA === 'rs80357906' && i3.rsidB === 'rs1042522', 'BRCA1×TP53 pair');
assertClose(i3.modifier, 1.5, 1e-10, 'BRCA1×TP53 modifier');
});
test('parity_z_score_matches_rust', () => {
// z_score(mid, ref) should be 0.0 (Rust test_z_score_midpoint_is_zero)
const ref = rvdna.BIOMARKER_REFERENCES[0]; // Total Cholesterol
const mid = (ref.normalLow + ref.normalHigh) / 2;
assertClose(rvdna.zScore(mid, ref), 0, 1e-10, 'midpoint z-score = 0');
// z_score(normalHigh, ref) should be 1.0 (Rust test_z_score_high_bound_is_one)
assertClose(rvdna.zScore(ref.normalHigh, ref), 1, 1e-10, 'high-bound z-score = 1');
});
test('parity_classification_matches_rust', () => {
const ref = rvdna.BIOMARKER_REFERENCES[0]; // Total Cholesterol 125-200
assert(rvdna.classifyBiomarker(150, ref) === 'Normal', 'Normal');
assert(rvdna.classifyBiomarker(350, ref) === 'CriticalHigh', 'CriticalHigh (>300)');
assert(rvdna.classifyBiomarker(110, ref) === 'Low', 'Low');
assert(rvdna.classifyBiomarker(90, ref) === 'CriticalLow', 'CriticalLow (<100)');
});
test('parity_vector_layout_64dim_l2', () => {
// Rust test_vector_dimension_is_64 and test_vector_is_l2_normalized
const gts = new Map();
for (const snp of rvdna.SNPS) gts.set(snp.rsid, snp.homRef);
gts.set('rs4680', 'AG');
gts.set('rs1799971', 'AG');
const profile = rvdna.computeRiskScores(gts);
assert(profile.profileVector.length === 64, '64 dims');
let norm = 0;
for (let i = 0; i < 64; i++) norm += profile.profileVector[i] ** 2;
norm = Math.sqrt(norm);
assertClose(norm, 1.0, 1e-4, 'L2 norm');
});
test('parity_hom_ref_low_risk_matches_rust', () => {
// Rust test_risk_scores_all_hom_ref_low_risk: global < 0.15
const gts = new Map();
for (const snp of rvdna.SNPS) gts.set(snp.rsid, snp.homRef);
const profile = rvdna.computeRiskScores(gts);
assertLt(profile.globalRiskScore, 0.15, 'hom-ref should be <0.15');
});
test('parity_high_cancer_matches_rust', () => {
// Rust test_risk_scores_high_cancer_risk: cancer > 0.3
const gts = new Map();
for (const snp of rvdna.SNPS) gts.set(snp.rsid, snp.homRef);
gts.set('rs80357906', 'DI');
gts.set('rs1042522', 'GG');
gts.set('rs11571833', 'TT');
const profile = rvdna.computeRiskScores(gts);
assertGt(profile.categoryScores['Cancer Risk'].score, 0.3, 'cancer > 0.3');
});
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 4: Population-Scale Correlation Tests
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Population Correlations ---\n');
test('population_apoe_lowers_hdl', () => {
// Mirrors Rust test_apoe_lowers_hdl_in_population
const pop = rvdna.generateSyntheticPopulation(300, 88);
const apoeHdl = [], refHdl = [];
for (const p of pop) {
const hdl = p.biomarkerValues['HDL'] || 0;
const neuro = p.categoryScores['Neurological'] ? p.categoryScores['Neurological'].score : 0;
if (neuro > 0.3) apoeHdl.push(hdl); else refHdl.push(hdl);
}
if (apoeHdl.length > 0 && refHdl.length > 0) {
const avgApoe = apoeHdl.reduce((a, b) => a + b, 0) / apoeHdl.length;
const avgRef = refHdl.reduce((a, b) => a + b, 0) / refHdl.length;
assertLt(avgApoe, avgRef, 'APOE e4 should lower HDL');
}
});
test('population_lpa_elevates_lpa_biomarker', () => {
const pop = rvdna.generateSyntheticPopulation(300, 44);
const lpaHigh = [], lpaLow = [];
for (const p of pop) {
const lpaVal = p.biomarkerValues['Lp(a)'] || 0;
const cardio = p.categoryScores['Cardiovascular'] ? p.categoryScores['Cardiovascular'].score : 0;
if (cardio > 0.2) lpaHigh.push(lpaVal); else lpaLow.push(lpaVal);
}
if (lpaHigh.length > 0 && lpaLow.length > 0) {
const avgHigh = lpaHigh.reduce((a, b) => a + b, 0) / lpaHigh.length;
const avgLow = lpaLow.reduce((a, b) => a + b, 0) / lpaLow.length;
assertGt(avgHigh, avgLow, 'cardiovascular risk should correlate with elevated Lp(a)');
}
});
test('population_risk_score_distribution', () => {
const pop = rvdna.generateSyntheticPopulation(1000, 123);
const scores = pop.map(p => p.globalRiskScore);
const min = Math.min(...scores);
const max = Math.max(...scores);
const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
// Should have good spread
assertGt(max - min, 0.2, `risk score range should be >0.2, got ${max - min}`);
// Mean should be moderate (not all near 0 or 1)
assertGt(mean, 0.05, 'mean risk should be >0.05');
assertLt(mean, 0.7, 'mean risk should be <0.7');
});
test('population_all_biomarkers_within_clinical_limits', () => {
const pop = rvdna.generateSyntheticPopulation(500, 55);
for (const p of pop) {
for (const bref of rvdna.BIOMARKER_REFERENCES) {
const val = p.biomarkerValues[bref.name];
assert(val !== undefined, `missing ${bref.name} for ${p.subjectId}`);
assert(val >= 0, `${bref.name} should be non-negative, got ${val}`);
if (bref.criticalHigh !== null) {
assertLt(val, bref.criticalHigh * 1.25, `${bref.name} should be < criticalHigh*1.25`);
}
}
}
});
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 5: Streaming with Real-Data Correlated Biomarkers
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Streaming with Real Biomarkers ---\n');
test('stream_cusum_changepoint_on_shift', () => {
// Mirror Rust test_cusum_changepoint_detection
const cfg = { ...rvdna.defaultStreamConfig(), windowSize: 20 };
const p = new rvdna.StreamProcessor(cfg);
// Establish baseline at 85
for (let i = 0; i < 30; i++) {
p.processReading({
timestampMs: i * 1000, biomarkerId: 'glucose', value: 85,
referenceLow: 70, referenceHigh: 100, isAnomaly: false, zScore: 0,
});
}
// Sustained shift to 120
for (let i = 30; i < 50; i++) {
p.processReading({
timestampMs: i * 1000, biomarkerId: 'glucose', value: 120,
referenceLow: 70, referenceHigh: 100, isAnomaly: false, zScore: 0,
});
}
const stats = p.getStats('glucose');
assertGt(stats.mean, 90, `mean should shift upward after changepoint: ${stats.mean}`);
});
test('stream_drift_detected_as_trend', () => {
// Mirror Rust test_trend_detection
const cfg = { ...rvdna.defaultStreamConfig(), windowSize: 50 };
const p = new rvdna.StreamProcessor(cfg);
// Strong upward drift
for (let i = 0; i < 50; i++) {
p.processReading({
timestampMs: i * 1000, biomarkerId: 'glucose', value: 70 + i * 0.5,
referenceLow: 70, referenceHigh: 100, isAnomaly: false, zScore: 0,
});
}
assertGt(p.getStats('glucose').trendSlope, 0, 'should detect positive trend');
});
test('stream_population_biomarker_values_through_processor', () => {
// Take synthetic population biomarker values and stream them
const pop = rvdna.generateSyntheticPopulation(20, 77);
const cfg = { ...rvdna.defaultStreamConfig(), windowSize: 20 };
const p = new rvdna.StreamProcessor(cfg);
for (let i = 0; i < pop.length; i++) {
const homocysteine = pop[i].biomarkerValues['Homocysteine'];
p.processReading({
timestampMs: i * 1000, biomarkerId: 'homocysteine',
value: homocysteine, referenceLow: 5, referenceHigh: 15,
isAnomaly: false, zScore: 0,
});
}
const stats = p.getStats('homocysteine');
assert(stats !== null, 'should have homocysteine stats');
assertGt(stats.count, 0, 'should have processed readings');
assertGt(stats.mean, 0, 'mean should be positive');
});
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 6: Package Re-export Verification
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Package Re-exports ---\n');
test('index_exports_all_biomarker_apis', () => {
const expectedFns = [
'biomarkerReferences', 'zScore', 'classifyBiomarker',
'computeRiskScores', 'encodeProfileVector', 'generateSyntheticPopulation',
];
for (const fn of expectedFns) {
assert(typeof rvdna[fn] === 'function', `missing export: ${fn}`);
}
const expectedConsts = ['BIOMARKER_REFERENCES', 'SNPS', 'INTERACTIONS', 'CAT_ORDER'];
for (const c of expectedConsts) {
assert(rvdna[c] !== undefined, `missing export: ${c}`);
}
});
test('index_exports_all_stream_apis', () => {
assert(typeof rvdna.RingBuffer === 'function', 'missing RingBuffer');
assert(typeof rvdna.StreamProcessor === 'function', 'missing StreamProcessor');
assert(typeof rvdna.generateReadings === 'function', 'missing generateReadings');
assert(typeof rvdna.defaultStreamConfig === 'function', 'missing defaultStreamConfig');
assert(rvdna.BIOMARKER_DEFS !== undefined, 'missing BIOMARKER_DEFS');
});
test('index_exports_v02_apis_unchanged', () => {
const v02fns = [
'encode2bit', 'decode2bit', 'translateDna', 'cosineSimilarity',
'isNativeAvailable', 'normalizeGenotype', 'parse23andMe',
'callCyp2d6', 'callCyp2c19', 'determineApoe', 'analyze23andMe',
];
for (const fn of v02fns) {
assert(typeof rvdna[fn] === 'function', `v0.2 API missing: ${fn}`);
}
});
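Among the v0.2 exports verified above are `encode2bit`/`decode2bit`. A pure-JS sketch of the documented 2-bit packing (A=00, C=01, G=10, T=11, four bases per byte, N mapped to A); the README's `encode2bit('ACGTACGTACGT')` → `<Buffer 1b 1b 1b>` example implies most-significant-bits-first within each byte, and the native implementation remains the source of truth:

```javascript
// 2-bit DNA packing sketch: 4 bases per byte, first base in the high bits.
const CODE = { A: 0, C: 1, G: 2, T: 3 };
const BASES = 'ACGT';

function pack2bit(seq) {
  const out = new Uint8Array(Math.ceil(seq.length / 4));
  for (let i = 0; i < seq.length; i++) {
    const code = CODE[seq[i]] ?? 0; // ambiguous bases (e.g. N) map to A
    out[i >> 2] |= code << ((3 - (i & 3)) * 2);
  }
  return out;
}

function unpack2bit(buf, len) {
  let seq = '';
  for (let i = 0; i < len; i++) {
    seq += BASES[(buf[i >> 2] >> ((3 - (i & 3)) * 2)) & 3];
  }
  return seq;
}
```

The explicit `len` argument matters because a trailing partial byte is zero-padded, and zero decodes to A.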
// ═════════════════════════════════════════════════════════════════════════════
// SECTION 7: Optimized Benchmarks (pre/post optimization comparison)
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write('\n--- Optimized Benchmarks ---\n');
// Prepare benchmark genotypes from real fixture
const { gts: benchGts } = parseFixtureToGenotypes('sample-high-risk-cardio.23andme.txt');
bench('computeRiskScores (real 23andMe data, 20 SNPs)', () => {
rvdna.computeRiskScores(benchGts);
}, 20000);
bench('encodeProfileVector (real profile)', () => {
const p = rvdna.computeRiskScores(benchGts);
rvdna.encodeProfileVector(p);
}, 20000);
bench('StreamProcessor: fresh processor + 100 readings (optimized incremental)', () => {
const p = new rvdna.StreamProcessor({ ...rvdna.defaultStreamConfig(), windowSize: 100 });
const r = { timestampMs: 0, biomarkerId: 'glucose', value: 85, referenceLow: 70, referenceHigh: 100, isAnomaly: false, zScore: 0 };
for (let i = 0; i < 100; i++) {
r.timestampMs = i * 1000;
p.processReading(r);
}
}, 2000);
bench('generateSyntheticPopulation(100) (optimized lookups)', () => {
rvdna.generateSyntheticPopulation(100, 42);
}, 200);
bench('full pipeline: parse + score + stream (real data)', () => {
const text = loadFixture('sample-high-risk-cardio.23andme.txt');
const data = rvdna.parse23andMe(text);
const gts = new Map();
for (const [rsid, snp] of data.snps) gts.set(rsid, snp.genotype);
const profile = rvdna.computeRiskScores(gts);
const proc = new rvdna.StreamProcessor(rvdna.defaultStreamConfig());
for (const bref of rvdna.BIOMARKER_REFERENCES) {
const val = profile.biomarkerValues[bref.name] || ((bref.normalLow + bref.normalHigh) / 2);
proc.processReading({
timestampMs: 0, biomarkerId: bref.name, value: val,
referenceLow: bref.normalLow, referenceHigh: bref.normalHigh,
isAnomaly: false, zScore: 0,
});
}
}, 5000);
bench('generateSyntheticPopulation(1000)', () => {
rvdna.generateSyntheticPopulation(1000, 42);
}, 20);
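The `bench(name, fn, iters)` helper used throughout this section is assumed to look roughly like the sketch below (time `iters` calls with the high-resolution clock, record a per-op figure into `benchResults`); the suite's actual harness may differ:

```javascript
// Hypothetical micro-bench shape: returns the same { name, perOp } record the
// summary loop prints. Uses process.hrtime.bigint() for nanosecond timing.
function benchSketch(name, fn, iters) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iters; i++) fn();
  const elapsedNs = Number(process.hrtime.bigint() - start);
  const perOpUs = (elapsedNs / iters / 1000).toFixed(2);
  return { name, perOp: `${perOpUs}µs` };
}
```

Timing a loop of calls rather than a single call amortizes clock-read overhead, which is why even the cheap operations above use thousands of iterations.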
// ═════════════════════════════════════════════════════════════════════════════
// Summary
// ═════════════════════════════════════════════════════════════════════════════
process.stdout.write(`\n${'='.repeat(70)}\n`);
process.stdout.write(`Results: ${passed} passed, ${failed} failed, ${passed + failed} total\n`);
if (benchResults.length > 0) {
process.stdout.write('\nBenchmark Summary:\n');
for (const b of benchResults) {
process.stdout.write(` ${b.name}: ${b.perOp}/op\n`);
}
}
process.stdout.write(`${'='.repeat(70)}\n`);
process.exit(failed > 0 ? 1 : 0);