Intelligence Mesh: Value Engineering Assessment
Technical architecture, algorithm inventory, data pipeline analysis, and IP differentiation assessment for the ninja.ing intelligence ecosystem. Prepared for prospective partners, acquirers, and technical evaluators.
1. Ecosystem Overview
The ninja.ing ecosystem comprises 17 operational intelligence platforms sharing a unified Neo4j graph database containing 1,000,000+ nodes and their relationships. The platforms span six intelligence domains: cyber threat intelligence, geopolitical intelligence, OSINT investigation, financial intelligence, identity security, and autonomous incident response.
All platforms are production-deployed across dedicated domains, serving authenticated users with real-time data ingestion, ML-driven analysis, and WebSocket-based live updates.
| Platform | Domain | Intelligence Domain | Key Metric |
|---|---|---|---|
| Ninja Signal | ninjasignal.ninja | Cyber Threat Intelligence | 89K+ IOCs, 245+ actors |
| Ninja Fusion | ninjafusion.ninja | Geopolitical Intelligence | 68 intelligence sources |
| Raz0r | ninjaraz0r.ninja | EDR & SIEM | 33 rules, Rust agent |
| Ninja Nexus | ninjanexus.ninja | OSINT Investigation | Sanctions + ownership |
| Kin0bi | ninjaken0bi.ninja | Financial Intelligence | Crypto/equity/forex ML |
| Ninja 1D | 1d.ninja.ing | Identity Security | AD/Entra attack paths |
| ANTOS | antos.ninja.ing | DevSecOps Pipeline | 8 stages, 17 tools |
| Agentic V0id | ninjav0id.io | Autonomous IR | 3 LLM agents, 8 playbooks |
| V01d | ninjav0id.io | Sentiment Intelligence | Oracle score, 16 feeds |
| Los Alamos | — | Red vs Blue Wargaming | LLM adversaries, ELO scoring |
| Knox | ninjav0id.io/knox | Secrets & Crypto | Post-quantum crypto, TOTP |
| Social | ninjasocial.ninja | TI Collaboration | Encrypted messaging, IOC detect |
| War Room | warroom.ninja | Incident Response | LiveKit video, breach tracker |
| Sabaki | ninja.ing/ninjasabaki | Vulnerability Triage | Multi-scanner, ServiceNow |
| Depth | ninja.ing/depth | Supply Chain Security | SBOM, dependency audit |
| NinjaClaw | PyPI | CLI Security Agent | 10 scanners, CIS rules |
| GITAIR | gitair.ninja | DevSecOps | Air-gapped git intelligence |
2. Technical Architecture
Stack
- Backend: Python 3.14, FastAPI (async), uvicorn. One FastAPI app per platform.
- Database: Neo4j 5.x with GDS 2.22.0 (Graph Data Science library). Shared graph in production, isolated per-platform in development.
- Frontend: Next.js 16, React 19, Tailwind CSS v4. Floating-window UI paradigm per platform.
- Reverse Proxy: Caddy (automatic TLS, HTTP/2). Single Caddyfile routes all 17 platforms + subdomains.
- EDR Agent: Rust (Raz0r agent). ETW hooking, AMSI monitoring, memory scanning. ~2MB release binary.
- Deployment: Docker Compose (dev + prod overlays). Two Hetzner dedicated servers (WireGuard tunnel), Cloudflare DNS (proxied for .ninja domains, Let’s Encrypt direct for .ninja.ing).
- Real-time: WebSocket per platform for live data streaming.
Cross-Platform Integration Pattern
The critical architectural differentiator is the shared graph. In production, all 17 platforms' FastAPI backends connect to the same Neo4j instance. This means:
- An IOC ingested by Signal is immediately queryable by Raz0r's correlator
- A threat actor in Signal connects to techniques that connect to Raz0r detection rules
- An OSINT entity in Nexus can be linked to identities in 1D and financial entities in Kin0bi
- Fusion's geopolitical events provide context for Signal's threat actor activity
- V0id's autonomous agents can traverse the entire graph for incident investigation
This is not API-level integration. It's graph-level integration — the relationships are native Neo4j edges, traversable in a single Cypher query.
3. Platform Inventory
Ninja Signal — Cyber Threat Intelligence
The foundational CTI platform. Ingests from NVD, MITRE ATT&CK, AlienVault OTX, CISA KEV, VirusTotal, AbuseIPDB, and 15+ additional feeds. 89,000+ Indicator nodes, 48,000+ Software nodes, 21,000+ Infrastructure nodes, 1,900+ Vulnerability nodes, 1,200+ Technique nodes, 245+ ThreatActor nodes.
Key Features: Risk propagation, community detection, KEV prediction, adversary digital twins (Monte Carlo campaign simulation), hidden connections, cross-domain analysis, activity forecasting, GraphSAGE risk learning.
~3,500-line main application file. 15+ ingesters. OpenCTI bidirectional sync (STIX 2.1 import/export).
Ninja Fusion — Geopolitical Intelligence
Cross-domain intelligence fusion platform. 68 sources including GDELT GKG, RSS feeds (BBC, Reuters, AP, Al Jazeera), FRED economic data, Reddit, academic papers, humanitarian alerts, prediction markets. VADER sentiment scoring, Monte Carlo simulation with conditional kill-chain probabilities, MAD-based anomaly detection.
Key Features: Adversary digital twins, emergent behavior detection (7 detectors), public SITREP page (SEO-optimized, free access), social features.
Raz0r — Graph-Native SIEM & EDR
Full SIEM implementation with custom Rust EDR agent.
Rust Agent: 25 source files, dual build (EXE + DLL). ETW hooking, AMSI monitoring, memory scanning, behavioral heuristics. Reports to SIEM via REST API.
Detection: 33 detection rules. 25 event types. Ransomware predictor (5-phase kill chain tracker, 24 indicators, SEV1 at Phase 4). Cross-node correlator for distributed campaign detection. Auto-rule generator from threat intelligence data.
Integration: AssetMapper links SIEM events to Indicator/Technique/ThreatActor/Software/Vulnerability nodes in Signal. BFS blast radius analysis from any event.
Ninja Nexus — OSINT Investigation
Investigative intelligence platform. 12 entity types: Person, Company, Property, BankAccount, Transaction, Document, Sanction, Address, Vessel, Domain, Jurisdiction, Investigation. Ingesters for OpenSanctions, OFAC SDN, ICIJ Offshore Leaks, UK Companies House, SEC EDGAR, OpenCorporates.
Key Features: Suspicion propagation with temporal decay and confidence intervals, money flow tracing, UBO (Ultimate Beneficial Owner) resolution, 7 emergent behavior detectors with false positive tracking and persistence.
Kin0bi — Financial Intelligence
Market intelligence platform. Data sources: Binance WebSocket (crypto), Finnhub (stocks), ECB (forex), FRED (macro), ApeWisdom + Reddit (sentiment). Real-time pipeline: pollers → async queue → batch writer → Neo4j.
Key Features: Log-return-based correlation (not raw prices), EMA+momentum prediction, Isolation Forest on residuals, historical VaR (no Gaussian assumption), streaming anomaly detection (Half-Space Trees).
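As an illustration of two of the features above, the following is a minimal sketch of log returns and historical (empirical-quantile) VaR. The function names and the simple floor rule for picking the quantile are assumptions made for this example, not the Kin0bi implementation.

```python
import math


def log_returns(prices):
    """Convert a price series into log returns: ln(p_t / p_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def historical_var(returns, confidence=0.95):
    """Empirical VaR: the loss at the (1 - confidence) quantile of the sample.

    No distribution is fitted; the quantile is read off the sorted returns.
    The result is reported as a positive number (a loss).
    """
    if not returns:
        raise ValueError("need at least one return")
    ordered = sorted(returns)
    # Lower-tail index using a simple floor rule (no interpolation).
    idx = int((1.0 - confidence) * len(ordered))
    idx = min(max(idx, 0), len(ordered) - 1)
    return -ordered[idx]
```

Because the quantile comes straight from observed returns, fat tails in the data show up in the estimate without any Gaussian assumption.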
Ninja 1D — Identity Intelligence
AD/Entra ID attack surface analysis. Ingesters for BloodHound 4.x JSON, LDAP/LDAPS, Azure AD (MS Graph OAuth2), CSV. 8 labels, 20+ relationship types covering all standard ACE types.
Key Features: BFS attack path analysis, Kerberoasting/AS-REP roastable detection, shadow admin identification, unconstrained delegation detection, group nesting analysis, identity risk scoring.
ANTOS — AI-Orchestrated DevSecOps
Static site (Next.js) documenting an 8-stage security pipeline with 17 tools. Claude AI triage layer. SAST, SCA, secrets detection, container scanning, DAST, compliance, IaC scanning.
Agentic V0id — Autonomous Defensive Agents
Three LLM-powered autonomous agents. Sentinel (triage), Warden (containment), Spectre (threat hunting). Connectors to Raz0r SIEM, Signal CTI, Azure AD. 8 IR playbooks (ransomware, BEC, data exfil, insider threat, DDoS, supply chain, credential compromise, zero-day). Detection engineering agents (coverage mapper, rule tuner, signature generator, correlation builder). Forensic collection agents (memory, disk, network, evidence packager) with chain-of-custody.
ninjaV01d — Predictive Sentiment Intelligence
Companion to V0id, deployed at ninjav0id.io (root). 16 data source pollers (GDELT, RSS, FRED, Reddit, HackerNews, Crypto Fear & Greed, USGS, WHO, ReliefWeb, Wikipedia, arXiv, Polymarket, NewsAPI, Finnhub News, GDELT Doc, EventRegistry). V01d Oracle composite scoring engine.
ML Phases: Cross-source consensus, LSTM forecasting, Hawkes cascade detection, graph embeddings (Node2Vec), narrative detection, geospatial sentiment diffusion, Granger causality (economic → sentiment), streaming anomaly detection.
4. Algorithm & ML Inventory
This section documents every ML algorithm deployed across the ecosystem, with implementation details, parameters, and rationale.
4.1 Graph Analytics
Risk Propagation (BFS with Temporal Decay)
Propagates risk scores through the graph from high-risk seed nodes (threat actors, known-exploited vulnerabilities). Each hop attenuates the score by a configurable damping factor. Edge age modulates propagation weight via temporal decay: exp(-0.001 * age_days).
Platforms: Signal, Fusion | Decay: 1yr=69%, 2yr=48%, 3yr=33% | Damping: configurable via ML_RISK_DAMPING
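The propagation rule above can be sketched as follows. This is an illustrative in-memory version, not the production code: the adjacency-list graph format, the `max_hops` limit, and the default damping of 0.5 are assumptions (the document only states that damping is set via ML_RISK_DAMPING).

```python
import math
from collections import deque


def propagate_risk(graph, seeds, damping=0.5, decay_rate=0.001, max_hops=3):
    """BFS risk propagation with per-hop damping and edge-age decay.

    graph: {node: [(neighbor, edge_age_days), ...]}  (hypothetical format)
    seeds: {node: initial_risk}
    Returns {node: highest risk score that reached it}.
    """
    risk = dict(seeds)
    queue = deque((node, score, 0) for node, score in seeds.items())
    while queue:
        node, score, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for neighbor, age_days in graph.get(node, []):
            # Each hop attenuates by the damping factor and by exp(-rate * age).
            passed = score * damping * math.exp(-decay_rate * age_days)
            # Keep only the strongest path; re-expand when a score improves.
            if passed > risk.get(neighbor, 0.0):
                risk[neighbor] = passed
                queue.append((neighbor, passed, hops + 1))
    return risk
```

With a decay rate of 0.001 per day, a one-year-old edge passes exp(-0.365), roughly 69% of the damped score, matching the decay figures listed above.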
GraphSAGE Risk Propagation
2-layer neural network that learns per-edge-type risk propagation weights rather than using uniform damping. Aggregates neighbor features through mean-pooling, producing learned risk embeddings. Falls back to standard BFS propagation when training data is insufficient.
Platform: Signal | Layers: 2 | Aggregation: mean-pool | Training: supervised on known-risk labels
Suspicion Propagation (OSINT)
BFS propagation specific to OSINT investigation. Seeds from sanctioned entities, PEPs, and high-risk jurisdiction connections. Temporal decay at exp(-0.002 * age_days) (faster decay than CTI due to OSINT data volatility). Each result carries a confidence value that grows with the number of supporting relationships: 1 - exp(-0.1 * num_relationships).
Platform: Nexus | Seed types: Sanction, PEP, jurisdiction | Confidence: 5 edges=0.39, 10=0.63, 20=0.86
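The two formulas above are small enough to check directly. The sketch below only restates them as functions; the function names are hypothetical.

```python
import math


def suspicion_decay(age_days, decay_rate=0.002):
    """Weight applied to an edge of the given age: exp(-0.002 * age_days)."""
    return math.exp(-decay_rate * age_days)


def suspicion_confidence(num_relationships, rate=0.1):
    """Confidence grows with supporting relationships: 1 - exp(-0.1 * n)."""
    return 1.0 - math.exp(-rate * num_relationships)


# Reproduces the figures quoted above for 5, 10, and 20 relationships.
for n in (5, 10, 20):
    print(n, round(suspicion_confidence(n), 2))
```

At the OSINT decay rate, a one-year-old edge retains about 48% of its weight, compared with about 69% under the CTI rate of 0.001.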
Community Detection (Louvain)
Standard Louvain modularity optimization for clustering graph nodes into communities. Used to identify related threat clusters, actor groups, and financial networks. When Neo4j GDS is available (ML_USE_GDS=true), delegates to native GDS Louvain for performance.
Platforms: Signal, Fusion, Nexus, Kin0bi | GDS toggle: ML_USE_GDS | Drill-down: /ml/communities/{id}
Node2Vec Graph Embeddings
Random walk-based graph embedding. Generates vector representations of nodes by performing biased random walks, then applying Skip-gram (Word2Vec). Used for hidden connection detection and similarity analysis.
Platforms: Signal, V01d | walk_length=20, num_walks=20, window=5, embedding_dim=64
GCN/GAT (Graph Attention Network)
Simplified single-layer linear attention mechanism for node classification. Pure-numpy implementation by default, optional PyTorch backend for GPU acceleration. Used for predicting node properties from neighborhood structure.
Platform: Signal | Implementation: core/gat.py | Fallback: numpy (no PyTorch required)
4.2 Prediction & Forecasting
KEV Predictor (Random Forest)
Predicts whether a CVE will be added to CISA's Known Exploited Vulnerabilities catalog. Features: CVSS score, EPSS probability, vendor, CWE, existing exploit references, attack complexity. Uses class_weight='balanced' to handle severe class imbalance (98% of CVEs are not in KEV). Evaluated with 5-fold stratified cross-validation.
Platform: Signal | Algorithm: RandomForestClassifier | CV: StratifiedKFold(5) | Balance: class_weight='balanced'
EMA + Momentum Price Prediction
Short-term trend prediction using Exponential Moving Average with momentum confirmation. Replaced naive linear regression on raw prices. Operates on log returns to prevent spurious correlation artifacts. Linear regression retained as fallback when EMA data is insufficient.
Platform: Kin0bi | Primary: EMA+momentum | Fallback: linear regression | Input: log returns
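One plausible reading of "EMA with momentum confirmation" is a fast/slow EMA comparison on log returns that is only acted on when recent cumulative return agrees in sign. The sketch below follows that reading; the spans, the momentum window, and the crossover rule itself are assumptions, not the Kin0bi parameters.

```python
import math


def ema(values, span):
    """Exponential moving average with smoothing alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def trend_signal(prices, fast=5, slow=20, momentum_window=5):
    """Return 'up', 'down', 'flat', or 'insufficient-data'."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    if len(returns) < slow:
        # This is where a fallback such as linear regression would take over.
        return "insufficient-data"
    fast_ema = ema(returns, fast)[-1]
    slow_ema = ema(returns, slow)[-1]
    momentum = sum(returns[-momentum_window:])  # recent cumulative log return
    if fast_ema > slow_ema and momentum > 0:
        return "up"
    if fast_ema < slow_ema and momentum < 0:
        return "down"
    return "flat"
```

Working on log returns rather than raw prices keeps the signal scale-free, so the same thresholds apply to a $0.10 token and a $60,000 one.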
Hawkes Process Activity Forecasting
Self-exciting point process model for predicting future threat activity. Threat events are modeled as a temporal process where past events increase the probability of future events (clustering effect). Exponential decay kernel.
Platforms: Signal, V01d | Decay: configurable via ML_HAWKES_DECAY | Window: ML_HAWKES_WINDOW_HOURS
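For reference, the conditional intensity of a Hawkes process with an exponential kernel is lambda(t) = mu + sum(alpha * exp(-beta * (t - t_i))) over past events t_i. The sketch below evaluates it; the parameter values are placeholders, and `beta` presumably corresponds to the decay configured via ML_HAWKES_DECAY, though the document does not say so explicitly.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity at time t given earlier event times.

    mu:    baseline event rate
    alpha: jump in intensity caused by each past event
    beta:  exponential decay rate of that excitation
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


# Shortly after a burst, intensity is elevated; it relaxes back toward mu.
burst = [1.0, 1.2, 1.3]
print(hawkes_intensity(1.5, burst), hawkes_intensity(10.0, burst))
```

The self-exciting term is what captures the clustering effect: each observed event temporarily raises the expected rate of further events.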
LSTM Sentiment Forecasting
Long Short-Term Memory network for predicting sentiment trajectory. Lookback window of 48 hours, forecast horizon of 24 hours. Uses entity-level and region-level aggregated sentiment as input features.
Platform: V01d | Hidden: 32 | Lookback: 48h | Forecast: 24h | Configurable via ML_LSTM_*
Granger Causality Testing
Statistical test for whether one time series helps predict another beyond the target's own history. Pure-NumPy implementation using F-tests on restricted vs. unrestricted autoregressive models. Tests both directions (x→y and y→x). Used in V01d Oracle to determine whether economic indicators actually predict sentiment shifts.
Platform: V01d | Max lag: ML_GRANGER_MAX_LAG (default 5) | Significance: F > 2.5 | Days: ML_GRANGER_DAYS (default 30)
Monte Carlo Campaign Simulation
Simulates threat actor campaign progression through kill chain phases. 1,000+ simulations per run. Uses conditional probability boosts for phase transitions (e.g., initial-access→execution gets 1.3x boost if prior phase succeeded). Actor-specific technique probabilities derived from historical data.
Platforms: Signal, Fusion (Adversary Digital Twins) | Simulations: 1000+ | Phase boosts: 1.15x-1.4x | Cap: 0.95
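A minimal sketch of the simulation loop, assuming each phase has a base success probability that is multiplied by a boost (capped at 0.95) when the prior phase succeeded. The base probabilities in the usage line are made up for illustration, and a single boost is used here where the platform applies per-transition boosts in the 1.15x to 1.4x range.

```python
import random


def simulate_campaign(base_probs, boost=1.3, cap=0.95, rng=random):
    """Run one campaign; return how many phases completed in sequence."""
    completed = 0
    prior_succeeded = False
    for p in base_probs:
        if prior_succeeded:
            # Conditional boost for the transition, capped below certainty.
            p = min(p * boost, cap)
        if rng.random() >= p:
            break  # the campaign stalls at this phase
        completed += 1
        prior_succeeded = True
    return completed


def campaign_success_rate(base_probs, runs=1000, seed=0, **kwargs):
    """Fraction of simulated campaigns that complete every phase."""
    rng = random.Random(seed)  # seeded for reproducible runs
    full = sum(
        simulate_campaign(base_probs, rng=rng, **kwargs) == len(base_probs)
        for _ in range(runs)
    )
    return full / runs


# Hypothetical per-phase probabilities for one actor.
rate = campaign_success_rate([0.6, 0.5, 0.4])
```

The cap keeps boosted transitions from becoming certainties, so even a highly capable simulated actor retains some chance of failing at each phase.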
4.3 Anomaly Detection
MAD-Based Anomaly Detection
Median Absolute Deviation replaces z-scores across the ecosystem. modified_z = 0.6745 * (value - median) / MAD. Robust to outliers and fat-tailed distributions that invalidate the Gaussian assumption required by z-scores. Threshold: 3.5 (configurable).
Platforms: Signal, Fusion | Consistency factor: 0.6745 | Replaced: z-score (threshold 2.0)
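The modified z-score above can be computed with the standard library alone. This sketch uses hypothetical function names and returns zero scores when MAD is zero; how the platforms handle that degenerate case is not stated, so treat it as an assumption.

```python
from statistics import median


def modified_z_scores(values):
    """modified_z = 0.6745 * (value - median) / MAD for each value."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: a constant-majority series yields no anomaly scores.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def mad_anomalies(values, threshold=3.5):
    """Values whose absolute modified z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


print(mad_anomalies([10, 11, 9, 10, 12, 10, 11, 100]))
```

Because both the center and the spread are medians, the single extreme value in the sample barely moves them, which is why it stands out so clearly.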
Isolation Forest (Residual-Based)
Anomaly detection on EMA residuals of log returns, not raw prices. This prevents trend and seasonality from triggering false anomalies. Contamination parameter: 5%.
Platforms: Kin0bi, V01d | Input: log return residuals | Contamination: 0.05 | sklearn.ensemble.IsolationForest
Half-Space Trees (Streaming)
Online anomaly detection for streaming data. No batch retraining required — the model updates incrementally with each new observation. Used for real-time anomaly scoring on financial market data and sentiment streams.
Platforms: V01d, Kin0bi | Type: streaming/online | No retraining | core/streaming_anomaly.py
Ransomware Kill Chain Predictor
5-phase behavioral model: reconnaissance, weaponization, delivery, exploitation, actions-on-objectives. 24 behavioral indicators tracked per host. When Phase 4 is reached, triggers SEV1 MAJOR alert. Cross-node correlator detects distributed campaigns via phase alignment and entropy convergence.
Platform: Raz0r | Phases: 5 | Indicators: 24 | Alert threshold: Phase 4 | Cross-node: entropy convergence
4.4 Composite Scoring
V01d Oracle
Flagship composite intelligence score (0–100). Weighted components: 30% tone (VADER sentiment aggregate), 25% velocity (rate of change), 20% anomaly (Isolation Forest anomaly ratio), 15% topic heat (topic clustering density), 10% economic (Granger causality with economic indicators). Normalization calibrated to use full 0–100 range.
Platform: V01d | Components: 5 | Weights: configurable via V01D_ORACLE_WEIGHTS | Cache: 900s TTL
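The weighting scheme can be sketched as a plain weighted sum. The component keys and the assumption that each component arrives already normalized to the range 0 to 1 are illustrative; the production normalization is described above only as "calibrated to use full 0–100 range".

```python
# Weights from the description above; the key names are hypothetical.
ORACLE_WEIGHTS = {
    "tone": 0.30,
    "velocity": 0.25,
    "anomaly": 0.20,
    "topic_heat": 0.15,
    "economic": 0.10,
}


def oracle_score(components, weights=ORACLE_WEIGHTS):
    """Combine normalized components (each in [0, 1]) into a 0-100 score."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    total = 0.0
    for name, weight in weights.items():
        value = min(max(components[name], 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return 100.0 * total
```

Validating that the weights sum to 1 matters when they are overridden at runtime, since a misconfigured set would silently rescale the score.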
4.5 Emergent Behavior Detection
Emergent Detectors (7 per platform)
Pattern-matching detectors that identify emergent behaviors from graph structure changes. Examples: sanctions network mutation, jurisdiction hopping, shell company emergence (Nexus); velocity anomalies, bridge anomalies, community shifts (Signal/Fusion). Each signal includes a stable pattern hash for deduplication and false positive tracking with Neo4j persistence.
Platforms: Signal, Fusion, Nexus | FP tracking: SHA-256 pattern hash | Persistence: FalsePositive label in Neo4j
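A stable pattern hash needs a canonical serialization so the same pattern always produces the same digest. The sketch below shows one way to do that with SHA-256; the choice of fields (detector name, pattern type, sorted entity IDs) is an assumption, since the document does not list what goes into the hash.

```python
import hashlib
import json


def pattern_hash(detector, pattern_type, entity_ids):
    """Deterministic SHA-256 digest for an emergent-behavior signal."""
    payload = json.dumps(
        {
            "detector": detector,
            "pattern": pattern_type,
            # Sorting makes the hash independent of discovery order.
            "entities": sorted(entity_ids),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_known_false_positive(signal_hash, false_positive_hashes):
    """Check a signal against hashes previously marked as false positives."""
    return signal_hash in false_positive_hashes
```

In a deployment like the one described, the set of false-positive hashes would be loaded from the persisted FalsePositive nodes rather than held in memory.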
4.6 Autonomous Agents
LLM-Powered Decision Agents
Three autonomous agents (Sentinel, Warden, Spectre) using Claude as the reasoning engine. Each agent has access to the full graph for context. Decision steps include confidence thresholds — below-threshold decisions are escalated to human operators. Every action is logged and reversible.
Platform: V0id | Agents: 3 | Playbooks: 8 (YAML-driven) | Step types: reason, action, manual, wait
5. Data Pipeline Architecture
Ingestion Pattern
All platforms follow the same async pipeline pattern:
- Pollers: Async tasks on configurable intervals (15s to 24h) that fetch from external sources
- Queue: AsyncEventQueue (capacity: 50,000) with backpressure (80% pause, 50% resume)
- Batch Writer: UNWIND CREATE for new data (3–5x faster than MERGE), MERGE for aggregations
- Neo4j: Graph storage with indexes on key lookup fields per label
- Post-write hooks: Correlator, AlertEngine, ML cache warming
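The backpressure thresholds in the pipeline above imply hysteresis: ingestion pauses once the queue is 80% full and resumes only after it drains to 50%. The class below is a hypothetical sketch of that gate on its own, separate from the queue; it is not the AsyncEventQueue implementation.

```python
class BackpressureGate:
    """Pause/resume decision with hysteresis based on queue depth."""

    def __init__(self, capacity=50_000, pause_ratio=0.8, resume_ratio=0.5):
        self.pause_at = round(capacity * pause_ratio)
        self.resume_at = round(capacity * resume_ratio)
        self.paused = False

    def update(self, depth):
        """Record the current depth; return True while pollers should pause."""
        if not self.paused and depth >= self.pause_at:
            self.paused = True
        elif self.paused and depth <= self.resume_at:
            self.paused = False
        return self.paused


gate = BackpressureGate()
gate.update(40_000)  # reaches 80% of capacity: pollers pause
gate.update(30_000)  # still above 50%: remain paused
gate.update(25_000)  # drained to 50%: pollers resume
```

The gap between the two thresholds stops pollers from flapping on and off when the queue depth hovers around a single limit.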
Data Sources (95+)
| Category | Sources | Platform |
|---|---|---|
| CTI Feeds | NVD, MITRE ATT&CK, OTX, CISA KEV, VirusTotal, AbuseIPDB, MalwareBazaar, URLhaus, Shodan, ThreatFox, PhishTank, SpamhausDBL, C2IntelFeeds, FeodoTracker, OpenCTI | Signal |
| News/OSINT | GDELT GKG, GDELT Doc, BBC/Reuters/AP RSS, Reddit, HackerNews, Wikipedia, NewsAPI, Finnhub News, EventRegistry | Fusion, V01d |
| Economic | FRED (Federal Reserve), ECB (forex), Polymarket (predictions), Crypto Fear & Greed | V01d, Kin0bi |
| Markets | Binance WebSocket (crypto), Finnhub (equities), ApeWisdom (social sentiment) | Kin0bi |
| Humanitarian | USGS (earthquakes), WHO (disease alerts), ReliefWeb (crises) | V01d |
| Academic | arXiv (preprints) | V01d |
| Sanctions | OpenSanctions, OFAC SDN, ICIJ Offshore Leaks | Nexus |
| Corporate | UK Companies House, SEC EDGAR, OpenCorporates | Nexus |
| Identity | BloodHound JSON, LDAP/LDAPS, Azure AD (MS Graph) | 1D |
| Endpoint | ETW, AMSI, memory scanning, behavioral heuristics (Rust agent) | Raz0r |
Retention
Configurable per severity per platform. Default: critical=365d, high=90d, medium=30d, low=7d. Batched DETACH DELETE every 6 hours. Non-destructive — aggregated EventSummary nodes persist beyond raw event retention.
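The default retention windows translate into a per-severity cutoff timestamp. The sketch below computes that cutoff and tests whether an event has aged out; the names are hypothetical, and the deletion itself (a batched DETACH DELETE in Neo4j) is not shown.

```python
from datetime import datetime, timedelta, timezone

# Default retention windows in days, as listed above.
RETENTION_DAYS = {"critical": 365, "high": 90, "medium": 30, "low": 7}


def retention_cutoff(severity, now=None):
    """Oldest timestamp still retained for the given severity."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=RETENTION_DAYS[severity])


def is_expired(event_time, severity, now=None):
    """True if the event is older than its severity's retention window."""
    return event_time < retention_cutoff(severity, now)
```

A cleanup job running every 6 hours would compute one cutoff per severity and delete raw events older than it, leaving the aggregated EventSummary nodes in place.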
6. Technical Differentiation
Unified Graph Architecture
The single most significant differentiator. All 17 platforms share one Neo4j graph. Cross-domain traversal (SIEM event → threat actor → OSINT entity → identity → financial entity) is a native graph operation, not an API integration layer. No comparable product exists in the market.
Mathematically Sound ML
Every algorithm was chosen for a specific mathematical reason. Log returns prevent spurious correlation. MAD handles fat tails. Temporal decay reflects real-world intelligence aging. Granger causality tests for actual predictive relationships, not just correlation. Class weighting combined with stratified cross-validation keeps the KEV predictor from defaulting to the majority class. This level of methodological rigor is rare in security tooling.
Full-Stack Vertical Integration
From Rust EDR agent (memory-level) through graph database (storage) to Next.js UI (presentation) to LLM agents (autonomous response). No external dependencies on third-party security platforms. The entire intelligence pipeline is self-contained and architecturally coherent.
Autonomous Incident Response
V0id's LLM agents with full graph access represent the next generation of IR. Unlike rule-based SOAR, each decision step involves reasoning over the entire threat graph. 8 IR playbooks covering the most common incident types. Every action reversible, every decision auditable.
Runtime Configurability
All ML parameters, polling intervals, queue sizes, thresholds, and feature flags are configurable via environment variables at runtime. No code changes required to tune the system. This enables rapid deployment customization for different operational contexts.
7. Intellectual Property Summary
Codebase Metrics
| Component | Files | Primary Language | Notable |
|---|---|---|---|
| Signal (RTM) | ~80+ | Python + TypeScript | 3,500+ line main app, 15+ ingesters |
| Fusion | ~70+ | Python + TypeScript | 68-source fusion engine |
| Raz0r | ~60+ | Python + Rust + TypeScript | 25 Rust source files (EDR agent) |
| Nexus | ~65 | Python + TypeScript | 6 OSINT ingesters, 7 emergent detectors |
| Kin0bi | ~70 | Python + TypeScript | Real-time market pipeline |
| 1D | ~37 | Python + TypeScript | 4 identity ingesters |
| ANTOS | ~20 | TypeScript | Static analysis documentation |
| V0id + V01d | ~120+ | Python + TypeScript | LLM agents + 16 pollers + Oracle |
Proprietary Algorithms
- V01d Oracle: 5-component composite scoring with Granger causality economic integration
- Cross-Node Ransomware Correlator: Phase alignment + entropy convergence for distributed campaign detection
- GraphSAGE Risk Propagation: Per-edge-type learned risk weights (not published)
- Adversary Digital Twins: Probabilistic actor behavioral models with Monte Carlo conditional simulation
- Intelligence Mesh: The architectural pattern of six intelligence domains sharing one graph
Data Assets
- 1,000,000+ curated nodes with typed relationships
- 89,000+ IOCs with source provenance
- 245+ threat actor profiles with technique mappings
- Sanctions/OSINT entity network (OpenSanctions, OFAC, ICIJ)
- Historical sentiment and economic correlation data
8. Technical Risks & Mitigations
Key-Person Dependency
The ecosystem was built by a single engineer. Knowledge concentration is high. Mitigation: Consistent architecture pattern across all 17 platforms (FastAPI + Neo4j + Next.js + Caddy). Well-structured codebases. This technical guide, CLAUDE.md files, and dev diary serve as documentation. The uniformity of the stack means onboarding is learning one pattern, not seventeen.
Scalability
Current deployment is a two-server architecture (Hetzner dedicated: Ryzen 9 7950X3D/128GB for Signal+Fusion, Ryzen 5 3600/64GB for all other apps, connected via WireGuard). The graph's 1M+ nodes are well within the capacity of a single Neo4j instance. Horizontal scaling would require Neo4j clustering (supported in Enterprise Edition) and container orchestration (Kubernetes). Mitigation: Docker Compose architecture maps cleanly to K8s. No hardcoded single-instance assumptions.
Third-Party Data Dependencies
Several ingesters depend on free-tier APIs (NVD, GDELT, OTX, OpenSanctions). Rate limits or API changes could disrupt ingestion. Mitigation: Modular ingester design — each ingester is a standalone module. Historical data is preserved in the graph regardless of API availability. Premium API keys can be added via environment variables.
LLM Dependency (V0id Agents)
Autonomous agents depend on Claude API availability and cost. Mitigation: Agents degrade gracefully to manual mode. All playbook steps have manual approval fallback. LLM is used for reasoning, not critical-path execution.
9. Market Positioning
Competitive Landscape
| Competitor | Coverage | Graph-Native | Cross-Domain | Autonomous IR |
|---|---|---|---|---|
| CrowdStrike | EDR + TI | No | Limited | No (human MDR) |
| Palo Alto (XSIAM) | SIEM + SOAR | No | Limited | Rule-based |
| Recorded Future | CTI | Partial | CTI only | No |
| Splunk (Cisco) | SIEM + SOAR | No | Via apps | Rule-based |
| OpenCTI | CTI | Yes (Neo4j) | CTI only | No |
| Maltego | OSINT | Yes | OSINT only | No |
| ninja.ing | All 6 domains | Yes (Neo4j) | Full mesh | LLM agents |
Value Proposition
No existing product covers all six intelligence domains in a single graph. Enterprises currently assemble this capability from 10–15 vendors at $500K–$2M/year in licensing, plus integration costs. The ninja.ing mesh provides equivalent or superior coverage with native graph integration and autonomous response, deployable as a single Docker Compose stack.
Go-to-Market Options
- SaaS: Multi-tenant deployment with per-platform or full-mesh pricing
- On-Premise: Single Docker Compose stack for air-gapped / regulated environments
- OEM / White-Label: Individual platforms (e.g., Raz0r SIEM or Nexus OSINT) licensed to security vendors
- Managed Intelligence: Hosted mesh with curated data and analyst support