設計図 — Architecture Document

The Master Blueprint

How raw signals from the wild become intelligence you can act on. Two platforms. One graph. 1,000,000+ nodes.

Subject to change. Obviously. 🛠

225 API Endpoints · 32 Core Modules · 104 Ingesters · 1M+ Graph Nodes · 15 Platforms
Section I

From Noise to Knowledge

概念構造 — Conceptual Architecture

The internet is noisy. Every day: 60+ new CVEs published. Hundreds of IOC feeds updated. Geopolitical events shift threat actor priorities. Social media amplifies disinformation. Economic indicators correlate with ransomware surges. All of it is data. Almost none of it is intelligence — until something connects it.

The ninja.ing ecosystem exists to make that connection. Not through dashboards bolted together with REST APIs, but through a unified knowledge graph where a vulnerability node connects to the technique that exploits it, connects to the actor that uses it, connects to the campaign that deployed it, connects to the geopolitical event that motivated it.

Two platforms sit at the centre of this. Signal handles cyber threat intelligence — the adversaries, their tools, their techniques, and the indicators they leave behind. Fusion handles everything else — geopolitics, economics, social intelligence, environmental data, military movements — and fuses it with Signal's cyber picture.

The Intelligence Pipeline

Every piece of intelligence follows the same path, regardless of source:

Wild (raw feeds) → Ingest (20 + 84 sources) → Claim (atomic triples) → Graph (Neo4j MERGE) → Enrich (ML & analytics) → Present (83 windows)

The critical insight: Claims are the universal adapter. Any feed, any format — whether it's a JSON blob from the NVD, a STIX bundle from MITRE, a CSV from CISA, or a GDELT event stream — gets decomposed into atomic subject/predicate/object triples. A Claim says: "this thing has this relationship to that thing, and here's where I learned it."
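The sketch below shows that decomposition in miniature. It assumes a simplified IOC-feed record with hypothetical `ioc` and `malware` fields; real feeds name and nest their fields differently. `Triple` is a cut-down, hypothetical stand-in for the Claim type. The `OBSERVED` predicate for a node-only triple is invented for illustration; only `INDICATES` comes from the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Triple:
    """Hypothetical, simplified stand-in for a Claim: one fact plus where it came from."""
    subject_type: str
    subject_key: str
    predicate: str
    object_type: Optional[str]
    object_key: Optional[str]
    source_name: str


def decompose_ioc_record(record: dict, source_name: str) -> List[Triple]:
    """Split one IOC-feed record into atomic triples (field names are assumed)."""
    # Node-only triple: the indicator exists; no object is attached.
    triples = [Triple("Indicator", record["ioc"], "OBSERVED", None, None, source_name)]
    malware = record.get("malware")
    if malware:
        # Relationship triple: this indicator points at a malware family.
        triples.append(
            Triple("Indicator", record["ioc"], "INDICATES", "Software", malware, source_name)
        )
    return triples
```

A record without a malware attribution yields only the node-only triple. Every triple carries its source, so provenance survives the split.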

Signal: The Cyber Layer

20 ingesters pull from the core CTI feeds. NVD for vulnerabilities. MITRE ATT&CK for techniques and threat actors. OTX for IOCs. CISA KEV for known exploited vulnerabilities. Abuse.ch suite (MalwareBazaar, ThreatFox, Feodo, URLhaus) for malware samples and C2 infrastructure. GitHub Advisories. Phishing feeds. Ransomware trackers. CIRCL for intelligence sharing. OpenCTI for STIX import.

Signal's graph holds ~1M nodes across 14 labels: 89K+ indicators, 48K+ software entries, 21K+ infrastructure nodes, 1,900+ vulnerabilities, 1,200+ techniques, 245+ threat actors, and 40+ campaigns. All connected by edges that encode how the threat landscape actually works.

Fusion: The Everything Layer

84 ingesters across eight domains. Not just cyber — geopolitical events (GDELT, ACLED, GTD), economic indicators (FRED, IMF, World Bank), social intelligence (Reddit, Twitter, Mastodon, Telegram, Bluesky), environmental data (NASA FIRMS, GDACS, EM-DAT), military and sanctions data (SIPRI, OFAC SDN, OpenSanctions), health (WHO), and technology (Shodan, OpenSky, AIS maritime).

Why does a threat intelligence platform need economic data? Because ransomware surges correlate with cryptocurrency prices. Because sanctions drive state-sponsored actors to find new funding. Because a geopolitical crisis in one region predicts cyber campaigns against targets in another. Fusion makes those connections visible.


Section II

The Claims Engine

論理構造 — Logical Architecture

Every ingester, every source, every feed — they all produce the same thing: Claims. A Claim is the atomic unit of intelligence. It's a frozen, immutable dataclass that says: "Subject X has Predicate relationship to Object Y, sourced from Z with confidence C."

The Claim Dataclass

core/claims.py
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Claim:
    subject_type: str                     # "ThreatActor", "Vulnerability", ...
    subject_key: str                      # "APT28", "CVE-2024-1234", ...
    predicate: str                        # "USES", "EXPLOITS", "TARGETS", ...
    object_type: Optional[str]            # None for node-only claims
    object_key: Optional[str]
    first_seen: Optional[str]
    last_seen: Optional[str]
    source_name: str = "unknown"
    source_url: Optional[str] = None
    source_item_id: Optional[str] = None
    confidence: float = 0.7
    subject_props: Optional[Dict] = None  # Merge onto subject node
    edge_props: Optional[Dict] = None     # Merge onto relationship

13 fields. Frozen: once a Claim is created, its fields cannot be reassigned. Two Claims from different sources about the same fact will merge into a single graph edge, preserving the earliest first_seen and updating last_seen. Provenance is never lost: the source_name, source_url, and confidence travel with every assertion.
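The merge rule can be illustrated in memory. This sketch shows the semantics just described; it is not the engine's code.

- `MiniClaim` and `merge_claims` are hypothetical names.
- Timestamps are assumed to be ISO-8601 strings, so string comparison is chronological.
- Every contributing source is kept in a set, as one way of honouring "provenance is never lost".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MiniClaim:
    """Hypothetical, trimmed stand-in for Claim with only the fields used here."""
    subject_key: str
    predicate: str
    object_key: Optional[str]
    first_seen: Optional[str]  # ISO-8601, so lexicographic order == time order
    last_seen: Optional[str]
    source_name: str


def merge_claims(claims) -> Dict[Tuple, dict]:
    """Collapse claims about the same (subject, predicate, object) into one edge record."""
    edges: Dict[Tuple, dict] = {}
    for c in claims:
        key = (c.subject_key, c.predicate, c.object_key)
        edge = edges.setdefault(key, {"first_seen": None, "last_seen": None, "sources": set()})
        if c.first_seen and (edge["first_seen"] is None or c.first_seen < edge["first_seen"]):
            edge["first_seen"] = c.first_seen  # keep the earliest sighting
        if c.last_seen and (edge["last_seen"] is None or c.last_seen > edge["last_seen"]):
            edge["last_seen"] = c.last_seen  # keep the latest sighting
        edge["sources"].add(c.source_name)  # accumulate provenance, never overwrite it
    return edges
```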

Graph Merge Strategy

Claims don't INSERT into the graph. They MERGE. The ingestion engine batches Claims in groups of 500, groups them by schema pattern (subject_type, predicate, object_type), and fires a single UNWIND Cypher query per group:

Neo4j Merge Pattern
// One query per schema pattern: labels and relationship type are fixed in the text
UNWIND $batch AS row
MERGE (s:ThreatActor {name: row.subject_key})
  ON CREATE SET s.first_seen = row.first_seen
  ON MATCH SET s.last_seen = row.last_seen
MERGE (o:Technique {name: row.object_key})
MERGE (s)-[r:USES]->(o)
SET r.confidence = row.confidence,
    r.source = row.source_name
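In the query above, labels and the relationship type are written literally into the query text; only the row data travels as a parameter. So one query is generated per schema pattern. The sketch below does the following:

- batches rows in 500s;
- groups each batch by (subject_type, predicate, object_type);
- builds a query shaped like the one above for each group.

The function names are hypothetical. The sketch assumes every row has an object; node-only claims would need a separate query.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 500  # batch size stated in the text


def chunked(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def group_by_pattern(rows: Iterable[dict]) -> Dict[Tuple[str, str, str], List[dict]]:
    """Group claim rows by (subject_type, predicate, object_type)."""
    groups: Dict[Tuple[str, str, str], List[dict]] = defaultdict(list)
    for row in rows:
        groups[(row["subject_type"], row["predicate"], row["object_type"])].append(row)
    return groups


def cypher_for(pattern: Tuple[str, str, str]) -> str:
    """Build one UNWIND query for a pattern. Labels are interpolated into the text,
    so they must come from a trusted allow-list, never from raw feed data."""
    s_label, rel, o_label = pattern
    return (
        "UNWIND $batch AS row "
        f"MERGE (s:{s_label} {{name: row.subject_key}}) "
        "ON CREATE SET s.first_seen = row.first_seen "
        "ON MATCH SET s.last_seen = row.last_seen "
        f"MERGE (o:{o_label} {{name: row.object_key}}) "
        f"MERGE (s)-[r:{rel}]->(o) "
        "SET r.confidence = row.confidence, r.source = row.source_name"
    )


def plan_batches(rows: List[dict]) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (query, batch) pairs, e.g. for session.run(query, batch=batch)."""
    for batch in chunked(rows, BATCH_SIZE):
        for pattern, group in group_by_pattern(batch).items():
            yield cypher_for(pattern), group
```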

Deadlock retry with linear backoff: 3 attempts, sleeping 2× the attempt number in seconds between tries. Non-deadlock errors fail immediately. This handles Neo4j's transaction contention when multiple ingesters run concurrently.
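Here is a sketch of that retry policy. So that it runs without a database, it uses a hypothetical `GraphWriteError` carrying a Neo4j-style status code. With the official Python driver, deadlocks surface as a transient error whose code is `Neo.TransientError.Transaction.DeadlockDetected`. The `sleep` parameter is injectable so the schedule can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Neo4j status code reported when a transaction deadlock is detected.
DEADLOCK_CODE = "Neo.TransientError.Transaction.DeadlockDetected"


class GraphWriteError(Exception):
    """Hypothetical error type carrying a Neo4j-style status code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def run_with_deadlock_retry(
    write: Callable[[], T],
    attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `write`, retrying only deadlocks and sleeping 2 * attempt seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except GraphWriteError as exc:
            if exc.code != DEADLOCK_CODE or attempt == attempts:
                raise  # non-deadlock errors, or the last attempt, fail immediately
            sleep(2 * attempt)  # linear: 2 s, then 4 s
    raise ValueError("attempts must be at least 1")
```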

Node Labels

Signal uses 14 node labels. Each represents a first-class entity in the threat intelligence domain:

Label            Count     Role
Indicator        ~89,000   IOCs: hashes, IPs, domains, URLs
Software         ~48,000   Malware families, tools, legitimate software
Infrastructure   ~21,000   C2 servers, hosting providers, ASNs
Vulnerability    ~1,900    CVEs with CVSS, EPSS, KEV status
Technique        ~1,200    MITRE ATT&CK techniques & sub-techniques
ThreatActor      ~245      APT groups, cybercrime orgs, hacktivists
Campaign         ~40       Named operations & attack campaigns
Mitigation                 MITRE mitigations & defensive measures
Source                     Intelligence feed provenance nodes
Event                      Discrete security events
EventSummary               Aggregated event timelines
Alert                      Generated alerts from detection rules
DetectionRule              KQL, Sigma, YARA rules
TelemetrySource            Log sources & data collection points

Relationship Types

Edges are typed. Each encodes a specific semantic relationship:

ThreatActor —USES→ Technique
ThreatActor —ATTRIBUTED_TO→ Campaign
Software —EXPLOITS→ Vulnerability
Technique —TARGETS→ Software
Indicator —INDICATES→ Software
Mitigation —MITIGATES→ Technique
Campaign —USES→ Infrastructure

Fusion’s Extended Schema

Fusion extends the schema with 20+ node labels to cover cross-domain entities: SocialPost, EconomicIndicator, GeopoliticalEvent, Country, Organization, SanctionedEntity, and more. Cross-domain edges connect a geopolitical crisis to the cyber campaigns it spawns, or an economic shock to the ransomware surge that follows.


Section III

Signal — Technical Architecture

信号 — Cyber Threat Intelligence

Backend: adversary_graph_app.py

One FastAPI application. 225 endpoints — 151 GET, 62 POST, 5 PATCH, 4 DELETE, 3 WebSocket. Organised by domain: graph queries, ML analytics, twins (digital adversary profiles), threat attribution, KQL generation, process mining, causal inference, semantic search, geospatial analysis, CTI extraction, briefing generation, and more.

Key design decision: one file, one process, no split into microservices. The intelligence domain is deeply interconnected: a risk-scoring endpoint needs access to the graph, the ML cache, the twin profiles, and the semantic index. Splitting that into services would add network hops and serialisation overhead for zero architectural benefit.

Core Modules

Signal ships with 32 Python modules in core/. Each handles a distinct analytical capability:

ML Engine (core)

ml.py — Risk propagation, community detection (Louvain), link prediction, centrality analysis, GDS graph projections, anomaly scoring.

gat.py — Graph Attention Networks for node classification.

graphsage.py — GraphSAGE inductive node embeddings.

Semantic Search (Tier A)

semantic.py — Hybrid search combining LanceDB vector store with Neo4j fulltext. TF-IDF fallback when embedding models aren't available. Indexes all 1M+ nodes.
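The text does not say how the vector and fulltext result lists are combined. Reciprocal rank fusion is one common, score-free option. The sketch below assumes that technique, with the conventional k = 60. It also assumes each backend returns a ranked list of node IDs. It illustrates the idea; it is not semantic.py's logic.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: an item at 1-based rank r in a list earns 1 / (k + r)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))
```

Rank fusion sidesteps the fact that cosine similarities from the vector store and fulltext relevance scores are not on a comparable scale.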

Geospatial (Tier A)

geo.py — H3 hexagonal heatmaps. A 99-country centroid database. APT actor geographic overlay with threat-density calculations.

Causal Inference (Tier A)

causal.py — DoWhy-based causal analysis. 4 CTI scenarios: mitigation effectiveness, technique adoption drivers, infrastructure impact, IOC correlation.

CTI Extraction (Tier A)

extraction.py — LLM-powered (Claude API) entity extraction from unstructured text. Regex fallback. Entity review queue. Direct graph commit.
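A regex fallback can be as simple as a table of patterns. The entity types and patterns below are illustrative assumptions, not extraction.py's rules. They also ignore defanged IOCs such as `hxxp` or `1.2.3[.]4`.

```python
import re
from typing import Dict, List

# Illustrative patterns only; defanged indicators are not handled.
PATTERNS: Dict[str, re.Pattern] = {
    "Vulnerability": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "Technique": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
    "IPv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return unique matches per entity type, in order of first appearance."""
    found: Dict[str, List[str]] = {}
    for entity_type, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if entity_type == "Vulnerability":
            matches = [m.upper() for m in matches]  # normalise "cve-" to "CVE-"
        unique = list(dict.fromkeys(matches))  # de-duplicate, keep order
        if unique:
            found[entity_type] = unique
    return found
```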

ORIGAMI Attribution (Tier A)

attribution.py — Multi-source threat actor origin analysis. Infrastructure tracing, temporal clock analysis, TTP fingerprint matching (weighted Jaccard), evidence fusion with Diamond Model output.
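Weighted Jaccard similarity compares two weighted sets. It divides the sum of per-item minima by the sum of per-item maxima. The minimal sketch below assumes each profile maps an ATT&CK technique ID to a weight in [0, 1]. How attribution.py actually weights techniques is not described here, and `rank_actors` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def weighted_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of per-key minima over sum of per-key maxima; 0.0 when both are empty."""
    keys = a.keys() | b.keys()
    overlap = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return overlap / union if union else 0.0


def rank_actors(observed: Dict[str, float],
                profiles: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (actor, similarity) pairs, best match first."""
    scored = [(actor, weighted_jaccard(observed, ttps)) for actor, ttps in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```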

Adversary DNA (Tier A)

adversary_dna.py — 18-dimensional behavioural fingerprinting from access logs. Temporal entropy, velocity, method entropy, IRT stats. Archetyping: scanner, brute forcer, researcher, bot, targeted operator.
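To give a flavour of such features, this sketch computes four of them from `(timestamp, method)` events. It makes two assumptions:

- "IRT" means inter-request time.
- Velocity is requests per second over the observed span.

The feature names are hypothetical; they are not the module's 18 dimensions.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def shannon_entropy(values: List[str]) -> float:
    """Entropy in bits of the value distribution; 0.0 for empty or single-valued input."""
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(values).values())


def behaviour_features(events: List[Tuple[float, str]]) -> Dict[str, float]:
    """A few illustrative features from (unix_timestamp, http_method) events."""
    events = sorted(events)
    times = [t for t, _ in events]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]  # inter-request times
    span = times[-1] - times[0] if len(times) > 1 else 0.0
    return {
        "method_entropy": shannon_entropy([method for _, method in events]),
        "velocity_rps": len(events) / span if span else 0.0,
        "irt_mean": mean(gaps) if gaps else 0.0,
        "irt_stdev": pstdev(gaps) if gaps else 0.0,
    }
```

A scanner hammering one method at a steady rate shows low method entropy and a near-zero IRT spread. A human researcher tends to show the opposite.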

Event Bus (infra)

bus.py — NATS JetStream pub/sub. 9 subject hierarchies covering Signal, Fusion, Raz0r, V01d, Nexus, Kin0bi, 1D, V0id, and ecosystem-wide events.

S-Tier Modules

federated.py — Federated threat intelligence sharing.

org_twin.py — Organisational digital twin modelling.

causal_rl.py — Reinforcement learning for causal response.

cascade.py — Cascade failure prediction.

neuromorphic.py — Neuromorphic graph processing.

Supporting Modules

twins.py — Adversary digital twin profiles, Monte Carlo simulation, playbook & wargame generation.

process_mining.py — Attack process flow discovery from event sequences.

kql.py — KQL detection rule generation for Microsoft Sentinel.

graph.py — Neo4j adapter, connection pooling, query helpers.
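For kql.py, rule generation can be as simple as rendering IOCs into a query template. The sketch below assumes a network-IOC hunt over Sentinel's `CommonSecurityLog` table. The helper name, the table choice and the seven-day lookback are illustrative assumptions, not kql.py's templates.

```python
from typing import Iterable


def kql_ip_hunt(ips: Iterable[str], lookback_days: int = 7) -> str:
    """Build a KQL query counting connections to known-bad destination IPs."""
    unique = sorted(set(ips))
    if not unique:
        raise ValueError("need at least one IP")
    # KQL double-quoted string literals: escape backslashes and quotes.
    literals = ", ".join(
        '"' + ip.replace("\\", "\\\\").replace('"', '\\"') + '"' for ip in unique
    )
    return (
        "CommonSecurityLog\n"
        f"| where TimeGenerated > ago({lookback_days}d)\n"
        f"| where DestinationIP in ({literals})\n"
        "| summarize hits = count() by SourceIP, DestinationIP"
    )
```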

Frontend: 37 Floating Windows

Signal's UI is a Next.js 16 / React 19 application with a custom window manager. Not tabs. Not pages. Floating, draggable, resizable windows — like a desktop OS for threat intelligence. The user can arrange graph views, ML dashboards, twin profiles, and causal analysis side by side.

Category                 Windows
Graph & Visualisation    Graph, Theatre (3D), Galaxy, Heatmap
ML & Analytics           Risk, Communities, Predict, Centrality, Emergent
Intelligence             Twins, Wargame, Attribution, Causal, DNA, Diff, Briefing
Search & Extraction      Search (Spektr), Extract, Workbench
Detection                KQL, SIEM, Hunting, Emulation, Process Mining
Infrastructure           Traffic, Event Bus, Telemetry, DataLab
Admin                    Admin, Settings, Users, Audit

Auth Model

File-based JSON user store. Passwords hashed with bcrypt. JWTs signed with jose. Middleware enforcement in middleware.ts — every route is protected except explicit public paths. MFA gate available. Role-based admin access. SSO token exchange for cross-app authentication.

Sample API Endpoints

Selected Endpoints (225 total)
# Graph Intelligence
GET  /graph/stats              # Node/edge counts by label
GET  /graph/actors             # All threat actors with metrics
POST /search/semantic          # Hybrid vector + fulltext search
# ML Pipeline
GET  /ml/risk                  # Risk-propagated scores
GET  /ml/communities           # Louvain community detection
GET  /ml/predict               # Link prediction (future edges)
# Adversary Profiling
GET  /twins/profile/{name}     # Digital twin behavioural model
POST /twins/wargame            # Monte Carlo actor vs. defence sim
GET  /attribution/{actor}      # ORIGAMI multi-source attribution
# Real-time
WS   /ws/threats               # Live threat feed stream
WS   /ws/chat                  # Niko AI assistant
# Generation
GET  /kql/generate             # KQL detection rules
GET  /briefing/generate        # CISO briefing document
POST /extract                  # LLM CTI entity extraction

Section IV

Fusion — Technical Architecture

融合 — Cross-Domain Intelligence

Backend: fusion_app.py

FastAPI with 107 endpoints. Where Signal is deep on cyber, Fusion is wide across domains. The same Claims engine, the same graph store, but pointing at a much broader universe of data — and a set of analytical modules designed to find the connections between domains that no single-domain tool would ever surface.

84 Ingesters Across 8 Domains

Cyber & Vulnerability 14

NVD CVEs, CISA KEV, OTX, EPSS, ThreatFox, URLhaus, Feodo, MalwareBazaar, CrowdSec, PhishTank, OpenPhish, CIRCL MISP, GitHub Advisories, Exploit-DB

Geopolitical 18

GDELT, ACLED, GTD, SIPRI arms transfers, GPI, INFORM Risk, OpenSanctions, OFAC SDN, ReliefWeb, FEWS NET, ND-GAIN, GPR Index, World Bank WGI, UNHCR, V-Dem, RSS (geopolitical)

Economic 9

FRED (macro + GSCPI), IMF WEO, World Bank, WTO trade, commodity prices, ILOSTAT, UN Comtrade

Social Intelligence 8

Twitter/X, Reddit, Mastodon, Telegram, Bluesky, RSS, Google Trends, Mastodon Trending

Environmental & Health 7

NASA FIRMS (wildfires), GDACS, EM-DAT, natural disasters, Safecast (radiation), WHO outbreaks, WHO GHO

Military & Specialty 6

Shodan intel, AIS maritime (AISstream), OpenSky flights, software registries (npm, PyPI, GitHub)

Detection & Standards 3

MITRE ATT&CK TAXII, NIST CPE dictionary, EPSS probability scores

Core Modules

15 modules in core/, with analytical capabilities tuned for cross-domain fusion:

narrative.py        TF-IDF + DBSCAN clustering, coordination scoring, LLM labels, reality divergence
fusion_ml.py        Cross-domain ML algorithms, hidden connections, signal correlations
twins.py            Adversary digital twins, Monte Carlo kill-chain simulation, wargaming
emergent.py         7 detectors: TTP convergence, infra overlap, community drift, velocity anomaly, cascade, cross-domain bridges, prediction materialisation
datalab.py          Self-service graph CRUD, bulk import/export, saved queries, audit logging
forecast.py         Multi-horizon forecasting, scenario planning, sector analysis
process_mining.py   Attack flow discovery, conformance checking, dwell time analysis
ioc_extract.py      IOC pattern extraction from unstructured text
predictions.py      Prediction engine with historical tracking & verification
schema.py           Neo4j schema definitions & constraint management

Narrative Clustering

Fusion's signature analytical capability. narrative.py ingests social posts from all platforms, vectorises them with TF-IDF (5,000 features, bigram support), clusters with DBSCAN, and runs coordination scoring to detect information operations. An LLM labels each cluster's theme, then the engine compares narratives against GDELT ground-truth events for a reality divergence score — how far is the online narrative drifting from what's actually happening?
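The vectorise-and-cluster stage can be sketched in pure Python. This stand-in mirrors the stated setup: unigrams plus bigrams, capped at 5,000 features. It uses smoothed IDF and cosine distance. The values `eps = 0.5` and `min_samples = 2` are assumptions. Coordination scoring, LLM labelling and divergence scoring are not shown, and a production version would more likely use a library implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional

Vector = Dict[str, float]


def terms(text: str) -> List[str]:
    """Lower-cased word unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def tfidf(docs: List[str], max_features: int = 5000) -> List[Vector]:
    """Smoothed TF-IDF over the `max_features` most frequent terms, L2-normalised."""
    counts = [Counter(terms(d)) for d in docs]
    corpus: Counter = Counter()
    for c in counts:
        corpus.update(c)
    vocab = {t for t, _ in corpus.most_common(max_features)}
    n = len(docs)
    df = Counter(t for c in counts for t in c if t in vocab)  # documents containing t
    vectors = []
    for c in counts:
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items() if t in vocab}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(points: List[Vector], eps: float = 0.5, min_samples: int = 2) -> List[int]:
    """Textbook DBSCAN: a cluster id per point, -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(points)) if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:
                queue.extend(j_seeds)  # core point: keep growing the cluster
        cluster += 1
    return [label if label is not None else -1 for label in labels]
```

Near-duplicate posts land in the same cluster and unrelated ones become noise. In the sketch, a tight cluster of many near-identical posts from distinct accounts is the raw material a coordination score would then weigh.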

Security Scanner

Built-in web scanner orchestrating Nuclei (template-based vulnerability scanning), testssl.sh (TLS assessment), and httpx (HTTP probing). Findings write back to the graph as SecurityFinding nodes linked to Domain and Vulnerability nodes. Single-scan queueing with thread-safe locking and cancellation support.
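One way to get single-scan behaviour with locking and cancellation is a lock-guarded runner plus a cancellation event. This hypothetical sketch has two simplifications:

- It rejects a second scan while one is running, rather than queueing it.
- Cancellation is cooperative and takes effect between tool steps.

Each step stands in for a wrapper around one tool.

```python
import threading
from typing import Callable, List, Optional


class SingleScanRunner:
    """Run one scan at a time; a second start() is refused while a scan is active."""

    def __init__(self) -> None:
        self._lock = threading.Lock()     # guards the "is a scan running?" check
        self._cancel = threading.Event()  # set to ask the running scan to stop
        self._thread: Optional[threading.Thread] = None
        self.results: List[str] = []

    def start(self, target: str, steps: List[Callable[[str], str]]) -> bool:
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._cancel.clear()
            self.results = []
            self._thread = threading.Thread(target=self._run, args=(target, steps), daemon=True)
            self._thread.start()
            return True

    def _run(self, target: str, steps: List[Callable[[str], str]]) -> None:
        for step in steps:  # each step wraps one tool (hypothetical wrappers)
            if self._cancel.is_set():
                self.results.append("cancelled")
                return
            self.results.append(step(target))

    def cancel(self) -> None:
        self._cancel.set()  # cooperative: honoured before the next step starts

    def wait(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)
```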

Frontend: Cross-Domain Dashboard

Next.js 16 / React 19, but with a different approach from Signal's window manager. Fusion uses an AppShell with sidebar navigation: globe view, threat dashboard, social intelligence, narrative analysis, DataLab, scanner, twins, forecasting. A day/night toggle switches between a warm paper palette (day) and dark stealth (night). There is no server-side middleware; auth is handled client-side via the AppShell component.

Sample API Endpoints

Selected Endpoints (107 total)
# Cross-Domain Analysis
GET  /ml/narrative-clusters         # Social narrative clustering
GET  /ml/narrative-divergence       # Reality vs narrative drift
GET  /ml/cross-domain-anomalies     # Cross-domain anomaly detection
GET  /ml/hidden-connections         # Latent graph relationships
GET  /ml/mega-risks                 # Compound multi-domain risks
# Intelligence Products
GET  /intel/sitrep                  # Situation report
GET  /briefing                      # Daily intelligence briefing
GET  /actor/{name}/dossier          # Full actor dossier
GET  /globe/data                    # 3D globe risk overlay
# Social Intelligence
GET  /social/feed                   # Cross-platform social feed
GET  /social/entity/{type}/{key}    # Entity social footprint
POST /niko/chat                     # Niko AI analyst assistant
# Scanner
POST /scanner/scan                  # Launch web security scan
GET  /scanner/results               # Scan findings
# Forecasting
GET  /forecast/{horizon}            # Multi-horizon threat forecast
POST /forecast/scenario             # Scenario planning

Section V

The Stealth Stack

基盤 — Infrastructure

Container Topology

Each app follows the same Docker Compose pattern: Neo4j 5 + FastAPI backend + Next.js frontend. Signal adds Caddy (reverse proxy for all domains) and NATS (event bus). Some apps add Redis for real-time features.

Caddy (reverse proxy): HTTPS termination · 11 domains · security shield
Next.js (UI container): React 19 SSR · auth middleware · API routes
FastAPI (API container): Python 3.14 · 225+ endpoints · WebSocket
Neo4j 5 (graph database): 1M+ nodes · Bolt protocol · APOC & GDS
NATS (event bus): JetStream · 9 subjects · cross-app pub/sub

Caddy Routing

One Caddy instance in Signal's Docker Compose handles HTTPS termination for all 11 production domains. The routing logic is subtle and ordering matters:

Caddy Routing Logic (simplified)
# 1. Next.js API routes go to the UI container
handle /api/auth/*    → ui:3000
handle /api/admin/*   → ui:3000
handle /api/sso/*     → ui:3000
# 2. All other /api/* go to the Python backend
handle_path /api/*    → api:18011   # strips /api prefix
# 3. Everything else goes to Next.js
handle /*             → ui:3000

Critical detail: New Next.js API routes must be added to the Caddy config before the handle_path /api/* catch-all, or they'll be incorrectly routed to the Python backend.
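The ordering pitfall is easy to reproduce with a first-match table. This sketch models the simplified listing above as an ordered list of path prefixes, including the `/api` strip that `handle_path` performs. It models that listing; it is not a model of Caddy's own directive-sorting rules. The `/api/billing/` route used below is hypothetical.

```python
from typing import List, Optional, Tuple

Route = Tuple[str, str, Optional[str]]  # (path prefix, upstream, prefix to strip)

# Listing order from the simplified config: specific Next.js routes first.
ROUTES: List[Route] = [
    ("/api/auth/", "ui:3000", None),
    ("/api/admin/", "ui:3000", None),
    ("/api/sso/", "ui:3000", None),
    ("/api/", "api:18011", "/api"),  # handle_path: strip /api before proxying
    ("/", "ui:3000", None),
]


def route(path: str, routes: List[Route] = ROUTES) -> Tuple[str, str]:
    """Return (upstream, forwarded_path) for the first matching prefix."""
    for prefix, upstream, strip in routes:
        if path.startswith(prefix):
            forwarded = path[len(strip):] if strip else path
            return upstream, forwarded or "/"
    raise LookupError(path)
```

If a new `/api/billing/` route is appended after the catch-all, `/api/billing/plan` reaches `api:18011` as `/billing/plan`. Inserting it before the catch-all keeps it on `ui:3000`.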

Security Shield

All 11 domains import a shared Caddy snippet, (security_shield):

Security Shield Rules
# Block common attack patterns
.git/*      → 404     # VCS probe
*.php       → 404     # PHP probe
wp-*        → 404     # WordPress probe
Empty UA    → abort   # Drop connection
# Response headers
Content-Security-Policy: default-src 'self' ...
Permissions-Policy: camera=(), microphone=() ...
X-Content-Type-Options: nosniff
# Request limits
Request body limit: 10MB

fail2ban

Three jails watching Caddy access logs:

Jail               Trigger                      Ban duration
caddy-scanner      5 blocked paths in 10 min    24 hours
caddy-auth         10 auth failures in 5 min    1 hour
caddy-aggressive   50 404s in 5 min             12 hours
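Each jail boils down to one rule: N matching log lines within a time window trigger a ban. Below is a sliding-window model of that rule, using the caddy-scanner numbers as the example. The `Jail` class is hypothetical. It approximates fail2ban's maxretry/findtime/bantime behaviour rather than reproducing its code.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class Jail:
    """Ban a client after `max_retry` matching events within `find_time` seconds."""

    def __init__(self, max_retry: int, find_time: float, ban_time: float) -> None:
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        return self._banned_until.get(ip, float("-inf")) > now

    def record(self, ip: str, now: float) -> bool:
        """Register one matching log line; return True if it triggers a ban."""
        hits = self._hits[ip]
        hits.append(now)
        while hits and hits[0] <= now - self.find_time:
            hits.popleft()  # forget events that fell out of the window
        if len(hits) >= self.max_retry:
            self._banned_until[ip] = now + self.ban_time
            hits.clear()
            return True
        return False
```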

Network Architecture

All apps run on a single Hetzner dedicated server. RTM (rapid-threat-modeler) is the deployment whose Docker Compose runs the shared Caddy proxy. Containers for every other app join RTM's Docker network (rapid-threat-modeler_default) to reach that proxy. Each app has its own Neo4j instance on a unique Bolt port. In production, Neo4j and API ports are not exposed to the host; only Caddy's port 443 is public.

Port Allocation
# API ports (internal only in prod)
Signal:    18011
Fusion:    18012
Raz0r:     18013
ANTOS:     18014
Kin0bi:    18015
Nexus:     18016
1D:        18017
V01d:      18018
V0id:      18019
Range:     18020
Knox:      18021
Social:    18022
War Room:  18023
# Neo4j Bolt ports (internal only in prod)
Signal:    17687
Fusion:    17688
Raz0r:     17689
Kin0bi:    17690
Nexus:     17691
1D:        17692
V01d:      17693
V0id:      17694
Range:     17695
Social:    17696
War Room:  17697
# Public (Caddy)
HTTPS:     443

Deployment Pattern

Every deploy follows the same sequence. No CI/CD pipeline — deliberate simplicity:

Deploy Sequence
# 1. Push code
git push
# 2. SSH, pull, rebuild
ssh root@server
cd /opt/{app}
git pull
docker compose -f docker-compose.yml \
  -f docker-compose.prod.yml \
  up --build -d api ui
# 3. If non-RTM app: restart Caddy from RTM dir
cd /opt/rapid-threat-modeler
docker compose -f docker-compose.yml \
  -f docker-compose.prod.yml \
  restart caddy

Section VI

The Full Arsenal

全体系 — Ecosystem

15 systems. One SSO. One event bus. One graph mindset. Each built for a specific intelligence domain, all designed to share context through the graph and NATS.

#   System       Domain          One-liner
1   Signal       CTI             Threat graph, 225 endpoints, 32 core modules, 83 windows
2   Fusion       Cross-domain    84 ingesters across 8 domains, narrative clustering, scanner
3   Raz0r        SIEM            Rust EDR agent, ransomware predictor, cross-node correlator
4   ANTOS        UX              Embedded at /antos: unified analyst desktop
5   Kin0bi       Financial       Real-time crypto/stocks/forex, anomaly detection, portfolio risk
6   Nexus        OSINT           Suspicion propagation, money flow, UBO resolution, sanctions
7   1D           Identity        BloodHound/LDAP/Azure AD graph, attack paths, kerberoastable accounts
8   V01d         Sentiment       GDELT/RSS/Reddit/FRED pipeline, Oracle score, ninjaTONE
9   V0id         Agents          3 autonomous agents (Sentinel, Warden, Spectre), IR playbooks
10  Los Alamos   Wargaming       Red vs Blue agentic range, LLM-driven adversaries, Elo scoring
11  Knox         Secrets         Vault, crypto toolkit, privacy engine, TOTP authenticator
12  Social       Collaboration   TI messaging, IOC auto-detect, encrypted channels, NATS feed
13  War Room     IR              LiveKit video, shared timelines, IOC panel, breach tracker
14  NinjaClaw    CLI             Hardened CLI agent, 10 scanners, CIS rules, Signal intel link
15  GITAIR       DevSecOps       Git security scanning & air-gapped repository management

SSO: One Identity Everywhere

Every app implements the same SSO handshake. When a user is authenticated in Signal and clicks through to Fusion, the flow is:

Signal JWT cookie
/api/auth/sso Generate token
Redirect ?sso_token=...
Fusion /api/auth/sso-exchange
Set Cookie ninja-fusion-token

Each app has its own cookie name to avoid conflicts. The JWT payload is verified server-side. No shared session store — just cryptographic trust.

NATS Event Bus

JetStream provides durable, at-least-once delivery across all apps. 9 subject hierarchies:

NATS Subjects
signal.*      # Signal threat events
fusion.*      # Fusion cross-domain events
razor.*       # SIEM detections & alerts
v01d.*        # Sentiment & Oracle updates
nexus.*       # OSINT investigation events
kin0bi.*      # Financial anomalies
id.*          # Identity exposure events
v0id.*        # Agent actions & findings
ecosystem.*   # System-wide coordination

When Raz0r detects a suspicious process, it publishes to razor.alert. V0id agents subscribe and auto-triage. Signal enriches the IOC. Fusion correlates with geopolitical context. All without any app knowing about the others — just messages on a bus.
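Subscribers pick up those messages through NATS subject wildcards:

- `*` matches exactly one dot-separated token.
- `>` matches one or more trailing tokens.

So `razor.*` receives `razor.alert` but not `razor.alert.high`; a subscriber that wants a whole hierarchy uses `razor.>`. The matcher below re-implements those rules purely for illustration, since the NATS server does this matching itself. The function name is hypothetical.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style matching: '*' matches one token; '>' (last token only) matches
    one or more remaining tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # Valid only as the final token, and it must consume at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```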

Galaxy Visualisation

Signal's /galaxy/data endpoint samples ~8,000 nodes from the ML graph, groups them by label, and computes a 3D radial cluster layout. V01d's ninjaTONE page renders this as a Three.js point cloud — a galaxy of threats you can fly through, click, and explore. It's not just pretty. It's the entire threat landscape in one view.
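One plausible reading of a "3D radial cluster layout" is to put each label's centre on a ring and scatter that label's nodes inside a small ball around it. The sketch assumes exactly that, plus a seeded random sample of 8,000 nodes. The function name, radii and seed are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def radial_cluster_layout(
    nodes: List[Tuple[str, str]],  # (node_id, label) pairs
    sample_size: int = 8000,
    ring_radius: float = 100.0,
    cluster_radius: float = 15.0,
    seed: int = 7,
) -> Dict[str, Point]:
    """Centre each label on a ring in the XY plane; scatter its nodes in a ball around it."""
    rng = random.Random(seed)  # seeded so repeated calls give the same picture
    if len(nodes) > sample_size:
        nodes = rng.sample(nodes, sample_size)
    by_label: Dict[str, List[str]] = defaultdict(list)
    for node_id, label in nodes:
        by_label[label].append(node_id)
    labels = sorted(by_label)
    positions: Dict[str, Point] = {}
    for k, label in enumerate(labels):
        angle = 2 * math.pi * k / len(labels)
        cx, cy = ring_radius * math.cos(angle), ring_radius * math.sin(angle)
        for node_id in by_label[label]:
            # A Gaussian direction is uniform on the sphere; the cube root of a
            # uniform draw spreads points evenly through the ball's volume.
            dx, dy, dz = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
            r = cluster_radius * rng.random() ** (1 / 3)
            positions[node_id] = (cx + r * dx / norm, cy + r * dy / norm, r * dz / norm)
    return positions
```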


Section VII

Subject to Change

進化 — Evolution

This document describes the architecture as of March 2026. It will change. The ecosystem is alive — new ingesters, new ML modules, new analytical capabilities ship regularly. What won't change is the core philosophy: one graph, atomic claims, domain fusion.

The hardest problem in security isn't detection. It's connection. Every tool in this ecosystem exists to make one more connection visible — between an IOC and an actor, between an actor and a campaign, between a campaign and the geopolitical event that triggered it. When all those connections live in one graph, you stop reacting to alerts and start understanding adversaries.

The graph doesn't give you answers. It gives you the right questions — and the traversals to find them.