Scan Pipeline
Pentora's scan pipeline is a structured 9-stage process that transforms raw targets into actionable security intelligence. Each stage builds upon the previous, creating a data flow from initial target specification to final reporting.
Pipeline Overview
┌─────────────────────┐
│ 1. Target Ingestion │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 2. Asset Discovery │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ 3. Port Scanning │
└──────────┬──────────┘
│
┌──────────▼────────────┐
│ 4. Service │
│ Fingerprinting │
└──────────┬────────────┘
│
┌──────────▼──────────┐
│ 5. Asset Profiling │
└──────────┬──────────┘
│
┌──────────▼────────────┐
│ 6. Vulnerability │
│ Evaluation │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ 7. Compliance & │
│ Risk Scoring │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ 8. Reporting & │
│ Notification │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ 9. Archival & │
│ Analytics │
└───────────────────────┘
Stage 1: Target Ingestion
Purpose: Parse, validate, and prepare target specifications for scanning.
Operations:
-
Parse input formats:
- Single IP:
192.168.1.100 - CIDR notation:
192.168.1.0/24 - IP ranges:
192.168.1.1-192.168.1.254 - Hostnames:
example.com - From file:
--target-file targets.txt
- Single IP:
-
Expand CIDR ranges:
- Convert
/24→ 256 individual IPs - Apply pagination for large ranges to avoid memory exhaustion
- Convert
-
Apply blocklists:
- Filter RFC1918 private ranges (configurable)
- Exclude user-defined blocklists
- Skip broadcast/network addresses
-
Validate targets:
- DNS resolution for hostnames
- Validate IP format
- Check for duplicates
Output: Sanitized list of IP addresses ready for scanning.
Configuration:
scanner:
target_expansion:
max_cidr_size: /16 # Largest allowed CIDR
resolve_hostnames: true
blocklists:
- 127.0.0.0/8 # Loopback
- 169.254.0.0/16 # Link-local
CLI Control: Target ingestion always runs; configure via --targets or --target-file.
Stage 2: Asset Discovery
Purpose: Identify live hosts before performing expensive port scans.
Methods:
-
ICMP Echo (Ping):
- Send ICMP ECHO_REQUEST
- Requires raw socket permissions (
CAP_NET_RAWor root) - Fast but can be blocked by firewalls
-
ARP Discovery (local networks):
- Layer 2 discovery for same-subnet targets
- Cannot be blocked by host firewalls
- Only works on local network segments
-
TCP SYN Ping:
- Send TCP SYN to common ports (80, 443, 22)
- Useful when ICMP is blocked
- Requires raw sockets
Discovery Profiles:
fast: ICMP onlystandard: ICMP + ARP (local nets)deep: ICMP + ARP + TCP SYN probestcp: TCP SYN only (no ICMP)
Output: List of responsive hosts with response times.
Configuration:
discovery:
profile: standard
timeout: 2s
retry: 2
icmp:
enabled: true
count: 2
arp:
enabled: true
tcp_probe:
enabled: false
ports: [80, 443, 22, 25]
CLI Control:
# Discovery only
pentora scan --targets 192.168.1.0/24 --only-discover
# Skip discovery (targets known to be live)
pentora scan --targets 192.168.1.100 --no-discover
# Custom discovery profile
pentora scan --targets 192.168.1.0/24 --discover-profile deep
Performance: Discovers 1000 hosts in ~10-30 seconds depending on profile and network conditions.
Stage 3: Port Scanning
Purpose: Identify open TCP/UDP ports on discovered hosts.
Scanning Methods:
-
TCP SYN Scan (default):
- Send SYN packet, analyze SYN-ACK response
- Stealthy (no full connection)
- Requires raw sockets
-
TCP Connect Scan:
- Full 3-way handshake
- No special permissions needed
- Leaves connection logs
-
UDP Scan:
- Send UDP probes, check ICMP port unreachable
- Slow due to rate limiting
- Often requires protocol-specific payloads
Port Selection:
- Quick profile: Top 100 common ports
- Standard profile: Top 1000 ports (Nmap default)
- Deep profile: All 65,535 ports
- Custom: User-specified port list
Concurrency & Rate Limiting:
scanner:
rate: 1000 # Packets per second
concurrency: 100 # Parallel targets
timeout: 3s
retry: 1
ports:
profile: standard
custom: [80, 443, 8080, 8443]
Dark Subnet Detection:
- Identifies networks with no responses (all packets dropped)
- Triggers timeout backoff to avoid wasting time
- Logged for operator awareness
Output: List of open ports per host with state (open/closed/filtered).
CLI Control:
# Use predefined profile
pentora scan --targets 192.168.1.100 --profile quick
# Scan specific ports
pentora scan --targets 192.168.1.100 --ports 80,443,8080
# Scan port range
pentora scan --targets 192.168.1.100 --ports 1-1024
# Adjust rate limit
pentora scan --targets 192.168.1.0/24 --rate 500
Performance: Scans 1000 ports on a single host in ~5-10 seconds at default rate.
Stage 4: Service Fingerprinting
Purpose: Identify the service, application, version, and OS running on open ports.
Layered Detection:
Layer 1: Initial Heuristics
- Port number → service guess (80=HTTP, 22=SSH)
- Initial banner capture (connect and read)
- Basic pattern matching
Layer 2: Protocol-Specific Probes
Targeted probes based on Layer 1 results:
HTTP/HTTPS:
GET / HTTP/1.1with headers- Parse
Server:,X-Powered-By:,X-AspNet-Version: - Detect frameworks (Laravel, Django, Express)
TLS/SSL:
- TLS handshake
- Extract certificate details (CN, SAN, issuer)
- Identify cipher suites and protocol versions
SMTP/IMAP/POP3:
- Read greeting banner
- Send EHLO/CAPABILITY commands
- Parse extension lists
FTP:
- Read welcome banner
- Send SYST for OS detection
- Check for anonymous access
Redis/Memcached:
- Send INFO command
- Parse version and configuration
Layer 3: Confidence Scoring
Aggregate evidence from multiple sources:
{
"fingerprints": [
{
"match": "nginx",
"version": "1.18.0",
"confidence": 95,
"source": "http_header",
"evidence": "Server: nginx/1.18.0"
},
{
"match": "ubuntu",
"confidence": 80,
"source": "banner",
"evidence": "Ubuntu Linux"
}
]
}
Fingerprint Database:
- Builtin rules compiled into binary
- Cached catalogs in workspace:
<workspace>/cache/fingerprints/ - Sync remote catalogs:
pentora fingerprint sync
Output: Service records with application, version, OS, and confidence scores.
Configuration:
fingerprint:
cache_dir: ${workspace}/cache/fingerprints
probe_timeout: 5s
max_protocols: 3 # Max protocols to probe per port
catalog:
builtin: true
remote_url: https://catalog.pentora.io/fingerprints.yaml
CLI Control:
# Use cached fingerprints
pentora scan --targets 192.168.1.100 --fingerprint-cache
# Update fingerprint catalog
pentora fingerprint sync
See Fingerprinting System for detailed probe specifications.
Stage 5: Asset Profiling
Purpose: Fuse signals from discovery, ports, and fingerprints into a comprehensive asset profile.
Profile Components:
-
Device Classification:
- Server, workstation, network device, IoT, mobile
- Based on open ports, services, OS detection
-
Operating System:
- OS family (Linux, Windows, BSD, macOS)
- Distribution (Ubuntu, CentOS, Windows Server 2019)
- Version and kernel
-
Application Stack:
- Web server (nginx, Apache, IIS)
- Application server (Tomcat, Node.js, Gunicorn)
- Frameworks (Django, Rails, ASP.NET)
- Databases (MySQL, PostgreSQL, MongoDB)
-
Network Function:
- Web server, mail server, DNS server, database server
- Multi-function hosts (e.g., web + database)
Profile Confidence:
- High confidence: Multiple corroborating signals
- Medium confidence: Single strong signal
- Low confidence: Weak heuristics only
Output: Asset inventory records suitable for CMDB integration.
Example profile:
{
"host": "192.168.1.100",
"device_type": "server",
"os": {
"family": "linux",
"distribution": "ubuntu",
"version": "20.04",
"confidence": 90
},
"applications": [
{"name": "nginx", "version": "1.18.0", "role": "web_server"},
{"name": "php", "version": "7.4", "role": "runtime"},
{"name": "mysql", "version": "8.0", "role": "database"}
],
"functions": ["web_server", "database_server"]
}
Stage 6: Vulnerability Evaluation
Purpose: Identify known vulnerabilities (CVEs) and common misconfigurations.
Detection Methods:
CVE Matching
- Match service versions against CVE database
- Consider version ranges and patch levels
- Filter by exploitability and severity (CVSS score)
Misconfiguration Checks
- Default credentials (admin/admin, root/toor)
- Weak protocols (SSLv3, TLSv1.0, FTP)
- Anonymous access (FTP, SMB, Redis)
- Missing security headers (HTTP)
- Exposed admin interfaces
Heuristic Checks
- Outdated software versions
- End-of-life products
- Known vulnerable services (ElasticSearch, MongoDB)
Output: Vulnerability records with severity, CVSS, and remediation.
{
"host": "192.168.1.100",
"port": 80,
"vulnerability": {
"id": "CVE-2021-44228",
"title": "Log4Shell Remote Code Execution",
"severity": "critical",
"cvss": 10.0,
"affected": "Apache Log4j 2.0-2.14.1",
"detected": "2.14.0",
"remediation": "Upgrade to 2.15.0 or set log4j2.formatMsgNoLookups=true"
}
}
CLI Control:
# Enable vulnerability checks
pentora scan --targets 192.168.1.100 --vuln
# Disable vulnerability checks (faster)
pentora scan --targets 192.168.1.100 --no-vuln
Stage 7: Compliance & Risk Scoring
Purpose: Evaluate findings against regulatory frameworks and assign risk scores.
Compliance Frameworks (Enterprise):
- CIS Benchmarks: Center for Internet Security baselines
- PCI DSS: Payment Card Industry Data Security Standard
- NIST 800-53: National Institute of Standards and Technology controls
- HIPAA: Health Insurance Portability and Accountability Act
- ISO 27001: Information security management
Risk Scoring: Calculate risk based on:
- Vulnerability severity: Critical > High > Medium > Low
- Asset value: Critical systems weighted higher
- Exploitability: Public exploits increase risk
- Exposure: Internet-facing vs internal
Risk Formula:
Risk Score = (Severity × Exploitability × Exposure) / Mitigations
Output: Compliance violations and aggregated risk scores.
{
"host": "192.168.1.100",
"compliance": {
"framework": "PCI-DSS",
"violations": [
{
"control": "2.2.4",
"description": "Configure system security parameters",
"finding": "Weak SSL/TLS configuration detected"
}
]
},
"risk_score": 8.5,
"risk_level": "high"
}
CLI Control (Enterprise):
# Run compliance checks
pentora scan --targets cardholder-env.txt --compliance pci-dss
# Multiple frameworks
pentora scan --targets dmz.txt --compliance cis-level1,nist-800-53
Stage 8: Reporting & Notification
Purpose: Generate structured reports and trigger external integrations.
Report Formats:
- JSON: Machine-readable, suitable for SIEM ingestion
- JSONL: Line-delimited for streaming and large datasets
- CSV: Spreadsheet import, executive summaries
- PDF: Executive reports with charts and remediation (Enterprise)
- HTML: Interactive dashboards
Notification Channels:
- Slack: Post scan summaries to channels
- Email: Send reports to stakeholders
- Webhooks: POST results to external systems
- Ticketing: Auto-create Jira/ServiceNow tickets (Enterprise)
- SIEM: Forward to Splunk/QRadar/Elastic (Enterprise)
Notification Rules:
notifications:
- name: critical_vulns
channels: [slack, email]
conditions:
severity: [critical]
asset_tags: [production]
- name: compliance_violations
channels: [jira]
conditions:
compliance_failed: true
Output: Reports written to workspace and external systems notified.
CLI Control:
# Specify output format
pentora scan --targets 192.168.1.100 --output json
# Export to file
pentora scan --targets 192.168.1.100 -o results.csv
# Trigger notifications (server mode)
curl -X POST /api/scans -d '{"targets": [...], "notify": ["slack://security"]}'
Stage 9: Archival & Analytics
Purpose: Store scan results for historical analysis and trend detection.
Workspace Storage:
Results saved to: <workspace>/scans/<scan-id>/
scans/20231006-143022-a1b2c3/
├── request.json # Original scan parameters
├── status.json # Execution metadata
├── results.jsonl # Main results (line-delimited JSON)
├── artifacts/
│ ├── banners/ # Raw banner captures
│ ├── screenshots/ # Web screenshots (if enabled)
│ └── pcaps/ # Packet captures (if enabled)
└── reports/
├── summary.json
├── vulnerabilities.csv
└── executive.pdf
Retention Policies:
workspace:
retention:
enabled: true
max_age: 90d # Delete scans older than 90 days
max_scans: 1000 # Keep at most 1000 scans
min_free_space: 10GB # Delete oldest when space low
Analytics (Enterprise):
- Trend analysis: Compare scans over time
- Diff detection: New/resolved vulnerabilities
- Asset changes: Added/removed services
- Risk trends: Organizational risk over time
- Compliance posture: Historical compliance scores
CLI Control:
# Clean old scans
pentora workspace gc --older-than 30d
# Disable workspace (stateless)
pentora scan --targets 192.168.1.100 --no-workspace
# Custom workspace location
pentora scan --targets 192.168.1.100 --workspace-dir /data/pentora
Pipeline Control
Phase Flags
Control which stages execute:
# Discovery only (stages 1-2)
pentora scan --targets 192.168.1.0/24 --only-discover
# Skip discovery (stages 1, 3-9)
pentora scan --targets 192.168.1.100 --no-discover
# Disable vulnerability checks (stages 1-5, 8-9)
pentora scan --targets 192.168.1.100 --no-vuln
# Full pipeline with all stages
pentora scan --targets 192.168.1.100 --vuln
Profiles
Predefined profiles configure multiple stages:
# Quick: Fast discovery, top 100 ports, no vuln
pentora scan --targets 192.168.1.0/24 --profile quick
# Standard: Standard discovery, top 1000 ports, basic fingerprint
pentora scan --targets 192.168.1.0/24 --profile standard
# Deep: Thorough discovery, all ports, advanced fingerprint, vuln checks
pentora scan --targets 192.168.1.0/24 --profile deep
See Scan Profiles for custom profile creation.
Performance Characteristics
Typical scan times (single host, standard profile):
| Stage | Time | Bottleneck |
|---|---|---|
| Target Ingestion | ~1s | CPU |
| Asset Discovery | 2-5s | Network latency |
| Port Scanning | 5-10s | Rate limiting |
| Service Fingerprinting | 10-30s | Protocol probes |
| Asset Profiling | ~1s | CPU |
| Vulnerability Eval | 5-15s | Database lookups |
| Compliance Scoring | ~1s | CPU |
| Reporting | 1-5s | I/O |
| Archival | ~1s | I/O |
Total: ~25-70 seconds per host depending on open ports and enabled checks.
Large Networks: Parallelism allows scanning 1000 hosts in 10-20 minutes with proper rate limiting.
Error Handling
Each stage can fail independently:
- Fail-fast mode: Stop pipeline on first error
- Continue-on-error: Skip failed stage, continue with available data
- Retry logic: Transient failures retried with exponential backoff
Configuration:
engine:
fail_fast: false # Continue on errors
retry:
enabled: true
max_attempts: 3
backoff: exponential
Dependent Stages: If a stage fails, dependent stages are skipped:
Port Scan FAILED
↓
Banner Grab SKIPPED (no ports)
↓
Fingerprint SKIPPED (no banners)
Reporting stage always runs to capture partial results.
Next Steps
- DAG Engine - How stages are orchestrated
- Fingerprinting - Deep dive into Layer 4
- Workspace - Where results are stored
- Scan Profiles - Customizing pipeline behavior