Fingerprinting System

Pentora's fingerprinting system identifies services, applications, versions, and operating systems through a layered detection approach that combines heuristics, protocol-specific probes, and confidence scoring.

Overview

Service fingerprinting goes beyond simple banner matching. Pentora employs a multi-stage approach:

Initial heuristics - Port number and basic banner
Protocol-specific probes - Targeted requests per protocol
Confidence scoring - Aggregate evidence from multiple sources
Multiple match support - Surface all detected technologies

Layered Detection

Layer 1: Initial Heuristics

First-pass identification using readily available information:

Port-Based Heuristics:

Port 22   → Likely SSH
Port 80   → Likely HTTP
Port 443  → Likely HTTPS
Port 3306 → Likely MySQL

Banner Matching:

"SSH-2.0-OpenSSH_8.2p1" → OpenSSH 8.2p1
"220 mail.example.com ESMTP Postfix" → Postfix SMTP

Confidence: Low to Medium (30-60%)

Port heuristics alone: 30-40% confidence
Simple banner match: 50-60% confidence

Layer 2: Protocol-Specific Probes

Targeted probes confirm and refine Layer 1 guesses:

HTTP/HTTPS Probes

GET / HTTP/1.1
Host: target.com
User-Agent: Pentora/1.0
Accept: */*
Connection: close

Analyzed Headers:

Server: Web server identification (e.g., nginx/1.18.0)
X-Powered-By: Application framework (e.g., PHP/7.4.3)
X-AspNet-Version: ASP.NET version
X-Generator: CMS identification (e.g., WordPress 5.8)

Content Analysis:

HTML comments: 
Meta tags: <meta name="generator" content="Drupal 9">
JavaScript frameworks: Detect React, Angular, Vue.js
CSS framework signatures: Bootstrap, Tailwind

Confidence: Medium to High (60-90%)

HTTPS/TLS Probes

TLS ClientHello → ServerHello + Certificate

Analyzed Fields:

Certificate Common Name (CN) and Subject Alternative Names (SAN)
Issuer information
TLS version (TLS 1.2, TLS 1.3)
Cipher suites offered and selected
Extensions (SNI, ALPN, Session tickets)

Identification:

JA3/JA3S fingerprints for TLS client/server
Certificate issuer patterns (Let's Encrypt, DigiCert)
Self-signed detection

Confidence: Medium to High (60-85%)

SMTP/IMAP/POP3 Probes

SMTP:

EHLO pentora.scanner

Response:

250-mail.example.com
250-PIPELINING
250-SIZE 52428800
250-STARTTLS
250 ENHANCEDSTATUSCODES

Identifies:

SMTP server (Postfix, Exim, Sendmail)
Supported extensions
TLS support

IMAP:

A001 CAPABILITY

Response:

* CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN AUTH=LOGIN
A001 OK Capability completed.

Confidence: High (75-90%)

FTP Probes

Connect → Read Banner
SYST → Get System Type

Banner:

220 ProFTPD 1.3.6 Server (Debian)

SYST Response:

215 UNIX Type: L8

Identifies:

FTP server (ProFTPD, vsftpd, Pure-FTPd)
Operating system
Anonymous access availability

Confidence: High (80-95%)

Redis Probes

INFO SERVER

Response:

# Server
redis_version:6.2.6
redis_mode:standalone
os:Linux 5.10.0-8-amd64 x86_64

Confidence: Very High (90-95%)

SSH Probes

Connect → Read SSH banner

Banner:

SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.3

Key Exchange Analysis:

Supported algorithms
Encryption methods
Compression

Confidence: High (80-90%)

Layer 3: Confidence Scoring

Aggregate evidence from multiple sources:

{
  "host": "192.168.1.100",
  "port": 80,
  "fingerprints": [
    {
      "match": "nginx",
      "version": "1.18.0",
      "confidence": 95,
      "source": "http_header",
      "evidence": "Server: nginx/1.18.0"
    },
    {
      "match": "ubuntu",
      "version": "20.04",
      "confidence": 80,
      "source": "http_header",
      "evidence": "Server: nginx/1.18.0 (Ubuntu)"
    },
    {
      "match": "php",
      "version": "7.4.3",
      "confidence": 90,
      "source": "http_header",
      "evidence": "X-Powered-By: PHP/7.4.3"
    },
    {
      "match": "wordpress",
      "version": "5.8",
      "confidence": 85,
      "source": "http_content",
      "evidence": "<meta name=\"generator\" content=\"WordPress 5.8\">"
    }
  ]
}

Confidence Levels:

0-30%: Weak guess (port heuristic only)
30-60%: Low confidence (single weak signal)
60-80%: Medium confidence (single strong signal)
80-95%: High confidence (multiple corroborating signals)
95-100%: Very high confidence (explicit version strings)

Layer 4: Multiple Match Support

Pentora surfaces all detected technologies, not just the primary service:

Web Server Stack Example:

Port 443/tcp open
├── nginx 1.18.0 (Web Server, 95% confidence)
├── PHP 7.4.3 (Runtime, 90% confidence)
├── WordPress 5.8 (CMS, 85% confidence)
├── MySQL 8.0 (Database, inferred from PHP/WordPress, 70% confidence)
└── Ubuntu 20.04 (OS, 80% confidence)

Benefits:

Complete technology stack visibility
Better vulnerability correlation
Comprehensive asset inventory

Fingerprint Database

Builtin Rules

Compiled into Pentora binary:

# builtin fingerprints
fingerprints:
  - name: openssh
    category: ssh
    patterns:
      - type: banner
        regex: 'SSH-2\.0-OpenSSH_([0-9.]+)'
        version_group: 1
      - type: banner
        regex: 'SSH-2\.0-OpenSSH_([0-9.]+p[0-9]+) Ubuntu-([0-9.]+)'
        version_group: 1
        os: ubuntu
        os_version_group: 2

  - name: nginx
    category: http
    patterns:
      - type: http_header
        header: Server
        regex: 'nginx/([0-9.]+)'
        version_group: 1
        confidence: 95

  - name: apache
    category: http
    patterns:
      - type: http_header
        header: Server
        regex: 'Apache/([0-9.]+)'
        version_group: 1
        confidence: 95

Remote Catalogs

Sync updated fingerprints from remote repository:

# Sync from default catalog
pentora fingerprint sync

# Sync from custom URL
pentora fingerprint sync --url https://custom.repo/fingerprints.yaml

# Show available catalogs
pentora fingerprint list-catalogs

Cached Location: <workspace>/cache/fingerprints/

Update Frequency: Configurable TTL (default: 7 days)

fingerprint:
  cache:
    ttl: 7d
    auto_sync: true
  catalog:
    remote_url: https://catalog.pentora.io/fingerprints.yaml

Custom Fingerprints

Add organization-specific rules:

# ~/.config/pentora/fingerprints/custom.yaml
fingerprints:
  - name: internal_webapp
    category: http
    patterns:
      - type: http_header
        header: X-App-Name
        regex: 'InternalApp/([0-9.]+)'
        version_group: 1
        confidence: 95

  - name: custom_ssh_banner
    category: ssh
    patterns:
      - type: banner
        regex: 'SSH-2\.0-CustomSSH_([0-9.]+)'
        version_group: 1
        confidence: 90

Load custom rules:

pentora scan --targets 192.168.1.100 --fingerprint-rules custom.yaml

See Custom Fingerprints Guide for rule syntax.

Probe Execution

Probe Definitions

Probes defined in YAML catalog:

# pkg/fingerprint/data/probes.yaml
probes:
  - name: http_get
    protocol: http
    trigger:
      - port: 80
      - port: 8080
      - service_hint: http
    request: |
      GET / HTTP/1.1
      Host: {target}
      User-Agent: Pentora/1.0
      Accept: */*
      Connection: close

    timeout: 5s
    max_size: 1MB

  - name: https_get
    protocol: https
    trigger:
      - port: 443
      - port: 8443
      - service_hint: https
    tls: true
    request: |
      GET / HTTP/1.1
      Host: {target}
      User-Agent: Pentora/1.0
      Accept: */*
      Connection: close

    timeout: 10s

  - name: smtp_ehlo
    protocol: smtp
    trigger:
      - port: 25
      - port: 587
      - service_hint: smtp
    request: "EHLO pentora.scanner\r\n"
    timeout: 5s

  - name: imap_capability
    protocol: imap
    trigger:
      - port: 143
      - port: 993
      - service_hint: imap
    request: "A001 CAPABILITY\r\n"
    timeout: 5s

  - name: redis_info
    protocol: redis
    trigger:
      - port: 6379
      - service_hint: redis
    request: "INFO SERVER\r\n"
    timeout: 3s

Trigger Logic

Probes execute based on:

Port number: Standard ports trigger specific probes
Service hints: Layer 1 guesses influence probe selection
Explicit requests: User specifies protocols to probe

Example Flow:

Port 443 detected open
  ↓
Layer 1: Port heuristic → Likely HTTPS (40% confidence)
  ↓
Trigger: https_get, tls_fingerprint probes
  ↓
Execute probes → Collect evidence
  ↓
Layer 2: Parse HTTP headers, TLS certificate
  ↓
Fingerprint match: nginx 1.18.0 (95% confidence)

Probe Priority

When multiple protocols possible, probe in order:

TLS/SSL: Always probe first on common TLS ports
HTTP/HTTPS: High priority for web services
Email protocols: SMTP, IMAP, POP3
Databases: Redis, MySQL, PostgreSQL, MongoDB
Other services: FTP, SSH, Telnet

Max protocols per port: Configurable (default: 3)

fingerprint:
  max_protocols: 3  # Stop after 3 successful identifications

Response Handling

Each probe captures:

Raw response: Complete protocol output
Timing: Response latency
Status: Success, timeout, error
Evidence fields: Extracted data (headers, banners, etc.)

Stored in artifacts/banners/:

168.1.100-80-http.txt
168.1.100-443-https.txt
168.1.100-22-ssh.txt

Fingerprint Matching

Rule Processing

For each captured response:

Select applicable rules: Match protocol and category
Apply patterns: Test regex against response
Extract version: Capture groups for version/OS
Score confidence: Based on pattern specificity
Deduplicate: Merge redundant matches

Regex Patterns

Named capture groups extract version information:

patterns:
  - type: banner
    regex: 'Apache/(?P<version>[0-9.]+) \((?P<os>[^)]+)\)'
    confidence: 90

Match: Apache/2.4.41 (Ubuntu)

Extracted:

version: 2.4.41
os: Ubuntu
confidence: 90

Confidence Calculation

Base confidence from pattern, adjusted by:

+10%: Multiple corroborating signals +5%: Explicit version string (not just product name) -10%: Ambiguous match (many possible products) -20%: Port heuristic only

Example:

Base pattern confidence: 85
+ Version string present: +5
+ HTTP header match: +10
= Final confidence: 100 (capped at 100)

Deduplication

Merge redundant matches:

Before:
- nginx 1.18.0 (http_header, 95%)
- nginx 1.18.0 (http_content, 80%)

After:
- nginx 1.18.0 (http_header, http_content, 95%)

Highest confidence retained, sources combined.

CLI Integration

Basic Fingerprinting

Enabled by default in standard scans:

pentora scan --targets 192.168.1.100

Output includes fingerprints:

192.168.1.100:80 open
  Service: nginx 1.18.0 (95% confidence)
  OS: Ubuntu 20.04 (80% confidence)
  Stack: PHP 7.4.3 (90% confidence)

Disable Fingerprinting

Skip fingerprinting for faster scans:

pentora scan --targets 192.168.1.100 --no-fingerprint

Only port states reported, no service identification.

Fingerprint Cache

Use cached fingerprint database:

# Enable caching (faster, may be outdated)
pentora scan --targets 192.168.1.100 --fingerprint-cache

# Force refresh
pentora fingerprint sync --force

Custom Rules

Load additional rules:

pentora scan --targets 192.168.1.100 --fingerprint-rules /path/to/custom.yaml

Verbose Output

Show all fingerprint matches and confidence scores:

pentora scan --targets 192.168.1.100 --verbose

Performance Considerations

Probe Overhead

Each protocol probe adds latency:

Simple probe (Redis INFO): ~10-50ms
HTTP GET: ~50-200ms
HTTPS with TLS handshake: ~100-500ms
Complex multi-stage probe: ~200-1000ms

Total fingerprinting time: 5-30 seconds per host depending on open ports.

Optimization Strategies

1. Limit Probe Count

fingerprint:
  max_protocols: 2  # Stop after 2 successful IDs per port

2. Parallel Probing

Probe multiple ports simultaneously:

fingerprint:
  probe_concurrency: 10  # Probe up to 10 ports in parallel

3. Cache Results

Reuse fingerprints for known hosts:

fingerprint:
  cache:
    enabled: true
    ttl: 24h

4. Skip Low-Priority Ports

Focus on interesting services:

fingerprint:
  skip_ports:
    - 1-1023  # Skip well-known ports if time-constrained

Memory Usage

Fingerprint catalog loaded into memory:

Builtin rules: ~1-5 MB
Remote catalog: ~5-20 MB
Custom rules: Variable

Per-scan memory: ~10-100 KB per host depending on open ports and responses.

Integration with Asset Profiling

Fingerprints feed into asset profiling:

Fingerprints:
  - nginx 1.18.0
  - PHP 7.4.3
  - WordPress 5.8
  - Ubuntu 20.04

Asset Profile:
  Device Type: Server
  OS: Linux (Ubuntu 20.04)
  Primary Function: Web Server
  Applications:
    - nginx (Web Server)
    - PHP (Runtime)
    - WordPress (CMS)
  Risk Factors:
    - Publicly accessible
    - CMS detected (attack surface)
    - PHP version (check for CVEs)

See Asset Profiling for details.

Troubleshooting

No Fingerprints Detected

Port 80 open, but service unknown

Causes:

Custom/obscure service
Banner stripped for security
Probe timeout

Solutions:

Increase timeout: fingerprint.timeout: 10s
Add custom rule for service
Use manual banner grab: nc target 80

Incorrect Fingerprint

Port 8080 identified as Tomcat, but actually Jetty

Solutions:

Check probe output: Review artifacts/banners/
Add higher-confidence rule for Jetty
Report false positive to Pentora team

Probe Timeouts

WARN Fingerprint probe timeout on 192.168.1.100:443

Causes:

Slow network
Rate limiting
Firewall interference

Solutions:

Increase timeout: fingerprint.timeout: 15s
Reduce concurrency: fingerprint.probe_concurrency: 5
Retry failed probes: fingerprint.retry: 2

Next Steps

Scan Pipeline - How fingerprinting fits in the pipeline
Custom Fingerprints - Writing custom rules
Module System - Fingerprint module internals
Vulnerability Assessment - Using fingerprints for CVE matching

Overview​

Layered Detection​

Layer 1: Initial Heuristics​

Layer 2: Protocol-Specific Probes​

HTTP/HTTPS Probes​

HTTPS/TLS Probes​

SMTP/IMAP/POP3 Probes​

FTP Probes​

Redis Probes​

SSH Probes​

Layer 3: Confidence Scoring​

Layer 4: Multiple Match Support​

Fingerprint Database​

Builtin Rules​

Remote Catalogs​

Custom Fingerprints​

Probe Execution​

Probe Definitions​

Trigger Logic​

Probe Priority​

Response Handling​

Fingerprint Matching​

Rule Processing​

Regex Patterns​

Confidence Calculation​

Deduplication​

CLI Integration​

Basic Fingerprinting​

Disable Fingerprinting​

Fingerprint Cache​

Custom Rules​

Verbose Output​

Performance Considerations​

Probe Overhead​

Optimization Strategies​

1. Limit Probe Count​

2. Parallel Probing​

3. Cache Results​

4. Skip Low-Priority Ports​

Memory Usage​

Integration with Asset Profiling​

Troubleshooting​

No Fingerprints Detected​

Incorrect Fingerprint​

Probe Timeouts​

Next Steps​

Overview

Layered Detection

Layer 1: Initial Heuristics

Layer 2: Protocol-Specific Probes

HTTP/HTTPS Probes

HTTPS/TLS Probes

SMTP/IMAP/POP3 Probes

FTP Probes

Redis Probes

SSH Probes

Layer 3: Confidence Scoring

Layer 4: Multiple Match Support

Fingerprint Database

Builtin Rules

Remote Catalogs

Custom Fingerprints

Probe Execution

Probe Definitions

Trigger Logic

Probe Priority

Response Handling

Fingerprint Matching

Rule Processing

Regex Patterns

Confidence Calculation

Deduplication

CLI Integration

Basic Fingerprinting

Disable Fingerprinting

Fingerprint Cache

Custom Rules

Verbose Output

Performance Considerations

Probe Overhead

Optimization Strategies

1. Limit Probe Count

2. Parallel Probing

3. Cache Results

4. Skip Low-Priority Ports

Memory Usage

Integration with Asset Profiling

Troubleshooting

No Fingerprints Detected

Incorrect Fingerprint

Probe Timeouts

Next Steps