Performance & Latency Targets
DocFirewall uses a dual-stage scan architecture. Fast scan runs on raw bytes without full parsing (sub-millisecond to ~20 ms). Deep scan runs structural analysis, ML detectors, and YARA rules (latency depends on which detectors are enabled).
Latency Targets by Profile
Targets were measured on a four-core CPU (no GPU) with synthetic representative documents. Run scripts/benchmark_throughput.py to reproduce on your hardware.
| Profile | Target P95 | Detectors active |
|---|---|---|
| lenient | < 100 ms | YARA + Aho-Corasick only |
| balanced (default) | < 500 ms | YARA + Aho-Corasick + all structural checks |
| strict | < 2,000 ms | Above + BERT classifier + steganography checks |
| strict + semantic NN | < 3,000 ms | Above + sentence-transformer cosine NN |
Note
BERT (enable_advanced_bert=True) adds ~50 ms per document on CPU at the default bert_max_chunks=20 setting. Reduce bert_max_chunks or run on GPU to lower this.
P50 / P95 / P99 by Format (balanced profile, no ML)
| Format | P50 | P95 | P99 | Notes |
|---|---|---|---|---|
| HTML | < 5 ms | < 15 ms | < 25 ms | Fastest — no ZIP overhead |
| RTF | < 8 ms | < 20 ms | < 35 ms | |
| XLSX | < 12 ms | < 40 ms | < 70 ms | ZIP + XML parse |
| PPTX | < 12 ms | < 40 ms | < 70 ms | |
| DOCX | < 15 ms | < 50 ms | < 90 ms | CustomXML + embeddings scan |
| PDF | < 20 ms | < 80 ms | < 150 ms | FlateDecode decompression adds variance |
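The P50, P95, and P99 columns are percentiles over per-document scan latencies. As a rough illustration of how such figures can be derived from raw timings, the sketch below uses the nearest-rank method. The helper names `percentile` and `latency_percentiles` are hypothetical, the sample timings are made up, and the benchmark script may use a different interpolation method.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Multiply before dividing to avoid float rounding on values such as 0.95 * 20.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_percentiles(samples_ms):
    """Summarise a list of latencies (in ms) as P50, P95, and P99."""
    return {f"P{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Illustrative timings only, not measured DocFirewall results.
samples = [float(i) for i in range(1, 21)]  # 1.0 ms .. 20.0 ms
summary = latency_percentiles(samples)
print(summary)
```

With only 20 samples the nearest-rank P99 is simply the slowest observation, so a single outlier decides it. Larger sample counts give a steadier tail estimate.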
Archive scanning (enable_archive_scan=True) adds latency proportional to the number and size of archive members. A 50-member ZIP with DOCX files adds roughly 50 × single-file scan time.
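The linear rule of thumb above can be turned into a quick estimate. This is a minimal sketch that assumes every member costs about one single-file scan and ignores nesting and decompression overhead. The function name `estimate_archive_overhead_ms` is hypothetical, and the per-file figures are the DOCX P50 and P95 bounds from the format table.

```python
def estimate_archive_overhead_ms(member_count: int, single_file_ms: float) -> float:
    """Linear estimate: each archive member costs about one single-file scan."""
    if member_count < 0 or single_file_ms < 0:
        raise ValueError("member_count and single_file_ms must be non-negative")
    return member_count * single_file_ms


# 50 DOCX members, using the DOCX P50 (15 ms) and P95 (50 ms) bounds as per-file cost.
typical = estimate_archive_overhead_ms(50, 15.0)
slow = estimate_archive_overhead_ms(50, 50.0)
print(f"typical ~{typical:.0f} ms, slow ~{slow:.0f} ms")
```

Both estimates exceed the 500 ms P95 target of the balanced profile, which suggests that archive-heavy workloads need their own latency budget or tighter member limits.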
Throughput Estimates
| Profile | Target throughput | Notes |
|---|---|---|
| balanced (heuristics only) | ≥ 20 docs/s per CPU core | No ML loading overhead |
| strict (BERT, CPU) | ≥ 1–2 docs/s per core | BERT inference dominates |
| strict (BERT, GPU) | ≥ 10–15 docs/s | Batch inference (bert_max_chunks tuned up) |
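For capacity planning, the per-core figures can be converted into a core count for a target load. The sketch below assumes throughput scales linearly with cores, which real deployments may not achieve because of memory bandwidth and I/O contention. The function name `cores_needed` is hypothetical.

```python
import math


def cores_needed(target_docs_per_s: float, per_core_docs_per_s: float) -> int:
    """Smallest whole number of cores that meets the target, assuming linear scaling."""
    if target_docs_per_s <= 0 or per_core_docs_per_s <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_docs_per_s / per_core_docs_per_s)


# Target of 100 docs/s, using the per-core floors from the table above.
balanced_cores = cores_needed(100, 20)  # balanced, heuristics only
strict_cores = cores_needed(100, 1)     # strict with BERT on CPU, low end of 1-2 docs/s
print(balanced_cores, strict_cores)
```

Under these assumptions the same load needs about 20 times as many cores on the strict CPU profile as on balanced.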
Tuning for Throughput
Pre-compile at init time
All YARA rules and regex patterns are compiled once at Scanner.__init__ — not per scan. Instantiate one Scanner per process/thread and reuse it across all scans.
scanner = Scanner(config=ScanConfig(profile="balanced"))
for path in document_queue:
    report = scanner.scan(path)  # reuses compiled rules and models
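When scans run on a thread pool, one way to keep a single instance per thread is thread-local storage. The sketch below uses a hypothetical `StubScanner` in place of the real Scanner so that it runs on its own. Whether the real Scanner can be shared safely across threads is not stated here, so the sketch gives each worker thread its own instance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubScanner:
    """Hypothetical stand-in for the real Scanner; counts how often it is constructed."""

    instances = 0
    _lock = threading.Lock()

    def __init__(self):
        # In the real Scanner, rule and pattern compilation would happen here.
        with StubScanner._lock:
            StubScanner.instances += 1

    def scan(self, path):
        return f"scanned:{path}"


_local = threading.local()


def get_scanner():
    """Return this thread's scanner, creating it on first use only."""
    scanner = getattr(_local, "scanner", None)
    if scanner is None:
        scanner = StubScanner()
        _local.scanner = scanner
    return scanner


def scan_one(path):
    return get_scanner().scan(path)


paths = [f"doc_{i}.docx" for i in range(100)]
with ThreadPoolExecutor(max_workers=4) as pool:
    reports = list(pool.map(scan_one, paths))

print(len(reports), "reports from", StubScanner.instances, "scanner instances")
```

The construction cost is paid at most once per worker thread instead of once per document.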
Limit BERT chunks
config = ScanConfig(
    enable_advanced_bert=True,
    bert_max_chunks=10,  # default 20; lower is faster but can miss mid-document injections
)
Reduce archive depth
config = ScanConfig(
    enable_archive_scan=True,
    limits=Limits(max_archive_depth=1, max_archive_members=20),
)
Disable OCR
enable_ocr_injection_scan is False by default. Leave it off unless your pipeline processes documents from multimodal sources where injection text may be embedded in screenshots.
Use strict profile selectively
Apply strict profile only to untrusted sources. Use balanced or lenient for documents from known-good internal pipelines where throughput is critical.
engine = PolicyEngine("policy.yaml") # hr-intake → strict, internal → balanced
scanner = Scanner(config=ScanConfig(), policy_engine=engine)
Running the Benchmark
# Default: 20 iterations × 6 formats × 2 profiles
python scripts/benchmark_throughput.py
# More iterations for stable P99 estimates
python scripts/benchmark_throughput.py --n 100 --output results.json
# Only benchmark the balanced profile (no ML model downloads needed)
python scripts/benchmark_throughput.py --profiles balanced
Sample output:
── Profile: balanced (n=20 iterations per format) ──
✅ html P50= 3.2ms P95= 8.1ms P99= 11.4ms 287 docs/s
✅ rtf P50= 6.8ms P95= 17.3ms P99= 24.6ms 135 docs/s
✅ xlsx P50= 11.4ms P95= 32.7ms P99= 49.2ms 82 docs/s
✅ pptx P50= 12.1ms P95= 38.4ms P99= 56.0ms 77 docs/s
✅ docx P50= 14.3ms P95= 47.2ms P99= 71.8ms 65 docs/s
✅ pdf P50= 18.9ms P95= 62.5ms P99= 104.3ms 49 docs/s
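To compare a run against the per-format P95 targets, the console lines can be parsed with a regular expression. This is a sketch that assumes the output looks like the sample above. The real spacing may differ, and the file written by --output is probably a better source, but its schema is not shown here. The names `parse_benchmark_lines` and `formats_over_target` are hypothetical.

```python
import re

# Matches lines such as: "✅ html P50= 3.2ms P95= 8.1ms P99= 11.4ms 287 docs/s"
LINE_RE = re.compile(
    r"(?P<fmt>\w+)\s+P50=\s*(?P<p50>[\d.]+)ms\s+P95=\s*(?P<p95>[\d.]+)ms"
    r"\s+P99=\s*(?P<p99>[\d.]+)ms\s+(?P<rate>\d+)\s+docs/s"
)

# P95 targets in ms, copied from the per-format table (balanced profile, no ML).
P95_TARGET_MS = {"html": 15, "rtf": 20, "xlsx": 40, "pptx": 40, "docx": 50, "pdf": 80}


def parse_benchmark_lines(text):
    """Return {format: {"p50", "p95", "p99", "docs_per_s"}} for every matching line."""
    rows = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows[m["fmt"]] = {
                "p50": float(m["p50"]),
                "p95": float(m["p95"]),
                "p99": float(m["p99"]),
                "docs_per_s": int(m["rate"]),
            }
    return rows


def formats_over_target(rows, targets=P95_TARGET_MS):
    """List formats whose measured P95 is not strictly below the target."""
    return sorted(f for f, r in rows.items() if f in targets and r["p95"] >= targets[f])


SAMPLE = """\
── Profile: balanced (n=20 iterations per format) ──
✅ html P50= 3.2ms P95= 8.1ms P99= 11.4ms 287 docs/s
✅ pdf P50= 18.9ms P95= 62.5ms P99= 104.3ms 49 docs/s
"""

rows = parse_benchmark_lines(SAMPLE)
print(formats_over_target(rows))
```

An empty list means every parsed format met its P95 target.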
Latency Breakdown (balanced profile, DOCX)
| Stage | Typical time | Driver |
|---|---|---|
| Magic-byte detection | < 0.1 ms | File header read |
| Fast scan | 2–8 ms | Zip member iteration, regex |
| Deep parse (Docling) | 10–40 ms | XML parse + text extraction |
| Detectors (heuristics) | 3–15 ms | ATS, obfuscation, embedding checks |
| YARA (built-in ruleset) | 1–5 ms | Pre-compiled binary match |
| Audit log write | < 1 ms | Append-only JSONL |
| Total (balanced) | ~20–70 ms | |
| BERT inference (strict) | +40–100 ms | DeBERTa v3, 20 windows |
| Semantic NN (strict+NN) | +15–30 ms | MiniLM-L6, cosine similarity |
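The total row can be sanity-checked by summing the per-stage ranges. The sketch below copies the figures from the table, treats each "< x" entry as the range 0 to x, and assumes the stages run sequentially. The dictionary keys and the `budget` helper are illustrative names.

```python
# (low, high) in ms per stage, taken from the table above.
STAGES_MS = {
    "magic_byte_detection": (0.0, 0.1),
    "fast_scan": (2.0, 8.0),
    "deep_parse": (10.0, 40.0),
    "heuristic_detectors": (3.0, 15.0),
    "yara": (1.0, 5.0),
    "audit_log_write": (0.0, 1.0),
}

# Additive costs that only apply to the strict profiles.
OPTIONAL_MS = {
    "bert": (40.0, 100.0),
    "semantic_nn": (15.0, 30.0),
}


def budget(extra=()):
    """Sum the low and high bounds of all base stages plus any named optional stages."""
    low = sum(lo for lo, _ in STAGES_MS.values())
    high = sum(hi for _, hi in STAGES_MS.values())
    for name in extra:
        lo, hi = OPTIONAL_MS[name]
        low += lo
        high += hi
    return round(low, 1), round(high, 1)


for label, extra in [
    ("balanced", ()),
    ("strict", ("bert",)),
    ("strict + NN", ("bert", "semantic_nn")),
]:
    low, high = budget(extra)
    print(f"{label}: {low:.1f}-{high:.1f} ms")
```

The balanced sum of 16.0 to 69.1 ms agrees with the rounded total of about 20 to 70 ms in the table.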