Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.3.0] - 2026-03-28
Added
- Advanced Local ML Scanners: Introduced offline machine learning / NLP modules.
- Aho-Corasick Algorithm: Implemented a finite-state automaton for O(n) exact string matching on known T4_PROMPT_INJECTION payloads.
- Local BERT Pipeline: Embedded a deep learning text-classification pipeline (huggingface, sentence-transformers) for detecting zero-day polymorphic prompt and ATS manipulations.
- TF-IDF & Jaccard Similarity: Leveraged scikit-learn to identify keyword stuffing and statistical term deviations (T5_RANKING_MANIPULATION and T9_ATS_MANIPULATION).
- Shannon Entropy Scoring: Integrated entropy calculations to detect hardcoded API keys, passwords, and data exfiltration streams.
- Dynamic Feature Flags: Added granular, explicit opt-ins via ScanConfig (enable_advanced_ahocorasick, enable_advanced_bert, etc.), safely defaulting to False for backwards compatibility.
- Examples: Included isolated feature scripts (08_advanced_ml_scanners.py) and fully stacked maximum security scripts (09_recommended_advanced_scan.py).
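For readers unfamiliar with the scoring techniques named above, the following is a minimal standard-library sketch of Shannon entropy and Jaccard similarity. It is illustrative only and is not the doc_firewall implementation; the function names `shannon_entropy` and `jaccard_similarity` are hypothetical, and the whitespace tokenization is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    Hypothetical helper: random-looking strings (keys, tokens) tend to
    score higher than natural-language text of similar length.
    """
    if not text:
        return 0.0
    total = len(text)
    # Sum p * log2(1/p) over the observed character frequencies.
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(text).values()
    )


def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over lowercased whitespace tokens.

    Hypothetical helper: the tokenization here is an assumption, not
    the project's actual preprocessing.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

A scanner built along these lines would compare each score against a tuned threshold. Suitable threshold values depend on the corpus and are not specified here.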
Changed
- Shifted project distribution state to Development Status :: 5 - Production/Stable.
- Fixed several legacy test expectations that failed under optimized false-positive bounds tuning.
- Resolved the top-level GitHub Actions scorecard vulnerability by adopting strict job-level contents permissions on the PyPI build matrix. Atheris pipeline dependencies synchronized/bumped to 3.0.0.
[0.2.0] - 2026-03-08
Added
- PPTX Support: Full layout mapping, recursive embedded object tracking, and metadata extraction for Microsoft PowerPoint presentations.
- XLSX Support: Full spreadsheet parsing, cell value extraction, and DDE link (Active Content) detection for Microsoft Excel files.
- T2 (Active Content): Refined scanning capabilities to natively track dynamic external payload queries in .pptx and .xlsx.
- T3 (Obfuscation): Added dynamic ratio thresholding for hidden zero-width Unicode characters specific to nested cells/slides.
- T8 (Metadata Injection): Added deep inspection support to flag embedded SQL queries and malicious command strings hidden in format properties.
- Overlapping Threat Architecture: Allowed internal detection schemas to transparently track dual-state threat classifications (e.g., T9_ATS_MANIPULATION when utilizing T3_OBFUSCATION).
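As a rough illustration of the ratio thresholding described for T3 above, the sketch below computes the share of zero-width characters in a string and compares it against a cutoff. It is not the project's code: the names `ZERO_WIDTH`, `zero_width_ratio`, and `is_obfuscated`, the character set, and the default threshold of 0.1 are all assumptions chosen for the example.

```python
# Hypothetical set of zero-width code points; a real detector may use
# a different or larger set.
ZERO_WIDTH = frozenset("\u200b\u200c\u200d\u2060\ufeff")


def zero_width_ratio(text: str) -> float:
    """Return the fraction of characters in `text` that are zero-width."""
    if not text:
        return 0.0
    hidden = sum(1 for ch in text if ch in ZERO_WIDTH)
    return hidden / len(text)


def is_obfuscated(text: str, threshold: float = 0.1) -> bool:
    """Flag `text` when its zero-width ratio exceeds `threshold`.

    The default threshold is an illustrative assumption, not a value
    taken from doc_firewall.
    """
    return zero_width_ratio(text) > threshold
```

Using a ratio rather than an absolute count means a short cell or slide fragment with a few hidden characters can be flagged, while a long document with the same count is not.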
Changed
- Refactored Scanner() initialization to consistently load the complete suite of detector arrays globally (resolving missing isolated threat models).
- Enhanced exact threshold scaling across text_obfuscation.py to substantially reduce false negatives.
[0.1.0] - 2026-02-22
Added
- Initial Open Source release of the doc_firewall scanning engine.
- Supported core structures: Microsoft Word (.docx) and Adobe PDF (.pdf).
- Configured 9 Primary Threat Models (T1 through T9).
- Incorporated ClamAV integration functionality.
- Shipped MkDocs documentation bindings.