CVE-2025-66516: Critical XXE Vulnerability Exposes Apache Tika Deployments
A critical vulnerability, CVE-2025-66516 (CVSS 10.0), has been identified in Apache Tika, affecting how the framework processes PDF files containing XFA (XML Forms Architecture) data. The vulnerability resides in tika-core, which means any system using Tika’s default parsing behavior remains vulnerable even if the PDF parser module was previously patched.
No special configuration or insecure application code is required; simply ingesting a malicious PDF is enough to trigger the exploit. In vulnerable versions, Tika processes attacker-controlled XFA content in a way that allows unauthorized access to sensitive files or internal resources during parsing, making this a high-impact issue for any workflow that handles user-supplied PDFs.
What Is CVE-2025-66516?
Risk Analysis
Severity: CRITICAL
CVSSv4.0: Base Score: 10.0 CRITICAL
Vector: CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H
Exploit available in public: No
Exploit complexity: Low
CVE-2025-66516 is a critical XXE (XML External Entity) vulnerability caused by improper handling of XML embedded within the XFA layers of PDF files. When Apache Tika processes a malicious PDF containing attacker-controlled XFA data, the core parser does not properly restrict external entity resolution. This allows the attacker to embed XML entities that reference sensitive filesystem paths, internal URLs, cloud metadata services, or other restricted resources.
The XXE vulnerability also acts as a path traversal vector: because an entity can name any path, attackers can read arbitrary files that the Tika process has permission to access during PDF processing.
All of the following versions are impacted:
- tika-core: 1.13 through 3.2.1
- tika-parser-pdf-module: 2.0.0 through 3.2.1
- tika-parsers (1.x series): 1.13 through 1.28.5, where the PDF parser was bundled
Any application running these versions is exposed, regardless of how securely the surrounding code is written.
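As a quick way to triage a dependency inventory, the affected ranges can be encoded in a small check. The following is a minimal sketch, not an official tool: the class and method names are hypothetical, the ranges are those published in the advisory (tika-core 1.13 to 3.2.1, tika-parser-pdf-module 2.0.0 to 3.2.1, tika-parsers 1.13 to 1.28.5), and the version parsing assumes plain dotted numeric versions, ignoring any qualifier after a hyphen.

```java
import java.util.Arrays;

class AffectedVersionCheck {

    // Parse "3.2.1" or "2.0.0-BETA" into numeric segments; qualifiers are ignored (assumption).
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        return Arrays.stream(numeric.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Compare two dotted versions segment by segment, treating missing segments as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean inRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) <= 0;
    }

    // Returns true when the artifact/version pair falls inside an affected range.
    static boolean isAffected(String artifact, String version) {
        switch (artifact) {
            case "tika-core":
                return inRange(version, "1.13", "3.2.1");
            case "tika-parser-pdf-module":
                return inRange(version, "2.0.0", "3.2.1");
            case "tika-parsers":
                return inRange(version, "1.13", "1.28.5");
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tika-core 3.2.1 affected: " + isAffected("tika-core", "3.2.1"));
        System.out.println("tika-core 3.2.2 affected: " + isAffected("tika-core", "3.2.2"));
    }
}
```

Because the flaw lives in tika-core, the tika-core version is the one that matters most: a patched PDF module sitting on top of a vulnerable tika-core should still be treated as exposed.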
How the XXE Exploit Unfolds in Apache Tika
When Tika encounters a PDF with embedded XFA, it processes the XML to extract text. In vulnerable versions, this XML is parsed without properly restricting external entity resolution. A malicious PDF can therefore:
- Force Tika to read sensitive files from the server
- Trigger outbound network calls to internal systems
- Leak cloud metadata or credentials
- Potentially disrupt document processing services
What makes this vulnerability especially dangerous is that it requires no unsafe code or special configuration. Simply using Tika’s standard parsing APIs, such as AutoDetectParser or new Tika().parseToString(), automatically triggers the vulnerable pathway.
Any workflow that processes PDFs by default, including search indexing, ETL pipelines, content classification, document preview generation, automated text extraction, and metadata scanning, will parse the malicious XFA content without any visible sign. Because these operations typically run in the background, the attack executes silently, giving attackers the opportunity to extract sensitive files or probe internal systems long before anyone notices something is wrong.
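To make "restricting external entity resolution" concrete, here is a minimal sketch using only the JDK's built-in XML APIs. It is an illustration of the general XXE hardening technique, not Tika's actual patch; the class name, method names, and sample payload are hypothetical. With DOCTYPE declarations disallowed, a document that tries to define an external entity is rejected before any file or URL is touched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class XxeHardeningSketch {

    // Build a parser that refuses DOCTYPE declarations, and therefore external entities.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the XML parses under the hardened settings, false if it is rejected.
    static boolean parsesSafely(String xml) throws ParserConfigurationException, IOException {
        try {
            hardenedBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException rejected) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical XFA-like payload that tries to pull a local file through an entity.
        String malicious = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE form [<!ENTITY leak SYSTEM \"file:///etc/passwd\">]>"
                + "<form><field>&leak;</field></form>";
        String benign = "<form><field>hello</field></form>";

        System.out.println("benign accepted: " + parsesSafely(benign));
        System.out.println("malicious accepted: " + parsesSafely(malicious));
    }
}
```

The JDK's default error handler may print a "DOCTYPE is disallowed" message to stderr when the malicious document is rejected; that is expected. A parser left at its defaults would instead attempt to resolve the entity, which is the behavior an XXE payload relies on.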
Preventing XXE Exploitation in Apache Tika
The most reliable fix is upgrading to tika-core 3.2.2 or later, where external entity resolution is properly restricted. This closes the XXE attack path and should be applied as soon as possible. To minimize risk until a full patch is applied, the following measures help protect applications that rely on Apache Tika.
- Disable PDF Parsing if Upgrade Is Delayed – If patching cannot happen immediately, you can temporarily disable PDF parsing through a custom tika-config.xml. This prevents Tika from processing potentially malicious PDFs and avoids triggering the vulnerable code path.
- Preprocess PDFs Before Sending Them to Tika – Using tools like qpdf or pdfid.py to scan incoming PDFs helps identify XFA structures or /AcroForm markers. Rejecting such files early greatly reduces the chance of XXE exploitation.
- Enforce Strong Network Egress Controls – Strict outbound network restrictions limit damage even if XXE is triggered. Blocking access to metadata services, internal APIs, and sensitive endpoints prevents attackers from retrieving data through external entity calls.
- Isolate Document-Processing Workloads – Long term, Tika should run in isolated, sandboxed environments with limited file system and network access. Treating document parsing as an untrusted workload helps contain any future parsing vulnerabilities.
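The pre-processing idea from the list above can also be approximated inside a Java ingestion service. The sketch below is a coarse pre-filter under stated assumptions, not a replacement for qpdf or pdfid.py: the class and method names are hypothetical, and it only scans the raw bytes for the literal /XFA and /AcroForm names. It can miss markers hidden inside compressed object streams or written with hex-escaped name characters, so decompressing the file first (for example with qpdf) or upgrading Tika remains the safer path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class XfaPreScreen {

    // Scan raw PDF bytes for form-related name objects.
    // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost in decoding.
    static boolean hasFormMarkers(byte[] pdfBytes) {
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/XFA") || raw.contains("/AcroForm");
    }

    // Convenience wrapper for files on disk; reads the whole file into memory (assumes modest sizes).
    static boolean shouldReject(Path pdfFile) throws IOException {
        return hasFormMarkers(Files.readAllBytes(pdfFile));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XfaPreScreen <file.pdf>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(shouldReject(file)
                ? "REJECT: form markers found, do not pass to Tika"
                : "OK: no /XFA or /AcroForm markers in raw bytes");
    }
}
```

Rejecting every file with /AcroForm is deliberately conservative and will also block ordinary fillable forms; a deployment that must accept those could narrow the check to /XFA only, at the cost of a thinner safety margin.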
AppTrana WAAP Coverage for CVE-2025-66516
AppTrana WAAP has protected against exploitation of this vulnerability since day 0, using advanced inspection rules to detect and block malicious XFA-based XML payloads inside PDFs before they reach Apache Tika. The platform identifies harmful structures such as embedded external entities, suspicious XML signatures, and abnormal PDF patterns, ensuring XXE attacks are stopped at the edge even when Tika is running a vulnerable version.
Similar to pre-processing tools like qpdf or pdfid.py that flag XFA or /AcroForm markers, AppTrana performs deep file inspection automatically during upload to prevent malicious PDFs from entering the parsing workflow. In addition to inbound filtering, AppTrana restricts unauthorized outbound calls that XXE exploits typically attempt, blocking access to internal URLs or metadata services.
AppTrana’s managed security team continues to track this vulnerability and emerging exploit techniques, with additional protections deployed as new intelligence or PoCs become available.
December 8, 2025



