The background of this issue appears to be a gap in the ISO 32000-1 PDF specification; iText 5.5 by now follows a verbatim interpretation of ISO 32000-1.
In ISO 32000-2 this has meanwhile been clarified.
The missing information
Before PDF became an ISO standard, PDF processor implementers followed the lead of Adobe Acrobat wherever the PDF documentation was unclear or even stated otherwise.
When Adobe Acrobat encrypted and signed a PDF, it did not encrypt the binary string containing the signature container. Thus, other PDF tools did not encrypt the signature container in this case either.
In 2008 PDF became an ISO standard. According to ISO 32000-1,
Encryption applies to all strings and streams in the document's PDF file, with the following exceptions:
- The values for the ID entry in the trailer
- Any strings in an Encrypt dictionary
- Any strings that are inside streams such as content streams and compressed object streams, which themselves are encrypted
(ISO 32000-1, section 7.6 - Encryption)
According to this, in an encrypted and signed PDF the binary string containing the embedded signature container would also be encrypted.
In 2017, part 2 of ISO 32000 was published. In it, the enumeration above is extended by a new entry:
- Any hexadecimal strings representing the value of the Contents key in a Signature dictionary
(ISO 32000-2, section 7.6 - Encryption)
According to this, in an encrypted and signed PDF the binary string containing the embedded signature container would not be encrypted.
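To make the difference between the two readings concrete, here is a minimal sketch that encodes the two exception lists quoted above as a predicate. The class, enum, and method names are hypothetical and exist only for this illustration; they are not part of iText or of either specification, and the sketch ignores everything else that influences encryption (crypt filters, for example).

```java
final class StringEncryptionRules {

    /** The two parts of ISO 32000 compared in the text. */
    enum SpecPart { ISO_32000_1, ISO_32000_2 }

    /** Hypothetical classification of where a PDF string occurs. */
    enum StringLocation {
        TRAILER_ID,               // values of the ID entry in the trailer
        ENCRYPT_DICTIONARY,       // any string in an Encrypt dictionary
        INSIDE_ENCRYPTED_STREAM,  // covered by the encryption of the enclosing stream
        SIGNATURE_CONTENTS,       // Contents value of a signature dictionary
        ANYWHERE_ELSE
    }

    /** Returns true if, read verbatim, the given spec part requires the string itself to be encrypted. */
    static boolean isIndividuallyEncrypted(StringLocation location, SpecPart spec) {
        switch (location) {
            case TRAILER_ID:
            case ENCRYPT_DICTIONARY:
            case INSIDE_ENCRYPTED_STREAM:
                // Listed as exceptions in both parts.
                return false;
            case SIGNATURE_CONTENTS:
                // Only ISO 32000-2 lists this as an exception.
                return spec == SpecPart.ISO_32000_1;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        for (SpecPart spec : SpecPart.values()) {
            System.out.println(spec + ": signature Contents encrypted = "
                    + isIndividuallyEncrypted(StringLocation.SIGNATURE_CONTENTS, spec));
        }
    }
}
```

The only case in which the two parts disagree is SIGNATURE_CONTENTS, which is exactly the case this issue is about.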
The code retrieving the signature container in iText
In the earliest iText code for retrieving signature containers that I could find, the binary string containing the signature container is assumed never to be encrypted:
pk = new PdfPKCS7(contents.getOriginalBytes(), provider);
(commit ffc70db dated November 5th, 2004, commented as "paulo version 139")
The method getOriginalBytes retrieves the bytes of a PDF string exactly as they are stored in the PDF; no decryption is ever applied.
Later on the code was moved two or three times without change.
When PAdES support was added, only the subfilter argument was added here; the original bytes were still used:
pk = new PdfPKCS7(contents.getOriginalBytes(), sub, provider);
(commit 691281c, dated August 31st, 2012, commented as "Verify a CAdES signature")
But in early 2017 it was changed to the code you found:
if(!reader.isEncrypted()){
pk = new PdfPKCS7(contents.getOriginalBytes(), sub, provider);
}else{
pk = new PdfPKCS7(contents.getBytes(),sub,provider);
}
(commit 0b852d7 dated February 9th, 2017, commented as "Handle encrypted content stream when verifying Signatures SUP-1783")
Apparently the support issue SUP-1783 triggered the switch to following a verbatim interpretation of ISO 32000-1.
In iText 7 we have
pk = new PdfPKCS7(PdfEncodings.convertToBytes(contents.getValue(), null), sub, provider);
(commit ae73650, dated October 11th, 2015, commented as "Added classes for support LTV, Ocsp, CRL and TSA.")
but the contents object here has previously been marked as unencrypted
contents.markAsUnencryptedObject();
(commit 6dfb206, dated April 24th, 2018, commented as "Avoid exception in SignatureUtil when a read-only document was passed")
and in iText 7 this makes contents.getValue() return the original bytes. So iText 7 supports the PDF 2.0 clarification.
What should be done?
In my opinion, considering the verbatim ISO 32000-1 interpretation, one should accept either encrypted or unencrypted signature containers, but in the light of the ISO 32000-2 wording one should generate only unencrypted ones.
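One possible way to accept both variants when reading is sketched below. It is only a heuristic under stated assumptions, not what iText does: it assumes the signature container is a CMS structure, which is a DER SEQUENCE and therefore starts with the byte 0x30, and it simply prefers whichever candidate looks like one. The class and method names are hypothetical. In iText 5 terms, the two candidates would correspond to contents.getOriginalBytes() and contents.getBytes() from the snippets quoted above.

```java
final class SignatureContentsChooser {

    /** Heuristic check: a CMS container is a DER SEQUENCE, so its first byte is 0x30. */
    static boolean looksLikeDerSequence(byte[] data) {
        return data != null && data.length >= 2 && (data[0] & 0xFF) == 0x30;
    }

    /**
     * Picks the byte array that looks like a signature container.
     *
     * @param originalBytes  the string bytes exactly as stored in the PDF
     * @param decryptedBytes the same string after applying the document's decryption
     */
    static byte[] chooseContainerBytes(byte[] originalBytes, byte[] decryptedBytes) {
        // Prefer the undecrypted bytes: this is the ISO 32000-2 (and Acrobat) behaviour.
        if (looksLikeDerSequence(originalBytes)) {
            return originalBytes;
        }
        // Fall back to the verbatim ISO 32000-1 reading.
        if (looksLikeDerSequence(decryptedBytes)) {
            return decryptedBytes;
        }
        throw new IllegalArgumentException("Neither candidate looks like a DER-encoded container");
    }

    public static void main(String[] args) {
        byte[] container = {0x30, (byte) 0x82, 0x01, 0x00}; // start of a DER SEQUENCE
        byte[] garbage = {(byte) 0x9f, 0x12, 0x55, 0x01};   // stand-in for wrongly (de)crypted data
        System.out.println(chooseContainerBytes(garbage, container) == container);
    }
}
```

A first-byte check can be fooled, since wrongly decrypted data starts with 0x30 by chance about once in 256 cases, so a robust implementation would rather attempt to parse both candidates. Subfilters whose Contents value is not a CMS container would also need a different check.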