The files input.pdf and output.pdf that the OP originally presented did not allow the issue to be reproduced; they did not even seem to match each other. The original answer therefore essentially demonstrated that the issue could not be reproduced.
The second set of files, Test1.pdf and Test2.pdf, did allow the issue to be reproduced, though, which gave rise to the updated answer.
Updated answer referring to the OP's second set of sample files
There is indeed an issue in the current iText clean-up code (up to 5.5.8): for tagged files, some PdfContentByte methods used there introduce extra instructions into the content stream. These instructions damage the stream and, in PDF viewers that ignore the damage, relocate some text.
In more detail:
PdfCleanUpContentOperator.writeTextChunks used canvas.setCharacterSpacing(0) and canvas.setWordSpacing(0) to initially set the character and word spacing to 0. Unfortunately, for tagged files these methods check whether the canvas under construction is currently inside a text object and, if not, start one. This check depends on a local flag set by beginText, but during clean-up text objects are not started using that method. Thus, writeTextChunks here inserts an extra "BT 1 0 0 1 0 0 Tm" sequence, which damages the stream and relocates the following text.
    private void writeTextChunks(Map<Integer, Float> structuredTJoperands, List<PdfCleanUpContentChunk> chunks, PdfContentByte canvas,
                                 float characterSpacing, float wordSpacing, float fontSize, float horizontalScaling) throws IOException {
        canvas.setCharacterSpacing(0);
        canvas.setWordSpacing(0);
        ...
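The flag mechanism can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not iText source: the class TaggedCanvasModel and its methods (appendRaw, writeCharacterSpacingRaw and so on) are hypothetical names, and only the flag logic described above is modelled.

```java
// Simplified model of the behaviour described above. NOT iText source code:
// TaggedCanvasModel and all of its methods are hypothetical stand-ins.
final class TaggedCanvasModel {
    private final StringBuilder content = new StringBuilder();
    private final boolean tagged;
    // Only beginText() sets this flag; a raw copy of "BT" does not.
    private boolean inText = false;

    TaggedCanvasModel(boolean tagged) {
        this.tagged = tagged;
    }

    /** Regular way to open a text object; updates the flag. */
    void beginText() {
        content.append("BT\n");
        inText = true;
    }

    /** Models instructions copied verbatim during clean-up; the flag stays untouched. */
    void appendRaw(String instructions) {
        content.append(instructions).append('\n');
    }

    /** Models the setter: on a tagged canvas it opens a text object if the flag says none is open. */
    void setCharacterSpacing(float value) {
        if (tagged && !inText) {
            beginText();
            content.append("1 0 0 1 0 0 Tm\n");
        }
        content.append(format(value)).append(" Tc\n");
    }

    /** Writes the Tc operator by hand, bypassing the text object check. */
    void writeCharacterSpacingRaw(float value) {
        content.append(format(value)).append(" Tc\n");
    }

    String content() {
        return content.toString();
    }

    // Prints whole numbers without a fractional part, e.g. "0" instead of "0.0".
    private static String format(float value) {
        return value == (long) value ? Long.toString((long) value) : Float.toString(value);
    }

    public static void main(String[] args) {
        TaggedCanvasModel viaSetter = new TaggedCanvasModel(true);
        viaSetter.appendRaw("BT");            // text object opened behind the flag's back
        viaSetter.setCharacterSpacing(0);     // flag is still false, so a second BT is emitted
        System.out.println(viaSetter.content());

        TaggedCanvasModel byHand = new TaggedCanvasModel(true);
        byHand.appendRaw("BT");
        byHand.writeCharacterSpacingRaw(0);   // no extra BT
        System.out.println(byHand.content());
    }
}
```

In this model the setter variant ends up with two BT operators in a row followed by an identity Tm, while the hand-written variant leaves the already open text object alone.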
Instead, PdfCleanUpContentOperator.writeTextChunks should use hand-crafted Tc and Tw instructions so that it does not trigger this side effect.
    private void writeTextChunks(Map<Integer, Float> structuredTJoperands, List<PdfCleanUpContentChunk> chunks, PdfContentByte canvas,
                                 float characterSpacing, float wordSpacing, float fontSize, float horizontalScaling) throws IOException {
        if (Float.compare(characterSpacing, 0.0f) != 0 && Float.compare(characterSpacing, -0.0f) != 0) {
            new PdfNumber(0).toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
            canvas.getInternalBuffer().append(Tc);
        }
        if (Float.compare(wordSpacing, 0.0f) != 0 && Float.compare(wordSpacing, -0.0f) != 0) {
            new PdfNumber(0).toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
            canvas.getInternalBuffer().append(Tw);
        }
        canvas.getInternalBuffer().append((byte) '[');
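The guard in the snippet above compares against both 0.0f and -0.0f because Float.compare distinguishes the two zeros. A minimal sketch of that check in isolation follows; the class ZeroSpacingCheck and the method isNonZero are hypothetical names chosen for illustration only.

```java
// Hypothetical helper isolating the zero check used in the snippet above.
final class ZeroSpacingCheck {

    /** Returns true unless the value is positive or negative zero. */
    static boolean isNonZero(float value) {
        return Float.compare(value, 0.0f) != 0 && Float.compare(value, -0.0f) != 0;
    }

    public static void main(String[] args) {
        // Float.compare orders -0.0f before 0.0f, so one comparison alone is not enough.
        System.out.println(Float.compare(0.0f, -0.0f) != 0);
        System.out.println(isNonZero(0.0f));
        System.out.println(isNonZero(-0.0f));
        System.out.println(isNonZero(0.0116f));
    }
}
```

With such a check, a resetting "0 Tc" or "0 Tw" is only written if the current spacing actually differs from zero.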
With this change in place, the OP's new sample file "Test1.pdf" is properly redacted by the sample code:
    @Test
    public void testRedactJavishsTest1() throws IOException, DocumentException
    {
        try (   InputStream resource = getClass().getResourceAsStream("Test1.pdf");
                OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "Test1-redactedJavish.pdf"))   )
        {
            PdfReader reader = new PdfReader(resource);
            PdfStamper stamper = new PdfStamper(reader, result);
            List<Float> linkBounds = new ArrayList<Float>();
            linkBounds.add(0, (float) 202.3);
            linkBounds.add(1, (float) 588.6);
            linkBounds.add(2, (float) 265.8);
            linkBounds.add(3, (float) 599.7);
            Rectangle linkLocation1 = new Rectangle(linkBounds.get(0), linkBounds.get(1), linkBounds.get(2), linkBounds.get(3));
            List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
            cleanUpLocations.add(new PdfCleanUpLocation(1, linkLocation1, BaseColor.GRAY));
            PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
            cleaner.cleanUp();
            stamper.close();
            reader.close();
        }
    }
(RedactText.java)
Original answer referring to the OP's original sample files
I just tried to reproduce your issue using this test method
    @Test
    public void testRedactJavishsText() throws IOException, DocumentException
    {
        try (   InputStream resource = getClass().getResourceAsStream("input.pdf");
                OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "input-redactedJavish.pdf"))   )
        {
            PdfReader reader = new PdfReader(resource);
            PdfStamper stamper = new PdfStamper(reader, result);
            List<Float> linkBounds = new ArrayList<Float>();
            linkBounds.add(0, (float) 200.7);
            linkBounds.add(1, (float) 547.3);
            linkBounds.add(2, (float) 263.3);
            linkBounds.add(3, (float) 558.4);
            Rectangle linkLocation1 = new Rectangle(linkBounds.get(0), linkBounds.get(1), linkBounds.get(2), linkBounds.get(3));
            List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
            cleanUpLocations.add(new PdfCleanUpLocation(1, linkLocation1, BaseColor.GRAY));
            PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
            cleaner.cleanUp();
            stamper.close();
            reader.close();
        }
    }
(RedactText.java)
For your source PDF looking like this
[screenshot of input.pdf]
the result was
[screenshot of the redacted result]
and not your
[screenshot of output.pdf]
I even re-tested using iText version 5.5.5, which you mention in a comment, and also 5.5.4, but in all cases I got the correct result.
Thus, I cannot reproduce your issue.
I had a closer look at your output.pdf. It is a bit peculiar; in particular, it does not contain certain blocks typical for PDFs created or manipulated by current iText versions. Furthermore, the content streams look extremely different.
Thus, I assume that after iText redacted your file, some other tool post-processed it and in doing so damaged it.
In particular the page content instructions preparing the insertion of the redacted line look like this in your input.pdf:
    q
    0.24 0 0 0.24 113.7055 548.04 cm
    BT
    0.0116 Tc
    45 0 0 45 0 0 Tm
    /TT5 1 Tf
    [...] TJ
and like this in the version I received directly from iText:
    q
    0.24 0 0 0.24 113.7055 548.04 cm
    BT
    0.0116 Tc
    45 0 0 45 0 0 Tm
    /TT5 1 Tf
    0 Tc
    0 Tw
    [...] TJ
but the corresponding lines in your output.pdf look like this
    BT
    1 0 0 1 113.3 548.5 Tm
    0 Tc
    BT
    1 0 0 1 0 0 Tm
    0 Tc
    [...] TJ
Here the instructions in your output.pdf are
- invalid, as inside a text object BT ... ET there may be no other text object, but you have two BT operations following each other without an ET in between;
- effectively positioning the text at 0, 0 if a PDF viewer ignores the error mentioned above.
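The first point can be checked mechanically. The following is a minimal sketch, assuming a naive whitespace tokenizer (it would be fooled by BT or ET appearing inside string operands or inline image data); the class TextObjectNestingCheck is a hypothetical helper and not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: flags BT operators inside an open text object.
final class TextObjectNestingCheck {

    /** Returns the 1-based line numbers of BT operators found while a text object is already open. */
    static List<Integer> findNestedBeginText(String contentStream) {
        List<Integer> offendingLines = new ArrayList<>();
        boolean inText = false;
        String[] lines = contentStream.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            // Naive tokenization on whitespace; sufficient for simple excerpts only.
            for (String token : lines[i].trim().split("\\s+")) {
                if (token.equals("BT")) {
                    if (inText) {
                        offendingLines.add(i + 1);
                    }
                    inText = true;
                } else if (token.equals("ET")) {
                    inText = false;
                }
            }
        }
        return offendingLines;
    }

    public static void main(String[] args) {
        String excerpt = String.join("\n",
                "BT",
                "1 0 0 1 113.3 548.5 Tm",
                "0 Tc",
                "BT",
                "1 0 0 1 0 0 Tm",
                "0 Tc",
                "[...] TJ");
        // Reports line 4, the second BT in the excerpt.
        System.out.println(findNestedBeginText(excerpt));
    }
}
```

Applied to the excerpt from output.pdf above, the check reports the second BT, while the excerpt of the iText result yields no finding.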
And indeed, if you look at the bottom of your output.pdf page you'll see:
So if my assumption is correct that some other program post-processes the iText result, you should repair that post-processor.
If there is no such post-processor, you seem not to have the officially published iText version but something altogether different.