Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - Can POI or docx4j read Word docs that are password-protected?

I'm having an issue with POI that I'd like some help with.

I have a personal journal that I've kept for years by making daily entries in one Word .doc per month, each stored in a folder for its year. I add a password to open each one, so they're all encrypted.
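
Before any parsing, the layout described here (one .doc per month inside a folder per year) can be walked with the JDK alone. A minimal sketch using java.nio.file; the class name `JournalFiles` and the assumption that every journal file ends in `.doc` are illustrative, not from the original post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects every .doc file below a journal root folder
// (e.g. root/1994/AUG94.doc), sorted by path so each year's files stay together.
final class JournalFiles {
    private JournalFiles() {}

    static List<Path> findMonthlyDocs(Path journalRoot) throws IOException {
        // Files.walk holds directory handles open, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(journalRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Assumption: journal files use the .doc extension (any letter case).
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".doc"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Each path returned here would then be handed to whatever reader turns a .doc into text.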

I want to use Lucene to index the entire collection to allow better searching (e.g. "What day and year did I last write about how much I like oatmeal?").
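
The structure behind a Lucene search is an inverted index: a map from each term to the documents (here, dated entries) that contain it. The oatmeal question then becomes "the latest date in the posting list for `oatmeal`". The toy sketch below shows that idea with the JDK only; it is a stand-in for Lucene, not a use of Lucene's API, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of entry dates that contain the term.
final class TinyJournalIndex {
    private final Map<String, TreeSet<LocalDate>> postings = new HashMap<>();

    // Split on anything that is not a letter or digit, and lower-case each token.
    void addEntry(LocalDate date, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(date);
            }
        }
    }

    // Most recent entry date mentioning the term, if any.
    Optional<LocalDate> lastMention(String term) {
        TreeSet<LocalDate> dates = postings.get(term.toLowerCase(Locale.ROOT));
        return (dates == null || dates.isEmpty()) ? Optional.empty() : Optional.of(dates.last());
    }
}
```

Lucene adds analyzers, relevance scoring and an on-disk index on top of this, but the lookup has the same shape.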

The first step was to use POI to read a Word .doc, but I can't get off the dime because it can't read my encrypted file.

I've written this class:

package model;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.DocumentProperties;
import org.apache.poi.poifs.crypt.Decryptor;
import org.apache.poi.poifs.crypt.EncryptionInfo;
import org.apache.poi.poifs.dev.POIFSLister;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.LinkedList;
import java.util.List;

/**
 * JournalReader class that's the heart of my efforts to finally read, parse, index, and search my journal.
 * @author Michael
 * @link
 * @since 8/19/12 3:48 PM
 */
public class JournalReader {
    public static final Log LOGGER = LogFactory.getLog(JournalReader.class);
    public static final String DEFAULT_PASSWORD = "journal";


    public static void main(String[] args) {
        if (args.length > 0) {
            try {
                POIFSLister.viewFile(args[0], true);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    public List<JournalEntry> readEntries(File journalFile) throws IOException, GeneralSecurityException {
        List<JournalEntry> journalEntries = new LinkedList<JournalEntry>();
        if (journalFile != null) {
            POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(journalFile));
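            // The exception below is thrown from inside this EncryptionInfo constructor.
            EncryptionInfo info = new EncryptionInfo(fs);
            Decryptor decryptor = Decryptor.getInstance(info);
            // verifyPassword returns false for a wrong password rather than throwing.
            if (!decryptor.verifyPassword(DEFAULT_PASSWORD)) {
                throw new GeneralSecurityException("Wrong password for " + journalFile);
            }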
            HWPFDocument journalDocument = new HWPFDocument(decryptor.getDataStream(fs));
            DocumentProperties documentProperties = journalDocument.getDocProperties();
        }
        return journalEntries;
    }
}

I have a JUnit test to try it out:

import model.JournalEntry;
import model.JournalReader;
import org.junit.Assert;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.List;

/**
 * JournalReaderTest JUnit test for JournalReader
 * @author Michael
 * @link
 * @since 8/19/12 8:46 PM
 */
public class JournalReaderTest {

    @Test
    public void testReadEntries() throws IOException, GeneralSecurityException {
        JournalReader journalReader = new JournalReader();
        String journalFilePath = "C:\\Users\\Michael\\Documents\\Stuff To Back Up\\Journal\\1994\\AUG94.doc";
        File journalFile = new File(journalFilePath);
        List<JournalEntry> journalEntries = journalReader.readEntries(journalFile);
        Assert.assertNotNull(journalEntries);
        Assert.assertTrue(journalEntries.size() > 0);
    }
}
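
Inside a Java string literal a single backslash starts an escape sequence, so a Windows path must either use doubled backslashes or be built from its components. A minimal sketch of the second option with java.nio.file; the folder names come from the test above, and the helper name `JournalPaths` is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: builds a journal file path without any backslash escapes.
final class JournalPaths {
    private JournalPaths() {}

    static Path monthFile(Path journalRoot, String year, String monthFileName) {
        // resolve() inserts the platform's separator ('\' on Windows, '/' elsewhere).
        return journalRoot.resolve(year).resolve(monthFileName);
    }

    public static void main(String[] args) {
        // On Windows this is C:\Users\Michael\Documents\Stuff To Back Up\Journal
        Path root = Paths.get("C:", "Users", "Michael", "Documents", "Stuff To Back Up", "Journal");
        System.out.println(monthFile(root, "1994", "AUG94.doc"));
    }
}
```

The resulting `Path` can be passed to `new File(path.toString())` or converted with `path.toFile()` for APIs that still expect a `File`.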

When I run the JUnit test, I get this stack trace:

"C:\Program Files\Java\jdk1.7.0_02\bin\java" -ea -Didea.launcher.port=7540 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA 122.29\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files (x86)\JetBrains\IntelliJ IDEA 122.29\lib\idea_rt.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA 122.29\plugins\junit\lib\junit-rt.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\alt-rt.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\charsets.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\deploy.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\javaws.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\jce.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\jsse.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\management-agent.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\plugin.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\resources.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\rt.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\dnsns.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\localedata.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\sunec.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\sunjce_provider.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\sunmscapi.jar;C:\Program Files\Java\jdk1.7.0_02\jre\lib\ext\zipfs.jar;F:\Projects\Java\diary-index\out\test\diary-index;F:\Projects\Java\diary-index\out\production\diary-index;F:\Projects\Java\diary-index\lib\commons-lang3-3.1.jar;F:\Projects\Java\diary-index\lib\log4j-1.2.16.jar;F:\Projects\Java\diary-index\lib\commons-io-2.3.jar;F:\Projects\Java\diary-index\lib\poi-scratchpad-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\poi-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\poi-examples-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\poi-excelant-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\poi-ooxml-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\poi-ooxml-schemas-3.8-20120326.jar;F:\Projects\Java\diary-index\lib\dom4j-1.6.1.jar;F:\Projects\Java\diary-index\lib\stax-api-1.0.1.jar;F:\Projects\Java\diary-index\lib\xmlbeans-2.3.0.jar;F:\Projects\Java\diary-index\lib\antlr-2.7.7.jar;F:\Projects\Java\diary-index\lib\antlr-runtime-3.3.jar;F:\Projects\Java\diary-index\lib\avalon-framework-api-4.3.1.jar;F:\Projects\Java\diary-index\lib\avalon-framework-impl-4.3.1.jar;F:\Projects\Java\diary-index\lib\commons-codec-1.3.jar;F:\Projects\Java\diary-index\lib\commons-io-1.3.1.jar;F:\Projects\Java\diary-index\lib\commons-lang-2.4.jar;F:\Projects\Java\diary-index\lib\commons-logging-1.1.1.jar;F:\Projects\Java\diary-index\lib\docx4j-2.8.0.jar;F:\Projects\Java\diary-index\lib\fop-1.0.jar;F:\Projects\Java\diary-index\lib\itext-2.1.7.jar;F:\Projects\Java\diary-index\lib\jaxb-svg11-1.0.2.jar;F:\Projects\Java\diary-index\lib\jaxb-xmldsig-core-1.0.0.jar;F:\Projects\Java\diary-index\lib\jaxb-xslfo-1.0.1.jar;F:\Projects\Java\diary-index\lib\log4j-1.2.15.jar;F:\Projects\Java\diary-index\lib\poi-3.8.jar;F:\Projects\Java\diary-index\lib\poi-scratchpad-3.8.jar;F:\Projects\Java\diary-index\lib\serializer-2.7.1.jar;F:\Projects\Java\diary-index\lib\stringtemplate-3.2.1.jar;F:\Projects\Java\diary-index\lib\nwmf2svg-0.9.0.jar;F:\Projects\Java\diary-index\lib\xalan-2.7.1.jar;F:\Projects\Java\diary-index\lib\xhtmlrenderer-1.0.0.jar;F:\Projects\Java\diary-index\lib\xml-apis-1.3.04.jar;F:\Projects\Java\diary-index\lib\xmlgraphics-commons-1.4.jar;F:\Projects\Java\diary-index\test-lib\junit-4.10.jar" com.intellij.rt.execution.application.AppMain com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 JournalReaderTest
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Level value for root is  [debug].
log4j: root level set to DEBUG
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].

java.io.FileNotFoundException: no such entry: "EncryptionInfo"
    at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:375)
    at org.apache.poi.poifs.filesystem.DirectoryNode.createDocumentInputStream(DirectoryNode.java:177)
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:45)
    at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:39)
    at model.JournalReader.readEntries(JournalReader.java:43)
    at JournalReaderTest.testReadEntries(JournalReaderTest.java:24)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:76)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


Process finished with exit code -1
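
The message `no such entry: "EncryptionInfo"` means POI opened the file as an OLE2 compound document but found no `EncryptionInfo` stream in it. In the Office formats, that stream (next to an `EncryptedPackage` stream) is what a password-protected OOXML file such as a .docx carries inside its compound-file wrapper. A binary .doc is already a compound file, and Word's password protection for .doc uses an older scheme recorded in the document's own streams, which suggests that the `EncryptionInfo`/`Decryptor` path targets encrypted OOXML rather than encrypted .doc files. A small JDK-only check of the leading bytes can at least tell the two container types apart; the class name is hypothetical, while the two signatures are the standard OLE2 and ZIP magic numbers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: classifies an Office file by its leading "magic" bytes.
final class OfficeContainerSniffer {
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 / Compound File Binary signature: D0 CF 11 E0 A1 B1 1A E1
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1 };
    // ZIP local file header signature "PK" 03 04, used by an unencrypted .docx
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    private OfficeContainerSniffer() {}

    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Container.OLE2;
        if (startsWith(header, ZIP)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2.length];
            int n = 0;
            // Read up to 8 bytes; a short file simply yields a shorter header.
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break;
                n += r;
            }
            return sniff(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A .doc should report OLE2 whether or not it has a password, so the signature alone cannot detect .doc encryption; for a .docx, though, OLE2 instead of ZIP is a strong hint that the file is password-protected.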

The POI docs and javadocs have been useless. I'm frustrated enough that I've thought about switching to docx4j. It would mean converting all my .doc files to .docx (after suitable backup, of course).

I'd like to know if anyone has had any success using docx4j to read encrypted, password-protected files. Anyone? I'd just like an affirmative answer to tell me that it's worth pressing on.

If anyone can see what I'm doing wrong with POI, I'd be glad to know that, too. Thanks.

He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Answer

Disclosure: I work on docx4j.

docx4j's OpcPackage contains:

/**
 * Convenience method to create a WordprocessingMLPackage
 * or PresentationMLPackage
 * from an existing File (.docx/.docxm, .ppxtx or Flat OPC .xml).
 *
 * @param docxFile
 *            The docx file
 * @param password
 *            The password, if the file is password protected (compound)
 *            
 * @Since 2.8.0           
 */ 
public static OpcPackage load(final java.io.File docxFile, String password) throws Docx4JException

which ought to take care of the password-protected part.

I haven't played much with encryption/decryption of docx files myself.

