Apache PDFBox - getting java.lang.OutOfMemoryError when using split(PDDocument document)

I am trying to split a document of a decent size (300 pages) using Apache PDFBox API V2.0.2. I am trying to split the PDF file into single pages using the following code:

PDDocument document = PDDocument.load(inputFile);
Splitter splitter = new Splitter();
List<PDDocument> splittedDocuments = splitter.split(document); // Exception happens here

I receive the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

This indicates that the GC is spending a lot of time clearing the heap, which is not justified by the amount of memory reclaimed.
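To check whether the heap really is filling up around the split call, it can help to log heap usage before and after it. The following is a minimal sketch using only the standard java.lang.Runtime methods; the class name HeapUsage and its methods are hypothetical helpers, not part of PDFBox, and the allocation in main only stands in for a memory-hungry operation.

```java
public class HeapUsage {

    /** Returns the heap currently in use, in bytes, as reported by the JVM. */
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds a log line with used and maximum heap in whole megabytes. */
    public static String describe(String label) {
        long mb = 1024L * 1024L;
        long max = Runtime.getRuntime().maxMemory();
        return label + ": used " + (usedBytes() / mb) + " MB of max " + (max / mb) + " MB";
    }

    public static void main(String[] args) {
        System.out.println(describe("before"));

        // Stand-in for a memory-hungry call such as splitting a large document.
        byte[][] blocks = new byte[16][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024];
        }

        System.out.println(describe("after allocating " + blocks.length + " MB"));
    }
}
```

Calling HeapUsage.describe(...) immediately before and after splitter.split(document) would show how much heap that single call retains.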

There are numerous JVM tuning options that can resolve the situation; however, all of them treat the symptom and not the real problem.

One final note: I am using JDK 6, so using the new Java 8 Consumer is not an option in my case. Thanks.

Edit:

This is not a duplicate of http://stackoverflow.com/questions/37771252/splitting-a-pdf-results-in-very-large-pdf-documents-with-pdfbox-2-0-2 because:

 
1. I do not have the size problem mentioned in the aforementioned topic. I am slicing a 270-page, 13.8 MB PDF file, and after slicing, each slice averages 80 KB, with a total size of 30.7 MB.
2. The split throws the exception before it even returns the split parts.

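I found that the split succeeds as long as I do not pass the whole document at once and instead pass it in "batches" of 20-30 pages each.

Source

2016-07-04 WiredCoder

This is a known bug in 2.0.2 that has been fixed; until the fix is available, use 2.0.1.

Have you tried the previous version, as Tilman suggested?

I have a restriction on the version number @GeorgeGarchagudashvili – WiredCoder

PDFBox keeps the parts produced by the split operation on the heap as objects of type PDDocument, so the heap fills up quickly. Even if you call close() after every round of the loop, the GC cannot reclaim heap space as fast as it gets filled.

One option is to break the split operation into batches, where each batch is a relatively manageable chunk (10 to 40 pages):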

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

public void execute() {
    File inputFile = new File("path/to/the/file.pdf");
    PDDocument document = null;
    try {
        document = PDDocument.load(inputFile);

        int batchSize = 50;
        int totalPages = document.getNumberOfPages();

        // Splitter page numbers are 1-based and the end page is inclusive,
        // so consecutive batches must not share a boundary page.
        int batch = 1;
        for (int start = 1; start <= totalPages; start += batchSize) {
            // The last batch is shorter when totalPages is not a multiple of batchSize.
            int end = Math.min(start + batchSize - 1, totalPages);
            System.out.println("Batch: " + batch++ + " start: " + start + " end: " + end);
            split(document, start, end);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the source document (no try-with-resources on JDK 6).
        if (document != null) {
            try {
                document.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

private void split(PDDocument document, int start, int end) throws IOException {
    Splitter splitter = new Splitter();
    splitter.setStartPage(start);
    splitter.setEndPage(end);
    List<PDDocument> splittedDocuments = splitter.split(document);

    for (int index = 0; index < splittedDocuments.size(); index++) {
        PDDocument splittedDocument = splittedDocuments.get(index);
        try {
            // Name each file by its page number in the source document; concatenating
            // index and start as strings can give two different pages the same name.
            String pdfFullPath = document.getDocumentInformation().getTitle()
                    + "-" + (start + index) + ".pdf";
            splittedDocument.save(pdfFullPath);
        } finally {
            // Release each part as soon as it has been saved.
            splittedDocument.close();
        }
    }
}
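The page-range arithmetic behind the batching is easy to get wrong by one page, so it can be worth isolating and checking it separately from PDFBox. The following is a minimal sketch, assuming 1-based page numbers with an inclusive end page (which is how Splitter's setStartPage and setEndPage treat them); the class name BatchRanges and the method compute are hypothetical, and the code avoids Java 7+ syntax so it also compiles on JDK 6.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRanges {

    /**
     * Returns inclusive, 1-based {start, end} page ranges that cover
     * pages 1..totalPages in chunks of at most batchSize pages.
     */
    public static List<int[]> compute(int totalPages, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> ranges = new ArrayList<int[]>();
        for (int start = 1; start <= totalPages; start += batchSize) {
            // Clamp the last range so it never runs past the final page.
            int end = Math.min(start + batchSize - 1, totalPages);
            ranges.add(new int[] { start, end });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 270 pages is the document size mentioned in the question.
        for (int[] range : compute(270, 50)) {
            System.out.println("start: " + range[0] + " end: " + range[1]);
        }
    }
}
```

For 270 pages and a batch size of 50 this yields six ranges, the last one being 251 to 270, with no page appearing in two batches.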

Source

2016-07-10 17:23:28 WiredCoder
