Reading PDF properties/metadata in Python

How can I use Python to read the properties/metadata stored in a PDF file, such as title, author, subject, and keywords?

2013-01-08 Khaleel

Try pdfminer:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open in binary mode and keep the file open while the document is parsed
with open('diveintopython.pdf', 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    print(doc.info)  # The "Info" metadata

Here is the output:

>>> [{'CreationDate': 'D:20040520151901-0500', 
    'Creator': 'DocBook XSL Stylesheets V1.52.2', 
    'Keywords': 'Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free', 
    'Producer': 'htmldoc 1.8.23 Copyright 1997-2002 Easy Software Products, All Rights Reserved.', 
    'Title': 'Dive Into Python'}]
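
The CreationDate value above ('D:20040520151901-0500') is a PDF date string rather than a datetime object. The following is a minimal sketch for converting it, assuming a full D:YYYYMMDDHHmmSS timestamp with an optional UTC offset. parse_pdf_date is a hypothetical helper written for illustration, not part of pdfminer, and it uses only the standard library:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper (not part of pdfminer or pyPdf).
# Assumes a full "D:YYYYMMDDHHmmSS" timestamp with an optional offset such as
# "-0500", "-05'00'" or "Z"; shorter, truncated dates are not handled here.
_PDF_DATE = re.compile(
    r"^D:(\d{14})"                    # date and time digits
    r"(?:([+\-Z])"                    # offset sign, or Z for UTC
    r"(?:(\d{2})'?(\d{2})?'?)?)?$"    # offset hours and optional minutes
)

def parse_pdf_date(value):
    if isinstance(value, bytes):      # values may arrive as bytes
        value = value.decode("ascii")
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError("unsupported PDF date: %r" % value)
    stamp, sign, hours, minutes = match.groups()
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    if sign is None:
        return parsed                 # no offset given: naive datetime
    offset = timedelta(hours=int(hours or 0), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset
    return parsed.replace(tzinfo=timezone(offset))
```

Calling parse_pdf_date(doc.info[0]['CreationDate']) on the output above should give a timezone-aware datetime; strings that do not match the assumed layout raise ValueError instead of being guessed at.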

For more information, see this tutorial: A lightweight XMP parser for extracting PDF metadata in Python.


2013-01-08 06:22:11 namit

Heads up: the author of pdfminer says it is not compatible with Python 3, at least as of the date of this post (see [link](https://github.com/euske/pdfminer/)). – JSmyth

As of November 2013, "the PDFDocument class takes a PDFParser object as an argument, and PDFDocument.set_parser() and PDFParser.set_document() have been removed." In other words, just run doc = PDFDocument(parser) and skip the calls to set_document, set_parser, and initialize. –

@JSmyth The [PyPi Index](https://pypi.python.org/pypi?%3Aaction=search&term=pdfminer&submit=search) currently lists three working 'pdfminer' forks that are compatible with Python 3. 'pip search pdfminer' – zero2cx

I implemented this using pyPdf. See the sample code below.

from pyPdf import PdfFileReader

# Keep the file open while the reader pulls out the document info
with open("doc2.pdf", "rb") as f:
    pdf_toread = PdfFileReader(f)
    pdf_info = pdf_toread.getDocumentInfo()
    print(str(pdf_info))

Output:

{'/Title': u'Microsoft Word - Agnico-Eagle - Complaint (00040197-2)', '/CreationDate': u"D:20111108111228-05'00'", '/Producer': u'Acrobat Distiller 10.0.0 (Windows)', '/ModDate': u"D:20111108112409-05'00'", '/Creator': u'PScript5.dll Version 5.2.2', '/Author': u'LdelPino'}
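
The keys in the dictionary above carry a leading slash, and two of the fields the question asks about (subject and keywords) are absent from this particular file. The following is a minimal sketch for pulling out just those four fields, assuming an info mapping shaped like the output above. summarize_info and WANTED are hypothetical names used for illustration, not part of pyPdf or PyPDF2:

```python
# Hypothetical helper; works on any mapping shaped like the output above.
WANTED = ("Title", "Author", "Subject", "Keywords")

def summarize_info(info):
    cleaned = {}
    for key, value in info.items():
        name = key.lstrip("/")            # pyPdf-style keys start with "/"
        if isinstance(value, bytes):      # some libraries return raw bytes
            value = value.decode("utf-8", errors="replace")
        cleaned[name] = value
    # Fields missing from the file come back as None instead of raising KeyError.
    return {name: cleaned.get(name) for name in WANTED}
```

Passing dict(pdf_info) from the example above would return the title and author, with None for the subject and keywords that this file does not define.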

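Note: the pyPdf homepage says it is no longer maintained.

For Python 3, use PyPDF2 (install it with pip install PyPDF2):

from PyPDF2 import PdfFileReader

# PdfFileReader and getDocumentInfo are the older PyPDF2 names;
# PyPDF2 3.x renamed them to PdfReader and the .metadata property.
with open("test.pdf", "rb") as f:
    pdf_toread = PdfFileReader(f)
    pdf_info = pdf_toread.getDocumentInfo()
    print(str(pdf_info))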


2013-01-08 08:49:01 Khaleel

Don't use 'file'; use 'open' instead. –

pyPdf is marked as unsupported on its homepage. –

See the code example from @Khaleel, which has been updated to use PyPDF2.

2016-10-08 11:31:14

For Python 3 and the newer pdfminer (pip install pdfminer3k):


2016-12-19 01:36:11 Rabash
