Python - Fetch a URL, parse it, and print it to PDF

I am trying to fetch the HTML source of a URL, parse it, and then output the result as a PDF.

I wanted to rely on BeautifulSoup, urllib2 and reportlab, but I don't know how to combine them properly.

The error I get is 'module' object is not callable. It occurs when I access the view while running the Django 1.3.1 dev server.

Here is my code:

from reportlab.pdfgen import canvas 
from cStringIO import StringIO 
from django.http import HttpResponse 
from django.shortcuts import render_to_response 
from django.template import RequestContext 
# Fetching the URL 
import urllib2 

# Parsing the HTML 
from BeautifulSoup import BeautifulSoup 

# The ConverterForm 
from django import forms 

class ConverterForm(forms.Form): 
    # Use a Textarea instead of the default TextInput.
    html_files = forms.CharField(widget=forms.Textarea) 
    filename = forms.CharField() 

# Create your views here. 
def create_pdf(request):
    # If the form has been submitted
    if request.method == 'POST':
        # A form bound to the POST data
        form = ConverterForm(request.POST)
        # All validation rules pass
        if form.is_valid():
            # PDF creation process
            # Assign variables
            html_files = form.cleaned_data['html_files']
            filename = form.cleaned_data['filename']

            # Create the HttpResponse object with the appropriate PDF headers.
            response = HttpResponse(mimetype='application/pdf')
            # The use of attachment forces the Save as dialog to open.
            response['Content-Disposition'] = 'attachment; filename=%s.pdf' % filename

            buffer = StringIO()

            # Get the page source
            page = urllib2.urlopen(html_files)
            html = page.read()

            # Parse the page source
            soup = BeautifulSoup(html)

            # Create the PDF object, using the StringIO() object as its "file".
            p = canvas.Canvas(buffer)

            # Draw things on the PDF and generate the PDF.
            # See ReportLab documentation for full list of functions.
            p.drawString(100, 100, soup)

            # Close the PDF object cleanly.
            p.showPage()
            p.save()

            # Get the value of the StringIO buffer and write it to the response.
            pdf = buffer.getvalue()
            buffer.close()
            response.write(pdf)
            return response

    else:
        # An unbound form
        form = ConverterForm()

    # For RequestContext in relation to csrf see more here:
    # https://docs.djangoproject.com/en/1.3/intro/tutorial04/
    return render_to_response('converter/index.html', {
        'form': form,
    }, context_instance=RequestContext(request))

2012-03-24 orschiro

Where does the error occur? Please show it in full. You are not showing your whole code. – Marcin

I have edited the code. Sorry, at first I thought the rest might not be relevant. Regards – orschiro

Your exact error is 'buffer = StringIO()'. It should be 'buffer = StringIO.StringIO()', but I am posting a simpler solution as an answer. –

Here is a simpler way:

import cStringIO as StringIO

import ho.pisa as pisa
import requests
from django.http import HttpResponse
from django.shortcuts import render

def pdf_maker(request):
    # Fetch the page; requests decodes the body to unicode
    browser = requests.get('http://www.google.com/')
    html = browser.text

    # In-memory buffers for the generated PDF and the HTML source
    result = StringIO.StringIO()
    source = StringIO.StringIO(html.encode('UTF-8')) # adjust as required

    # Convert the HTML to PDF, writing the output into result
    pdf = pisa.pisaDocument(source, dest=result)

    if not pdf.err:
        response = HttpResponse(result.getvalue(), mimetype='application/pdf')
        response['Content-Disposition'] = 'attachment; filename=the_file.pdf'
        return response

    return render(request, 'error.html')

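This uses requests and pisa. It has some limitations, though: the PDF conversion process cannot load images directly from the internet, so you need to find a way to fetch and embed them.

Note that the module and the class share the same base name, which can be confusing: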

from BeautifulSoup import BeautifulSoup
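
One possible way around the image limitation mentioned above is to download each remote image yourself and embed it in the HTML as a base64 data URI before handing the HTML to the converter. The following is only a sketch for Python 3 using the standard library: the names inline_images, to_data_uri and fetch_bytes are made up for this illustration, and whether your converter version accepts data URIs is an assumption you should verify.

```python
import base64
import mimetypes
import re
from urllib.request import urlopen

# Matches the src attribute of <img> tags. A regex keeps the sketch short;
# a real HTML parser would be more robust against unusual markup.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def fetch_bytes(url):
    """Default fetcher (hypothetical helper): download the raw bytes of a URL."""
    with urlopen(url) as resp:
        return resp.read()


def to_data_uri(url, data):
    """Encode image bytes as a base64 data URI, guessing the MIME type from the URL."""
    mime = mimetypes.guess_type(url)[0] or "application/octet-stream"
    return "data:%s;base64,%s" % (mime, base64.b64encode(data).decode("ascii"))


def inline_images(html, fetch=fetch_bytes):
    """Replace every remote <img src="http..."> with an embedded data URI."""
    def replace(match):
        prefix, quote, src = match.groups()
        if not src.lower().startswith(("http://", "https://")):
            return match.group(0)  # leave relative paths and data URIs untouched
        return prefix + quote + to_data_uri(src, fetch(src)) + quote

    return IMG_SRC.sub(replace, html)
```

You would call inline_images(html) on the downloaded page source and pass the result to the converter. The fetch parameter exists so the download step can be swapped out, for example for a caching fetcher or a stub in tests.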


2012-03-24 13:14:43

Thank you. However, according to PyPI, Pisa is no longer being developed. Does the XHTML2PDF library work the same way as Pisa? – orschiro

Yes, in almost the same way. –
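
For reference, a rough equivalent with xhtml2pdf might look like the sketch below. It assumes Python 3 and an installed xhtml2pdf package; the function name html_to_pdf_bytes is made up for this illustration, and the import is done inside the function so that the module can be loaded even where xhtml2pdf is missing.

```python
from io import BytesIO


def html_to_pdf_bytes(html):
    """Render an HTML string to PDF bytes with xhtml2pdf (imported lazily)."""
    from xhtml2pdf import pisa  # third-party package: pip install xhtml2pdf

    result = BytesIO()
    # CreatePDF writes the PDF into result and returns a status object
    status = pisa.CreatePDF(html, dest=result)
    if status.err:
        raise ValueError("xhtml2pdf reported %d error(s)" % status.err)
    return result.getvalue()
```

In a Django view the returned bytes could then be placed in an HttpResponse with a PDF content type, in the same way as the pisa example above.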

Still, I would like to understand why my approach above fails. – orschiro

You need to import the BeautifulSoup class.
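
The same kind of error can be reproduced with any module that contains a class of the same name. The sketch below uses the standard library's datetime module purely as a stand-in for BeautifulSoup (that substitution is mine, not part of the question) and is written for Python 3.

```python
import datetime  # binds the name to the module, not to the class inside it

try:
    datetime(2012, 3, 24)  # tries to call the module itself
except TypeError as exc:
    # The message contains: 'module' object is not callable
    message = str(exc)

from datetime import datetime  # now the name refers to the class

stamp = datetime(2012, 3, 24)  # works: this calls the class
```

With "import X", the class has to be reached as X.X(...); with "from X import X", the bare name X is the class.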

2012-03-24 14:46:56 jfs

Apparently not on my system. 'Could not import converter.views. Error was: No module named BeutifulSoup'. I use the ActivePython distribution and installed beautifulsoup through pypm. http://code.activestate.com/pypm/beautifulsoup/ – orschiro

@orschiro: Check your spelling. – jfs

That did not solve the problem. I have updated the code above. Exact error output: http://dpaste.com/721558/ – orschiro
