Parse chunks of HTML using iText in a loop for different pages

I currently have a working version that creates a PDF from the data in multiple database rows. For each row in the database, I create a new page in the PDF. This works great. Now I need to parse a few fields in each row so that their HTML is rendered properly. I have seen an example, but it takes the entire string and parses it as a whole document.

What I need is to parse only specific HTML fields in order to create individually formatted pages. Is this possible? Below is some sample code showing how I create the new pages:

PdfFont fTimes = PdfFontFactory.CreateFont(FontConstants.TIMES_ROMAN); 
PdfFont fTimesBold = PdfFontFactory.CreateFont(FontConstants.TIMES_BOLD);      

// create the first page here 
doc.Add(new Paragraph("Abstract Submissions for " + eventName).SetFont(fTimes).SetFontSize(18).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Section Name: " + GetSectionName(ddlSections.SelectedValue)).SetFont(fTimes).SetFontSize(14).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Created: " + DateTime.Now.ToString("dddd, MMMM d, yyyy h:mm tt")).SetFont(fTimes).SetFontSize(11).SetFontColor(Color.BLACK)); 

// iterate through each of the items 
foreach (DataRow row in dsItems.Tables[0].Rows) 
{ 
    // create a new page for each abstract submission 
    doc.Add(new AreaBreak(iText.Layout.Properties.AreaBreakType.NEXT_PAGE)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationType"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationTitle"], "")).SetFont(fTimes).SetFontSize(16).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Authors"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Abstract"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
} 

doc.Close();

Note that I am using a MemoryStream rather than a FileStream so that the client can download the file immediately and nothing has to be saved to the file system.

**EDIT - Adding sample data in the pattern below, in case you can build your own XML/HTML-to-iText translator**

<table> 
    <tr> 
     <td>Poster</td> 
     <td>Abstract 1</td> 
     <td><strong><em>Doctor Name 1</em></strong> <strong>Doctor Name 2</strong></td> 
     <td><p>Some really long text <strong>which can have</strong> some different basic HTML <u>formatting in it</u></p></td> 
    </tr> 
    <tr> 
     <td>Presentation</td> 
     <td>Abstract 2</td> 
     <td><strong>Doctor Name 15 </strong><em>Doctor 3</em></td> 
     <td><p>Some really long text which can have some different basic HTML <em>formatting in it</em></p></td> 
    </tr> 
</table>

Source

2017-04-19 Brenden Kehren

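Can you share a sample of the content to parse/render? Is this content a small, consistently formatted subset of HTML, like the output of a rich text editor, or is it wild HTML/CSS? – COeDev

Added sample data, @COeDev. Sorry for the poor formatting. Basically each of the <td> tags is a database column. It was the only way I could get the markup to show up properly without using the editor. –

If there is not much beyond "strong", "p", and "em", and the HTML is valid XML, you can easily parse this content and create iText elements from it. – COeDev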


internal interface ICustomElement { IEnumerable<IElement> GetContent(); }

internal class CustomElementFactory {
    public ICustomElement GetElement(XmlNode node) {
        switch (node.Name) {
            case "p": return new CustomParagraph(node, this);
            // implement the other tags you need using the ICustomElement interface
            // e.g. treat unknown nodes as text (CustomText is a placeholder class you would write yourself)
            default: return new CustomText(node);
        }
    }
}

public class PdfCreator {
    public byte[] GetPdf(XmlDocument template) {
        PdfDocument doc ...
        CustomElementFactory factory ...
        foreach (XmlNode node in template.ChildNodes) {
            // pseudocode: add each returned IElement to the document
            doc.AddElements(factory.GetElement(node).GetContent());
            // This generic approach works because almost every iText element implements the IElement interface and can therefore be added to the document this way. Containers like PdfPCell accept IElements as well.
            // Good job, iText guys! ;)
        }

        // pseudocode: close the document and return its bytes
        return doc.CloseDocument();
    }
}

// here comes the magic:

internal class CustomParagraph : ICustomElement {
    // the constructor stores the XmlNode and the factory in private fields
    private readonly XmlNode node;
    private readonly CustomElementFactory factory;

    public CustomParagraph(XmlNode node, CustomElementFactory factory) {
        this.node = node;
        this.factory = factory;
    }

    public IEnumerable<IElement> GetContent() {
        Paragraph p = new Paragraph();
        p.Add(node.InnerText); // create an underlined, bold, or otherwise styled font here when you implement the special HTML tags

        // If the node has child elements, get their content by calling factory.GetElement(child).GetContent() for each child.
        // Then loop over the chunks of each returned IElement and add them to the paragraph of this scope.
        // This way you can process nested HTML tags recursively.
        // Find a way to pass the style information of this scope to the factory when processing child nodes,
        // so that <strong>bold<u>underlined AND bold</u></strong> is rendered correctly.

        return new List<IElement> { p };
    }
}

This needs some work and fine-tuning, but it can be done.
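The open point in the comments above, passing the style of the enclosing scope down to child nodes, can be sketched without any iText types. The following is only an illustration under some assumptions: the field content is well-formed XML (as mentioned in the comments), only strong/b, em/i and u carry styling, and `StyledRun` and `HtmlFragmentWalker` are made-up names, not part of iText or of the code above. Named HTML entities such as `&nbsp;` would make `LoadXml` throw, so they would need to be handled first.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper type: one run of text plus the styles inherited from its ancestors.
public sealed class StyledRun
{
    public string Text { get; }
    public bool Bold { get; }
    public bool Italic { get; }
    public bool Underline { get; }

    public StyledRun(string text, bool bold, bool italic, bool underline)
    {
        Text = text;
        Bold = bold;
        Italic = italic;
        Underline = underline;
    }
}

// Hypothetical helper: flattens a small HTML fragment into styled text runs.
public static class HtmlFragmentWalker
{
    public static List<StyledRun> Parse(string htmlFragment)
    {
        var xml = new XmlDocument();
        xml.PreserveWhitespace = true; // keep the spaces between inline tags
        // Wrap the fragment so that it has a single root element.
        xml.LoadXml("<root>" + htmlFragment + "</root>");

        var runs = new List<StyledRun>();
        Collect(xml.DocumentElement, false, false, false, runs);
        return runs;
    }

    // Recursively walks the children; the style flags of the current scope are
    // passed down, so nested tags accumulate (bold + underline, and so on).
    private static void Collect(XmlNode node, bool bold, bool italic, bool underline, List<StyledRun> runs)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    runs.Add(new StyledRun(child.Value, bold, italic, underline));
                    break;
                case XmlNodeType.Element:
                    Collect(child,
                        bold || child.Name == "strong" || child.Name == "b",
                        italic || child.Name == "em" || child.Name == "i",
                        underline || child.Name == "u",
                        runs);
                    break;
            }
        }
    }
}
```

Each run could then be turned into one iText text element with the matching bold, italic, or underline settings and added to the paragraph for that field, which would replace the single `p.Add(node.InnerText)` call. Unknown tags such as p or td are simply traversed here without adding any style.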

Source

2017-04-21 05:36:48 COeDev
