2016-03-30

Web scraping: getting the values of specific nodes inside an element

I have HTML like the following:

<article class="">     


<p><strong>Jamshedpur:</strong> Three members of a family were beaten to death, allegedly by some villagers on charge of stealing a cattle in Udajo village of West Singhbhum district, police said on Saturday.</p><p>The incident took place on Thursday but the bodies, which were dumped in a jungle boring Naxal-hit Tonto area of the distirct, was yet to be recovered.</p><figure class="article_img"> 

      <img alt="" src="http://img01.ibnlive.in/ibnlive/uploads/2015/05/cowslaughter_reuters1.jpg"> 

      <div class="intro">File Picture 
</div> 
     </figure><!-- <div class="editor_pic clearfix">   
      <div class="fright eprbox ttu"><a href="javascript:void(0)" onclick="opencomm()"><span>View all</span>1025 Comments</a> <a href="javascript:void(0)" onclick="opencomm()" class="sprite_img ep_arrow"></a></div> 
     </div> --> 

     <div class="article_tag"> 
      <a href="/newstopics/cattle.html">#cattle</a> <a href="/newstopics/lynch.html">#lynch</a> <a href="/newstopics/theft.html">#theft</a> 
     </div><div style="height: auto; width: 100%; text-align: center; padding: 0px; margin: auto; position: relative; overflow: hidden; background: transparent none repeat scroll 0% 0%;" class="_adSenceImagePushContainer"><iframe width="728" height="90" frameborder="0" scrolling="no" allowtransparency="true" class="_adSenceImagePush" loadstatus="1" data-width="728" data-height="90" style="height:90px !important; min-height:90px !important; width:728px !important; overflow:hidden;bottom:0px;margin:0 auto 15px;position:relative; background: transparent; z-index: 2; -moz-transform: scale(1); -o-transform: scale(1); -webkit-transform: scale(1); transform: scale(1); max-width:initial !important;" hov-data-host="www.ibnlive.com" hov-data-section="news/*" hov-data-wom="web" hov-data-id="21" hov-data-default="true"></iframe></div><p>The victims were identified as Sukra Koda (30), his cousins Sanatan Laguri (32) and Surja Laguri (25). They were allegedly caught by the villagers and thrashed with lathis on charge of taking away a cattle, said the officer-in charge of the concerned police station, Sahdeo Toppo.</p><p>Toppo said the bodies have allegedly been dumped in the jungle but could not yet been recovered the area being a disturbed one.</p><p>A police team will be sent on Sunday to the spot to recover the bodies.</p><p>Meanwhile, an FIR has been registered against 13 villagers in this connection, he said adding that no one was arrested as yet. </p>    <!--tech poll --> 




     <!--share and comment --> 
     <div class="editor_pic btm_share_box clearfix mtop30"> 

      <div class="social_icon_box fleft pnon bnon"> 
         <ul class="clearfix"> 
         <li> 

<a href="javascript: void(0)" onclick="window.open('https://web.skype.com/share?url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html&amp;lang=en-US&amp;flow_id=e67bffb7-d317-4577-8528-e9cc76e80b38&amp;source=button', '_blank', 'toolbar=no, scrollbars=yes, resizable=yes, width=305, height=665');"><img src="http://static.ibnlive.in.com/pix/ibnhome/ibn_revamp/newsletter/sky.png"></a> 
</li> 
          <li><a href="javascript: void(0)" onclick="window.open('http://twitter.com/share?text=3 lynched on charge of cattle theft in Jharkhand&amp;url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html&amp;via=ibnlive&amp;related=cnnibnbreaking%2Cibnlivetech%2Cibnlivemovies','sharer', 'toolbar=0,status=0,width=620,height=320');"><span class="tech_sprite tweet_icons3"></span><div class="tlTweet"></div></a></li> 
          <li><a href="javascript: void(0)" onclick="window.open('http://www.facebook.com/sharer.php?t=3 lynched on charge of cattle theft in Jharkhand&amp;u=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html','sharer', 'toolbar=0,status=0,width=620,height=280');"><span class="tech_sprite fb_icons3"></span><div class="tlFB"></div></a></li>       
           <li class="bnon pr_non"><a href="javascript:void(0);"><span class="more_icon tech_sprite mtop5"></span> More+</a></li> 
         </ul> 
         </div> 
<div class="social_icon_box share_exp_box"> 
        <ul class="clearfix"> 
         <li><a target="_blank" href="http://www.facebook.com/sharer.php?u=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html&amp;t=3 lynched on charge of cattle theft in Jharkhand"><span title="Facebook" class="tech_sprite fb_icons3"></span></a></li> 
         <li><a target="_blank" href="http://twitter.com/share?text=3 lynched on charge of cattle theft in Jharkhand&amp;url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html"><span title="Twitter" class="tech_sprite tweet_icons3"></span></a></li> 
         <li><a target="_blank" href="http://pinterest.com/pin/create/link/?url=3 lynched on charge of cattle theft in Jharkhand"><span title="Pinterest" class="tech_sprite print_icons3"></span></a></li> 
         <li><a target="_blank" href="https://plus.google.com/share?url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html"><span title="Google+" class="tech_sprite g_icons3"></span></a></li> 
         <li><a target="_blank" href="https://www.linkedin.com/cws/share?url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html"><span title="Linkedin" class="tech_sprite in_icons3"></span></a></li> 
         <li><a target="_blank" href="http://reddit.com/submit?url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html&amp;title=3 lynched on charge of cattle theft in Jharkhand"><span title="Reddit" class="tech_sprite reddit_icons"></span></a></li> 
         <li><a target="_blank" href="http://www.stumbleupon.com/submit?url=http://www.ibnlive.com/news/india/3-lynched-on-charge-of-cattle-theft-in-jharkhand-1221561.html&amp;title=3 lynched on charge of cattle theft in Jharkhand"><span title="Stumble" class="tech_sprite stumbleupon_icons"></span></a></li> 
         <!-- <li><a href="#" target="_blank"><span class="tech_sprite email_icons" title="Email"></span></a></li> --> 
         <!-- <li class="bnon"><a href="#" target="_blank"><span class="tech_sprite flipboard_icons" title="Flipboard"></span></a></li> --> 
        </ul> 
        <a href="javascript:void(0);" class="sprite sclose_icon"></a> 
       </div> 
      <div class="fright eprbox ttu"><a onclick="clickpage('opencomments');" href="javascript:void(0)"><span>View all</span><div data-disqus-identifier="article_1221561" class="disqus-comment-count">0 Comments</div></a> <a onclick="clickpage('opencomments');" href="javascript:void(0)" class="sprite_img article_carrow"></a></div>  


     </div> 
     <!--share and comment --> 

     </article> 

I want to get the p tags inside the article tag. This is the code I am using:

$doc = new DOMDocument();
libxml_use_internal_errors(true);
if ($page) {
    $doc->loadHTML($page);
}

$paragraph = $doc->getElementsByTagName('article');

$parag = array();

foreach ($paragraph as $para) {
    if ($para->nodeValue != "") {
        $parag[] = $para->nodeValue;
    }
}

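However, the code above returns the entire content of the article tag, whereas I want only the p tags inside the article.

I tried using properties such as nodeName and childNodes, but that did not help. I am new to web scraping and could not find a solution, even after consulting the following links: tutorial, childnodes

Please suggest a solution with an example. Thanks in advance.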


Got the solution: $xpath = new DOMXPath($doc); $news_paras = $xpath->query("//article//p"); $parag = array(); foreach ($news_paras as $news_para) { $parag[] = $news_para->nodeValue; } – Simer

Answers

I was able to meet the requirement above using XPath:

$xpath = new DOMXPath($doc);
$news_paras = $xpath->query("//article//p");
$parag = array();
foreach ($news_paras as $news_para) {
    $parag[] = $news_para->nodeValue;
}
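
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The markup in $page is a shortened, hypothetical stand-in for the article in the question, and $parag2 is a name I added for illustration. It shows the same "//article//p" query, plus an alternative that avoids XPath by calling getElementsByTagName('p') on each article element, which should give the same paragraphs.

```php
<?php
// Hypothetical sample markup standing in for the scraped page ($page in the question).
$page = <<<HTML
<html><body>
<article class="">
  <p><strong>Jamshedpur:</strong> First paragraph.</p>
  <figure class="article_img"><div class="intro">File Picture</div></figure>
  <div class="article_tag"><a href="/newstopics/cattle.html">#cattle</a></div>
  <p>Second paragraph.</p>
</article>
<p>Outside the article.</p>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them;
// older libxml2 builds complain about HTML5 tags such as <article>.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Option 1: XPath. "//article//p" matches every <p> that is a descendant of an <article>.
$xpath = new DOMXPath($doc);
$parag = [];
foreach ($xpath->query('//article//p') as $p) {
    $text = trim($p->nodeValue);
    if ($text !== '') {
        $parag[] = $text;
    }
}

// Option 2: no XPath. getElementsByTagName() is also available on each
// DOMElement, so it can be scoped to the <article> node.
$parag2 = [];
foreach ($doc->getElementsByTagName('article') as $article) {
    foreach ($article->getElementsByTagName('p') as $p) {
        $text = trim($p->nodeValue);
        if ($text !== '') {
            $parag2[] = $text;
        }
    }
}

print_r($parag);
```

The original code returned everything because nodeValue on the article element concatenates the text of all its descendants. Selecting the p elements first and reading nodeValue on each of them avoids that.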