2016-12-06 1 views

Best way to query multiple nested classes using DOMXPath

I am parsing an HTML file with the following structure:

<div class="lstImv blackBd12"> 
    <div class="stCl3 stLeft imvImg"> 
     <div class="imgBox">    
      <a class="emp-imgs-link"> 
       <span class="imgFrm frmBig frmLeft"> 
        <img class="emp-img-principal"> 
       </span> 
       <span class="imgFrm frmMd frmTop"> 
        <img class="emp-img-logo"> 
       </span> 
       <span class="imgFrm frmMd frmBot"> 
        <img class="emp-img-foto"> 
       </span>    
      </a> 
     </div> 
     <strong class="imvFse emp-fase">Get_text 1</strong> 
    </div> 
    <div class="imvInf stCl3 stRight"> 
     <div class="infHd"> 
      <div class="hdLeft stCl2"> 
       <strong class="emp-nome infNme colorTxt"></strong> 
       <span class="emp-loc-part1 infLoc">Get_text 2</span> 
       <span class="emp-loc-part2 infLoc">Get_text 3</span> 
      </div> 
      <div class="hdRight stCl1"> 
       <em class="emp-valor-apartir" >Get_text 4</em> 
       <strong class="emp-valor infVlr colorTxt">Get_text 5</strong> 
      </div> 
     </div> 
     <div class="infTxt"> 
      <p class="blackTxt60 emp-descritivo"></p> 
      <ul>     
       <li class="txtBed emp-un-dorms">Get_text 6</li>         
       <li class="txtArea emp-un-area">Get_text 7</li> 
       <li class="txtToilet emp-un-bath">Get_text 8</li> 
       <li class="txtCar emp-un-park">Get_text 9</li> 
      </ul> 
     </div> 
     <div class="infBt"> 
      <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a> 
     </div> 
    </div> 
</div> 
<div class="lstImv blackBd12"> 
    <div class="stCl3 stLeft imvImg"> 
     <div class="imgBox">    
      <a class="emp-imgs-link"> 
       <span class="imgFrm frmBig frmLeft"> 
        <img class="emp-img-principal"> 
       </span> 
       <span class="imgFrm frmMd frmTop"> 
        <img class="emp-img-logo"> 
       </span> 
       <span class="imgFrm frmMd frmBot"> 
        <img class="emp-img-foto"> 
       </span>    
      </a> 
     </div> 
     <strong class="imvFse emp-fase">Other Get_text 1</strong> 
    </div> 
    <div class="imvInf stCl3 stRight"> 
     <div class="infHd"> 
      <div class="hdLeft stCl2"> 
       <strong class="emp-nome infNme colorTxt"></strong> 
       <span class="emp-loc-part1 infLoc">Other Get_text 2</span> 
       <span class="emp-loc-part2 infLoc">Other Get_text 3</span> 
      </div> 
      <div class="hdRight stCl1"> 
       <em class="emp-valor-apartir" >Other Get_text 4</em> 
       <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong> 
      </div> 
     </div> 
     <div class="infTxt"> 
      <p class="blackTxt60 emp-descritivo"></p> 
      <ul>     
       <li class="txtBed emp-un-dorms">Other Get_text 6</li>         
       <li class="txtArea emp-un-area">Other Get_text 7</li> 
       <li class="txtToilet emp-un-bath">Other Get_text 8</li> 
       <li class="txtCar emp-un-park">Other Get_text 9</li> 
      </ul> 
     </div> 
     <div class="infBt"> 
      <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a> 
     </div> 
    </div> 
</div> 

The following block, which wraps the other tags:

<div class="lstImv blackBd12"></div>

and contains the target text contents, is repeated several times (in this example, after editing, I kept only two). With this code:

<?php 
$html = "exemplo_parse.html"; 
libxml_use_internal_errors(true); 
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom); 
$content = $xpath->query('//div[@class="lstImv blackBd12"]'); 
foreach($content as $span) 
{ 
    echo "<pre>"; 
     print_r($span); 
    echo "</pre>"; 
} 
?> 

I get two objects with these values:

DOMElement Object 
(
    [tagName] => div 
    [schemaTypeInfo] => 
    [nodeName] => div 
    [nodeValue] => 











     Get_text 1 





       Get_text 2 
       Get_text 3 


       Get_text 4 
       Get_text 5 




      Get_text 6         
       Get_text 7 
       Get_text 8 
       Get_text 9 


      Get_text 10 



    [nodeType] => 1 
    [parentNode] => (object value omitted) 
    [childNodes] => (object value omitted) 
    [firstChild] => (object value omitted) 
    [lastChild] => (object value omitted) 
    [previousSibling] => 
    [nextSibling] => (object value omitted) 
    [attributes] => (object value omitted) 
    [ownerDocument] => (object value omitted) 
    [namespaceURI] => 
    [prefix] => 
    [localName] => div 
    [baseURI] => 
    [textContent] => 











     Get_text 1 





       Get_text 2 
       Get_text 3 


       Get_text 4 
       Get_text 5 




      Get_text 6         
       Get_text 7 
       Get_text 8 
       Get_text 9 


      Get_text 10 



) 
DOMElement Object 
(
    [tagName] => div 
    [schemaTypeInfo] => 
    [nodeName] => div 
    [nodeValue] => 











     Other Get_text 1 





       Other Get_text 2 
       Other Get_text 3 


       Other Get_text 4 
       Other Get_text 5 




      Other Get_text 6         
       Other Get_text 7 
       Other Get_text 8 
       Other Get_text 9 


      Other Get_text 10 



    [nodeType] => 1 
    [parentNode] => (object value omitted) 
    [childNodes] => (object value omitted) 
    [firstChild] => (object value omitted) 
    [lastChild] => (object value omitted) 
    [previousSibling] => (object value omitted) 
    [attributes] => (object value omitted) 
    [ownerDocument] => (object value omitted) 
    [namespaceURI] => 
    [prefix] => 
    [localName] => div 
    [baseURI] => 
    [textContent] => 











     Other Get_text 1 





       Other Get_text 2 
       Other Get_text 3 


       Other Get_text 4 
       Other Get_text 5 




      Other Get_text 6         
       Other Get_text 7 
       Other Get_text 8 
       Other Get_text 9 


      Other Get_text 10 



) 

So the way I am doing it is:

<?php 
$html = "exemplo_parse.html"; 
libxml_use_internal_errors(true); 
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom); 
$content = $xpath->query('//strong[@class="imvFse emp-fase"]'); 
foreach($content as $span) 
{ 
    echo "Key 1 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]'); 
foreach($content as $span) 
{ 
    echo "Key 2 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]'); 
foreach($content as $span) 
{ 
    echo "Key 3 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]'); 
foreach($content as $span) 
{ 
    echo "Key 4 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]'); 
foreach($content as $span) 
{ 
    echo "Key 5 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//li[@class="txtArea emp-un-area"]'); 
foreach($content as $span) 
{ 
    echo "Key 6 : ".$span->textContent."<br/>"; 
} 
$content = $xpath->query('//li[@class="txtCar emp-un-park"]'); 
foreach($content as $span) 
{ 
    echo "Key 7 : ".$span->textContent."<br/>"; 
} 
?> 

This way I get the data:

Key 1 : Get_text 1 
Key 1 : Other Get_text 1 
Key 2 : 
Key 2 : 
Key 3 : Get_text 2 
Key 3 : Other Get_text 2 
Key 4 : Get_text 3 
Key 4 : Other Get_text 3 
Key 5 : Get_text 6 
Key 5 : Other Get_text 6 
Key 6 : Get_text 7 
Key 6 : Other Get_text 7 
Key 7 : Get_text 9 
Key 7 : Other Get_text 9 

That is, the keys come out grouped, as (k1, k1, k2, k2, ..., k7, k7), whereas I want them in order for each block: (k1, k2, ..., k7, k1, k2, ..., k7).

Sorry for my bad English, I am still improving...

Answer

Here is the solution I came up with:

<?php 
$html = <<<HTML
<div class="lstImv blackBd12"> 
    <div class="stCl3 stLeft imvImg"> 
     <div class="imgBox">    
      <a class="emp-imgs-link"> 
       <span class="imgFrm frmBig frmLeft"> 
        <img class="emp-img-principal"> 
       </span> 
       <span class="imgFrm frmMd frmTop"> 
        <img class="emp-img-logo"> 
       </span> 
       <span class="imgFrm frmMd frmBot"> 
        <img class="emp-img-foto"> 
       </span>    
      </a> 
     </div> 
     <strong class="imvFse emp-fase">Get_text 1</strong> 
    </div> 
    <div class="imvInf stCl3 stRight"> 
     <div class="infHd"> 
      <div class="hdLeft stCl2"> 
       <strong class="emp-nome infNme colorTxt"></strong> 
       <span class="emp-loc-part1 infLoc">Get_text 2</span> 
       <span class="emp-loc-part2 infLoc">Get_text 3</span> 
      </div> 
      <div class="hdRight stCl1"> 
       <em class="emp-valor-apartir" >Get_text 4</em> 
       <strong class="emp-valor infVlr colorTxt">Get_text 5</strong> 
      </div> 
     </div> 
     <div class="infTxt"> 
      <p class="blackTxt60 emp-descritivo"></p> 
      <ul>     
       <li class="txtBed emp-un-dorms">Get_text 6</li>         
       <li class="txtArea emp-un-area">Get_text 7</li> 
       <li class="txtToilet emp-un-bath">Get_text 8</li> 
       <li class="txtCar emp-un-park">Get_text 9</li> 
      </ul> 
     </div> 
     <div class="infBt"> 
      <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a> 
     </div> 
    </div> 
</div> 
<div class="lstImv blackBd12"> 
    <div class="stCl3 stLeft imvImg"> 
     <div class="imgBox">    
      <a class="emp-imgs-link"> 
       <span class="imgFrm frmBig frmLeft"> 
        <img class="emp-img-principal"> 
       </span> 
       <span class="imgFrm frmMd frmTop"> 
        <img class="emp-img-logo"> 
       </span> 
       <span class="imgFrm frmMd frmBot"> 
        <img class="emp-img-foto"> 
       </span>    
      </a> 
     </div> 
     <strong class="imvFse emp-fase">Other Get_text 1</strong> 
    </div> 
    <div class="imvInf stCl3 stRight"> 
     <div class="infHd"> 
      <div class="hdLeft stCl2"> 
       <strong class="emp-nome infNme colorTxt"></strong> 
       <span class="emp-loc-part1 infLoc">Other Get_text 2</span> 
       <span class="emp-loc-part2 infLoc">Other Get_text 3</span> 
      </div> 
      <div class="hdRight stCl1"> 
       <em class="emp-valor-apartir" >Other Get_text 4</em> 
       <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong> 
      </div> 
     </div> 
     <div class="infTxt"> 
      <p class="blackTxt60 emp-descritivo"></p> 
      <ul>     
       <li class="txtBed emp-un-dorms">Other Get_text 6</li>         
       <li class="txtArea emp-un-area">Other Get_text 7</li> 
       <li class="txtToilet emp-un-bath">Other Get_text 8</li> 
       <li class="txtCar emp-un-park">Other Get_text 9</li> 
      </ul> 
     </div> 
     <div class="infBt"> 
      <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a> 
     </div> 
    </div> 
</div> 
HTML;

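libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument('1.0', 'utf-8'); 
$dom->preserveWhiteSpace = false; // set before loading; changing it afterwards does not alter the parsed tree
$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 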


$items = $xpath->query('//div[@class="lstImv blackBd12"]'); 
for($i = 0; $i < $items->length; $i++) 
{ 
    $status = $xpath->query('//strong[@class="imvFse emp-fase"]'); 
    echo "Value  :".$status->item($i)->nodeValue."<br/>";  

    $titulo = $xpath->query('//span[@class="emp-loc-part1 infLoc"]'); 
    echo "Value  :".$titulo->item($i)->nodeValue."<br/>"; 

    $titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]'); 
    echo "Value  :".$titulo2->item($i)->nodeValue."<br/>"; 

    $valor = $xpath->query('//em[@class="emp-valor-apartir"]'); 
    echo "Value  :".$valor->item($i)->nodeValue."<br/>"; 

    $valor2 = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]'); 
    echo "Value  :".$valor2->item($i)->nodeValue."<br/>"; 

    $dorm = $xpath->query('//li[@class="txtBed emp-un-dorms"]'); 
    echo "Value  :".$dorm->item($i)->nodeValue."<br/>"; 

    $tam = $xpath->query('//li[@class="txtArea emp-un-area"]'); 
    echo "Value  :".$tam->item($i)->nodeValue."<br/>"; 

} 
?> 
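
An alternative to indexing document-wide result lists with `$i` is to pass each block as the context node to `DOMXPath::evaluate()` and use relative expressions that start with `.`. The following is only a minimal sketch, assuming the markup from the question; `extractListings()` and the array keys (`fase`, `loc1`, ...) are hypothetical names chosen for illustration, not part of the original code.

```php
<?php
// Sketch: visit each listing block once and run relative queries inside it.
// extractListings() and the field names are hypothetical, for illustration only.
function extractListings(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Expressions starting with "." are evaluated relative to the context node.
    $fields = [
        'fase'  => './/strong[@class="imvFse emp-fase"]',
        'loc1'  => './/span[@class="emp-loc-part1 infLoc"]',
        'loc2'  => './/span[@class="emp-loc-part2 infLoc"]',
        'dorms' => './/li[@class="txtBed emp-un-dorms"]',
        'area'  => './/li[@class="txtArea emp-un-area"]',
        'park'  => './/li[@class="txtCar emp-un-park"]',
    ];

    $rows = [];
    foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
        $row = [];
        foreach ($fields as $name => $expr) {
            // string() gives the text of the first match, or '' when there is none.
            $row[$name] = trim($xpath->evaluate("string($expr)", $block));
        }
        $rows[] = $row;
    }
    return $rows;
}

// Usage with a shortened version of the question's markup.
$sample = '<div class="lstImv blackBd12">'
        . '<strong class="imvFse emp-fase">Get_text 1</strong>'
        . '<span class="emp-loc-part1 infLoc">Get_text 2</span>'
        . '</div>'
        . '<div class="lstImv blackBd12">'
        . '<strong class="imvFse emp-fase">Other Get_text 1</strong>'
        . '</div>';

foreach (extractListings($sample) as $i => $row) {
    foreach ($row as $name => $value) {
        echo "Block $i, $name: $value<br/>\n";
    }
}
```

Because every query is scoped to one block, the values of that block stay together even when a field is missing from it. With document-wide lists indexed by `$i`, a missing element would shift all later values by one position.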

See it on ideone.
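
One caveat with predicates such as `@class="txtBed emp-un-dorms"` is that they compare the whole attribute string, so they stop matching if the classes appear in a different order or another class is added. A common XPath 1.0 idiom matches a single class token instead. The sketch below assumes plain class names without quotes; `hasClass()` is a hypothetical helper, not a DOMXPath method.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that matches one class
// token, whatever other classes are present and in whatever order.
// Assumes $class is a plain class name without quote characters.
function hasClass(string $class): string
{
    return sprintf(
        "contains(concat(' ', normalize-space(@class), ' '), ' %s ')",
        $class
    );
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Class order and spacing differ on purpose from the question's markup.
$dom->loadHTML(
    '<ul>'
    . '<li class="emp-un-dorms txtBed">2</li>'
    . '<li class="txtArea  emp-un-area">70</li>'
    . '</ul>'
);
$xpath = new DOMXPath($dom);

// Token-based match: finds the first <li> even though its classes are reordered.
$nodes = $xpath->query('//li[' . hasClass('emp-un-dorms') . ']');
echo $nodes->item(0)->textContent, "\n";
```

The same predicate can be combined with a context node, for example `.//li[...]` inside a loop over the `lstImv` blocks.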