2012-04-23 38 views
4

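PHP error parsing XML (RSS feed)

I'm parsing five RSS feeds with a PHP class based on the one found in this answer. Four of the five work fine, but one of them is giving me some errors. Is it malformed XML, or some other problem? I don't control the source of the RSS feed, but I'd like to let the owner know if the problem is on their end.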

Thanks.

PHP errors (for the feed online at http://ahima.org/RSS/News-Alerts-RSS.aspx):

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 35: parser error : xmlParseEntityRef: no name in _rss.php on line 59 

Warning: simplexml_load_string() [function.simplexml-load-string]: ne is June 30. The award will be presented at the 84th AHIMA Annual Convention & in _rss.php on line 59 

Warning: simplexml_load_string() [function.simplexml-load-string]:^in _rss.php on line 59 

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 64: parser error : EntityRef: expecting ';' in _rss.php on line 59 

Warning: simplexml_load_string() [function.simplexml-load-string]: e code modifications presented at the ICD-9-CM Coordination and Maintenance (C&M in _rss.php on line 59 

Warning: simplexml_load_string() [function.simplexml-load-string]:^in _rss.php on line 59 

XML/RSS feed:

<?xml version="1.0" encoding="utf-8"?> 
<rss version="2.0"> 
    <channel> 
     <generator>RSS Builder by AHIMA</generator> 
     <title>News And Alerts</title> 
     <link>http://www.ahima.org/</link> 
     <description>News and Alerts from AHIMA.ORG</description> 
     <language>en-us</language> 
     <managingEditor>[email protected]</managingEditor> 
     <webMaster>[email protected]</webMaster> 
     <copyright>2010 AHIMA</copyright> 
     <item> 
      <title>Exclusive Coverage of AHIMA ICD-10 Summit</title> 
      <pubDate>4/13/2012 2:39:54 PM</pubDate> 
      <link>http://journal.ahima.org/icdsummit/</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>The summit takes place April 16–17 in Baltimore, MD, and explores the challenges and opportunities involved in the transition to the ICD-10-CM/PCS coding systems. The Journal’s coverage begins April 11 with session previews and comments from the presenters. Keep up to date on the summit by checking this site daily, subscribing to the RRS feed, and following @JournalofAHIMA on Twitter. Follow the Twitter hash tag #ICD10Summit for updates from summit attendees.</description> 
     </item><item> 
      <title>AHIMA: Remain Focused on Expediting ICD-10 Implementation</title> 
      <pubDate>4/10/2012 2:18:22 PM</pubDate> 
      <link>http://www.ahima.org/downloads/pdfs/pr/press-releases/HHS%20Announces%20IDC-10%20Delay.pdf</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>CHICAGO – April 10, 2012 – In light of the U.S. Department of Health and Human Services (HHS) proposed one-year delay in implementing ICD-10-CM or ICD-10-PCS for HIPAA covered entities, AHIMA encouraged organizations to remain focused on their implementation efforts. 
</description> 
     </item><item> 
      <title>Call for Nominations: New AHIMA Grace Award</title> 
      <pubDate>3/30/2012 11:45:09 AM</pubDate> 
      <link>/about/grace.aspx</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>Grace Award: In Recognition of Excellence in Health Information Management will honor healthcare delivery organizations that demonstrate effective and innovative approaches in using health information to deliver quality healthcare. 

Nomination applications are now available, and the submission deadline is June 30. The award will be presented at the 84th AHIMA Annual Convention & Exhibit in Chicago, September 29-October 4.</description> 
     </item><item> 
      <title>Practice Brief: Mobile Device Security</title> 
      <pubDate>4/13/2012 2:44:19 PM</pubDate> 
      <link>http://library.ahima.org/xpedio/groups/public/documents/ahima/bok1_049463.hcsp?dDocName=bok1_049463</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>Mobile devices have pervaded the everyday work environment in healthcare. An organization may use mobile devices to improve clinician workflow, bedside information gathering and reporting, or a host of other care delivery applications. In some cases, individuals may use their own mobile devices to meet their personal workflow requirements. 

Whatever purpose the device serves, healthcare organizations must be prepared to understand all the issues related to mobile device use. 

This practice brief reviews the legal and regulatory requirements that affect mobile device use in healthcare. It also provides best practices for ensuring appropriate safeguards are in place to protect all electronic protected health information (ePHI) used and processed within mobile devices.</description> 
     </item><item> 
      <title>Workflow and EHRs in Small Medical Practices </title> 
      <pubDate>4/13/2012 2:45:16 PM</pubDate> 
      <link>http://perspectives.ahima.org/index.php?option=com_content&amp;view=article&amp;id=247:workflow-and-electronic-health-records-in-small-medical-practices&amp;catid=42:electronic-records&amp;Itemid=88</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. 

The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities. 

</description> 
     </item><item> 
      <title>AHIMA Comments on Proposed Modification to ICD-9 Procedure Codes</title> 
      <pubDate>4/13/2012 2:46:58 PM</pubDate> 
      <link>http://www.ahima.org/downloads/pdfs/advocacy/AHIMA%20comments_CM_procedure_0312.pdf</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>The American Health Information Management Association (AHIMA) respectfully submits the following comments on the proposed procedure code modifications presented at the ICD-9-CM Coordination and Maintenance (C&M) Committee meeting held on March 5.</description> 
     </item><item> 
      <title>AHIMA Foundation Establishes Research Innovation and Leadership Institute</title> 
      <pubDate>4/13/2012 2:47:47 PM</pubDate> 
      <link>http://ahimafoundation.org/PolicyResearch/RILI.aspx</link> 
      <author>[email protected]</author> 
      <category>News - Alerts</category> 
      <description>For the HIM profession to remain relevant and influential we must have a dynamic and expanding knowledge base and defined set of desired skills and expertise. 
To remain relevant we need to expand our knowledge base and stakeout our content area of expertise through mission and discipline critical research. This research must meet standards of scientific rigor and set the foundation for knowledge creation, innovative concept development, and thought leadership. 

To increase influence we need to disseminate knowledge through scholarly processes and publications that inform best practices and influence policy makers. Scholarship must demonstrate our unique expertise and content knowledge base within the healthcare industry. Furthermore, knowledge transfer or dissemination will increase AHIMA brand recognition and enhance brand prestige and prominence. 

To sustain a systematic research initiative AHIMA has established a centralized, high performing Research Innovation and Leadership Institute (RILI) as an enduring mission critical component of the AHIMA Foundation. 
</description> 
     </item> 
    </channel> 
</rss> 

PHP code:

<?php 

if (!function_exists('strip_html_tags')){ function strip_html_tags($text) 
{ 
    $text = preg_replace(
     array(
      // Remove invisible content 
      '@<head[^>]*?>.*?</head>@siu', 
      '@<style[^>]*?>.*?</style>@siu', 
      '@<script[^>]*?.*?</script>@siu', 
      '@<object[^>]*?.*?</object>@siu', 
      '@<embed[^>]*?.*?</embed>@siu', 
      '@<applet[^>]*?.*?</applet>@siu', 
      '@<noframes[^>]*?.*?</noframes>@siu', 
      '@<noscript[^>]*?.*?</noscript>@siu', 
      '@<noembed[^>]*?.*?</noembed>@siu', 
      // Add line breaks before and after blocks 
      '@</?((address)|(blockquote)|(center)|(del))@iu', 
      '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu', 
      '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu', 
      '@</?((table)|(th)|(td)|(caption))@iu', 
      '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu', 
      '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu', 
      '@</?((frameset)|(frame)|(iframe))@iu', 
     ), 

     array(
      ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 
      "\$0", "\$0", "\$0", "\$0", "\$0", "\$0", 
      "\$0", "\$0", 
     ), 
     $text); 
    return strip_tags($text); 
} } 

class BlogPost { 
    var $date; 
    var $ts; 
    var $link; 

    var $title; 
    var $text; 
    var $author; 
    var $summary; 
    var $full; 
} 

class BlogFeed { 
    var $posts = array(); 

    function BlogFeed($file_or_url){ 
     if(!eregi('^http:', $file_or_url)) { 
      $feed_uri = $_SERVER['DOCUMENT_ROOT'] .'/shared/xml/'. $file_or_url; 
     } else { 
      $feed_uri = $file_or_url; 
     } 

     $xml_source = file_get_contents($feed_uri); 
     $x = simplexml_load_string($xml_source); 

     if (count($x) == 0) return; 

     foreach($x->channel->item as $item) { 
      $post = new BlogPost(); 
      $post->date = (string) $item->pubDate; 
      $post->ts = strtotime($item->pubDate); 
      $post->link = (string) $item->link; 
      $post->title = (string) $item->title; 
      $post->text = (string) strip_html_tags($item->description); 
      $post->full = (string) $item->description; 
      $post->author = (string) $item->author; 

      $summary = strip_html_tags($post->text); 

      $max_len = 300; 
      if(strlen($summary) > $max_len) { 
       $summary = substr($summary, 0, $max_len) . '...'; 
      } 

      $post->summary = $summary; 

      $this->posts[] = $post; 
     } 
    } 
} 

$blogs = array(
'http://www.hhs.gov/rss/news/hhsnews.xml', 
'http://ahima.org/RSS/News-Alerts-RSS.aspx', 
'http://www.healthcareitnews.com/rss/news', 
'http://www.healthcareitnews.com/resource/feed', 
'http://www.modernhealthcare.com/section/rss05&mime=xml' 
); 

foreach($blogs as $k=>$v){ 
    $blog = new BlogFeed($v); 
    foreach ($blog->posts as $one_item){ 
     /* ... */ 
    } 
} 
+0

Try this: http://simplehtmldom.sourceforge.net/ It's old, but it works quite well. – Eugene

+1

The XML feed contains errors, which is probably why SimpleXML won't parse it. – Repox

Answers

14

Well, it may not be as pretty as getting the feed fixed, but here is a solution:

$xml_source = str_replace(array("&amp;", "&"), array("&", "&amp;"), file_get_contents($feed_uri));
$x = simplexml_load_string($xml_source);

First I replace every &amp; with a plain &, so that nothing ends up escaped twice, and then I convert ALL & characters back to &amp; again.
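One caveat with the blanket replacement: it also rewrites entities other than &amp;, so an existing &lt; becomes &amp;lt; and shows up literally as "&lt;" after parsing. A narrower variant, sketched below, escapes only ampersands that do not already begin an entity or character reference. The helper name escape_bare_ampersands and the sample string are made up for illustration and are not part of the original answer; the sketch also does not help with named entities that XML does not define (such as &nbsp;), which would still fail to parse.

```php
<?php
// Hypothetical helper: escape only ampersands that do not already start
// an entity reference (&name;) or a character reference (&#38; / &#x26;).
function escape_bare_ampersands($xml_source)
{
    return preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml_source
    );
}

// Inline sample mixing bare ampersands with already-escaped content.
$broken = '<d>Convention & Exhibit, C&M, a=1&amp;b=2, 1 &lt; 2, &#38;</d>';
$fixed  = escape_bare_ampersands($broken);

$x = simplexml_load_string($fixed);
echo (string) $x, "\n";
```

In the question's class this would wrap the file_get_contents($feed_uri) call in the same place as the str_replace() above.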

+1

+1 and accepted, simple but effective - thank you! –

+0

Great (y)!!! You saved me another 2-3 hours; I had already spent 3 hours on this. Thanks a lot – atif

3

The problem is in the XML. Specifically, the phrase '84th AHIMA Annual Convention & Exhibit' contains the character '&', which needs to be escaped. You can use an online XML validator such as http://www.xmlvalidation.com/ to find out whether the XML you are dealing with has a problem.
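The same check can be done locally in PHP, which may be handy when reporting the problem to the feed owner. The sketch below buffers libxml's errors with libxml_use_internal_errors() and lists them with line and column numbers. The function name xml_parse_errors and the inline sample are assumptions made for illustration, not code from the question.

```php
<?php
// Hypothetical helper (not part of the original post): collect libxml's
// parse errors for an XML string instead of letting PHP emit warnings.
function xml_parse_errors($xml_source)
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    libxml_clear_errors();

    simplexml_load_string($xml_source);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting
    return $messages;
}

// Small inline sample with a bare ampersand, modelled on the feed above.
$sample = '<rss><channel><item><description>Annual Convention & Exhibit'
        . '</description></item></channel></rss>';

foreach (xml_parse_errors($sample) as $message) {
    echo $message, "\n";
}
```

An empty result means libxml found nothing to complain about; anything else can be forwarded to the feed owner as-is.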

+0

+1 Thanks for this resource. The W3C validator is one of my best friends, so I should have guessed that an XML validation service would exist. –

0

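As noted in the other answers and comments, the source XML is broken, and XML parsers are supposed to reject invalid input. libxml has a "recover" mode that would let it load this broken XML, but you would lose the "& sid" part, so it wouldn't help.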
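For reference, recovery mode can be tried from SimpleXML by passing the LIBXML_RECOVER option. This is only a sketch on a made-up inline sample; what exactly survives around the bad ampersand depends on the libxml version, so inspect the result rather than trusting it.

```php
<?php
// Sketch only: load a broken sample with libxml's recovery mode enabled.
$broken = '<rss><channel><item><description>Annual Convention & Exhibit'
        . '</description></item></channel></rss>';

// LIBXML_RECOVER asks libxml to keep going after errors;
// LIBXML_NOERROR and LIBXML_NOWARNING silence the reports.
$sxe = simplexml_load_string(
    $broken,
    'SimpleXMLElement',
    LIBXML_RECOVER | LIBXML_NOERROR | LIBXML_NOWARNING
);

if ($sxe !== false) {
    // Inspect what survived; the text around the bad "&" may be altered.
    var_dump((string) $sxe->channel->item->description);
}
```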

If you like taking chances, you can try to sort of fix the input. You can use some string replacement to escape the ampersands that look like they are in the query part of a URL.

$xml = file_get_contents('broken.xml');
// replace & followed by a bunch of letters, numbers
// and underscores and an equal sign with &amp;
$xml = preg_replace('#&(?=[a-z_0-9]+=)#', '&amp;', $xml);
$sxe = simplexml_load_string($xml);

This is, of course, a hack, and the only good way to fix the situation is to ask the XML provider to fix their generator. If it produces broken XML, who knows what other errors slip through unnoticed?
