Python lxml etree.tostring(): \r in the page I tried to parse

Why does \r come out encoded when I call Python lxml's etree.tostring() on a page I tried to parse?

    # Python 2 (urllib2 and the print statement)
    import urllib2
    from lxml import etree

    # Fetch the page, decode it from GB2312 and re-encode it as UTF-8
    request = urllib2.Request(url="http://2012.qq.com/sports/")
    response = urllib2.urlopen(request)
    content = response.read()
    uni_content = content.decode("gb2312", "ignore")
    tecent = uni_content.encode("utf-8")

    tecent_page = etree.HTML(tecent, parser=etree.HTMLParser(encoding='utf-8'))
    test_tags = tecent_page.xpath("/html/body/div[@class='page']/div[@class='layout']/div/div[@class='bd']/ul[@class='list']/li")

    for i, item in enumerate(test_tags):
        content = etree.tostring(item, encoding="utf-8", pretty_print=True)
        print content

Why do I get a result like this:

<li class="item">&#13; 
         <a class="pic" href="http://2012.qq.com/sports/judo/index.htm" target="_blank"><img width="96" height="96" src="http://mat1.gtimg.com/2012/samanthasun/allevents/roudao.png" alt="柔道"/></a>&#13; 
         <p><a href="http://2012.qq.com/sports/judo/index.htm" target="_blank">柔道</a></p>&#13; 
         <p><a href="http://2012.qq.com/l/sports/judo/judochn/list2011079114946.htm" target="_blank">新闻</a> | <a href="http://2012.qq.com/l/photos/33xiangmu/roudao/list2011079115124.htm" target="_blank">图片</a> | <a href="http://2012.qq.com/l/video/xm/vjudo/list.htm" target="_blank">视频</a></p>&#13; 
        </li>&#13;

なぜそれががありますか？
すべての行にがあります。どうして？

Source

2016-05-13 rabbage

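The original document (http://2012.qq.com/sports/) uses CR+LF line breaks. The carriage return (\r) has character code 13, which is why it shows up as &#13; in the serialized output. You can use a simple workaround and remove it before parsing:

    tecent = uni_content.encode("utf-8").replace('\r\n', '\n')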
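A minimal sketch of the answer's explanation and workaround, written for Python 3 with lxml installed (the question itself uses Python 2, where the same .replace('\r\n', '\n') works on the encoded str). The sample markup is made up to imitate a page with CR+LF line breaks; it does not fetch http://2012.qq.com/sports/. Part 1 builds an element directly so the escaping of the carriage return is visible without depending on how the HTML parser treats raw CRs; part 2 applies the CR+LF to LF replacement before etree.HTML() sees the bytes.

```python
from lxml import etree

# 1. Why "&#13;" appears: "\r" is the carriage return, character code 13.
#    When etree.tostring() serializes text containing it, it writes the
#    character reference "&#13;" instead of the raw character.
li = etree.Element("li")
li.text = "\r\n"
escaped = etree.tostring(li)
print(escaped)

# 2. The workaround: normalize CR+LF to LF *before* parsing, so no "\r"
#    reaches the tree. These bytes are a made-up stand-in for the page.
raw = b"<ul><li>\r\n<p>judo</p>\r\n</li></ul>"
root = etree.HTML(raw.replace(b"\r\n", b"\n"))
cleaned = etree.tostring(root.find(".//li"))
print(cleaned)
```

Doing the replacement on the raw bytes, before parsing, keeps the tree itself free of carriage returns, so every later tostring() call is clean without post-processing its output.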


2016-05-13 06:21:57
