Scrapy: unexpected symbols in LinkExtractor

I am learning the Scrapy library and trying to write a small crawler.

Here are the crawler's rules:

rules = (
    Rule(LinkExtractor(restrict_xpaths='//div[@class="wrapper"]/div[last()]/a[@class="pagenav"][last()]')), 
    # Rule(LinkExtractor(restrict_xpaths='//span[@class="update_title"]/a'), callback='parse_item'), 
)

But I get this error message:

DEBUG: Crawled (200) <GET http://web/category.php?id=4&> (referer: None) 
DEBUG: Crawled (404) <GET http://web/%0D%0Acategory.php?id=4&page=2&s=d> (referer: http://web/category.php?id=4&) 
DEBUG: Ignoring response <404 http://web/%0D%0Acategory.php?id=4&page=2&s=d>: HTTP status code is not handled or not allowed

Here is what the HTML looks like:

<a class="pagenav" href=" category.php?id=4&page=8&s=d& ">8</a> 
| 
<a class="pagenav" href=" category.php?id=4&page=9&s=d& ">9</a> 
| 
<a class="pagenav" href=" category.php?id=4&page=10&s=d& ">10</a> 
|   
<a class="pagenav" href=" category.php?id=4&page=2&s=d& ">Next ></a>

Can someone explain where this %0D%0A comes from? Kind regards, Maxim.

UPD: I wrote a simple function

def process_value(value):
    # Strip leading and trailing whitespace from the extracted link value
    value = value.strip()
    print(value)
    return value

and changed the rules to:

rules = (
    Rule(LinkExtractor(restrict_xpaths='//div[@class="wrapper"]/div[last()]/a[@class="pagenav"][last()]', process_value=process_value)), 
    # Rule(LinkExtractor(restrict_xpaths='//span[@class="update_title"]/a'), callback='parse_item'), 
)

The print call outputs:

Crawled (200) <GET http://web/category.php?id=4&>(referer: None) 
http://web/ 
category.php?id=4&page=2&s=d& 
Crawled (404) <GET http://web/%0D%0Acategory.php?%0D=&id=4&page=2&s=d>(referer: http://web/category.php?id=4&)

Source

2016-05-06 Maxim Pavlov

Can you show the code that extracts the 'href' attribute? – Rahul

My guess is that you first need to strip the relative URL and only then make the request. Stripping removes the carriage return (%0D) and line feed (%0A) characters. – Rahul
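
To illustrate the comment above, here is a minimal sketch. It assumes the href attribute really begins with a carriage return and a line feed (the HTML listing in the question shows only a leading space). It uses nothing but `urllib.parse.quote` from the Python 3 standard library; the variable names are made up for illustration.

```python
from urllib.parse import quote

# Assumed raw attribute value: CR + LF before the relative URL, a space after it.
raw_href = "\r\ncategory.php?id=4&page=2&s=d& "

# Percent-encoding turns CR into %0D and LF into %0A.
encoded = quote(raw_href, safe="?=&")
print(encoded)

# Stripping the relative URL first removes the surrounding whitespace.
clean_href = raw_href.strip()
clean_encoded = quote(clean_href, safe="?=&")
print(clean_encoded)
```

This only shows where the two escape sequences could come from; it does not reproduce what Scrapy does internally.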

Thank you. For some reason .strip() does not work :( –
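
One possible reason, judging by the printed output in the question: the value passed to `process_value` appears to be the already-joined absolute URL, so the line break sits in the middle of the string, where `strip()` cannot reach it. Below is a minimal sketch under that assumption. It removes whitespace anywhere in the value. The name `clean_link_value` is hypothetical, and this is not confirmed as the fix for the crawler in question.

```python
import re


def clean_link_value(value):
    """Remove all whitespace (spaces, CR, LF, tabs) anywhere in the URL."""
    return re.sub(r"\s+", "", value)


# A value shaped like the one printed in the question
# (assumption: the break between the two printed lines is CR + LF).
joined = "http://web/ \r\ncategory.php?id=4&page=2&s=d& "

# strip() only trims the ends, so the embedded line break survives.
stripped_only = joined.strip()

# Removing whitespace everywhere yields a single clean URL.
cleaned = clean_link_value(joined)
print(cleaned)
```

Such a function could be passed as `process_value=clean_link_value` to `LinkExtractor`, in the same way as `process_value` in the rules above. Note that it would also drop any whitespace that legitimately belongs to a URL.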