Scrapy: looping through search results only returns the first item

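I'm scraping a site by going through the search pages and looping through all the results on each one. However, it only seems to return the first result for each page. I also don't think it is picking up the results on the start page.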

Secondly, the price comes back with some sort of Unicode character (the £ symbol). How can I remove it and keep just the price?

'regular_price': [u'\xa38.59'],

The HTML is here: http://pastebin.com/F8Lud0hu

And here is the spider:

import scrapy
import random
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from cdl.items import candleItem

class cdlSpider(CrawlSpider):
    name = "cdl"
    allowed_domains = ["www.xxxx.co.uk"]
    start_urls = ['https://www.xxxx.co.uk/advanced_search_result.php']

    rules = [
        Rule(LinkExtractor(
            allow=['advanced_search_result\.php\?sort=2a&page=\d*']),
            callback='parse_listings',
            follow=True)
    ]

    def parse_listings(self, response):
        sel = Selector(response)
        urls = sel.css('a.product_img')

        for url in urls:
            url = url.xpath('@href').extract()[0]
            return scrapy.Request(url,callback=self.parse_item)

    def parse_item(self, response):

        candle = candleItem()

        n = response.css('.prod_info_name h1')

        candle['name'] = n.xpath('.//text()').extract()[0]


        if response.css('.regular_price'):
            candle['regular_price'] = response.css('.regular_price').xpath('.//text()').extract()
        else:
            candle['was_price'] = response.css('.was_price strong').xpath('.//text()').extract()
            candle['now_price'] = response.css('.now_price strong').xpath('.//text()').extract()

        candle['referrer'] = response.request.headers.get('Referer', None)
        candle['url'] = response.request.url

        yield candle

Source

2016-08-31 Dan H

Yes, it only returns the first result because of your parse_listings method: you are returning the first URL, when you should be yielding each one. I would do something like:

def parse_listings(self, response):
    # Yield one request per product link instead of returning on the first
    for url in response.css('a.product_img::attr(href)').extract():
        yield scrapy.Request(url, callback=self.parse_item)
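
The difference between return and yield inside a loop can be seen without Scrapy at all. The following is a minimal sketch in plain Python; the function names and the sample URLs are hypothetical and only serve as an illustration.

```python
def first_only(urls):
    # `return` exits the function on the first iteration,
    # so only one value ever reaches the caller.
    for url in urls:
        return url


def all_of_them(urls):
    # `yield` turns the function into a generator that
    # produces one value per loop iteration.
    for url in urls:
        yield url


urls = ['/product/1', '/product/2', '/product/3']  # hypothetical links
print(first_only(urls))
print(list(all_of_them(urls)))
```

Scrapy iterates over whatever the callback produces, so a generator callback hands it every request rather than just the first.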

In that case, I would even do something like:

class CdlspiderSpider(CrawlSpider):
    name = 'cdlSpider'
    allowed_domains = ['www.xxxx.co.uk']
    start_urls = ['https://www.xxxx.co.uk/advanced_search_result.php']

    rules = [
        # Follow the pagination links (no callback, so they are only followed)
        Rule(LinkExtractor(allow=r'advanced_search_result\.php\?sort=2a&page=\d*')),
        # Send every product link to parse_item
        Rule(LinkExtractor(restrict_css='a.product_img'), callback='parse_item')
    ]

    def parse_item(self, response):
        ...
        # re_first returns only the numeric part, dropping the £ sign
        if response.css('.regular_price'):
            candle['regular_price'] = response.css('.regular_price::text').re_first(r'\d+\.?\d*')
        else:
            candle['was_price'] = response.css('.was_price strong::text').re_first(r'\d+\.?\d*')
            candle['now_price'] = response.css('.now_price strong::text').re_first(r'\d+\.?\d*')
        ...
        return candle

Source

2016-08-31 16:36:55 Wilfredo

Thanks. Changing return scrapy.Request(url, callback=self.parse_item) to yield scrapy.Request(url, callback=self.parse_item) works perfectly.

Hi, sorry for the follow-up question. Some of the prices I'm scraping are in the thousands and are formatted with a comma (e.g. £1,190.00). For these, the regex matches just "1" as the price. Regular prices work fine. Any suggestions on how to fix this? Thanks

Change the regex '\d+\.?\d*' to '(?:\d{1,3}[,\.])*\d\d'. – Guillaume
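
The suggested pattern can be checked against both price formats using only Python's re module. This is a rough sketch: the sample strings come from this thread, while the extract_price helper and the conversion to float are assumptions about how the value might be used afterwards.

```python
import re

# Optional groups of 1-3 digits followed by ',' or '.', then two final digits
PRICE_RE = re.compile(r'(?:\d{1,3}[,.])*\d\d')


def extract_price(text):
    """Return the first price found in text as a float, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop the thousands separator before converting
    return float(match.group().replace(',', ''))


print(extract_price(u'\xa38.59'))
print(extract_price(u'\xa31,190.00'))
```

Note that the pattern requires two trailing digits, so a single-digit price with no decimal part would not match.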

To remove the £, just replace it with an empty string, like this:

pricewithpound = u'\xa38.59' 
price = pricewithpound.replace(u'\xa3', '')
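
Since .extract() returns a list (as in the 'regular_price' output shown in the question), the replacement would need to be applied to each element. A minimal sketch, where strip_pound is a hypothetical helper name:

```python
def strip_pound(prices):
    # Remove the pound sign (u'\xa3') and surrounding whitespace from each entry
    return [p.replace(u'\xa3', u'').strip() for p in prices]


scraped = [u'\xa38.59']  # value taken from the question's output
print(strip_pound(scraped))
```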

To investigate the Scrapy issue, could you post the HTML source?

Source

2016-08-31 12:44:40 Guillaume
