scrapy

I am scraping this page and want to write the elements row by row to a CSV file, so that each row of the CSV file contains the record of one movie.

This is the spider I wrote:

import string

import scrapy


class simple_spider(scrapy.Spider):
    name = 'movies_spider'
    allowed_domains = ['mymcpl.org']
    download_delay = 1

    # One start URL per letter A-Z. string.ascii_uppercase works on both
    # Python 2 and 3 (string.uppercase exists only in Python 2).
    start_urls = ['http://www.mymcpl.org/cfapps/botb/movie.cfm?browse={}'.format(letter)
                  for letter in string.ascii_uppercase]  # ['http://www.mymcpl.org/cfapps/botb/movie.cfm']

    def parse(self, response):
        # XPath templates; {} is filled with the 1-based table row number.
        xpaths = {'book': '//*[@id="main"]/tr[{}]/td[2]/text()[1]',
                  'author': '//*[@id="main"]/tr[{}]/td[2]/a/text()',
                  'movie': '//*[@id="main"]/tr[{}]/td[1]/text()[1]',
                  'movie_year': '//*[@id="main"]/tr[{}]/td[1]/a/text()'}

        data = {key: [] for key in xpaths}
        for row in range(2, len(response.xpath('//*[@id="main"]/tr').extract()) + 1):
            for key in xpaths.keys():
                value = response.xpath(xpaths[key].format(row)).extract_first()
                data[key] = (value)
        yield data.values()

To run the spider:

scrapy runspider m_spider.py -o output.csv

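I have two problems here:

1) Each row of the CSV file contains not only the current record but also all the previous records, even though I am not appending values to the dictionary.

2) The spider is only scraping the pages in start_urls.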

Source

2016-08-08 Luis Ramon Ramirez Rodriguez

Scrapy already has a built-in CSV exporter. All you need to do is yield items, and Scrapy will write those items to a CSV file for you.

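def parse(self, response):
    xpaths = {'book': '//*[@id="main"]/tr[{}]/td[2]/text()[1]',
              'author': '//*[@id="main"]/tr[{}]/td[2]/a/text()',
              'movie': '//*[@id="main"]/tr[{}]/td[1]/text()[1]',
              'movie_year': '//*[@id="main"]/tr[{}]/td[1]/a/text()'}
    # Rows start at 2, as in the question's loop; XPath positions are 1-based.
    row_count = len(response.xpath('//*[@id="main"]/tr'))
    for row in range(2, row_count + 1):
        # Yield one dict per table row; the exporter writes each one as a CSV line.
        yield {key: response.xpath(xpath.format(row)).extract_first()
               for key, xpath in xpaths.items()}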
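The difference between filling one shared dict and yielding one dict per row can be seen without Scrapy. The following is only a minimal sketch in plain Python: the `rows` data and the function names `shared_dict` and `one_item_per_row` are made up for illustration and are not part of the original spider. It assumes the question's code is indented as shown there, with the `yield` after the row loop.

```python
# Hypothetical stand-in for the table rows a spider would extract.
rows = [
    {"movie": "Movie A", "book": "Book A"},
    {"movie": "Movie B", "book": "Book B"},
]


def shared_dict(rows):
    """Mimic the question's pattern: one dict reused for all rows, yielded once."""
    data = {}
    for row in rows:
        for key, value in row.items():
            data[key] = value  # overwrites the value stored for the previous row
    yield data  # runs only once, after the loop has finished


def one_item_per_row(rows):
    """Yield a fresh dict inside the loop, so every row becomes its own item."""
    for row in rows:
        yield dict(row)


print(list(shared_dict(rows)))
print(list(one_item_per_row(rows)))
```

With the shared dict only a single item comes out, holding the values of the last row, whereas yielding inside the loop produces one item per row, which is the shape a row-oriented export needs.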

Then just run:

scrapy crawl myspider --output results.csv

* Note the .csv part: Scrapy can also output JSON or JSON Lines instead of CSV. Simply change the file extension in the argument to .json or .jl.
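To give a rough idea of how the same items look in the two formats, here is a small sketch using only the standard library. It is an approximation for illustration, not Scrapy's exporter code; the `items` data and the helpers `to_csv` and `to_json_lines` are hypothetical names introduced here.

```python
import csv
import io
import json

# Hypothetical items, shaped like the dicts a spider would yield.
items = [
    {"movie": "Movie A", "movie_year": "1999"},
    {"movie": "Movie B", "movie_year": "2005"},
]


def to_csv(items):
    """Serialize dicts as CSV: a header line from the keys, then one line per item."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(items[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


def to_json_lines(items):
    """Serialize dicts as JSON Lines: one standalone JSON object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)


print(to_csv(items))
print(to_json_lines(items))
```

In both formats every yielded item ends up as exactly one record, so the spider code itself does not have to change when the output file extension does.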

Source

2016-08-08 06:42:10 Granitosaurus
