Scrapy NBA schedule not collating correctly
I'm trying to run a simple web scrape. The goal is to dump the dt, gm, tm and ntv classes to CSV; here it is JSON for clarity. One step at a time. Here is the spider:
import scrapy
class QuotesSpider(scrapy.Spider):
    name = "schedule"
    start_urls = [
        'http://www.nba.com/schedules/national_tv_schedule/',
    ]

    def parse(self, response):
        for game in response.css('td'):
            yield {
                'date': game.css('td.dt::text').extract(),
                'time': game.css('td.tm::text').extract(),
            }
Really simple, but it spits out this:
[
{"date": ["Sat, Oct 1", " ", "Sun, Oct 2", "Mon, Oct 3", " ", " ", " ", " ", " ", " "], "time": ["7:30 pm", "8:00 pm", "8:00 pm", "2:30 pm", "8:00 pm", "8:00 pm", "8:30 pm", "9:00 pm", "10:00 pm", "10:00 pm", "7:00 pm", "7:00 pm", "8:00 pm", "8:00 pm", "10:00 pm", "10:30 pm", "2:30 pm", "7:00 pm", "10:00 pm", "10:30 pm", "7:00 pm", "7:00 pm", "7:30 pm", "7:30 pm", "8:00 pm", "10:30 pm", "10:00 pm"]},
{"date": [], "time": []},
{"date": [], "time": []},
{"date": [], "time": []},
{"date": [], "time": []},
{"date": [], "time": []},
{"date": ["Sat, Oct 1"], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["7:30 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["8:00 pm"]},
{"date": [], "time": []},
{"date": ["Sun, Oct 2"], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["8:00 pm"]},
{"date": [], "time": []},
{"date": ["Mon, Oct 3"], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["2:30 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["8:00 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["8:00 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["8:30 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["9:00 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["10:00 pm"]},
{"date": [], "time": []},
{"date": [" "], "time": []},
{"date": [], "time": []},
{"date": [], "time": ["10:00 pm"]},
{"date": [], "time": []}
]
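The first dict has the correct data in the correct order, but it is not collated (truncated here for brevity). The following dicts each hold a single date or a single time, so nothing pairs a date with its time the way the first dict's data should be paired. I tried a while loop to strip out the line breaks, but that failed.
Any suggestions? I built this from the Scrapy tutorial. I know I will eventually need to fill in the correct dates where the date cell is blank.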
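One possible direction, sketched under assumptions rather than verified against the page: the loop above yields one item per `td`, so each item sees only a single cell (and the first `td` appears to be an outer cell wrapping the whole table, which would explain why it contains everything). Iterating over rows instead and taking one value per cell with `extract_first()` would pair each date with its time. The helper name `fill_dates` is hypothetical, and the assumption that every game sits in its own `tr` containing a `td.dt` and a `td.tm` cell is a guess based on the output shown.

```python
def fill_dates(rows):
    """Yield one dict per game, carrying the last non-blank date forward.

    `rows` is an iterable of (date, time) pairs, one pair per table row.
    Either value may be None or whitespace when the cell is missing or blank.
    """
    current_date = None
    for date, time in rows:
        if date and date.strip():
            current_date = date.strip()  # a new date starts here
        if not time or not time.strip():
            continue  # header or spacer row: no game on it
        yield {'date': current_date, 'time': time.strip()}


def parse(self, response):
    # Meant as a replacement for the spider's parse() method.
    # Assumption: each game is one <tr> with a td.dt and a td.tm cell.
    rows = (
        (row.css('td.dt::text').extract_first(),
         row.css('td.tm::text').extract_first())
        for row in response.css('tr')
    )
    return fill_dates(rows)
```

Keeping the date logic in a plain function means it can be tried on a few hand-written pairs, such as `("Sat, Oct 1", "7:30 pm")` followed by `(" ", "8:00 pm")`, before running it against the live page.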
You need a quote at the end of the response.css string. That works much better, thank you so much for your help! – areeekay
Right, sorry about that. –