Python: Getting data from a table using XPath

I'm trying to get the data from the table at the bottom of http://projects.fivethirtyeight.com/election-2016/delegate-targets/.

import requests 
from lxml import html 

url = "http://projects.fivethirtyeight.com/election-2016/delegate-targets/" 
response = requests.get(url) 
doc = html.fromstring(response.text) 


tables = doc.findall('.//table[@class="delegates desktop"]') 
election = tables[0] 
election_rows = election.findall('.//tr') 
def extractCells(row, isHeader=False):
    if isHeader:
        cells = row.findall('.//th')
    else:
        cells = row.findall('.//td')
    return [val.text_content() for val in cells]

import pandas 

def parse_options_data(table): 
    rows = table.findall(".//tr") 
    header = extractCells(rows[1], isHeader=True) 
    data = [extractCells(row, isHeader=False) for row in rows[2:]] 
    return pandas.DataFrame(data, columns=header) 

election_data = parse_options_data(election) 
election_data

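I'm having trouble with the top row, which holds the candidate names ('Trump', 'Cruz', 'Kasich'). It sits under tr class="top", and right now I only get tr class="bottom" (starting with the row that says 'Won/Target').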

Thanks for any help.

Source

2016-03-28 Lucy

The candidate names are in row 0:

candidates = [val.text_content() for val in rows[0].findall('.//th')[1:]]

Or, if you reuse the same extractCells() function:

candidates = extractCells(rows[0], isHeader=True)[1:]

Here, the [1:] slice skips the first, empty th cell.
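To see the effect of the slice without fetching the live page, here is a minimal sketch. It assumes a simplified, made-up table (the SAMPLE markup and its values are hypothetical, not the real page) and uses the standard library's xml.etree.ElementTree instead of lxml so it runs on its own. The findall() paths are the same ones used above; "".join(cell.itertext()) stands in for lxml's text_content().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real table has more rows and columns,
# and the cell values below are placeholders, not actual delegate counts.
SAMPLE = """
<table class="delegates desktop">
  <tr class="top"><th></th><th>Trump</th><th>Cruz</th><th>Kasich</th></tr>
  <tr class="bottom"><th>State</th><th>Won/Target</th><th>Won/Target</th><th>Won/Target</th></tr>
  <tr><td>Example state</td><td>1/2</td><td>3/4</td><td>5/6</td></tr>
</table>
"""

def extract_cells(row, is_header=False):
    # Same idea as extractCells() above: th cells for header rows, td otherwise.
    tag = "th" if is_header else "td"
    return ["".join(cell.itertext()).strip() for cell in row.findall(".//" + tag)]

table = ET.fromstring(SAMPLE.strip())
rows = table.findall(".//tr")

all_top = extract_cells(rows[0], is_header=True)  # includes the leading empty th
candidates = all_top[1:]                          # drop that empty first cell

print(all_top)
print(candidates)
```

In this sample, the unsliced list starts with an empty string for the blank corner cell, which is why the slice is needed before using the names as column labels.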

Source

2016-03-28 02:17:31 alecxe

Not ideal (it's hard-coded), but it runs.

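def parse_options_data(table):
    rows = table.findall(".//tr")
    # Row 0 holds the candidate names; [1:] skips the first, empty th cell.
    candidate = extractCells(rows[0], isHeader=True)[1:]
    # Hard-coded: keep the first three labels from row 1, then append the names.
    header = extractCells(rows[1], isHeader=True)[:3] + candidate
    data = [extractCells(row, isHeader=False) for row in rows[2:]]
    return pandas.DataFrame(data, columns=header)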

Source

2016-03-28 04:12:07 han058
