Extracting data with BeautifulSoup and writing it to CSV

As mentioned in my previous questions, I am using Beautiful Soup with Python to retrieve weather data from a website. This is what the website looks like:

<channel> 
<title>2 Hour Forecast</title> 
<source>Meteorological Services Singapore</source> 
<description>2 Hour Forecast</description> 
<item> 
<title>Nowcast Table</title> 
<category>Singapore Weather Conditions</category> 
<forecastIssue date="18-07-2016" time="03:30 PM"/> 
<validTime>3.30 pm to 5.30 pm</validTime> 
<weatherForecast> 
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/> 
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/> 
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/> 
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/> 
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/> 
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/> 
...
</weatherForecast>
</item>
</channel>

I managed to retrieve the information I need using this code:

import requests 
from bs4 import BeautifulSoup 
import urllib3 

#getting the ValidTime 

r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D') 
soup = BeautifulSoup(r.content, "xml") 
time = soup.find('validTime').string 
print "validTime: " + time 

#getting the date 

for currentdate in soup.find_all('item'): 
    element = currentdate.find('forecastIssue') 
    print "date: " + element['date'] 

#getting the time 

for currentdate in soup.find_all('item'): 
    element = currentdate.find('forecastIssue') 
    print "time: " + element['time'] 

#getting the attributes of every area (built once, no outer loop needed) 

area_attrs_li = [area.attrs for area in soup.find('weatherForecast').find_all('area')] 
print area_attrs_li

My results are as follows:

{'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West', 
'forecast': u'LR'}, {'lat': u'1.31200000', 'lon': u'103.86200000', 'name': 
u'Kallang', 'forecast': u'LR'},

How can I remove the u from the results? I tried the methods I found by googling, but they do not seem to work.

I am not strong in Python and have been stuck on this for quite a while.

EDIT: I tried this:

import csv 

f = open("C:\\scripts\\nea.csv", 'wt') 

try: 
    writer = csv.writer(f) 
    for area in area_attrs_li: 
        # the whole list is written on every row, which is why the records repeat 
        writer.writerow((time, element['date'], element['time'], area_attrs_li)) 
finally: 
    f.close() 

print open("C:/scripts/nea.csv", 'rt').read()

That worked. However, I would like to split the areas apart, and the records are duplicated in the CSV.

Thank you.

Source

2016-07-26 plzhelpmi

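Your "website" looks like plain XML –

Yes, I believe it is plain XML – plzhelpmi

Regarding question 2, you do not need to remove the u. It stands for Unicode, which is how Python represents the string internally, not how it appears when written to a file. You need to explain the problem you are having –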
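The point in the comment above can be checked with a small sketch. It assumes Python 3 (where the u prefix is accepted but no longer printed) and writes to an in-memory buffer instead of a file; the sample values are copied from the output in the question. On Python 2 the first print would show the u prefix, while the CSV text would be the same.

```python
import csv
import io

# Sample attribute dict shaped like the output in the question.
area = {u'name': u'Jurong West', u'forecast': u'LR'}

# Printing the container shows its repr: quotes on Python 3, plus a u prefix on Python 2.
print(repr(area))

# Writing the individual values through csv emits only the text itself.
buf = io.StringIO()
csv.writer(buf).writerow([area['name'], area['forecast']])
csv_text = buf.getvalue()
print(csv_text)
```

The prefix belongs to how the dict is displayed, so it only ends up in the file when the whole dict or list is written as a single cell.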

EDIT 1 (on topic):

You are missing escape characters:

C:\scripts>python neaweather.py 
  File "neaweather.py", line 30 
    writer.writerow(('time', 'element['date']', 'element['time']', 'area_attrs_li')) 
                                          ^
SyntaxError: invalid syntax

With the quotes escaped:

writer.writerow(('time', 'element[\'date\']', 'element[\'time\']', 'area_attrs_li'))

EDIT 2:

If you want to insert the values:

writer.writerow((time, element['date'], element['time'], area_attrs_li))

EDIT 3:

To split the results onto separate lines:

for area in area_attrs_li: 
    writer.writerow((time, element['date'], element['time'], area))
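Since each entry in area_attrs_li is a dict, one way to get a separate column per attribute is csv.DictWriter. The following is only a sketch under assumptions: it targets Python 3, writes to an in-memory buffer, and the names valid_time, issue, and fields are hypothetical stand-ins for the values scraped in the question.

```python
import csv
import io

# Hypothetical stand-ins for the values scraped in the question.
valid_time = '3.30 pm to 5.30 pm'
issue = {'date': '18-07-2016', 'time': '03:30 PM'}
area_attrs_li = [
    {'forecast': 'TL', 'lat': '1.37500000', 'lon': '103.83900000', 'name': 'Ang Mo Kio'},
    {'forecast': 'SH', 'lat': '1.32100000', 'lon': '103.92400000', 'name': 'Bedok'},
]

fields = ['validTime', 'date', 'time', 'name', 'forecast', 'lat', 'lon']
buf = io.StringIO()  # swap for open('nea.csv', 'w', newline='') to write a file
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
for area in area_attrs_li:
    row = {'validTime': valid_time, 'date': issue['date'], 'time': issue['time']}
    row.update(area)  # adds name, forecast, lat, lon as separate columns
    writer.writerow(row)

csv_rows = buf.getvalue().splitlines()
print(csv_rows)
```

Each area becomes exactly one row, which also avoids the duplicated records from writing the whole list on every iteration.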

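EDIT 4: This split is not entirely correct, but it should give you an idea of how to parse and split the data. Adapt it to your needs. To split the area elements again, as shown in your image, you can parse them: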

for area in area_attrs_li: 
    # each area is a dict, so work on its printed form 
    area = str(area) 

    # cut off the characters you don't need 
    area = area.replace('[','') 
    area = area.replace(']','') 
    area = area.replace('{','') 
    area = area.replace('}','') 

    # remove other characters 
    area = area.replace("u'","\"").replace("'","\"") 

    # split the string into a list 
    areaList = area.split(",") 

    # create your own csv-separator 
    ownRowElement = ';'.join(areaList) 

    writer.writerow((time, element['date'], element['time'], ownRowElement))
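The character stripping above operates on the printed form of the data. If that printed text really is all that is available, ast.literal_eval from the standard library may be a less fragile way to turn it back into dicts. This is a different technique from the one in the answer, sketched here assuming Python 3, with the sample text copied from the output in the question.

```python
import ast

# Printed form of the list, copied from the output in the question.
printed = ("[{'lat': u'1.34039000', 'lon': u'103.70500000', "
           "'name': u'Jurong West', 'forecast': u'LR'}]")

# literal_eval only accepts Python literals, including strings with a u prefix.
areas = ast.literal_eval(printed)
first = areas[0]
print(first['name'], first['forecast'])
```

The resulting dicts can then be indexed by key, so no quote or bracket replacement is needed.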

Offtopic: this works for me:

import csv 
import json 

x="""[ 
    {'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West','forecast': u'LR'} 
]""" 

# turn the Python-style text into valid JSON before parsing it 
jsontxt = json.loads(x.replace("u'","\"").replace("'","\"")) 

with open("test.csv", "w") as out: 
    f = csv.writer(out) 

    # Write CSV header; if you don't need that, remove this line 
    f.writerow(['lat', 'lon', 'name', 'forecast']) 

    for jsontext in jsontxt: 
        f.writerow([jsontext["lat"], 
                    jsontext["lon"], 
                    jsontext["name"], 
                    jsontext["forecast"], 
                    ])
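Because the feed itself is XML, the round trip through printed text can also be skipped by reading the attributes straight from the parsed document. The sketch below swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it is self-contained, assumes Python 3, and uses a trimmed copy of the XML from the question; the column order is an arbitrary choice.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question; a live response body could be passed instead.
xml_text = """<channel>
<item>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item>
</channel>"""

root = ET.fromstring(xml_text)
item = root.find('item')
issue = item.find('forecastIssue')
valid_time = item.find('validTime').text

buf = io.StringIO()  # swap for open('nea.csv', 'w', newline='') to write a file
writer = csv.writer(buf)
writer.writerow(['validTime', 'date', 'time', 'name', 'forecast', 'lat', 'lon'])
for area in item.find('weatherForecast').findall('area'):
    # one row per <area>, each attribute in its own column
    writer.writerow([valid_time, issue.get('date'), issue.get('time'),
                     area.get('name'), area.get('forecast'),
                     area.get('lat'), area.get('lon')])

xml_rows = buf.getvalue().splitlines()
print(xml_rows)
```

The same loop shape should carry over to BeautifulSoup, where area['name'] plays the role of area.get('name').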

Source

2016-07-26 07:46:14 user2853437

The question is about XML, not JSON. –

Hi, it works by splitting it into separate columns :) If I want it to run against the website, how should I edit your code? :) – plzhelpmi

Agreed. JSON in the question –
