bs4 selectors not finding "image:title"

I'm building a web scraper to collect every product name on a website. I keep getting a KeyError when I try to look up the product title.

HTML:

<url> 
    <loc> 
    https://shop.havenshop.ca/products/cassady-sunglasses-indigo-gunmetal 
    </loc> 
    <lastmod>2017-10-19T08:53:44-07:00</lastmod> 
    <changefreq>daily</changefreq> 
    <image:image> 
    <image:loc> https://cdn.shopify.com/s/files/1/0051/7042/products/Cassady_SunglassesIndigoGunmetal1.jpg?v=1436564480</image:loc> 
    <image:title>"Cassady" Sunglasses Indigo/Gunmetal</image:title> 
    </image:image> 
</url>

Python code:

session = requests.session() 
sitemap = session.get(link) 
data = sitemap.text 
soup = BeautifulSoup(data, "lxml") 
items = soup.find_all("url") 
for i in range(len(items)): 
    for item in items[i]: 
     print items[i]["image:image"]["image:title"]

Error:

KeyError: 'image:title'

Source

2017-12-17 Michael

Print the keys: 'print(items[i]["image:image"].keys())' – furas

Inside your inner loop 'for item in items[i]:', why are you printing 'items[i]["image:image"]["image:title"]'? Shouldn't it be 'item["image:image"]["image:title"]'? –

@JohnGordon That returns "TypeError: string indices must be integers" – Michael

That isn't HTML, it's XML. You should find the namespaced element rather than accessing it as an attribute. This gives you the value:

items[i].find('image:title')
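To make the difference concrete, here is a minimal sketch using the entry from the question. Subscripting a tag looks up an attribute on that tag, while find() searches its child elements. The "html.parser" backend is an assumption made only so the sketch runs without lxml installed; the question itself uses "lxml".

```python
from bs4 import BeautifulSoup

# One <url> entry copied from the question's sitemap.
xml = """
<url>
    <loc>https://shop.havenshop.ca/products/cassady-sunglasses-indigo-gunmetal</loc>
    <image:image>
    <image:loc>https://cdn.shopify.com/s/files/1/0051/7042/products/Cassady_SunglassesIndigoGunmetal1.jpg?v=1436564480</image:loc>
    <image:title>"Cassady" Sunglasses Indigo/Gunmetal</image:title>
    </image:image>
</url>
"""

# Assumption: the stdlib-backed parser stands in for "lxml" here.
soup = BeautifulSoup(xml, "html.parser")
url = soup.find("url")

# tag["name"] reads an attribute of <url>; <url> has no such attribute,
# so this raises KeyError.
try:
    url["image:title"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# find() walks the children and returns the <image:title> element.
title = url.find("image:title").get_text(strip=True)
print(lookup_failed, title)
```
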

A full example:

for url in soup.find_all('url'):
    if 'Cassady' in url.find('image:title').text:
        print(url.find('image:loc').text)
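If the goal is every product name rather than one match, a loop along these lines might do it. This is only a sketch: product_titles is a hypothetical helper name, the first <url> entry in the sample is made up to represent a page without an image block, and "html.parser" is again an assumption so that lxml is not required. The None check matters because find() returns None when an entry has no <image:title>, and calling .text on None would fail.

```python
from bs4 import BeautifulSoup


def product_titles(sitemap_xml):
    """Return the text of every <image:title> found under a <url> entry."""
    # Assumption: stdlib-backed parser; the question uses "lxml".
    soup = BeautifulSoup(sitemap_xml, "html.parser")
    titles = []
    for url in soup.find_all("url"):
        title_tag = url.find("image:title")
        if title_tag is None:
            # Entry without an image block: skip it instead of crashing.
            continue
        titles.append(title_tag.get_text(strip=True))
    return titles


# The first entry is hypothetical (no image); the second is from the question.
sample = """
<urlset>
<url>
    <loc>https://shop.havenshop.ca/</loc>
</url>
<url>
    <loc>https://shop.havenshop.ca/products/cassady-sunglasses-indigo-gunmetal</loc>
    <image:image>
    <image:title>"Cassady" Sunglasses Indigo/Gunmetal</image:title>
    </image:image>
</url>
</urlset>
"""

titles = product_titles(sample)
print(titles)
```
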

Source

2017-12-17 06:07:32 Vetsin

This returns a NoneType? – Michael

'items[0].find("image:title")' returns '"Cassady" Sunglasses Indigo/Gunmetal' for me. Your code has a number of other errors. – Vetsin

The loop I'm using goes through every tag in the XML, so it should keep going until it reaches the image:title that is "Cassady" Sunglasses Indigo/Gunmetal. – Michael

The best option is to go with the bs4 solution. If you only need the product names, you can also use a regular expression.

import re

# Capture everything between the <image:title> tags, including any quotes
pattern = r'<image:title>(.+?)</image:title>'
with open('file.txt', 'r') as f:  # file.txt holds the sitemap XML; the downloaded response text works the same way
    for match in re.finditer(pattern, f.read()):
        print(match.group(1))

Output:

"Cassady" Sunglasses Indigo/Gunmetal
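The same idea can be applied to text that is already in memory (for example the body of the sitemap response) instead of a file. The following is a rough sketch, not a robust XML parser: titles_from_text is a hypothetical helper, the second title in the sample is invented to show a name without quotes, and the pattern assumes titles contain no nested tags.

```python
import re

# Non-greedy capture of whatever sits between the opening and closing tags.
TITLE_RE = re.compile(r"<image:title>(.+?)</image:title>", re.DOTALL)


def titles_from_text(sitemap_text):
    """Return every <image:title> value found in the given sitemap text."""
    return [title.strip() for title in TITLE_RE.findall(sitemap_text)]


# First title is from the question; the second is a made-up example.
sample = (
    '<image:title>"Cassady" Sunglasses Indigo/Gunmetal</image:title>\n'
    "<image:title>Example Product</image:title>\n"
)

found = titles_from_text(sample)
print(found)
```
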

Source

2017-12-17 07:14:25
