Extracting the "Most Read" titles with BS4

I want to extract the titles in the [Most Read] section of a news page. This is what I have so far, but it gives me every title on the page. I only want the ones in the Most Read section.

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.michigandaily.com/section/opinion'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html5lib")

for story_heading in soup.find_all(class_="views-field views-field-title"):
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
    else:
        print(story_heading.contents[0].strip())


2016-04-14 John

You need to limit your search to the div container that holds only the most-read articles.

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.michigandaily.com/section/opinion'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html5lib")

# Narrow the search to the Most Read container first.
most_read_soup = soup.find_all('div', {'class': 'view-id-most_read'})[0]

# Then look for title fields inside that container only.
for story_heading in most_read_soup.find_all(class_="views-field views-field-title"):
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
    else:
        print(story_heading.contents[0].strip())
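
If you want to try the scoping idea without hitting the live site, here is a minimal sketch. The HTML string is a hypothetical, cut-down stand-in for the real page (only the class names come from the code above), the function name most_read_titles is made up for illustration, and it uses the built-in "html.parser" instead of html5lib so nothing extra has to be installed. It also returns an empty list when the container is missing, rather than failing on the [0] index.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the real page that only
# reuses the class names from the answer above.
SAMPLE_HTML = """
<div class="view view-id-latest">
  <div class="views-field views-field-title"><a href="/a">Latest story</a></div>
</div>
<div class="view view-id-most_read">
  <div class="views-field views-field-title"><a href="/b">Popular story one</a></div>
  <div class="views-field views-field-title">Popular story two</div>
</div>
"""


def most_read_titles(html):
    """Return the titles found inside the Most Read container only."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="view-id-most_read")
    if container is None:
        # Container not found, e.g. because the page layout changed.
        return []
    titles = []
    for field in container.find_all(class_="views-field-title"):
        # get_text() handles both linked and plain-text titles;
        # split/join collapses newlines and repeated spaces.
        titles.append(" ".join(field.get_text().split()))
    return titles


print(most_read_titles(SAMPLE_HTML))
```

The "Latest story" title is skipped because it sits outside the container that find() returned.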


2016-04-14 18:01:18

You can use a CSS selector to pull specific tags out of the Most Read div.

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.michigandaily.com/section/opinion'
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html5lib")
# Every <a> that is a descendant of the Most Read pane.
css = "div.pane-most-read-panel-pane-1 a"
links = [a.text.strip() for a in soup.select(css)]

Which would give you:

[u'Michigan in Color: Anotha One', u'Migos trio ends 2016 SpringFest with Hill Auditorium concert', u'Migos dabs their way to a seminal moment for Ann Arbor hip hop', u'Best of Ann Arbor 2016', u'Best of Ann Arbor 2016: Full List']
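
The same selector approach can be checked offline with a small sketch like the one below. Again, the HTML is a hypothetical stand-in (only the pane class name comes from the answer), most_read_links is a made-up helper name, and returning the href alongside the text is my own addition in case you want the URLs too.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the pane class name is taken from the answer.
SAMPLE_HTML = """
<div class="pane-latest"><a href="/a">Latest story</a></div>
<div class="panel-pane pane-most-read-panel-pane-1">
  <ul>
    <li><a href="/b"> Popular story one </a></li>
    <li><a href="/c">Popular story two</a></li>
  </ul>
</div>
"""


def most_read_links(html, css="div.pane-most-read-panel-pane-1 a"):
    """Return (title, href) pairs for links matched by the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns every element matching the selector, in document order.
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(css)]


for title, href in most_read_links(SAMPLE_HTML):
    print(title, href)
```

Because the selector uses a descendant combinator (the space before a), it does not matter how deeply the links are nested inside the pane.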


2016-04-14 19:51:33
