How do I extract the href link inside <h1></h1> in Python?

-2

I'm new to Python and I'm trying to learn web scraping. How do I extract the href link inside <h1></h1> in Python?

I have the following markup and would like to know how to print or get the href (the link):

<h1><a href="https://www.nytimes.com/tips">Got a confidential news tip?</a></h1>

2017-02-25 Andrew Ong

http://stackoverflow.com/questions/42173719/how-to-use-regular-expression-to-retrieve-data-in-python/42173798#42173798 –

Another similar question: https://stackoverflow.com/questions/3075550/how-can-i-get-href-links-html-using-python – Tudor

You can use BeautifulSoup to get this done.

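from urllib.request import urlopen
from bs4 import BeautifulSoup
import re  # used in the follow-up step to pull the href out of the output

# Download the page and parse it with Python's built-in HTML parser.
response = urlopen("http://someurl.com")
page_source = response.read()
soup = BeautifulSoup(page_source, 'html.parser')

# Collect every <h1> element, including any <a> tags nested inside.
x = soup.find_all('h1')
print(x)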

After that, all you have to do is use the re module to extract the data you need from that output.
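The re step is not spelled out above, so here is a minimal sketch of one way it might look. It assumes the printed output resembles the markup from the question. The names `h1_markup`, `HREF_PATTERN`, and `extract_hrefs` are made up for illustration, and the sample string stands in for `str(tag)` of an `<h1>` found by `find_all`, so the sketch runs without a network request.

```python
import re

# Stand-in for str(tag) of one <h1> element returned by soup.find_all('h1').
# This sample is an assumption based on the markup in the question.
h1_markup = '<h1><a href="https://www.nytimes.com/tips">Got a confidential news tip?</a></h1>'

# Match the href value of an <a> tag, quoted with either " or '.
HREF_PATTERN = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def extract_hrefs(markup):
    """Return every href value found on <a> tags in the given HTML string."""
    return [match.group(2).strip() for match in HREF_PATTERN.finditer(markup)]


print(extract_hrefs(h1_markup))  # ['https://www.nytimes.com/tips']
```

With the code above you would call `extract_hrefs(str(tag))` for each `tag` in `x`. Regular expressions are fragile on real-world HTML, so it may be simpler to stay inside BeautifulSoup: each tag supports `tag.find_all('a', href=True)`, and the link can then be read with `a.get('href')` or `a['href']`.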


2017-02-25 09:27:58
