Using regex in Python

I am trying to extract specific values from markup downloaded from a particular URL, without success. Part of the markup is shown below. I want to capture the title and the first <td>McCartney</td> cell from it, and print the result out as a JSON file.

"<a href="/wiki/All_My_Loving" title="All My Loving">All My Loving</a>"</td>\n<td style="text-align:center;">1963</td>\n<td><i>UK: <a href="/wiki/With_the_Beatles" title="With the Beatles">With the Beatles</a><br />\nUS: <a href="/wiki/Meet_The_Beatles!" class="mw-redirect" title="Meet The Beatles!">Meet The Beatles!</a></i></td>\n<td>McCartney</td>\n<td>McCartney</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;"><span style="display:none" class="sortkey">7001450000000000000\xe2\x99\xa0</span>45</td>\n<td></td>\n</tr>\n<tr>\n<td>"<a href="/wiki/All_Things_Must_Pass_(song)" title="All Things Must Pass (song)">All Things Must Pass</a>"</td>\n<td style="text-align:center;">1969</td>\n<td><i><a href="/wiki/Anthology_3" title="Anthology 3">Anthology 3</a></i></td>\n<td>Harrison</td>\n<td>Harrison</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td></td>\n</tr>\n<tr>\n<td>"<a href="/wiki/All_Together_Now_(The_Beatles_song)" class="mw-redirect" title="All Together Now (The Beatles song)">All Together Now</a>"</td>\n<td style="text-align:center;">1967</td>\n<td><i><a href="/wiki/Yellow_Submarine_(album)" title="Yellow Submarine (album)">Yellow Submarine</a></i></td>\n<td>McCartney, with Lennon</td>\n<td>McCartney, with Lennon</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td></td>\n</tr>\n<tr>\n<td>"

Can I run a for loop over a regular expression? How can I do that in Python?

Thanks,

Source

2016-12-26 yahav10

If you want to parse HTML, use an HTML parser (for example BeautifulSoup), not regular expressions.

from bs4 import BeautifulSoup 

html = '''<a href="/wiki/All_My_Loving" title="All My Loving">All My Loving</a>"</td>\n<td style="text-align:center;">1963</td>\n<td><i>UK: <a href="/wiki/With_the_Beatles" title="With the Beatles">With the Beatles</a><br />\nUS: <a href="/wiki/Meet_The_Beatles!" class="mw-redirect" title="Meet The Beatles!">Meet The Beatles!</a></i></td>\n<td>McCartney</td>\n<td>McCartney</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;"><span style="display:none" class="sortkey">7001450000000000000\xe2\x99\xa0</span>45</td>\n<td></td>\n</tr>\n<tr>\n<td>"<a href="/wiki/All_Things_Must_Pass_(song)" title="All Things Must Pass (song)">All Things Must Pass</a>"</td>\n<td style="text-align:center;">1969</td>\n<td><i><a href="/wiki/Anthology_3" title="Anthology 3">Anthology 3</a></i></td>\n<td>Harrison</td>\n<td>Harrison</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td></td>\n</tr>\n<tr>\n<td>"<a href="/wiki/All_Together_Now_(The_Beatles_song)" class="mw-redirect" title="All Together Now (The Beatles song)">All Together Now</a>"</td>\n<td style="text-align:center;">1967</td>\n<td><i><a href="/wiki/Yellow_Submarine_(album)" title="Yellow Submarine (album)">Yellow Submarine</a></i></td>\n<td>McCartney, with Lennon</td>\n<td>McCartney, with Lennon</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td style="text-align:center;">\xe2\x80\x94</td>\n<td></td>\n</tr>\n<tr>\n<td> 
''' 

soup = BeautifulSoup(html, 'html.parser')
a = soup.find('a')  # will only find the first <a> tag
print(a.attrs['title'])

tds = soup.find_all('td')  # will find all <td> tags
for td in tds:
    if 'McCartney' in td.text:
        print(td)

# All My Loving 
# <td>McCartney</td> 
# <td>McCartney</td> 
# <td>McCartney, with Lennon</td> 
# <td>McCartney, with Lennon</td>
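
To get from the matching cells to the JSON output the question asks for, one option is to walk the table row by row, so that each title stays paired with the first credit cell in the same row. The following is only a sketch under some assumptions: `sample_html` is a shortened, hypothetical sample in the same shape as the rows above, and the function name `songs_credited_to` and the JSON keys `title` and `credit` are made up for illustration.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical, shortened sample shaped like the table rows in the question.
sample_html = """
<table>
<tr>
<td>"<a href="/wiki/All_My_Loving" title="All My Loving">All My Loving</a>"</td>
<td style="text-align:center;">1963</td>
<td><i>With the Beatles</i></td>
<td>McCartney</td>
<td>McCartney</td>
</tr>
<tr>
<td>"<a href="/wiki/All_Things_Must_Pass_(song)" title="All Things Must Pass (song)">All Things Must Pass</a>"</td>
<td style="text-align:center;">1969</td>
<td><i>Anthology 3</i></td>
<td>Harrison</td>
<td>Harrison</td>
</tr>
</table>
"""


def songs_credited_to(html, name):
    """Return one dict per table row whose cells mention `name`."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        if not cells:
            continue  # header rows or empty rows have no <td> cells
        link = cells[0].find('a')
        if link is None or 'title' not in link.attrs:
            continue  # first cell carries no titled link
        # First cell after the title cell whose text mentions the name.
        match = next((td for td in cells[1:] if name in td.get_text()), None)
        if match is not None:
            results.append({
                'title': link['title'],
                'credit': match.get_text(strip=True),
            })
    return results


songs = songs_credited_to(sample_html, 'McCartney')
print(json.dumps(songs, indent=2))
```

To write a file instead of printing, `json.dump(songs, f)` on an opened file object does the same serialization.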

Source

2016-12-26 08:22:20 DeepSpace

Hi, I used your answer in my code, but I don't know how to continue from there. This is the code I wrote:

from bs4 import BeautifulSoup
import urllib.request

url = "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
a = soup.find('a')  # will only find the first <a> tag
print(a.attrs['title'])
tds = soup.find_all('td')  # will find all <td> tags
for td in tds:
    if 'McCartney' and 'Lennon' in td.text:
        print(td)

– yahav10

1. What do you mean by "continue"? This code gathers pretty much all the data you asked for. 2. Wikipedia has an API that you should use instead of parsing the page itself: https://www.mediawiki.org/wiki/API:Main_page – DeepSpace
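
One detail in the commented code may be worth checking. Assuming its condition read `if 'McCartney' and 'Lennon' in td.text:`, Python evaluates that as `'McCartney' and ('Lennon' in td.text)`. A non-empty string is always truthy, so only 'Lennon' is actually tested. The sketch below uses plain strings (the hypothetical list `cells`) in place of `td.text` to show the difference between the condition as written and the two explicit alternatives.

```python
# Plain strings standing in for td.text values (hypothetical sample data).
cells = ['McCartney', 'Harrison', 'McCartney, with Lennon', 'Lennon']

# Condition as written: 'McCartney' is a truthy constant, so this only
# checks whether 'Lennon' occurs in the cell.
as_written = [c for c in cells if 'McCartney' and 'Lennon' in c]

# Both names must appear in the cell.
both = [c for c in cells if 'McCartney' in c and 'Lennon' in c]

# Either name may appear in the cell.
either = [c for c in cells if 'McCartney' in c or 'Lennon' in c]

print(as_written)
print(both)
print(either)
```

Which of the two explicit forms is right depends on whether the goal is songs credited to both writers or to either one.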
