When I tried to fetch the poem at your URL with html.parser, I ran into the same problem you did: the HTML was truncated right after "in the poem".
import requests
from bs4 import BeautifulSoup
poem_page = requests.get("https://www.poetryfoundation.org/poems-and-poets/poems/detail/57956")
poem_soup = BeautifulSoup(poem_page.text, "html.parser")
poem_div = poem_soup.find('div', class_='poem')
print(poem_div)
OUTPUT:
<div class="poem" data-view="ContentView">
<div style="text-indent: -1em; padding-left: 1em;">It seems a certain fear underlies everything. <br/></div><div style="text-indent: -1em; padding-left: 1em;">If I were to tell you something profound<br/></div><div style="text-indent: -1em; padding-left: 1em;"> it would be useless, as every single thing I know<br/></div><div style="text-indent: -1em; padding-left: 1em;"> is not timeless. I am particularly risk-averse.<br/></div><div style="text-indent: -1em; padding-left: 1em;"><br/></div><div style="text-indent: -1em; padding-left: 1em;">I choose someone else over me every time, <br/></div><div style="text-indent: -1em; padding-left: 1em;">as I'm sure they'll finish the task at hand, <br/></div><div style="text-indent: -1em; padding-left: 1em;">which is to say that whatever is in front of us<br/></div><div style="text-indent: -1em; padding-left: 1em;"> will get done if I'm not in charge of it.<br/></div><div style="text-indent: -1em; padding-left: 1em;"><br/></div><div style="text-indent: -1em; padding-left: 1em;">There is a limit to the number of times <br/></div><div style="text-indent: -1em; padding-left: 1em;">I can practice every single kind of mortification <br/></div><div style="text-indent: -1em; padding-left: 1em;">(of the flesh?). I can turn toward you and say <em>yes, <br/></em></div><div style="text-indent: -1em; padding-left: 1em;">it was you in the poem</div></div>
But after I changed the parser to lxml, everything was OK.
import requests
from bs4 import BeautifulSoup
poem_page = requests.get("https://www.poetryfoundation.org/poems-and-poets/poems/detail/57956")
poem_soup = BeautifulSoup(poem_page.text, "lxml")
poem_div = poem_soup.find('div', class_='poem')
# print poem_div
for s in poem_div.find_all('div'):
    # the first child of each line's div is either the line's text or a bare <br/>
    print(list(s.children)[0])
OUTPUT:
It seems a certain fear underlies everything.
If I were to tell you something profound
it would be useless, as every single thing I know
is not timeless. I am particularly risk-averse.
<br/>
I choose someone else over me every time,
as I'm sure they'll finish the task at hand,
which is to say that whatever is in front of us
will get done if I'm not in charge of it.
<br/>
There is a limit to the number of times
I can practice every single kind of mortification
(of the flesh?). I can turn toward you and say
it was you in the poem. But when we met,
<br/>
you were actually wearing a shirt, and the poem
wasn't about you or your indecipherable tattoo.
The poem is always about me, but that one time
I was in love with the memory of my twenties
<br/>
so I was, for a moment, in love with you
because you remind me of an approaching
subway brushing hair off my face with
its hot breath. Darkness. And then light,
<br/>
the exact goldness of dawn fingering
that brick wall out my bedroom window
on Smith Street mornings when I'd wake
next to godknowswho but always someone
<br/>
who wasn't a mistake, because what kind
of mistakes are that twitchy and joyful
even if they're woven with a particular
thread of regret: the guy who used
<br/>
my toothbrush without asking,
I walked to the end of a pier with him,
would have walked off anywhere with him
until one day we both landed in California
<br/>
when I was still young, and going West
meant taking a laptop and some clothes
in a hatchback and learning about produce.
I can turn toward you, whoever you are,
<br/>
and say you are my lover simply because
I say you are, and that is, I realize,
a tautology, but this is my poem. I claim
nothing other than what I write, and even that,
<br/>
I'd leave by the wayside, since the only thing
to pack would be the candlesticks, and
even those are burned through, thoroughly
replaceable. Who am I kidding? I don't
<br/>
own anything worth packing into anything.
We are cardboard boxes, you and I, stacked
nowhere near each other and humming
different tunes. It is too late to be writing this.
<br/>
I am writing this to tell you something less
than neutral, which is to say I'm sorry.
It was never you. It was always you:
your unutterable name, this growl in my throat.
<br/>
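If you want to check which parser behaves on your machine without hitting the site each time, something like the following might help. It is only a sketch: SAMPLE is a small hand-made snippet shaped like the poem markup above (not the real page), and poem_lines and compare_parsers are hypothetical helper names I made up for illustration. It uses get_text() instead of list(s.children)[0] so that empty stanza-break divs come back as empty strings rather than <br/> tags, and it skips any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hand-made stand-in for the poem markup (assumption: not the real page).
SAMPLE = (
    '<div class="poem">'
    '<div style="text-indent: -1em;">It seems a certain fear underlies everything. <br/></div>'
    '<div style="text-indent: -1em;"><br/></div>'
    '<div style="text-indent: -1em;">I can turn toward you and say <em>yes, <br/></em></div>'
    '</div>'
)


def poem_lines(html, parser="html.parser"):
    """Return the text of each line div inside div.poem, stripped of whitespace."""
    soup = BeautifulSoup(html, parser)
    poem_div = soup.find("div", class_="poem")
    if poem_div is None:
        return []
    return [div.get_text().strip() for div in poem_div.find_all("div")]


def compare_parsers(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each parser name to the number of line divs it finds (None if not installed)."""
    counts = {}
    for name in parsers:
        try:
            counts[name] = len(poem_lines(html, name))
        except FeatureNotFound:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for line in poem_lines(SAMPLE):
        print(line)
    print(compare_parsers(SAMPLE))
```

To run it against the real page, pass poem_page.text in place of SAMPLE. If the counts differ between parsers, the page's markup is being handled differently by each one, which is what the truncation above looks like.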
That's strange. Which version of libxml, etc. do you have installed?
@Oregano, how can this be your accepted answer when you said you had tried lxml and it didn't work?