2016-08-13 4 views
0

Get text from a span using an XPath or CSS query

I need to get the text from the following span element without it being split into separate text sections.

When I apply my XPath query to this element:

<span class="a-size-base review-text">I purchased this from Fry's Electronics. 
 
<br/> 
 
<br/> 
 
The picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I'm very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing. 
 
<br/> 
 
<br/> 
 
I wasn't planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems. 
 
<br/> 
 
<br/> 
 
The unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you're ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome. 
 
<br/> 
 
<br/> 
 
The stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge. 
 
<br/> 
 
<br/> 
 
Overall I'm very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry's). 
 
<br/> 
 
<br/> 
 
NOTE: If you see any strange distortion in the images it's likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person. 
 
</span>

//*[contains(concat(" ", @class, " "), concat(" ", "review-text", " "))]/text()

I get this:

Text='I purchased this from Fry's Electronics.' 
 
Text='' 
 
Text='The picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I'm very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing.' 
 
Text='' 
 
Text='I wasn't planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems.' 
 
Text='' 
 
Text='The unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you're ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome.' 
 
Text='' 
 
Text='The stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge.' 
 
Text='' 
 
Text='Overall I'm very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry's).' 
 
Text='' 
 
Text='NOTE: If you see any strange distortion in the images it's likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person.'

I would like to get a single block of text without it being broken up. I am using this XPath tester: http://www.freeformatter.com/xpath-tester.html
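For context, the fragmented result is expected: `/text()` selects each text node child separately, and every `<br/>` splits the span's content into another text node. The following is a minimal sketch of that behaviour using only the standard library's `xml.etree.ElementTree`; the shortened markup is a hypothetical stand-in for the review above, not the original data.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the review markup in the question.
html = '<span class="review-text">First part. <br/> <br/> Second part. </span>'

span = ET.fromstring(html)

# ElementTree keeps the text before the first child in .text and the text
# after each child in that child's .tail. Together these correspond to the
# separate text nodes that /text() returns.
text_nodes = [span.text] + [child.tail for child in span]

# Strip each node, mirroring the trimmed Text='...' entries shown above.
stripped = [(node or "").strip() for node in text_nodes]
print(stripped)
```

The whitespace-only node between the two `<br/>` elements shows up as an empty entry, much like the `Text=''` lines in the tester output.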

Answers

0

A handy feature of Scrapy selectors is selector chaining.

Here is an example Scrapy 1.1 shell session:

~$ scrapy shell 
2016-08-16 12:20:57 [scrapy] INFO: Scrapy 1.1.1 started (bot: scrapybot) 
2016-08-16 12:20:57 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'} 
(...) 
In [1]: html = '''<span class="a-size-base review-text">I purchased this from Fry's Electronics. 
    ...: <br/> 
    ...: <br/> 
    ...: The picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I'm very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing. 
    ...: <br/> 
    ...: <br/> 
    ...: I wasn't planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems. 
    ...: <br/> 
    ...: <br/> 
    ...: The unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you're ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome. 
    ...: <br/> 
    ...: <br/> 
    ...: The stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge. 
    ...: <br/> 
    ...: <br/> 
    ...: Overall I'm very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry's). 
    ...: <br/> 
    ...: <br/> 
    ...: NOTE: If you see any strange distortion in the images it's likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person. 
    ...: </span>''' 

In [2]: import scrapy 

In [3]: selector = scrapy.Selector(text=html) 

In [4]: selector.css('span.review-text').xpath('string()').extract_first() 
Out[4]: 'I purchased this from Fry\'s Electronics.\n\n\nThe picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I\'m very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing.\n\n\nI wasn\'t planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems.\n\n\nThe unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you\'re ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome.\n\n\nThe stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge.\n\n\nOverall I\'m very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry\'s).\n\n\nNOTE: If you see any strange distortion in the images it\'s likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person.\n' 

In [5]: print(selector.css('span.review-text').xpath('string()').extract_first()) 
I purchased this from Fry's Electronics. 


The picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I'm very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing. 


I wasn't planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems. 


The unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you're ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome. 


The stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge. 


Overall I'm very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry's). 


NOTE: If you see any strange distortion in the images it's likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person. 


In [6]: print(selector.css('span.review-text').xpath('normalize-space()').extract_first()) 
I purchased this from Fry's Electronics. The picture is quite good after tweaking the settings. An HDMI feed from my PC results in very clear text with no distortion. Be sure to turn down the sharpness to avoid artifacts around text. I think this screen may offer 4:4:4 chroma subsampling based on the attached test image. I'm very pleased with the viewing angles and the screen is definitely usable for more than just straight ahead viewing. I wasn't planning on using the Smart features, but the Netflix app works really well and is responsive enough to not become annoyed. The wifi streaming playback is very smooth, but navigating the folder structure is horribly slow. The interface insists on creating thumbnails for each movie file, which takes forever if you have a directory with many files. I would much rather just see a detailed list without thumbnails. When you finally do find your desired movie the playback is very good. If you keep the directory contents small (~10 items or fewer) you may not have any problems. The unit is very thin and light and setup was a breeze. You just have to put in 4 screws to attach the base and then you're ready to go. The power adapter comes with a "brick" style converter. The remote is well laid out and the menus are easy to navigate without feeling cumbersome. The stand is 8" deep x 22.25" wide. The TV stands 26.5" from table top to the top of the bezel with stand attached. The TV is 42.75" wide from outside bezel edge to outside bezel edge. Overall I'm very pleased with what this offers in the $400-500 range. (I actually paid $398 but that was after some customer service adjustments at Fry's). NOTE: If you see any strange distortion in the images it's likely a result of the camera, image compression, and resizing. Some of the strange patterns seen in the images are not present when viewing in person. 
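The difference between `string()` and `normalize-space()` in the session above can be approximated without Scrapy. The sketch below uses only the standard library and a shortened copy of the review markup; `"".join(itertext())` and `" ".join(text.split())` are rough stand-ins for the two XPath functions, not what Scrapy runs internally.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the review markup; it is well-formed XML, so
# ElementTree can parse it directly.
html = '''<span class="a-size-base review-text">I purchased this from Fry's Electronics.
<br/>
<br/>
The picture is quite good after tweaking the settings.
</span>'''

span = ET.fromstring(html)

# Rough equivalent of XPath string(): concatenate all descendant text nodes.
full_text = "".join(span.itertext())

# Rough equivalent of XPath normalize-space(): trim both ends and collapse
# every run of whitespace into a single space.
normalized = " ".join(full_text.split())

print(repr(full_text))
print(normalized)
```

One caveat with this approximation: `str.split()` also treats other Unicode whitespace (such as non-breaking spaces) as separators, whereas XPath `normalize-space()` only considers space, tab, carriage return and line feed.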
+0

Thanks @paul trmbrth. Great solution! – Brayoni

0

Convert the whole <span> element to a string. Note that this only works for the first <span> element that matches the condition:

string(
    //*[contains(concat(" ", @class, " "), concat(" ", "review-text", " "))]
)

In XPath 2.0, you can use string-join(), which works with any number of <span> elements:

string-join(
    //*[contains(concat(" ", @class, " "), concat(" ", "review-text", " "))]/text(),
    ""
)
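When only XPath 1.0 is available, one way to get a string per matching element is to select the elements first and then build the string for each of them in the host language. The sketch below illustrates the idea with the standard library only; the two-review page, the wrapper `<div>` and the `has_class` helper are all hypothetical and exist purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two reviews; the wrapper <div> is only there to
# give the fragment a single root element.
html = '''<div>
<span class="a-size-base review-text">First review. <br/> <br/> More text.</span>
<span class="a-size-base review-text">Second review.</span>
<span class="a-size-base other">Not a review.</span>
</div>'''

root = ET.fromstring(html)


def has_class(element, name):
    """Return True if `name` is one of the element's space-separated classes."""
    return name in element.get("class", "").split()


# One joined, whitespace-normalized string per matching element.
reviews = [
    " ".join("".join(span.itertext()).split())
    for span in root.iter("span")
    if has_class(span, "review-text")
]
print(reviews)
```

With Scrapy selectors the same idea would presumably look like `[s.xpath('normalize-space()').extract_first() for s in response.css('span.review-text')]`, which evaluates the string function once per selected element instead of once for the whole node set.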
+0

I am using **lxml**, which supports _xpath 1.0_, so I can't use `string-join`. When I convert the whole element with `string`, the _xpath_ query seems to return a single string instead of a list. – Brayoni

+0

The following returns a list in the Scrapy shell: `response.xpath('//*[contains(concat(" ", @class, " "), concat(" ", "review-text", " "))]')` – Brayoni

0

I ended up post-processing the result with a Python regular expression to remove the HTML tags:

re.sub(r'<span class="a-size-base review-text">|<br>|</span>', "", text) 

I tried @har07's suggestions, but:

  • Scrapy uses lxml, which only supports XPath 1.0, so I could not make use of string-join, which is provided in XPath 2.0.
  • When I tried string(), I could not get a list of selectors from my XPath query.

You can start with a CSS selection and then apply XPath string functions such as string() or normalize-space().
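For completeness, here is a slightly more tolerant sketch of the regex post-processing idea. It assumes the serialized markup may contain either `<br>` or `<br/>` and that leftover whitespace should be collapsed; the sample string is hypothetical. Regex-based tag stripping is brittle in general, so treat this as a fallback, not a recommended approach.

```python
import re

# Serialized markup as a selector might return it; a shortened,
# hypothetical sample.
raw = '<span class="a-size-base review-text">First part. <br> <br/> Second part. </span>'

# Remove the opening and closing span tags and both <br> spellings.
without_tags = re.sub(r'</?span\b[^>]*>|<br\s*/?>', '', raw)

# Collapse the whitespace left behind where the tags used to be.
cleaned = ' '.join(without_tags.split())
print(cleaned)
```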