Search and replace with Ruby regex

I have a text BLOB field in a MySQL column that contains HTML. I need to change some of the markup, so I figured I'd do it in a Ruby script. Ruby is incidental here, but it would be nice to see an answer that uses it. The markup looks like this:

<h5>foo</h5> 
    <table> 
    <tbody> 
    </tbody> 
    </table> 

<h5>bar</h5> 
    <table> 
    <tbody> 
    </tbody> 
    </table> 

<h5>meow</h5> 
    <table> 
    <tbody> 
    </tbody> 
    </table>

I need to change just the first <h5>foo</h5> block in each text to <h2>something_else</h2>, leaving the rest of the string alone.

I can't seem to get the right PCRE regular expression using Ruby.

Source

2011-01-16 randombits

Instead of using regex on HTML, consider using an HTML parser. It has been said [many](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), [many](http://stackoverflow.com/questions/590747/use-regular-expression-to-parse-html-why-not), [many](http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la?lq=1) times before: regex cannot parse HTML accurately. –

Specifically, I recommend using [Nokogiri](http://nokogiri.org) to load and manipulate the HTML, then output the result. – Phrogz

# The regex literal syntax using %r{...} allows / in your regex without escaping
new_str = my_str.sub(%r{<h5>[^<]+</h5>}, '<h2>something_else</h2>')

Use String#sub instead of String#gsub so that only the first replacement occurs. If you need to choose dynamically what "foo" is, you can use string interpolation in the regex literal:

new_str = my_str.sub(%r{<h5>#{searchstr}</h5>}, "<h2>#{replacestr}</h2>")
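One caveat with the interpolated form: everything in searchstr is treated as regex syntax, so a heading such as "foo (v1.0)" would not match literally. A minimal sketch, assuming the search text should be matched literally; `replace_first_heading` is a hypothetical helper name, not something from the answer:

```ruby
# Hypothetical helper: replace the first <h5>...</h5> whose text equals
# search_text (matched literally) with an <h2> containing replacement_text.
def replace_first_heading(html, search_text, replacement_text)
  # Regexp.escape neutralizes metacharacters such as ( ) . + in search_text.
  pattern = %r{<h5>#{Regexp.escape(search_text)}</h5>}
  # Block form: the block's return value is inserted as-is, so backslashes
  # in replacement_text are not interpreted as back-references.
  html.sub(pattern) { "<h2>#{replacement_text}</h2>" }
end

html = "<h5>foo (v1.0)</h5>\n<h5>foo (v1.0)</h5>"
puts replace_first_heading(html, "foo (v1.0)", "something_else")
```

Only the first of the two identical headings is rewritten; the second is left as it was.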

Then again, if you know what "foo" is, you don't need a regex at all:

new_str = my_str.sub("<h5>searchstr</h5>", "<h2>#{replacestr}</h2>")

or even:

my_str[ "<h5>searchstr</h5>" ] = "<h2>#{replacestr}</h2>"

If you need to run code to figure out the replacement, you can use the block form of sub:

new_str = my_str.sub %r{<h5>([^<]+)</h5>} do |full_match| 
    # The expression returned from this block will be used as the replacement string 
    # $1 will be the matched content between the h5 tags. 
    "<h2>#{replacestr}</h2>" 
end
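As an illustration of when the block form pays off, here is a small sketch that builds the replacement from the captured heading text. The `promote_first_heading` name and the upcasing rule are made up for the example:

```ruby
# Hypothetical example: promote the first <h5> to <h2> and derive the new
# heading text from the old one (here simply by upcasing it).
def promote_first_heading(html)
  html.sub(%r{<h5>([^<]+)</h5>}) do
    # Inside the block, $1 holds the text captured between the h5 tags.
    "<h2>#{$1.upcase}</h2>"
  end
end

puts promote_first_heading("<h5>foo</h5>\n<h5>bar</h5>")
```

If no h5 is present, sub returns an unchanged copy of the string, so the helper is safe to run over every row.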

Source

2011-01-16 01:51:17 Phrogz

Excellent answer, thank you – Alp

Helped a lot.. thanks – Jaydipsinh

Use String.gsub with the regex <h5>[^<]+<\/h5>:

>> current = "<h5>foo</h5>\n <table>\n <tbody>\n </tbody>\n </table>" 
>> updated = current.gsub(/<h5>[^<]+<\/h5>/){"<h2>something_else</h2>"} 
=> "<h2>something_else</h2>\n <table>\n <tbody>\n </tbody>\n </table>"

Note that you can conveniently test Ruby regular expressions in your browser.
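One thing to watch: the sample string above contains a single h5, so gsub and sub give the same result there. On the question's full markup, gsub replaces every heading while sub stops after the first. A quick sketch of the difference, where `SAMPLE_MARKUP` is just a shortened stand-in for the real column value:

```ruby
# Stand-in for the question's markup, reduced to its three headings.
SAMPLE_MARKUP = "<h5>foo</h5>\n<h5>bar</h5>\n<h5>meow</h5>"
HEADING_PATTERN = %r{<h5>[^<]+</h5>}
NEW_HEADING = "<h2>something_else</h2>"

first_only = SAMPLE_MARKUP.sub(HEADING_PATTERN, NEW_HEADING)   # touches only "foo"
every_one  = SAMPLE_MARKUP.gsub(HEADING_PATTERN, NEW_HEADING)  # touches all three

puts first_only.scan("<h2>").size  # count of h2 tags after sub
puts every_one.scan("<h2>").size   # count of h2 tags after gsub
```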

Source

2011-01-16 01:54:56 miku

The link is broken. –

Whenever I need to parse or modify HTML or XML, I reach for a parser. I almost never bother with regex or instring.

Here's how to do it using Nokogiri, without any regex:

text = <<EOT
<h5>foo</h5>
    <table>
    <tbody>
    </tbody>
    </table>

<h5>bar</h5>
    <table>
    <tbody>
    </tbody>
    </table>

<h5>meow</h5>
    <table>
    <tbody>
    </tbody>
    </table>
EOT

require 'nokogiri'

fragment = Nokogiri::HTML::DocumentFragment.parse(text)
print fragment.to_html

fragment.css('h5').select{ |n| n.text == 'foo' }.each do |n|
    n.name = 'h2'
    n.content = 'something_else'
end

print fragment.to_html

After parsing, this is what Nokogiri returned from the fragment:

# >> <h5>foo</h5>
# >> <table><tbody></tbody></table><h5>bar</h5>
# >> <table><tbody></tbody></table><h5>meow</h5>
# >> <table><tbody></tbody></table>

And this is the result after running the replacement:

# >> <h2>something_else</h2>
# >> <table><tbody></tbody></table><h5>bar</h5>
# >> <table><tbody></tbody></table><h5>meow</h5>
# >> <table><tbody></tbody></table>

Source

2011-01-16 02:12:08
