Split a string into a list by several keywords using Python

I am trying to parse the /etc/network/interfaces configuration file on Ubuntu, so I need to split a string into a list of strings, each of which starts with one of the given keywords.

According to the manual:

The file consists of zero or more "iface", "mapping", "auto", "allow-" and "source" stanzas.

So if the file contains:

auto lo eth0 
allow-hotplug eth1 

iface eth0-home inet static 
    address 192.168.1.1 
    netmask 255.255.255.0

I would like to get this list:

['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\naddress...']

Right now I have a function like this:

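import re

def get_sections(text):
    # Start offset of every stanza keyword, processed from last to first.
    start_indexes = [s.start() for s in re.finditer('auto|iface|source|mapping|allow-', text)]
    start_indexes.reverse()
    end_idx = len(text)  # was -1, which cut off the last character of the text
    res = []
    for i in start_indexes:
        res.append(text[i:end_idx].strip())
        end_idx = i
    res.reverse()
    return res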

But it's not great...

2012-01-18 marcinpz

You could also use something like [confparse](http://code.google.com/p/confparse/), which apparently supports the network interfaces file. – Chewie

You can simplify this code considerably by extracting the slices directly from start_indexes.

You can do it with a single regex:

>>> import re
>>> reobj = re.compile(r"(?:auto|allow-|iface)(?:(?!(?:auto|allow-|iface)).)*(?<!\s)", re.DOTALL)
>>> result = reobj.findall(subject) 
>>> result 
['auto lo eth0', 'allow-hotplug eth1', 'iface eth0-home inet static\n address 192.168.1.1\n netmask 255.255.255.0']

Explanation:

(?:auto|allow-|iface) # Match one of the search terms 
(?:      # Try to match... 
(?!     # (as long as we're not at the start of 
    (?:auto|allow-|iface) # the next search term): 
)      # 
.      # any character. 
)*      # Do this any number of times. 
(?<!\s)     # Assert that the match doesn't end in whitespace
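
The commented breakdown above can be compiled almost as written by adding the `re.VERBOSE` flag, which makes the regex engine ignore whitespace and `#` comments inside the pattern. The following is a minimal sketch under that assumption; the names `STANZA` and `sample` are made up for illustration, and the sample text is the interfaces file from the question without trailing spaces.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE so that the
# explanatory comments can live inside the regex itself.
STANZA = re.compile(r"""
    (?:auto|allow-|iface)          # match one of the search terms
    (?:                            # then repeatedly match...
        (?!(?:auto|allow-|iface))  # (unless the next search term starts here)
        .                          # ...any character, newlines too (DOTALL)
    )*
    (?<!\s)                        # the match must not end in whitespace
""", re.VERBOSE | re.DOTALL)

# Illustrative input (an assumption: the file content from the question).
sample = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "    netmask 255.255.255.0\n"
)

stanzas = STANZA.findall(sample)
for stanza in stanzas:
    print(repr(stanza))
```

Because the lookbehind rejects a match that ends in whitespace, the engine backtracks over the trailing newlines of each stanza, which is why no explicit `strip()` is needed here.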

Of course, you can also map the results to a list of tuples, as requested in your comment:

>>> reobj = re.compile(r"(auto|allow-|iface)\s*((?:(?!(?:auto|allow-|iface)).)*)(?<!\s)", re.DOTALL)
>>> result = [tuple(match.groups()) for match in reobj.finditer(subject)] 
>>> result 
[('auto', 'lo eth0'), ('allow-', 'hotplug eth1'), ('iface', 'eth0-home inet static\n address 192.168.1.1\n netmask 255.255.255.0')]

2012-01-18 12:37:54

It looks a bit complicated at first, but it's shorter and much better than my version. Would it be hard to get a list of (group, string) pairs, though? Something like `[('auto', 'auto lo eth0'), ('iface', 'iface eth0 inet static'), ...]`?? – marcinpz

Sure, no problem. See my edit.

Yes. That's what I want. Thank you :) – marcinpz
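
For reference, the tuples in the answer hold the keyword and the rest of the stanza, while the comment thread sketches pairs of the keyword and the full stanza text. A minimal sketch of that second shape might look like the following; `keyword_pairs`, `KEYWORDS` and `PAIR` are hypothetical names, and the keyword list is the one used in the answer (without `source` and `mapping`).

```python
import re

# Assumption: the same three keywords the answer's regex uses.
KEYWORDS = r"auto|allow-|iface"

# Group 1 captures the keyword; group 0 (the whole match) is the full stanza.
PAIR = re.compile(
    r"(%s)(?:(?!(?:%s)).)*(?<!\s)" % (KEYWORDS, KEYWORDS), re.DOTALL)


def keyword_pairs(text):
    """Return (keyword, whole stanza) tuples, e.g. ('auto', 'auto lo eth0')."""
    return [(m.group(1), m.group(0)) for m in PAIR.finditer(text)]


pairs = keyword_pairs("auto lo eth0\nallow-hotplug eth1\niface eth0 inet static\n")
print(pairs)
```

Using `m.group(0)` avoids having to glue the keyword back onto the remainder afterwards.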

You were very close to a clean solution when you computed the start indices. With those, you only need to add a single line to extract the required slices:

import re

indicies = [s.start() for s in re.finditer(
    'auto|iface|source|mapping|allow-', text)]
# The original used map(text.__getslice__, ...), which only exists in Python 2.
answer = [text[i:j] for i, j in zip(indicies, indicies[1:] + [len(text)])]

2012-01-18 13:22:14

This is nice, but it needed a small fix for me: `map(text.__getslice__, indicies, indicies[1:] + [len(text)])` – marcinpz

@marcinpz I think this is much cleaner than building a huge, hairy regex.
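
One caveat with searching for bare keywords is that they can also appear inside a stanza, for example the method name in `iface eth1 inet6 auto`. Below is a minimal sketch of the slicing approach that only treats a keyword as a stanza start when it sits at the beginning of a line. That anchoring is an assumption added here, not something either answer does, and `STANZA_START` is a made-up name.

```python
import re

# Assumption: a stanza keyword only counts at the very start of a line.
STANZA_START = re.compile(
    r"^(?:auto|iface|source|mapping|allow-)", re.MULTILINE)


def get_sections(text):
    """Split text into stanzas, each beginning with one of the keywords."""
    starts = [m.start() for m in STANZA_START.finditer(text)]
    # Each stanza runs from its own start to the next start (or end of text).
    ends = starts[1:] + [len(text)]
    return [text[i:j].strip() for i, j in zip(starts, ends)]


sample = (
    "auto lo eth0\n"
    "allow-hotplug eth1\n"
    "\n"
    "iface eth0-home inet static\n"
    "    address 192.168.1.1\n"
    "\n"
    "iface eth1 inet6 auto\n"
)

sections = get_sections(sample)
for section in sections:
    print(repr(section))
```

With the unanchored pattern, the trailing `auto` on the last line would be cut off as a stanza of its own; with the `^` anchor and `re.MULTILINE` it stays part of the `iface eth1` stanza.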
