How to extract two specific numbers from multiple lines of a text file in Python

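I have a very large text file containing latitude measurements from two GPS antennas. The file contains a lot of garbage data, and I need to extract the latitude measurements from it. They occur only occasionally, in between other lines of unrelated text. The lines where they occur look like this: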

12:34:56.789 78:90:12.123123123 BLAH_BLAH blahblah :  LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] blah_BlHaBKBjFkjsa.c

The numbers I need are the ones in "LAT #1 MEAS=-80[deg]" and "LAT #2 MEAS=-110[deg]". So basically -80 and -110.

The rest of the text is not important.

08:59:07.603 08:59:05.798816 PAL_PARR_INTF TraceModule GET [email protected] :82 drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 525 
08:59:07.603 08:59:05.798816 PAL_PARR_INTF TraceModule xdma is not running drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 316 
08:59:07.603 08:59:05.798847 PAL_PARR_INTF TraceModule DMA is activated drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 461 
08:59:10.847 08:59:09.588001 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1596  
08:59:11.440 08:59:10.876819 UHAL_COMMON TraceWarning cellRtgSlot=0 cellRtgChip=1500 CELLK_ACTIVE=1 boundary RSN 232482 current RSN 232482 boundarySFN 508 currentSFN 508 uhal_Hmcp.c (../../../HEDGE/UL1/UHAL_3XX/platform/Code/Src) 2224  
08:59:11.440 08:59:10.877277 UHAL_SRCH TraceWarning uhal_HmcpSearcherS1LISR: status_reg(0xf0100000) uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1497  
08:59:11.440 08:59:10.877307 UHAL_COMMON TraceWarning uhal_HmcpSearcherSCDLISR is called. uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1512  
08:59:11.440 08:59:10.877338 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1596

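The log excerpt above is a sample from the input file. Right now I am using the code below to open the file and get these values, but it does not work. I am new to programming, so I don't know where I am going wrong.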

import re  # Importing 're' for using regular expressions

file_dir=raw_input('Enter the complete Directory of the file (eg c:\\abc.txt):') # Providing the user with a choice to open their file in .txt format 
with open(file_dir, 'r') as f: 
    lat_lines= f.read()               # storing the data in a variable 

# Declaring the two lists to hold the numbers 
raw_lat1 = [] 
raw_lat2 = [] 

start_1 = 'LAT #1 MEAS=' 
end_1 = '[de' 

start_2 = 'LAT #2 MEAS=' 
end_2 = '[de' 

x = re.findall(r'start_1(.*?)end_1',lat_lines,re.DOTALL) 
raw_lat1.append(x) 

y = re.findall(r'start_2(.*?)end_2',lat_lines,re.DOTALL) 
raw_lat2.append(y)

Source

2016-12-20 uddin M

This should do it (it doesn't use regex, but it still works):

answer = []
with open('file.txt') as infile:
    for line in infile:
        # Skip lines that do not contain both measurements.
        if "LAT #1 MEAS=" not in line: continue
        if "LAT #2 MEAS=" not in line: continue
        splits = line.split('=')
        temp = [0, 0]
        for i, part in enumerate(splits):
            # The value follows the '=' and runs up to the '['.
            if part.endswith("LAT #1 MEAS"):
                temp[0] = int(splits[i+1].split(None, 1)[0].split('[', 1)[0])
            elif part.endswith("LAT #2 MEAS"):
                temp[1] = int(splits[i+1].split(None, 1)[0].split('[', 1)[0])
        answer.append(temp)
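
As a quick check, here is a minimal sketch of a variation on the same split-based idea, run against sample lines from the question instead of a file. It assumes Python 3; the function name extract_lats and the list sample_lines are made up for this illustration, and the sample lines are abbreviated (the source paths are left out).

```python
def extract_lats(lines):
    """Return [lat1, lat2] pairs for every line that carries both markers."""
    pairs = []
    for line in lines:
        if "LAT #1 MEAS=" not in line or "LAT #2 MEAS=" not in line:
            continue
        # Text right after each marker, up to the opening bracket of "[deg]".
        lat1 = line.split("LAT #1 MEAS=", 1)[1].split("[", 1)[0]
        lat2 = line.split("LAT #2 MEAS=", 1)[1].split("[", 1)[0]
        pairs.append([int(lat1), int(lat2)])
    return pairs


# Abbreviated copies of lines from the question's sample (hypothetical test data).
sample_lines = [
    "08:59:07.603 08:59:05.798847 PAL_PARR_INTF TraceModule DMA is activated drv_Shm.c 461",
    "08:59:10.847 08:59:09.588001 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596",
    "08:59:11.440 08:59:10.877338 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596",
]
print(extract_lats(sample_lines))  # [[-80, -110], [-78, -110]]
```

Splitting on the full marker rather than on '=' keeps other key=value pairs on a line (such as cellRtgSlot=0) from getting in the way.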

Source

2016-12-20 00:34:18 inspectorG4dget

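Thanks for your reply. I tried it, but it gives me empty lists. When I print the "answer" list I get [], [], [], [], [], [], [], []

@uddinM: Please edit your original post to include a sample of your input file so that this can be tested properly – inspectorG4dget

I have added a sample to the question.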

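I can see a couple of problems with the regex here. In your re.findall calls, you use start_1 and end_1 (and start_2 and end_2) as if they were variables, but inside the pattern string they are treated as the literal characters "start_1" and "end_1". To use variables in a regex string, you have to use a format string instead. For example:

r'%s(.*?)%s' % (start_1, end_1)

Also, if you use .*end_1, the greedy .* matches any character, so it matches everything up to the last occurrence of end_1 on the line. Since LAT #1 and LAT #2 end the same way, it would actually match "-80[deg], LAT #2 MEAS=-110[de" (the non-greedy .*? in your pattern stops at the first occurrence instead).

If you use square brackets in a regular expression, you need to escape them. An unescaped bracket is used to specify a character set within a regex.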
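
These points can be checked with a small sketch. It assumes Python 3 and uses re.escape, which the answer does not mention, as one way to make the '[' in end_1 literal; the names pattern and greedy are only for illustration.

```python
import re

line = ("12:34:56.789 78:90:12.123123123 BLAH_BLAH blahblah :  "
        "LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] blah_BlHaBKBjFkjsa.c")

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

# 1. Inside the pattern, the names are just literal text, so nothing matches.
print(re.findall(r'start_1(.*?)end_1', line))   # []

# 2. Substituting the values without escaping fails: '[' opens a character set.
try:
    re.compile('%s(.*?)%s' % (start_1, end_1))
except re.error as exc:
    print('invalid pattern:', exc)

# 3. Escaping the substituted values makes every character literal.
pattern = '%s(.*?)%s' % (re.escape(start_1), re.escape(end_1))
print(re.findall(pattern, line))                # ['-80']

# 4. A greedy .* runs to the last '[de' on the line instead of the first.
greedy = '%s(.*)%s' % (re.escape(start_1), re.escape(end_1))
print(re.findall(greedy, line))                 # ['-80[deg], LAT #2 MEAS=-110']
```

The captured values are strings, so they still need int() before any arithmetic.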

Here is an example that assumes the variable line contains the sample string "12:34:56.789 78:90:12.123123123 BLAH_BLAH blahblah : LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] blah_BlHaBKBjFkjsa.c". You may need to adjust this snippet to run against your entire file.

import re

prefix = r'LAT %s MEAS=(-?\d+)\[deg\]'  # includes a format placeholder for the variable part of the expression
p1 = r'#1'
p2 = r'#2'
x = re.findall(prefix % p1, line)  # ['-80'] for the sample line
y = re.findall(prefix % p2, line)  # ['-110'] for the sample line
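
To adapt the snippet to a whole file, one option is to loop over the lines and capture both values with a single pattern. The following is only a sketch under stated assumptions: Python 3 is assumed, and the names LAT_PATTERN and extract_latitudes, as well as the temporary demo file with abbreviated sample lines, are hypothetical.

```python
import os
import re
import tempfile

# Both measurements on one line; each group captures an optionally negative integer.
LAT_PATTERN = re.compile(
    r'LAT #1 MEAS=(-?\d+)\[deg\].*?LAT #2 MEAS=(-?\d+)\[deg\]'
)


def extract_latitudes(path):
    """Read the file line by line and return (raw_lat1, raw_lat2) as lists of ints."""
    raw_lat1, raw_lat2 = [], []
    with open(path) as infile:
        for line in infile:  # avoids loading a very large file into memory at once
            match = LAT_PATTERN.search(line)
            if match:
                raw_lat1.append(int(match.group(1)))
                raw_lat2.append(int(match.group(2)))
    return raw_lat1, raw_lat2


# Demo on a throwaway file built from abbreviated sample lines.
sample = (
    "08:59:07.603 08:59:05.798816 PAL_PARR_INTF TraceModule xdma is not running drv_Shm.c 316\n"
    "08:59:10.847 08:59:09.588001 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596\n"
    "08:59:11.440 08:59:10.877338 UHAL_SRCH TraceFlow :  LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

lat1, lat2 = extract_latitudes(tmp.name)
os.remove(tmp.name)
print(lat1, lat2)  # [-80, -78] [-110, -110]
```

Keeping the two values in parallel lists mirrors raw_lat1 and raw_lat2 from the question; a list of tuples would work just as well if the pairs should stay together.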

Source

2016-12-20 00:56:39 xgord
