How to check for a sequence of characters that does not match a regex

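I am currently trying to implement a lexical scanner that will later become part of a compiler. The program uses regular expressions to match the input program file. If a sequence of non-white-space characters matches a regular expression, the matched section of the input is converted into a token, which is sent to the parser along with the rest of the tokens. My code works in that the correct tokens are output, but I need the scanner to raise an exception (by calling the method no_token()) if it finds a sequence of non-white-space characters that does not match any of the given regular expressions. This is my first post, so if you have any tips for improving it, please let me know. Ask if you need more information about the question or the code.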

def get_token(self):
    '''Returns the next token and the part of input_string it matched.
    The returned token is None if there is no next token.
    The characters up to the end of the token are consumed.
    Raise an exception by calling no_token() if the input contains
    extra non-white-space characters that do not match any token.'''
    self.skip_white_space()
    # find the longest prefix of input_string that matches a token
    token, longest = None, ''
    for (t, r) in Token.token_regexp:
        match = re.match(r, self.input_string[self.current_char_index:])
        if match is None:
            self.no_token()
        elif match and match.end() > len(longest):
            token, longest = t, match.group()
    self.current_char_index += len(longest)
    return (token, longest)

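As you can see, I tried using

if match is None:
    self.no_token()

but this raises the exception right at the start and exits the program without returning any tokens, whereas if I comment it out, the code works correctly. Obviously I need this section in case non-white-space characters do not match any regular expression, otherwise it will cause problems at later stages of development. skip_white_space() consumes all white space up to the next non-white-space character, the regular expressions are stored in token_regexp, and self.input_string[self.current_char_index:] gives the input starting at the current character.

For the following program, given as a .txt file: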

z := 2; 
if z < 3 then 
    z := 1 
end

the output without the call to no_token() is as follows, which is correct:

ID z
BEC
NUM 2
SEM
IF
ID z
LESS
NUM 3
THEN
ID z
BEC
NUM 1
END

When I try to implement the call to no_token(), I get:

lexical error: no token found at the start of z := 2; 
if z < 3 then 
    z := 1 
end

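This is the output of the no_token() method, which the scanner is supposed to call when it finds characters that do not match any regular expression, but that is not the case for this input: all the strings here are valid.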
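A likely cause, sketched under assumptions: the `if match is None` check sits inside the loop, so it fires as soon as any single regex fails to match, and for a valid token most regexes fail (an identifier like `z` matches neither the `:=` pattern nor the number pattern). The table below is a hypothetical three-entry stand-in for `Token.token_regexp`, which the question does not show, and `failing_regexps` / `any_token_matches` are illustrative helpers.

```python
import re

# Hypothetical subset of a token table; the real Token.token_regexp is not shown.
TOKEN_REGEXP = [
    ("BEC", r":="),
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
]


def failing_regexps(text):
    """Names of the token regexps that do NOT match at the start of text."""
    return [name for name, r in TOKEN_REGEXP if re.match(r, text) is None]


def any_token_matches(text):
    """True if at least one token regexp matches at the start of text."""
    return any(re.match(r, text) for _, r in TOKEN_REGEXP)


source = "z := 2;"
print(failing_regexps(source))    # ['BEC', 'NUM']
print(any_token_matches(source))  # True
```

So a `None` from one regex is normal even for valid input. The real error condition is "no regex matched at all", which can only be decided after the loop has tried every pattern.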


2016-05-10 saleem

You could use a [negative lookahead assertion](https://docs.python.org/3/library/re.html#regular-expression-syntax). Also, a [minimal, verifiable, complete example](/help/mvce) would really help you get an answer, much more than a long explanation and fairly irrelevant code. – Kupiakos
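One way the negative-lookahead idea could be applied, as a sketch: join all token patterns into one alternation and wrap it in `(?!...)`, so the combined pattern matches a non-white-space character only where no token can start. The token table and the helper `is_bad_start` are hypothetical; `re.compile`, `Pattern.match(string, pos)` and the `(?!...)` syntax are standard `re` features.

```python
import re

# Hypothetical token table; the real Token.token_regexp is not shown in the question.
TOKEN_REGEXP = [
    ("BEC", r":="),
    ("SEM", r";"),
    ("LESS", r"<"),
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
]

# Alternation of all token patterns, each wrapped in a non-capturing group.
ANY_TOKEN = "|".join("(?:" + r + ")" for _, r in TOKEN_REGEXP)

# (?!...) is a negative lookahead: this matches one non-white-space character
# only if none of the token patterns could start a match at that position.
BAD_START = re.compile("(?!" + ANY_TOKEN + r")\S")


def is_bad_start(text, pos):
    """True if text[pos] is non-white-space and no token can begin there."""
    return BAD_START.match(text, pos) is not None


print(is_bad_start("z := 2;", 0))  # False: 'z' starts an ID
print(is_bad_start("z := $", 5))   # True: '$' starts no token
```

This only makes sense at a token boundary (for example right after `skip_white_space()`). Checked at an arbitrary index it would flag the `=` in `:=`, because no token starts there.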

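Thanks for the quick reply. I'll read https://docs.python.org/2/library/re.html and see whether it can help me. When you say a minimal, complete, and verifiable example, do you mean an example of the input and the expected output? The call to no_token() is the only one in the whole program, and I don't see how the code is irrelevant when it is the reason the error occurs. – saleem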

Pretty much: a few examples of input and expected output. – Kupiakos

All sorted. Final working code:

def get_token(self):
    '''Returns the next token and the part of input_string it matched.
    The returned token is None if there is no next token.
    The characters up to the end of the token are consumed.
    Raise an exception by calling no_token() if the input contains
    extra non-white-space characters that do not match any token.'''
    self.skip_white_space()
    # find the longest prefix of input_string that matches a token
    token, longest = None, ''
    for (t, r) in Token.token_regexp:
        match = re.match(r, self.input_string[self.current_char_index:])
        if match and match.end() > len(longest):
            token, longest = t, match.group()

    self.current_char_index += len(longest)
    # only after every regex has been tried: nothing matched, yet input remains
    if token is None and self.current_char_index < len(self.input_string):
        self.no_token()
    return (token, longest)

Cheers for helping answer my question.
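To check the fix end to end, here is a self-contained sketch. `Token.token_regexp`, `skip_white_space()` and `no_token()` are not shown in the question, so the token table, the `Scanner` class, `LexicalError` and `tokenize` below are hypothetical stand-ins. The `get_token` logic follows the final code, except that the check runs before the index is advanced, which is equivalent because `longest` is empty whenever `token` is `None`.

```python
import re


class LexicalError(Exception):
    """Hypothetical error raised when the input matches no token."""


# Hypothetical token table chosen to fit the sample program. Keywords come
# before ID so that, when lengths tie, the keyword wins (the > test is strict).
TOKEN_REGEXP = [
    ("IF", r"if\b"),
    ("THEN", r"then\b"),
    ("END", r"end\b"),
    ("BEC", r":="),
    ("SEM", r";"),
    ("LESS", r"<"),
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
]


class Scanner:
    def __init__(self, input_string):
        self.input_string = input_string
        self.current_char_index = 0

    def skip_white_space(self):
        # Advance past spaces, tabs and newlines.
        while (self.current_char_index < len(self.input_string)
               and self.input_string[self.current_char_index].isspace()):
            self.current_char_index += 1

    def no_token(self):
        raise LexicalError("lexical error: no token found at the start of "
                           + self.input_string[self.current_char_index:])

    def get_token(self):
        self.skip_white_space()
        token, longest = None, ""
        for t, r in TOKEN_REGEXP:
            match = re.match(r, self.input_string[self.current_char_index:])
            if match and match.end() > len(longest):
                token, longest = t, match.group()
        # Decided only after the loop: no regex matched, but input remains.
        if token is None and self.current_char_index < len(self.input_string):
            self.no_token()
        self.current_char_index += len(longest)
        return token, longest


def tokenize(text):
    """Collect (token, lexeme) pairs until get_token() returns None."""
    scanner = Scanner(text)
    tokens = []
    while True:
        token, lexeme = scanner.get_token()
        if token is None:
            return tokens
        tokens.append((token, lexeme))


program = "z := 2;\nif z < 3 then\n    z := 1\nend"
for token, lexeme in tokenize(program):
    print(token, lexeme)
```

On the sample program this yields ID, BEC, NUM, SEM, IF, ID, LESS, NUM, THEN, ID, BEC, NUM, END, matching the expected output, while an input such as `z := 2 $` raises `LexicalError` at the `$`.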


2016-05-11 00:32:46 saleem
