$ cat r.awk
BEGIN {
    re_wrd = "^[A-Za-z]+"  # what we consider a word
    re_sep = "^."          # the rest is a separator
}
function advance() {  # sets `tag' and `tok'; eats a part of `line'
    if (match(line, re_wrd)) tag = "wrd"
    else if (match(line, re_sep)) tag = "sep"
    tok = substr(line, 1, RLENGTH)
    line = substr(line, RLENGTH + 1)
}
function process_sep() {  # copy to output
    ans = ans tok
}
function process_wrd() {
    sub(/^word/, "preword", tok)  # replace only at the beginning
    ans = ans tok
}
{
    line = $0; ans = tag = tok = ""
    while (length(line) > 0) {
        advance()
        # uncomment for tracing
        # print tag, "<" tok ">" | "cat 1>&2"
        if (tag == "sep") process_sep()
        else if (tag == "wrd") process_wrd()
    }
    print ans
}
Usage:
$ echo 'preword...microsoftword word wordword,word.word-preword' | awk -f r.awk
preword...microsoftword preword prewordword,preword.preword-preword
Trace:
wrd <preword>
sep <.>
sep <.>
sep <.>
wrd <microsoftword>
sep < >
wrd <word>
sep < >
wrd <wordword>
sep <,>
wrd <word>
sep <.>
wrd <word>
sep <->
wrd <preword>
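
To check how this approach treats a word that merely contains `word` (for example `foreword`), the same tokenizing loop can be wrapped in a shell function. This is a minimal sketch, assuming a POSIX shell and awk. The function name `prefix_words` and the variable `n` are hypothetical and do not appear in r.awk; the logic is a condensed variant of the script above, not a replacement for it.

```shell
# Condensed variant of r.awk; prefix_words is a hypothetical name.
prefix_words() {
  awk '
  {
    line = $0; ans = ""
    while (length(line) > 0) {
      if (match(line, /^[A-Za-z]+/)) {   # a word token
        n = RLENGTH
        tok = substr(line, 1, n)
        sub(/^word/, "preword", tok)     # anchored: only a leading "word"
      } else {                           # any other single character is a separator
        n = 1
        tok = substr(line, 1, 1)
      }
      ans = ans tok
      line = substr(line, n + 1)
    }
    print ans
  }'
}

echo "foreword word" | prefix_words
# prints: foreword preword
```

Because the substitution is anchored at the start of each word token, `foreword` passes through unchanged. If it should become `forepreword` instead, the anchor in the `sub()` call would have to change.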
Should 'foreword' become 'forepreword'? Please clarify your requirements and show concise, testable sample input and expected output that cover all of your use cases.