2016-09-14 6 views
0

ERROR 2998: Unhandled internal error. null - Apache Pig

I am writing Pig code in which I want to match a column against multiple strings. For example:

A = FOREACH A1 GENERATE
    c1, c2, c3,
    -- junk values substituted for the real patterns
    (CASE
        WHEN (
               column_name MATCHES '.*abc.*'
            OR column_name MATCHES '.*sdf.*'
            OR column_name MATCHES '.*bcd.*'
            OR column_name MATCHES '.*def.*'
            OR column_name MATCHES '.*efg.*'
            OR column_name MATCHES '.*ggg.*'
            OR column_name MATCHES '.*ghi.*'
            OR column_name MATCHES '.*hij.*'
            OR column_name MATCHES '.*ijk.*'
            OR column_name MATCHES '.*jkl.*'
            OR column_name MATCHES '.*klm.*'
            OR column_name MATCHES '.*lmn.*'
            OR column_name MATCHES '.*mno.*'
            OR column_name MATCHES '.*mnb.*'
            OR column_name MATCHES '.*opq.*'
            OR column_name MATCHES '.*pqr.*'
            OR column_name MATCHES '.*qrs.*'
            OR column_name MATCHES '.*stuv.*'
            OR column_name MATCHES '.*tuvw.*'
            OR column_name MATCHES '.*wxy.*'
            OR column_name MATCHES '.*tuvwx.*'
            OR column_name MATCHES '.*xyz.*'
            -- . . . . . (further OR clauses elided)
        ) THEN 1
        ELSE 0
    END) AS c4;

I observed that once the number of OR'ed "column_name MATCHES '---'" clauses exceeds 672, the Pig script fails to execute with this error:

Pig Stack Trace 
--------------- 
ERROR 2998: Unhandled internal error. null 

java.lang.StackOverflowError 
     at java.util.zip.Deflater.ensureOpen(Deflater.java:543) 
     at java.util.zip.Deflater.deflate(Deflater.java:426) 
     at java.util.zip.Deflater.deflate(Deflater.java:352) 
     at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251) 
     at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) 
     at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876) 
     at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 

Kindly suggest a solution or an alternative that meets this requirement.

Answers

0

I'd suggest writing a custom filter function; there you control the RAM consumption yourself. You may also be able to do a plain substring search, with no need for RegEx.

+0

UPD: since you are generating rather than filtering, it needs to be an eval function: https://pig.apache.org/docs/r0.16.0/udf.html#eval-functions – patrungel

+0

So I write a UDF and search for the column value (a substring) in the set of required values, something like ('abc | def | ghi | jkl | mno'). Is that right, @patrungel? – Suyog

+0

Writing a UDF is right, but rather than looking the column value up in a list, my impression is that you are searching for patterns _in_ the value. – patrungel
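
A minimal sketch of the substring check discussed in this thread, written as plain Java so that it runs without the Pig jars. The class name `ContainsAny` and its `flag` method are hypothetical, chosen only for illustration. In an actual eval function, a class extending `org.apache.pig.EvalFunc<Integer>` would presumably call something like this from its `exec(Tuple)` method; that wiring is not shown here. Treating a null value as "no match" is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Pig): flags values that contain
// at least one of a fixed set of substrings.
final class ContainsAny {
    private final List<String> needles;

    ContainsAny(String... needles) {
        this.needles = Arrays.asList(needles);
    }

    // Returns 1 if the value contains any needle, otherwise 0,
    // mirroring the CASE ... THEN 1 ELSE 0 expression in the question.
    // Assumption: a null value counts as "no match".
    int flag(String value) {
        if (value == null) {
            return 0;
        }
        for (String needle : needles) {
            // Plain substring search; no regular expression is compiled.
            if (value.contains(needle)) {
                return 1;
            }
        }
        return 0;
    }
}
```

Because the list of substrings lives in an ordinary Java collection, the Pig script would contain a single function call instead of hundreds of OR'ed MATCHES clauses, so the deeply nested expression that appears to overflow the stack during plan serialization never gets built.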
