Removing specific characters from text in Spark

I have some strange characters in a column of my Spark DataFrame and want to remove them. When I select that column and run .show(), I see the following:

Dominant technology firm seeks ambitious, assertive, confident, headstrong salesperson to lead our organization into the next era! If you are ready to thrive in a highly competitive environment, this is the job for you. ¥ Superior oral and written communication skills¥ Extensive experience with negotiating and closing sales ¥ Outspoken ¥ Thrives in competitive environment¥ Self-reliant and able to succeed in an independent setting ¥ Manage portfolio of clients ¥ Aggressively close sales to exceed quarterly quotas ¥ Deliver expertise to clients as needed ¥ Lead the company into new markets |

The character in question is ¥. I tried to remove it from the column with a UDF, but that throws the following error:

File "/Users/i854319/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream 
    vs = list(itertools.islice(iterator, batch)) 
    File "/Users/i854319/spark/python/pyspark/sql/functions.py", line 1563, in <lambda> 
    func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it) 
    File "<ipython-input-32-864efe6f3257>", line 3, in <lambda> 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

This is the code I wrote to remove the character from the description column of the DataFrame, which produced the error above:

from pyspark.sql.functions import udf 

charReplace=udf(lambda x: x.replace('¥','')) 

train_cleaned=train_triLabel.withColumn('dsescription',charReplace('description')) 
train_cleaned.show(2,truncate=False)

However, when I try the same thing on a test string, the character is recognized by the replace method:

s='hello ¥' 
print s 
s.replace('¥','') 
 
hello ¥ 
Out[37]: 
'hello '

Where am I going wrong?

Source

2017-01-20 Baktaawar

Use a Unicode literal:

charReplace = udf(lambda x: x.replace(u'¥',''))
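
To see why the Unicode literal matters, here is a minimal sketch in plain Python (no Spark needed). It assumes the session in the question was Python 2, as the `print s` statement suggests. There, '¥' typed in a UTF-8 environment is the two-byte string '\xc2\xa5', while Spark passes the UDF a unicode value. Mixing the two forces an implicit ASCII decode, which fails on the first byte, 0xc2. The helper name remove_char is hypothetical and not part of PySpark.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the decode failure outside Spark.

YEN = u'\u00a5'  # the yen sign as a Unicode string

# UTF-8 encodes the yen sign as two bytes; the first is the 0xc2 from the traceback.
yen_bytes = YEN.encode('utf-8')
first_byte = bytearray(yen_bytes)[0]

# Decoding those bytes as ASCII is what Python 2 attempts implicitly when a
# byte string is passed to unicode.replace; done explicitly it fails the same way.
try:
    yen_bytes.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)


def remove_char(text, char=YEN):
    """Hypothetical helper: strip `char` from `text`, passing None through."""
    if text is None:
        return None
    return text.replace(char, u'')


cleaned = remove_char(u'hello \u00a5')
```

If this matches your setup, the function could be wrapped as udf(remove_char) in place of the lambda. The None check is there because a null value in the column would otherwise raise an AttributeError inside the UDF.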

Source

2017-01-21 00:13:55 user7337271

Aah. What a silly mistake. Thanks a ton! – Baktaawar
