どのように私は非印刷のUnicode文字をPythonで認識できますか？

ランダムな文字を使ってUnicode文字列を生成しようとしています。私は文字列中に印刷可能でない文字を持つことを望んでいません。 'unichr（codepoint）'関数を使用しています。私はコードポイントをUnicodeに変換し、 'unicode.encode（' utf-8 '）'を使用しています。 string.printableを使用しようとしましたが、それはASCIIのみをカバーします。どのように私は非印刷のUnicode文字をPythonで認識できますか？

出典

2017-01-20 DEEPAK WANI

例外処理を使用する方法について[MCVE] – MYGz

ジャストアイデアが、何を提供？それは悪い習慣とみなされるだろうか？ – spicypumpkin

印刷できない文字は何と考えていますか？どのフォントですか？上記コード印刷用 –

unicodedataライブラリを使用できます。

import unicodedata 

def strip_string(self, string): 
    """Cleans a string based on a whitelist of printable unicode categories 
    You can find a full list of categories here: 
    http://www.fileformat.info/info/unicode/category/index.htm 
    """ 
    letters  = ('LC', 'Ll', 'Lm', 'Lo', 'Lt', 'Lu') 
    numbers  = ('Nd', 'Nl', 'No') 
    marks  = ('Mc', 'Me', 'Mn') 
    punctuation = ('Pc', 'Pd', 'Pe', 'Pf', 'Pi', 'Po', 'Ps') 
    symbol  = ('Sc', 'Sk', 'Sm', 'So') 
    space  = ('Zs',) 

    allowed_categories = letters + numbers + marks + punctuation + symbol + space 

    return u''.join([ c for c in string if unicodedata.category(c) in allowed_categories ])

出典：https://gist.github.com/Jonty/6705090

出典

2017-01-20 10:27:35

'؁'（ 'U + 0601'）を「印刷可能」と考えてみませんか？ –

潜在的に印刷可能であっても、依然としてフォントサポートの問題があります。ほとんどのフォントは、すべての印刷可能なUnicode文字をサポートしていません。 –

@一二三ユーザーが印刷可能と考えるものはまあまあです。 U + 0601を印刷可能と見なすには、リストに 'Cf 'を追加します。 –

どのように私は非印刷のUnicode文字をPythonで認識できますか？

答えて

関連する問題