Picasa album title encoding: not Unicode?

I wrote a simple client for Google's Picasa service. I want to create a folder named after each album's title and download the original photos from the service into that folder. If the title contains any non-Latin characters, I get an IOError:

IOError: [Errno 2] No such file or directory: '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg'

Code sample:

import gdata.photos.service
import gdata.media
import os
import urllib2

gd_client = gdata.photos.service.PhotosService()

username = 'cha.com.ua'
albums = gd_client.GetUserFeed(user=username)
for album in albums.entry:
    photos = gd_client.GetFeed(
        '/data/feed/api/user/%s/albumid/%s?kind=photo' % (
            username, album.gphoto_id.text))

    for photo in photos.entry:
        destination = os.path.join(album.title.text, photo.title.text)
        out = open(destination, 'wb')
        out.write(urllib2.urlopen(photo.content.src).read())
        out.close()

I tried decoding the title with .decode('utf-8'), but it doesn't work.

2011-09-09 smirnoffs

What error do you get when decoding? What is the type of the title (please print repr(photo.title.text))? – rocksportrocker

It is already decoded. Have you tried '.encode('utf-8')'? –

@rocksportrocker 'repr(album.title.text)' returns str: '\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0' – smirnoffs

You say:

@rocksportrocker repr(album.title.text) returns str: 
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

and

@d-k Yep, I've tried it. The result is the same. 
For example repr(album.title.text.encode('utf-8')) returns str: 
'\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0'

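That can't be true. If the first statement is correct, the second will raise an exception, because calling .encode('utf-8') on a byte str makes Python 2 decode it implicitly as ASCII first. Also, decoding the bytes above as UTF-8 gives: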

>>> foo = '\xd0\x92\xd0\xb8\xd0\xb4 \xd0\xb8\xd0\xb7 \xd0\xbe\xd0\xba\xd0\xbd\xd0\xb0' 
>>> from unicodedata import name 
>>> for uc in foo.decode('utf8'): 
...  print "U+%04X" % ord(uc), name(uc) 
... 
U+0412 CYRILLIC CAPITAL LETTER VE 
U+0438 CYRILLIC SMALL LETTER I 
U+0434 CYRILLIC SMALL LETTER DE 
U+0020 SPACE 
U+0438 CYRILLIC SMALL LETTER I 
U+0437 CYRILLIC SMALL LETTER ZE 
U+0020 SPACE 
U+043E CYRILLIC SMALL LETTER O 
U+043A CYRILLIC SMALL LETTER KA 
U+043D CYRILLIC SMALL LETTER EN 
U+0430 CYRILLIC SMALL LETTER A 
>>>

The exception message from the second statement would be quite different from the text you quoted:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
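
For reference, this failure can be reproduced in a form that also runs on Python 3, where byte strings have no .encode method at all. The sketch below makes the implicit ASCII decode that Python 2 performs explicit. The variable names are illustrative only.

```python
# Bytes for the first word of the album title quoted above (UTF-8 encoded Cyrillic).
raw_title = b'\xd0\x92\xd0\xb8\xd0\xb4'

message = None
try:
    # Python 2 does this ASCII decode implicitly when .encode() is called
    # on a byte str; here it is spelled out so it runs on Python 3 as well.
    raw_title.decode('ascii').encode('utf-8')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The decode step fails on the first byte, so the encode is never reached.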

Your str object appears to be Cyrillic text encoded in UTF-8. The path in the error message, '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg', decodes as follows:

>>> bar = '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c\Autumnal-Equinox.jpg' 
>>> for uc in bar.decode('utf8'): 
...  print "U+%04X" % ord(uc), name(uc) 
... 
U+041E CYRILLIC CAPITAL LETTER O 
U+0441 CYRILLIC SMALL LETTER ES 
U+0435 CYRILLIC SMALL LETTER IE 
U+043D CYRILLIC SMALL LETTER EN 
U+044C CYRILLIC SMALL LETTER SOFT SIGN 
U+005C REVERSE SOLIDUS 
U+0041 LATIN CAPITAL LETTER A 
U+0075 LATIN SMALL LETTER U 
U+0074 LATIN SMALL LETTER T 
# snipped the remainder

The REVERSE SOLIDUS (backslash) indicates that you are running on Windows. Windows doesn't grok UTF-8 in byte-string paths. Convert all text to Unicode on input. Use Unicode for all paths and file names. A simple example:

>>> bar = '\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c.txt' 
>>> ubar = bar.decode('utf8') 
>>> print repr(ubar) 
u'\u041e\u0441\u0435\u043d\u044c.txt' 
>>> f = open(ubar, 'wb') 
>>> f.write('hello\n') 
>>> f.close() 
>>> open(ubar, 'rb').read() 
'hello\n'
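
As a rough sketch of that advice applied to the path-building step in the question, the helper below decodes byte-string titles to Unicode before joining them. The names to_text and build_destination are hypothetical and not part of gdata, and the UTF-8 default is an assumption based on the titles shown above. It is written to run on both Python 2 and Python 3.

```python
import os


def to_text(value, encoding='utf-8'):
    """Return value as a Unicode string, decoding it if it is a byte string."""
    # Hypothetical helper: on Python 2, bytes is an alias for str,
    # so this branch also catches the byte strings returned by the feed.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_destination(album_title, photo_title):
    """Join two titles into a path after converting both to Unicode."""
    return os.path.join(to_text(album_title), to_text(photo_title))


destination = build_destination(b'\xd0\x9e\xd1\x81\xd0\xb5\xd0\xbd\xd1\x8c',
                                b'Autumnal-Equinox.jpg')
print(repr(destination))
```

In the download loop, build_destination(album.title.text, photo.title.text) would take the place of the plain os.path.join call. The album folder itself still has to exist before open() is called.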

2011-09-10 00:06:28

You're right. It is Cyrillic. 'album_title_decoded = album.title.text.decode('utf8')' followed by 'destination = os.path.join(album_title_decoded, photo.title.text)' works fine. Thanks for your help. – smirnoffs

@smirnoffs: That works only because 'photo.title.text' is (in this case) ASCII. For the future, please do what I said: **Convert all text to Unicode on input. Use Unicode for ALL paths and ALL file names.** –
