2016-11-07 13 views
3

Convert a Pandas DataFrame to nested JSON

I am new to Python and Pandas. I am trying to convert a Pandas DataFrame to nested JSON. The .to_json() function does not give me enough flexibility for my purpose.

,ID,Location,Country,Latitude,Longitude,timestamp,tide 
0,1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0 
1,1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0 
2,1,BREST,FRA,48.383,-4.495,1807-03-01,6896.0 
3,1,BREST,FRA,48.383,-4.495,1807-04-01,6953.0 
4,1,BREST,FRA,48.383,-4.495,1807-05-01,7043.0 
2508,7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0 
2509,7,CUXHAVEN 2,DEU,53.867,8.717,1843-02-01,6688.0 
2510,7,CUXHAVEN 2,DEU,53.867,8.717,1843-03-01,6493.0 
2511,7,CUXHAVEN 2,DEU,53.867,8.717,1843-04-01,6723.0 
2512,7,CUXHAVEN 2,DEU,53.867,8.717,1843-05-01,6533.0 
4525,9,MAASSLUIS,NLD,51.918,4.25,1848-02-01,6880.0 
4526,9,MAASSLUIS,NLD,51.918,4.25,1848-03-01,6700.0 
4527,9,MAASSLUIS,NLD,51.918,4.25,1848-04-01,6775.0 
4528,9,MAASSLUIS,NLD,51.918,4.25,1848-05-01,6580.0 
4529,9,MAASSLUIS,NLD,51.918,4.25,1848-06-01,6685.0 
6540,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-07-01,6957.0 
6541,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-08-01,6944.0 
6542,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-09-01,7084.0 
6543,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-10-01,6898.0 
6544,8,WISMAR 2,DEU,53.898999999999994,11.458,1848-11-01,6859.0 
8538,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-07-01,6909.0 
8539,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-08-01,6940.0 
8540,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-09-01,6961.0 
8541,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-10-01,6952.0 
8542,10,SAN FRANCISCO,USA,37.806999999999995,-122.465,1854-11-01,6952.0 

The rows above are a few data points from the DataFrame (comma-separated CSV). A lot of information is repeated, and I would like JSON like this:

[ 
{ 
    "ID": 1, 
    "Location": "BREST", 
    "Latitude": 48.383, 
    "Longitude": -4.495, 
    "Country": "FRA", 
    "Tide-Data": { 
     "1807-02-01": 6931, 
     "1807-03-01": 6896, 
     "1807-04-01": 6953, 
     "1807-05-01": 7043 
    } 
}, 
{ 
    "ID": 5, 
    "Location": "HOLYHEAD", 
    "Latitude": 53.31399999999999, 
    "Longitude": -4.62, 
    "Country": "GBR", 
    "Tide-Data": { 
     "1807-02-01": 6931, 
     "1807-03-01": 6896, 
     "1807-04-01": 6953, 
     "1807-05-01": 7043 
    } 
} 
] 

How can I achieve this?

+0

[pandas.DataFrame.to_json](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html) has many options. Check whether you can get what you need with those options. –

+0

In particular, check the 'orient' option. –

+0

I don't see how. It repeats the same information over and over, whereas I need the timestamp and tide columns nested. – Felix
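To make the point in the comment above concrete, here is a minimal sketch using three rows from the question's sample data (the variable names csv_text, df and flat are just for illustration). It suggests that orient='records' only changes the flat layout: each row still becomes its own object, with the station fields repeated.

```python
import io
import json

import pandas as pd

# Three rows copied from the question's sample data.
csv_text = """ID,Location,Country,Latitude,Longitude,timestamp,tide
1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0
1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0
7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# orient="records" gives one flat object per row: the station fields
# are repeated in every record and nothing is nested.
flat = json.loads(df.to_json(orient="records"))

print(len(flat))        # one record per input row
print(sorted(flat[0]))  # each record carries all seven columns
```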

Answers

6

UPDATE:

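In [102]: j = (df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False)
    ...:  # one list of {timestamp, tide} dicts per station; 'records' is spelled out
    ...:  # because newer pandas versions reject the 'r' shorthand
    ...:  .apply(lambda x: x[['timestamp','tide']].to_dict('records'))
    ...:  .reset_index()
    ...:  .rename(columns={0:'Tide-Data'})
    ...:  .to_json(orient='records'))
    ...: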

Result (formatted):

In [103]: print(json.dumps(json.loads(j), indent=2, sort_keys=True)) 
[ 
    { 
    "Country": "FRA", 
    "ID": 1, 
    "Latitude": 48.383, 
    "Location": "BREST", 
    "Longitude": -4.495, 
    "Tide-Data": [ 
     { 
     "tide": 6905.0, 
     "timestamp": "1807-01-01" 
     }, 
     { 
     "tide": 6931.0, 
     "timestamp": "1807-02-01" 
     }, 
     { 
     "tide": 6896.0, 
     "timestamp": "1807-03-01" 
     }, 
     { 
     "tide": 6953.0, 
     "timestamp": "1807-04-01" 
     }, 
     { 
     "tide": 7043.0, 
     "timestamp": "1807-05-01" 
     } 
    ] 
    }, 
    { 
    "Country": "DEU", 
    "ID": 7, 
    "Latitude": 53.867, 
    "Location": "CUXHAVEN 2", 
    "Longitude": 8.717, 
    "Tide-Data": [ 
     { 
     "tide": 7093.0, 
     "timestamp": "1843-01-01" 
     }, 
     { 
     "tide": 6688.0, 
     "timestamp": "1843-02-01" 
     }, 
     { 
     "tide": 6493.0, 
     "timestamp": "1843-03-01" 
     }, 
     { 
     "tide": 6723.0, 
     "timestamp": "1843-04-01" 
     }, 
     { 
     "tide": 6533.0, 
     "timestamp": "1843-05-01" 
     } 
    ] 
    }, 
    { 
    "Country": "DEU", 
    "ID": 8, 
    "Latitude": 53.899, 
    "Location": "WISMAR 2", 
    "Longitude": 11.458, 
    "Tide-Data": [ 
     { 
     "tide": 6957.0, 
     "timestamp": "1848-07-01" 
     }, 
     { 
     "tide": 6944.0, 
     "timestamp": "1848-08-01" 
     }, 
     { 
     "tide": 7084.0, 
     "timestamp": "1848-09-01" 
     }, 
     { 
     "tide": 6898.0, 
     "timestamp": "1848-10-01" 
     }, 
     { 
     "tide": 6859.0, 
     "timestamp": "1848-11-01" 
     } 
    ] 
    }, 
    { 
    "Country": "NLD", 
    "ID": 9, 
    "Latitude": 51.918, 
    "Location": "MAASSLUIS", 
    "Longitude": 4.25, 
    "Tide-Data": [ 
     { 
     "tide": 6880.0, 
     "timestamp": "1848-02-01" 
     }, 
     { 
     "tide": 6700.0, 
     "timestamp": "1848-03-01" 
     }, 
     { 
     "tide": 6775.0, 
     "timestamp": "1848-04-01" 
     }, 
     { 
     "tide": 6580.0, 
     "timestamp": "1848-05-01" 
     }, 
     { 
     "tide": 6685.0, 
     "timestamp": "1848-06-01" 
     } 
    ] 
    }, 
    { 
    "Country": "USA", 
    "ID": 10, 
    "Latitude": 37.807, 
    "Location": "SAN FRANCISCO", 
    "Longitude": -122.465, 
    "Tide-Data": [ 
     { 
     "tide": 6909.0, 
     "timestamp": "1854-07-01" 
     }, 
     { 
     "tide": 6940.0, 
     "timestamp": "1854-08-01" 
     }, 
     { 
     "tide": 6961.0, 
     "timestamp": "1854-09-01" 
     }, 
     { 
     "tide": 6952.0, 
     "timestamp": "1854-10-01" 
     }, 
     { 
     "tide": 6952.0, 
     "timestamp": "1854-11-01" 
     } 
    ] 
    } 
] 
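For reference, a self-contained sketch of the same idea follows. It is not the exact code from the answer: it drops as_index=False and names the result column with reset_index(name=...), on the assumption that this is less sensitive to the pandas version than renaming column 0. The names csv_text, keys and nested are illustrative only, and the data is a small subset of the question's sample.

```python
import io
import json

import pandas as pd

# Small subset of the question's sample data.
csv_text = """ID,Location,Country,Latitude,Longitude,timestamp,tide
1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0
1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0
7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0
7,CUXHAVEN 2,DEU,53.867,8.717,1843-02-01,6688.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns that identify a station; everything else gets nested.
keys = ["ID", "Location", "Country", "Latitude", "Longitude"]

nested = (
    df.groupby(keys)[["timestamp", "tide"]]
      # one list of {"timestamp": ..., "tide": ...} dicts per station
      .apply(lambda g: g.to_dict("records"))
      # turn the group keys back into columns and name the nested column
      .reset_index(name="Tide-Data")
)

j = nested.to_json(orient="records")
print(json.dumps(json.loads(j), indent=2, sort_keys=True))
```

Selecting the timestamp and tide columns before apply() keeps the grouping columns out of the nested dicts.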

OLD answer:

You can do it using the groupby(), apply() and to_json() methods:

j = (df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False) 
     .apply(lambda x: dict(zip(x.timestamp,x.tide))) 
     .reset_index() 
     .rename(columns={0:'Tide-Data'}) 
     .to_json(orient='records')) 

Output:

In [112]: print(json.dumps(json.loads(j), indent=2, sort_keys=True)) 
[ 
    { 
    "Country": "FRA", 
    "ID": 1, 
    "Latitude": 48.383, 
    "Location": "BREST", 
    "Longitude": -4.495, 
    "Tide-Data": { 
     "1807-01-01": 6905.0, 
     "1807-02-01": 6931.0, 
     "1807-03-01": 6896.0, 
     "1807-04-01": 6953.0, 
     "1807-05-01": 7043.0 
    } 
    }, 
    { 
    "Country": "DEU", 
    "ID": 7, 
    "Latitude": 53.867, 
    "Location": "CUXHAVEN 2", 
    "Longitude": 8.717, 
    "Tide-Data": { 
     "1843-01-01": 7093.0, 
     "1843-02-01": 6688.0, 
     "1843-03-01": 6493.0, 
     "1843-04-01": 6723.0, 
     "1843-05-01": 6533.0 
    } 
    }, 
    { 
    "Country": "DEU", 
    "ID": 8, 
    "Latitude": 53.899, 
    "Location": "WISMAR 2", 
    "Longitude": 11.458, 
    "Tide-Data": { 
     "1848-07-01": 6957.0, 
     "1848-08-01": 6944.0, 
     "1848-09-01": 7084.0, 
     "1848-10-01": 6898.0, 
     "1848-11-01": 6859.0 
    } 
    }, 
    { 
    "Country": "NLD", 
    "ID": 9, 
    "Latitude": 51.918, 
    "Location": "MAASSLUIS", 
    "Longitude": 4.25, 
    "Tide-Data": { 
     "1848-02-01": 6880.0, 
     "1848-03-01": 6700.0, 
     "1848-04-01": 6775.0, 
     "1848-05-01": 6580.0, 
     "1848-06-01": 6685.0 
    } 
    }, 
    { 
    "Country": "USA", 
    "ID": 10, 
    "Latitude": 37.807, 
    "Location": "SAN FRANCISCO", 
    "Longitude": -122.465, 
    "Tide-Data": { 
     "1854-07-01": 6909.0, 
     "1854-08-01": 6940.0, 
     "1854-09-01": 6961.0, 
     "1854-10-01": 6952.0, 
     "1854-11-01": 6952.0 
    } 
    } 
] 
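If the output should match the question's example more closely (same key order, integer tide values), one option is to build the records in plain Python while iterating over the groups. This is only a sketch: it assumes every tide value is a whole number with no missing values (int() would fail on NaN), and the names csv_text, keys and records are illustrative.

```python
import io
import json

import pandas as pd

# Small subset of the question's sample data.
csv_text = """ID,Location,Country,Latitude,Longitude,timestamp,tide
1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0
1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0
7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["ID", "Location", "Country", "Latitude", "Longitude"]

records = []
# Grouping by a list of columns yields (key tuple, sub-DataFrame) pairs.
for (station_id, location, country, lat, lon), group in df.groupby(keys):
    records.append({
        "ID": int(station_id),        # plain int so json can serialize it
        "Location": location,
        "Latitude": float(lat),
        "Longitude": float(lon),
        "Country": country,
        # timestamp -> tide mapping; assumes whole-number tides, no NaN
        "Tide-Data": {
            ts: int(tide)
            for ts, tide in zip(group["timestamp"], group["tide"])
        },
    })

print(json.dumps(records, indent=2))
```

Because json.dumps keeps insertion order, the keys come out in the order written in the dict literal.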

PS: if you don't care about indentation, you can write directly to a JSON file:

(df.groupby(['ID','Location','Country','Latitude','Longitude'], as_index=False) 
    .apply(lambda x: dict(zip(x.timestamp,x.tide))) 
    .reset_index() 
    .rename(columns={0:'Tide-Data'}) 
    .to_json('/path/to/file_name.json', orient='records')) 
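If indentation does matter, a possible variant is to pass indent to to_json(). This sketch assumes pandas 1.0 or later, where to_json() accepts an indent argument. The output path is a hypothetical temp-file location chosen for illustration, and reset_index(name=...) is used here instead of the rename step above.

```python
import io
import json
import os
import tempfile

import pandas as pd

# Small subset of the question's sample data.
csv_text = """ID,Location,Country,Latitude,Longitude,timestamp,tide
1,BREST,FRA,48.383,-4.495,1807-01-01,6905.0
1,BREST,FRA,48.383,-4.495,1807-02-01,6931.0
7,CUXHAVEN 2,DEU,53.867,8.717,1843-01-01,7093.0
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["ID", "Location", "Country", "Latitude", "Longitude"]

# Hypothetical output location, used only for this example.
out_path = os.path.join(tempfile.gettempdir(), "tide_data_example.json")

(df.groupby(keys)[["timestamp", "tide"]]
   # one {timestamp: tide} dict per station
   .apply(lambda g: dict(zip(g["timestamp"], g["tide"])))
   .reset_index(name="Tide-Data")
   # indent=2 pretty-prints the file (assumes pandas >= 1.0)
   .to_json(out_path, orient="records", indent=2))

# Read the file back to check that it is valid JSON.
with open(out_path, encoding="utf-8") as fh:
    loaded = json.load(fh)
print(loaded[0]["Location"])
```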
+1

Wow! You are awesome. It works perfectly! – Felix

+0

@Felix, glad I could help :) – MaxU

+0

What if I want the data in this form: "Tide-Data": {"timestamp": "1848-07-01", "tide": "6957.0"}? What would I have to change in your function? – Felix
