2017-03-07

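Syntax error defining the schema of a Spark SQL DataFrame

My pyspark console reports invalid syntax on the line that follows my for loop. The console does not run the for loop until it reaches the line schema = StructType(fields), where it raises a SyntaxError, but the for loop looks fine to me...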

from pyspark import SparkContext 
from pyspark.sql import SQLContext 
from pyspark.sql.types import * 
sqlContext = SQLContext(sc) 

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv') 
parts = lines.map(lambda l: l.split(',')) 
surveys_responses = parts.map(lambda p: (p[0:33])) 
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score' 
fields = [] 
for field_name in schemaString.split(", "): 
    if field_name != ("HCAHPS Base Score" | "HCAHPS Consistency Score"): 
     fields.append(StructField(field_name, StringType(), True)) 
    else: 
     fields.append(StructField(field_name, IntegerType(), True)) 
schema = StructType(fields) 

Why use field_name != ("HCAHPS Base Score" | "HCAHPS Consistency Score")? Use field_name not in ("HCAHPS Base Score", "HCAHPS Consistency Score") instead.

Answer


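| does not work that way: it is the bitwise OR operator, which is not defined for two strings, so the != condition is wrong. Use not in with a tuple instead: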

from pyspark import SparkContext 
from pyspark.sql import SQLContext 
from pyspark.sql.types import * 
sqlContext = SQLContext(sc) 

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv') 
parts = lines.map(lambda l: l.split(',')) 
surveys_responses = parts.map(lambda p: (p[0:33])) 
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score' 
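fields = []
for field_name in schemaString.split(", "):
    # The two HCAHPS score columns are integers; every other column is a string.
    if field_name not in ("HCAHPS Base Score", "HCAHPS Consistency Score"):
        fields.append(StructField(field_name, StringType(), True))
    else:
        fields.append(StructField(field_name, IntegerType(), True))

# In the interactive console, the blank line above ends the for block.
schema = StructType(fields)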
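
To see the difference between the two conditions without a Spark session, here is a minimal pure-Python sketch. The names column_names, integer_columns, column_types and or_error are hypothetical stand-ins introduced only for this illustration, and the short column list is a sample rather than the full schema.

```python
# Pure-Python sketch; no Spark needed. `column_names` and `integer_columns`
# are hypothetical stand-ins for the schema names used in the answer.
column_names = ["Provider Number", "HCAHPS Base Score", "HCAHPS Consistency Score"]
integer_columns = ("HCAHPS Base Score", "HCAHPS Consistency Score")

# `|` is bitwise OR; it is not defined for two strings and fails at run time.
try:
    "HCAHPS Base Score" | "HCAHPS Consistency Score"
    or_error = None
except TypeError as exc:
    or_error = exc

# `not in` tests membership against every name in the tuple.
column_types = {}
for name in column_names:
    if name not in integer_columns:
        column_types[name] = "string"
    else:
        column_types[name] = "integer"

print(type(or_error).__name__)
print(column_types)
```

Note that the | expression fails with a TypeError when it runs, not with a SyntaxError. The SyntaxError in the question may therefore have a second cause: the interactive console needs an empty line to close an indented block, so a statement such as schema = StructType(fields) pasted directly after the loop body is rejected as invalid syntax.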