2011-12-06 5 views

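FasterCSV + trying to find unique items

I have a CSV file in which I am trying to find all the unique values from column 2 onwards for rows that share the same column 1 value, and aggregate those values into a new CSV file. I know that sounds confusing, so here is an example: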

Sample of the original file, foo.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity" 
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity" 
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height" 
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height" 
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension" 
"Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity" 

Ideal outcome, bar.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity","Up & Over Height","Platform Capacity",,, 
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height" 
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity" 

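The rows vary in length and the file is fairly large (5k+ rows), and I cannot get my head around how to do the matching/string manipulation. And yes, some of those rows have trailing commas where there are empty cells. I am using FasterCSV, so if there is a way to do this with it, that would be great.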

Any pointers? Preferably something that will not bring my MBP to its knees.

a = [
    ["Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"],
    ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"],
    ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"],
    ["Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"],
    ["Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"],
    ["Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"]
]

a.group_by {|e| e[0]}.map {|e| e.flatten.uniq} 

gives you:

[
    ["Boom Lifts", "Model Number", "Manufacturer", "Platform Height", "Horizontal Outreach", "Lift Capacity", "Up & Over Height", "Platform Capacity"],
    ["Pusharound Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height"],
    ["Scissor Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height", "Overall Dimensions", "Platform Extension", "Platform Size", "Lift Capacity"]
]

It is not instantaneous, but it should not bring your MBP down, assuming you can get the file into a 2D array with FasterCSV.
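
To go from foo.csv to bar.csv, the same group_by step can be wrapped in a read and a write. What follows is a minimal sketch, not something tested against the real file. It assumes Ruby 1.9 or later, where FasterCSV became the standard csv library; with the FasterCSV gem on Ruby 1.8, FasterCSV.read and FasterCSV.open would take the place of the CSV calls. The method names merge_rows and aggregate_csv are made up for illustration, and dropping empty cells before writing is an assumption about how the trailing commas should be handled.

```ruby
require 'csv'

# Hypothetical helper: merges rows that share the same first column,
# keeping each remaining value once, in first-seen order.
def merge_rows(rows)
  rows
    .reject { |row| row.compact.empty? }   # skip blank lines
    .group_by { |row| row[0] }             # first column is the key
    .map do |key, group|
      values = group.flat_map { |row| row[1..-1] }
      # Assumption: empty cells (from trailing commas) are dropped.
      values = values.reject { |v| v.nil? || v.strip.empty? }
      [key] + values.uniq
    end
end

# Hypothetical wrapper: reads in_path and writes the merged rows to out_path.
def aggregate_csv(in_path, out_path)
  rows = CSV.read(in_path)
  # force_quotes keeps every field quoted, as in the sample files.
  CSV.open(out_path, 'w', force_quotes: true) do |out|
    merge_rows(rows).each { |row| out << row }
  end
end
```

Calling aggregate_csv('foo.csv', 'bar.csv') would then write one merged row per distinct first-column value.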


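So a) the first column can be treated as a key, and b) all subsequent columns can be treated as values in a list. And in the end you want that list to contain only unique values...? The last row of bar.csv repeats "Overall Dimension" and "Platform Extensions". Are repeated values OK? – buruzaemon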


My bad, Overall Dimensions and Platform Extension should not be repeated. I can use FasterCSV to read in one file, foo.csv, and spit out another, bar.csv. Thanks. – MarkL
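
The key-and-list reading of the problem in the comments above also suggests a row-at-a-time variant that never holds the whole file as a 2D array. This is a rough sketch of that alternative, not the approach from the answer. It assumes the csv library from Ruby 1.9 or later (FasterCSV.foreach would be the Ruby 1.8 counterpart), the name unique_values_by_key is hypothetical, and ignoring empty cells is an assumption.

```ruby
require 'csv'

# Hypothetical helper: builds { first column => unique later values }
# one row at a time. Ruby 1.9+ hashes keep insertion order, so keys
# come out in the order they first appear in the file.
def unique_values_by_key(path)
  merged = Hash.new { |hash, key| hash[key] = [] }
  CSV.foreach(path) do |row|
    key, *values = row
    next if key.nil?            # assumption: skip blank lines and keyless rows
    list = merged[key]          # registers the key even if it has no values
    values.each do |value|
      next if value.nil? || value.strip.empty?   # assumption: ignore empty cells
      list << value unless list.include?(value)
    end
  end
  merged
end
```

Each entry can then be written out as [key] + values. The include? check is linear in the list length, which seems acceptable here because each key only has a handful of distinct values.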
