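FasterCSV: trying to find unique items
I have a CSV file. For rows that share the same value in column 1, I'm trying to find all the unique values in the remaining columns and aggregate them into a single row of a new CSV file. I know that sounds confusing, so here is an example:
Sample of the original file, foo.csv: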
"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"
"Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"
Ideal outcome, bar.csv:
"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity","Up & Over Height","Platform Capacity",,,
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"
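The rows are of varying lengths and it's a fairly big file (5k+ lines), and I just can't get my head around how to do the matching/string manipulation. And yes, some of those rows have trailing commas where there are empty cells. I'm using FasterCSV, so if there's a way to do this with it, that would be great.
Any pointers? Preferably something that won't bring my MBP to its knees?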
a = [
["Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"],
["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"],
["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"],
["Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"],
["Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"],
["Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"]
]
# Group rows by their first column, then flatten each [key, rows] pair and drop duplicates.
a.group_by {|e| e[0]}.map {|e| e.flatten.uniq}
gives you:
[
["Boom Lifts", "Model Number", "Manufacturer", "Platform Height", "Horizontal Outreach", "Lift Capacity", "Up & Over Height", "Platform Capacity"],
["Pusharound Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height"],
["Scissor Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height", "Overall Dimensions", "Platform Extension", "Platform Size", "Lift Capacity"]
]
It's not instantaneous, but it shouldn't take down your MBP, assuming you can get the file into a 2D array with FasterCSV.
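For the file-to-file part, here is a rough sketch of how the same grouping could be wired up to read foo.csv and write bar.csv. It assumes Ruby 1.9 or later, where FasterCSV is available as the standard csv library; on 1.8 with the fastercsv gem the calls should be similar, but I haven't checked that. The names merge_rows and merge_csv are just made up for this example, and dropping empty cells and quoting every field are my assumptions about what you want in the output.

```ruby
require "csv"

# Merge rows that share a first column, keeping each later value once.
def merge_rows(rows)
  rows.group_by { |row| row[0] }.map do |key, group|
    # Trailing commas parse as nil (or "" when quoted), so drop those cells.
    values = group.flat_map { |row| row.drop(1) }
                  .reject { |value| value.nil? || value.empty? }
    ([key] + values).uniq
  end
end

# Read input_path, merge the rows, and write output_path with every field quoted.
def merge_csv(input_path, output_path)
  rows = CSV.read(input_path, skip_blanks: true)
  CSV.open(output_path, "w", force_quotes: true) do |csv|
    merge_rows(rows).each { |row| csv << row }
  end
end
```

Usage would be something like merge_csv("foo.csv", "bar.csv"). This loads the whole file into memory, which should be fine at around 5k rows.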
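So a) the first column can be treated as a key, and b) all subsequent columns can be treated as values in a list. And lastly, you want this list to contain only unique values...? The last row of your bar.csv repeats "Overall Dimension" and "Platform Extensions". Are repeated values OK? – buruzaemon
My bad, Overall Dimensions and Platform Extension shouldn't be repeated. I can use FasterCSV to read in one file, foo.csv, and spit out another, bar.csv. Thanks. – MarkL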
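The key-plus-unique-list reading in the comments above can also be written as a single pass over the rows, accumulating into a hash. This is only a sketch under that reading; unique_values_by_key is a hypothetical name, and skipping rows without a key and dropping empty cells are assumptions on my part.

```ruby
# Accumulate the unique values seen for each first-column key, in one pass.
def unique_values_by_key(rows)
  merged = Hash.new { |hash, key| hash[key] = [] }
  rows.each do |row|
    key, *values = row
    next if key.nil? # skip blank rows
    # Array#| is a set union that keeps first-seen order.
    merged[key] |= values.compact.reject(&:empty?)
  end
  merged.map { |key, values| [key, *values] }
end
```

Because it only needs each row once, rows can be any enumerable of arrays, so it should not require holding the whole parsed file as a 2D array first.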