Quickly creating files from individual columns

I have data that looks like this:

-1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597 
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824 
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529 
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166 
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401 
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566

The first column is the class, and the next 6 columns are features. I want to create 6 files, one for each feature. For example:

my_input_feat1.txt would contain

-1 1:-0.394668
1 1:-0.463641
...
1 1:-0.757233
1 1:-0.167147

my_input_feat2.txt would contain

-1 2:-0.794872 
... 
1 2:-0.589744

and so on. I have Perl code that does this, but it is terribly slow. Is there a faster way to do it? The input file typically contains 100K lines.

use strict; 
use Data::Dumper; 
use Carp; 
my $input = $ARGV[0] || "myinput.txt"; 




my $INFILE_file_name = $input;  # input file name 

open (INFILE, '<', $INFILE_file_name) 
    or croak "$0 : failed to open input file $INFILE_file_name : $!\n"; 

    my $out1 = $input."_feat_1.txt"; 
    my $out2 = $input."_feat_2.txt"; 
    my $out3 = $input."_feat_3.txt"; 
    my $out4 = $input."_feat_4.txt"; 
    my $out5 = $input."_feat_5.txt"; 
    my $out6 = $input."_feat_6.txt"; 

    unlink($out1); 
    unlink($out2); 
    unlink($out3); 
    unlink($out4); 
    unlink($out5); 
    unlink($out6); 

    print "$out1\n"; 

while (<INFILE>) { 
    chomp; 
    my @els = split(/\s+/,$_); 
    my $lbl = $els[0]; 

    my $OUTFILE1_file_name = $out1;  # output file name 
    open (OUTFILE1, '>>', $OUTFILE1_file_name) 
     or croak "$0 : failed to open output file $OUTFILE1_file_name : $!\n"; 
    print OUTFILE1 "$lbl $els[1]\n"; 
    close (OUTFILE1);   # close output file 

    my $OUTFILE2_file_name = $out2;  # output file name 
    open (OUTFILE2, '>>', $OUTFILE2_file_name) 
     or croak "$0 : failed to open output file $OUTFILE2_file_name : $!\n"; 
    print OUTFILE2 "$lbl $els[2]\n"; 
    close (OUTFILE2);   # close output file 

    # Etc.. until OUTFILE 6 

} 

close (INFILE);

Source

2011-01-11 neversaint

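#!/usr/bin/sh 

# Usage: script NUM_FEATURES INPUT_FILE
# Field 1 is the class; feature k is field k+1. Fields are space-separated,
# so cut needs -d' ' (its default delimiter is a tab).
for i in $(seq 1 "$1"); do 
    cut -d' ' -f1,$((i + 1)) "$2" > "${2}_$i"
done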

Or

#!/usr/bin/perl 

use warnings; use strict; 

my $input_file = $ARGV[0]; 
my %handles; 

while (<>) { 
    my ($class, @features) = split /\s+/; 

    for my $i (1 .. @features) { 
     open $handles{$i}, '>', $input_file . "_$i" or die $! 
     unless exists $handles{$i}; 

     print {$handles{$i}} join(' ', $class, $features[$i - 1]), "\n";  
    } 
} 

while (my (undef, $handle) = each %handles) { 
    close $handle or die $!; 
}

Source

2011-01-11 08:29:07

Is a shell script OK?

awk '{print $1" "$2}' data.txt > feat1_file.txt 
awk '{print $1" "$3}' data.txt > feat2_file.txt 
awk '{print $1" "$4}' data.txt > feat3_file.txt 
awk '{print $1" "$5}' data.txt > feat4_file.txt 
awk '{print $1" "$6}' data.txt > feat5_file.txt 
awk '{print $1" "$7}' data.txt > feat6_file.txt

Source

2011-01-11 07:52:58 eumiro

Thank you. But I would like the input name to be a variable, so that I can create these files dynamically for different input names. Note that the number of features can be larger or smaller than 6. – neversaint

@neversaint: It is trivial to abstract the file name above into a shell variable and put everything into a Bash script. Replace data.txt with, for example, ${FILENAME}, and set FILENAME before the calls to awk. – unwind
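unwind's suggestion, combined with the requirement that the number of features may vary, can be sketched as a single awk pass wrapped in a shell function. The function name `split_features` is hypothetical, and the `_feat_N.txt` output naming is borrowed from the Perl code in the question. The sketch assumes the class is always the first whitespace-separated field and every remaining field is a feature.

```shell
#!/bin/sh
# Hypothetical helper: split_features INPUT_FILE
# Writes INPUT_FILE_feat_1.txt, INPUT_FILE_feat_2.txt, ... in one pass
# over the input, however many feature columns there are.
split_features() {
    input=$1
    awk -v prefix="$input" '
        {
            # Field 1 is the class; fields 2..NF are the features.
            for (i = 2; i <= NF; i++) {
                out = prefix "_feat_" (i - 1) ".txt"
                # awk truncates each file on its first write and then
                # keeps it open, so nothing is reopened per line.
                print $1, $i > out
            }
        }
    ' "$input"
}
```

Calling `split_features myinput.txt` would then read the input once instead of once per feature. With very wide rows, the number of simultaneously open output files could become a limit in some awk implementations.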

You need to move the opening and closing of the output files outside the while loop.

Source

2011-01-11 08:48:28 Toto

+1 - this is probably one of the biggest sources of overhead in the OP's code – bdonlan
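The structure Toto describes (open each output once before the loop, write inside it, close once after it) can be illustrated with a small shell sketch. It is not a translation of the Perl fix: the function name `split_two` and the choice of file descriptors 3 and 4 are assumptions for illustration, only the first two features are handled for brevity, and a shell read loop is slow in its own right, so this shows the shape of the fix rather than the fastest solution.

```shell
#!/bin/sh
# Hypothetical helper: split_two INPUT_FILE
# Writes the class plus feature 1 and the class plus feature 2
# to two files that are opened exactly once.
split_two() {
    input=$1
    # Open both outputs once, before the loop (truncating old content).
    exec 3>"${input}_feat_1.txt" 4>"${input}_feat_2.txt"
    # read splits on whitespace; "rest" collects any remaining fields.
    while read -r lbl f1 f2 rest; do
        printf '%s %s\n' "$lbl" "$f1" >&3
        printf '%s %s\n' "$lbl" "$f2" >&4
    done < "$input"
    # Close both outputs once, after the loop.
    exec 3>&- 4>&-
}
```

The per-line work is now only the two writes. The sketch assumes the input ends with a newline, since `read` reports failure on a final unterminated line and the loop would skip it.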
