2017-07-11 1 views
1

U-SQLのテキストファイルからレコードを抽出する必要があります。最初の行は他の行と異なり、日付を含みます。私は最初の行をスキップする必要がありますが、私は最初の行から日付をコピーし、すべての行の新しい列に貼り付ける必要があります。最終的なu-sql出力クエリでは、このように、各行の最初の列には、ファイルの最初の行からコピーされた同じデータが格納されます。詳細については添付の画像ファイルをご覧ください。U-SQLで特定の行の値を持つ新しい列を追加

enter image description here

私にこのタスクを完了するために正しいU-SQLクエリを提案してください。

答えて

3

別の方法があります。私はサンプルでSearchLog.tsvを使ってこれを実証しました。ファイルの先頭に、FROM 01JAN17 TO 31JAN17という行を追加しました。

//Skip the first row and read all the other rows 
@searchlog = 
    EXTRACT UserId   int, 
      Start   DateTime, 
      Region   string, 
      Query   string, 
      Duration  int, 
      Urls   string, 
      ClickedUrls  string 
    FROM @"/Samples/Data/SearchLogWithHeader.tsv" 
    USING Extractors.Tsv(skipFirstNRows: 1); 

//Extract all the text in the same file but don't parse out the individual columns 
@searchlogAllText = 
    EXTRACT rowText string 
    FROM @"/Samples/Data/SearchLogWithHeader.tsv" 
    USING Extractors.Text(delimiter: '\n'); 

//Find a pattern that works for you and use the .NET expressions that match the string 
@searchlogHeaderDate = 
    SELECT rowText.Split(' ')[1] AS FromDate FROM @searchlogAllText WHERE rowText.StartsWith("FROM");          

@output = SELECT * FROM @searchlogHeaderDate CROSS JOIN @searchlog; 

OUTPUT @output 
    TO @"/Samples/Output/SearchLog_output.tsv" 
    USING Outputters.Tsv(); 

入力:

FROM 01JAN17 TO 31JAN17 
399266 2/15/2012 11:53:16 AM en-us how to make nachos 73 www.nachos.com;www.wikipedia.com NULL 
382045 2/15/2012 11:53:18 AM en-gb best ski resorts 614 skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm ski-europe.com;www.travelersdigest.com/ski_resorts.htm 
382045 2/16/2012 11:53:20 AM en-gb broken leg 74 mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture 
106479 2/16/2012 11:53:50 AM en-ca south park episodes 24 southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall southparkstudios.com 
906441 2/16/2012 11:54:01 AM en-us cosmos 1213 cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos NULL 
351530 2/16/2012 11:54:01 AM en-fr microsoft 241 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 
640806 2/16/2012 11:54:02 AM en-us wireless headphones 502 www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com www.amazon.com;store.apple.com 
304305 2/16/2012 11:54:03 AM en-us dominos pizza 60 dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos dominos.com 
460748 2/16/2012 11:54:04 AM en-us yelp 1270 yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp yelp.com 
354841 2/16/2012 11:59:01 AM en-us how to run 610 running.about.com;ehow.com;go.com running.about.com;ehow.com 
354068 2/16/2012 12:00:33 PM en-mx what is sql 422 wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL wikipedia.org/wiki/SQL 
674364 2/16/2012 12:00:55 PM en-us mexican food redmond 283 eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com NULL 
347413 2/16/2012 12:11:55 PM en-gr microsoft 305 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 
848434 2/16/2012 12:12:35 PM en-ch facebook 10 facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook facebook.com 
604846 2/16/2012 12:13:55 PM en-us wikipedia 612 wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia wikipedia.org 
840614 2/16/2012 12:13:56 PM en-us xbox 1220 xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360 xbox.com/xbox360 
656666 2/16/2012 12:15:55 PM en-us hotmail 691 hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail NULL 
951513 2/16/2012 12:17:00 PM en-us pokemon 63 pokemon.com;pokemon.com/us;serebii.net pokemon.com 
350350 2/16/2012 12:18:17 PM en-us wolfram 30 wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram NULL 
641615 2/16/2012 12:19:55 PM en-us kahn 119 khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name) khanacademy.org 
65 2/16/2012 12:20:03 PM en-us clothes 732 gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars footballfanatics.com/college_washington_state_cougars 
651777 2/16/2012 12:20:33 PM en-us food recipes 183 allrecipes.com;foodnetwork.com;simplyrecipes.com foodnetwork.com 
666352 2/16/2012 12:21:03 PM en-us weight loss 630 en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com webmd.com/diet 

出力:

"01JAN17" 399266 2012-02-15T11:53:16.0000000 "en-us" "how to make nachos" 73 "www.nachos.com;www.wikipedia.com" "NULL" 
"01JAN17" 382045 2012-02-15T11:53:18.0000000 "en-gb" "best ski resorts" 614 "skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm" "ski-europe.com;www.travelersdigest.com/ski_resorts.htm" 
"01JAN17" 382045 2012-02-16T11:53:20.0000000 "en-gb" "broken leg" 74 "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture" "mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture" 
"01JAN17" 106479 2012-02-16T11:53:50.0000000 "en-ca" "south park episodes" 24 "southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall" "southparkstudios.com" 
"01JAN17" 906441 2012-02-16T11:54:01.0000000 "en-us" "cosmos" 1213 "cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos" "NULL" 
"01JAN17" 351530 2012-02-16T11:54:01.0000000 "en-fr" "microsoft" 241 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com" "NULL" 
"01JAN17" 640806 2012-02-16T11:54:02.0000000 "en-us" "wireless headphones" 502 "www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com" "www.amazon.com;store.apple.com" 
"01JAN17" 304305 2012-02-16T11:54:03.0000000 "en-us" "dominos pizza" 60 "dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos" "dominos.com" 
"01JAN17" 460748 2012-02-16T11:54:04.0000000 "en-us" "yelp" 1270 "yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp" "yelp.com" 
"01JAN17" 354841 2012-02-16T11:59:01.0000000 "en-us" "how to run" 610 "running.about.com;ehow.com;go.com" "running.about.com;ehow.com" 
"01JAN17" 354068 2012-02-16T12:00:33.0000000 "en-mx" "what is sql" 422 "wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL" "wikipedia.org/wiki/SQL" 
"01JAN17" 674364 2012-02-16T12:00:55.0000000 "en-us" "mexican food redmond" 283 "eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com" "NULL" 
"01JAN17" 347413 2012-02-16T12:11:55.0000000 "en-gr" "microsoft" 305 "microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com" "NULL" 
"01JAN17" 848434 2012-02-16T12:12:35.0000000 "en-ch" "facebook" 10 "facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook" "facebook.com" 
"01JAN17" 604846 2012-02-16T12:13:55.0000000 "en-us" "wikipedia" 612 "wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia" "wikipedia.org" 
"01JAN17" 840614 2012-02-16T12:13:56.0000000 "en-us" "xbox" 1220 "xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360" "xbox.com/xbox360" 
"01JAN17" 656666 2012-02-16T12:15:55.0000000 "en-us" "hotmail" 691 "hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail" "NULL" 
"01JAN17" 951513 2012-02-16T12:17:00.0000000 "en-us" "pokemon" 63 "pokemon.com;pokemon.com/us;serebii.net" "pokemon.com" 
"01JAN17" 350350 2012-02-16T12:18:17.0000000 "en-us" "wolfram" 30 "wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram" "NULL" 
"01JAN17" 641615 2012-02-16T12:19:55.0000000 "en-us" "kahn" 119 "khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name)" "khanacademy.org" 
"01JAN17"65 2012-02-16T12:20:03.0000000 "en-us" "clothes" 732 "gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars" "footballfanatics.com/college_washington_state_cougars" 
"01JAN17" 651777 2012-02-16T12:20:33.0000000 "en-us" "food recipes" 183 "allrecipes.com;foodnetwork.com;simplyrecipes.com" "foodnetwork.com" 
"01JAN17" 666352 2012-02-16T12:21:03.0000000 "en-us" "weight loss" 630 "en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com" "webmd.com/diet" 
2

私は通常のU-SQL(すなわちないカスタムUDO)と、このような何かを行うことができたし、このsample file

USING rx = System.Text.RegularExpressions.Regex; 

DECLARE @inputFilepath string = "input/input71.txt"; 
DECLARE @outputFilepath string = "output/output71.csv"; 


// Get the first line; use silent option to skip all other lines 
// ie which have more than one column 
@file = 
    EXTRACT headerLine string 
    FROM @inputFilepath 
    USING Extractors.Text(delimiter : '|', silent : true); 


// Get the start date from the header 
@header = 
    SELECT 
     headerLine, 
     rx.Match(headerLine, @"FROM (?<startDate>\d{2}[A-Z]{3}\d{2}) TO (?<endDate>\d{2}[A-Z]{3}\d{2})").Groups["startDate"].ToString() AS startDate 
    FROM @file 
    WHERE headerLine.Contains("FROM"); 



// Get the rest of the lines; skip the header row explicity 
// don't use 'silent' as it should not be required (as we're skipping header row) 
@body = 
    EXTRACT runDate string, 
      col1 int, 
      col2 int, 
      col3 int 
    FROM @inputFilepath 
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1); 


@result = 
    SELECT h.startDate, p.* 
    FROM @header AS h 
     CROSS JOIN 
      @body AS p; 


// Export as csv 
OUTPUT @result 
TO @outputFilepath 
USING Outputters.Csv(quoting:false); 

マイ結果:

Results

これでした単純な例でU-SQLの機能を実証し、RegExとset操作を組み合わせることができます。同様のことがあなたのために働くかどうかを見てください。

関連する問題