How to find the nearest neighbor in Hive? Is there a window function for it?

Given the following table:

$cat data.csv 

ID,State,City,Price,Flag 
1,CA,A,95,0 
2,CA,A,96,1 
3,CA,A,195,1 
4,NY,B,124,0 
5,NY,B,128,1 
6,NY,C,24,0 
7,NY,C,27,1 
8,NY,C,29,0 
9,NY,C,39,1

For each ID above with Flag=0, we want to find another ID with Flag=1 that has the same "State" and "City" and the nearest Price.

The expected result:

ID0,ID1
1,2
4,5
6,7
8,7

I have two rough, naive ideas:

Method 1.

Use a left outer join with the table itself on 
    (a.State=b.State and a.City=b.city and a.Flag=0 and b.Flag=1), 
    where a.Flag=0 and b.Flag=1, 

    and then use RANK() over (partitioned by a.State,a.City order by a.Price - b.Price) as rank 
    where rank=1

Method 2.

Use a left outer join with the table itself, 
on 
(a.State=b.State and a.City=b.city and a.Flag=0 and b.Flag=1), 
where a.Flag=0 and b.Flag=1, 

and then Use Distribute by a.State,a.City Sort by Price_Diff ASC limit 1

What is the best way to find the nearest neighbor in Hive? Any advice would be greatly appreciated!

Source

2016-09-13 user3007888

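-- Every (Flag=0, Flag=1) pair in the same state and city, sorted by price gap.
-- The table has a State column (no country column), so the join uses state.
select a.id, b.id, min(abs(b.price-a.price)) as delta 
from data as a 
    inner join data as b 
      on a.state=b.state and 
       a.flag=0 and b.flag=1 and 
       a.city=b.city 
group by a.id, b.id 
order by delta asc;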

A second attempt ranks the same pairs with rank() as a window function:

-- Same pairs, ranked by price gap within each (state, city) partition.
select a.id as id0, b.id as id1, abs(b.price-a.price) as delta, 
     rank() over (partition by a.state, a.city order by abs(b.price-a.price)) 
from data as a 
     inner join data as b 
      on a.state=b.state and 
      a.flag=0 and b.flag=1 and 
      a.city=b.city;

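The first query (group by, ordered by delta) returns the rows below. The last three rows reuse IDs that already appear in the first four: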

1 2 1 <--- 
8 7 2 <--- 
6 7 3 <--- 
4 5 4 <--- 
8 9 10 
6 9 15 
1 3 100

The query with rank() returns the following, and this is where the problem shows up:

id0 id1 prc rank 
    1 2 1 1 <--- 
    1 3 100 2 
    4 5 4 1 <--- 
    8 7 2 1 <--- 
    6 7 3 2 
    8 9 10 3 
    6 9 15 4

The pair 6,7 is missing from the rank-1 rows, yet the result is correct for this query. Consider the NY,C rows:

6,NY,C,24,0 
7,NY,C,27,1 
8,NY,C,29,0 
9,NY,C,39,1

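Among (6,7), (6,9), (8,7), and (8,9), the smallest price difference belongs to (8,7). Because the window is partitioned by state and city rather than by the Flag=0 ID, IDs 6 and 8 share one partition, so only (8,7) gets rank 1. (The join is ambiguous.)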
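
One possible way around the ambiguity, as a minimal sketch rather than a definitive answer: restart the ranking for every Flag=0 row by partitioning on a.id instead of on state and city. This assumes the table is named data with columns id, state, city, price, and flag, as in data.csv. The aliases rn and ranked and the tie-break on b.id are illustrative choices, not part of the original question.

```sql
-- Sketch: closest Flag=1 row for each Flag=0 row within the same state and city.
-- Assumes table `data` with columns id, state, city, price, flag.
select id0, id1, delta
from (
    select a.id as id0,
           b.id as id1,
           abs(b.price - a.price) as delta,
           -- Number the candidates separately for each Flag=0 id,
           -- closest price first; b.id breaks ties deterministically.
           row_number() over (partition by a.id
                              order by abs(b.price - a.price), b.id) as rn
    from data a
    join data b
      on a.state = b.state
     and a.city = b.city
    where a.flag = 0
      and b.flag = 1
) ranked
where rn = 1
order by id0;
```

On the sample data this should give the pairs (1,2), (4,5), (6,7), and (8,7), which matches the expected result in the question. Using rank() in place of row_number() would keep every equally close neighbor instead of a single one. The inner join drops Flag=0 rows that have no Flag=1 row in the same state and city; keeping them would need a left outer join with the b.flag = 1 condition moved into the on clause.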

You may like this video on the topic: Big Data Analytics Using Window Functions

Source

2016-09-13 20:41:43 ozw1z5rd
