apache-pig - How to union (concatenate) two bags in Pig Latin

I have two datasets:

A = {uid, url}; B = {uid, url};

Now I do a COGROUP:

C = COGROUP A BY uid, B BY uid;

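I would like to turn C into {group AS uid, DISTINCT A.url B.url};

My question is: how do I concatenate the two bags A.url and B.url?

Or, put another way, how do I perform DISTINCT across multiple columns?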

2 answers:

Answer 0 (score: 0)

This may not be what you are expecting, but here is what I understood from your question:

C = JOIN A BY uid, B BY uid;
D = DISTINCT C;

The concatenation is then done as follows:

E = FOREACH D GENERATE CONCAT(A::uid,B::uid); 
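
Note that CONCAT joins two strings, not two bags. If the COGROUP from the question is kept instead of a JOIN, each bag can be deduplicated inside a nested FOREACH. The following is only a sketch, assuming both relations have the fields (uid, url); the aliases a, b, a_urls and b_urls are made up for illustration. It yields two separate distinct bags per uid rather than one merged bag.

```pig
-- Assumes A and B are already defined with fields (uid, url)
C = COGROUP A BY uid, B BY uid;
D = FOREACH C {
    -- Project the url column out of each grouped bag
    a = A.url;
    b = B.url;
    -- Deduplicate each projected bag independently
    a_urls = DISTINCT a;
    b_urls = DISTINCT b;
    GENERATE group AS uid, a_urls, b_urls;
};
DUMP D;
```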

Answer 1 (score: 0)

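-- Load both datasets with the same (uid, url) fields
A = LOAD 'A' USING PigStorage() AS (uid, url);
B = LOAD 'B' USING PigStorage() AS (uid, url);
-- Inner join on uid: one row per matching (A, B) pair
C = JOIN A BY uid, B BY uid;
-- Concatenate the two url strings of each joined row
D = FOREACH C GENERATE CONCAT(A::url, B::url);
-- Drop duplicate rows
E = DISTINCT D;
DUMP E;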
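
To get what the question describes, a single bag of distinct URLs per uid, one option is to merge the two relations before grouping rather than after. A minimal sketch, assuming A and B share the schema (uid, url) and are loaded from tab-delimited files named 'A' and 'B' as in the answer above; the chararray types and the aliases AB, U, G and R are illustrative assumptions:

```pig
-- Declaring types is an assumption; it keeps the two schemas identical for UNION
A = LOAD 'A' USING PigStorage() AS (uid:chararray, url:chararray);
B = LOAD 'B' USING PigStorage() AS (uid:chararray, url:chararray);
-- Stack the rows of both relations into one relation
AB = UNION A, B;
-- Remove duplicate (uid, url) pairs across both sources
U = DISTINCT AB;
-- Collect the remaining urls into one bag per uid
G = GROUP U BY uid;
R = FOREACH G GENERATE group AS uid, U.url AS urls;
DUMP R;
```

Unlike the JOIN, this also keeps uids that appear in only one of the two datasets.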