How to remove the "," commas I am getting before a column value after joining data from 2 files

I am trying to run a Pig script that takes data from two separate strings (string1, string2), uses TOKENIZE to split each one into words, and then joins the results to record which document each word appears in.

Input:

string1 = "cow moon";
string2 = "cow over moon";

Expected Output:

cow, doc1
cow, doc2
over, doc2
moon, doc1
moon, doc2

Script:

tokenised_words1 = FOREACH stopWord1 GENERATE TOKENIZE(string1);

data1 = FOREACH tokenised_words1  GENERATE FLATTEN($0);

tokenised_words2 = FOREACH stopWord2 GENERATE TOKENIZE(string2);

data2 = FOREACH tokenised_words2  GENERATE FLATTEN($0);

corres_word_presentin_doc1 = FOREACH data1 GENERATE $0, 'doc1';

corres_word_presentin_doc2 = FOREACH data2 GENERATE $0,' doc2';

output_data = JOIN corres_word_presentin_doc1 BY ($0,$1) FULL OUTER, corres_word_presentin_doc2 BY ($0,$1);

dump output_data;

After running it, I get the following output:

Current Output:

(cow,string1,,)
(,,cow, string2)
(,,moon, string1)
(moon,string1,,)
(,,over, string2)

Why am I getting these extra commas in the values?
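The extra commas come from the FULL OUTER JOIN itself. A join output tuple always carries the fields of both inputs, and the join key here is ($0,$1), i.e. the word plus its document label. Because the labels on the two sides never match ('doc1' vs ' doc2'), no tuple finds a partner, so every row is emitted with the other relation's two fields set to null, and dump prints a null field as nothing between the commas. Since the expected output is just the word/document pairs from both inputs listed together, a UNION rather than a join should produce it. A minimal sketch, assuming the aliases from the script above (all_words and sorted_words are hypothetical names):

```pig
-- Sketch, assuming corres_word_presentin_doc1 and corres_word_presentin_doc2
-- are defined as in the script above.
-- UNION appends the tuples of both relations instead of matching them,
-- so every output tuple keeps exactly two fields: (word, doc label).
all_words = UNION corres_word_presentin_doc1, corres_word_presentin_doc2;

-- Optional: sort by word so occurrences of the same word sit together.
sorted_words = ORDER all_words BY $0;

dump sorted_words;
```

Note that the literal ' doc2' in the script has a leading space, which is why the second relation's label is printed with a space before it; writing it as 'doc2' avoids that.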



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
