How to remove the "," (comma) that I get before a column value after joining data from 2 files
I am trying to run a Pig script that picks data from 2 separate strings (string1, string2), uses TOKENIZE to split them into words, and later joins the results.
Input:
string1 = "cow moon";
string2 = "cow over moon";
Expected Output:
cow, doc1
cow, doc2
over, doc1
moon, doc1
moon, doc1
Script:
tokenised_words1 = FOREACH stopWord1 GENERATE TOKENIZE(string1);
data1 = FOREACH tokenised_words1 GENERATE FLATTEN($0);
tokenised_words2 = FOREACH stopWord2 GENERATE TOKENIZE(string2);
data2 = FOREACH tokenised_words2 GENERATE FLATTEN($0);
corres_word_presentin_doc1 = FOREACH data1 GENERATE $0, 'doc1';
corres_word_presentin_doc2 = FOREACH data2 GENERATE $0,' doc2';
output_data = JOIN corres_word_presentin_doc1 BY ($0,$1) FULL OUTER,corres_word_presentin_doc2 BY ($0,$1);
dump output_data;
After that, I get the output below.
Current Output:
(cow,string1,,)
(,,cow, string2)
(,,moon, string1)
(moon,string1,,)
(,,over, string2)
Why am I getting extra commas in those values?
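A likely reading of the output above: a JOIN emits all fields of both relations, so each result tuple has four fields (two from each side). With FULL OUTER, a row that has no match on the other side gets nulls for that side's fields, and DUMP prints nulls as empty strings, which shows up as the extra commas. Because the join key here includes the document label ($1), and the two labels differ, no rows can match at all. If the goal is simply one combined list of (word, doc) pairs, a UNION may be closer to what is wanted. The following is a minimal sketch under that assumption; the file names doc1.txt and doc2.txt, the LOAD statements, and the aliases words1, words2, word and doc are hypothetical and not part of the original script.

```pig
-- Hypothetical inputs: doc1.txt holds the line "cow moon",
-- doc2.txt holds the line "cow over moon".
stopWord1 = LOAD 'doc1.txt' USING TextLoader() AS (string1:chararray);
stopWord2 = LOAD 'doc2.txt' USING TextLoader() AS (string2:chararray);

-- Split each line into words and tag every word with its document label.
words1 = FOREACH stopWord1 GENERATE FLATTEN(TOKENIZE(string1)) AS word, 'doc1' AS doc;
words2 = FOREACH stopWord2 GENERATE FLATTEN(TOKENIZE(string2)) AS word, 'doc2' AS doc;

-- UNION stacks the rows of both relations, so every tuple keeps
-- exactly two fields and no null padding is introduced.
output_data = UNION words1, words2;
DUMP output_data;
```

UNION does not guarantee row order, so an ORDER BY on word can be added if the output needs to be sorted.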
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
