From an input file, obtain an output file by processing it with Spark RDDs in Scala

I have a data set, "measures.csv", which must be processed with Apache Spark (RDDs only, not DataFrames and not Datasets) to generate a text file called "estimate.txt". "estimate.txt" must contain only the rows for the thermometer with identifier "TM0" (not TM1, not TM2, ...) and must have the following format:

Column 1 --> date
Columns 2-145 --> the 144 measurements taken on the date in Column 1
Column 146 --> mean of the 144 values of the day *following* the date in Column 1

The input file "measures.csv" looks like this, for example:

Device,Date,00:00,00:10,00:20,00:30,00:40,00:50,01:00,...,22:00,22:10,22:20,22:30,22:40,22:50,23:00,23:10,23:20,23:30,23:40,23:50
TM0,2013-12-01,36.719997,36.18,35.639999,34.559998,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239998
TM0,2013-12-02,36.719997,36.18,35.639999,34.559998,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239912
TM0,2013-12-03,36.719997,36.18,35.639999,34.5599982,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239918
TM0,2013-12-04,36.719997,36.18,35.639999,34.559998,...,31.5874241,30.23999,30.239998,31.319998,31.319998,30.239932
TM1,2013-12-05,36.719997,36.18,35.639999,34.559998,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239957
TM1,2013-12-06,36.719997,36.18,35.639999,34.559998,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239906
TM1,2013-12-07,36.719997,36.18,35.639999,34.559998,...,30.239998,30.239998,30.239998,31.319998,31.319998,30.239915
...

How could I get the output file "estimate.txt" in the following format?

Looking like this, for example:

Date,00:00,00:10,00:20,00:30,00:40,00:50,01:00,01:10,01:20,01:30,01:40,01:50,02:00,02:10,02:20,02:30,...,22:00,22:10,22:20,22:30,22:40,22:50,23:00,23:10,23:20,23:30,23:40,23:50,average
2013-12-01,36.719997,36.18,35.639999,34.559998,34.559998,34.019997,32.399998,...,32.399998,31.859999,30.239998,30.239998,30.239998,31.319998,31.319998,30.239998,30.32
2013-12-02,36.719997,36.18,35.639999,34.559998,34.559998,34.019997,32.399998,...,32.399998,31.859999,30.239998,30.239998,30.239998,31.319998,31.319998,30.239912,30.70
2013-12-03,36.719997,36.18,35.639999,34.559998,34.559998,34.019997,32.399998,...,32.399998,31.859999,30.239998,30.239998,30.239998,31.319998,31.319998,30.239918,29.98
2013-12-04,36.719997,36.18,35.639999,34.559998,34.559998,34.019997,32.399998,...,32.399998,31.859999,30.239998,30.239998,30.239998,31.319998,31.319998,30.239932
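One way I have been thinking about it (a sketch, not tested at scale, assuming dates are consecutive `yyyy-MM-dd` calendar days as in the example): filter the TM0 rows, compute each day's mean, re-key those means by the previous date, and join them back onto the rows. The paths and the `Estimate` object name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import java.time.LocalDate

object Estimate {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("estimate").setMaster("local[*]"))

    val lines  = sc.textFile("measures.csv")
    val header = lines.first()

    // Keep only the TM0 rows, parsed as (date, 144 readings).
    val tm0 = lines.filter(_ != header)
      .map(_.split(","))
      .filter(_(0) == "TM0")
      .map(f => (f(1), f.drop(2).map(_.toDouble)))

    // Mean of each day's readings, re-keyed by the *previous* calendar
    // date so it can be joined onto that date's row.
    val nextDayMean = tm0.map { case (date, vs) =>
      (LocalDate.parse(date).minusDays(1).toString, vs.sum / vs.length)
    }

    // leftOuterJoin keeps the last date, which has no following day
    // (like 2013-12-04 in the example, whose row has no average).
    val rows = tm0.leftOuterJoin(nextDayMean)
      .sortByKey()
      .map { case (date, (vs, avgOpt)) =>
        date + "," + vs.mkString(",") +
          avgOpt.map(a => f",$a%.2f").getOrElse("")
      }

    // New header: drop "Device", append "average".
    val outHeader = sc.parallelize(Seq(header.split(",").drop(1).mkString(",") + ",average"))

    // coalesce(1) concatenates partitions in order, so the header stays first.
    // Note that saveAsTextFile produces a directory named "estimate.txt"
    // containing a part file, not a single plain file.
    (outHeader ++ rows).coalesce(1).saveAsTextFile("estimate.txt")
    sc.stop()
  }
}
```

Would filtering first and then a `leftOuterJoin` like this be the idiomatic RDD way, or is there a better approach?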

I hope I explained myself well; I am a beginner in Scala and Spark. It is probably not relevant, but I use Eclipse as my IDE and so far it works fine for me.

Thanks in advance.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
