Parquet Read for Column Comparison in C#?

I have a Parquet file with 20 million entries and 3 columns:

file-path
create-date
mod-date

For example:

file-path                create-date     mod-date
/bad                     1649890805      1649890805
/bad/10/1/one.json       1649890806      1649890806
/good/4/32/two.json      1649890805      1649890805
/good/5/0/three.json     1649890812      1649890813

I do not want any file-path that begins with "/bad", but any other StartsWith is fine.

I need every value of file-path and mod-date where mod-date is greater than create-date. If it is too slow to obtain every file path, a count would be fine. In the above set, I would only want "/good/5/0/three.json" with "1649890813" to be returned.
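To make the rule concrete, here is a minimal sketch of the filter on its own, independent of any Parquet API. It assumes the three columns have already been read into parallel arrays (for instance from each DataColumn's Data property, cast to the right element type) and that both dates are stored as 64-bit integers; neither is confirmed by the file itself. The name ModifiedFileFilter is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ModifiedFileFilter
{
    // Hypothetical helper: takes three parallel arrays (one per column) and
    // returns the (path, mod-date) pairs where mod-date > create-date,
    // skipping every path that begins with "/bad".
    public static List<(string Path, long ModDate)> Select(
        string[] paths, long[] createDates, long[] modDates)
    {
        var result = new List<(string Path, long ModDate)>();
        for (int i = 0; i < paths.Length; i++)
        {
            // Ordinal comparison avoids culture-sensitive prefix matching.
            if (paths[i].StartsWith("/bad", StringComparison.Ordinal))
                continue;

            if (modDates[i] > createDates[i])
                result.Add((paths[i], modDates[i]));
        }
        return result;
    }
}
```

On the four sample rows above this would keep only "/good/5/0/three.json" with 1649890813. If only a count is needed, the Count of the returned list (or a plain counter in place of the list) would do.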

I picked through the code at https://github.com/elastacloud/parquet-dotnet/blob/master/src/Parquet.Test/Reader/ParquetCsvComparison.cs and the unit tests to understand how to do a read, but I still don't understand enough to do the comparison I need. This is what I have so far, and I am stuck:

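using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

using (var fileStream = File.OpenRead(ParquetFilePath))
{
    using (var prr = new ParquetReader(fileStream, new ParquetOptions { TreatByteArrayAsString = true }))
    {
        // Assumed definition of dataFields: the schema's data fields
        // (file-path, create-date, mod-date)
        DataField[] dataFields = prr.Schema.GetDataFields();

        for (var prC = 0; prC < prr.RowGroupCount; prC++)
        {
            var lg = prr.ReadEntireRowGroup(prC); //debug viewing
            using (ParquetRowGroupReader grr = prr.OpenRowGroupReader(prC))
            {
                DataColumn[] columns = dataFields.Select(grr.ReadColumn).ToArray();
                // How to compare all column data from columns 1 and 2?
            }
        }
    }
}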

Thanks.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow