std::execution breaks file writing

When the parallel loop uses std::execution, the program seems to exit before all of the written files have been applied to the file system.

I recently tried PPL, TBB, OpenMP and std::execution to find the fastest parallel library for a particular job on my machine. The job is to recursively convert some files into another form, which is basically:

#include <algorithm>   // for_each
#include <cstdint>     // uint8_t
#include <iostream>
#include <filesystem>
#include <fstream>
#include <vector>
#include <iterator>

using namespace std;
namespace fs = std::filesystem;

const auto CurrentPath = fs::current_path();

fs::path GetOutputPath(const fs::directory_entry& Entry)
{
    return CurrentPath / "Converted" / fs::relative(Entry, CurrentPath);
}

// Convert and save a file entry.
void ConvertFile(const fs::directory_entry& Entry)
{
    ifstream IStream(Entry.path(), ios::in | ios::binary);
    noskipws(IStream);

    ofstream OStream(GetOutputPath(Entry), ios::out | ios::trunc | ios::binary);

    // Read the data.
    // Intentional `uint8_t` for some special needs.
    vector<uint8_t> Data;
    Data.reserve(Entry.file_size());
    Data.assign(istream_iterator<uint8_t>(IStream), {});

    // Some changes to `Data`.
    // Left blank to simplify the problem.

    // Write to the output file.
    // `Data.data()` is also valid for an empty file, unlike `&Data[0]`.
    OStream.write(reinterpret_cast<const char *>(Data.data()), Data.size());
}

// Convert and save a directory entry.
void ConvertDirectory(const fs::directory_entry& Entry);

// Convert and save an entry.
void Convert(const fs::directory_entry& Entry)
{
    // Recursively loop directories
    if (Entry.is_directory())
    {
        ConvertDirectory(Entry);
    }
    else
    {
        ConvertFile(Entry);
    }
}

void ConvertDirectory(const fs::directory_entry& Entry)
{
    // Not using `recursive_directory_iterator` since its order is not guaranteed.
    // I think manual recursion can always minimize the number of calls to create a directory.
    vector<fs::directory_entry> SubEntries(fs::directory_iterator(Entry), {});
    fs::create_directory(GetOutputPath(Entry));

    // ** Parallel ** part.
    for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);
}

int main(int argc, char *argv[])
{
    ConvertDirectory(fs::directory_entry(CurrentPath / "Test"));
}

With PPL (ppl.h), I change the ** Parallel ** part to:

    concurrency::parallel_for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);

With TBB (oneapi/tbb.h):

    tbb::parallel_for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);

With std::execution:

    for_each(execution::par, SubEntries.cbegin(), SubEntries.cend(), Convert);

Results:
Test 1: 700 files in 300 dirs, 1.5MB per file, avg values from 10 tests per cell
Test 2: 50k files in 5k dirs, 0.15MB per file, avg values from 20 tests per cell

                 Test 1    Test 2
(Original)       2.8s      130s
PPL              0.59s     37s
TBB              0.58s     36.8s
std::execution   0.55s     32.6s
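
For reference, averages like the ones above can be collected with a small timing helper. This is only a sketch of one way to do it, not necessarily how the numbers in the table were measured; `AverageSeconds` is a hypothetical name, and it assumes wall-clock time from `std::chrono::steady_clock` is the quantity of interest.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `Work` `Runs` times and returns the mean
// wall-clock duration of one run in seconds (0 if `Runs` is not positive).
double AverageSeconds(const std::function<void()>& Work, int Runs)
{
    using Clock = std::chrono::steady_clock;

    if (Runs <= 0)
    {
        return 0.0;
    }

    std::chrono::duration<double> Total{0.0};
    for (int i = 0; i < Runs; ++i)
    {
        const auto Start = Clock::now();
        Work();
        Total += Clock::now() - Start;
    }
    return Total.count() / Runs;
}
```

It could be called as `AverageSeconds([] { ConvertDirectory(fs::directory_entry(CurrentPath / "Test")); }, 10)`, with the "Converted" directory removed between runs so that each run starts from the same state.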

The improvement is obvious: every parallel version is roughly 3.5 to 5 times faster than the original.
The problem is that when I check the output files after each test, the std::execution version may produce as few as 45k output files in a Test 2 run. If I print a message after each write(), the problem still occurs, and the number of messages is always 50k. So every file is processed, and the more likely explanation seems to be that the write calls have not been fully applied to the file system by the time the program exits.
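
One thing the message count cannot show is whether each open and each write actually succeeded, because `ofstream` does not throw by default. The following is a diagnostic sketch only, assuming a failed open or write is one possible explanation; it is not a confirmed cause. `WriteChecked` is a hypothetical helper, not part of the program above.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <vector>

// Hypothetical helper: writes `Data` to `Output` and returns true only if
// the file was opened, written and closed without a stream error.
bool WriteChecked(const std::filesystem::path& Output,
                  const std::vector<std::uint8_t>& Data)
{
    std::ofstream OStream(Output, std::ios::out | std::ios::trunc | std::ios::binary);
    if (!OStream.is_open())
    {
        return false; // Open failed, e.g. the parent directory is missing.
    }

    OStream.write(reinterpret_cast<const char*>(Data.data()),
                  static_cast<std::streamsize>(Data.size()));

    // Close explicitly so a failed flush is reflected in the stream state
    // instead of being silently ignored by the destructor.
    OStream.close();
    return !OStream.fail();
}
```

Counting the `false` results (for example with a `std::atomic<int>`, since the calls run concurrently) would show whether the missing files correspond to failed opens or writes.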

The number of directories is always correct, and the PPL and TBB versions do not have this problem.

I'm using Visual Studio 2022. Did I write anything wrong? What can I do to prevent this from happening with std::execution?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
