How to offload to the GPU with OpenACC in Windows?

I am trying to use OpenACC on Windows, compiling with GCC (version 8.1.0).

I found some sample code online that uses OpenACC.

I compiled it from the command prompt as follows.

    C:\Users\chang>g++ -fopenacc -o C:\Users\chang\source\repos\Project18\Project18\testing.exe C:\Users\chang\source\repos\Project18\Project18\Source1.cpp

When I watch the Performance tab in Task Manager while the program is running, I don't see any change in GPU usage.

I also compiled without -fopenacc:

    C:\Users\chang>g++ -o C:\Users\chang\source\repos\Project18\Project18\testing.exe C:\Users\chang\source\repos\Project18\Project18\Source1.cpp

There is no difference in speed between the builds with and without -fopenacc.

So I was wondering whether there is a prerequisite I need to meet before OpenACC can offload to the GPU.

Below is the sample code I found.

Thanks in advance.

P.S. As far as I remember, I never downloaded openacc.h. I tried to find it online but couldn't. Could this be a problem? Since the code compiles and the exe runs, I assume it isn't, but I'm asking just in case.

/*
 *  Copyright 2012 NVIDIA Corporation
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 */
#include <iostream>
#include <cstdio>    // printf
#include <math.h>    // fmax, fabs
#include <string.h>  // memset
#include <openacc.h>
#include <chrono>

#define NN 4096
#define NM 4096
using namespace std;
using namespace chrono;

double A[NN][NM];
double Anew[NN][NM];

int main(int argc, char** argv)
{
    const int n = NN;
    const int m = NM;
    const int iter_max = 1000;

    const double tol = 1.0e-6;
    double error = 1.0;

    memset(A, 0, n * m * sizeof(double));
    memset(Anew, 0, n * m * sizeof(double));

    for (int j = 0; j < n; j++)
    {
        A[j][0] = 1.0;
        Anew[j][0] = 1.0;
    }

    printf("Jacobi relaxation Calculation: %d x %d mesh\n", n, m);

    system_clock::time_point start = system_clock::now();
    int iter = 0;
    #pragma acc data copy(A), create(Anew)
    while (error > tol && iter < iter_max)
    {
        error = 0.0;

        #pragma acc kernels
        for (int j = 1; j < n - 1; j++)
        {
            for (int i = 1; i < m - 1; i++)
            {
                Anew[j][i] = 0.25 * (A[j][i + 1] + A[j][i - 1]
                    + A[j - 1][i] + A[j + 1][i]);
                error = fmax(error, fabs(Anew[j][i] - A[j][i]));
            }
        }

        #pragma acc kernels
        for (int j = 1; j < n - 1; j++)
        {
            for (int i = 1; i < m - 1; i++)
            {
                A[j][i] = Anew[j][i];
            }
        }

        if (iter % 100 == 0) printf("%5d, %0.6f\n", iter, error);

        iter++;
    }

    system_clock::time_point end = system_clock::now();
    std::chrono::duration<float> sec = end - start;
    cout << sec.count() << endl;
}


Solution 1:[1]

At this time, GCC doesn't support GPU code offloading on Windows. See https://stackoverflow.com/a/59376314/664214, or http://mid.mail-archive.com/[email protected], for example. It's certainly possible to implement, but somebody needs to do it, or pay for the work.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tschwinge