How can I turn this function implementation into a threaded version?

I need to turn the following implementation of this function into a threaded one, but I have no idea how to do it or even how to approach the problem. Any tips or pointers would be much appreciated.

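/* Applies the KLEN x KLEN filter centered on (x, y) and stores the result in target. */
void compute_target_pixel(int x, int y) {
  int i, j, sum = 0;
  int delta = (KLEN - 1) / 2;  /* half-width of the kernel */
  for (i = -delta; i <= delta; ++i)
    for (j = -delta; j <= delta; ++j)
      /* skip neighbors that fall outside the image */
      if (0 <= x + i && x + i < WIDTH && 0 <= y + j && y + j < HEIGHT)
        sum += filter.values[(i + delta) * KLEN + (j + delta)] * pixels[(x + i) * HEIGHT + (y + j)];

  /* normalize by the filter sum when it is positive */
  if(filter.sum > 0) target[x * HEIGHT + y] = sum / filter.sum;
  else target[x * HEIGHT + y] = sum;
}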
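
One way to approach this, as a minimal sketch with POSIX threads: each call to compute_target_pixel only reads pixels and filter and writes a single slot of target, so calls for different (x, y) are independent. That suggests leaving the function itself alone and splitting the range of x values into bands, one per thread. The question does not show how WIDTH, HEIGHT, KLEN, filter, pixels or target are defined, so the definitions below are assumed stand-ins. NUM_THREADS, struct band, worker and compute_target_threaded are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Assumed stand-ins: the question does not show these definitions. */
#define WIDTH 64
#define HEIGHT 48
#define KLEN 3
#define NUM_THREADS 4  /* hypothetical; tune to the number of cores */

struct filter_t { int values[KLEN * KLEN]; int sum; };
struct filter_t filter = { {1, 1, 1, 1, 1, 1, 1, 1, 1}, 9 };
int pixels[WIDTH * HEIGHT];
int target[WIDTH * HEIGHT];

/* The function from the question, with the same behavior. */
void compute_target_pixel(int x, int y) {
  int i, j, sum = 0;
  int delta = (KLEN - 1) / 2;
  for (i = -delta; i <= delta; ++i)
    for (j = -delta; j <= delta; ++j)
      if (0 <= x + i && x + i < WIDTH && 0 <= y + j && y + j < HEIGHT)
        sum += filter.values[(i + delta) * KLEN + (j + delta)] *
               pixels[(x + i) * HEIGHT + (y + j)];

  if (filter.sum > 0) target[x * HEIGHT + y] = sum / filter.sum;
  else target[x * HEIGHT + y] = sum;
}

/* Half-open range [x_begin, x_end) of x values handled by one thread. */
struct band { int x_begin, x_end; };

/* Thread entry point: computes every pixel in its own band. */
void *worker(void *arg) {
  const struct band *b = arg;
  assert(0 <= b->x_begin && b->x_begin <= b->x_end && b->x_end <= WIDTH);
  for (int x = b->x_begin; x < b->x_end; ++x)
    for (int y = 0; y < HEIGHT; ++y)
      compute_target_pixel(x, y);
  return NULL;
}

/* Splits the image into NUM_THREADS bands and waits for all of them.
   Returns 0 on success, -1 if a thread could not be created (in which
   case the bands that were never started are left uncomputed). */
int compute_target_threaded(void) {
  pthread_t threads[NUM_THREADS];
  struct band bands[NUM_THREADS];
  int started = 0, rc = 0;

  for (int t = 0; t < NUM_THREADS; ++t) {
    bands[t].x_begin = t * WIDTH / NUM_THREADS;
    bands[t].x_end = (t + 1) * WIDTH / NUM_THREADS;
    if (pthread_create(&threads[t], NULL, worker, &bands[t]) != 0) {
      rc = -1;
      break;
    }
    ++started;
  }

  /* Join only the threads that were actually started. */
  for (int t = 0; t < started; ++t)
    pthread_join(threads[t], NULL);

  return rc;
}
```

No mutex is needed in this sketch because the bands do not overlap: every thread writes a disjoint part of target, and pixels and filter are only read. This only holds if target and pixels are separate buffers, as they are here. With GCC or Clang it would typically be built with the -pthread flag.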


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source