'Python GPU bubble sort

I have a sorting but unclearly big data need to sort clearly. Therefore, I want to use bubble sort to do it. Likes it.

#CPU bubble sort
def CPU_bubble_sort(data):
    for i in range(len(data)): # times
        change = False # Are there any action in this round?
        for i2 in range(len(data)-1): # index
            if data[i2] > data[i2+1]: # complete
                data[i2],data[i2+1] = data[i2+1],data[i2] # change
                change = True # There have actions in this round.
        if change == False: # if there are not any actions in this round
            break # It's mean sorting is finish.
    return data

Now, I want to use GPU from numba to do it quickly.

from numba import cuda
import numpy as np

#Call gpu sort function
def call_bubble_sort(data):
    #GPU deploy
    threads_per_block = 512 # threads
    blocks_per_grid = int(len(data) / 2 / threads_per_block) # blocks
    if len(data) / 2 % threads_per_block == 0:
        pass
    else: 
        blocks_per_grid += 1 
    #complete
    gpu_data = cuda.to_device(data) # data copy to gpu
    change = cuda.to_device(np.bool_(False)) # Are there any action in this round?
    for i in range(len(data)): #max times
        gpu_step = cuda.to_device(np.bool_(i%2)) # odd steps or even steps
        GPU_BubbleSort[blocks_per_grid,threads_per_block](gpu_data,gpu_step,change)
        cuda.synchronize() #wait GPU in this round
        if i % 2 == 0: 
            continue
        elif change.copy_to_host() == False: # if there are not any actions in this round
            break # can stop
        change = cuda.to_device(np.bool_(False)) # next rounds
    return gpu_data.copy_to_host()

@cuda.jit
def GPU_BubbleSort(data,step,change):
    idx = cuda.threadIdx.x + cuda.blockDim.x * cuda.blockIdx.x # thread num
    idx = idx * 2 + 1 # data odd nums
    #sort
    if idx < len(data): # threads to act
        if step == False: # even rounds
            if data[idx] < data[idx-1]: #complete left first
                change = True # Action in this turn
                data[idx] , data[idx-1] = data[idx-1] , data[idx] # change
        else: #odd rounds
            #complete right
            if idx+1 >= len(data): #if out of the data range
                pass # do nothing
            elif data[idx] > data[idx+1]: # complete right
                data[idx] , data[idx+1] = data[idx+1] , data[idx] # change
                change = True # Action in this turn

I think that 'change' is a single variable, so every threads are eager to update it if in demand. In other words, the variable won't be updated if nobody get it. However, this way is fail. I want to know how to create the sharing variable in a grid, or are there anyway to get the same effect?

I'm poor at English. I'm sorry if I had offended.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source