Multiprocessing not improving performance, and no output printed
I have written a function that takes vibration data in a proprietary format and looks for 60 Hz noise. It loops through the file, reading one portion (known as a block) at a time, applying an FFT to that block, and then checking the magnitude of the bin closest to 60 Hz. This process takes about 20 minutes due to the file size. The part of the function that takes the longest is reading each block into a Pandas DataFrame. I was hoping that multiprocessing would speed it up, but I'm not having any luck. Here is my function, which currently works as intended:
from scipy.fftpack import fft
from custom import load_dataset
import numpy as np
from multiprocessing import Process

def assessment():
    # Code to read a database to get the path to the file, file length,
    # sample rate, and the accelerometer name
    # Code to determine block size based on sample rate
    magnitudes = []
    for block in range(0, file_length, block_size):
        df = load_dataset(file_path, channels=accel_name,
                          start_block=block, nblocks=block_size)
        # Apply FFT
        N = len(df)
        nyquist = sample_rate / 2
        X = np.linspace(0.0, nyquist, int(N / 2))
        Y = fft(df)
        y = 2 / N * np.abs(Y[0:int(N / 2)])  # single-sided magnitude spectrum
        res = min(enumerate(X), key=lambda x: abs(60 - x[1]))  # Find bin closest to 60 Hz
        magnitude = y[res[0]]  # Get magnitude of desired bin
        magnitudes.append(magnitude)  # Store in list
    return magnitudes
I tried adding this to the bottom, but the code ran for 20 minutes and then output nothing:
if __name__ == '__main__':
    p = Process(target=assessment)
    p.start()
    p.join()
EDIT: I'm hoping this revision is something that can take advantage of threading or multiprocessing more easily:
def assessment(file_path, accel_name, block, block_size):
    df = load_dataset(file_path, channels=accel_name,
                      start_block=block, nblocks=block_size)
    # Apply FFT
    N = len(df)
    nyquist = sample_rate / 2
    X = np.linspace(0.0, nyquist, int(N / 2))
    Y = fft(df)
    y = 2 / N * np.abs(Y[0:int(N / 2)])
    res = min(enumerate(X), key=lambda x: abs(60 - x[1]))  # Find bin closest to 60 Hz
    magnitude = y[res[0]]  # Get magnitude of desired bin
    return magnitude
if __name__ == '__main__':
    # Code to set all the variables that need to be passed to assessment function
    temp = []
    for block in range(0, file_length, block_size):
        temp.append((file_path, accel_name, block, block_size))  # create list of tuples
    p = Process(target=assessment, args=temp)
    p.start()
    p.join()
Solution 1:[1]
I'm not going to assert that this will actually succeed in making your process faster -- see comments on the question for a discussion of caveats involved -- but if you wanted to parallelize it with multiprocessing, that might look like the following:
def main():
    # Code to set all the variables that need to be passed to assessment function
    processes = [
        Process(target=assessment, args=(file_path, accel_name, block, block_size))
        for block in range(0, file_length, block_size)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
Note that this creates one Process object per block -- one per concurrent operation anticipated -- so a file with many blocks will spawn many simultaneous processes.
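If that process count is a concern, a minimal sketch using multiprocessing.Pool caps the number of workers. This assumes the same file_path, accel_name, file_length, and block_size variables as above, and that assessment can see sample_rate in the worker (for example, by defining it at module level or passing it as an extra argument); it is a sketch, not the answer's original code:

from multiprocessing import Pool

def main():
    # Code to set file_path, accel_name, file_length, block_size, sample_rate
    tasks = [
        (file_path, accel_name, block, block_size)
        for block in range(0, file_length, block_size)
    ]
    # Cap concurrency at 4 workers; starmap unpacks each tuple into
    # assessment(file_path, accel_name, block, block_size) and returns
    # the magnitudes in block order.
    with Pool(processes=4) as pool:
        magnitudes = pool.starmap(assessment, tasks)
    print(magnitudes)

if __name__ == '__main__':
    main()

A Pool also sidesteps the return-value problem described in the next note, since starmap delivers each worker's return value back to the parent.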
Note that with bare Process objects, just returning the value doesn't convey it to the parent process -- you should use a multiprocessing.Queue or multiprocessing.Pipe for that; or just write it to storage directly from the child process and avoid the transfer cost altogether.
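A minimal sketch of the Queue approach, built on the per-block assessment function from the question's edit; the worker wrapper below is illustrative and not part of the original answer:

from multiprocessing import Process, Queue

def worker(q, file_path, accel_name, block, block_size):
    # Illustrative wrapper: run the per-block analysis and send the
    # result back to the parent, tagged with its block offset.
    q.put((block, assessment(file_path, accel_name, block, block_size)))

def main():
    # Code to set file_path, accel_name, file_length, block_size, sample_rate
    q = Queue()
    processes = [
        Process(target=worker, args=(q, file_path, accel_name, block, block_size))
        for block in range(0, file_length, block_size)
    ]
    for p in processes:
        p.start()
    # Drain the queue before joining: a child cannot exit until its
    # queued data has been flushed to the pipe, so joining first can deadlock.
    results = dict(q.get() for _ in processes)
    for p in processes:
        p.join()
    magnitudes = [results[block] for block in sorted(results)]
    print(magnitudes)

if __name__ == '__main__':
    main()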
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
