multiprocessing.Process doesn't use all processes with large data

I have a very large Python script. The core of it is a function that takes one row of a DataFrame, applies some formulas, and saves the object I've created to disk with joblib. (I'll include a function that captures the essence of the script.)

import multiprocessing as multi
import joblib

def somefunct(DataFrame_row, some_parameter1, some_parameter2, sema):

    python_object = My_object(DataFrame_row['Column1'],DataFrame_row['Column2'])
    python_object.some_complicate_method(some_parameter1, some_parameter2)
    # for example, compute an integral over the My_object data;
    # takes approx. 50-60 seconds per row

    joblib.dump(python_object, path_save)
    # Before writing a function that saves the object to disk, I tried
    # a function that stored the object in the DataFrame


    sema.release()

def apply_all_data_frame(df, n_procesos):

    # the semaphore caps how many worker processes run at the same time
    sema = multi.Semaphore(n_procesos)
    procesos_list = []

    for index, row in df.iterrows():
        sema.acquire()
        p = multi.Process(target = somefunct,
                          args = (row, some_parameter1, some_parameter2, sema))

        procesos_list.append(p)
        p.start()

    for proceso in procesos_list:
        proceso.join()

The DataFrame contains 5000 rows and may contain more in the future. I tested the script with a 100-row dataset on a computer with 16 cores and 32 logical processors. I chose 30 processes; with 100 rows it used all 30 processes (100% CPU) and finished quickly. But when I tried again with the full data, the computer only used 3 or 4 processes (11% CPU), and each process used 2.0 GB of RAM. It takes too long.
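To put "too long" in numbers, here is a rough best-case estimate based only on the figures above. It assumes the 50-60 seconds per row also holds for the full data and that every process stays busy for the whole run, neither of which has been measured; the function name is just illustrative.

```python
def ideal_wall_time_hours(n_rows, seconds_per_row, n_processes):
    """Best-case wall time in hours if every process stays busy the whole run."""
    return n_rows * seconds_per_row / n_processes / 3600


# Figures from the question: 5000 rows, 50-60 s per row.
for n_processes in (30, 4):
    low = ideal_wall_time_hours(5000, 50, n_processes)
    high = ideal_wall_time_hours(5000, 60, n_processes)
    print(f"{n_processes} processes: {low:.1f} to {high:.1f} hours")
# prints:
# 30 processes: 2.3 to 2.8 hours
# 4 processes: 17.4 to 20.8 hours
```

So even in the ideal case the full run is a matter of hours, and dropping from 30 busy processes to 4 makes it roughly 7.5 times slower.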

My first attempt used Pool and Pool.map, but the same problem occurred: it filled the RAM and crashed everything, despite using fewer processes (16, I think).
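For reference, a minimal sketch of what a Pool-based variant could look like. It is not the original script: `process_row`, `run_pool`, the dictionary standing in for `My_object`, and the use of `pickle` instead of `joblib.dump` are all placeholders so that the example runs on its own. Each worker saves its result and returns only the file path, so the parent never holds the large objects, and `maxtasksperchild` replaces each worker after a fixed number of tasks. Whether this avoids the memory growth described above is an assumption that would need to be checked against the real workload.

```python
import multiprocessing as multi
import os
import pickle
import tempfile


def process_row(task):
    """Do the per-row work, save the result, and return only the file path."""
    index, col1, col2, out_dir = task
    # Placeholder for My_object(...).some_complicate_method(...)
    result = {"index": index, "value": col1 * col2}
    path = os.path.join(out_dir, f"row_{index}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(result, fh)  # the question uses joblib.dump here
    return path


def run_pool(rows, out_dir, n_processes=4):
    """Process (index, col1, col2) tuples with a fixed-size pool of workers."""
    tasks = ((i, c1, c2, out_dir) for i, c1, c2 in rows)
    # maxtasksperchild=10 restarts each worker after it has handled 10 tasks
    with multi.Pool(processes=n_processes, maxtasksperchild=10) as pool:
        # imap_unordered hands back each small result as soon as a worker
        # finishes it, in whatever order the tasks complete
        return sorted(pool.imap_unordered(process_row, tasks, chunksize=1))


if __name__ == "__main__":
    rows = [(i, i, 2) for i in range(20)]
    with tempfile.TemporaryDirectory() as tmp:
        saved = run_pool(rows, tmp)
        print(len(saved))
```

With a real DataFrame, the `rows` argument could be built from something like `df[['Column1', 'Column2']].itertuples()`, so that only the two needed values are sent to each worker rather than the whole row.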

As I commented in the script, my first version saved the object in the DataFrame, but when I saw the RAM fill to 100% I decided to save the object to disk instead. With that first version I tried Pool and everything froze, because it created a Python process that did 0% work on the CPU.

I also tried the function without the Semaphore.

I apologize for my English and for the explanation; this is my first question online.

[Screenshot: how the computer's processes are running]



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
