Why is my parallel code slower than my serial code?

I recently started learning parallel programming on my own, and I have next to no idea what I'm doing. I tried applying what I have learned, but I think I'm doing something wrong, because my parallel code takes longer to execute than my serial code. My PC has an i7-9700. This is the original serial code in question:

def getMatrix(name):
    matrixCreated = []
    i = 0
    while True:
        i += 1
        row = input('\nEnter elements in row %s of Matrix %s (separated by commas)\nOr -1 to exit: ' %(i, name))
        if row == '-1':
            break
        else:
            strList = row.split(',')
            matrixCreated.append(list(map(int, strList)))
    return matrixCreated

def getColAsList(matrixToManipulate, col):
    myList = []
    numOfRows = len(matrixToManipulate)
    for i in range(numOfRows):
        myList.append(matrixToManipulate[i][col])
    return myList

def getCell(matrixA, matrixB, r, c):
    matrixBCol = getColAsList(matrixB, c)
    lenOfList = len(matrixBCol)
    productList = [matrixA[r][i]*matrixBCol[i] for i in range(lenOfList)]
    return sum(productList)

matrixA = getMatrix('A')
matrixB = getMatrix('B')

rowA = len(matrixA)
colA = len(matrixA[0])
rowB = len(matrixB)
colB = len(matrixB[0])

result = [[0 for p in range(colB)] for q in range(rowA)]

if (colA != rowB):
    print('The two matrices cannot be multiplied')
else:
    print('\nThe result is')
    for i in range(rowA):
        for j in range(colB):
            result[i][j] = getCell(matrixA, matrixB, i, j)
        print(result[i])

EDIT: This is the parallel code, timed with the time library. I didn't include it initially because I thought it was wrong, so I just wanted to see if anyone had ideas for parallelizing the code instead.

import multiprocessing as mp
pool = mp.Pool(mp.cpu_count())

def getMatrix(name):
    matrixCreated = []
    i = 0
    while True:
        i += 1
        row = input('\nEnter elements in row %s of Matrix %s (separated by commas)\nOr -1 to exit: ' %(i, name))
        if row == '-1':
            break
        else:
            strList = row.split(',')
            matrixCreated.append(list(map(int, strList)))
    return matrixCreated

def getColAsList(matrixToManipulate, col):
    myList = []
    numOfRows = len(matrixToManipulate)
    for i in range(numOfRows):
        myList.append(matrixToManipulate[i][col])
    return myList

def getCell(matrixA, matrixB, r, c):
    matrixBCol = getColAsList(matrixB, c)
    lenOfList = len(matrixBCol)
    productList = [matrixA[r][i]*matrixBCol[i] for i in range(lenOfList)]
    return sum(productList)

matrixA = getMatrix('A')
matrixB = getMatrix('B')

rowA = len(matrixA)
colA = len(matrixA[0])
rowB = len(matrixB)
colB = len(matrixB[0])

import time
start_time = time.time()

result = [[0 for p in range(colB)] for q in range(rowA)]

if (colA != rowB):
    print('The two matrices cannot be multiplied')
else:
    print('\nThe result is')
    for i in range(rowA):
        for j in range(colB):
            result[i][j] = getCell(matrixA, matrixB, i, j)
        print(result[i])
print (" %s seconds " % (time.time() - start_time))
results = [pool.apply(getMatrix, getColAsList, getCell)]
pool.close()


Solution 1:[1]

So I would agree that you are doing something wrong. I would say that your code is not parallelizable.

For code to be parallelizable, it has to be divisible into smaller pieces, and those pieces have to be either:

1. Independent, meaning that when a piece runs, it doesn't rely on other processes to do its job.

For example, say I have a list of 1,000,000 objects that need to be processed and 4 workers to process them. I give each worker 1/4 of the objects, and when they all finish, every object has been processed. Worker 3 doesn't care whether workers 1, 2, or 4 finished before or after it did, nor does it care what they returned or did. It shouldn't even know that any other workers exist.

2. Managed, meaning there are dependencies between workers, but that's OK because a main thread coordinates them. Even so, workers shouldn't know or care about each other. Think of them as mindless muscle: they do only what you tell them to do and don't think for themselves.

For example, I have a list of 1,000,000 objects that need to be processed. First, all objects need to go through func1, which returns something. Once ALL objects are done with func1, those results go into func2. So I create 4 workers, give each worker 1/4 of the objects, have them process their share with func1, and return the results. I wait for all workers to finish. Then I give each worker 1/4 of the results returned by func1 and have them process those with func2. I can repeat this for as many stages as I want. All the main thread has to do is coordinate the workers: tell them what to process and when, so they don't start before they are supposed to.
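Both patterns can be sketched with Python's `multiprocessing.Pool`. This is a minimal illustration under stated assumptions, not code from the question: `func1`, `func2`, `run_stage`, `pipeline`, and `NUM_WORKERS` are hypothetical names. Within a stage, `pool.map` hands chunks of the list to workers that know nothing about each other (the independent case); because `map` blocks until every chunk is done, the main process acts as the coordinator between stage 1 and stage 2 (the managed case). The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module, such as Windows and macOS.

```python
from multiprocessing import Pool

NUM_WORKERS = 4  # hypothetical: one worker per quarter of the data


def func1(x):
    # Hypothetical stage-1 work; each item is handled on its own.
    return x * x


def func2(x):
    # Hypothetical stage-2 work; consumes a stage-1 result.
    return x + 1


def run_stage(pool, func, items):
    # Pool.map splits `items` into chunks of `chunksize`, hands them to idle
    # workers, and blocks until all chunks are done, so it doubles as the
    # barrier between stages. Results come back in input order.
    chunksize = max(1, len(items) // NUM_WORKERS)
    return pool.map(func, items, chunksize=chunksize)


def pipeline(items):
    with Pool(NUM_WORKERS) as pool:
        stage1 = run_stage(pool, func1, items)   # every item goes through func1 first
        stage2 = run_stage(pool, func2, stage1)  # func2 starts only after all of stage 1
    return stage2


if __name__ == "__main__":
    data = list(range(1_000_000))
    results = pipeline(data)
    print(results[:5])  # [1, 2, 5, 10, 17]
```

The workers never talk to each other; all ordering decisions live in `pipeline`, which is what keeps them "mindless muscle".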

Take this with a grain of salt, as this is a simplified picture of parallel processing.

Tips for parallelism and concurrency

You shouldn't get user input in parallel. Only the main thread should handle that.

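If your workload is light, you shouldn't use parallel processing: the overhead of starting worker processes and passing data to and from them can outweigh the work itself.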

If your task can't be divided into smaller pieces, it's not parallelizable. It can still be run on a background thread, though, as a way of running it concurrently with other work.
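One rough way to see the light-workload overhead is to time a trivial function serially and through a `Pool`. This is an illustrative sketch with made-up names (`square`, `serial`, `parallel`) and a made-up 200,000-item list: each call does a single multiplication, while the pool has to start worker processes and pickle every item and result between processes. On typical machines the pool version is likely to be the slower one, but the actual numbers depend on the OS, the process start method, and the hardware, so none are claimed here.

```python
import time
from multiprocessing import Pool


def square(x):
    # Deliberately tiny unit of work: one multiplication per call.
    return x * x


def serial(items):
    return [square(x) for x in items]


def parallel(items, workers=4):
    # Process start-up plus pickling each item to a worker and each result
    # back is far more expensive than `x * x` itself.
    with Pool(workers) as pool:
        return pool.map(square, items)


if __name__ == "__main__":
    data = list(range(200_000))

    t0 = time.perf_counter()
    a = serial(data)
    t1 = time.perf_counter()
    b = parallel(data)
    t2 = time.perf_counter()

    assert a == b  # same answer either way; only the cost differs
    print(f"serial:   {t1 - t0:.3f} s")
    print(f"parallel: {t2 - t1:.3f} s")
```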

Concurrency Example:

Suppose your task is long-running and not parallelizable: say it takes 10 minutes to complete, and it requires the user to give input. When the user gives input, start the task on worker 1. If the user gives input again 1 minute later, take that input and start the second task on worker 2. Input at the 5-minute mark starts task 3 on worker 3. At the 10-minute mark, task 1 is complete. Because everything runs concurrently, all tasks are complete by the 15-minute mark (task 3 started at 5 minutes and takes 10). That's 2x faster than running the tasks serially, which would take 30 minutes. However, this is concurrency, not parallelism.
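The concurrency example can be scaled down to seconds and sketched with `concurrent.futures.ThreadPoolExecutor`. The assumptions here are all hypothetical: 1 second stands in for 10 minutes, the inputs come from a prepared `ARRIVALS` list instead of `input()` so the sketch runs unattended, and `time.sleep` stands in for the long-running task. Only the main thread receives the inputs, in line with the tip about user input, and each task goes to its own worker thread. Threads suit this simulated task because it only waits; for CPU-bound pure-Python work, CPython's global interpreter lock keeps threads from executing Python code in parallel, so a `ProcessPoolExecutor` would be the closer match.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TASK_SECONDS = 1.0  # hypothetical stand-in for "10 minutes"
# (arrival time in seconds, user input): stand-ins for inputs at 0, 1 and 5 minutes.
ARRIVALS = [(0.0, "input 1"), (0.1, "input 2"), (0.5, "input 3")]


def long_task(user_input):
    # Simulated long-running, non-parallelizable work.
    time.sleep(TASK_SECONDS)
    return f"done with {user_input}"


def run_concurrently(arrivals):
    start = time.perf_counter()
    futures = []
    with ThreadPoolExecutor(max_workers=len(arrivals)) as executor:
        for arrival, user_input in arrivals:
            # The main thread waits for the next input to "arrive",
            # then hands the task to a worker and goes back to waiting.
            time.sleep(max(0.0, arrival - (time.perf_counter() - start)))
            futures.append(executor.submit(long_task, user_input))
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_concurrently(ARRIVALS)
    print(results)
    print(f"concurrent: {elapsed:.2f} s, serial would be about "
          f"{TASK_SECONDS * len(ARRIVALS):.1f} s")
```

With these numbers the run should take a little over 1.5 seconds, versus about 3 seconds if the three tasks ran back to back, mirroring the 15-minute versus 30-minute comparison.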

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TeddyBearSuicide