Batch Processing with Python Multithreading

I want to execute 5 threads at a time but I have 23 things I want to run in total. Here is the code I came up with.

import threading
import time

class MyThread(threading.Thread):
    def __init__(self, id):
        super().__init__()
        self.id = id

    def run(self):
        print(f"thread {self.id}")
        time.sleep(3)  # each thread simulates 3 seconds of work

if __name__ == '__main__':
    start_time = time.time()
    batch_size = 5
    # Create all 23 threads up front and keep them in a list
    threads = [MyThread(i + 1) for i in range(23)]

    batch_index = 1
    while len(threads) > 0:
        print(f"Batch {batch_index}")
        batch = []
        for j in range(batch_size):
            if threads:
                t = threads.pop(0)
                t.start()
                batch.append(t)
        for t in batch:
            t.join()  # wait for the whole batch before starting the next
        batch_index += 1

    elapsed_time = time.time() - start_time
    print(f"Took {elapsed_time}")


Batch 1
thread 1
thread 2
thread 3
thread 4
thread 5
Batch 2
thread 6
thread 7
thread 8
thread 9
thread 10
Batch 3
thread 11
thread 12
thread 13
thread 14
thread 15
Batch 4
thread 16
thread 17
thread 18
thread 19
thread 20
Batch 5
thread 21
thread 22
thread 23
Took 15.523964166641235

Each thread takes 3 seconds. If I executed the function sequentially, it would take at least 69 seconds (23 × 3), but with 5 threads running at a time it finished in about 15.5 seconds. This is a huge improvement.
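The expected wall-clock time is just the number of batches times the per-thread time, since each batch runs fully in parallel. A quick sketch of the arithmetic:

```python
import math

tasks = 23
batch_size = 5
seconds_per_task = 3

# 23 tasks in groups of 5 means 5 batches (the last one partially full).
batches = math.ceil(tasks / batch_size)
minimum_seconds = batches * seconds_per_task
print(minimum_seconds)  # 15 — close to the measured 15.52
```

The small gap between 15 and 15.52 is thread startup and scheduling overhead.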

Another thing to note is that I declared the threads variable as a list. Python lists have a pop() method, which removes an item from the list and returns it (a thread object in this case). This way, the list naturally keeps track of which threads are still waiting to run.
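A quick illustration of how pop() drains a list: with no argument it takes from the end, while pop(0) takes from the front, which keeps items in their original order.

```python
items = ["t1", "t2", "t3"]

first = items.pop(0)   # removes and returns the first item
print(first)           # t1
print(items)           # ['t2', 't3'] — the list shrinks each time

last = items.pop()     # with no argument, pop() takes from the end
print(last)            # t3
```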

I also needed to add if threads: to check whether the list still has items, in case the total number of threads is not divisible by 5. With 23 threads, the fifth batch would try to run threads 21 through 25, but 24 and 25 do not exist, so calling pop() on the empty list would raise an IndexError. The if statement prevents that.
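As an aside (not the approach used above), the standard library's concurrent.futures module handles the same workload without any manual batch bookkeeping: a ThreadPoolExecutor with max_workers=5 keeps at most 5 tasks running and refills slots as they finish, so the leftover 3 tasks need no special-case if statement. A minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(id):
    print(f"thread {id}")
    time.sleep(3)  # simulate 3 seconds of work

# At most 5 tasks run concurrently; the executor starts the next
# task as soon as any worker thread becomes free.
with ThreadPoolExecutor(max_workers=5) as pool:
    pool.map(task, range(1, 24))
```

One difference from the batch approach: the pool starts a new task the moment a worker is free, rather than waiting for a whole batch of 5 to finish, so it can be slightly faster when task durations vary.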
