Concurrent Clustering#
Goal#
In an image covering a large region of the sky, identifying all the clusters can be a time-consuming operation. One possible approach to speed up this process is to explore different regions of the sky in parallel, using some kind of concurrency technology. All the applications we have developed so far are sequential, but all modern operating systems (Linux, macOS, Windows…) allow one application to run several tasks concurrently, potentially taking advantage of modern multi-core processors. This exercise is a brief introduction to multithreading and multiprocessing through asynchronous programming. It relies on the concurrent.futures library, which makes this rather easy.
A demonstration example#
Look at the example below.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import concurrent.futures
import time


def add(value1, value2):
    time.sleep(2)
    return value1 + value2


exe = concurrent.futures.ProcessPoolExecutor()

time0 = time.time()

future_res1 = exe.submit(add, 10, 1)
future_res2 = exe.submit(add, 20, 2)

print(f"res1: {future_res1.result()}")
print(f"res2: {future_res2.result()}")

time1 = time.time()

print(f"time: {time1 - time0} seconds")
Instead of calling add() directly, the program submits those calls to the instance of ProcessPoolExecutor, which executes the function within a child process. The call to submit does not block the execution. Instead of getting the result directly, one gets a "future" result, which will be ready later. When one calls future_res1.result(), it blocks the execution until the result is actually available. The usual practice is therefore to first submit all the computing tasks, so that they can start running concurrently, and then to retrieve all the results.
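To make this pattern concrete, here is a minimal sketch (not part of the exercise code) that reuses the add() function above: all the tasks are submitted first, and the results are collected afterwards. The helper concurrent.futures.as_completed() can also be used when the results should be retrieved in completion order instead of submission order.

import concurrent.futures
import time


def add(value1, value2):
    time.sleep(2)
    return value1 + value2


# The __main__ guard is required on platforms that start child processes
# by spawning a fresh interpreter (Windows, macOS).
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as exe:
        # All tasks are submitted first, so they start running concurrently.
        futures = [exe.submit(add, 10 * i, i) for i in range(1, 5)]
        # Only now do we block, collecting results in submission order.
        results = [future.result() for future in futures]
    print(results)  # expected: [11, 22, 33, 44]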
The example uses an instance of ProcessPoolExecutor, which launches multiple processes. Multiprocessing may or may not accelerate a program, depending on the time saved by using several cores simultaneously compared to the time lost creating the new processes. If the computing time is too small compared to the process creation time, the program will not speed up.
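As a hedged illustration (the exact numbers depend on your machine), a micro-benchmark along these lines usually shows the process pool losing against a plain loop when each individual task is tiny:

import concurrent.futures
import time


def tiny_task(x):
    # A task far too small to amortise the cost of the process pool.
    return x * x


if __name__ == "__main__":
    values = range(1000)

    start = time.time()
    sequential = [tiny_task(v) for v in values]
    print(f"sequential loop: {time.time() - start:.4f} s")

    start = time.time()
    with concurrent.futures.ProcessPoolExecutor() as exe:
        pooled = list(exe.map(tiny_task, values))
    print(f"process pool:    {time.time() - start:.4f} s")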
You can also try ThreadPoolExecutor, which launches multiple threads instead. Threads are lightweight processes, faster to create, but for many Python programs multithreading does not accelerate the execution, because Python suffers from a "Global Interpreter Lock" (GIL): only one thread at a time can interpret Python code. In the example above, there are not enough Python instructions for the GIL to be a problem (and time.sleep() releases it while waiting). In the steps below, multithreading may also prove efficient, because many time-consuming tasks are performed by NumPy, an external library written in C that does not need the Python interpreter and therefore does not suffer from the GIL.
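As a hedged illustration (assuming NumPy is installed, and keeping in mind that the actual speed-up depends on your machine and BLAS configuration), threads can help when the heavy work happens inside NumPy:

import concurrent.futures
import numpy as np


def heavy_numpy_task(size):
    # The matrix product runs inside NumPy/BLAS, which releases the GIL.
    matrix = np.random.rand(size, size)
    return float((matrix @ matrix).sum())


with concurrent.futures.ThreadPoolExecutor() as exe:
    futures = [exe.submit(heavy_numpy_task, 1000) for _ in range(4)]
    totals = [future.result() for future in futures]
print(totals)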
Implementation details#
This project can be started by duplicating your solution to exercise 4 into a new file called pja_concurrent_clustering.py. Add timing instructions, and run the code with the data file global.fits, which is significantly larger and takes longer to process. Save the output, so that you can check that your future modifications do not break the results.
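A possible way to wrap the timing, where process_file() is a hypothetical name standing for your exercise 4 entry point:

import time


def process_file(path):
    # Hypothetical placeholder standing for your exercise 4 clustering code.
    return []


time0 = time.time()
clusters = process_file("global.fits")
time1 = time.time()
print(f"time: {time1 - time0} seconds")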
If you find that processing the data file global.fits takes too long, a first optional optimization is to reimplement the convolution using the function https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.signal.convolve2d.html. This should significantly speed up the convolution.
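A possible sketch, where image and kernel are placeholders for your FITS pixel array and your existing smoothing kernel:

import numpy as np
from scipy.signal import convolve2d

# Placeholders: in the exercise, image comes from the FITS file and kernel
# is the smoothing kernel you already use.
image = np.random.rand(512, 512)
kernel = np.ones((5, 5)) / 25.0

# mode="same" keeps the output the same shape as the input image.
smoothed = convolve2d(image, kernel, mode="same", boundary="fill", fillvalue=0.0)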
Then, the goal of this project is to use the library concurrent.futures in order to introduce concurrency into your clustering and take advantage of several hardware cores. Always check that your final numerical results stay identical.
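One possible structure (a sketch, not the required solution) is to split the image into horizontal bands and cluster each band in a separate process; find_clusters() is a hypothetical name standing for your existing clustering routine, and clusters sitting on a band boundary would still need special care.

import concurrent.futures
import numpy as np


def find_clusters(band):
    # Hypothetical placeholder standing for the clustering of one sub-image.
    return []


def cluster_concurrently(image, n_bands=4):
    # Split the pixel array into horizontal bands, one future per band.
    bands = np.array_split(image, n_bands, axis=0)
    with concurrent.futures.ProcessPoolExecutor() as exe:
        futures = [exe.submit(find_clusters, band) for band in bands]
        results = [future.result() for future in futures]
    # Flatten the per-band lists into a single list of clusters.
    return [cluster for band_clusters in results for cluster in band_clusters]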
Result#
Try to run sequentially, with a ThreadPoolExecutor, and with a ProcessPoolExecutor. Do you manage to speed up the execution?
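A hypothetical way to compare the three modes with the same unit of work, where process_band() stands for whatever piece of your pipeline you parallelised:

import concurrent.futures
import time


def process_band(band):
    # Hypothetical placeholder for one unit of clustering work.
    return []


def run_with(executor_class, bands):
    label = executor_class.__name__ if executor_class else "sequential"
    start = time.time()
    if executor_class is None:
        results = [process_band(band) for band in bands]
    else:
        with executor_class() as exe:
            results = list(exe.map(process_band, bands))
    print(f"{label}: {time.time() - start:.2f} s")
    return results

Calling run_with(None, bands), run_with(concurrent.futures.ThreadPoolExecutor, bands) and run_with(concurrent.futures.ProcessPoolExecutor, bands) then gives directly comparable timings.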