Build the clusters#

Reminder

  • Start the exercise with the Python template file in the src directory corresponding to the exercise.

    • Your code must be added to the main function.

    • If you write generic functions, put them into libraries to reuse them easily in other applications. Look at Library and Module documentation for more details on Python libraries.

    • You should carefully follow the division into steps proposed for every exercise and provide the specific signatures for each step when they are specified.

  • Git operations:

    • Commit very frequently with relevant messages. Focus on making clear the reasons why you made the change.

    • Push once a significant result is obtained.

  • Code quality:

    • Frequently check the PyCharm annotations about your code, for example the colored signals in the right vertical gutter.

    • Execute the check_commit_status.py script to check the validation tests.

    • Assess the quality of your code with SonarQube.

  • Documentation:

    • Add docstrings (""" ... """) at the beginning of ALL functions & classes.

    • Add one-line comments wherever this is useful to understand your code.

  • Refer to slides for detailed information on technical topics.

Goal#

The goal of this exercise is to start from each peak and collect all the neighbouring pixels that presumably come from the same celestial object. This set of pixels is called a cluster. For each cluster, an integrated luminosity will be computed, and the list of clusters will be sorted accordingly.


Principle#

(Refer to the previous exercise for what a peak is and how it was detected. To detect peaks, we built intermediate artefacts such as a convolution image, whose only purpose was to find the peak coordinates. In this part, we no longer use the convolution image; only the original image is considered.)

An image is a 2D array of pixels. We want to identify all regions of this image, made of contiguous pixels, called clusters. These regions will be associated with celestial objects we want to characterize. To identify the contiguous pixels, only the pixels with a luminosity value greater than a threshold are considered.

We have identified the peaks, which are the top pixels of the clusters. Now we will collect all the pixels of the original image above a threshold in order to form clusters.

A list of clusters is then built by gathering all the clusters found, sorted by cluster luminosity (i.e. the integral of the selected pixel values).


Implementation details#

As for every exercise, you will import the modules developed for the previous exercises. You should also put the functions for this exercise in a dedicated module named lib_cluster.py. The main program should be written in a file named ex4_clusters.py.

Cluster Characteristics#

Cluster objects must contain:

  • The integrated luminosity, which is the sum of the pixel values (in the original image) collected in the cluster

  • The luminosity of its peak (in the original image), which we sometimes call the “top value”

  • The coordinates (row, column) of the peak.

They will be implemented as a Python class.
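As an illustration only, such a class could be laid out as a dataclass. The field names (row, column, top, integral, extension) are assumptions chosen to echo the signature names used later in this exercise; the extension field anticipates the radius stored in Step 1.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Cluster of pixels collected around a peak (illustrative layout)."""

    row: int            # row of the peak in the original image
    column: int         # column of the peak in the original image
    top: float          # pixel value of the peak in the original image
    integral: float     # integrated luminosity: sum of the collected pixel values
    extension: int = 0  # radius of the cluster in pixels (assumed field, see Step 1)


# Made-up values, only to show how an object is created and read
example = Cluster(row=10, column=20, top=150.0, integral=900.0, extension=2)
print(example.row, example.column, example.top, example.integral, example.extension)
```

A plain class with an __init__ method would work just as well; the dataclass only saves some boilerplate.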

Additional slides and notebooks#

Some slides complement this document with more precise technical information. It may sometimes also be necessary to consult the complete documentation listed in the appendices. For this exercise, the Fits slides, the Numpy Notebook and the Pyplot slides are particularly useful.


Step 1: build the clusters#

In this step, you will build the cluster objects and register them into a sorted list.

Building a cluster means collecting all the pixels of the original image, around a given peak. The values of the collected pixels will have to be above a threshold.

This is an iterative process starting with the immediate neighbours of the peak (radius 1) and extending the radius until the mean value of the pixels at this radius is below the threshold.

The main steps are:

  1. Extract a sub-image of the original image around the peak position with a given radius, starting at 1.

  2. Compute the pixel sum at this radius: sum all the pixel values inside this sub-image (using numpy.sum()) and subtract the sum computed for the previous radius (the value of the central pixel if the radius is 1) to remove the contribution of the inner pixels. This is equivalent to computing the pixel sum over the ring at the current radius.

  3. Compute the mean pixel value of the current ring (i.e. divide the ring sum by the number of pixels at the current radius).

  4. If this mean value is below the threshold, do not add this radius to the cluster and stop the collection process. Otherwise, increase the radius by one and restart at the first step.
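The four steps above can be sketched as follows. This is a minimal sketch, not the reference solution: the function name grow_cluster and its return value are hypothetical, and clipping the sub-image at the image borders is an assumption, since border handling is not specified here.

```python
import numpy as np


def grow_cluster(image, row, col, threshold):
    """Grow square rings around a peak; return (extension, integral)."""
    inner_sum = float(image[row, col])  # radius 0: the central pixel only
    inner_size = 1
    radius = 1
    while True:
        # Step 1: sub-image of the current radius (clipped at the borders)
        window = image[max(row - radius, 0):row + radius + 1,
                       max(col - radius, 0):col + radius + 1]
        ring_size = window.size - inner_size
        if ring_size == 0:
            break  # the window already covers the whole image
        # Step 2: ring sum = window sum minus the sum at the previous radius
        window_sum = float(np.sum(window))
        # Step 3: mean pixel value over the ring
        ring_mean = (window_sum - inner_sum) / ring_size
        # Step 4: stop when the ring mean falls below the threshold
        if ring_mean < threshold:
            break
        inner_sum, inner_size = window_sum, window.size
        radius += 1
    # The last accepted radius and the sum of all accepted pixels
    return radius - 1, inner_sum


# Toy image: a 3x3 block of value 10 with a central pixel of 20
demo = np.zeros((7, 7))
demo[2:5, 2:5] = 10.0
demo[3, 3] = 20.0
print(grow_cluster(demo, 3, 3, threshold=5.0))  # (1, 100.0)
```

Keeping the previous window sum and size avoids building an explicit ring mask: each ring is obtained by subtracting two square sums.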

Note

For the successful validation of this exercise, you must use as the threshold value:

background + (6.0 * dispersion)

If the cluster contains 9 pixels or fewer (radius <= 1), it is discarded. Otherwise, the cluster is valid: the radius is its extension and the integral of the pixel values is its luminosity.
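The threshold and the validity rule can be written as two small helpers. The names cluster_threshold and is_valid_cluster are hypothetical, and the sketch assumes that the extension is the last radius accepted by the collection process (so an extension of 1 corresponds to a 3 x 3 block of 9 pixels).

```python
def cluster_threshold(background, dispersion):
    """Threshold required for validation: background + 6 sigma."""
    return background + (6.0 * dispersion)


def is_valid_cluster(extension):
    """A cluster of 9 pixels or fewer (extension <= 1) is discarded."""
    return extension > 1


# Made-up background and dispersion values, for illustration only
print(cluster_threshold(100.0, 5.0))                 # 130.0
print(is_valid_cluster(1), is_valid_cluster(2))      # False True
```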

Note

For a few peaks, the value of the central pixel may be slightly below the threshold. This is an effect of the image noise, which was previously reduced by the convolution. Even in this case, try to build the cluster: the first ring (radius = 1) will probably be above the threshold, and the cluster is valid.

Note

You can use the method proposed at the end of the previous exercise to check that the clusters you found and their positions match what you can see on the image.

Then for every cluster found, you must instantiate a Cluster class object and store its characteristics, i.e.:

  • the pixel coordinates of the peak

  • the pixel value of the peak (top)

  • the integral of all pixel values (luminosity)

  • the radius (extension)

The process has to be repeated for every peak. It is suggested to implement this process in a Python function named build_clusters(). The figure below illustrates the result of the cluster pixel collection:

_images/cluster06.png
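One possible shape for build_clusters() is sketched below. Only the function name comes from this exercise: its parameters, the helper _grow, the Cluster field names and the assumption that peaks is a list of (row, column) tuples are all illustrative. The sub-image is clipped at the image borders, which is also an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Cluster:
    """Characteristics of one cluster (hypothetical field names)."""
    row: int
    column: int
    top: float
    integral: float
    extension: int


def _grow(image, row, col, threshold):
    """Return (extension, integral) using the ring-mean criterion."""
    total, size, radius = float(image[row, col]), 1, 1
    while True:
        window = image[max(row - radius, 0):row + radius + 1,
                       max(col - radius, 0):col + radius + 1]
        ring_size = window.size - size
        if ring_size == 0:
            break  # the window already covers the whole image
        window_sum = float(np.sum(window))
        if (window_sum - total) / ring_size < threshold:
            break  # ring mean below threshold: stop collecting
        total, size, radius = window_sum, window.size, radius + 1
    return radius - 1, total


def build_clusters(image, peaks, threshold):
    """Build one Cluster per peak, discarding those with extension <= 1."""
    clusters = []
    for row, col in peaks:
        extension, integral = _grow(image, row, col, threshold)
        if extension <= 1:
            continue  # 9 pixels or fewer: not a valid cluster
        clusters.append(
            Cluster(row, col, float(image[row, col]), integral, extension))
    return clusters


# Toy image: one 5x5 object centred on (5, 5); (1, 1) is a fake peak in the background
toy = np.zeros((11, 11))
toy[3:8, 3:8] = 10.0
toy[5, 5] = 30.0
found = build_clusters(toy, [(5, 5), (1, 1)], threshold=5.0)
print(len(found), found[0])
```

In the actual exercise, the image, the peak list, the background and the dispersion would come from the modules written for the previous exercises.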

Signature

  • The number of clusters found in the image:

    signature_fmt_1 = 'RESULT: clusters_number = {:d}'
    

If you want to check your program with the common image common.fits, here is the expected signature:

RESULT: clusters_number = 48

Step 2: sort the clusters#

To complete this clustering process, every cluster has to be stored in a sorted list.

  • The list must be sorted in descending order.

  • The sorting key should be a combination of the integral and the peak pixel value (to differentiate clusters with the same integral).
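A tuple key is one simple way to combine the two criteria: Python compares tuples element by element, so the peak value is only used when two integrals are equal. The Cluster fields below are hypothetical and reduced to what the sort needs.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Reduced cluster with only the fields used for sorting (hypothetical names)."""
    integral: float
    top: float


# Made-up clusters; the first and last share the same integral
clusters = [Cluster(500.0, 40.0), Cluster(900.0, 70.0), Cluster(500.0, 55.0)]

# Descending order on (integral, top)
sorted_clusters = sorted(clusters, key=lambda c: (c.integral, c.top), reverse=True)

for cluster in sorted_clusters:
    print(cluster.integral, cluster.top)
```

With these values the order is (900.0, 70.0), (500.0, 55.0), (500.0, 40.0). Calling clusters.sort() with the same key and reverse arguments would sort the list in place instead of creating a new one.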

Signature

  • For the cluster with the highest integrated luminosity:

    • Its integrated luminosity

    • The pixel value of the peak

    • The pixel coordinates of the peak (image coordinate system)

    • The extension of the cluster (in pixels)

    signature_fmt_2 = 'RESULT: cluster_max_integral = {:d}'
    signature_fmt_3 = 'RESULT: cluster_max_top = {:d}'
    signature_fmt_4 = 'RESULT: cluster_max_column = {:d}'
    signature_fmt_5 = 'RESULT: cluster_max_row = {:d}'
    signature_fmt_6 = 'RESULT: cluster_max_extension = {:d}'
    

If you want to check your program with the common image common.fits, here is the expected signature:

RESULT: cluster_max_integral = 1868237
RESULT: cluster_max_top = 19148
RESULT: cluster_max_column = 171
RESULT: cluster_max_row = 111
RESULT: cluster_max_extension = 6
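The signature lines can be produced from the first element of the sorted list. In the sketch below, the helper max_cluster_signature and the attribute names are hypothetical, and the values are made up. Note that the '{:d}' format only accepts integers (a Python float raises ValueError), so the values are converted with int(); this assumes the pixel sums are whole numbers.

```python
from types import SimpleNamespace

signature_fmt_2 = 'RESULT: cluster_max_integral = {:d}'
signature_fmt_3 = 'RESULT: cluster_max_top = {:d}'
signature_fmt_4 = 'RESULT: cluster_max_column = {:d}'
signature_fmt_5 = 'RESULT: cluster_max_row = {:d}'
signature_fmt_6 = 'RESULT: cluster_max_extension = {:d}'


def max_cluster_signature(cluster):
    """Return the five signature lines for the given cluster."""
    return [
        signature_fmt_2.format(int(cluster.integral)),
        signature_fmt_3.format(int(cluster.top)),
        signature_fmt_4.format(int(cluster.column)),
        signature_fmt_5.format(int(cluster.row)),
        signature_fmt_6.format(int(cluster.extension)),
    ]


# Stand-in for sorted_clusters[0], with made-up values
brightest = SimpleNamespace(integral=270.0, top=30.0, row=5, column=7, extension=2)
lines = max_cluster_signature(brightest)
for line in lines:
    print(line)
```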