NPAC Computing Course#

The purpose of these 5 days is primarily to introduce some programming concepts and the software development life cycle, through the widely used Python language:

  • to understand how a scientific application is developed, tested, built, assembled and documented

  • to study and practice some of the productivity tools that can be used to drive the development process

  • to illustrate, through a realistic example, how some development tools can be used.

Even though languages and tools may be different in the future, you will likely all be using computer science if you go on to a PhD, data science or engineering. You should always focus on the bigger picture and extract general knowledge rather than implementation details:

  • Writing structured code

  • Managing code history and versions

  • Documenting code

  • Analyzing code quality

  • Testing code

Time flies… What won’t be addressed in this course:

  • How to conceptually design algorithms or data structures

  • How to deploy the application onto user environments

  • The theory of computing languages

  • Physics issues

Organization of the course#

The first two thirds of the course consist of six mandatory exercises. For each exercise, you will write a program producing a formatted output (the signature), which we will automatically compare to the expected one to check that you answered the questions successfully. This only checks that you computed the expected values; most importantly, we will also review the way you did it: comments, documentation, your commit messages when versioning new code, etc. For the last third of the course, you will have to choose among a few loosely specified projects.

During the course, you’ll be using:

  • development tools that we will present in the First Steps section;

  • various technical facilities related to the physics context (PyPlot, Numpy, WCS, Simbad) that will be introduced during the exercises, when we start using them.

An oral presentation will be given on each tool or facility at the beginning of or during the course, introducing the main features required to complete the exercises. These presentations are available under the Slides area of the documentation navigation menu: you are invited to refer to them as often as needed until you are comfortable with a tool. The Python background you need is covered under the Python Notebooks area in the right sidebar. When you are looking for specific information, we recommend that you always start with these presentations, which have been tailored to your needs, rather than with a more general or complete tutorial.

There are many productivity tools for software development. We selected a few that you will have to use throughout this course:

  • PyCharm (development, debugging)

  • Git (code & version management)

  • SonarQube (code quality assessment)

The implementation language that you will use is Python. During the exercises, you will develop some scientific applications built around a few well-known scientific libraries:

  • numpy (data manipulation)

  • pyplot (data visualization)

  • scipy (algorithms)

Pedagogical assumptions#

We have to cope with the very heterogeneous technical backgrounds of the students, so we run a set of progressive exercises, starting from a basic level and moving up to more complex situations. Advanced students have the opportunity to explore more detailed features. All tools and libraries offer freely accessible tutorials and manuals. The theme of the exercises is astrophysics, but this course does not require any particular expertise in this field.

Scientific themes#

During this course, we’ll build an application to manipulate astrophysics images downloaded from the ESO Digital Sky Survey (DSS) database. The images come from the Oschin Schmidt Telescope on Palomar Mountain. We’ll study:

  • How to display images

  • How to build and manage a graphical application

  • How to build and manage an interactive application

  • How to extract some scientific information from the images like:

    • Star identification and coordinates

    • Statistics information on celestial objects

Exercises#

This lecture is built around a set of exercises of increasing complexity that will allow you to discover and practice the themes mentioned above and described in the navigation menu displayed on the top-right side of every page.

The main steps accomplished during the exercises are:

  • reading one FITS image + analysis/discovery of the FITS format;

  • plotting the image + discovering the pyplot mechanisms (simple plot, complex canvas, …);

  • interactive widgets + callback functions;

  • background analysis + Gaussian fit;

  • “object” finding;

  • object management: classes, lists, dictionaries;

  • adding statistical analysis on objects (histogramming, Gaussian fit onto each cluster);

  • coordinate conversion: from pixels to global sky coordinates using wcslib;

  • accessing a public catalog to associate objects to celestial bodies, labeling the brightest ones on the plot;

  • assembling a complex application.
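To give a flavour of the "object finding" and "object management" steps listed above, here is a minimal pure-Python sketch. It is not the algorithm used in the exercises: it simply assumes that an "object" is a group of adjacent pixels whose values exceed a threshold, and it stores each object in a dictionary. The function name `find_objects`, the threshold and the toy image are all hypothetical.

```python
def find_objects(image, threshold):
    """Group pixels strictly above `threshold` into 4-connected clusters.

    `image` is a 2D list of numbers. Returns a list of dictionaries
    (one per cluster), sorted with the brightest peak first.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    seen = set()
    objects = []
    for row in range(n_rows):
        for col in range(n_cols):
            if (row, col) in seen or image[row][col] <= threshold:
                continue
            # Flood fill: collect every pixel connected to this seed pixel.
            stack = [(row, col)]
            seen.add((row, col))
            pixels = []
            while stack:
                r, c = stack.pop()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and (nr, nc) not in seen
                            and image[nr][nc] > threshold):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            # Summarise the cluster by its brightest pixel and its size.
            peak = max(pixels, key=lambda p: image[p[0]][p[1]])
            objects.append({
                "peak": peak,
                "peak_value": image[peak[0]][peak[1]],
                "size": len(pixels),
            })
    objects.sort(key=lambda obj: obj["peak_value"], reverse=True)
    return objects


# Toy image containing two small bright regions on a dark background.
toy_image = [
    [0, 0, 0, 0, 0],
    [0, 5, 9, 0, 0],
    [0, 0, 0, 0, 7],
    [0, 0, 0, 0, 6],
]
for found in find_objects(toy_image, threshold=4):
    print(found)
```

In the exercises themselves, images are larger and the libraries introduced during the course (numpy, scipy) are the natural tools; the sketch only shows how a list of dictionaries can represent the objects found.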

The following remarks apply to every exercise:

  • Git, PyCharm, tests, code quality (SonarQube), the debugger, etc. are meant to be used everywhere, at every step.

  • We provide short presentations introducing all the concepts, algorithms, techniques and tools used in the exercises. In particular, Python features will be introduced progressively, with a short presentation on each programming feature.

  • It is strongly recommended to use the PyCharm debugger to troubleshoot your applications. It may also be used as a data browser/explorer.

  • Each exercise (except First Steps) has two parts: a batch part, to be written in a dedicated application file for each exercise, and a graphical part, to be added to a display.py file which you will progressively extend one exercise at a time.

For the batch part of each exercise, a file is provided with skeleton code to be completed. We call it “batch” because the application does not interact with the user: it is expected to read an image, make some computations, and finally print a few key results on the terminal. You are given the expected results for the common image, which is shared by all students, but you do not know which values are expected for your specific image. Once you think your code is correct, you must push it to your reference Git repository (how to do this is explained in First Steps). This will trigger validation tests, including a check that your output is correct for your specific image.

For the graphical part of each exercise, you are expected to progressively extend a file named display.py. You will reuse the same file each time, adding new features with each exercise. There is no automatic validation of this part: your final code will be run and examined by the instructors at the end of the course. Your graphical application will reuse most of the computation code written for the batch parts of the exercises. It is highly recommended to put the reusable code in separate files (Python modules), to be imported by both the batch applications and the graphical application.
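The recommendation to keep reusable computation code in separate modules could look like the sketch below. The module name `lib_stats.py`, the function `pixel_statistics` and the value names printed are hypothetical and only serve the illustration; the skeleton files provided with the exercises define the actual structure.

```python
"""Sketch of a reusable module, e.g. saved as lib_stats.py (hypothetical name)."""


def pixel_statistics(pixels):
    """Return (minimum, maximum, mean) of a non-empty 2D list of pixel values."""
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("the image contains no pixel")
    return min(flat), max(flat), sum(flat) / len(flat)


def main():
    """Batch entry point: compute the values, then print them on the terminal."""
    # Toy data standing in for an image that a real exercise would read from a file.
    image = [[1, 2, 3], [4, 5, 6]]
    low, high, mean = pixel_statistics(image)
    # The value names below are placeholders, not names required by any exercise.
    print("RESULT: minimum = {:d}".format(low))
    print("RESULT: maximum = {:d}".format(high))
    print("RESULT: mean = {:.5f}".format(mean))


if __name__ == "__main__":
    main()
```

Because the computation lives in a plain function with no printing and no plotting, a graphical application such as display.py could reuse it with `from lib_stats import pixel_statistics` (assuming that file name), while the `if __name__ == "__main__":` guard keeps the batch printouts from running on import.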

Signatures#

For the batch part of each exercise, you will have to complete a Python file so that it computes key values and finally prints them on the terminal. We call this output the signature throughout the documentation. It reflects the correctness of the results obtained.

The main characteristics of the signature are:

  • This information must obey a very precise format. Each exercise describes the required format for its specific signature lines. In particular, in order to distinguish the signature printouts, the format always starts with the word RESULT: (don’t forget the colon). This keyword separates the signature lines from all the other printouts that your applications might produce.

  • The signature printouts will be automatically analyzed at each push by a dedicated script, launched on the Git server. You must check the result (how to do this is explained in First Steps) and fix problems if the test doesn’t succeed. This will be part of the grading.

  • Remember that the signature is not meant to carry debug printouts. For troubleshooting, it is better to use the PyCharm debugger than to add print statements to your code (although producing informational or debug printouts is of course always possible).

  • Generally, the signature format has the following structure:

    • the signature is made of one or several individual lines

    • each piece of information has a name and a value (the name syntax is mandatory, is specified for every exercise, and should be strictly respected)

    • the expected format should be exactly of the form: RESULT: <name> = <value>

      • each name is followed by an equals sign

      • all spaces are ignored in the line

      • you should generally use the following specific Python formats for the values:

        • {:s} for string values

        • {:.10f} for small real values

        • {:.5f} for real values

        • {:.0f} for big real values

        • {:d} for integers

        • {:02d} for integers which are ranks

  • some examples:

    # for a textual value
    print('RESULT: value_name1 = {:s}'.format(a_str_value))
    
    # for a real value
    print('RESULT: value_name2 = {:.10f}'.format(a_float_value))
    
    # for an integer value
    print('RESULT: value_name3 = {:d}'.format(a_int_value))
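
The format rules above can be wrapped in a small helper, sketched below. Both functions are hypothetical: `format_result` picks a format from the Python type of the value, following the list above, and `extract_signature` only imitates what a checking script might do (keep the lines starting with RESULT: and ignore all spaces). The actual validation script running on the Git server is not described here and may behave differently.

```python
def format_result(name, value, rank=False):
    """Return one signature line of the form 'RESULT: <name> = <value>'."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("booleans are not a supported signature value")
    if isinstance(value, str):
        text = "{:s}".format(value)
    elif isinstance(value, int):
        text = "{:02d}".format(value) if rank else "{:d}".format(value)
    elif isinstance(value, float):
        # Default precision; small or big reals may need {:.10f} or {:.0f}.
        text = "{:.5f}".format(value)
    else:
        raise TypeError("unsupported value type: {!r}".format(type(value)))
    return "RESULT: {:s} = {:s}".format(name, text)


def extract_signature(output):
    """Collect {name: value} from the RESULT lines of a program output."""
    results = {}
    for line in output.splitlines():
        compact = line.replace(" ", "")  # all spaces are ignored
        if not compact.startswith("RESULT:"):
            continue  # informational or debug printout, not part of the signature
        name, _, value = compact[len("RESULT:"):].partition("=")
        results[name] = value
    return results


# Placeholder names and values, for illustration only.
lines = [
    "some debug printout",
    format_result("value_name1", "abc"),
    format_result("value_name2", 3.14159265),
    format_result("value_name3", 3, rank=True),
]
print("\n".join(lines))
print(extract_signature("\n".join(lines)))
```

Centralising the formatting in one function makes it harder to mistype the RESULT: keyword or the equals sign in one exercise but not in another.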
    

Projects#

In addition to the core exercises (#1 to #5), which cover all the technical aspects that you will learn during this course, we also propose a few projects (#a to #d) that allow you to go beyond the basics covered in the exercises. Depending on your background at the beginning of the course, you may complete only one of them or several. Explanations for the projects are deliberately very limited, and much room is left for creativity. Note that the projects are not optional; however, somebody with little computing knowledge and experience is not expected to have the time to do them all. These projects are part of the grading (around 20% of the total grade).

A few important remarks regarding the projects:

  • You should not start a project before the successful completion (and validation) of the first six exercises.

  • The implementation of these projects is intentionally less documented: you are expected to have enough experience when you start the projects to design your own implementation.

  • There is no standard result expected and thus no signature to validate.

  • Regarding the grading, you are expected, at the very least, to indicate how you would approach the solution for at least one of the projects.