Welcome to ZAP’s documentation!

Tired of sky subtraction residuals? ZAP them!

ZAP (the Zurich Atmosphere Purge) is a high-precision sky subtraction tool which can be used as a complete sky subtraction solution, or as an enhancement to previously sky-subtracted MUSE integral field spectroscopic data. The method uses PCA to isolate the residual sky subtraction features and remove them from the observed datacube. Although the operation of ZAP does not depend on perfect flatfielding of the data in a MUSE exposure, better results are obtained when these corrections are applied ahead of time. Future development will include expansion to more instruments.


Version 2.0 is now compatible with the WFM-AO mode, and also brings major improvements to the sky subtraction. See the Changelog section below for details, as well as the discussion on the optimal number of eigenvectors.

The paper describing the original method can be found here: http://adsabs.harvard.edu/abs/2016MNRAS.458.3210S

Please cite ZAP as:

\bibitem[Soto et al.(2016)]{2016MNRAS.458.3210S} Soto, K.~T., Lilly, S.~J., Bacon, R., Richard, J., \& Conseil, S.\ 2016, \mnras, 458, 3210


ZAP requires the following packages:

  • Numpy (1.6.0 or later)
  • Astropy (1.0 or later)
  • SciPy (0.13.3 or later)
  • Scikit-learn

Many linear algebra operations are performed in ZAP, so it can be beneficial to use an optimized BLAS package. In the Anaconda distribution, Numpy is linked against MKL by default, which can amount to a 20% speedup of ZAP.
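If you are unsure which BLAS your Numpy build is linked against, a quick check (plain Numpy, nothing ZAP-specific) is:

```python
import numpy as np

# Print which BLAS/LAPACK libraries this Numpy build is linked against
# (MKL or OpenBLAS show up here on Anaconda installs).
np.show_config()

# Exercise the linked BLAS with a small matrix product; ZAP's PCA spends
# much of its time in dense linear algebra like this.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 500))
b = a @ a.T  # symmetric product, dispatched to the BLAS
print(b.shape)
```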

The last stable release of ZAP can be installed simply with pip:

pip install zap

Or into the user path with:

pip install --user zap


In its most hands-off form, ZAP can take an input FITS datacube, operate on it, and output a final FITS datacube. The main function to do this is zap.process:

import zap
zap.process('INPUT.fits', outcubefits='OUTPUT.fits')

Care should be taken, however, since this case assumes a sparse field, and better results can be obtained by applying masks.

There are a number of options that can be passed to the code which we describe below.

Sparse Field Case

This refers to the case where the sky can be measured in the science frame itself, using:

zap.process('INPUT.fits', outcubefits='OUTPUT.fits')

The code will write the processed datacube to the given output file (by default, DATACUBE_ZAP.fits in the current directory). While this can work well in the case of very faint sources, masks can improve the results.

For the sparse field case, a mask file can be included, which is a 2D FITS image matching the spatial dimensions of the input datacube. Masks are defined to be >= 1 on astronomical sources and 0 at the positions of the sky. Set this parameter with the mask keyword:

zap.process('INPUT.fits', outcubefits='OUTPUT.fits', mask='mask.fits')
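Such a mask is often built by thresholding a white-light image (the per-spaxel sum of the cube along the wavelength axis). A minimal sketch, where the threshold rule is an arbitrary illustration rather than a recommended recipe:

```python
import numpy as np

# Stand-in for a real white-light image: flat sky plus one bright source.
white = np.ones((40, 40))
white[18:22, 18:22] = 50.0

# Flag spaxels well above the sky level as sources (1), leave sky at 0,
# as ZAP expects. The "median + 5 sigma" cut is purely illustrative.
threshold = np.median(white) + 5 * np.std(white)
mask = (white > threshold).astype(np.int16)

# The 2D image would then be written out with astropy, e.g.:
# from astropy.io import fits
# fits.writeto('mask.fits', mask, overwrite=True)
```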

Filled Field Case

This approach can also address the saturated field case and is robust against strong emission lines. Here the input is an offset sky observation: we calculate the SVD on this external sky frame using the function zap.SVDoutput.

An example of running the code in this way is as follows:

extSVD = zap.SVDoutput('Offset_Field_CUBE.fits', mask='mask.fits')
zap.process('Source_cube.fits', outcubefits='OUTPUT.fits', extSVD=extSVD)

The integration time of this frame does not need to match that of the object exposure; a 2-3 minute exposure is sufficient.

Optimal number of eigenvectors

The main difficulty in obtaining a high-quality sky subtraction is finding the optimal number of eigenvalues to use. ZAP provides an automated way to do this, by trying to find the inflexion point of the variance curve. This is one way to do it, but there is no single right answer: using more eigenvalues for the reconstruction gives a better sky subtraction, but at the risk of subtracting signal from strong emission lines.

The first thing one can do to optimize the PCA quality is to use a good mask, to avoid incorporating signal from astronomical sources into the eigenvectors. It is then highly recommended to look at the explained variance curves (which can be saved with the varcurvefits parameter) and at the selected number of eigenvalues (saved in the FITS headers in ZAPNEV*). It is also possible to use the interactive mode (see below) to try different numbers of eigenvectors. This number can be specified manually with the neval parameter.
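The general idea of locating the inflexion point (knee) of a decreasing variance curve can be sketched as follows. This is a common distance-to-chord heuristic, not ZAP's exact algorithm; the curve here is synthetic:

```python
import numpy as np

def knee_point(variance):
    """Index of the knee of a decreasing variance curve: the point
    farthest below the straight line joining the curve's endpoints."""
    v = np.asarray(variance, dtype=float)
    n = len(v)
    x = np.arange(n)
    # chord from (0, v[0]) to (n - 1, v[-1])
    line = v[0] + (v[-1] - v[0]) * x / (n - 1)
    # knee = largest vertical distance between chord and curve
    return int(np.argmax(line - v))

# synthetic explained-variance curve: steep drop, then a flat tail
curve = np.concatenate([np.linspace(1.0, 0.1, 10), np.full(40, 0.08)])
print(knee_point(curve))
```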

Command Line Interface

ZAP can also be used from the command line:

python -m zap INPUT_CUBE.fits

More information on the command line interface can be found with the command

python -m zap -h

Interactive mode

ZAP can be used interactively from the Python console:

import zap
zobj = zap.process('INPUT.fits', interactive=True)

When called with interactive=True, zap.process operates on the datacube and returns a Zap object, which retains all of the data and methods necessary to produce a final datacube. You can then investigate the data products via this object, and even reprocess the cube with a different number of eigenspectra per region. A workflow may go as follows:

import zap
from matplotlib import pyplot as plt

# allow ZAP to run the optimize routine
zobj = zap.process('INPUT.fits', interactive=True)

# plot the variance curves and the selection of the number of eigenspectra used
zobj.plotvarcurve()

# plot a spectrum extracted from the original cube
plt.plot(zobj.cube[:,50:100,50:100].sum(axis=(1,2)), 'b', alpha=0.3)

# plot a spectrum of the cleaned ZAP dataproduct
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2)), 'g')

# choose just the first 3 spectra for all segments
zobj.reprocess(nevals=3)

# plot a spectrum extracted from the original cube
plt.plot(zobj.cube[:,50:100,50:100].sum(axis=(1,2)), 'b', alpha=0.3)

# plot a spectrum of the cleaned ZAP dataproduct
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2)), 'g')

# choose some number of modes by hand
zobj.reprocess(nevals=5)

# plot a spectrum
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2)), 'k')

# use the optimization algorithm to identify the best number of modes per segment
zobj.optimize()

# compare to the previous versions
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2)), 'r')

# identify a pixel in the dispersion axis that shows a residual feature in
# the original
plt.matshow(zobj.cube[2903,:,:])

# compare this to the zap dataproduct
plt.matshow(zobj.cleancube[2903,:,:])

# write the processed cube as a single extension FITS
zobj.writecube('DATACUBE_ZAP.fits')

# or merge the zap datacube into the original input datacube, replacing the
# data extension
zobj.writefits(outcubefits='DATACUBE_FINAL_ZAP.fits')


2.0 (2017-09-08)

  • Compatibility with the WFM-AO mode.

  • Use Scikit-learn’s PCA implementation instead of the custom one. This solved an issue with spatial variations introduced by ZAP, and it is also much faster than the previous implementation. A drawback, however, is that it is no longer possible to save the SVD to a file.

  • Use only one sky segment by default, which means that the cube is no longer split along the wavelength axis. Originally ZAP used 11 segments, whose goal was to form coherent groups of sky emission lines with a smaller number of eigenvalues per segment; this also allowed the computation to be parallelized. But the segments were also responsible for continuum oscillations, and they made the choice of the number of eigenvalues per segment very difficult and very sensitive. With only one segment the sky subtraction performs much better, thanks to the higher correlation between sky lines over the whole wavelength range.

    Using only one segment therefore greatly reduces the risk of removing an emission line with the new PCA. It is still possible to modify the segments if needed:

    from zap.zap import SKYSEG
    SKYSEG[:] = [0, 5400, ..., 10000]
  • New continuum filter type with a polynomial fit (cftype='fit'). It must be used with care, though, as the fit can easily go out of control in the red part of the spectrum.

  • Change the default median filter width to 300, for the median and weight continuum filters. The values used previously (100 and 50) were too small, which explains the background oscillations in the red part of the spectra.

  • New parameter (ncpu) to set the number of CPUs used.

  • Remove the possibility to save the SVD result in a FITS file, due to the change of PCA implementation. It is still possible to pass the SVD computed by SVDoutput to process, as an in-memory object, without saving it to disk.

  • Speed improvements, mostly thanks to the new PCA implementation.

  • New parameter to save the explained variance curves.

1.0 (2016-04-02)

First public release. This is the version described in the 2016MNRAS.458.3210S paper.