Welcome to ZAP’s documentation!

Tired of sky subtraction residuals? ZAP them!

ZAP (the Zurich Atmosphere Purge) is a high precision sky subtraction tool which can be used as complete sky subtraction solution, or as an enhancement to previously sky-subtracted MUSE integral field spectroscopic data. The method uses PCA to isolate the residual sky subtraction features and remove them from the observed datacube. Though the operation of ZAP is not dependent on perfect flat-fielding of the data in a MUSE exposure, better results are obtained when these corrections are made ahead of time. Future development will include expansion to more instruments.


Version 2.0 is compatible with the WFM-AO mode, and also brings major improvements for the sky subtraction. Check below the details in the Changelog section as well as the dicussion on the Optimal number of eigenvectors.

Version 2.1 brings compatibility with the NFM-AO mode.

The paper describing the original method can be found here: http://adsabs.harvard.edu/abs/2016MNRAS.458.3210S

Please cite ZAP as:

\bibitem[Soto et al.(2016)]{2016MNRAS.458.3210S}
    Soto, K.~T., Lilly, S.~J., Bacon, R., Richard, J., \& Conseil, S.
    2016, \mnras, 458, 3210


ZAP requires the following packages:

  • Numpy
  • Astropy
  • SciPy (0.18.1 or later is recommend as a SVD convergence issue was found with an older version)
  • Scikit-learn

Many linear algebra operations are performed in ZAP, so it can be beneficial to use an alternative BLAS package. In the Anaconda distribution, the default BLAS comes with Numpy linked to MKL, which can amount to a 20% speedup of ZAP.

The last stable release of ZAP can be installed simply with pip:

pip install zap

Or into the user path with:

pip install --user zap


In its most hands-off form, ZAP can take an input FITS datacube, operate on it, and output a final FITS datacube. The main function to do this is zap.process:

import zap
zap.process('INPUT.fits', outcubefits='OUTPUT.fits')

Care should be taken, however, since this case assumes a sparse field, and better results can be obtained by applying masks.

There are a number of options that can be passed to the code which we describe below.

Sparse Field Case

This case specifically refers to the case where the sky can be measured in the sky frame itself, using:

zap.process('INPUT.fits', outcubefits='OUTPUT.fits')

In both cases, the code will create a resulting processed datacube named DATACUBE_ZAP.fits in the current directory. While this can work well in the case of very faint sources, masks can improve the results.

For the sparse field case, a mask file can be included, which is a 2D FITS image matching the spatial dimensions of the input datacube. Masks are defined to be >= 1 on astronomical sources and 0 at the position of the sky. Set this parameter with the mask keyword

zap.process('INPUT.fits', outcubefits='OUTPUT.fits', mask='mask.fits')

Filled Field Case

This approach also can address the saturated field case and is robust in the case of strong emission lines, in this case the input is an offset sky observation. To achieve this, we calculate the SVD on an external sky frame using the function zap.SVDoutput.

An example of running the code in this way is as follows:

extSVD = zap.SVDoutput('Offset_Field_CUBE.fits', mask='mask.fits')
zap.process('Source_cube.fits', outcubefits='OUTPUT.fits', extSVD=extSVD)

The integration time of this frame does not need to be the same as the object exposure, but rather just a 2-3 minute exposure.

Optimal number of eigenvectors

The major difficulty to get a high quality sky subtraction is to find the optimal number of eigenvalues to use. ZAP provides an automated way for this, trying to find the inflexion point of the variance curve. This is one way to do it, but there is no right answer to this issue. A higher number of eigenvalues used for the reconstruction will give a better sky subtraction, but with the risk of subtracting signal from strong emission lines.

The first thing one can do to optimize the PCA quality is to use a good mask, to avoid incorporating signal from astronomical sources in the eigenvectors. Then it is highly recommended to have a look at the explained variance curves (which can be saved with the varcurvefits parameter) and the selected number of eigenvalues (saved in the FITS headers in ZAPNEV*). It is also possible to use the interactive mode (see below) to try different number of eigenvectors. This number can be specified manually with the neval parameter.

Strong values at edges

Because of atmospheric refraction the cube edges are different depending on the wavelength, which means that the spectra at the edges contain many NaN values. ZAP filters out these spaxels (when the spectra have more than 25% of NaN values), because it cannot process incomplete spectra. So these spectra are not sky-subtracted and appear with a stronger flux in the output cube or image.

The zap.mask_nan_edges function allows to mask these spectra, detecting the ones with too many NaNs, and replacing them with NaNs.

Command Line Interface

ZAP can also be used from the command line:

python -m zap INPUT_CUBE.fits

More information use of the command line interface can be found with the command

python -m zap -h

Interactive mode

ZAP can be used interactively from the Python console:

import zap
zobj = zap.process('INPUT.fits', interactive=True)

The run method operates on the datacube, and retains all of the data and methods necessary to process a final data cube in a Python class named Zap. You can elect to investigate the data product via the Zap object, and even reprocess the cube with a different number of eigenspectra per region. A workflow may go as follows:

import zap
from matplotlib import pyplot as plt

# allow ZAP to run the optimize routine
zobj = zap.process('INPUT.fits', interactive=True)

# plot the variance curves and the selection of the number of eigenspectra used

# plot a spectrum extracted from the original cube
plt.plot(zobj.cube[:,50:100,50:100].sum(axis=(1,2)), 'b', alpha=0.3)

# plot a spectrum of the cleaned ZAP dataproduct
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2)), 'g')

# choose just the first 3 spectra for all segments

# plot a spectrum extracted from the original cube
plt.plot(zobj.cube[:,50:100,50:100].sum(axis=(1,2)), 'b', alpha=0.3)

# plot a spectrum of the cleaned ZAP dataproduct
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2))), 'g')

# choose some number of modes by hand

# plot a spectrum
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2))), 'k')

# Use the optimization algorithm to identify the best number of modes per segment

# compare to the previous versions
plt.plot(zobj.cleancube[:,50:100,50:100].sum(axis=(1,2))), 'r')

# identify a pixel in the dispersion axis that shows a residual feature in
# the original

# compare this to the zap dataproduct

# write the processed cube as a single extension FITS

# or merge the zap datacube into the original input datacube, replacing the
# data extension


2.1 (2019-07-03)

  • Zap now requires Python 3.5 or later, and Astropy 2.0 or later.
  • Compatibility with the NFM-AO mode.
  • Ignore the notch filter region in the ‘median’ and ‘fit’ methods for the continuum filter (cftype).
  • Change the default continuum filter method (cftype) to ‘median’ and remove the ‘weight’ method. The previous default method (‘weight’) has some problems with the treatment of spectra edges and with the notch filter edges for the AO modes, and is also introducing bumps in the red wavelengths.
  • Add a command-line argument for nevals.
  • Scipy 0.18.1 or later is required, because of SVD convergence issue with previous versions.
  • Add a function to mask the cube edges (zap.mask_nan_edges): the spaxels at the edges of the cube may have a lot of NaNs in their spectra, so ZAP does not subtract the sky on these which leaves high residuals.

2.0 (2017-09-08)

  • Compatibility with the WFM-AO mode.

  • Use Scikit-learn’s implementation for the PCA instead of the custom one. This solved an issue with spatial variations introduced by zap. Also it is much faster than the previous implementation. A drawback however is that it is no more possible to save the SVD file.

  • Use only one sky segment by default, which means that the cube is no more split on the wavelength axis. Originally zap used 11 segments, whose goals were to have coherent groups of sky emission lines, with a smaller number of eigenvalues per segment. And it also allowed to parallelize the computation. But, the segments were also responsible for continuum oscillations, and made the choice of the number of eigenvalues per segment very difficult and very sensitive. With only one segment the performance of the sky subtraction is much better, thanks to the higher correlation between sky lines on the whole wavelength range.

    So using only one segment allows to greatly reduce the risk of killing an emission line with the new PCA. It is still possible to modify the segments if needed:

    from zap.zap import SKYSEG
    SKYSEG[:] = [0, 5400, ..., 10000]
  • New continuum filter type with a polynomial fit (cftype='fit'). Must be used with care though, as the fit can easily go out of control in the red part of the spectrum.

  • Change the default median filter width to 300, for the median and weight continuum filters. The values used previously, 100 and 50, were too small which explains the background oscillations in the red part of the spectra.

  • New parameter (ncpu) to set the number of used CPU.

  • Remove the possibility to save the SVD result in a FITS file. This is because of the change of PCA implementation. It is still possible to pass the SVD computed by SVDoutput to process, passing the in-memory object directly without saving it to disk.

  • Speed improvements, mostly thanks to the new PCA implementation.

  • New parameter to save the explained variance curves.

1.0 (2016-04-02)

First public release. This is the version described in the 2016MNRAS.458.3210S paper.


zap.process(cubefits, outcubefits='DATACUBE_ZAP.fits', clean=True, zlevel='median', cftype='median', cfwidthSVD=300, cfwidthSP=300, nevals=[], extSVD=None, skycubefits=None, mask=None, interactive=False, ncpu=None, pca_class=None, n_components=None, overwrite=False, varcurvefits=None)

Performs the entire ZAP sky subtraction algorithm.

This is the main ZAP function. It works on an input FITS file and optionally writes the product to an output FITS file.

  • cubefits (str) – Input FITS file, containing a cube with data in the first extension.
  • outcubefits (str) – Output FITS file, based on the input one to propagate all header information and other extensions. Default to DATACUBE_ZAP.fits.
  • clean (bool) – If True (default value), the NaN values are cleaned. Spaxels with more then 25% of NaN values are removed, the others are replaced with an interpolation from the neighbors. The NaN values are reinserted into the final datacube. If set to False, any spaxel with a NaN value will be ignored.
  • zlevel (str) – Method for the zeroth order sky removal: none, sigclip or median (default).
  • cftype ({'median', 'fit', 'none'}) – Method for the continuum filter.
  • cfwidthSVD (int or float) – Window size for the continuum filter, for the SVD computation. Default to 300.
  • cfwidthSP (int or float) – Window size for the continuum filter used to remove the continuum features for calculating the eigenvalues per spectrum. Smaller values better trace the sources. An optimal range of is typically 20 - 50 pixels. Default to 300.
  • nevals (list) – Allow to specify the number of eigenspectra used for each segment. Provide either a single value that will be used for all of the segments, or a list of 11 values that will be used for each of the segments.
  • extSVD (Zap object) – Can be a Zap object output from SVDoutput(). If given, the SVD from this object will be used, otherwise the SVD is computed. So this allows to compute the SVD on an other field or with different settings.
  • skycubefits (str) – Path for the optional output of the sky that is subtracted from the cube. This is simply the input cube minus the output cube.
  • mask (str) – A 2D fits image to exclude regions that may contaminate the zlevel or eigenspectra. This image should be constructed from the datacube itself to match the dimensionality. Sky regions should be marked as 0, and astronomical sources should be identified with an integer greater than or equal to 1. Default to None.
  • interactive (bool) – If True, a Zap object containing all information on the ZAP process is returned, and can be used to explore the eigenspectra and recompute the output (with the reprocess() method). In this case, the output files are not saved (outcubefits and skycubefits are ignored). Default to False.
  • varcurvefits (str) – Path for the optional output of the explained variance curves.
zap.SVDoutput(cubefits, clean=True, zlevel='median', cftype='median', cfwidth=300, mask=None, ncpu=None, pca_class=None, n_components=None)

Performs the SVD decomposition of a datacube.

This allows to use the SVD for a different datacube. It used to allow to save the SVD to a file but this is no more possible. Instead it returns a Zap which can be given to the process() function.

  • cubefits (str) – Input FITS file, containing a cube with data in the first extension.
  • clean (bool) – If True (default value), the NaN values are cleaned. Spaxels with more then 25% of NaN values are removed, the others are replaced with an interpolation from the neighbors.
  • zlevel (str) – Method for the zeroth order sky removal: none, sigclip or median (default).
  • cftype ({'median', 'fit', 'none'}) – Method for the continuum filter.
  • cfwidth (int or float) – Window size for the continuum filter, default to 300.
  • mask (str) – Path of a FITS file containing a mask (1 for objects, 0 for sky).
zap.nancleanfits(cubefits, outfn='NANCLEAN_CUBE.fits', rejectratio=0.25, boxsz=1, overwrite=False)

Interpolates NaN values from the nearest neighbors.

  • cubefits (str) – Input FITS file, containing a cube with data in the first extension.
  • outfn (str) – Output FITS file. Default to NANCLEAN_CUBE.fits.
  • rejectratio (float) – Defines a cutoff for the ratio of NAN to total pixels in a spaxel before the spaxel is avoided completely. Default to 0.25
  • boxsz (int) – Defines the number of pixels around the offending NaN pixel. Default to 1, which looks for the 26 nearest neighbors which is a 3x3x3 cube.
zap.contsubfits(cubefits, outfits='CONTSUB_CUBE.fits', ncpu=None, cftype='median', cfwidth=300, clean_nan=True, zlevel='median', overwrite=False)

A standalone implementation of the continuum removal.

zap.mask_nan_edges(cube, outfile=None, plot=False, threshold=50, extname='DATA')

Mask the edges of a cube, using the number of nans in a spaxel.

At the edges of MUSE cubes, spaxels can contain many NaNs in the spectral axis. Because of this, ZAP will not subtract the sky for these spaxels, which will lead to spaxels with higher flux. This function allows to mask these spaxels, using a threshold on the percentage of NaN values in the spectral axis.

  • cube (ndarray or str) – Cube data or FITS filename.
  • outfile (str) – Path of the output (masked) cube. This can only be used if the input cube is a filename, as the DATA extension will be replaced by the masked cube.
  • plot (bool) – Show images of the different steps (default: False).
  • threshold (float) – Percentage of NaNs above which the spaxel will be masked.
  • extname (str) – Extension name for the data, defaults to DATA.

mask, cube – The computed mask and the masked cube.

Return type:

ndarray, ndarray

class zap.Zap(cubefits, pca_class=None, n_components=None)

Main class to run each of the steps of ZAP.


The final datacube after removing all of the residual features.


A 2D array containing the subtracted continuum per spaxel.


The original cube with the zlevel subtraction performed per spaxel.


A 1d array containing the wavelength solution generated from the header parameters.


WCS object with the wavelength solution.


A list of the wavelength bin limits used in segmenting the sepctrum for SVD.


A 3d boolean datacube containing True in voxels where a NaN value was replaced with an interpolation.


A 1d array containing the number of eigenvalues used per segment to reconstruct the residuals.


A normalized version of the datacube decunstructed into a 2d array.


The pixel indices of the bounding regions for each spectral segment.


A 2d array containing the reconstructed emission line residuals.


Boolean that indicates that the NaN cleaning method was used.


Boolean indicating that the zero level correction was used.


The datacube deconstructed into a 2d array for use in the the SVD.


A list of length nsegments containing variances calculated per spaxel used for normalization


The position in the cube of the spaxels that are in the 2d deconstructed stack


A 1d array containing the result of the zero level subtraction


Choose the number of eigenspectra/evals to use for reconstruction.

User supplies the number of eigen spectra to be used (neval) or the percentage of the eigenspectra that were calculated (peval) from each spectral segment to be used.

The user can either provide a single value to be used for all segments, or provide an array that defines neval or peval per segment.


Remold the continuum array so it can be investigated.

Takes the continuum stack and returns it into a familiar cube form.

make_cube_from_stack(stack, with_nans=False)

Stuff the stack back into a cube.

mergefits(outcubefits, overwrite=False)

Merge the ZAP cube into the full muse datacube and write.


Compute the optimal number of components needed to characterize the residuals.

This function calculates the variance per segment with an increasing number of eigenspectra/eigenvalues. It then deterimines the point at which the second derivative of this variance curve reaches zero. When this occurs, the linear reduction in variance is attributable to the removal of astronomical features rather than emission line residuals.


Reconstruct the residuals from a given set of eigenspectra and eigenvalues


Subtracts the reconstructed residuals and places the cleaned spectra into the duplicated datacube.


A method that redoes the eigenvalue selection, reconstruction, and remolding of the data.

writecube(outcubefits='DATACUBE_ZAP.fits', overwrite=False)

Write the processed datacube to an individual fits file.

writeskycube(skycubefits='SKYCUBE_ZAP.fits', overwrite=False)

Write the processed datacube to an individual fits file.

writevarcurve(varcurvefits='VARCURVE_ZAP.fits', overwrite=False)

Write the explained variance curves to an individual fits file.