Standard deviation and average over 1000 images with random null values

Standard deviation and average over 1000 images with random null values

We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

I have a large collection of raster files that I would like to calculate the standard deviation over. The first issue is that ArcGIS cannot process over 1000 rasters via cell statistics, so I decided to go another route by summing the squared values of all the rasters, and then getting the variance and standard deviation. The problem is that each raster may have different areas with values, and some areas with null.

If I were to try to script an iterative function that would sum the rasters, the output would be completely null since with a large quantities of these maps would eventually have null values everywhere.

I then tried using another program, Spirits ( While this software was able to provide an image like this:

The problem with this output is that the null values were converted to 0.

I am looking for a method to process a large time series of these images that ignore the null values for their calculations, and can process over 1000 images at once. Do you guys have any recommendations?

You can do this in very easily in R using the overlay function in the raster package.

For demonstration purposes I simulate a raster stack object containing all of the rasters. In a real analysis this object would be a pointer to the rasters on disk and read them in blocks to keep the problem memory safe.

library(raster) library(rgdal) r <- raster(ncols=100, nrows=100) r[] <- runif(ncell(r)) r <- stack(r) for(i in 1:6) { cat("layer",i,"
") r <- addLayer(r, r) r[[i]] <- runif(ncell(r[[i]])) }

You could create a raster stack, from rasters on disk, using the list.files function and a wildcard to read all rasters in a directory. In this example the "r" object would represent a stack of all tiff rasters in "C:/mydir".

r <- stack(list.files("C:/mydir", "tif$"))

To calculate the standard deviation you would use overlay and pass it the sd function with the na.rm = TRUE argument to remove nodata values. <- overlay(r, fun = sd, na.rm = TRUE)

Keep in mind that the mean and standard deviation assume a Gaussian distribution and with skewed distributions are no longer relevant moments. For a spatial dataset this large, skewed "at pixel" distributions are possible and, depending on the data, you could also fall into the "taking the mean of a mean" trap. In the case of needing to take the mean of a derived mean (eg., mean of monthly precipitation) you would use the harmonic mean and not the arithmetic mean.

For a distribution independent measure of variation, akin to standard deviation or variance, you could use the median absolute deviation from the median (MAD), and the median for the central tendency. The "mad" function in R adjusts the coefficient for asymptotically normal consistency.

r.mad <- overlay(r, fun = mad, na.rm = TRUE) r.median <- overlay(r, fun = median, na.rm = TRUE) plot(r.mad)

If you read the functions help, invoked using ?overlay, you will see that one of the arguments is "filename". If you specify this argument a raster will be written to disk. The format can be defined through additional arguments, specifying format, bit-type, etc… or just by the file extension (the easy way).

r.mad <- overlay(r, fun = mad, na.rm = TRUE, filename = "C:/mydir/raster_means.tif")

Here are some options.

  1. Set up a non-null counter (NNC). That is one additional image that holds the number of non-null values. Then you can just use the zero approach but use the NNC raster to divide / stats etc.
  2. The problem is inherently non-spatial so other tools like Matlab will do this task out of the box.
  3. You can actually do this in as well using the NNC method.

EDIT. So first create the binary rasters. In ArcGIS (I'd use a model to do all your rasters, actually I'd use GDAL… ) change all values to 1 and all nodata to 0. See image below, change the last value ("inputraster" to 1 in the argument. Then you can just do batches of sums in cell statistics (or just use gdal_calc) and the final value in each cell will be the total numbers of non-null values.

What you could do with the output from Spirits is to convert the 0 values to Null, either with SetNull or clip the raster to the raster using Clip (Data Management) and in that tool you can set the null value.


  1. Skippere

    Between us speaking, you should to try look in

  2. Bardarik

    I apologise, but, in my opinion, you commit an error. I can defend the position. Write to me in PM, we will talk.

  3. Nerg

    I think you are making a mistake. I propose to discuss it.

  4. Averill

    Excuse me for what I am aware of interfering ... this situation. Write here or in PM.

Write a message