ncdata.utils package

General user utility functions.

ncdata.utils.rename_dimension(ncdata, name_from, name_to)

Rename a dimension of an NcData.

This function calls ncdata.dimensions.rename, but then it also renames the dimension in all the variables which reference it, including those in sub-groups.

See: Rename

Parameters:

ncdata (NcData) – data with a top-level dimension to rename.
name_from (str) – existing name of dimension to rename.
name_to (str) – new name of dimension.

Return type:

None

Notes

The operation is in-place. To produce a new NcData with the renamed dimension, create a copy first with copy().
Unlike a simple rename(), this checks whether a dimension of the new name already exists, and if so raises an error.
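
The behaviour described above (rename at the top level, propagate to every variable including those in sub-groups, refuse a clashing name) can be sketched with plain dictionaries. This is a toy model only: `rename_dimension_sketch`, `_rename_in_variables` and the dict layout are hypothetical names invented for illustration, not the ncdata implementation, and the sketch ignores the case of a sub-group defining its own dimension of the same name.

```python
# Toy model only: plain dicts stand in for NcData groups (not the ncdata API).
def rename_dimension_sketch(group, name_from, name_to):
    """Rename a top-level dimension in-place, updating every variable using it."""
    dims = group["dimensions"]
    if name_from not in dims:
        raise KeyError(name_from)
    if name_to in dims:
        # Mirrors the documented check: refuse to overwrite an existing name.
        raise ValueError(f"dimension {name_to!r} already exists")
    # Rebuild the dict so the renamed entry keeps its original position.
    group["dimensions"] = {
        (name_to if name == name_from else name): size
        for name, size in dims.items()
    }
    _rename_in_variables(group, name_from, name_to)


def _rename_in_variables(group, name_from, name_to):
    # Update variable dimension lists here, then recurse into sub-groups.
    for var in group["variables"].values():
        var["dimensions"] = [
            name_to if d == name_from else d for d in var["dimensions"]
        ]
    for sub in group.get("groups", {}).values():
        _rename_in_variables(sub, name_from, name_to)


data = {
    "dimensions": {"x": 3, "y": 2},
    "variables": {"a": {"dimensions": ["y", "x"]}},
    "groups": {
        "inner": {"dimensions": {}, "variables": {"b": {"dimensions": ["x"]}}}
    },
}
rename_dimension_sketch(data, "x", "lon")
print(data["variables"]["a"]["dimensions"])  # ['y', 'lon']
```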

ncdata.utils.dataset_differences(dataset_or_path_1, dataset_or_path_2, check_names=False, check_dims_order=True, check_dims_unlimited=True, check_vars_order=True, check_attrs_order=True, check_groups_order=True, check_var_data=True, show_n_first_different=2, suppress_warnings=False)

Compare two netcdf datasets.

Accepts paths, pathstrings, open netCDF4.Datasets or NcData objects. File paths are opened with the netCDF4 module.

See: Equality Testing

Parameters:

dataset_or_path_1 (str or Path or netCDF4.Dataset or NcData) – First dataset to compare: either an open netCDF4.Dataset, a path to open one, or an NcData object.
dataset_or_path_2 (str or Path or netCDF4.Dataset or NcData) – Second dataset to compare: either an open netCDF4.Dataset, a path to open one, or an NcData object.
check_dims_order (bool, default True) – If False, no error results from the same dimensions appearing in a different order. However, unless suppress_warnings is True, the error string is issued as a warning.
check_vars_order (bool, default True) – If False, no error results from the same variables appearing in a different order. However, unless suppress_warnings is True, the error string is issued as a warning.
check_attrs_order (bool, default True) – If False, no error results from the same attributes appearing in a different order. However, unless suppress_warnings is True, the error string is issued as a warning.
check_groups_order (bool, default True) – If False, no error results from the same groups appearing in a different order. However, unless suppress_warnings is True, the error string is issued as a warning.
check_names (bool, default False) – Whether to warn if the names of the top-level datasets are different.
check_dims_unlimited (bool, default True) – Whether to compare the ‘unlimited’ status of dimensions.
check_var_data (bool, default True) – If True, all variable data is also checked for equality. If False, only dtype and shape are compared. NOTE: comparison of arrays is done in-memory, so could be highly inefficient for large variable data.
show_n_first_different (int, default 2) – Number of value differences to display.
suppress_warnings (bool, default False) – When False (the default), report changes in content order as Warnings. When True, ignore changes in ordering. See also: Container ordering.

Returns:

errs – A list of “error” strings, describing differences between the inputs. If empty, no differences were found.

Return type:

list of str

Examples

>>> data = NcData(
...    name="a",
...    variables=[NcVariable("b", data=[1, 2, 3, 4])],
...    attributes={"a1": 4}
... )
>>> data2 = data.copy()
>>> data2.avals.update({"a1":3, "v":7})
>>> data2.variables["b"].data = np.array([1, 7, 3, 99])  # must be an array!
>>> print('\n'.join(dataset_differences(data, data2)))
Dataset attribute lists do not match: ['a1'] != ['a1', 'v']
Dataset "a1" attribute values differ : 4 != 3
Dataset variable "b" data contents differ, at 2 points: @INDICES[(1,), (3,)] : LHS=[2, 4], RHS=[7, 99]
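
The way the check_*_order flags interact with suppress_warnings, as described in the parameter list above, can be sketched as follows. This is a minimal illustration under stated assumptions: `order_check_sketch` is a hypothetical helper, its message text is invented, and it is not the code that dataset_differences actually runs.

```python
import warnings


def order_check_sketch(names1, names2, check_order=True, suppress_warnings=False):
    """Return error strings for two lists of names (illustrative messages only)."""
    errs = []
    if sorted(names1) != sorted(names2):
        errs.append(f"name lists do not match: {names1} != {names2}")
    elif names1 != names2:
        msg = f"names are in a different order: {names1} != {names2}"
        if check_order:
            errs.append(msg)        # an order difference counts as an error
        elif not suppress_warnings:
            warnings.warn(msg)      # reported, but not returned as an error
        # otherwise the order difference is ignored entirely
    return errs


a, b = ["x", "y"], ["y", "x"]
print(len(order_check_sketch(a, b)))  # 1
print(order_check_sketch(a, b, check_order=False, suppress_warnings=True))  # []
```

With check_order=False and suppress_warnings left at its default, the sketch returns an empty list but emits the same message through warnings.warn.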

See also

variable_differences()

ncdata.utils.variable_differences(v1, v2, check_attrs_order=True, check_var_data=True, show_n_first_different=2, suppress_warnings=False, _group_id_string=None)

Compare variables.

See: Equality Testing

Parameters:

v1 (NcVariable) – First variable to compare.
v2 (NcVariable) – Second variable to compare.
check_attrs_order (bool, default True) – If False, no error results from the same attributes appearing in a different order. However, unless suppress_warnings is True, the error string is issued as a warning.
check_var_data (bool, default True) – If True, all variable data is also checked for equality. If False, only dtype and shape are compared. NOTE: comparison of large arrays is done in-memory, so may be highly inefficient.
show_n_first_different (int, default 2) – Number of value differences to display.
suppress_warnings (bool, default False) – When False (the default), report changes in content order as Warnings. When True, ignore changes in ordering entirely.
_group_id_string (str) – (internal use only)

Returns:

errs – A list of “error” strings, describing differences between the inputs. If empty, no differences were found.

Return type:

list of str

See also

dataset_differences()

ncdata.utils.index_by_dimensions(ncdata, **dim_index_kwargs)

Index an NcData over named dimensions.

Parameters:

ncdata (NcData) – The input data.
dim_index_kwargs (Mapping[str, Any]) – Indexing to apply to named dimensions, e.g. index_by_dimensions(data, x=1) or index_by_dimensions(data, time=slice(0, 100), levels=[1,2,5]).

Returns:

A new copy of ‘ncdata’, with dimensions and all relevant variables sub-indexed.

Return type:

NcData

Examples

>>> data1 = index_by_dimensions(data, time=slice(0, 10))  # equivalent to [:10]
>>> data2 = index_by_dimensions(data, levels=[1,2,5])
>>> data3 = index_by_dimensions(data, time=3, levels=slice(2, 10, 3))

Notes

Where a dimension key is a single value, the dimension will be removed. This mimics how numpy arrays behave, i.e. the difference between a[1] and a[[1]] or a[1:2].
Supported types of index key are: a single number; a slice; a list of indices or booleans. A tuple, or one-dimensional array can also be used in place of a list.
Key types not supported are: Multi-dimensional arrays; Ellipsis; np.newaxis / None.
A Slicer provides the same functionality with a slicing syntax.
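
The supported key types listed in the notes above, and the rule that a single-value key removes the dimension, can be illustrated on a plain one-dimensional list. This is a sketch only: `index_dimension_sketch` is a hypothetical function operating on Python lists, not on NcData objects, and it makes no claim about how ncdata handles keys internally.

```python
def index_dimension_sketch(values, key):
    """Apply one index key to a 1-D list; return (result, dimension_kept)."""
    if isinstance(key, int):
        return values[key], False       # single value: the dimension is removed
    if isinstance(key, slice):
        return values[key], True        # slice: the dimension is kept
    key = list(key)                     # list, tuple or other 1-D sequence
    if key and all(isinstance(k, bool) for k in key):
        if len(key) != len(values):
            raise IndexError("boolean key must match the dimension length")
        return [v for v, keep in zip(values, key) if keep], True
    return [values[k] for k in key], True


levels = [10, 20, 30, 40]
print(index_dimension_sketch(levels, 1))            # (20, False)
print(index_dimension_sketch(levels, slice(1, 2)))  # ([20], True)
print(index_dimension_sketch(levels, [1]))          # ([20], True)
print(index_dimension_sketch(levels, [True, False, False, True]))  # ([10, 40], True)
```

The first three calls mirror the a[1], a[1:2] and a[[1]] cases: all select the same element, but only the single-value key drops the dimension.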

See also

Slicer

class ncdata.utils.Slicer

Bases: object

An object which can index an NcData over its dimensions.

This wraps the index_by_dimensions() method for convenience, returning an object which supports the Python extended slicing syntax.

Examples

>>> subdata = Slicer(data, "time")[:3]

>>> ds = Slicer(data, 'levels', 'time')
>>> subdata_2 = ds[:10, :2]
>>> subdata_3 = ds[1, [1,2,4]]

>>> subdata_4 = Slicer(data)[:3, 1:4]

Notes

A Slicer contains the original ncdata and presents it in a “sliceable” form. Indexing it returns a new NcData, so the original data is unchanged. The Slicer is also unchanged and can be reused.
index_by_dimensions() provides the same functionality in a different form. See there for more exact details of the operation.
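
Since a Slicer wraps index_by_dimensions(), indexing it amounts to pairing each positional key with a dimension name. The sketch below shows only that pairing step. `KeyMapperSketch` is a hypothetical stand-in, not the Slicer class itself, and raising an error for surplus keys is an assumption of the sketch.

```python
class KeyMapperSketch:
    """Hypothetical stand-in: pairs positional index keys with dimension names."""

    def __init__(self, *dimension_names):
        self.dim_names = dimension_names

    def __getitem__(self, keys):
        # Python passes a single key as-is and several keys as one tuple.
        if not isinstance(keys, tuple):
            keys = (keys,)
        if len(keys) > len(self.dim_names):
            raise IndexError("more index keys than dimension names")
        # zip stops at the shorter input, so trailing dimensions stay unindexed.
        return dict(zip(self.dim_names, keys))


mapper = KeyMapperSketch("levels", "time")
print(mapper[:10, :2])
# {'levels': slice(None, 10, None), 'time': slice(None, 2, None)}
print(mapper[1, [1, 2, 4]])
# {'levels': 1, 'time': [1, 2, 4]}
```

Each resulting dictionary has the shape of the keyword arguments accepted by index_by_dimensions(), so on this reading Slicer(data, 'levels', 'time')[:10, :2] corresponds to index_by_dimensions(data, levels=slice(None, 10), time=slice(None, 2)).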

See also

index_by_dimensions()

__init__(ncdata, *dimension_names)

Create an indexer for an NcData, applying to specific dimensions.

This can then be indexed to produce a derived (sub-indexed) dataset.

Parameters:

ncdata (NcData) – Input data to be sliced.
dimension_names (list[str]) – Dimension names to which successive index keys will be applied. If none are given, defaults to ncdata.dimensions.

ncdata: data to be indexed.

dim_names: dimensions to index, in order.

ncdata.utils.save_errors(ncdata)

Scan a dataset for consistency and completeness.

See: Correctness and Consistency

Describe any aspects of this dataset which would prevent it from saving (cause an error). If there are any such problems, then an attempt to save the ncdata to a netcdf file will fail. If there are none, then a save should succeed.

Parameters:

ncdata (NcData) – data to check

Returns:

errors – A list of strings, error messages describing problems with the dataset. If no errors, returns an empty list.

Return type:

list of str

Notes

The checks made are roughly the following:

(1) check names in all components (dimensions, variables, attributes and groups):

all names are valid netcdf names
all element names match their key in the component, i.e. component[key].name == key

(2) check that all attribute values have netcdf-compatible dtypes (e.g. no object or compound (recarray) dtypes).

(3) check that, for all contained variables:

its dimensions are all present in the enclosing dataset
it has an attached data array, of a netcdf-compatible dtype
the shape of its data matches the lengths of its dimensions
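
A few of the checks listed above can be sketched on a dictionary-based toy dataset. This is an illustration under stated assumptions: `save_errors_sketch`, the dict layout and the message wording are all hypothetical, and the sketch covers only the name/key match, dimension presence, data presence and shape checks, not the netcdf name or dtype validation.

```python
def save_errors_sketch(dataset):
    """Return a list of problem descriptions for a dict-based toy dataset."""
    errors = []
    dims = dataset["dimensions"]  # {name: length}
    for key, var in dataset["variables"].items():
        if var["name"] != key:
            errors.append(
                f"variable key {key!r} does not match name {var['name']!r}"
            )
        missing = [d for d in var["dimensions"] if d not in dims]
        if missing:
            errors.append(f"variable {key!r} uses unknown dimensions {missing}")
            continue  # the shape cannot be checked without all dimensions
        if var.get("shape") is None:
            errors.append(f"variable {key!r} has no data")
            continue
        expected = tuple(dims[d] for d in var["dimensions"])
        if tuple(var["shape"]) != expected:
            errors.append(
                f"variable {key!r} shape {tuple(var['shape'])} != {expected}"
            )
    return errors


good = {
    "dimensions": {"x": 3},
    "variables": {"v": {"name": "v", "dimensions": ["x"], "shape": (3,)}},
}
bad = {
    "dimensions": {"x": 3},
    "variables": {"v": {"name": "w", "dimensions": ["x", "t"], "shape": (3,)}},
}
print(save_errors_sketch(good))      # []
print(len(save_errors_sketch(bad)))  # 2
```

As with save_errors itself, an empty list means no problems were found, so the result can be used directly in a truth test before attempting a save.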

ncdata.utils.ncdata_copy(ncdata)

Return a copy of the data.

The operation makes fresh copies of all ncdata objects, but does not copy variable data arrays.

See: Copying

Parameters:

ncdata (NcData) – data to copy

Returns:

ncdata – identical but distinct copy of input

Return type:

NcData

Notes

This operation is now also available as an object method: copy().

Syntactically, this is generally more convenient, but the operation is identical.

For example:

>>> data1 = ncdata_copy(data)
>>> data2 = data.copy()
>>> data1 == data2
True
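
The copy semantics stated above (fresh copies of all ncdata objects, with variable data arrays not copied) can be modelled with small dataclasses. This is a sketch only: `VarSketch`, `DataSketch` and `copy_sketch` are hypothetical stand-ins, and a plain list plays the role of a data array.

```python
from dataclasses import dataclass, field


@dataclass
class VarSketch:
    name: str
    data: list  # stands in for a (possibly lazy) data array


@dataclass
class DataSketch:
    variables: dict = field(default_factory=dict)


def copy_sketch(ds):
    """Copy the container and variable objects, but share the data arrays."""
    return DataSketch(
        variables={k: VarSketch(v.name, v.data) for k, v in ds.variables.items()}
    )


orig = DataSketch({"b": VarSketch("b", [1, 2, 3])})
dup = copy_sketch(orig)
print(dup == orig)                                          # True
print(dup.variables["b"] is orig.variables["b"])            # False
print(dup.variables["b"].data is orig.variables["b"].data)  # True
```

In this model, assigning a new array to dup.variables["b"].data leaves the original untouched, whereas modifying the shared array in place would be visible through both objects.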