Common Operations

A group of common operations is available on all the core component types: extract, remove, insert, rename and copy, applied to the .dimensions, .variables, .attributes and .groups properties of core objects.

Most of these are hopefully “obvious” Pythonic methods of the container objects.

Note

The special .avals property of NcData and NcVariable also provides the key common operations associated with .attributes, notably rename and the del operator. It does not provide those which need NcAttribute objects, so add and addall are not available.

Extract and Remove

These are implemented as __delitem__() and pop() methods, which work in the usual way.
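
As a minimal sketch of what "the usual way" means here, the following uses a plain dict as a stand-in for a container such as .variables. That the containers follow ordinary dict semantics is an assumption taken from the wording above, not a statement about the library's internals.

```python
# A plain dict stands in for a container such as dataset.variables
# (an assumption for illustration only).
variables = {"x": "var-x", "y": "var-y", "z": "var-z"}

# pop() removes the entry AND hands it back, so it can be reused elsewhere.
var_x = variables.pop("x")

# del removes the entry and discards it.
del variables["y"]

# pop() with a default avoids a KeyError when the name is absent.
missing = variables.pop("not_there", None)

remaining = list(variables)
```

Use pop() when the removed component is still needed, e.g. to add it to another dataset, and del when it is not.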

For example:

>>> var_x = dataset.variables.pop("x")
>>> del dataset.variables["y"]

Insert / Add

A new component can be added to a container, under its own name, with the add() method.

Example : dataset.variables.add(NcVariable("x", dimensions=["x"], data=my_data))

An attribute can be added in the same way, as an NcAttribute object, to the .attributes NameMap() component of the parent object. But it is more usual to add or set attributes using .avals rather than .attributes.

Example :

>>> dataset.variables["x"].avals["units"] = "m s-1"

There is also an addall() method, which adds multiple content objects in one operation.

>>> vars = [NcVariable(name) for name in ("a", "b", "c")]
>>> dataset.variables.addall(vars)
>>> list(dataset.variables)
['x', 'a', 'b', 'c']

Rename

A component can be renamed with the rename() method. This changes both the name in the container and the component’s own name. Setting component.name directly is not recommended, because the name used in the container and the component’s own name can then become inconsistent.
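
Why the two names must be changed together can be seen in a simplified sketch. The Item and NamedItems classes below are hypothetical stand-ins written for this illustration. They are not the ncdata implementation and only assume that a container stores each component under the component's own name.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a named component (e.g. a variable)."""

    name: str


class NamedItems(dict):
    """Hypothetical name-keyed container; NOT the ncdata implementation."""

    def add(self, item):
        # Store the item under its own name.
        self[item.name] = item

    def addall(self, items):
        for item in items:
            self.add(item)

    def rename(self, old, new):
        # Update the key and the item's own name together, keeping order.
        renamed = {}
        for key, item in self.items():
            if key == old:
                item.name = new
                key = new
            renamed[key] = item
        self.clear()
        self.update(renamed)


items = NamedItems()
items.addall([Item("x"), Item("a")])
items.rename("x", "y")
consistent = all(key == item.name for key, item in items.items())

# Setting the name directly bypasses the container: key and name now disagree.
items["a"].name = "b"
still_consistent = all(key == item.name for key, item in items.items())
```

After the rename() call every key matches its item's name. After the direct assignment the container still holds the item under "a" although the item calls itself "b".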

Example :

>>> dataset.variables.rename("x", "y")

result:

>>> print(dataset.variables.get("x"))
None
>>> print(dataset.variables.get("y"))
<NcVariable(<no-dtype>): y()
    y:units = 'm s-1'
>

Warning

Renaming a dimension will not rename references to it (i.e. in variables), which can leave variables referring to a dimension name that no longer exists. The utility function rename_dimension() is provided to handle this. See : Rename a dimension.
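
The problem can be illustrated with a small sketch. Dim and Var are hypothetical stand-in types and the containers are plain dicts. This is only a model of the situation described in the warning, not the implementation of rename_dimension().

```python
from dataclasses import dataclass, field


@dataclass
class Dim:
    """Hypothetical stand-in for a dimension."""

    name: str
    size: int


@dataclass
class Var:
    """Hypothetical stand-in for a variable, which refers to dims by name."""

    name: str
    dimensions: list = field(default_factory=list)


dims = {"x": Dim("x", 3)}
variables = {"vx": Var("vx", ["x"]), "dd": Var("dd", ["y", "x"])}

# Step 1: rename only the dimension itself.
dim = dims.pop("x")
dim.name = "x_new"
dims[dim.name] = dim

# The variables still refer to the old name, which no longer exists.
dangling = [v.name for v in variables.values() if "x" in v.dimensions]

# Step 2: update the references held by the variables as well.
for v in variables.values():
    v.dimensions = ["x_new" if d == "x" else d for d in v.dimensions]

fixed = [v.dimensions for v in variables.values()]
```

Both steps are needed for a consistent result, which is why a dedicated utility is preferable to renaming the dimension alone.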

Copying

All core objects support a .copy() method. See for instance ncdata.NcData.copy().

However, these do not copy variable data arrays (whether real or lazy): they produce new (copied) variables which reference the same arrays. So, for example:

>>> # Construct a simple test dataset
>>> import numpy as np
>>> from ncdata import NcData, NcDimension, NcVariable
>>> ds = NcData(
...     dimensions=[NcDimension('x', 12)],
...     variables=[NcVariable('vx', ['x'], np.ones(12))]
... )

>>> # Make a copy
>>> ds_copy = ds.copy()

>>> # The new dataset has a new matching variable with a matching data array
>>> # The variables are different ..
>>> ds_copy.variables['vx'] is ds.variables['vx']
False
>>> # ... but the arrays are THE SAME ARRAY
>>> ds_copy.variables['vx'].data is ds.variables['vx'].data
True

>>> # So changing one actually CHANGES THE OTHER ...
>>> ds.variables['vx'].data[6:] = 777
>>> ds_copy.variables['vx'].data
array([  1.,   1.,   1.,   1.,   1.,   1., 777., 777., 777., 777., 777.,
       777.])

If needed, you can replace variable data with copies yourself, since you can freely assign to .data. For real data, this is just var.data = var.data.copy().
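
Applied to every variable of a copy, that replacement amounts to a short loop. The sketch below uses a hypothetical Var stand-in, with a plain list playing the part of the data array, so it models the idea rather than the library's own objects.

```python
import copy
from dataclasses import dataclass


@dataclass
class Var:
    """Hypothetical stand-in for a variable; a list plays the data array."""

    name: str
    data: list


original = {"vx": Var("vx", [1.0] * 4)}

# Copy the variable objects but not their data (the data is still shared).
copied = {name: copy.copy(var) for name, var in original.items()}
shared_before = copied["vx"].data is original["vx"].data

# Detach: give every copied variable its own data.
for var in copied.values():
    var.data = var.data.copy()

# Changing the original no longer affects the copy.
original["vx"].data[2:] = [777.0, 777.0]
shared_after = copied["vx"].data is original["vx"].data
```

With real datasets the same loop would also need to visit variables inside any groups, which this sketch does not model.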

There is also a utility function ncdata.utils.ncdata_copy(), which is effectively the same thing as the NcData object copy() method.

Equality Testing

We implement equality operations == / != for all the core data objects.

>>> vA = dataset.variables["a"]
>>> vB = dataset.variables["b"]
>>> vA == vB
False
>>> dataset == dataset.copy()
True

Warning

Equality testing for NcData and NcVariable actually calls the ncdata.utils.dataset_differences() and ncdata.utils.variable_differences() utilities.

This can be very costly if it needs to compare large data arrays.

If you need to avoid comparing large (and possibly lazy) arrays then you can use the ncdata.utils.dataset_differences() and ncdata.utils.variable_differences() utility functions directly instead. These provide a check_var_data=False option, to ignore differences in data content.

See: Dataset Equality Testing
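
The relationship between equality and a "differences" function, and the effect of skipping the data comparison, can be sketched as follows. The differences() function and its check_data argument are hypothetical stand-ins for this illustration. They only assume that a differences utility returns a list of messages and that an empty list means "equal".

```python
def differences(a, b, check_data=True):
    """Return a list of difference messages; an empty list means 'equal'.

    Hypothetical stand-in: 'a' and 'b' are dicts with "attrs" and "data".
    """
    messages = []
    if a["attrs"] != b["attrs"]:
        messages.append(f"attrs differ: {a['attrs']} != {b['attrs']}")
    # The potentially expensive step, skipped on request.
    if check_data and a["data"] != b["data"]:
        messages.append("data differs")
    return messages


v1 = {"attrs": {"units": "m"}, "data": [0, 1, 2]}
v2 = {"attrs": {"units": "m"}, "data": [0, 1, 99]}

# A full comparison, as an equality test would need.
full = differences(v1, v2)
# A metadata-only comparison, which never touches the data.
meta_only = differences(v1, v2, check_data=False)

equal = not full
equal_ignoring_data = not meta_only
```

Calling the utilities directly also has the advantage that the messages say what differs, where an equality test only returns True or False.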

Object Creation

The constructors should allow reasonably readable inline creation of data. See here : Core Object Constructors

Ncdata is deliberately not very fussy about ‘correctness’, since it is not tied to an actual dataset which must “make sense”. See : Correctness and Consistency.

Hence, there is no great need to add things in the ‘right’ order (e.g. dimensions before the variables which need them). You can create objects in one go, like this :

>>> data1 = NcData(
...     dimensions=[
...         NcDimension("y", 2),
...         NcDimension("x", 3),
...     ],
...     variables=[
...         NcVariable("y", dimensions=["y"], data=[0, 1]),
...         NcVariable("x", dimensions=["x"], data=[0, 1, 2]),
...         NcVariable("dd", dimensions=["y", "x"], data=[[0, 1, 2], [3, 4, 5]])
...     ]
... )
>>> data1
<ncdata._core.NcData object at ...>

or iteratively, like this :

>>> data2 = NcData()
>>> dims = [("y", 2), ("x", 3)]
>>> data2.variables.addall([
...     NcVariable(name, dimensions=[name], data=np.arange(length))
...     for name, length in dims
... ])
>>> data2.variables.add(
...     NcVariable("dd", dimensions=["y", "x"],
...     data=np.arange(6).reshape(2,3))
... )
>>> data2.dimensions.addall([NcDimension(name, length) for name, length in dims])
>>> data2
<ncdata._core.NcData object at ...>

Note : here, the variables were added before the dimensions. The result is the same:

>>> data1 == data2
True