VirES Python Client Data Handling

Abstract: The VirES Python Client provides helper functions for handling retrieved data.

# Display important package versions used
%load_ext watermark
%watermark -i -v -p viresclient,pandas,xarray,matplotlib
Python implementation: CPython
Python version       : 3.9.7
IPython version      : 8.0.1

viresclient: 0.11.0
pandas     : 1.4.1
xarray     : 0.21.1
matplotlib : 3.5.1

Previous sections described how to use the viresclient to find and retrieve Aeolus data. This tutorial covers the data manipulation options available once the data has been retrieved.

What to do when data has been retrieved

Once we have retrieved the data with the get_between function, we have a data object (of type ReturnedData) that provides functions for converting it to your preferred data type. Let's first request some data that we can manipulate afterwards:

# We import the AeolusRequest class from the viresclient
from viresclient import AeolusRequest
# We create a new AeolusRequest instance
request = AeolusRequest()
DATA_PRODUCT = "ALD_U_N_2A"
request.set_collection(DATA_PRODUCT)

# Request some example parameters from two different field types
request.set_fields(
    sca_fields=["SCA_extinction"],
    ica_fields=["ICA_extinction"],
)

# Retrieve the data
return_data = request.get_between(
    start_time="2020-04-10T06:21:58Z",
    end_time="2020-04-10T06:22:33Z",
    filetype="nc"
)

Additional information on response

The response data object also has a sources attribute: a list of tuples describing the products from which the returned data was extracted. Each tuple contains three elements: filename, baseline, and processor identifier. This information is also passed to the xarray attributes as Sources.

# We can see the source files from which the data was extracted
# by looking at the sources attribute
return_data.sources
[('AE_OPER_ALD_U_N_2A_20200410T062135020_005424001_009457_0004',
  '2A11',
  'ADM_L2aP/03.11')]
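
Because each entry is a plain 3-tuple, it can be unpacked directly. The following is a minimal sketch that uses a literal copy of the output above in place of return_data.sources, so it runs without a server connection. The helper name describe_sources is hypothetical and not part of viresclient.

```python
# Stand-in for return_data.sources, copied from the output above
sources = [
    (
        "AE_OPER_ALD_U_N_2A_20200410T062135020_005424001_009457_0004",
        "2A11",
        "ADM_L2aP/03.11",
    )
]


def describe_sources(sources):
    """Turn (filename, baseline, processor) tuples into readable dicts."""
    return [
        {"filename": filename, "baseline": baseline, "processor": processor}
        for filename, baseline, processor in sources
    ]


for entry in describe_sources(sources):
    print(
        f"{entry['filename']} "
        f"(baseline {entry['baseline']}, processor {entry['processor']})"
    )
```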

Convert data

Now that we have the return_data object, which is a wrapper around the retrieved netCDF file, we can use several conversion functions:

  • as_xarray: Returns an xarray object. xarray does not support groups, so all parameters are flattened to the same level. This causes naming conflicts when requesting multiple field_types that contain parameters with the same identifier.

  • as_xarray_dict: Returns a dictionary with field_type as key and xarray objects as values

  • as_dataframe: Returns a pandas DataFrame object

The previous tutorials already showed some examples of these methods; they are listed here again as an overview.

# Conversion to xarray
return_data.as_xarray()
<xarray.Dataset>
Dimensions:         (ica_dim: 357, array_24: 24, sca_dim: 3)
Dimensions without coordinates: ica_dim, array_24, sca_dim
Data variables:
    ICA_extinction  (ica_dim, array_24) float64 -1e+06 0.0 ... -1e+06 -1e+06
    SCA_extinction  (sca_dim, array_24) float64 -1e+06 0.0 ... -1e+06 -1e+06
Attributes:
    Sources:  [('AE_OPER_ALD_U_N_2A_20200410T062135020_005424001_009457_0004'...

# Conversion to xarray dictionary keeping field types as separate xarrays
return_data.as_xarray_dict()
{'ica': <xarray.Dataset>
 Dimensions:         (ica_dim: 357, array_24: 24)
 Dimensions without coordinates: ica_dim, array_24
 Data variables:
     ICA_extinction  (ica_dim, array_24) float64 ...
 Attributes:
     Sources:  [('AE_OPER_ALD_U_N_2A_20200410T062135020_005424001_009457_0004'...,
 'sca': <xarray.Dataset>
 Dimensions:         (sca_dim: 3, array_24: 24)
 Dimensions without coordinates: sca_dim, array_24
 Data variables:
     SCA_extinction  (sca_dim, array_24) float64 ...
 Attributes:
     Sources:  [('AE_OPER_ALD_U_N_2A_20200410T062135020_005424001_009457_0004'...}

# Conversion to pandas dataframe
return_data.as_dataframe()
                          ICA_extinction  SCA_extinction
ica_dim array_24 sca_dim
0       0        0            -1000000.0      -1000000.0
                 1            -1000000.0      -1000000.0
                 2            -1000000.0      -1000000.0
        1        0                   0.0             0.0
                 1                   0.0             0.0
...     ...      ...                 ...             ...
356     22       1            -1000000.0      -1000000.0
                 2            -1000000.0      -1000000.0
        23       0            -1000000.0      -1000000.0
                 1            -1000000.0      -1000000.0
                 2            -1000000.0      -1000000.0

25704 rows × 2 columns
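
The naming conflict mentioned for as_xarray can be illustrated without any Aeolus data. The sketch below uses plain dictionaries as stand-ins for field-type groups. The parameter name altitude and the function flatten_groups are hypothetical, chosen only to show why a flat namespace cannot hold two parameters with the same name while a dictionary keyed by field type can. It does not reproduce how viresclient flattens groups internally.

```python
def flatten_groups(groups):
    """Merge per-group parameters into one flat mapping, reporting name clashes."""
    flat = {}
    conflicts = []
    for group_name, params in groups.items():
        for name, values in params.items():
            if name in flat:
                conflicts.append(name)  # name already taken by another group
            else:
                flat[name] = values
    return flat, conflicts


# Hypothetical groups: both field types expose a parameter called "altitude"
groups = {
    "sca": {"SCA_extinction": [0.0, 1.5], "altitude": [100, 200]},
    "ica": {"ICA_extinction": [0.0, 2.5], "altitude": [110, 210]},
}

flat, conflicts = flatten_groups(groups)
print(sorted(flat))  # the flat namespace keeps only one "altitude"
print(conflicts)     # names that could not be kept apart

# Keeping the groups separate (the as_xarray_dict approach) avoids the clash
print(groups["sca"]["altitude"], groups["ica"]["altitude"])
```

If the requested parameters have distinct names across field types, as in the SCA_extinction and ICA_extinction request used here, the flattened form works fine; otherwise the dictionary form is the safer choice.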

Save data

Depending on the complexity of your data retrieval, you might want to save the response to your workspace instead of repeating the retrieval each time you execute your notebook. You might also want to share or simply keep the dataset resulting from your query. To do that, use the to_file method provided by the ReturnedData object and specify the file name. The data will be saved as netCDF. To overwrite an existing file, pass overwrite=True in the call.

return_data.to_file("retrieved_data.nc", overwrite=True)
Data written to retrieved_data.nc
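
One way to avoid repeating the retrieval is to check for the saved file first. The following sketch shows the pattern with two callables so that it runs on its own; load_or_fetch, fetch, and load are hypothetical names, not part of viresclient. The commented wiring assumes the request object and the to_file and get_from_file calls used in this tutorial.

```python
from pathlib import Path


def load_or_fetch(path, fetch, load):
    """Return cached data if `path` exists, otherwise fetch and cache it.

    fetch(path): retrieves the data, writes it to `path`, and returns it.
    load(path):  reads previously saved data from `path`.
    """
    path = Path(path)
    if path.exists():
        return load(path)
    return fetch(path)


# Assumed wiring for this tutorial (not executed here):
#
# def fetch(path):
#     data = request.get_between(
#         start_time="2020-04-10T06:21:58Z",
#         end_time="2020-04-10T06:22:33Z",
#         filetype="nc",
#     )
#     data.to_file(str(path), overwrite=True)
#     return data
#
# def load(path):
#     return request.get_from_file(str(path))
#
# return_data = load_or_fetch("retrieved_data.nc", fetch, load)
```

Note that this cache is keyed only by file name: if you change the requested fields or time range, delete the file or use a different name so that stale data is not reloaded.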

Load saved data

If you want to access your saved data again, you can use any library that can work with netCDF data. This is also helpful if you want to share your result dataset outside the VRE environment. If you want to work with the saved data inside the VRE environment, you can use the get_from_file function provided by the AeolusRequest class. It returns a ReturnedData object, which lets you continue working with the data as described above, for example by converting it to xarray or a DataFrame.

# We import the AeolusRequest class from the viresclient
from viresclient import AeolusRequest
# We create a new AeolusRequest instance
request = AeolusRequest()

data_object = request.get_from_file("retrieved_data.nc")
data_object.as_xarray()
<xarray.Dataset>
Dimensions:         (ica_dim: 357, array_24: 24, sca_dim: 3)
Dimensions without coordinates: ica_dim, array_24, sca_dim
Data variables:
    ICA_extinction  (ica_dim, array_24) float64 -1e+06 0.0 ... -1e+06 -1e+06
    SCA_extinction  (sca_dim, array_24) float64 -1e+06 0.0 ... -1e+06 -1e+06