ctdfjorder.CTD.CTD.clean(self, method) → None

Applies the selected cleaning method to the CTD data. Currently, practical salinity can be cleaned using the ‘clean_salinity_ai’ method.

Parameters:

method (str, default 'clean_salinity_ai') – The cleaning method to apply. Currently, only ‘clean_salinity_ai’ is supported, which uses a GRU-based machine learning model to clean the salinity values.

Raises:
  • NoSamplesError – When the function is called on a CTD object with no data.

  • ValueError – When the cleaning method is invalid.

Notes

The ‘clean_salinity_ai’ method uses a Gated Recurrent Unit (GRU) model to correct salinity measurements. This model is designed to smooth out unrealistic fluctuations in salinity with respect to pressure.

The procedure for ‘clean_salinity_ai’ is as follows:

  1. Bin the data every 0.5 dbar of pressure.

  2. Use a GRU model with a loss function that penalizes non-monotonic behavior, that is, predicted salinity that increases as pressure decreases.

  3. Train the model on the binned data to predict clean salinity values.

  4. Replace the original salinity values in the profile with the predicted clean values.

  5. Reintegrate the cleaned profile into the main dataset.
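
As an illustration of step 1 only, the following is a minimal sketch of averaging salinity in 0.5 dbar pressure bins with NumPy. The function name `bin_by_pressure`, its arguments, and the choice of averaging within each bin are assumptions made for this example; they are not taken from the ctdfjorder source, which may bin differently.

```python
import numpy as np


def bin_by_pressure(pressure, salinity, bin_width=0.5):
    """Average salinity in consecutive pressure bins (hypothetical helper)."""
    pressure = np.asarray(pressure, dtype=float)
    salinity = np.asarray(salinity, dtype=float)

    # Each sample falls in the bin whose lower edge is floor(p / width) * width.
    bin_index = np.floor(pressure / bin_width).astype(int)

    # Map every sample to a compact index over the occupied bins.
    occupied_bins, inverse = np.unique(bin_index, return_inverse=True)

    # Per-bin mean = per-bin sum / per-bin sample count.
    sums = np.bincount(inverse, weights=salinity)
    counts = np.bincount(inverse)

    binned_pressure = occupied_bins * bin_width  # lower edge of each bin
    binned_salinity = sums / counts
    return binned_pressure, binned_salinity


# Five samples spread over three 0.5 dbar bins.
example_pressure = [0.1, 0.3, 0.6, 0.9, 1.2]
example_salinity = [30.0, 32.0, 33.0, 35.0, 36.0]
binned_p, binned_s = bin_by_pressure(example_pressure, example_salinity)
```

Here the bins start at 0.0, 0.5, and 1.0 dbar, and the mean salinities are 31, 34, and 36. Empty bins are simply omitted in this sketch.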

The loss function \(L\) used in training is defined as:

\[L = \text{MAE}(y_{\text{true}}, y_{\text{pred}}) + \lambda \cdot \text{mean}(P)\]

where \(P\) are penalties for predicted salinity increases with decreasing pressure, and \(\lambda\) is a weighting factor.
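
To make the formula concrete, here is a minimal NumPy sketch of one way such a loss could be evaluated. It assumes the samples are ordered by increasing pressure and takes each penalty to be the positive part of the salinity drop between neighboring samples. The name `penalized_mae`, the exact penalty form, and the default `lam` value are assumptions for illustration, not the library's actual implementation.

```python
import numpy as np


def penalized_mae(y_true, y_pred, lam=1.0):
    """MAE plus a weighted monotonicity penalty (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Mean absolute error between observed and predicted salinity.
    mae = np.mean(np.abs(y_true - y_pred))

    # Assumption: samples are sorted by increasing pressure, so salinity
    # "increasing with decreasing pressure" means y_pred[i] > y_pred[i + 1].
    drops = y_pred[:-1] - y_pred[1:]
    penalties = np.maximum(drops, 0.0)

    # With fewer than two samples there are no neighbor pairs to penalize.
    mean_penalty = penalties.mean() if penalties.size else 0.0
    return mae + lam * mean_penalty


# A monotonic prediction that matches the data incurs no loss.
loss_clean = penalized_mae([30.0, 31.0, 32.0], [30.0, 31.0, 32.0])

# A prediction whose salinity drops with depth is penalized on top of the MAE.
loss_noisy = penalized_mae([30.0, 31.0, 32.0], [30.0, 32.0, 31.0], lam=2.0)
```

In the second call the MAE is 2/3 and the mean penalty is 0.5, so with `lam=2.0` the loss is 2/3 + 1. A larger `lam` pushes the model harder toward monotonic profiles, at the cost of fitting the raw values less closely.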

Examples

from ctdfjorder.CTD import CTD

ctd_data = CTD('example.csv')
ctd_data.clean('clean_salinity_ai')
# Cleans the salinity data using the AI-based method, correcting unrealistic
# values and smoothing transitions with respect to pressure.

See also

_AI.clean_salinity_ai

Method used to clean salinity with ‘clean_salinity_ai’ option.