CLI Tutorial#

This tutorial is for CLI users. If you are using the library to write your own Python or R scripts, read Developers instead.

Installation#

We recommend a dedicated conda environment for CTDFjorder. To create one, open your terminal (macOS/Linux) or command prompt (Windows) and run the following commands:

$ conda deactivate
$ conda create --name ctdfjorder python=3.12
$ conda activate ctdfjorder

Then install CTDFjorder using pip:

(ctdfjorder) $ pip install ctdfjorder
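
To confirm that the install landed in the active environment, a small check like the following can help. It is a minimal sketch that relies only on the standard library; the helper name installed_version is made up for this example, and the only fact taken from this page is the distribution name ctdfjorder used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "ctdfjorder" is the distribution name from the pip command above.
found = installed_version("ctdfjorder")
if found is None:
    print("ctdfjorder is not installed in this environment")
else:
    print(f"ctdfjorder {found} is installed")
```

If this reports that the package is missing, the most likely cause is that the ctdfjorder conda environment is not the active one.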

Run ctdcli#

Now we will process our files.

Tip

To see what options you have to process the files, type ctdcli default -h or view the documentation for the CLI.

For the purposes of this demo, we assume that you have the following:

  • Files with endings .rsk from an RBR instrument or .csv from a Castaway device.

  • A master sheet, which is used to attach metadata to the CTD tables. It must be named mastersheet.csv and be located in the same folder as your CTD data. It must also have the following fields:
    • UNIQUE ID CODE

    • nominal longitude

    • nominal latitude

    • CTD cast file name

    • location

    • loc id

    • date/time (ISO)

    • sechhi depth

  • Access to a public MapBox token.
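
Before running the pipeline, it can be useful to check that the master sheet header contains every field listed above. The following is a minimal sketch, not part of CTDFjorder: the helper name missing_mastersheet_fields is hypothetical, the field names are copied verbatim from the list above, and the exact, case-sensitive comparison is an assumption about how strictly the names must match.

```python
import csv

# Field names copied verbatim from the requirements list above.
REQUIRED_FIELDS = [
    "UNIQUE ID CODE",
    "nominal longitude",
    "nominal latitude",
    "CTD cast file name",
    "location",
    "loc id",
    "date/time (ISO)",
    "sechhi depth",
]


def missing_mastersheet_fields(path):
    """Return the required fields that are absent from the CSV header row.

    Assumption: names must match exactly (case-sensitive) after trimming
    surrounding whitespace. CTDFjorder's own matching may differ.
    """
    # utf-8-sig drops a byte-order mark if a spreadsheet export added one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [field for field in REQUIRED_FIELDS if field not in present]


# Example use (run from the folder that holds your CTD data):
#   missing = missing_mastersheet_fields("mastersheet.csv")
#   print("Missing fields:", missing or "none")
```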

If you meet those conditions, make your terminal window fullscreen. Then copy and paste the following command into your terminal, replacing MY_TOKEN with your public MapBox token. Members of FjordPhyto can use this token: pk.eyJ1Ijoibmlrb3Rob21hcyIsImEiOiJjbHl2Z2JzbDQxZjEwMmpwd2c1cnJpYmRyIn0.j9l0EXWa2ik51AbAcIe5HQ

Tip

To add plotting, include -p in the command, like so: ctdcli default -r -p --token MY_TOKEN

(ctdfjorder) $ ctdcli default -r -m mastersheet.csv --token MY_TOKEN

Here we are telling CTDFjorder the following:

  • -r Reset our file environment (delete old plots and remake folders)

  • -m The location of our mastersheet

  • --token Our token to interact with MapBox and generate our map.
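
If you would rather launch the same command from a Python script, for example to keep the token out of your shell history, something like the following could work. It is a sketch under stated assumptions: build_command and run_pipeline are hypothetical helpers, the environment variable name MAPBOX_TOKEN is made up for this example, and the command is assumed to run from inside the folder that holds the CTD files and mastersheet.csv. Only the flags shown above are used.

```python
import os
import subprocess


def build_command(token, mastersheet="mastersheet.csv", plot=False):
    """Assemble the ctdcli call from this tutorial as an argument list."""
    command = ["ctdcli", "default", "-r", "-m", mastersheet, "--token", token]
    if plot:
        command.append("-p")  # optional plotting flag from the tip above
    return command


def run_pipeline(data_dir, token, plot=False):
    """Run ctdcli inside data_dir and return its exit code.

    Assumption: ctdcli is on PATH because the ctdfjorder environment is active.
    """
    completed = subprocess.run(build_command(token, plot=plot), cwd=data_dir)
    return completed.returncode


# Example use, with the token read from a hypothetical environment variable:
#   run_pipeline(".", os.environ["MAPBOX_TOKEN"], plot=True)
```

Passing the arguments as a list avoids shell quoting problems if the path to the data folder contains spaces.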

Interpret output#

If you see a spinning globe, you did it! Once the files are done processing, a table is printed with pipeline information for each file. Green means the file passed a step; red means an error occurred and the file could not be processed any further. Once all files are completed, a map also opens. Each point on the map is an individual cast, and the map can be filtered.

  • If you used the -p option then plots are in the ctdplots folder next to our original data and were made with functions from the Visualize module.

  • There you will also find a ctdfjorder_data.csv with our processed data.

  • To investigate files that did not pass the pipeline, open the ctdfjorder.log file.
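
As a quick sanity check on the processed output, you could count how many rows each cast file contributed. This is a minimal sketch using only the standard library. It assumes that ctdfjorder_data.csv has a header row and a filename column; that column name appears among the --filter-columns choices, but its presence in the output file is an assumption here. The helper rows_per_value is hypothetical.

```python
import csv
from collections import Counter


def rows_per_value(csv_path, column="filename"):
    """Count the rows in a CSV file for each distinct value of one column."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {csv_path}")
        return Counter(row[column] for row in reader)


# Example use (the path is relative to wherever the output file was written):
#   for name, count in rows_per_value("ctdfjorder_data.csv").most_common():
#       print(f"{name}: {count} rows")
```

An input file that is missing from the counts did not reach the output, so it is a good candidate to look up in ctdfjorder.log.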

Steps#

These are the functions we ran through the CLI on each file in this tutorial:

data = CTD(file)
data.expand_date(day=False)
data.remove_upcasts()
data.remove_non_positive_samples()
data.filter_columns_by_range(column='salinity', upper_bound=None, lower_bound=10)
data.add_metadata(master_sheet_path='mastersheet.csv')
data.clean(method='clean_salinity_ai')
data.add_surface_salinity()
data.add_surface_temperature()
data.add_meltwater_fraction()
data.add_absolute_salinity()
data.add_density()
data.add_potential_density()
data.add_n_squared()
data.add_mld_bf()
data.add_profile_classification()

Congrats! You can now use CTDFjorder to investigate your CTD data. For more in-depth information on the processes executed here, read the API.

CLI Commands#

CTDFjorder

usage: sample [-h] {default} ...

Positional Arguments#

command

Possible choices: default

Sub-commands#

default#

Run the default processing pipeline

sample default [-h] [-p] [-v] [-q] [-r] [-s] [-d] [-m MASTERSHEET]
               [-w [WORKERS]] [--token TOKEN] [-o OUTPUT]
               [--filter-columns [{filename,unique_id,profile_id,site_id,site_name,timestamp,year,month,day,latitude,longitude,depth,pressure,sea_pressure,p_mid,temperature,conservative_temperature,salinity,salinity_abs,density,potential_density,surface_temperature,surface_salinity,surface_density,meltwater_fraction_eq_10,meltwater_fraction_eq_11,brunt_vaisala_frequency_squared,profile_type,conductivity,specific_conductivity,speed_of_sound,oxygen_concentration,oxygen_saturation,ph,alkalinity,nitrate,phosphate,silicate,ammonium,particulate_organic_carbon,total_organic_carbon,particulate_inorganic_carbon,dissolved_inorganic_carbon,secchi_depth,turbidity,chlorophyll,chlorophyll_fluorescence,par,orp} ...]]
               [--filter-upper [FILTER_UPPER ...]]
               [--filter-lower [FILTER_LOWER ...]]

Named Arguments#

-p, --plot

Generate plots

Default: False

-v, --verbose

Verbose logger output to ctdfjorder.log (repeat for increased verbosity)

Default: 3

-q, --quiet

Quiet output (show errors only)

Default: 0

-r, --reset

Reset file environment

Default: False

-s, --show-status

Show processing status and pipeline status

Default: False

-d, --debug-run

Run 20 files for testing

Default: False

-m, --mastersheet

Path to mastersheet

-w, --workers

Max workers

--token

MapBox token to enable interactive map plot

-o, --output

Output file path

Default: 'ctdfjorder_data.csv'

--filter-columns

Possible choices: filename, unique_id, profile_id, site_id, site_name, timestamp, year, month, day, latitude, longitude, depth, pressure, sea_pressure, p_mid, temperature, conservative_temperature, salinity, salinity_abs, density, potential_density, surface_temperature, surface_salinity, surface_density, meltwater_fraction_eq_10, meltwater_fraction_eq_11, brunt_vaisala_frequency_squared, profile_type, conductivity, specific_conductivity, speed_of_sound, oxygen_concentration, oxygen_saturation, ph, alkalinity, nitrate, phosphate, silicate, ammonium, particulate_organic_carbon, total_organic_carbon, particulate_inorganic_carbon, dissolved_inorganic_carbon, secchi_depth, turbidity, chlorophyll, chlorophyll_fluorescence, par, orp

List of columns to filter

--filter-upper

Upper bounds for the filtered columns

--filter-lower

Lower bounds for the filtered columns
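
The three filter options take parallel lists, so it is easy to pass bounds in the wrong order. The sketch below builds the arguments from a single mapping so that each column stays next to its bounds. It assumes that the n-th upper and lower bound apply to the n-th column, which the help text suggests but does not state outright. The helper build_filter_args and the example bounds are hypothetical.

```python
def build_filter_args(bounds):
    """Turn {column: (lower, upper)} into ctdcli filter arguments.

    Assumption: bounds pair with columns by position.
    """
    columns = list(bounds)
    for column in columns:
        lower, upper = bounds[column]
        if lower is None or upper is None:
            # How the CLI treats a missing bound is not documented here,
            # so this sketch requires both.
            raise ValueError(f"both bounds are required for {column!r}")
        if lower > upper:
            raise ValueError(f"lower bound exceeds upper bound for {column!r}")
    uppers = [str(bounds[column][1]) for column in columns]
    lowers = [str(bounds[column][0]) for column in columns]
    return [
        "--filter-columns", *columns,
        "--filter-upper", *uppers,
        "--filter-lower", *lowers,
    ]


# Illustrative bounds only, not recommended values.
args = build_filter_args({"salinity": (10, 40), "depth": (0, 200)})
print(" ".join(args))
```

The resulting list can be appended to the other command-line arguments when the CLI is launched from a script.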