Objective

We are using genetic algorithms, a type of stochastic optimization, to calibrate the following model coefficients:

  • ..surv_adult_enroute_int used in submodel surv_adult_enroute
  • ..surv_adult_prespawn_int used in submodel surv_adult_prespawn
  • ..surv_egg_to_fry_int used in submodel surv_egg_to_fry
  • ..surv_juv_rear_int used in submodel surv_juv_rear
  • ..surv_juv_rear_contact_points used in submodel surv_juv_rear
  • ..surv_juv_rear_prop_diversions used in submodel surv_juv_rear
  • ..surv_juv_rear_total_diversions used in submodel surv_juv_rear
  • ..surv_juv_bypass_int used in submodel surv_juv_bypass
  • ..surv_juv_delta_int used in submodel surv_juv_delta
  • ..surv_juv_delta_contact_points used in submodel surv_juv_delta
  • ..surv_juv_delta_total_diverted used in submodel surv_juv_delta
  • ..surv_juv_outmigration_sj_int used in submodel surv_juv_outmigration_san_joaquin
  • ..ocean_entry_success_int used in submodel ocean_entry_success

A detailed description for each of these parameters can be found in the Spring Run Documentation Website

We are restricting the calibration to watersheds with Grandtab data, these are:

watershed
Antelope Creek
Battle Creek
Butte Creek
Clear Creek
Deer Creek
Mill Creek
Feather River
Yuba River

Packages

We are using the GA R package to implement our genetic algorithms search to calibrate the above intercepts for the springRunDSM::spring_run_model() function of the springRunDSM R package.

Additionally, we are using DSMCalibrationData to access calibration related data.

Calibrating Data

GrandTab

We will be minimizing the error between GrandTab estimated escapement data for the years 1998-2017 to natural spawners predicted by the DSM. We are further scaling this error by a weight representing the amount of observed data available for a given watershed. Given that GrandTab includes both hatchery and natural spawners, we will scale GrandTab escapement values using an estimated proportion natural spawners for each watershed. This will allow us to compare predicted natural spawners to psuedo GrandTab natural spawners.

For calibration, we have prepared the GrandTab data in two ways:

  1. DSMCalibrationData::grandtab_observed - This data will be used in the fitness function to measure the difference between model predictions and observed escapement. Missing values are NA and we made all records less than 100 NA to account for a lack of confidence for counts less than 100.

  2. DSMCalibrationData::grandtab_imputed - This data will be used to calculate the number of juveniles during the 20 year simulation. The GrandTab data is incomplete for many watersheds during the 20 year period of calibration. For watersheds with no GrandTab data, we used 40 as the default escapement value. For watersheds with incomplete data, we used the mean escapement value.

Observed Values

Imputed Values

Selected Proxy Years

The data inputs to the DSM are for years 1980-1999. We selected proxy years for 1998-2017 from the 1980-1999 model inputs by comparing the DWR water year indices in this analysis.

These inputs are generated by calling DSMCalibrationData::set_synth_years(params), with a params object from the life cycle model ex: RunDSM::params. See the documentation for more details on the DSMCalibrationData R package.

Spawning Inputs Proxy Years

Calibration Year Selected Proxy Year
1997 1997
1998 1998
1999 1999
2000 2000
2001 1987
2002 1985
2003 2000
2004 1985
2005 1980
2006 1995
2007 1989
2008 1994
2009 1981
2010 1985
2011 1997
2012 1985
2013 1987
2014 1992
2015 1992
2016 1985
2017 1998

Rearing Inputs Proxy Years

Calibration Year Selected Proxy Year
1998 1998
1999 1999
2000 2000
2001 1987
2002 1985
2003 2000
2004 1985
2005 1980
2006 1995
2007 1989
2008 1994
2009 1981
2010 1985
2011 1997
2012 1985
2013 1987
2014 1992
2015 1992
2016 1985
2017 1998

Fitness Function

We are minimizing the sum of squared errors of simulated natural adults and grandtab adults scaled using simulated proportion natural, normalized by mean escapement. The purpose of normalizing is to prevent watersheds with large escapement values from dominating the calibration. In addition to normalizing we are adding weights to each error term corresponding to the amount data available at that watershed. Doing so will penalize watersheds with more data whose error term is large, but be less aggressive with watersheds with less data.

All intercepts and watersheds that need calibration are described in the Objective section of this document.

Notes:

  1. ..surv_adult_enroute_int and ..ocean_entry_success_int are vectors and expanded within the fitness function arguments.

  2. R integer data type is a 32 bit signed integer which can only count to ~2 billion. During calibration, some estimates of juveniles and adults will exceed this limit and cause errors. We run the prediction within the fitness function inside a tryCatch block. If an integer is produced that is too large we return a large loss value, this will ensure that this iteration will not be selected for future generations.

# Fitness Function ------------------
spring_run_fitness <- function(
  known_adults,
  seeds,
  params,
  surv_adult_enroute_int,
  surv_adult_prespawn_int,
  surv_egg_to_fry_int,
  surv_juv_rear_int_default,
  surv_juv_rear_int_battle_clear,
  surv_juv_rear_int_butte,
  surv_juv_rear_int_deer,
  surv_juv_rear_int_mill,
  surv_juv_rear_int_sac,
  surv_juv_rear_int_feather,
  surv_juv_rear_int_yuba,
  surv_juv_rear_int_san_joaq,
  surv_juv_rear_contact_points,
  surv_juv_rear_prop_diversions,
  surv_juv_rear_total_diversions,
  surv_juv_bypass_int,
  surv_juv_delta_int,
  surv_juv_delta_contact_points,
  surv_juv_delta_total_diverted,
  surv_juv_outmigration_sj_int,
  ocean_entry_success_int_default,
  ocean_entry_success_int_battle_clear,
  ocean_entry_success_int_butte,
  ocean_entry_success_int_deer,
  ocean_entry_success_int_mill,
  ocean_entry_success_int_bear_feather,
  ocean_entry_success_int_yuba
) {
  params_init <- params
  
  params_init$..surv_adult_enroute_int = surv_adult_enroute_int
  params_init$..surv_adult_prespawn_int = surv_adult_prespawn_int # they hard coded from fall run
  params_init$..surv_egg_to_fry_int = surv_egg_to_fry_int  # they hard coded from fall run
  
  # Juvenile rearing survival coefficients and variables
  params_init$..surv_juv_rear_int = c(`Upper Sacramento River` = surv_juv_rear_int_default, 
                                      `Antelope Creek` = surv_juv_rear_int_default, 
                                      `Battle Creek` = surv_juv_rear_int_battle_clear,
                                      `Bear Creek` = surv_juv_rear_int_default, 
                                      `Big Chico Creek` = surv_juv_rear_int_default, 
                                      `Butte Creek` = surv_juv_rear_int_butte,
                                      `Clear Creek` =   surv_juv_rear_int_battle_clear, 
                                      `Cottonwood Creek` = surv_juv_rear_int_default, 
                                      `Cow Creek` = surv_juv_rear_int_default,
                                      `Deer Creek` = surv_juv_rear_int_deer, 
                                      `Elder Creek` = surv_juv_rear_int_default, 
                                      `Mill Creek` = surv_juv_rear_int_mill,
                                      `Paynes Creek` = surv_juv_rear_int_default, 
                                      `Stony Creek` = surv_juv_rear_int_default, 
                                      `Thomes Creek` = surv_juv_rear_int_default,
                                      `Upper-mid Sacramento River` = surv_juv_rear_int_sac, 
                                      `Sutter Bypass` = surv_juv_rear_int_default,
                                      `Bear River` = surv_juv_rear_int_default, 
                                      `Feather River` = surv_juv_rear_int_feather, 
                                      `Yuba River` = surv_juv_rear_int_yuba,
                                      `Lower-mid Sacramento River` =    surv_juv_rear_int_sac, 
                                      `Yolo Bypass` = surv_juv_rear_int_default, 
                                      `American River` = surv_juv_rear_int_default,
                                      `Lower Sacramento River` = surv_juv_rear_int_sac, 
                                      `Calaveras River` = surv_juv_rear_int_default, 
                                      `Cosumnes River` = surv_juv_rear_int_default,
                                      `Mokelumne River` = surv_juv_rear_int_default, 
                                      `Merced River` = surv_juv_rear_int_default, 
                                      `Stanislaus River` = surv_juv_rear_int_default,
                                      `Tuolumne River` = surv_juv_rear_int_default, 
                                      `San Joaquin River` = surv_juv_rear_int_san_joaq)
  
  params_init$..surv_juv_rear_contact_points = surv_juv_rear_contact_points
  params_init$..surv_juv_rear_prop_diversions = surv_juv_rear_prop_diversions
  params_init$..surv_juv_rear_total_diversions = surv_juv_rear_total_diversions
  params_init$..surv_juv_bypass_int = surv_juv_bypass_int
  params_init$..surv_juv_delta_int = surv_juv_delta_int
  params_init$..surv_juv_delta_contact_points = surv_juv_delta_contact_points
  params_init$..surv_juv_delta_total_diverted = surv_juv_delta_total_diverted
  params_init$..surv_juv_outmigration_sj_int = surv_juv_outmigration_sj_int
  
  # Ocean entry success coefficient and variable
  params_init$..ocean_entry_success_int = c(
    `Upper Sacramento River` = ocean_entry_success_int_default,
    `Antelope Creek` = ocean_entry_success_int_default,
    `Battle Creek` = ocean_entry_success_int_battle_clear,
    `Bear Creek` = ocean_entry_success_int_default,
    `Big Chico Creek` = ocean_entry_success_int_default,
    `Butte Creek` = ocean_entry_success_int_butte,
    `Clear Creek` = ocean_entry_success_int_battle_clear,
    `Cottonwood Creek` = ocean_entry_success_int_default,
    `Cow Creek` = ocean_entry_success_int_default,
    `Deer Creek` = ocean_entry_success_int_deer,
    `Elder Creek` = ocean_entry_success_int_default,
    `Mill Creek` = ocean_entry_success_int_mill,
    `Paynes Creek` = ocean_entry_success_int_default,
    `Stony Creek` = ocean_entry_success_int_default,
    `Thomes Creek` = ocean_entry_success_int_default,
    `Upper-mid Sacramento River` = ocean_entry_success_int_default,
    `Sutter Bypass` = ocean_entry_success_int_default,
    `Bear River` = ocean_entry_success_int_bear_feather,
    `Feather River` = ocean_entry_success_int_bear_feather,
    `Yuba River` = ocean_entry_success_int_yuba,
    `Lower-mid Sacramento River` = ocean_entry_success_int_default,
    `Yolo Bypass` = ocean_entry_success_int_default,
    `American River` = ocean_entry_success_int_default,
    `Lower Sacramento River` = ocean_entry_success_int_default,
    `Calaveras River` = ocean_entry_success_int_default,
    `Cosumnes River` = ocean_entry_success_int_default,
    `Mokelumne River` = ocean_entry_success_int_default,
    `Merced River` = ocean_entry_success_int_default,
    `Stanislaus River` = ocean_entry_success_int_default,
    `Tuolumne River` = ocean_entry_success_int_default,
    `San Joaquin River` = ocean_entry_success_int_default
  )
  
  keep <- c(2, 3, 6, 7, 10, 12, 19, 20)
  num_obs <- rowSums(!is.na(known_adults[keep, 6:19]))
  total_obs <- sum(!is.na(known_adults[keep, 6:19]))
  weights <- num_obs / total_obs
  
  
  tryCatch({
    preds <- spring_run_model(mode = "calibrate",
                              seeds = seeds,
                              stochastic = FALSE,
                              ..params = params_init)
    
    known_nats <- known_adults[keep, 6:19] * (1 - params_init$proportion_hatchery[keep])
    mean_escapent <-rowMeans(known_nats, na.rm = TRUE)
    
    sse <- sum(((preds[keep,] - known_nats)^2 * weights)/mean_escapent, na.rm = TRUE)
    
    return(sse)
  },
  error = function(e) return(1e12),
  warning = function(w) return(1e12)
  )
}

Run GA Algorithm

params <- DSMCalibrationData::set_synth_years(fallRunDSM::params)

res <- ga(type = "real-valued",
          fitness =
            function(x) -spring_run_fitness(
              known_adults = DSMCalibrationData::grandtab_observed$spring,
              seeds = DSMCalibrationData::grandtab_imputed$spring,
              params = params,
              x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8], x[9], x[10],
              x[11], x[12], x[13], x[14], x[15], x[16], x[17], x[18], x[19],
              x[20], x[21], x[22], x[23], x[24], x[25], x[26], x[27]
            ),
          lower = c(2.5, rep(-3.5, 26)),
          upper = rep(3.5, 27),
          popSize = 150,
          maxiter = 10000,
          run = 50,
          parallel = TRUE,
          pmutation = .4)

Calibration Settings

The ga function allows for several paramaters to be set before calibrating a model. Here we list the reasoning for the values passed into the ga functon above.

  • lower - sets the lower bound possible for each of the parameters we are trying to calibrate. All parameters used in the model exist within a inverse logit function therefore limiting the genetic algorithms space of search greatly improves its speed. We also explicitly limit ..surv_adult_enroute as a positive value to keep the GA from quickly converging to a state where all fish are dead.

  • upper - sets the upper bound possible for each of the parameters we are trying to calibrate. The same reasoning as lower, but we add no special limitiation to ..surv_adult_enroute

  • popSize - sets population size for each iteration of the GA. We found 150 to be a good balance between breadth and speed of the search.

  • maxiter - sets the number of iterations the GA will run for. We set this value large enough so that the run argument would be determining factor as to when the GA stopped.

  • run - we set this value to 50 as it allowed the algorithm to avoid local minimum but also complete the calibration within 3-4 hours.

  • pmutation - set the probability of mutation, we used this value to explore the solution space more aggresively once we had a starting position from a previous run.

The algorithm is always run in parallel.

Calibration Results

The results for calibration are below:

Spawning Adults

  • ..surv_adult_enroute_int = 3.1003406
  • ..surv_adult_prespawn_int = -0.3201203

Egg to Fry

  • ..surv_egg_to_fry_int = 2.771515

Rearing Survival Intercepts

Note: Deer Creek survival intercept was used for all Upper Sacramento Tributaries not listed below

Intercepts listed below are supplied into the model as a vector ..surv_juv_int

  • Default Intercept = -2.3216816
  • Battle Creek = 1.6270837
  • Butte Creek = 0.2273643
  • Deer Creek = -0.11196
  • Mill Creek = 0.5919554
  • Upper-mid Sacramento River = 3.0156608
  • Lower-mid Sacramento River = 3.0156608
  • Lower Sacramento River = 3.0156608
  • Feather River = -0.8856578
  • Yuba River = -1.0748487
  • San Joaquin River = 0.4185424
  • ..surv_juv_rear_contact_points = 0.0663566
  • ..surv_juv_rear_prop_diversions = -3.4449834
  • ..surv_juv_rear_total_diversions = -1.550287

Migration Intercepts

  • ..surv_juv_bypass_int = -0.8524081
  • ..surv_juv_delta_int = 1.0170387
  • ..surv_juv_delta_contact_points = 0.9521923
  • ..surv_juv_delta_total_diverted = 0.0640437
  • ..surv_juv_outmigration_sj_int = 2.9256026

Ocean Entry Success

Intercepts listed below are supplied into the model as a vector ..ocean_entry_success_int

  • Default Ocean Entry = 2.7418892
  • Battle Creek = 1.6482682
  • Clear Creek = 1.6482682
  • Butte Creek = 3.4508557
  • Deer Creek = 0.1561519
  • Mill Creek = 1.9782962
  • Bear River = 0.3114266
  • Feather River = 0.3114266
  • Yuba River = -1.0108874

Default Ocean Entry is used for all watersheds not listed above