  • Observation data, simulation results
Benchmark MODECOGeL
  • Contributor Laurence Viry
A global sensitivity analysis approach for marine biogeochemical modeling. Marine biogeochemical models are now commonly included as modules in complex ocean circulation modeling systems, and they are therefore increasingly used in many applications. For such applications, sensitivity analysis (SA) methods, which aim to quantify the relative influence of the inputs on given outputs of a complex system such as a numerical model, are a valuable tool. The model used in this benchmark is MODECOGeL, whose input space has dimension d = 74; 15 different quantities of interest (QoIs) are considered. This study required a large number of model evaluations, which the authors decided to make available to the MASCOT NUM community and, more generally, to the uncertainty quantification community.
Read me file
descripdatamodecomar.txt
Benchmark MODECOGeL: A global sensitivity analysis approach for marine biogeochemical modeling

Application:
------------
Marine biogeochemical models are now commonly included as modules in complex ocean circulation modeling systems, and they are therefore increasingly used in many applications. For such applications, sensitivity analysis (SA) methods, which aim to quantify the relative influence of the inputs on given outputs of a complex system such as a numerical model, are a valuable tool.

Indeed, they can help to better understand the model itself and to identify which parameters are the most influential and should therefore be calibrated carefully.

Benchmark objectives
--------------------
The model used in this benchmark is MODECOGeL, whose input space has dimension d = 74; we consider 15 different quantities of interest (QoIs).

For more details on the model, its uncertain input parameters and the quantities of interest, see the file Description.pdf provided in the data archive:

* The MODECOGeL model is briefly described in Section 1.1.
* The d uncertain input parameters of the model are described in Section 1.2.
* The QoIs are listed in Section 1.3.
* The different strategies for estimating the sensitivity indices are described in Section 1.4.
* The data provided for each strategy are described in Section 2.

This study required a large number of model evaluations, which the authors decided to make available to the MASCOT NUM community and, more generally, to the uncertainty quantification community.

Different strategies for estimating indices
--------------------------------------------

For complex models, one evaluates the model at selected values of the input parameters and uses the resulting output values to estimate the sensitivity indices of interest. The strategies applied in this study are described in the file Description.pdf (section 1.4).

Description of the data for each strategy
-----------------------------------------
To compute the Sobol' indices, we used the R package "sensitivity". The designs of experiments were chosen according to the functions of this package that we used to estimate the first-order, second-order and total Sobol' indices.

* First-order Sobol' indices with the replication procedure (section 2.1)

- Two replicated designs (Latin hypercubes) of size N, for a total cost of 2N model evaluations.
We considered four values for the size N of each design: 1e+3, 1e+4, 1e+5 and 1e+6.

The information needed to compute the first-order Sobol' indices from the model outputs is stored in an R dataset (.RData file), one for each size N:

inputRLHS_74_1e+3.RData ,
inputRLHS_74_1e+4.RData ,
inputRLHS_74_1e+5.RData ,
inputRLHS_74_1e+6.RData

Each dataset contains:
d: number of parameters,
parClass: sigma/mean for each parameter (see Appendix in Description.pdf),
parNames: names of parameters,
parCent: mean for each parameter,
N: an integer giving the size of each replicated design,
X1: a matrix containing the first replicated design. The field separator character is "white space"; the first column of the table contains the row names.
X2: a matrix containing the second replicated design with the same format.
perm1, perm2: tables of permutations needed to compute X2 from X1. perm1 and perm2 are both N x d matrices.

See Presentation.pdf (section 2.1) for how X2 is built from X1, perm1 and perm2.
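The replication idea behind these two designs can be illustrated with a minimal Python sketch. It assumes a simple convention that is hypothetical and not necessarily the perm1/perm2 convention of Presentation.pdf. Both Latin hypercubes share the same stratified values in each column, and each design applies its own random row permutation per column. Rows of the two designs that share the value of x_i then form pick-freeze pairs, and S_i is estimated from them with a Janon-type estimator. The toy model Y = x_1 + 2 x_2 stands in for MODECOGeL; its analytical values are S_1 = 0.2 and S_2 = 0.8.

```python
import random


def replicated_lhs(n, d, rng):
    """Two replicated Latin hypercubes of size n on [0, 1]^d.

    Column j of both designs uses the same n stratified values; each design
    applies its own random permutation to them. p1[j][k] (resp. p2[j][k]) is
    the index of the value used in row k, column j (hypothetical convention).
    """
    values = [[(k + rng.random()) / n for k in range(n)] for _ in range(d)]
    p1 = [rng.sample(range(n), n) for _ in range(d)]
    p2 = [rng.sample(range(n), n) for _ in range(d)]
    x1 = [[values[j][p1[j][k]] for j in range(d)] for k in range(n)]
    x2 = [[values[j][p2[j][k]] for j in range(d)] for k in range(n)]
    return x1, x2, p1, p2


def first_order_index(y1, y2, p1, p2, i):
    """Estimate S_i from outputs y1 (computed on x1) and y2 (computed on x2)."""
    n = len(y1)
    # row_of[s] = row of the second design whose column i uses value s
    row_of = [0] * n
    for k in range(n):
        row_of[p2[i][k]] = k
    # Reorder y2 so that row k shares x_i with row k of the first design
    y2i = [y2[row_of[p1[i][k]]] for k in range(n)]
    # Janon-type pick-freeze estimator
    m = (sum(y1) + sum(y2i)) / (2 * n)
    num = sum(a * b for a, b in zip(y1, y2i)) / n - m * m
    den = sum(a * a + b * b for a, b in zip(y1, y2i)) / (2 * n) - m * m
    return num / den


def toy_model(x):
    # Stand-in for MODECOGeL: analytical S_1 = 0.2, S_2 = 0.8
    return x[0] + 2.0 * x[1]


if __name__ == "__main__":
    rng = random.Random(0)
    x1, x2, p1, p2 = replicated_lhs(20000, 2, rng)
    y1 = [toy_model(x) for x in x1]
    y2 = [toy_model(x) for x in x2]
    print([round(first_order_index(y1, y2, p1, p2, i), 3) for i in range(2)])
```

The total cost is 2N model runs regardless of d, because the same two designs serve for every input i. Only the row alignment changes from one input to the next.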

- The quantities of interest: For each N, the 15 QoIs computed on X1 and X2 are stored in two files, each of size (N+1) x 15, in table format (CSV file):

QI_X1_RLHS_NA_1e+3.csv , QI_X2_RLHS_NA_1e+3.csv ,
QI_X1_RLHS_NA_1e+4.csv , QI_X2_RLHS_NA_1e+4.csv ,
QI_X1_RLHS_NA_1e+5.csv , QI_X2_RLHS_NA_1e+5.csv ,
QI_X1_RLHS_NA_1e+6.csv , QI_X2_RLHS_NA_1e+6.csv.

The first line of each file contains the names of the quantities of interest.
To read these files in R, you can use the command read.csv as follows:

qi <- read.csv(file="QI_X1_RLHS_NA_1e+3.csv",header=TRUE,sep=" ",row.names=1)
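Outside R, the same files can be parsed with Python's standard csv module. The sketch below rests on assumptions about the layout drawn from the description above. It expects a header line of QoI names, then one line per model run starting with a row name, with fields separated by single spaces and missing values written as NA. The function name read_qi is hypothetical.

```python
import csv
import io
import math


def read_qi(stream):
    """Parse a space-separated QoI table with a header line and row names.

    Rough equivalent of read.csv(file, header=TRUE, sep=" ", row.names=1),
    assuming each data line is: row name, then one value per QoI.
    Returns (qoi_names, row_names, values).
    """
    reader = csv.reader(stream, delimiter=" ", skipinitialspace=True)
    header = next(reader)
    row_names, values = [], []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row_names.append(fields[0])
        # R writes missing values as NA
        values.append([math.nan if v == "NA" else float(v) for v in fields[1:]])
    # If the header also names the row-name column, drop that entry
    if values and len(header) == len(values[0]) + 1:
        header = header[1:]
    return header, row_names, values


if __name__ == "__main__":
    sample = '"QI1" "QI2"\n"1" 0.5 1.5\n"2" NA 2.0\n'
    names, rows, qi = read_qi(io.StringIO(sample))
    print(names, rows, qi)
    # For a real file:
    # with open("QI_X1_RLHS_NA_1e+3.csv", newline="") as f:
    #     names, rows, qi = read_qi(f)
```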

* Second-order Sobol' indices with the replication procedure (section 2.2)

- Two replicated designs (randomized orthogonal arrays) of size N for a total cost of 2N model evaluations.

For the second order, N = q^2, where q >= d-1 is a prime number giving the number of levels of the orthogonal array. We considered three values for q: 73 (N = 5 329), 227 (N = 51 529) and 709 (N = 502 681).
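The condition q >= d-1 is presumably a column budget. An orthogonal array of strength 2 with q^2 rows and q levels has at most q+1 columns (Rao bound), so d = 74 inputs require q >= 73. The Python sketch below builds such an array with a standard linear construction over the integers modulo q. It then checks the strength-2 property and recomputes N for the three values of q. It does not reproduce the randomization used for the dataset (Presentation.pdf, section 2.2), and the function names are hypothetical.

```python
from itertools import combinations


def is_prime(q):
    return q >= 2 and all(q % p for p in range(2, int(q ** 0.5) + 1))


def linear_oa(q, d):
    """OA with q^2 rows, d columns, q levels and strength 2 (q prime, d <= q+1).

    Row (a, b): column 0 is b, column c >= 1 is (a + (c - 1) * b) mod q.
    """
    if not is_prime(q):
        raise ValueError("q must be prime")
    if d > q + 1:
        raise ValueError("need d <= q + 1, i.e. q >= d - 1")
    return [[b] + [(a + k * b) % q for k in range(d - 1)]
            for a in range(q) for b in range(q)]


def has_strength_2(oa, q):
    """Every pair of columns contains each of the q^2 level pairs exactly once."""
    d = len(oa[0])
    return all(len({(r[i], r[j]) for r in oa}) == q * q == len(oa)
               for i, j in combinations(range(d), 2))


if __name__ == "__main__":
    print(has_strength_2(linear_oa(5, 6), 5))  # small check with q + 1 columns
    d = 74
    for q in (73, 227, 709):
        print(q, is_prime(q), q >= d - 1, q * q)
```

Any two columns are related by an invertible linear map of (a, b) modulo a prime q. Each pair of levels therefore occurs exactly once, which is what the strength-2 check verifies.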

The information needed to compute the second-order Sobol' indices from the model outputs is stored in an R dataset, one for each sample size N:

inputRLHS2_74_73.RData,
inputRLHS2_74_227.RData,
inputRLHS2_74_709.RData

Each dataset contains:
d: number of parameters,
parClass: sigma/mean for each parameter (see Appendix in Description.pdf),
parNames: names of parameters,
parCent: mean for each parameter,
N: an integer giving the size of each orthogonal array,
X1: a matrix containing the first orthogonal array. The field separator character is "white space"; the first column of the table contains the row names.
X2: a matrix containing the second orthogonal array with the same format.
perm1, perm2: tables of permutations needed to compute X2 from X1. Both are N x d matrices whose columns are permutations of {1,...,N}.

See Presentation.pdf (section 2.2) for how X2 is built from X1, perm1 and perm2.

- The quantities of interest: For each q (N = q^2), the 15 QoIs computed on X1 and X2 are stored in two files, each of size (N+1) x 15, in table format (CSV file):

QI_X1_RLHS2_NA_Q73.csv , QI_X2_RLHS2_NA_Q73.csv ,
QI_X1_RLHS2_NA_Q227.csv , QI_X2_RLHS2_NA_Q227.csv ,
QI_X1_RLHS2_NA_Q709.csv , QI_X2_RLHS2_NA_Q709.csv.

The first line of each file contains the names of the 15 QoIs.

To read these files in R, you can use the command read.csv as before:

qi <- read.csv(file="QI_X1_RLHS2_NA_Q73.csv",header=TRUE,sep=" ",row.names=1)

* Total Sobol' indices with the Monte Carlo estimation scheme introduced by Saltelli (section 2.3)

- Two random samples X1 and X2 of size N, for a total cost of (2+d)N model evaluations, where d is the number of parameters.

To compute the total indices, we used an experimental design X of size (2+d)N built from X1 and X2.

We considered a single value, N = 1e+5.
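The design X can be sketched in Python under an assumed block order. The first two blocks are A = X1 and B = X2. Then, for each input i, comes the matrix A with its i-th column taken from B, which gives (2+d)N rows in total. Each total index is then estimated with Jansen's formula, ST_i ≈ mean((f(A) - f(A_B^(i)))^2) / (2 Var(Y)). This is a swapped-in estimator: the ordering and the estimator actually used for the dataset are described in Description.pdf (section 2.3) and may differ. The toy model and function names are hypothetical.

```python
import random


def saltelli_design(x1, x2):
    """Stack A = x1, B = x2, then A with column i taken from B, for each i.

    The block order is an assumption; the result has (2 + d) * N rows.
    """
    d = len(x1[0])
    design = [row[:] for row in x1] + [row[:] for row in x2]
    for i in range(d):
        for a, b in zip(x1, x2):
            row = a[:]
            row[i] = b[i]  # only input i differs from A
            design.append(row)
    return design


def total_indices(y, n, d):
    """Jansen estimator of total Sobol' indices from outputs on the design."""
    ya, yb = y[:n], y[n:2 * n]
    mean = sum(ya + yb) / (2 * n)
    var = sum((v - mean) ** 2 for v in ya + yb) / (2 * n - 1)
    st = []
    for i in range(d):
        yab = y[(2 + i) * n:(3 + i) * n]  # outputs on A with column i from B
        st.append(sum((a - c) ** 2 for a, c in zip(ya, yab)) / (2 * n * var))
    return st


def toy_model(x):
    # Stand-in for MODECOGeL: analytical total indices (0.2, 0.8, 0.0)
    return x[0] + 2.0 * x[1]


if __name__ == "__main__":
    rng = random.Random(1)
    n, d = 20000, 3
    x1 = [[rng.random() for _ in range(d)] for _ in range(n)]
    x2 = [[rng.random() for _ in range(d)] for _ in range(n)]
    design = saltelli_design(x1, x2)
    y = [toy_model(x) for x in design]
    print(len(design), [round(s, 3) for s in total_indices(y, n, d)])
```

With d = 74 and N = 1e+5, the same construction gives (2+74) x 1e+5 = 7 600 000 model runs.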

The information needed to compute the total Sobol' indices from the model outputs is stored in an R dataset:

inputSalt_74_1e+5.RData

It contains:
d: number of parameters,
parClass: sigma/mean for each parameter (see Appendix in Description.pdf),
parNames: names of parameters,
parCent: mean for each parameter,
N: an integer giving the size of each random sample (X1 and X2),
X1: an N x d matrix containing the first i.i.d. sample of input parameters. The field separator character is "white space"; the first column of the table contains the row names.
X2: an N x d matrix containing the second i.i.d. sample of input parameters with the same format.
X: a (2+d)N x d matrix containing the design of experiments.

See Description.pdf (section 2.3) for how X is obtained from X1 and X2.

- The quantities of interest: For N = 1e+5, a file of size (1+(2+d)N) x 15 in table format (CSV file) contains the 15 quantities of interest computed on X:

QI_Salt_NA_1e+5.csv

The first line of this file contains the names of the QoIs.
To read this file in R, you can use the command read.csv:

qi <- read.csv(file="QI_Salt_NA_1e+5.csv",header=TRUE,sep=" ",row.names=1)
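Before estimating indices, it can be useful to check that each downloaded QoI file has the expected number of lines: one header line plus one line per model run. The Python sketch below derives these counts from the design sizes stated above, that is 2N for the replicated Latin hypercubes, 2q^2 for the orthogonal arrays and (2+d)N for Saltelli's scheme. The helper count_lines is hypothetical.

```python
D = 74  # number of uncertain input parameters


def expected_lines():
    """Expected line count (1 header + 1 line per model run) of each QoI file."""
    lines = {}
    for size in ("1e+3", "1e+4", "1e+5", "1e+6"):
        for x in ("X1", "X2"):
            lines[f"QI_{x}_RLHS_NA_{size}.csv"] = 1 + int(float(size))
    for q in (73, 227, 709):
        for x in ("X1", "X2"):
            lines[f"QI_{x}_RLHS2_NA_Q{q}.csv"] = 1 + q * q
    lines["QI_Salt_NA_1e+5.csv"] = 1 + (2 + D) * 100_000
    return lines


def count_lines(path):
    """Hypothetical helper: number of lines in a downloaded file."""
    with open(path) as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    table = expected_lines()
    for name, n_lines in table.items():
        print(f"{name}: {n_lines} lines")
    print("total model runs:", sum(n - 1 for n in table.values()))
```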

* Local method with gradient calculation (section 2.4)

The information needed to compute the local indices from the model outputs is stored in an R dataset:

inputLocal_74_2.RData

It contains:
nparam: number of parameters,
nrowplan: an integer giving the size of the experimental design,
alpha: a vector of two values of alpha (see Equation (3) in Presentation.pdf) used to approximate the gradients of the quantities of interest,
parCent: mean for each parameter,
parClass: sigma/mean for each parameter (see Appendix in Description.pdf),
parNames: names of parameters,
planSimul: a matrix X of size (1+2d)N x d containing the experimental design,
qiNames: names of quantities of interest,
tabqi: a matrix containing the 15 quantities of interest retained.

See Presentation.pdf (section 2.4) for how planSimul is built.
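As an illustration of a gradient-based local method, the Python sketch below uses central finite differences with a relative step alpha. It evaluates the model on a 1 + 2d design: the nominal point, plus one positive and one negative perturbation per parameter. This is consistent with the (1+2d) factor in the size of planSimul. The exact perturbation formula is Equation (3) of Presentation.pdf and is not reproduced here. The relative step, the normalized index (gradient times x_i / Y) and the function names are assumptions.

```python
def central_gradient(f, x0, alpha):
    """Central finite-difference gradient of f at x0 with relative step alpha.

    Evaluates f on a 1 + 2d design: x0, then x0 with x0[i] scaled by
    (1 + alpha) and (1 - alpha) for each i (assumes every x0[i] != 0).
    Returns (design, outputs, gradient).
    """
    d = len(x0)
    design = [list(x0)]
    for i in range(d):
        for sign in (1.0, -1.0):
            x = list(x0)
            x[i] = x0[i] * (1.0 + sign * alpha)
            design.append(x)
    y = [f(x) for x in design]
    # Rows 1 + 2i and 2 + 2i hold the +alpha and -alpha perturbations of x_i
    grad = [(y[1 + 2 * i] - y[2 + 2 * i]) / (2.0 * alpha * x0[i])
            for i in range(d)]
    return design, y, grad


def normalized_indices(grad, x0, y0):
    """Elasticities x_i / Y * dY/dx_i (one common normalization, assumed here)."""
    return [g * xi / y0 for g, xi in zip(grad, x0)]


if __name__ == "__main__":
    def toy_model(x):
        return x[0] ** 2 + 3.0 * x[1]

    design, y, grad = central_gradient(toy_model, [1.0, 2.0], alpha=0.01)
    print(len(design), grad, normalized_indices(grad, [1.0, 2.0], y[0]))
```

Repeating the construction for each of the two alpha values would double the number of runs. Comparing the two gradient estimates gives a rough check of the step size.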
2019 10 18
Other metadata
  • External Identifiers:

  • Subjects:

    Computer Science, Mathematics, Biochemistry
  • Keywords:

    sensitivity analysis, Sobol' indices, marine biogeochemistry
  • Corresponding tasks:

    benchmark

Laurence Viry (2019) Benchmark MODECOGeL. [Data set]. Published 2019 via Perscido-Grenoble-Alpes.