API Reference

Dataset Creation

EmulatorsTrainer.create_training_dataset - Function
create_training_dataset(n::Int, lb::Array, ub::Array)

Generate n quasi-Monte Carlo samples within the bounds lb and ub using Latin Hypercube Sampling.

Arguments

  • n::Int: Number of samples to generate
  • lb::Array: Lower bounds for each parameter
  • ub::Array: Upper bounds for each parameter

Returns

  • Matrix{Float64}: Matrix of shape (n_params, n_samples), with one parameter combination per column

Example

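using EmulatorsTrainer

lb = [0.1, 0.5, 60.0]  # lower bounds, one entry per parameter
ub = [0.5, 1.0, 80.0]  # upper bounds, in the same order as lb
samples = EmulatorsTrainer.create_training_dataset(1000, lb, ub)  # 3 × 1000 matrix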
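The stratification behind Latin Hypercube Sampling can be sketched in plain Julia. This is an illustrative sketch only, not the package's implementation; the helper name `lhs_sketch` and the explicit random number generator argument are assumptions made for the example.

```julia
using Random

# Hypothetical helper, not part of EmulatorsTrainer: a basic Latin Hypercube
# sampler that returns one column per sample, matching the documented layout.
function lhs_sketch(rng::AbstractRNG, n::Int, lb::Vector{Float64}, ub::Vector{Float64})
    length(lb) == length(ub) || throw(ArgumentError("lb and ub must have equal length"))
    d = length(lb)
    samples = Matrix{Float64}(undef, d, n)
    for i in 1:d
        # Split [0, 1) into n equal strata and draw one point per stratum,
        # then shuffle so strata are paired at random across parameters.
        strata = ((0:n-1) .+ rand(rng, n)) ./ n
        samples[i, :] = lb[i] .+ (ub[i] - lb[i]) .* shuffle(rng, strata)
    end
    return samples
end

lb = [0.1, 0.5, 60.0]
ub = [0.5, 1.0, 80.0]
samples = lhs_sketch(MersenneTwister(42), 1000, lb, ub)
```

Each parameter range is covered by exactly one point per stratum, which is what distinguishes this scheme from plain uniform random sampling.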
EmulatorsTrainer.create_training_dict - Function
create_training_dict(training_matrix::Matrix, idx_comb::Int, params::Vector{String})

Create parameter dictionary for a specific sample from the training matrix.

Arguments

  • training_matrix::Matrix: Matrix of parameter combinations
  • idx_comb::Int: Column index of the desired combination
  • params::Vector{String}: Parameter names

Returns

  • Dict{String, Float64}: Dictionary mapping parameter names to values
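The documented mapping from one matrix column to a dictionary could look roughly like the following. The helper name `training_dict_sketch` and the parameter names are hypothetical, and the sketch assumes one parameter name per matrix row.

```julia
# Hypothetical stand-in for the documented behavior: pair each parameter name
# with the matching entry of column `idx_comb`.
function training_dict_sketch(training_matrix::AbstractMatrix, idx_comb::Int,
                              params::Vector{String})
    size(training_matrix, 1) == length(params) ||
        throw(DimensionMismatch("need one parameter name per matrix row"))
    return Dict{String,Float64}(params[i] => training_matrix[i, idx_comb]
                                for i in eachindex(params))
end

training_matrix = [0.1 0.2; 0.6 0.9; 65.0 72.0]  # 3 parameters × 2 samples
params = ["a", "b", "c"]                          # illustrative names
d = training_dict_sketch(training_matrix, 2, params)
```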
EmulatorsTrainer.prepare_dataset_directory - Function
prepare_dataset_directory(root_dir::String; force::Bool=false)

Safely create a dataset directory with existence checking and metadata tracking.

Arguments

  • root_dir::String: Path to the dataset directory
  • force::Bool=false: If true, back up an existing directory before creating a new one; if false, throw an error when the directory already exists
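The backup-or-error policy described above might be sketched as follows. This is not the package's code: the helper name `prepare_dir_sketch` and the timestamped backup naming are assumptions, metadata tracking is omitted, and the path is assumed to have no trailing separator.

```julia
using Dates

# Hypothetical sketch of the documented policy, not the package's code.
function prepare_dir_sketch(root_dir::String; force::Bool=false)
    if isdir(root_dir)
        force || error("Directory $root_dir already exists; pass force=true to back it up")
        # Move the old directory aside under a timestamped name (naming is an assumption).
        backup = root_dir * "_backup_" * Dates.format(now(), "yyyymmdd_HHMMSS")
        mv(root_dir, backup)
    end
    mkpath(root_dir)
    return root_dir
end

tmp = mktempdir()
root = joinpath(tmp, "dataset")
prepare_dir_sketch(root)               # creates the directory
prepare_dir_sketch(root; force=true)   # backs up the old one, creates a fresh one
```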
EmulatorsTrainer.compute_dataset - Function
compute_dataset(training_matrix, params, root_dir, script_func, mode; force=false)

Compute dataset using specified parallelization mode with optional force override.

Arguments

  • training_matrix::AbstractMatrix: Matrix of parameter combinations
  • params::AbstractVector{String}: Parameter names
  • root_dir::String: Root directory for dataset
  • script_func::Function: Function to compute data for each parameter combination
  • mode::Symbol: Computation mode (:distributed, :threads, or :serial)
  • force::Bool=false: Force overwrite of existing directory

Modes

  • :distributed: Use distributed computing across multiple processes
  • :threads: Use multi-threading on shared memory
  • :serial: Sequential execution (useful for debugging)
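As a rough illustration of how the three modes differ, the sketch below dispatches the same per-sample function in each way. The helper `run_all_sketch` and the single-index signature of `f` are hypothetical; the actual signature expected for `script_func` is not specified here.

```julia
using Distributed

# Hypothetical dispatcher: apply `f` to every sample index using the
# requested mode. `f` stands in for the per-sample work.
function run_all_sketch(f::Function, n_samples::Int, mode::Symbol)
    if mode === :serial
        foreach(f, 1:n_samples)
    elseif mode === :threads
        Threads.@threads for i in 1:n_samples
            f(i)
        end
    elseif mode === :distributed
        # pmap spreads calls over worker processes; with no workers added
        # it runs on the current process.
        pmap(f, 1:n_samples)
    else
        throw(ArgumentError("mode must be :distributed, :threads, or :serial"))
    end
    return nothing
end

results = zeros(8)
run_all_sketch(i -> (results[i] = i^2), 8, :threads)
```

Writing into a shared array as above only works for `:serial` and `:threads`; with real worker processes each call runs in a separate memory space, so results would normally be written to disk or returned instead.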

Data Loading and Training

EmulatorsTrainer.add_observable_df! - Function
add_observable_df!(df::DataFrame, location::String, param_file::String,
                  observable_file::String, first_idx::Int, last_idx::Int, get_tuple::Function)

Add observation slice to DataFrame with NaN checking.

Arguments

  • df::DataFrame: Target DataFrame
  • location::String: Directory containing files
  • param_file::String: JSON file with parameters
  • observable_file::String: NPY file with observables
  • first_idx::Int: Start index for slice
  • last_idx::Int: End index for slice
  • get_tuple::Function: Function to process (params, observable) into tuple
add_observable_df!(df::DataFrame, location::String, param_file::String,
                  observable_file::String, get_tuple::Function)

Add complete observation to DataFrame with NaN checking.

Arguments

  • df::DataFrame: Target DataFrame
  • location::String: Directory containing files
  • param_file::String: JSON file with parameters
  • observable_file::String: NPY file with observables
  • get_tuple::Function: Function to process (params, observable) into tuple
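A `get_tuple` callback and the NaN guard could look something like the sketch below. Everything here is an assumption made for illustration: the parameter names "a" and "b", the shape of the returned tuple, and the choice to skip an observation by returning `nothing` are not taken from the package.

```julia
# Hypothetical callback: `params` is assumed to be the Dict parsed from the
# JSON file and `observable` the array loaded from the NPY file.
get_tuple_sketch(params::AbstractDict, observable::AbstractVector) =
    (params["a"], params["b"], observable)

# Hypothetical guard mirroring the documented NaN checking: return `nothing`
# if the (optionally sliced) observable contains NaN, otherwise the tuple.
function checked_row_sketch(params, observable, get_tuple;
                            first_idx=firstindex(observable),
                            last_idx=lastindex(observable))
    slice = observable[first_idx:last_idx]
    any(isnan, slice) && return nothing
    return get_tuple(params, slice)
end

params = Dict("a" => 1.0, "b" => 2.0)
row = checked_row_sketch(params, [1.0, 2.0, 3.0], get_tuple_sketch; first_idx=2, last_idx=3)
skipped = checked_row_sketch(params, [1.0, NaN, 3.0], get_tuple_sketch)
```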
EmulatorsTrainer.load_df_directory! - Function
load_df_directory!(df::DataFrame, Directory::String, add_observable_function::Function)

Recursively load all observations from directory into DataFrame.

Arguments

  • df::DataFrame: Target DataFrame
  • Directory::String: Root directory to search
  • add_observable_function::Function: Function to add each observation
EmulatorsTrainer.extract_input_output_df - Function
extract_input_output_df(df::AbstractDataFrame)

Automatically detect and extract input and output features from a DataFrame. Assumes that the last column, named "observable", contains the output arrays and that all other columns are input features.

Returns

  • array_input::Matrix{Float64}: Input features matrix (n_input_features × n_samples)
  • array_output::Matrix{Float64}: Output features matrix (n_output_features × n_samples)
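The resulting layout, with features along rows and samples along columns, can be illustrated without DataFrames. The named tuple below is a hypothetical stand-in for a DataFrame, and the column names "a" and "b" are made up for the example.

```julia
# Plain-Julia stand-in for a DataFrame: named columns, with "observable" last.
columns = (a = [0.1, 0.2, 0.3],
           b = [1.0, 2.0, 3.0],
           observable = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]])

input_names = [name for name in keys(columns) if name !== :observable]

# One row per input feature, one column per sample.
array_input = permutedims(reduce(hcat, [columns[name] for name in input_names]))
# Each observable vector becomes one column.
array_output = reduce(hcat, columns.observable)
```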
EmulatorsTrainer.get_minmax_in - Function
get_minmax_in(df::DataFrame, array_pars_in::Vector{String})

Compute min/max values for specified input features.

Arguments

  • df::DataFrame: DataFrame with input features
  • array_pars_in::Vector{String}: Column names to compute min/max for

Returns

  • Matrix{Float64}: Shape (n_params, 2) with [min, max] for each parameter
EmulatorsTrainer.get_minmax_out - Function
get_minmax_out(array_out::AbstractMatrix{<:Real})

Compute minimum and maximum values for each output feature. Automatically detects the number of output features from the array dimensions.

Arguments

  • array_out::AbstractMatrix{<:Real}: Output array with shape (n_output_features, n_samples)

Returns

  • out_MinMax::Matrix{Float64}: Matrix with shape (n_output_features, 2) containing [min, max] for each feature
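A per-feature min/max table of this shape can be built with base Julia reductions along the sample dimension. The helper name `minmax_sketch` is hypothetical and only illustrates the documented output layout.

```julia
# Hypothetical equivalent of the documented output layout:
# column 1 holds the per-feature minimum, column 2 the maximum.
minmax_sketch(array_out::AbstractMatrix{<:Real}) =
    Float64.(hcat(minimum(array_out, dims=2), maximum(array_out, dims=2)))

array_out = [1.0 4.0 2.0;    # feature 1 across three samples
             -3.0 0.5 9.0]   # feature 2 across three samples
out_MinMax = minmax_sketch(array_out)
```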
EmulatorsTrainer.maximin_df! - Function
maximin_df!(df, in_MinMax, out_MinMax)

Normalize DataFrame features to the [0, 1] range in place, using the supplied min/max values.

Arguments

  • df: DataFrame to normalize
  • in_MinMax: Min/max values for input features
  • out_MinMax: Min/max values for output features
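Min-max normalization maps each value x to (x - min) / (max - min). The sketch below applies that formula to a plain matrix with one feature per row; unlike the documented function it returns a new array rather than modifying a DataFrame, and the helper name is hypothetical.

```julia
# Hypothetical helper: rescale each row of `A` with its own [min, max] pair.
function minmax_normalize_sketch(A::AbstractMatrix{<:Real}, MinMax::AbstractMatrix{<:Real})
    lo = MinMax[:, 1]
    hi = MinMax[:, 2]
    return (A .- lo) ./ (hi .- lo)
end

A = [1.0 4.0 2.5; -3.0 3.0 9.0]   # 2 features × 3 samples
MinMax = [1.0 4.0; -3.0 9.0]      # [min, max] for each feature
normalized = minmax_normalize_sketch(A, MinMax)
```

A feature whose minimum equals its maximum would divide by zero here, so constant features need separate handling.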
EmulatorsTrainer.splitdf - Function
splitdf(df::DataFrame, pct::Float64)

Randomly split DataFrame into two parts.

Arguments

  • df::DataFrame: DataFrame to split
  • pct::Float64: Fraction for first split (0 to 1)

Returns

  • (DataFrame, DataFrame): Two views of the split data
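A random split that returns views rather than copies can be sketched on a plain matrix. The helper `split_indices_sketch`, the explicit random number generator, and the choice to round the first part down are assumptions for illustration.

```julia
using Random

# Hypothetical helper: shuffle row indices and cut at the requested fraction.
function split_indices_sketch(rng::AbstractRNG, n_rows::Int, pct::Float64)
    0.0 <= pct <= 1.0 || throw(ArgumentError("pct must lie in [0, 1]"))
    ids = shuffle(rng, 1:n_rows)
    n_first = floor(Int, n_rows * pct)
    return ids[1:n_first], ids[n_first+1:end]
end

data = rand(MersenneTwister(0), 10, 3)   # 10 rows, 3 columns
first_ids, second_ids = split_indices_sketch(MersenneTwister(1), 10, 0.8)
first_part = view(data, first_ids, :)    # views share memory with `data`
second_part = view(data, second_ids, :)
```

Because the two parts are views, modifying them also modifies the underlying data.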
EmulatorsTrainer.traintest_split - Function
traintest_split(df, test)

Split DataFrame into training and test sets.

Arguments

  • df: DataFrame to split
  • test: Fraction for test set

Returns

  • (train_df, test_df): Training and test DataFrames
EmulatorsTrainer.getdata - Function
getdata(df)

Split DataFrame into train/test sets with automatic dimension detection.

Arguments

  • df: DataFrame with features and observables

Returns

  • (xtrain, ytrain, xtest, ytest): Training and test arrays as Float64

Validation

EmulatorsTrainer.evaluate_residuals - Function
evaluate_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
                  get_ground_truth::Function, get_emu_prediction::Function;
                  get_σ::Union{Function,Nothing}=nothing)

Compute residuals between ground truth and emulator predictions. Automatically detects the number of validation samples and output features.

Arguments

  • Directory::String: Root directory containing validation data
  • dict_file::String: Name of the parameter JSON file to search for
  • pars_array::Vector{String}: Parameter names to extract
  • get_ground_truth::Function: Function to load ground truth data
  • get_emu_prediction::Function: Function to get emulator prediction
  • get_σ::Union{Function,Nothing}=nothing: Optional function to get uncertainties

Returns

  • Matrix{Float64}: Residuals matrix (n_samples × n_output_features)
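One plausible residual definition is sketched below on arrays already held in memory. The exact convention used by the package (signed or absolute, relative or in units of σ) is not stated here, so the formula, the helper name, and the array-based interface are all assumptions.

```julia
# Hypothetical residual computation. truth, prediction, and σ are matrices
# of shape (n_samples, n_output_features).
function residuals_sketch(truth::AbstractMatrix, prediction::AbstractMatrix, σ=nothing)
    size(truth) == size(prediction) || throw(DimensionMismatch("shapes differ"))
    diff = abs.(prediction .- truth)
    # Assumed convention: with uncertainties, express residuals in units of σ;
    # without them, keep the absolute differences.
    return σ === nothing ? diff : diff ./ σ
end

truth = [1.0 2.0; 3.0 4.0]
prediction = [1.5 2.0; 2.0 6.0]
plain = residuals_sketch(truth, prediction)
scaled = residuals_sketch(truth, prediction, fill(0.5, 2, 2))
```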
EmulatorsTrainer.evaluate_sorted_residuals - Function
evaluate_sorted_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
                        get_ground_truth::Function, get_emu_prediction::Function;
                        get_σ::Union{Function,Nothing}=nothing, 
                        percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])

Compute sorted residuals at the specified percentiles. Automatically detects the number of samples and output features.

Arguments

  • Directory::String: Root directory with validation data
  • dict_file::String: Name of parameter JSON file
  • pars_array::Vector{String}: Parameter names to extract
  • get_ground_truth::Function: Function to load ground truth
  • get_emu_prediction::Function: Function to get emulator prediction
  • get_σ::Union{Function,Nothing}=nothing: Optional function for uncertainties
  • percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to compute

Returns

  • Matrix{Float64}: Sorted residuals (n_percentiles × n_features)
EmulatorsTrainer.sort_residuals - Function
sort_residuals(residuals::AbstractMatrix{<:Real};
              percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])

Sort residuals and extract specified percentiles. Automatically detects dimensions from input matrix.

Arguments

  • residuals::AbstractMatrix{<:Real}: Residuals matrix
  • percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to extract

Returns

  • Matrix{Float64}: Percentiles matrix (n_percentiles × n_features)
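Extracting percentiles per output feature can be sketched with `quantile` from the Statistics standard library. This is an assumed approach, with a hypothetical helper name; the package may instead sort each column and index into it directly, which can give slightly different values because `quantile` interpolates.

```julia
using Statistics

# Hypothetical helper: for each output feature (column), take the requested
# percentiles of the residual distribution over samples (rows).
function percentiles_sketch(residuals::AbstractMatrix{<:Real};
                            percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
    n_features = size(residuals, 2)
    out = Matrix{Float64}(undef, length(percentiles), n_features)
    for j in 1:n_features
        # quantile expects probabilities in [0, 1], so divide by 100.
        out[:, j] = quantile(residuals[:, j], percentiles ./ 100)
    end
    return out
end

# 101 samples × 2 features, chosen so the percentiles are easy to read off.
residuals = hcat(collect(0.0:1.0:100.0), collect(0.0:2.0:200.0))
table = percentiles_sketch(residuals)
```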