API Reference

Dataset Creation

EmulatorsTrainer.create_training_dataset — Function

create_training_dataset(n::Int, lb::Array, ub::Array)

Generate quasi-Monte Carlo samples using Latin Hypercube Sampling.

Arguments

n::Int: Number of samples to generate
lb::Array: Lower bounds for each parameter
ub::Array: Upper bounds for each parameter

Returns

Matrix{Float64}: Matrix of shape (n_params, n_samples) with parameter combinations

Example

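using EmulatorsTrainer

lb = [0.1, 0.5, 60.0]   # lower bound for each of the three parameters
ub = [0.5, 1.0, 80.0]   # upper bound for each of the three parameters
samples = EmulatorsTrainer.create_training_dataset(1000, lb, ub)  # 3 × 1000 matrix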
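
To make the sampling scheme concrete, here is a minimal sketch of Latin hypercube sampling that uses only the standard library. It is not the package's implementation: `lhs_sketch` is a hypothetical helper, and it only assumes the (n_params, n_samples) layout described above.

```julia
using Random

# Hypothetical helper, not part of EmulatorsTrainer: a bare-bones Latin
# hypercube sampler returning an (n_params, n_samples) matrix.
function lhs_sketch(rng::AbstractRNG, n::Int, lb::Vector{Float64}, ub::Vector{Float64})
    d = length(lb)
    samples = Matrix{Float64}(undef, d, n)
    for i in 1:d
        # Each of the n equal-width strata is used exactly once per parameter,
        # in a random order that is independent across parameters.
        strata = randperm(rng, n)
        for j in 1:n
            u = (strata[j] - rand(rng)) / n        # point inside stratum strata[j]
            samples[i, j] = lb[i] + u * (ub[i] - lb[i])
        end
    end
    return samples
end

lb = [0.1, 0.5, 60.0]
ub = [0.5, 1.0, 80.0]
samples = lhs_sketch(MersenneTwister(42), 1000, lb, ub)
```

Each column of `samples` is one parameter combination, so it can be indexed the same way as the matrix returned by create_training_dataset.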

EmulatorsTrainer.create_training_dict — Function

create_training_dict(training_matrix::Matrix, idx_comb::Int, params::Vector{String})

Create parameter dictionary for a specific sample from the training matrix.

Arguments

training_matrix::Matrix: Matrix of parameter combinations
idx_comb::Int: Column index of the desired combination
params::Vector{String}: Parameter names

Returns

Dict{String, Float64}: Dictionary mapping parameter names to values
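
The mapping described above can be sketched in plain Julia. The matrix values and parameter names below are illustrative only, and the snippet reproduces the documented behavior rather than the package's source.

```julia
# Hypothetical 3-parameter, 2-sample training matrix (one column per sample).
training_matrix = [0.10 0.20;
                   0.70 0.90;
                   65.0 72.0]
params = ["omega_b", "h", "H0"]   # illustrative names, not taken from the package
idx_comb = 2                      # select the second column

# Pair each parameter name with the entry of the chosen column.
combo = Dict(name => training_matrix[i, idx_comb] for (i, name) in enumerate(params))
```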

EmulatorsTrainer.prepare_dataset_directory — Function

prepare_dataset_directory(root_dir::String; force::Bool=false)

Safely create a dataset directory with existence checking and metadata tracking.

Arguments

root_dir::String: Path to the dataset directory
force::Bool=false: If true, an existing directory is backed up; if false, an error is thrown when the directory already exists
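
The existence check and the `force` switch can be illustrated with a short sketch. `prepare_dir_sketch` is a hypothetical helper; the backup naming scheme is an assumption, and the metadata tracking mentioned above is not reproduced here.

```julia
using Dates

# Hypothetical helper, not the package's implementation.
function prepare_dir_sketch(root_dir::String; force::Bool=false)
    if isdir(root_dir)
        force || error("Directory $root_dir already exists; pass force=true to back it up.")
        # Assumed naming scheme: append a timestamp to the old directory name.
        backup = root_dir * "_backup_" * Dates.format(now(), "yyyymmdd_HHMMSS")
        mv(root_dir, backup)
    end
    mkpath(root_dir)
    return root_dir
end

# Exercise the helper inside a throwaway location.
base = mktempdir()
d = joinpath(base, "dataset")
prepare_dir_sketch(d)   # first call simply creates the directory
```

Calling the helper a second time without `force=true` raises an error, which is the safe default when a previous run may still hold valuable data.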

EmulatorsTrainer.compute_dataset — Function

compute_dataset(training_matrix, params, root_dir, script_func, mode; force=false)

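Compute the dataset by running script_func for each parameter combination, using the specified parallelization mode. The optional force keyword controls what happens when the dataset directory already exists.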

Arguments

training_matrix::AbstractMatrix: Matrix of parameter combinations
params::AbstractVector{String}: Parameter names
root_dir::String: Root directory for dataset
script_func::Function: Function to compute data for each parameter combination
mode::Symbol: Computation mode (:distributed, :threads, or :serial)
force::Bool=false: Force overwrite of existing directory

Modes

:distributed: Use distributed computing across multiple processes
:threads: Use multi-threading on shared memory
:serial: Sequential execution (useful for debugging)
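
One way to picture the three modes is a dispatcher that runs the same per-sample function under different execution strategies. The sketch below is an assumption about the general pattern, not the package's code; `run_all_sketch` is a hypothetical name.

```julia
using Distributed

# Hypothetical dispatcher: apply `f` to the indices 1:n under the chosen mode.
function run_all_sketch(f::Function, n::Int, mode::Symbol)
    if mode === :distributed
        pmap(f, 1:n)                 # spreads calls over the available worker processes
    elseif mode === :threads
        Threads.@threads for i in 1:n
            f(i)                     # iterations are shared among Julia threads
        end
    elseif mode === :serial
        foreach(f, 1:n)              # plain loop, easiest to debug
    else
        throw(ArgumentError("unknown mode: $mode"))
    end
    return nothing
end

# Each index writes to its own slot, so the threaded run has no data race.
squares = zeros(Int, 8)
run_all_sketch(i -> (squares[i] = i^2), 8, :threads)
```

With real worker processes, in-memory side effects such as the array update above stay on the worker, which is why a per-sample function in distributed mode typically writes its results to disk.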

Data Loading and Training

EmulatorsTrainer.add_observable_df! — Function

add_observable_df!(df::DataFrame, location::String, param_file::String,
                  observable_file::String, first_idx::Int, last_idx::Int, get_tuple::Function)

Add a slice of an observation, from first_idx to last_idx, to the DataFrame, checking for NaN values.

Arguments

df::DataFrame: Target DataFrame
location::String: Directory containing files
param_file::String: JSON file with parameters
observable_file::String: NPY file with observables
first_idx::Int: Start index for slice
last_idx::Int: End index for slice
get_tuple::Function: Function to process (params, observable) into tuple
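
The role of get_tuple and of the NaN check can be sketched without any file I/O. Everything below is an assumption made for illustration: the parameter names, the shape of `get_tuple_sketch`, the use of a plain vector in place of the DataFrame, and the choice to skip a slice that contains NaN.

```julia
# Hypothetical get_tuple: picks two parameters and keeps the observable vector.
get_tuple_sketch(params, observable) = (params["h"], params["omega_b"], observable)

# Stand-ins for the contents of the JSON and NPY files.
params = Dict("h" => 0.67, "omega_b" => 0.022)
observable = [1.0, 2.0, NaN, 4.0, 5.0]

rows = Tuple[]   # plays the role of the target DataFrame
for (first_idx, last_idx) in ((1, 2), (2, 4))
    slice = observable[first_idx:last_idx]
    if any(isnan, slice)
        # Assumed policy: warn and skip; the package may handle NaN differently.
        @warn "NaN found, skipping slice" first_idx last_idx
    else
        push!(rows, get_tuple_sketch(params, slice))
    end
end
```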

add_observable_df!(df::DataFrame, location::String, param_file::String,
                  observable_file::String, get_tuple::Function)

Add a complete observation to the DataFrame, checking for NaN values.

Arguments

df::DataFrame: Target DataFrame
location::String: Directory containing files
param_file::String: JSON file with parameters
observable_file::String: NPY file with observables
get_tuple::Function: Function to process (params, observable) into tuple

EmulatorsTrainer.load_df_directory! — Function

load_df_directory!(df::DataFrame, Directory::String, add_observable_function::Function)

Recursively load all observations from directory into DataFrame.

Arguments

df::DataFrame: Target DataFrame
Directory::String: Root directory to search
add_observable_function::Function: Function to add each observation
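
A recursive directory scan of this kind can be written with `walkdir` from Julia's Base. The sketch below is a hypothetical stand-in: it assumes that a directory counts as an observation when it contains a given marker file, and the file name used is illustrative.

```julia
# Hypothetical helper: call `f(dir)` for every directory under `root`
# that contains a file named `marker`.
function visit_dirs_sketch(f::Function, root::String, marker::String)
    for (dir, _, files) in walkdir(root)
        marker in files && f(dir)
    end
    return nothing
end

# Build a small throwaway tree to exercise the walk.
root = mktempdir()
for sub in ("a", joinpath("b", "c"))
    mkpath(joinpath(root, sub))
    touch(joinpath(root, sub, "params.json"))   # illustrative file name
end

found = String[]
visit_dirs_sketch(dir -> push!(found, relpath(dir, root)), root, "params.json")
```

In the real function, the callback would be the add_observable_function argument, which appends one observation to the DataFrame per matching directory.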

EmulatorsTrainer.extract_input_output_df — Function

extract_input_output_df(df::AbstractDataFrame)

Automatically detect and extract input and output features from a DataFrame. Assumes that the last column, named "observable", contains the output arrays and that all other columns are input features.

Returns

array_input::Matrix{Float64}: Input features matrix (n_input_features × n_samples)
array_output::Matrix{Float64}: Output features matrix (n_output_features × n_samples)
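
The layout of the two returned matrices is easiest to see on a tiny example. The sketch below replaces the DataFrame with a vector of named tuples so that it runs without extra packages; the field names are illustrative, and the code mirrors the documented output shapes rather than the package's source.

```julia
# Stand-in for a DataFrame: each row holds the input parameters plus an
# "observable" vector in its last field.
rows = [
    (h = 0.67, omega_b = 0.022, observable = [1.0, 2.0, 3.0]),
    (h = 0.70, omega_b = 0.023, observable = [4.0, 5.0, 6.0]),
]

# Every field except :observable is treated as an input feature.
input_names = [k for k in keys(first(rows)) if k != :observable]

# Rows are features, columns are samples.
array_input  = [Float64(getfield(r, k)) for k in input_names, r in rows]
array_output = reduce(hcat, [r.observable for r in rows])
```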

EmulatorsTrainer.get_minmax_in — Function

get_minmax_in(df::DataFrame, array_pars_in::Vector{String})

Compute min/max values for specified input features.

Arguments

df::DataFrame: DataFrame with input features
array_pars_in::Vector{String}: Column names to compute min/max for

Returns

Matrix{Float64}: Shape (n_params, 2) with [min, max] for each parameter

EmulatorsTrainer.get_minmax_out — Function

get_minmax_out(array_out::AbstractMatrix{<:Real})

Compute minimum and maximum values for each output feature. Automatically detects the number of output features from the array dimensions.

Arguments

array_out::AbstractMatrix{<:Real}: Output array with shape (n_output_features, n_samples)

Returns

out_MinMax::Matrix{Float64}: Matrix with shape (n_output_features, 2) containing [min, max] for each feature
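
As an illustration of the returned layout, a per-feature min/max table can be built with two reductions along the sample dimension. This is a sketch of the documented behavior on made-up numbers, not the package's code.

```julia
# Two output features (rows) observed on three samples (columns).
array_out = [ 1.0  4.0  2.0;
             10.0 30.0 20.0]

# Column 1 holds the per-feature minimum, column 2 the per-feature maximum.
out_MinMax = hcat(minimum(array_out, dims=2), maximum(array_out, dims=2))
```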

EmulatorsTrainer.maximin_df! — Function

maximin_df!(df, in_MinMax, out_MinMax)

Normalize DataFrame features to the [0, 1] range in place.

Arguments

df: DataFrame to normalize
in_MinMax: Min/max values for input features
out_MinMax: Min/max values for output features
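
Min-max normalization maps each feature x to (x - min) / (max - min). The sketch below applies that formula to a plain matrix and returns a new array, whereas the function documented above works in place on a DataFrame; `normalize_sketch` is a hypothetical helper.

```julia
# Hypothetical helper: rescale each row of `A` using its [min, max] pair.
normalize_sketch(A, MinMax) = (A .- MinMax[:, 1]) ./ (MinMax[:, 2] .- MinMax[:, 1])

A = [ 1.0  4.0  2.0;
     10.0 30.0 20.0]
MinMax = [ 1.0  4.0;
          10.0 30.0]     # one [min, max] row per feature

A_norm = normalize_sketch(A, MinMax)
```

A feature whose minimum equals its maximum would lead to a division by zero, so constant columns need separate handling in any scheme of this kind.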

EmulatorsTrainer.splitdf — Function

splitdf(df::DataFrame, pct::Float64)

Randomly split DataFrame into two parts.

Arguments

df::DataFrame: DataFrame to split
pct::Float64: Fraction for first split (0 to 1)

Returns

(DataFrame, DataFrame): Two views of the split data
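
A random split that returns views can be sketched on a plain matrix whose rows play the role of DataFrame rows. `split_sketch` is hypothetical, and the rounding rule used to size the first part is an assumption.

```julia
using Random

# Hypothetical helper: shuffle the row indices, then cut at fraction `pct`.
function split_sketch(rng::AbstractRNG, X::AbstractMatrix, pct::Float64)
    n = size(X, 1)
    order = randperm(rng, n)
    n_first = round(Int, pct * n)
    # Views share memory with `X`, so no rows are copied.
    return view(X, order[1:n_first], :), view(X, order[n_first+1:end], :)
end

X = reshape(collect(1.0:20.0), 10, 2)   # 10 samples, 2 columns
a, b = split_sketch(MersenneTwister(0), X, 0.8)
```

Because both parts are views, modifying `a` or `b` also modifies `X`; copy them first if independent data is needed.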

EmulatorsTrainer.traintest_split — Function

traintest_split(df, test)

Split DataFrame into training and test sets.

Arguments

df: DataFrame to split
test: Fraction for test set

Returns

(train_df, test_df): Training and test DataFrames

EmulatorsTrainer.getdata — Function

getdata(df)

Split DataFrame into train/test sets with automatic dimension detection.

Arguments

df: DataFrame with features and observables

Returns

(xtrain, ytrain, xtest, ytest): Training and test arrays as Float64

Validation

EmulatorsTrainer.evaluate_residuals — Function

evaluate_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
                  get_ground_truth::Function, get_emu_prediction::Function;
                  get_σ::Union{Function,Nothing}=nothing)

Compute residuals between ground truth and emulator predictions. Automatically detects the number of validation samples and output features.

Arguments

Directory::String: Root directory containing validation data
dict_file::String: Name of the parameter JSON file to search for
pars_array::Vector{String}: Parameter names to extract
get_ground_truth::Function: Function to load ground truth data
get_emu_prediction::Function: Function to get emulator prediction
get_σ::Union{Function,Nothing}=nothing: Optional function to get uncertainties

Returns

Matrix{Float64}: Residuals matrix (n_samples × n_output_features)
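
The exact residual definition is not stated here, so the following is only one plausible reading: the absolute difference between prediction and ground truth, divided by the uncertainty when one is supplied. `residuals_sketch` and its arguments are hypothetical.

```julia
# Hypothetical helper: one row per validation sample, one column per output feature.
function residuals_sketch(truth::Matrix{Float64}, pred::Matrix{Float64};
                          σ::Union{Matrix{Float64},Nothing}=nothing)
    diff = abs.(pred .- truth)
    # Assumed convention: express residuals in units of σ when it is given.
    return σ === nothing ? diff : diff ./ σ
end

truth = [1.0 2.0;
         3.0 4.0]
pred  = [1.5 1.75;
         3.0 4.5]

raw    = residuals_sketch(truth, pred)
scaled = residuals_sketch(truth, pred; σ = fill(0.5, 2, 2))
```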

EmulatorsTrainer.evaluate_sorted_residuals — Function

evaluate_sorted_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
                        get_ground_truth::Function, get_emu_prediction::Function;
                        get_σ::Union{Function,Nothing}=nothing, 
                        percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])

Compute sorted residuals at specified percentiles. Automatically detects number of samples and output features.

Arguments

Directory::String: Root directory with validation data
dict_file::String: Name of parameter JSON file
pars_array::Vector{String}: Parameter names to extract
get_ground_truth::Function: Function to load ground truth
get_emu_prediction::Function: Function to get emulator prediction
get_σ::Union{Function,Nothing}=nothing: Optional function for uncertainties
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to compute

Returns

Matrix{Float64}: Sorted residuals (n_percentiles × n_features)

EmulatorsTrainer.sort_residuals — Function

sort_residuals(residuals::AbstractMatrix{<:Real};
              percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])

Sort residuals and extract specified percentiles. Automatically detects dimensions from input matrix.

Arguments

residuals::AbstractMatrix{<:Real}: Residuals matrix
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to extract

Returns

Matrix{Float64}: Percentiles matrix (n_percentiles × n_features)
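
The sort-then-pick idea can be sketched as follows. The index rule (nearest rank, rounded up) is an assumption and may differ from the interpolation used by the package; `percentiles_sketch` is a hypothetical helper that takes an (n_samples × n_features) matrix.

```julia
# Hypothetical helper: per-feature percentiles via sorting and nearest-rank lookup.
function percentiles_sketch(residuals::AbstractMatrix{<:Real};
                            percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
    n_samples, n_features = size(residuals)
    out = Matrix{Float64}(undef, length(percentiles), n_features)
    for j in 1:n_features
        sorted = sort(residuals[:, j])
        for (i, p) in enumerate(percentiles)
            # Nearest-rank rule, clamped to valid indices.
            idx = clamp(ceil(Int, p * n_samples / 100), 1, n_samples)
            out[i, j] = sorted[idx]
        end
    end
    return out
end

# 100 samples of 2 features: 1..100 in the first column, 101..200 in the second.
residuals = reshape(collect(1.0:200.0), 100, 2)
table = percentiles_sketch(residuals)
```

Each row of `table` corresponds to one requested percentile and each column to one output feature, matching the (n_percentiles × n_features) shape given above.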
