API Reference
Dataset Creation
EmulatorsTrainer.create_training_dataset — Function
create_training_dataset(n::Int, lb::Array, ub::Array)
Generate quasi-Monte Carlo samples using Latin Hypercube Sampling.
Arguments
n::Int: Number of samples to generate
lb::Array: Lower bounds for each parameter
ub::Array: Upper bounds for each parameter
Returns
Matrix{Float64}: Matrix of shape (n_params, n_samples) with parameter combinations
Example
lb = [0.1, 0.5, 60.0]
ub = [0.5, 1.0, 80.0]
samples = create_training_dataset(1000, lb, ub)
EmulatorsTrainer.create_training_dict — Function
create_training_dict(training_matrix::Matrix, idx_comb::Int, params::Vector{String})
Create parameter dictionary for a specific sample from the training matrix.
Arguments
training_matrix::Matrix: Matrix of parameter combinations
idx_comb::Int: Column index of the desired combination
params::Vector{String}: Parameter names
Returns
Dict{String, Float64}: Dictionary mapping parameter names to values
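The mapping from one column of the training matrix to a named dictionary can be sketched in plain Julia. The helper name column_to_dict and the parameter names are hypothetical and not part of EmulatorsTrainer; the sketch only assumes the documented layout of one row per parameter and one column per sample.

```julia
# Hypothetical helper illustrating the documented behavior; not the package's implementation.
function column_to_dict(training_matrix::AbstractMatrix, idx_comb::Int, params::Vector{String})
    # Each row corresponds to one parameter name, each column to one sample.
    @assert size(training_matrix, 1) == length(params)
    return Dict{String,Float64}(p => training_matrix[i, idx_comb] for (i, p) in enumerate(params))
end

training_matrix = [0.1 0.2; 0.6 0.9; 65.0 70.0]   # 3 parameters × 2 samples
params = ["omega_b", "omega_c", "H0"]              # illustrative names only
d = column_to_dict(training_matrix, 2, params)     # dictionary for the second sample
```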
EmulatorsTrainer.prepare_dataset_directory — Function
prepare_dataset_directory(root_dir::String; force::Bool=false)
Safely create a dataset directory with existence checking and metadata tracking.
Arguments
root_dir::String: Path to the dataset directory
force::Bool=false: If true, back up an existing directory before creating the new one; if false, throw an error when the directory already exists
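The backup-or-error behavior described for force can be sketched as follows. This is a hypothetical stand-in, not the package's code: the function name, the backup naming scheme, and the error message are assumptions, and the metadata tracking mentioned above is not reproduced.

```julia
using Dates

# Hypothetical sketch of the documented force semantics; the backup name is an assumption.
function prepare_directory_sketch(root_dir::String; force::Bool=false)
    if isdir(root_dir)
        if force
            # Move the existing directory aside instead of deleting it.
            backup = root_dir * "_backup_" * Dates.format(now(), "yyyymmdd_HHMMSS")
            mv(root_dir, backup)
        else
            error("Directory $root_dir already exists; pass force=true to back it up.")
        end
    end
    mkpath(root_dir)
    return root_dir
end

dir = joinpath(mktempdir(), "dataset")
prepare_directory_sketch(dir)               # creates the directory
prepare_directory_sketch(dir; force=true)   # backs up the old one, then recreates it
```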
EmulatorsTrainer.compute_dataset — Function
compute_dataset(training_matrix, params, root_dir, script_func, mode; force=false)
Compute dataset using specified parallelization mode with optional force override.
Arguments
training_matrix::AbstractMatrix: Matrix of parameter combinations
params::AbstractVector{String}: Parameter names
root_dir::String: Root directory for dataset
script_func::Function: Function to compute data for each parameter combination
mode::Symbol: Computation mode (:distributed, :threads, or :serial)
force::Bool=false: Force overwrite of existing directory
Modes
:distributed: Use distributed computing across multiple processes
:threads: Use multi-threading on shared memory
:serial: Sequential execution (useful for debugging)
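The three modes correspond to standard Julia execution patterns. The dispatcher below is a minimal sketch under that assumption; run_all is a hypothetical name and the package may organize the work differently.

```julia
using Distributed

# Hypothetical dispatcher illustrating the three documented modes.
function run_all(f::Function, inputs::AbstractVector, mode::Symbol)
    if mode === :serial
        foreach(f, inputs)                 # one input after another
    elseif mode === :threads
        Threads.@threads for x in inputs   # shared-memory threads
            f(x)
        end
    elseif mode === :distributed
        pmap(f, inputs)                    # worker processes, if any were added with addprocs
    else
        throw(ArgumentError("unknown mode: $mode"))
    end
    return nothing
end

results = zeros(4)
run_all(i -> (results[i] = i^2), collect(1:4), :threads)
```

In the :distributed case the function and its data must be available on every worker process, which is why :serial is the convenient mode for debugging.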
Data Loading and Training
EmulatorsTrainer.add_observable_df! — Function
add_observable_df!(df::DataFrame, location::String, param_file::String,
observable_file::String, first_idx::Int, last_idx::Int, get_tuple::Function)
Add observation slice to DataFrame with NaN checking.
Arguments
df::DataFrame: Target DataFrame
location::String: Directory containing files
param_file::String: JSON file with parameters
observable_file::String: NPY file with observables
first_idx::Int: Start index for slice
last_idx::Int: End index for slice
get_tuple::Function: Function to process (params, observable) into tuple
add_observable_df!(df::DataFrame, location::String, param_file::String,
observable_file::String, get_tuple::Function)
Add complete observation to DataFrame with NaN checking.
Arguments
df::DataFrame: Target DataFrame
location::String: Directory containing files
param_file::String: JSON file with parameters
observable_file::String: NPY file with observables
get_tuple::Function: Function to process (params, observable) into tuple
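A get_tuple callback might look like the sketch below. It assumes that params arrives as a dictionary parsed from the JSON file and observable as a numeric array read from the NPY file; the parameter names "ln10As" and "ns" are purely illustrative, and the NaN check shown is only one plausible reading of "NaN checking".

```julia
# Hypothetical callback: turns one (params, observable) pair into a row tuple.
# The parameter names "ln10As" and "ns" are illustrative only.
get_tuple(params, observable) = (params["ln10As"], params["ns"], observable)

# Stand-ins for what would be read from the JSON and NPY files.
params = Dict("ln10As" => 3.04, "ns" => 0.96)
observable = [1.0, 2.0, 3.0]

# Sketch of a NaN check: keep the row only when the observable is NaN-free.
row = any(isnan, observable) ? nothing : get_tuple(params, observable)
```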
EmulatorsTrainer.load_df_directory! — Function
load_df_directory!(df::DataFrame, Directory::String, add_observable_function::Function)
Recursively load all observations from directory into DataFrame.
Arguments
df::DataFrame: Target DataFrame
Directory::String: Root directory to search
add_observable_function::Function: Function to add each observation
EmulatorsTrainer.extract_input_output_df — Function
extract_input_output_df(df::AbstractDataFrame)
Automatically detect and extract input and output features from a DataFrame. Assumes that the last column, named "observable", contains the output arrays and that all other columns are input features.
Returns
array_input::Matrix{Float64}: Input features matrix (n_input_features × n_samples)
array_output::Matrix{Float64}: Output features matrix (n_output_features × n_samples)
EmulatorsTrainer.get_minmax_in — Function
get_minmax_in(df::DataFrame, array_pars_in::Vector{String})
Compute min/max values for specified input features.
Arguments
df::DataFrame: DataFrame with input features
array_pars_in::Vector{String}: Column names to compute min/max for
Returns
Matrix{Float64}: Shape (n_params, 2) with [min, max] for each parameter
EmulatorsTrainer.get_minmax_out — Function
get_minmax_out(array_out::AbstractMatrix{<:Real})
Compute minimum and maximum values for each output feature. Automatically detects the number of output features from the array dimensions.
Arguments
array_out::AbstractMatrix{<:Real}: Output array with shape (n_output_features, n_samples)
Returns
out_MinMax::Matrix{Float64}: Matrix with shape (n_output_features, 2) containing [min, max] for each feature
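The per-feature reduction can be sketched with base Julia alone. The name minmax_rows is hypothetical; the sketch assumes only the documented shapes, with features along rows and samples along columns.

```julia
# Sketch: per-feature minimum and maximum taken over samples (columns).
function minmax_rows(array_out::AbstractMatrix{<:Real})
    mins = minimum(array_out, dims=2)
    maxs = maximum(array_out, dims=2)
    return Float64.(hcat(mins, maxs))   # shape (n_output_features, 2)
end

array_out = [1.0 3.0 2.0; 10.0 -5.0 0.0]   # 2 features × 3 samples
out_MinMax = minmax_rows(array_out)
```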
EmulatorsTrainer.maximin_df! — Function
maximin_df!(df, in_MinMax, out_MinMax)
Normalize DataFrame features to [0, 1] range in-place.
Arguments
df: DataFrame to normalize
in_MinMax: Min/max values for input features
out_MinMax: Min/max values for output features
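Normalizing to [0, 1] is the usual min-max rescaling, (x - min) / (max - min), applied feature by feature. The sketch below shows it on a plain matrix rather than a DataFrame; rescale is a hypothetical name, and the [min, max] row layout follows the shapes documented for get_minmax_in and get_minmax_out.

```julia
# Sketch of min-max scaling; each row of `minmax` holds [min, max] for one feature.
rescale(x::AbstractMatrix, minmax::AbstractMatrix) =
    (x .- minmax[:, 1]) ./ (minmax[:, 2] .- minmax[:, 1])

x = [1.0 3.0 2.0; 10.0 -5.0 0.0]   # 2 features × 3 samples
minmax = [1.0 3.0; -5.0 10.0]      # [min, max] per feature
x_scaled = rescale(x, minmax)
```

A feature whose minimum equals its maximum would produce a division by zero in this sketch, so constant features need separate handling.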
EmulatorsTrainer.splitdf — Function
splitdf(df::DataFrame, pct::Float64)
Randomly split DataFrame into two parts.
Arguments
df::DataFrame: DataFrame to split
pct::Float64: Fraction for first split (0 to 1)
Returns
(DataFrame, DataFrame): Two views of the split data
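A random split of this kind amounts to shuffling the row indices and cutting them at the requested fraction. The sketch below works on indices only; split_indices is a hypothetical name and the rounding rule is an assumption.

```julia
using Random

# Sketch: shuffle row indices, then cut at the requested fraction.
function split_indices(n::Int, pct::Float64; rng=Random.default_rng())
    @assert 0 <= pct <= 1
    ids = randperm(rng, n)
    n_first = round(Int, pct * n)
    return ids[1:n_first], ids[n_first+1:end]
end

first_ids, second_ids = split_indices(10, 0.8; rng=MersenneTwister(42))
```

With DataFrames.jl, index sets like these can be turned into views with view(df, first_ids, :) and view(df, second_ids, :), which matches the "two views" described in the return value.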
EmulatorsTrainer.traintest_split — Function
traintest_split(df, test)
Split DataFrame into training and test sets.
Arguments
df: DataFrame to split
test: Fraction for test set
Returns
(train_df, test_df): Training and test DataFrames
EmulatorsTrainer.getdata — Function
getdata(df)
Split DataFrame into train/test sets with automatic dimension detection.
Arguments
df: DataFrame with features and observables
Returns
(xtrain, ytrain, xtest, ytest): Training and test arrays as Float64
Validation
EmulatorsTrainer.evaluate_residuals — Function
evaluate_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
get_ground_truth::Function, get_emu_prediction::Function;
get_σ::Union{Function,Nothing}=nothing)
Compute residuals between ground truth and emulator predictions. Automatically detects the number of validation samples and output features.
Arguments
Directory::String: Root directory containing validation data
dict_file::String: Name of the parameter JSON file to search for
pars_array::Vector{String}: Parameter names to extract
get_ground_truth::Function: Function to load ground truth data
get_emu_prediction::Function: Function to get emulator prediction
get_σ::Union{Function,Nothing}=nothing: Optional function to get uncertainties
Returns
Matrix{Float64}: Residuals matrix (n_samples × n_output_features)
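The shape of the returned matrix can be illustrated with a small sketch. The residual definition used here, the absolute difference between prediction and ground truth divided by σ when one is supplied, is an assumption and is not confirmed by this reference; residual_sketch is a hypothetical name.

```julia
# Assumed definition: |prediction - truth|, optionally divided by an uncertainty σ.
function residual_sketch(truth::AbstractVector, prediction::AbstractVector; σ=nothing)
    r = abs.(prediction .- truth)
    return σ === nothing ? r : r ./ σ
end

# Three validation samples with two output features each (made-up numbers).
truths = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
preds  = [[1.1, 2.0], [2.9, 4.4], [5.0, 5.7]]

# Stack one row per sample: the result is n_samples × n_output_features.
residuals = reduce(vcat, (residual_sketch(t, p)' for (t, p) in zip(truths, preds)))
```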
EmulatorsTrainer.evaluate_sorted_residuals — Function
evaluate_sorted_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
get_ground_truth::Function, get_emu_prediction::Function;
get_σ::Union{Function,Nothing}=nothing,
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
Compute sorted residuals at specified percentiles. Automatically detects number of samples and output features.
Arguments
Directory::String: Root directory with validation data
dict_file::String: Name of parameter JSON file
pars_array::Vector{String}: Parameter names to extract
get_ground_truth::Function: Function to load ground truth
get_emu_prediction::Function: Function to get emulator prediction
get_σ::Union{Function,Nothing}=nothing: Optional function for uncertainties
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to compute
Returns
Matrix{Float64}: Sorted residuals (n_percentiles × n_features)
EmulatorsTrainer.sort_residuals — Function
sort_residuals(residuals::AbstractMatrix{<:Real};
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
Sort residuals and extract specified percentiles. Automatically detects dimensions from input matrix.
Arguments
residuals::AbstractMatrix{<:Real}: Residuals matrix
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]: Percentiles to extract
Returns
Matrix{Float64}: Percentiles matrix (n_percentiles × n_features)
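Extracting percentiles per output feature can be sketched with the Statistics standard library. The sketch assumes the residuals matrix is laid out as n_samples × n_features, as returned by evaluate_residuals, and uses the interpolating quantile from Statistics; the package may pick percentiles differently, for example by nearest rank. The name percentile_sketch is hypothetical.

```julia
using Statistics

# Sketch: for each output feature (column), take the requested percentiles over samples (rows).
function percentile_sketch(residuals::AbstractMatrix{<:Real}; percentiles=[68.0, 95.0, 99.7])
    n_features = size(residuals, 2)
    out = zeros(length(percentiles), n_features)
    for j in 1:n_features
        # quantile expects probabilities in [0, 1], so percentiles are divided by 100.
        out[:, j] = quantile(residuals[:, j], percentiles ./ 100)
    end
    return out   # shape (n_percentiles, n_features)
end

residuals = abs.(randn(1000, 3))   # synthetic residuals, for illustration only
sorted = percentile_sketch(residuals)
```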