API Reference
Dataset Creation
EmulatorsTrainer.create_training_dataset
— Function
create_training_dataset(n::Int, lb::Array, ub::Array)
Generate quasi-Monte Carlo samples using Latin Hypercube Sampling.
Arguments
n::Int
: Number of samples to generate
lb::Array
: Lower bounds for each parameter
ub::Array
: Upper bounds for each parameter
Returns
Matrix{Float64}
: Matrix of shape (n_params, n_samples); each column is one parameter combination
Example
lb = [0.1, 0.5, 60.0]
ub = [0.5, 1.0, 80.0]
samples = create_training_dataset(1000, lb, ub)
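To show what Latin Hypercube Sampling does, here is a minimal textbook sampler. `lhs_sketch` is a hypothetical helper, not EmulatorsTrainer's implementation, which may rely on a dedicated sampling library. Each parameter range is split into n equal strata, one point is drawn per stratum, and the strata are shuffled independently per parameter.

```julia
using Random

# Hypothetical helper (not part of EmulatorsTrainer): a textbook Latin
# Hypercube sampler returning an (n_params, n_samples) matrix.
function lhs_sketch(n::Int, lb::AbstractVector, ub::AbstractVector;
                    rng::AbstractRNG=Random.default_rng())
    length(lb) == length(ub) || throw(ArgumentError("lb and ub must have the same length"))
    samples = Matrix{Float64}(undef, length(lb), n)
    for i in eachindex(lb)
        # one uniform draw inside each of the n equal-width strata of [0, 1)
        u = ((0:n-1) .+ rand(rng, n)) ./ n
        # permute the strata independently for each parameter, then rescale
        samples[i, :] = lb[i] .+ (ub[i] - lb[i]) .* shuffle(rng, u)
    end
    return samples
end

samples = lhs_sketch(1000, [0.1, 0.5, 60.0], [0.5, 1.0, 80.0])
```

Unlike plain uniform sampling, every one of the n slices of each parameter range receives exactly one sample, which keeps the marginal coverage even for small n.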
EmulatorsTrainer.create_training_dict
— Function
create_training_dict(training_matrix::Matrix, idx_comb::Int, params::Vector{String})
Create a parameter dictionary for a single sample (one column) of the training matrix.
Arguments
training_matrix::Matrix
: Matrix of parameter combinations, one combination per column
idx_comb::Int
: Column index of the desired combination
params::Vector{String}
: Parameter names, in the same order as the matrix rows
Returns
Dict{String, Float64}
: Dictionary mapping parameter names to values
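The mapping from one matrix column to a dictionary can be sketched as follows. `training_dict_sketch` is a hypothetical stand-in that assumes the (n_params, n_samples) layout returned by create_training_dataset; the parameter names are illustrative.

```julia
# Hypothetical re-implementation for illustration only. It assumes the
# (n_params, n_samples) layout produced by create_training_dataset.
function training_dict_sketch(training_matrix::AbstractMatrix, idx_comb::Int,
                              params::Vector{String})
    size(training_matrix, 1) == length(params) ||
        throw(DimensionMismatch("expected one matrix row per parameter name"))
    return Dict{String,Float64}(params[i] => training_matrix[i, idx_comb]
                                for i in eachindex(params))
end

m = [0.1 0.2 0.3;
     60.0 70.0 80.0]
d = training_dict_sketch(m, 2, ["omega_m", "H0"])   # second sample (column 2)
```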
EmulatorsTrainer.prepare_dataset_directory
— Function
prepare_dataset_directory(root_dir::String; force::Bool=false)
Safely create a dataset directory, checking whether it already exists and tracking metadata.
Arguments
root_dir::String
: Path to the dataset directory
force::Bool=false
: If true, back up an existing directory; if false, throw an error when the directory already exists
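The existence check and the `force` behaviour can be sketched with Base file utilities. This is a hypothetical illustration: the timestamped backup name is an assumption, and the metadata tracking mentioned above is omitted.

```julia
using Dates

# Hypothetical sketch of the documented behaviour; the backup naming scheme
# is an assumption and no metadata file is written.
function prepare_dir_sketch(root_dir::String; force::Bool=false)
    if ispath(root_dir)
        force || error("Directory $root_dir already exists; pass force=true to back it up")
        backup = root_dir * "_backup_" * Dates.format(now(), "yyyymmdd_HHMMSS")
        mv(root_dir, backup)          # keep the old data instead of deleting it
    end
    mkpath(root_dir)
    return root_dir
end
```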
EmulatorsTrainer.compute_dataset
— Function
compute_dataset(training_matrix, params, root_dir, script_func, mode; force=false)
Compute the dataset using the specified parallelization mode, with an optional force override.
Arguments
training_matrix::AbstractMatrix
: Matrix of parameter combinations
params::AbstractVector{String}
: Parameter names
root_dir::String
: Root directory for the dataset
script_func::Function
: Function that computes the data for each parameter combination
mode::Symbol
: Computation mode (:distributed, :threads, or :serial)
force::Bool=false
: Force overwrite of an existing directory
Modes
:distributed
: Use distributed computing across multiple processes
:threads
: Use multi-threading on shared memory
:serial
: Sequential execution (useful for debugging)
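The three modes correspond to standard Julia primitives. The dispatcher below is a hypothetical sketch, and the `script_func(params_dict, root_dir)` signature it uses is an assumption, not the package's documented contract.

```julia
using Distributed

# Hypothetical dispatcher showing what each mode means. The script_func
# signature used here, script_func(params_dict, root_dir), is an assumption.
function run_modes_sketch(training_matrix::AbstractMatrix, params::AbstractVector{String},
                          root_dir::String, script_func::Function, mode::Symbol)
    n = size(training_matrix, 2)
    job(i) = script_func(Dict(params .=> training_matrix[:, i]), root_dir)
    if mode === :distributed
        pmap(job, 1:n)                 # worker processes (see addprocs / @everywhere)
    elseif mode === :threads
        Threads.@threads for i in 1:n  # threads sharing one process's memory
            job(i)
        end
    elseif mode === :serial
        foreach(job, 1:n)              # one sample after another, easiest to debug
    else
        throw(ArgumentError("unknown mode :$mode; use :distributed, :threads or :serial"))
    end
    return nothing
end
```

With `:distributed`, `script_func` and its dependencies must be defined on every worker (typically via `@everywhere`). With `:threads`, the speed-up depends on starting Julia with several threads.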
Data Loading and Training
EmulatorsTrainer.add_observable_df!
— Function
add_observable_df!(df::DataFrame, location::String, param_file::String,
observable_file::String, first_idx::Int, last_idx::Int, get_tuple::Function)
Add a slice of an observation (indices first_idx through last_idx) to the DataFrame, with NaN checking.
Arguments
df::DataFrame
: Target DataFrame
location::String
: Directory containing the files
param_file::String
: JSON file with parameters
observable_file::String
: NPY file with observables
first_idx::Int
: Start index of the slice
last_idx::Int
: End index of the slice
get_tuple::Function
: Function that turns (params, observable) into a tuple
add_observable_df!(df::DataFrame, location::String, param_file::String,
observable_file::String, get_tuple::Function)
Add a complete observation to the DataFrame, with NaN checking.
Arguments
df::DataFrame
: Target DataFrame
location::String
: Directory containing the files
param_file::String
: JSON file with parameters
observable_file::String
: NPY file with observables
get_tuple::Function
: Function that turns (params, observable) into a tuple
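The NaN check shared by both methods can be sketched without file I/O. In this hypothetical version, `params` stands in for the parsed JSON file, `obs` for the loaded NPY array, and a plain vector of rows replaces the DataFrame. The slice bounds are keywords so one function covers both methods. Skipping a sample with a warning on NaN is an assumption; the package may handle NaNs differently.

```julia
# Hypothetical sketch: params ~ parsed JSON, obs ~ loaded NPY array,
# rows ~ the target DataFrame. Skip-with-warning on NaN is an assumption.
function add_observable_sketch!(rows::AbstractVector, params::AbstractDict,
                                obs::AbstractVector, get_tuple::Function;
                                first_idx::Int=firstindex(obs), last_idx::Int=lastindex(obs))
    slice = obs[first_idx:last_idx]
    if any(isnan, slice)
        @warn "NaN in observable; sample skipped" params
        return rows
    end
    push!(rows, get_tuple(params, slice))
    return rows
end

# Hypothetical get_tuple: one parameter column plus the observable array.
get_tuple_example(p, o) = (omega_m = p["omega_m"], observable = o)
```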
EmulatorsTrainer.load_df_directory!
— Function
load_df_directory!(df::DataFrame, Directory::String, add_observable_function::Function)
Recursively load all observations from a directory into the DataFrame.
Arguments
df::DataFrame
: Target DataFrame
Directory::String
: Root directory to search
add_observable_function::Function
: Function used to add each observation
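A recursive scan like this maps naturally onto Base's `walkdir`. The sketch below is hypothetical: the callback signature `add_fn(acc, location)` and the rule of visiting only folders that contain files are assumptions.

```julia
# Hypothetical sketch: visit every folder under `root` that holds files and
# pass it to `add_fn(acc, location)`; the callback signature is an assumption.
function load_directory_sketch!(acc, root::String, add_fn::Function)
    for (dirpath, _, files) in walkdir(root)
        isempty(files) && continue    # skip folders with no sample files
        add_fn(acc, dirpath)
    end
    return acc
end
```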
EmulatorsTrainer.extract_input_output_df
— Function
extract_input_output_df(df::AbstractDataFrame)
Automatically detect and extract input and output features from a DataFrame. Assumes that the last column, named "observable", contains the output arrays and that all other columns are input features.
Returns
array_input::Matrix{Float64}
: Input features matrix (n_input_features × n_samples)
array_output::Matrix{Float64}
: Output features matrix (n_output_features × n_samples)
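The reshaping into feature-major matrices can be sketched with plain vectors standing in for DataFrame columns (an assumption made to keep the example dependency-free). Scalar input columns are stacked as rows, and each observable array becomes one column.

```julia
# Hypothetical sketch: each input column is a vector over samples and
# `observable` is a vector of per-sample output arrays.
function extract_sketch(input_cols::Vector{<:AbstractVector}, observable::AbstractVector)
    array_input = permutedims(reduce(hcat, input_cols))   # n_input_features × n_samples
    array_output = reduce(hcat, observable)                # n_output_features × n_samples
    return Float64.(array_input), Float64.(array_output)
end

x, y = extract_sketch([[0.1, 0.2], [60.0, 70.0]], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```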
EmulatorsTrainer.get_minmax_in
— Function
get_minmax_in(df::DataFrame, array_pars_in::Vector{String})
Compute min/max values for the specified input features.
Arguments
df::DataFrame
: DataFrame with input features
array_pars_in::Vector{String}
: Column names to compute min/max for
Returns
Matrix{Float64}
: Shape (n_params, 2) with [min, max] for each parameter
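The (n_params, 2) layout can be sketched with a `Dict` of named columns in place of the DataFrame (an assumption). Rows follow the order of the requested names.

```julia
# Hypothetical sketch: a Dict of named columns stands in for the DataFrame.
function minmax_in_sketch(cols::Dict{String,<:AbstractVector}, names::Vector{String})
    out = Matrix{Float64}(undef, length(names), 2)
    for (i, name) in enumerate(names)
        out[i, 1], out[i, 2] = extrema(cols[name])   # [min, max] for this parameter
    end
    return out
end

mm = minmax_in_sketch(Dict("a" => [3.0, 1.0, 2.0], "b" => [10.0, -5.0]), ["b", "a"])
```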
EmulatorsTrainer.get_minmax_out
— Function
get_minmax_out(array_out::AbstractMatrix{<:Real})
Compute the minimum and maximum values for each output feature. Automatically detects the number of output features from the array dimensions.
Arguments
array_out::AbstractMatrix{<:Real}
: Output array with shape (n_output_features, n_samples)
Returns
out_MinMax::Matrix{Float64}
: Matrix with shape (n_output_features, 2) containing [min, max] for each feature
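Because features run along the rows, the per-feature extrema are reductions over `dims=2`. The helper below is a hypothetical sketch of that computation, not the package's source.

```julia
# Hypothetical sketch: per-row (per-feature) extrema of an
# (n_output_features, n_samples) array.
function minmax_out_sketch(array_out::AbstractMatrix{<:Real})
    lo = vec(minimum(array_out; dims=2))
    hi = vec(maximum(array_out; dims=2))
    return Float64.(hcat(lo, hi))          # (n_output_features, 2): [min max]
end
```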
EmulatorsTrainer.maximin_df!
— Function
maximin_df!(df, in_MinMax, out_MinMax)
Normalize DataFrame features to the [0, 1] range in place, using the supplied min/max values.
Arguments
df
: DataFrame to normalize
in_MinMax
: Min/max values for input features
out_MinMax
: Min/max values for output features
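Min-max normalization maps each feature x to (x - min) / (max - min). The sketch below applies it to a features × samples matrix rather than a DataFrame (an assumption), and adds the inverse map that is typically needed to bring normalized emulator outputs back to physical units. Both function names are hypothetical.

```julia
# Hypothetical sketch: min-max scaling of a features × samples matrix.
# `minmax` has shape (n_features, 2) with columns [min max].
function minmax_scale_sketch!(x::AbstractMatrix{<:AbstractFloat}, minmax::AbstractMatrix)
    lo, hi = minmax[:, 1], minmax[:, 2]
    x .= (x .- lo) ./ (hi .- lo)   # lo and hi broadcast down each row (feature)
    return x
end

# Inverse map: undo the scaling, e.g. for normalized emulator predictions.
minmax_unscale(z, minmax) = z .* (minmax[:, 2] .- minmax[:, 1]) .+ minmax[:, 1]
```

A feature with max equal to min would divide by zero, so constant columns need special handling before scaling.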
EmulatorsTrainer.splitdf
— Function
splitdf(df::DataFrame, pct::Float64)
Randomly split a DataFrame into two parts.
Arguments
df::DataFrame
: DataFrame to split
pct::Float64
: Fraction of rows assigned to the first part (0 to 1)
Returns
(DataFrame, DataFrame)
: Two views of the split data
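A random split of this kind reduces to shuffling row indices and cutting at round(pct × n). The sketch works on index vectors only and is hypothetical; how splitdf rounds the cut point is an assumption.

```julia
using Random

# Hypothetical sketch: random permutation of row indices, cut at round(pct * n).
function split_indices_sketch(n::Int, pct::Float64; rng::AbstractRNG=Random.default_rng())
    0 <= pct <= 1 || throw(ArgumentError("pct must lie in [0, 1]"))
    perm = randperm(rng, n)
    cut = round(Int, pct * n)
    return perm[1:cut], perm[cut+1:end]
end

first_idx, second_idx = split_indices_sketch(10, 0.8)
```

With DataFrames.jl, `view(df, first_idx, :)` and `view(df, second_idx, :)` would then give two views of the split data without copying.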
EmulatorsTrainer.traintest_split
— Function
traintest_split(df, test)
Split a DataFrame into training and test sets.
Arguments
df
: DataFrame to split
test
: Fraction of samples for the test set
Returns
(train_df, test_df)
: Training and test DataFrames
EmulatorsTrainer.getdata
— Function
getdata(df)
Split DataFrame into train/test sets with automatic dimension detection.
Arguments
df
: DataFrame with features and observables
Returns
(xtrain, ytrain, xtest, ytest)
: Training and test arrays as Float64
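Once inputs and outputs are feature-major matrices, producing the four arrays amounts to one shared random permutation of the sample columns. In this hypothetical sketch, `test_frac` is an invented parameter, since getdata's own split fraction is not documented here.

```julia
using Random

# Hypothetical sketch; `test_frac` is an assumed parameter, not getdata's API.
function getdata_sketch(x::AbstractMatrix, y::AbstractMatrix; test_frac::Real=0.2,
                        rng::AbstractRNG=Random.default_rng())
    n = size(x, 2)
    size(y, 2) == n || throw(DimensionMismatch("x and y need the same number of samples"))
    perm = randperm(rng, n)                  # same permutation keeps x/y pairs aligned
    n_test = round(Int, test_frac * n)
    test_idx, train_idx = perm[1:n_test], perm[n_test+1:end]
    return (Float64.(x[:, train_idx]), Float64.(y[:, train_idx]),
            Float64.(x[:, test_idx]), Float64.(y[:, test_idx]))
end

x = reshape(collect(1.0:20.0), 2, 10)
y = 10 .* x
xtrain, ytrain, xtest, ytest = getdata_sketch(x, y; test_frac=0.3)
```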
Validation
EmulatorsTrainer.evaluate_residuals
— Function
evaluate_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
get_ground_truth::Function, get_emu_prediction::Function;
get_σ::Union{Function,Nothing}=nothing)
Compute residuals between ground truth and emulator predictions. Automatically detects the number of validation samples and output features.
Arguments
Directory::String
: Root directory containing validation data
dict_file::String
: Name of the parameter JSON file to search for
pars_array::Vector{String}
: Parameter names to extract
get_ground_truth::Function
: Function to load ground-truth data
get_emu_prediction::Function
: Function to get the emulator prediction
get_σ::Union{Function,Nothing}=nothing
: Optional function to get uncertainties
Returns
Matrix{Float64}
: Residuals matrix (n_samples × n_output_features)
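The residual formula itself is not given above. The sketch below assumes one common choice: |pred - truth| / σ when uncertainties are available, otherwise the relative error |pred - truth| / |truth|. It works on already-loaded matrices with one row per validation sample. Both the formula and `residuals_sketch` are assumptions.

```julia
# Hypothetical residual definition (an assumption, not the package's formula):
# |pred - truth| / σ if σ is given, else |pred - truth| / |truth|.
# Rows are validation samples, columns are output features.
function residuals_sketch(truth::AbstractMatrix, pred::AbstractMatrix;
                          σ::Union{AbstractMatrix,Nothing}=nothing)
    size(truth) == size(pred) || throw(DimensionMismatch("truth and pred differ in size"))
    scale = σ === nothing ? abs.(truth) : σ
    return Float64.(abs.(pred .- truth) ./ scale)
end
```

The relative form is undefined wherever the ground truth is zero, which is one reason to prefer σ-weighted residuals when uncertainties exist.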
EmulatorsTrainer.evaluate_sorted_residuals
— Function
evaluate_sorted_residuals(Directory::String, dict_file::String, pars_array::Vector{String},
get_ground_truth::Function, get_emu_prediction::Function;
get_σ::Union{Function,Nothing}=nothing,
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
Compute sorted residuals at the specified percentiles. Automatically detects the number of samples and output features.
Arguments
Directory::String
: Root directory with validation data
dict_file::String
: Name of the parameter JSON file
pars_array::Vector{String}
: Parameter names to extract
get_ground_truth::Function
: Function to load ground truth
get_emu_prediction::Function
: Function to get the emulator prediction
get_σ::Union{Function,Nothing}=nothing
: Optional function for uncertainties
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]
: Percentiles to compute
Returns
Matrix{Float64}
: Sorted residuals (n_percentiles × n_features)
EmulatorsTrainer.sort_residuals
— Function
sort_residuals(residuals::AbstractMatrix{<:Real};
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
Sort residuals and extract the specified percentiles. Automatically detects dimensions from the input matrix.
Arguments
residuals::AbstractMatrix{<:Real}
: Residuals matrix
percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7]
: Percentiles to extract
Returns
Matrix{Float64}
: Percentiles matrix (n_percentiles × n_features)
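Extracting percentiles from sorted residuals can be sketched with the nearest-rank rule: sort each feature's column and read the entry at rank ceil(p × n_samples / 100). The sketch assumes one row per sample, as returned by evaluate_residuals. Nearest-rank selection is itself an assumption; the package may interpolate instead, as `Statistics.quantile` does.

```julia
# Hypothetical sketch: nearest-rank percentiles per feature column.
function sort_residuals_sketch(residuals::AbstractMatrix{<:Real};
                               percentiles::AbstractVector{<:Real}=[68.0, 95.0, 99.7])
    n_samples, n_features = size(residuals)
    out = Matrix{Float64}(undef, length(percentiles), n_features)
    for j in 1:n_features
        col = sort(residuals[:, j])
        for (k, p) in enumerate(percentiles)
            idx = clamp(ceil(Int, p * n_samples / 100), 1, n_samples)
            out[k, j] = col[idx]
        end
    end
    return out                             # n_percentiles × n_features
end

r = hcat(collect(100.0:-1.0:1.0), 2 .* collect(1.0:100.0))
```

For example, `sort_residuals_sketch(r; percentiles=[25.0, 50.0])` reads ranks 25 and 50 of each sorted column.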