3D Time varying FNO
The implementation allows the discretization along the time dimension to be 1 (a single time step), but time can also be treated like any other dimension, so the model can be used as a generic 4D FNO.
3D Model
ParametricDFNOs.DFNO_3D.Model — Type
Model(config::ModelConfig)
A mutable structure representing the FNO, including configurations, weights, biases, and other parameters.
Constructor
This constructor initializes the model's layers and distributes the weights and biases according to the provided ModelConfig.
ParametricDFNOs.DFNO_3D.ModelConfig — Type
ModelConfig(;nx::Int=64, ny::Int=64, nz::Int=64, nt::Int=51, nc_in::Int=5, nc_mid::Int=128, nc_lift::Int=20, nc_out::Int=1, mx::Int=8, my::Int=8, mz::Int=8, mt::Int=4, nblocks::Int=4, dtype::DataType=Float32, partition::Vector{Int}=[1, 8], relu01::Bool=true)
A configuration struct that holds parameters for the Model setup.
Fields
- nx: The discretization along the x-dimension.
- ny: The discretization along the y-dimension.
- nz: The discretization along the z-dimension.
- nt: The discretization along time.
- nc_in: The number of input channels.
- nc_mid: The number of intermediate channels.
- nc_lift: The number of lifted channels.
- nc_out: The number of output channels.
- mx: The number of Fourier modes retained along the x-dimension.
- my: The number of Fourier modes retained along the y-dimension.
- mz: The number of Fourier modes retained along the z-dimension.
- mt: The number of Fourier modes retained along time.
- nblocks: The number of blocks in the model.
- dtype: The data type used for computations, default is Float32.
- partition: The partitioning configuration for distributed computing, default is [1, 8].
- relu01: Boolean that determines whether the final ReLU layer is applied.
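For orientation, a minimal sketch of constructing a configuration and model follows; the grid sizes, mode counts, and partition are hypothetical and should be adapted to your problem and MPI layout (setting nt=1 recovers a single-time-step, purely spatial operator).

using MPI
using ParametricDFNOs.DFNO_3D

MPI.Init()

# Hypothetical coarse setup: 16 x 16 x 16 spatial grid, 20 time steps,
# distributed over 2 MPI ranks.
partition = [1, 2]
modelConfig = DFNO_3D.ModelConfig(nx=16, ny=16, nz=16, nt=20,
                                  mx=4, my=4, mz=4, mt=4,
                                  nblocks=4, partition=partition)
model = DFNO_3D.Model(modelConfig)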
ParametricDFNOs.DFNO_3D.initModel — Method
initModel(model::Model)
Initializes the Model by allocating and setting the initial values for the parameters.
Arguments
- model: An instance of Model to be initialized.
Returns
Returns the initialized parameters θ, which may be placed on the GPU if the global gpu_flag is set to true.
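Continuing the sketch above, parameter initialization is a single call:

# Allocates and initializes the distributed weights and biases; the result is
# moved to the GPU automatically if the global gpu_flag is set.
θ = DFNO_3D.initModel(model)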
3D Forward Pass
ParametricDFNOs.DFNO_3D.forward — Method
forward(model::Model, θ, x::Any)
Performs the forward pass using the model defined by Model. The function applies a series of transformations to the input data x using the model parameters θ and the configurations within the model.
Arguments
- model: The Model object that contains configurations and parameters for the forward pass.
- θ: The parameters of the model, typically initialized by initModel.
- x: Input data that will be passed through the model.
Returns
The output of the model after the forward pass, reshaped to the dimensions appropriate for the number of output channels, time steps, and spatial dimensions.
Details
The process includes:
- Reshaping the input and applying lifting operations.
- Processing through a series of blocks that includes spectral and standard convolutions, followed by batch normalization and non-linear activation functions (ReLU).
- Final projection to the output channels and application of a double ReLU operation to finalize the forward pass.
Notes
This function will move x to the GPU and perform GPU-enabled computations if gpu_flag is set.
Caution
The input x should have the correct number of elements, but it does not need to have any particular shape. However, an incorrect permutation of the input dimensions will lead to incorrect solution operators.
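A minimal forward-pass sketch, continuing from the configuration above. The local element count used here is an assumption (all channels, time steps, and grid points divided evenly across the partition); per the caution above, only the number of elements matters, not the shape, but the ordering of the underlying dimensions does.

# Assumed local input size for this rank, with a trailing batch dimension of 1.
nlocal = modelConfig.nc_in * modelConfig.nt * modelConfig.nx *
         modelConfig.ny * modelConfig.nz ÷ prod(modelConfig.partition)
x = rand(modelConfig.dtype, nlocal, 1)

y = DFNO_3D.forward(model, θ, x)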
3D Training
ParametricDFNOs.DFNO_3D.TrainConfig — Type
TrainConfig
A configuration struct for setting up the training environment.
Fields
- nbatch: The number of samples per batch.
- epochs: The number of full training cycles through the dataset.
- seed: The random seed for reproducibility of shuffling and initialization.
- plot_every: Frequency of plotting the evaluation metrics (every n epochs).
- learning_rate: The step size for the optimizer.
- x_train: Training input data.
- y_train: Training target data.
- x_valid: Validation input data.
- y_valid: Validation target data.
Usage
This struct is used to encapsulate all necessary training parameters, including data and hyperparameters, to configure the training loop.
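A sketch of a training configuration, assuming keyword construction with the fields listed above; the hyperparameters are hypothetical and the data arrays come from loadDistData (see 3D Data Loading below).

trainConfig = DFNO_3D.TrainConfig(
    nbatch = 2,
    epochs = 100,
    learning_rate = 1f-4,
    plot_every = 5,
    x_train = x_train, y_train = y_train,
    x_valid = x_valid, y_valid = y_valid,
)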
ParametricDFNOs.DFNO_3D.train! — Method
train!(config::TrainConfig, model::Model, θ::Dict; comm=MPI.COMM_WORLD, plotEval::Function=plotEvaluation)
Conducts the training process for a given model using distributed computing and tracks the training and validation loss.
Arguments
- config: The training configuration settings as specified by the TrainConfig struct.
- model: The neural network model to be trained as specified by the Model struct.
- θ: A dictionary of model parameters to be optimized during training.
- comm: The MPI communicator to be used for distributed training (defaults to MPI.COMM_WORLD).
- plotEval: The plotting function called to evaluate training progress (defaults to plotEvaluation).
Training Procedure
- The function sets up the optimizer and initializes training and validation data.
- It goes through the specified number of training epochs, updating model parameters via gradient descent.
- If the gpu_flag is set, computations are performed on a GPU.
- At specified intervals, the training process is evaluated by plotting the training and validation losses and predicted vs. actual outputs.
Outputs
- Updates the model's parameters in-place.
- Produces plots every plot_every epochs to visually evaluate model performance.
- Saves weights every 2 epochs.
Notes
- A progress meter will be displayed.
- Make sure TrainConfig is set up properly before calling train!.
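Putting it together, a sketch of the training call using the objects constructed above:

DFNO_3D.train!(trainConfig, model, θ)

MPI.Finalize()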
3D Data Loading
See Data Partitioning for instructions on how to set it up properly.
ParametricDFNOs.DFNO_3D.DataConfig — Type
DataConfig
A struct for configuring the data loading process for model training and validation.
Fields
- ntrain: Number of training samples.
- nvalid: Number of validation samples.
- x_key: Key under which input (X) data is stored in the JLD2 file.
- x_file: Path to the file containing input (X) data.
- y_key: Key under which output (Y) data is stored in the JLD2 file.
- y_file: Path to the file containing output (Y) data.
- modelConfig: An instance of ModelConfig that contains model-specific configurations.
Description
This struct stores paths and keys for data files, along with the counts of training and validation samples, to facilitate data preparation and loading in a distributed computing environment. It is tightly coupled with the model's configuration, especially for partitioning the data across different processing nodes.
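A sketch of a data configuration, assuming keyword construction; the file paths and dataset keys below are placeholders for your own JLD2 storage.

dataConfig = DFNO_3D.DataConfig(
    modelConfig = modelConfig,
    ntrain = 1000,
    nvalid = 100,
    x_file = "inputs.jld2",    # placeholder paths
    y_file = "outputs.jld2",
    x_key  = "x",              # placeholder keys inside the JLD2 files
    y_key  = "y",
)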
ParametricDFNOs.DFNO_3D.loadDistData — Method
loadDistData(config::DataConfig; dist_read_x_tensor=UTILS.dist_read_tensor, dist_read_y_tensor=UTILS.dist_read_tensor, comm=MPI.COMM_WORLD)
Loads and distributes training and validation data across processes for distributed training.
Arguments
- config: An instance of DataConfig which holds the configuration for data loading.
- dist_read_x_tensor: Function used to read the distributed x tensors (defaults to dist_read_tensor).
- dist_read_y_tensor: Function used to read the distributed y tensors (defaults to dist_read_tensor).
- comm: MPI communicator used for distributed data loading (defaults to MPI.COMM_WORLD).
Functionality
- Initializes MPI communication to distribute data according to the model's partitioning scheme.
- Loads input and output data from specified files and keys, and distributes them according to the data partitioning logic defined in the model configuration.
- Prepares and separates the data into training and validation sets.
Returns
- Four arrays: Training inputs, training outputs, validation inputs, and validation outputs, each formatted for the distributed training process.
x_train, y_train, x_valid, y_valid
This function manages the distribution and partitioning of large datasets across multiple nodes in a parallel computing environment, using MPI for communication. It is essential for ensuring that data is appropriately sliced and distributed to match the computational architecture and memory constraints.
See Custom 3D Time varying FNO for an example of how you can extend this distributed read to a more complex storage scheme.
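With the DataConfig sketched above, the distributed read reduces to a single call; custom readers can be supplied through the dist_read_x_tensor and dist_read_y_tensor keywords.

x_train, y_train, x_valid, y_valid = DFNO_3D.loadDistData(dataConfig)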
ParametricDFNOs.DFNO_3D.UTILS.dist_loss — Method
dist_loss(local_pred_y, local_true_y)
Calculates the distributed normalized root mean squared error (NRMSE) between predicted and actual values.
Arguments
- local_pred_y: Local tensor of predicted values, typically a subset of the whole prediction corresponding to the data handled by a specific node.
- local_true_y: Local tensor of actual values corresponding to a subset of the data.
Returns
- A scalar value representing the relative L2 error of the predictions to the true values, which quantifies the prediction error normalized by the magnitude of the actual values.
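Conceptually, the quantity computed is the relative L2 error; a serial sketch is below (the actual function additionally reduces the partial sums across MPI ranks so every process sees the global value).

# Single-process equivalent of the distributed relative L2 / NRMSE loss.
rel_l2(pred, truth) = sqrt(sum(abs2, pred .- truth)) / sqrt(sum(abs2, truth))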
ParametricDFNOs.DFNO_3D.UTILS.dist_read_tensor — Method
dist_read_tensor(file_name, key, indices)
Reads a tensor slice from an HDF5 file based on provided indices and reshapes the result.
Arguments
- file_name: The name or path of the HDF5 file from which data is to be read.
- key: The key within the HDF5 file that corresponds to the dataset of interest.
- indices: The indices specifying the slice of the dataset to be extracted.
Returns
- A reshaped array where the first dimension is singleton, extending the dimensions of the original data slice by one.
Functionality
- Opens an HDF5 file in read-only mode.
- Accesses the dataset associated with the provided key.
- Extracts the slice of the dataset specified by indices.
- Reshapes the extracted data, adding a singleton dimension at the beginning.
Example Usage
# To read a specific slice from a dataset within an HDF5 file
tensor_data = dist_read_tensor("data.h5", "dataset_key", (1:10, 5:15, 2:2))
3D Plotting
ParametricDFNOs.DFNO_3D.plotEvaluation — Method
plotEvaluation(modelConfig::ModelConfig, x_plot, y_plot, y_predict; trainConfig::TrainConfig, additional::Dict{String,Any} = Dict{String,Any}())
Generates plots along the time dimension comparing the input data, the true and predicted values, and their absolute difference magnified by a factor of 5.
Arguments
- modelConfig: A ModelConfig struct specifying the dimensions and parameters of the model.
- x_plot: Input data to the model.
- y_plot: True output data from the model.
- y_predict: Predicted output data from the model.
- trainConfig: An optional TrainConfig struct containing training configurations, used for constructing the filename for the plot.
- additional: An optional dictionary of additional objects that are added to the save file name.
This is a specific plotting function used for a two-phase fluid flow problem; you can override it by passing your own plotting function with the same signature to train!, as sketched below.
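A sketch of overriding the plotting callback; myPlot is a hypothetical function that simply has to match the documented signature.

function myPlot(modelConfig::DFNO_3D.ModelConfig, x_plot, y_plot, y_predict;
                trainConfig::DFNO_3D.TrainConfig, additional=Dict{String,Any}())
    # Custom plotting / figure-saving logic for your problem goes here.
end

DFNO_3D.train!(trainConfig, model, θ, plotEval=myPlot)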
3D Checkpoints
ParametricDFNOs.DFNO_3D.loadWeights! — Method
loadWeights!(θ, filename, key, partition; comm=MPI.COMM_WORLD, isLocal=true)
Loads and distributes weights across processes for a parallelized model.
Arguments
- θ: Dictionary of model parameters to be updated with the loaded weights.
- filename: Name or path of the file containing the saved weights.
- key: Key under which the weights are saved in the file.
- partition: The partitioning scheme used for the distributed tensor weights.
- comm: MPI communicator for the distributed system (defaults to MPI.COMM_WORLD).
- isLocal: Flag indicating whether the file path is taken relative to the generated 'weights' folder (i.e., whether filename is a relative path).
Functionality
- Loads weights from a JLD2 file and distributes them according to the partitioning across MPI ranks.
- If gpu_flag is set, ensures weights are moved to GPU memory.
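A sketch of restoring a checkpoint; the filename and key below are placeholders and must match whatever saveWeights wrote.

# Placeholder filename and key; partition must match the model's configuration.
DFNO_3D.loadWeights!(θ, "dfno_3d_checkpoint.jld2", "θ_save", modelConfig.partition)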
ParametricDFNOs.DFNO_3D.saveWeights — Method
saveWeights(θ, model::Model; additional=Dict{String,Any}(), comm=MPI.COMM_WORLD)
Saves the current state of the model's weights to a file, only executed by the rank 0 process.
Arguments
- θ: The current state of the model's parameters.
- model: The Model instance containing the model configurations.
- additional: A Dict of strings you would like the filename to contain and objects the file should contain.
- comm: The MPI communicator used for determining the process rank; can usually be ignored.
Functionality
- Collects distributed weights from all processes.
- Saves the weights to a JLD2 file with additional metadata.
Notes
- The file is saved with a unique name generated from model parameters and additional metadata.
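A sketch of saving a checkpoint with some hypothetical metadata folded into the filename:

DFNO_3D.saveWeights(θ, model, additional=Dict("epoch" => 100))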