3D Time varying FNO
The implementation allows the discretization along the time dimension to be 1 (a single time step), but time can also be treated like any other dimension, so the model can be used as a generic 4D FNO.
3D Model
ParametricDFNOs.DFNO_3D.Model — Type
Model(config::ModelConfig)
A mutable structure representing the FNO, including configurations, weights, biases, and other parameters.
Constructor
This constructor initializes the model's layers and distributes the weights and biases according to the provided ModelConfig.
ParametricDFNOs.DFNO_3D.ModelConfig — Type
ModelConfig(;nx::Int=64, ny::Int=64, nz::Int=64, nt::Int=51, nc_in::Int=5, nc_mid::Int=128, nc_lift::Int=20, nc_out::Int=1, mx::Int=8, my::Int=8, mz::Int=8, mt::Int=4, nblocks::Int=4, dtype::DataType=Float32, partition::Vector{Int}=[1, 8], relu01::Bool=true)
A configuration struct that holds parameters for the Model setup.
Fields
- nx: The discretization along the x-dimension.
- ny: The discretization along the y-dimension.
- nz: The discretization along the z-dimension.
- nt: The discretization along time.
- nc_in: The number of input channels.
- nc_mid: The number of intermediate channels.
- nc_lift: The number of lifted channels.
- nc_out: The number of output channels.
- mx: The number of Fourier modes retained along the x-dimension.
- my: The number of Fourier modes retained along the y-dimension.
- mz: The number of Fourier modes retained along the z-dimension.
- mt: The number of Fourier modes retained along time.
- nblocks: The number of blocks in the model.
- dtype: The data type used for computations, default is Float32.
- partition: The partitioning configuration for distributed computing, default is [1, 8].
- relu01: Boolean that determines whether the final ReLU layer is applied.
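For orientation, a minimal sketch of constructing a configuration and model follows; the grid sizes, mode counts, and partition are hypothetical and should be adapted to your problem and MPI layout (setting nt=1 recovers a single-time-step, purely spatial operator).

using MPI
using ParametricDFNOs.DFNO_3D

MPI.Init()

# Hypothetical coarse setup: 16 x 16 x 16 spatial grid, 20 time steps,
# distributed over 2 MPI ranks.
partition = [1, 2]
modelConfig = DFNO_3D.ModelConfig(nx=16, ny=16, nz=16, nt=20,
                                  mx=4, my=4, mz=4, mt=4,
                                  nblocks=4, partition=partition)
model = DFNO_3D.Model(modelConfig)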
ParametricDFNOs.DFNO_3D.initModel — Method
initModel(model::Model)
Initializes the Model by allocating and setting the initial values for the parameters.
Arguments
- model: An instance of Model to be initialized.
Returns
Returns the initialized parameters θ, which may be placed on the GPU if the global gpu_flag is set to true.
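Continuing the sketch above, parameter initialization is a single call:

# Allocates and initializes the distributed weights and biases; the result is
# moved to the GPU automatically if the global gpu_flag is set.
θ = DFNO_3D.initModel(model)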
3D Forward Pass
ParametricDFNOs.DFNO_3D.forward — Method
forward(model::Model, θ, x::Any)
Performs the forward pass using the model defined by Model. The function applies a series of transformations to the input data x using the model parameters θ and the configurations within the model.
Arguments
- model: The Model object that contains configurations and parameters for the forward pass.
- θ: The parameters of the model, typically initialized by initModel.
- x: Input data that will be passed through the model.
Returns
The output of the model after the forward pass, reshaped to the dimensions appropriate for the number of output channels, time steps, and spatial dimensions.
Details
The process includes:
- Reshaping the input and applying lifting operations.
- Processing through a series of blocks that includes spectral and standard convolutions, followed by batch normalization and non-linear activation functions (ReLU).
- Final projection to the output channels and application of a double ReLU operation to finalize the forward pass.
Notes
This function will move x to the GPU and perform GPU-enabled computations if gpu_flag is set.
Caution
The input x should have the correct number of elements, but it does not need to have any particular shape. However, an incorrect permutation of the input dimensions will lead to incorrect solution operators.
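A minimal forward-pass sketch, continuing from the configuration above. The local element count used here is an assumption (all channels, time steps, and grid points divided evenly across the partition); per the caution above, only the number of elements matters, not the shape, but the ordering of the underlying dimensions does.

# Assumed local input size for this rank, with a trailing batch dimension of 1.
nlocal = modelConfig.nc_in * modelConfig.nt * modelConfig.nx *
         modelConfig.ny * modelConfig.nz ÷ prod(modelConfig.partition)
x = rand(modelConfig.dtype, nlocal, 1)

y = DFNO_3D.forward(model, θ, x)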
3D Training
ParametricDFNOs.DFNO_3D.TrainConfig — Type
TrainConfig
A configuration struct for setting up the training environment.
Fields
- nbatch: The number of samples per batch.
- epochs: The number of full training cycles through the dataset.
- seed: The random seed for reproducibility of shuffling and initialization.
- plot_every: Frequency of plotting the evaluation metrics (every n epochs).
- learning_rate: The step size for the optimizer.
- x_train: Training input data.
- y_train: Training target data.
- x_valid: Validation input data.
- y_valid: Validation target data.
Usage
This struct is used to encapsulate all necessary training parameters, including data and hyperparameters, to configure the training loop.
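A sketch of a training configuration, assuming keyword construction with the fields listed above; the hyperparameters are hypothetical and the data arrays come from loadDistData (see 3D Data Loading below).

trainConfig = DFNO_3D.TrainConfig(
    nbatch = 2,
    epochs = 100,
    learning_rate = 1f-4,
    plot_every = 5,
    x_train = x_train, y_train = y_train,
    x_valid = x_valid, y_valid = y_valid,
)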
ParametricDFNOs.DFNO_3D.train! — Method
train!(config::TrainConfig, model::Model, θ::Dict; comm=MPI.COMM_WORLD, plotEval::Function=plotEvaluation)
Conducts the training process for a given model using distributed computing and tracks the training and validation loss.
Arguments
- config: The training configuration settings as specified by the TrainConfig struct.
- model: The neural network model to be trained as specified by the Model struct.
- θ: A dictionary of model parameters to be optimized during training.
- comm: The MPI communicator to be used for distributed training (defaults to MPI.COMM_WORLD).
- plotEval: The plotting function called to evaluate training progress (defaults to plotEvaluation).
Training Procedure
- The function sets up the optimizer and initializes training and validation data.
- It goes through the specified number of training epochs, updating model parameters via gradient descent.
- If the gpu_flag is set, computations are performed on a GPU.
- At specified intervals, the training process is evaluated by plotting the training and validation losses and predicted vs. actual outputs.
Outputs
- Updates the model's parameters in-place.
- Produces plots every plot_every epochs to visually evaluate model performance.
- Saves weights every 2 epochs.
Notes
- A progress meter will be displayed.
- Make sure TrainConfig is set up properly before calling train!.
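Putting it together, a sketch of the training call using the objects constructed above:

DFNO_3D.train!(trainConfig, model, θ)

MPI.Finalize()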
3D Data Loading
See Data Partitioning for instructions on how to set it up properly.
ParametricDFNOs.DFNO_3D.DataConfig — Type
DataConfig
A struct for configuring the data loading process for model training and validation.
Fields
- ntrain: Number of training samples.
- nvalid: Number of validation samples.
- x_key: Key under which input (X) data is stored in the JLD2 file.
- x_file: Path to the file containing input (X) data.
- y_key: Key under which output (Y) data is stored in the JLD2 file.
- y_file: Path to the file containing output (Y) data.
- modelConfig: An instance of ModelConfig that contains model-specific configurations.
Description
This struct stores paths and keys for data files, along with the counts of training and validation samples, to facilitate data preparation and loading in a distributed computing environment. It is tightly coupled with the model's configuration, especially for partitioning the data across different processing nodes.
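A sketch of a data configuration, assuming keyword construction; the file paths and dataset keys below are placeholders for your own JLD2 storage.

dataConfig = DFNO_3D.DataConfig(
    modelConfig = modelConfig,
    ntrain = 1000,
    nvalid = 100,
    x_file = "inputs.jld2",    # placeholder paths
    y_file = "outputs.jld2",
    x_key  = "x",              # placeholder keys inside the JLD2 files
    y_key  = "y",
)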
ParametricDFNOs.DFNO_3D.loadDistData — Method
loadDistData(config::DataConfig; dist_read_x_tensor=UTILS.dist_read_tensor, dist_read_y_tensor=UTILS.dist_read_tensor, comm=MPI.COMM_WORLD)
Loads and distributes training and validation data across processes for distributed training.
Arguments
- config: An instance of DataConfig which holds the configuration for data loading.
- dist_read_x_tensor: Function used to read the distributed x tensors (defaults to dist_read_tensor).
- dist_read_y_tensor: Function used to read the distributed y tensors (defaults to dist_read_tensor).
- comm: MPI communicator used for distributed data loading (defaults to MPI.COMM_WORLD).
Functionality
- Initializes MPI communication to distribute data according to the model's partitioning scheme.
- Loads input and output data from specified files and keys, and distributes them according to the data partitioning logic defined in the model configuration.
- Prepares and separates the data into training and validation sets.
Returns
- Four arrays: Training inputs, training outputs, validation inputs, and validation outputs, each formatted for the distributed training process.
x_train, y_train, x_valid, y_valid
This function manages the distribution and partitioning of large datasets across multiple nodes in a parallel computing environment, using MPI for communication. It is essential for ensuring that data is appropriately sliced and distributed to match the computational architecture and memory constraints.
See Custom 3D Time varying FNO for an example of how you can extend this distributed read to a more complex storage scheme.
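With the DataConfig sketched above, the distributed read reduces to a single call; custom readers can be supplied through the dist_read_x_tensor and dist_read_y_tensor keywords.

x_train, y_train, x_valid, y_valid = DFNO_3D.loadDistData(dataConfig)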
ParametricDFNOs.DFNO_3D.UTILS.dist_loss — Method
dist_loss(local_pred_y, local_true_y)
Calculates the distributed normalized root mean squared error (NRMSE) between predicted and actual values.
Arguments
- local_pred_y: Local tensor of predicted values, typically a subset of the whole prediction corresponding to the data handled by a specific node.
- local_true_y: Local tensor of actual values corresponding to a subset of the data.
Returns
- A scalar value representing the relative L2 error of the predictions to the true values, which quantifies the prediction error normalized by the magnitude of the actual values.
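Conceptually, the quantity computed is the relative L2 error; a serial sketch is below (the actual function additionally reduces the partial sums across MPI ranks so every process sees the global value).

# Single-process equivalent of the distributed relative L2 / NRMSE loss.
rel_l2(pred, truth) = sqrt(sum(abs2, pred .- truth)) / sqrt(sum(abs2, truth))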
ParametricDFNOs.DFNO_3D.UTILS.dist_read_tensor — Method
dist_read_tensor(file_name, key, indices)
Reads a tensor slice from an HDF5 file based on provided indices and reshapes the result.
Arguments
- file_name: The name or path of the HDF5 file from which data is to be read.
- key: The key within the HDF5 file that corresponds to the dataset of interest.
- indices: The indices specifying the slice of the dataset to be extracted.
Returns
- A reshaped array where the first dimension is singleton, extending the dimensions of the original data slice by one.
Functionality
- Opens an HDF5 file in read-only mode.
- Accesses the dataset associated with the provided key.
- Extracts the slice of the dataset specified by indices.
- Reshapes the extracted data, adding a singleton dimension at the beginning.
Example Usage
# To read a specific slice from a dataset within an HDF5 file
tensor_data = dist_read_tensor("data.h5", "dataset_key", (1:10, 5:15, 2:2))
3D Plotting
ParametricDFNOs.DFNO_3D.plotEvaluation — Method
plotEvaluation(modelConfig::ModelConfig, x_plot, y_plot, y_predict; trainConfig::TrainConfig, additional::Dict{String,Any} = Dict{String,Any}())
Generates plots along the time dimension comparing the input data, the true and predicted values, and their absolute difference magnified by a factor of 5.
Arguments
- modelConfig: A ModelConfig struct specifying the dimensions and parameters of the model.
- x_plot: Input data to the model.
- y_plot: True output data from the model.
- y_predict: Predicted output data from the model.
- trainConfig: An optional TrainConfig struct containing training configurations, used for constructing the filename for the plot.
- additional: An optional dictionary of additional objects that are added to the save file name.
This is a specific plotting function used for a two-phase fluid flow problem; you can override it by passing your own plotting function with the same signature to train!, as sketched below.
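A sketch of overriding the plotting callback; myPlot is a hypothetical function that simply has to match the documented signature.

function myPlot(modelConfig::DFNO_3D.ModelConfig, x_plot, y_plot, y_predict;
                trainConfig::DFNO_3D.TrainConfig, additional=Dict{String,Any}())
    # Custom plotting / figure-saving logic for your problem goes here.
end

DFNO_3D.train!(trainConfig, model, θ, plotEval=myPlot)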
3D Checkpoints
ParametricDFNOs.DFNO_3D.loadWeights! — Method
loadWeights!(θ, filename, key, partition; comm=MPI.COMM_WORLD, isLocal=true)
Loads and distributes weights across processes for a parallelized model.
Arguments
- θ: Dictionary of model parameters to be updated with the loaded weights.
- filename: Name or path of the file containing the saved weights.
- key: Key under which the weights are saved in the file.
- partition: The partitioning scheme used for the distributed tensor weights.
- comm: MPI communicator for the distributed system (defaults to MPI.COMM_WORLD).
- isLocal: Flag indicating whether the file path is taken relative to the generated 'weights' folder (i.e., whether filename is a relative path).
Functionality
- Loads weights from a JLD2 file and distributes them according to the partitioning across MPI ranks.
- If gpu_flag is set, ensures weights are moved to GPU memory.
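A sketch of restoring a checkpoint; the filename and key below are placeholders and must match whatever saveWeights wrote.

# Placeholder filename and key; partition must match the model's configuration.
DFNO_3D.loadWeights!(θ, "dfno_3d_checkpoint.jld2", "θ_save", modelConfig.partition)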
ParametricDFNOs.DFNO_3D.saveWeights — Method
saveWeights(θ, model::Model; additional=Dict{String,Any}(), comm=MPI.COMM_WORLD)
Saves the current state of the model's weights to a file, only executed by the rank 0 process.
Arguments
- θ: The current state of the model's parameters.
- model: The Model instance containing the model configurations.
- additional: A Dict of strings you would like the filename to contain and objects the file should contain.
- comm: The MPI communicator used for determining the process rank; can usually be ignored.
Functionality
- Collects distributed weights from all processes.
- Saves the weights to a JLD2 file with additional metadata.
Notes
- The file is saved with a unique name generated from model parameters and additional metadata.
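A sketch of saving a checkpoint with some hypothetical metadata folded into the filename:

DFNO_3D.saveWeights(θ, model, additional=Dict("epoch" => 100))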