Constrained FWI with

This tutorial demonstrates the use of constraints for FWI. The wave-equation modeling is performed with JUDI and the projections with SetIntersectionProjection.

Note on runtime

Warning: this notebook takes more than 1 hour to run for 16 shots with two workers on an Intel 8168.

lscpu CPU information: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz

using SlimOptim
using Distributed, JUDI.TimeModeling, LinearAlgebra, PyPlot, SetIntersectionProjection, Printf
1. Prepare models

n = (251, 251) # nx, nz
d = (15., 15.) # hx, hz
o = (0., 0.); # ox, oz
# Squared slowness
m = 1.5f0^(-2) * ones(Float32, n)
m[101:150, 101:150] .= 1.7f0^(-2)
m0 = 1.5f0^(-2) * ones(Float32, n);  # Float32 literal keeps m0 in single precision
model = Model(n,d,o,m)
model0 = Model(n,d,o,m0);


vmin,vmax = extrema(m)
dmin,dmax = -.1,.1

subplot(3,1,1); imshow(m,aspect="auto",cmap="jet"); 
colorbar(); clim(vmin,vmax); title("True squared slowness (m)")

subplot(3,1,2); imshow(m0,aspect="auto",cmap="jet");
colorbar(); clim(vmin,vmax); title("Initial squared slowness (m0)");

subplot(3,1,3); imshow(m.-m0,aspect="auto",cmap="seismic");
colorbar(); clim(dmin,dmax); title("Difference (m-m0)");
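
The models above are stored as squared slowness, m = 1/v², with v in km/s. As a minimal sketch of the conversion (the helper names `slowness_sq` and `velocity` are hypothetical and not part of JUDI), the following may help when reading the color scales:

```julia
# Hypothetical helpers (not part of JUDI): convert between velocity in km/s
# and squared slowness in s^2/km^2.
slowness_sq(v) = 1 ./ v .^ 2
velocity(m) = 1 ./ sqrt.(m)

m_background = slowness_sq(1.5f0)   # background of the true model
m_anomaly = slowness_sq(1.7f0)      # faster square anomaly

# A faster velocity gives a smaller squared slowness, so the anomaly
# appears as the minimum of m, not the maximum.
@show m_background m_anomaly
@show velocity(m_anomaly)
```

This is why the difference plot `m .- m0` is negative inside the square anomaly.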



2. Setup Constraints with SetIntersectionProjection

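# PARSDMM options in single precision; `options` is used by the constraint setup below
options = PARSDMM_options()
options.FL = Float32
options = default_PARSDMM_options(options, options.FL)

constraint = Vector{SetIntersectionProjection.set_definitions}()
constraint2 = Vector{SetIntersectionProjection.set_definitions}()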

We set up two constraints:

  • Bounds that limit maximum and minimum velocity
  • TV, which limits variation and promotes a piecewise-constant structure
m_min = 0 .* m .+ minimum(m).*.5
m_max = 0 .* m .+ maximum(m)
set_type = "bounds"
TD_OP = "identity"
app_mode = ("matrix","")
custom_TD_OP = ([],false)
push!(constraint, set_definitions(set_type,TD_OP,vec(m_min),vec(m_max),app_mode,custom_TD_OP));
push!(constraint2, set_definitions(set_type,TD_OP,vec(m_min),vec(m_max),app_mode,custom_TD_OP));
(TV,dummy1,dummy2,dummy3) = get_TD_operator(model0,"TV",options.FL)
m_min = 0.0
m_max = norm(TV*vec(m),1) * .5
set_type = "l1"
TD_OP = "TV"
app_mode = ("matrix","")
custom_TD_OP = ([],false)
push!(constraint, set_definitions(set_type,TD_OP,m_min,m_max,app_mode,custom_TD_OP));
#set up constraints with bounds only, precompute some things and define projector
(P_sub2,TD_OP2,set_Prop2) = setup_constraints(constraint2, model0,options.FL)
(TD_OP2,AtA2,l2,y2) = PARSDMM_precompute_distribute(TD_OP2,set_Prop2,model0,options)
options2 = deepcopy(options)
options2.rho_ini = ones(length(TD_OP2))*10.0

proj_intersection2 = x-> PARSDMM(x, AtA2, TD_OP2, set_Prop2, P_sub2, model0, options2)  

# Projection function
function prj2(input)
    input = Float32.(input)
    (x,dummy1,dummy2,dummy3) = proj_intersection2(vec(input))
    return x
end
prj2 (generic function with 1 method)
#set up constraints with bounds and TV
(P_sub,TD_OP,set_Prop) = setup_constraints(constraint, model0,options.FL)
(TD_OP,AtA,l,y) = PARSDMM_precompute_distribute(TD_OP,set_Prop,model0,options)
options.rho_ini = ones(length(TD_OP))*10.0

proj_intersection = x-> PARSDMM(x, AtA, TD_OP, set_Prop, P_sub, model0, options)

# Projection function
function prj(input)
    input = Float32.(input)
    (x,dummy1,dummy2,dummy3) = proj_intersection(vec(input))
    return x
end
prj (generic function with 1 method)
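
To make the two constraint sets concrete, here is a minimal sketch in plain Julia that does not use SetIntersectionProjection. The helper names `project_bounds` and `tv_l1` are hypothetical, and the anisotropic, unscaled finite-difference form of TV is an assumption about what the "TV" operator measures (the library may scale the differences by the grid spacing). Projecting onto the bounds is an elementwise clamp, while the TV constraint caps the l1 norm of the first differences of the model.

```julia
# Hypothetical helper: projection onto the box lo <= x <= hi is an elementwise clamp.
project_bounds(x, lo, hi) = clamp.(x, lo, hi)

# Hypothetical helper: anisotropic total variation, i.e. the l1 norm of the
# first differences along both axes (no grid-spacing scaling, by assumption).
tv_l1(x::AbstractMatrix) = sum(abs, diff(x; dims=1)) + sum(abs, diff(x; dims=2))

# Small piecewise-constant test model: a 2x2 block of ones in a field of zeros.
x = zeros(Float32, 6, 6)
x[3:4, 3:4] .= 1f0

# Add two out-of-bounds outliers and project back onto [0, 1].
x_noisy = copy(x)
x_noisy[1, 1] = 5f0
x_noisy[6, 6] = -2f0
x_proj = project_bounds(x_noisy, 0f0, 1f0)

# TV budget chosen the same way as in the tutorial: half the TV of the reference model.
tv_budget = 0.5f0 * tv_l1(x)
@show tv_l1(x) tv_l1(x_noisy) tv_l1(x_proj) tv_budget
```

Clamping alone removes the outliers but does not enforce the TV budget; satisfying both sets at once requires a projection onto their intersection, which is the job of PARSDMM in `prj`.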

3. Build a small local compute cluster (2 workers)

Setup OMP environment variables for the cluster

In the distributed compute case the workers that we add would be on different hardware, and we might add tens of workers in 2D and hundreds in 3D. Here we run on a single machine with only 2 workers, and so we need to be careful with details related to high performance computing. If we did not specify thread affinity, the two workers would compete for the same physical cores and the modeling would be incredibly slow.

We spin up the small 2-worker cluster by calling addprocs(2), and because we set the environment variable ENV["OMP_DISPLAY_ENV"] = "true" we will see the OMP environment printed out on each worker. In that output (below) we can verify that half of the total threads (44/2 = 22) are assigned to each socket on this 2 socket system. You can obtain more details about the hardware with the shell command lscpu.

We set four environment variables related to OpenMP:

  • OMP_DISPLAY_ENV prints out the OpenMP environment on each worker
  • OMP_PROC_BIND specifies that threads should be bound to physical cores
  • OMP_NUM_THREADS sets the number of threads per worker to half the number of physical cores
  • GOMP_CPU_AFFINITY specifies which physical cores the threads run on for each worker

If you run the shell command top during execution, you will see 3 julia processes: the main process and two workers. The two workers should generally have about 50% of the system, and load average should tend towards the physical number of cores.

nthread = Sys.CPU_THREADS
nw = 2

# Set the OpenMP environment before adding workers so that they inherit it
ENV["OMP_DISPLAY_ENV"] = "true"
ENV["OMP_PROC_BIND"] = "close"
ENV["OMP_NUM_THREADS"] = "$(div(nthread, nw))" 
addprocs(nw)
@show workers()
for k in 1:nworkers()
    # contiguous block of cores for worker k
    place1 = (k - 1) * div(nthread,nworkers())
    place2 = (k + 0) * div(nthread,nworkers()) - 1
    @show place1, place2, div(nthread, nw)
    @spawnat workers()[k] ENV["GOMP_CPU_AFFINITY"] = "$(place1)-$(place2)";
end
workers() = [2, 3]
(place1, place2, div(nthread, nw)) = (0, 3, 4)
(place1, place2, div(nthread, nw)) = (4, 7, 4)
@everywhere using Distributed, JUDI.TimeModeling, JUDI.SLIM_optim, LinearAlgebra, PyPlot, SetIntersectionProjection
4. Create source and receivers geometries

We use 8 shot locations evenly distributed across the left of the model.

tn = 3500  # Recording time in ms
dt = 2f0  # Shot record sampling rate in ms
f0 = 0.005 # Peak frequency in kHz
nsrc = 8
xsrc = convertToCell(d[1].*ones(Float32, nsrc))
ysrc = convertToCell(range(0f0, stop = 0f0, length = nsrc))
zsrc = convertToCell(range(0f0, (n[2] - 1)*d[2], length=nsrc))
src_geom = Geometry(xsrc, ysrc, zsrc; dt=dt, t=tn);
nrec = 251
xrec = (n[1] - 2)*d[1] .* ones(Float32, nrec)
yrec = 0f0
zrec = convertToCell(range(0f0, (n[2] - 1)*d[2], length=nrec))
rec_geom = Geometry(xrec, yrec, zrec; dt=dt, t=tn, nsrc=nsrc);

Visualize geometry

vmin,vmax = extrema(m)
dmin,dmax = -.1,.1

imshow(m,aspect="auto",cmap="jet", extent=[0, 3750, 3750, 0]); 
colorbar(); clim(vmin,vmax); title("True squared slowness (m)")
scatter(xsrc, zsrc, c="g", label="Sources")
scatter(xrec[1:4:end], zrec[1:4:end], c="c", label="Receiver")


5. Build F, the JUDI modeling operator

# True model operator
ntComp = get_computational_nt(src_geom, rec_geom, model)
info = Info(prod(n), nsrc, ntComp)
F = judiModeling(info, model, src_geom, rec_geom)

# Initial model operator
ntComp = get_computational_nt(src_geom, rec_geom, model0)
info = Info(prod(n), nsrc,ntComp)
F0 = judiModeling(info, model0, src_geom, rec_geom);
# Source function
fsrc = judiVector(src_geom, ricker_wavelet(tn, dt, f0));
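
For reference, a Ricker wavelet with peak frequency f0 can be written as w(t) = (1 - 2π²f0²(t - t0)²) exp(-π²f0²(t - t0)²). The sketch below evaluates this formula directly. The function name `ricker` and the delay t0 = 1/f0 are assumptions made for illustration; JUDI's `ricker_wavelet` may use a different delay or normalization.

```julia
# Hypothetical stand-alone Ricker wavelet (not JUDI's implementation).
# tn: recording time in ms, dt: sampling interval in ms, f0: peak frequency in kHz.
# The delay t0 = 1/f0 is an assumption chosen so the wavelet starts near zero.
function ricker(tn, dt, f0; t0 = 1 / f0)
    t = 0:dt:tn
    a = (pi * f0) .* (t .- t0)
    return (1 .- 2 .* a .^ 2) .* exp.(-a .^ 2)
end

w = ricker(3500, 2.0, 0.005)
@show length(w) argmax(w)
```

With f0 = 0.005 kHz (5 Hz) the dominant period is 200 ms, which is the time scale to keep in mind when judging time shifts between modeled and observed arrivals.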

6. Use F to create the data in both models

t1 = @elapsed begin
    dobs = F*fsrc;
end
@info @sprintf("Time in true model; %.2f seconds\n", t1);

t2 = @elapsed begin
    d0 = F0*fsrc;
end
@info @sprintf("Time in init model; %.2f seconds\n", t2);
┌ Info: Time in true model; 17.57 seconds
└ @ Main In[20]:4
┌ Info: Time in init model; 6.78 seconds
└ @ Main In[20]:9

Compute the residual data

r = d0 - dobs;

7. Visualize data

shots = [1,4,8]
Plot shot gathers for true model, initial model, and residual

The table below describes the data images below. We flip the direction of the residual and modeled data in order to help display the match with the true data.

Initial Residual Data | True Data | Initial Data

Note that the data modeled in the initial model lacks a lot of reflectivity that is evident in the data modeled in the true model. We expect to recover this missing reflectivity with the FWI.

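scale = 10.0 / sqrt(norm(dobs)^2 / length(dobs.data))
@show scale

nzero = 5
pad = ones(Float32,1751,nzero)

figure(figsize=(8,9)); clf()
for (iplot,ishot) in enumerate(shots)
    # flipped residual | true data | flipped initial data, separated by padding
    cat2 = hcat(reverse(r.data[ishot],dims=2), pad, dobs.data[ishot], pad, reverse(d0.data[ishot],dims=2))
    # display settings assumed: scaled gather in grayscale, one panel per shot
    subplot(3,1,iplot)
    imshow(scale .* cat2, cmap="gray", aspect="auto", clim=[-1,+1])
    title(" Initial Residual sz=$(zsrc[ishot])   |   True sz=$(zsrc[ishot])   |   Initial sz=$(zsrc[ishot]) (flipped)");
end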


scale = 0.01818860621034772

8. Assess if data is cycle skipped at the farthest offsets

Next we plot the far offset traces for these three shots in order to assess if the data is cycle skipped.

You can observe in the plots below that the refraction waveforms (first arrivals) in the initial model are not cycle skipped with respect to the true model, so we can proceed.

A very significant part of the residual wavefield is actually reflections in this example.
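
A common rule of thumb is that two arrivals are cycle skipped when their time shift exceeds half the dominant period, 1/(2 f0), which is 100 ms for f0 = 0.005 kHz. The sketch below applies that check to two synthetic traces rather than to the JUDI data; `arrival`, `best_lag` and the 40 ms shift are all hypothetical and chosen only for illustration.

```julia
# Ricker-shaped test arrival centred at time tc (ms) with peak frequency f0 (kHz).
arrival(t, tc, f0) = (1 .- 2 .* (pi * f0 .* (t .- tc)) .^ 2) .* exp.(-(pi * f0 .* (t .- tc)) .^ 2)

# Lag (in samples) that maximizes the cross-correlation of b against a.
# A positive lag means that b arrives later than a.
function best_lag(a::AbstractVector, b::AbstractVector, maxlag::Int)
    n = length(a)
    best, bestval = 0, -Inf
    for lag in -maxlag:maxlag
        lo = max(1, 1 - lag)      # keep both i and i + lag inside 1:n
        hi = min(n, n - lag)
        c = sum(a[i] * b[i + lag] for i in lo:hi)
        if c > bestval
            best, bestval = lag, c
        end
    end
    return best
end

dt, tn, f0 = 2.0, 3500.0, 0.005
t = collect(0:dt:tn)
trace_true = arrival(t, 500.0, f0)
trace_init = arrival(t, 540.0, f0)   # same arrival, 40 ms later

lag = best_lag(trace_true, trace_init, 100)
shift_ms = lag * dt
half_period_ms = 1 / (2 * f0)
cycle_skipped = abs(shift_ms) > half_period_ms
@show shift_ms half_period_ms cycle_skipped
```

Applied to the far-offset traces, the same comparison would use the last column of each shot record in place of the synthetic arrivals.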

scale = 10.0 / sqrt(norm(dobs)^2 / length(dobs.data))
t = [0.0:dt:tn;]

figure(figsize=(8,9)); clf()
for (iplot,ishot) in enumerate(shots)
    # farthest-offset trace (last receiver) in the true and initial models
    subplot(3,1,iplot)
    plot(t,dobs.data[ishot][:,end],label="True Model $(ishot) at z=$(zsrc[ishot])");
    plot(t,d0.data[ishot][:,end],label="Initial Model $(ishot) at z=$(zsrc[ishot])");
    legend()
end


9. Build the objective functions

Build src/rec positions mask

We use this mask to remove the imprint in gradients of proximity to source locations. The mask is set to 0 wherever a source or receiver is close, and is set to 1 otherwise. Without this mask most of the gradient updates would be concentrated close to sources where the model is correct.

wb_mask = ones(Float32,size(m))
wb_mask[1:5, :] .= 0;
wb_mask[end-5:end, :] .= 0;

imshow(wb_mask', aspect="auto",cmap="gray_r",clim=[0,+2]);
title("Water Bottom Mask");


Build the objective function

This method is called by the solver whenever the gradient is required. Steps in computing the gradient are as follows:

  1. Apply the adjoint of the Jacobian to the current residual: J' * (F0*fsrc - dobs)
  2. Apply simple scaling based on the size of the first gradient, and save to apply to future gradients
# build Jacobian
J = judiJacobian(F0, fsrc)

function objective(F0, m, dobs, wb_mask)
    # update the model, then compute data and gradient
    F0.model.m .= m
    t = @elapsed begin
        d0 = F0*fsrc
        G = J' * (d0 .- dobs)
    end
    # mute the gradient near sources and receivers
    G .*= wb_mask
    ϕ = .5*norm(d0 .- dobs)^2
    if gscale == 0.0
        # compute scalar from first gradient, apply to future gradients
        global gscale = .25 ./ maximum(G) 
        @show gscale
    end
    G .*= gscale
    return ϕ, vec(G.data)
end

# struct to save the first gradient scalar
gscale = 0f0
g(x) = objective(F0, x, dobs, wb_mask)
g (generic function with 1 method)
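
The pattern used in `objective` (misfit, adjoint-Jacobian gradient, and a scalar taken from the first gradient and reused afterwards) can be sketched on a toy linear problem. Everything here is a stand-in chosen for illustration: the small matrix `A` plays the role of the modeling operator, so its Jacobian is `A` itself, and `first_scale` is a hypothetical holder for the scalar. Unlike the tutorial code, this sketch normalizes by the largest absolute gradient entry.

```julia
using LinearAlgebra

# Toy stand-ins (assumptions): a fixed matrix as the forward operator and fixed data.
A = [2.0 0.0; 1.0 3.0; 0.0 1.0]
d = [1.0, 2.0, 3.0]

# Hypothetical holder for the scalar computed from the first gradient.
first_scale = Ref(0.0)

function toy_objective(m)
    res = A * m .- d              # residual F(m) - d
    ϕ = 0.5 * norm(res)^2         # least-squares misfit
    G = A' * res                  # adjoint of the Jacobian applied to the residual
    if first_scale[] == 0.0
        # computed once from the first gradient, reused for all later gradients
        first_scale[] = 0.25 / maximum(abs, G)
    end
    return ϕ, first_scale[] .* G
end

ϕ0, g0 = toy_objective([0.0, 0.0])
ϕ1, g1 = toy_objective([1.0, 1.0])
@show ϕ0 g0 ϕ1 g1
```

Because the same scalar multiplies every gradient, the relative size of successive gradients is preserved, while the first update has a predictable magnitude.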

Compute gradient

tgrad1 = @elapsed begin
    grad1 = objective(F0, vec(m0), dobs, wb_mask)[2]
    gscale = 0
end
@show tgrad1;
gscale = 3.164575347982762e-5
tgrad1 = 20.692245796
dm = m0 .- m
grad1 = reshape(grad1, n)
mg2 = reshape(m0 .- grad1, n)

imshow(grad1' ./ maximum(abs,grad1),aspect="auto",cmap="seismic");
title("Initial Gradient without Illumination Compensation");

imshow(dm ./ maximum(abs,dm),aspect="auto",cmap="seismic");
title("Squared slowness Difference: (m0 - m)");

imshow(mg2',aspect="auto",cmap="seismic");
title("Updated squared slowness: (m0 - grad1)");

imshow(reshape(prj(mg2), n)',aspect="auto",cmap="seismic");
title("Updated projected (bounds + TV) squared slowness: prj(m0 - grad1)");

imshow(reshape(prj2(mg2), n)',aspect="auto",cmap="seismic");
title("Updated projected (bounds) squared slowness: prj2(m0 - grad1)");

relative evolution to small, exiting PARSDMM (iteration 36)


input to PARSDMM is feasible, returning

10. Perform the FWI using minConf_PQN

We run projected quasi-Newton (PQN) with a budget of 10 function evaluations for two setups:

  • Bounds constraints only
  • Bounds + TV constraints
# FWI with PQN
niter = 10
gscale = 0f0
options_pqn = pqn_options(progTol=0, store_trace=true, verbose=3, maxIter=niter)
JUDI.SLIM_optim.PQN_params(3, 1.0f-5, 0, 10, 0.0001f0, 10, false, false, true, 1.0f-6, 1.0f-7, 10, false, 20)
sol = pqn(g, vec(m0), prj, options_pqn);
Running PQN...
Number of L-BFGS Corrections to store: 10
Spectral initialization of SPG: 0
Maximum number of SPG iterations: 10
SPG optimality tolerance: 1.00e-06
SPG progress tolerance: 1.00e-07
PQN optimality tolerance: 1.00e-05
PQN progress tolerance: 0.00e+00
Quadratic initialization of line search: 0
Maximum number of function evaluations: 10
sol2 = minConf_PQN(g, vec(m0), prj2, options_pqn);
Running PQN...
Number of L-BFGS Corrections to store: 10
Spectral initialization of SPG: 0
Maximum number of SPG iterations: 10
SPG optimality tolerance: 1.00e-06
SPG progress tolerance: 1.00e-07
PQN optimality tolerance: 1.00e-05
PQN progress tolerance: 0.00e+00
Quadratic initialization of line search: 0
Maximum number of function evaluations: 10
mf = reshape(prj(sol.sol), n) # optimal solution
ϕ = sol.f_trace   # cost vs iteration
m1 = sol.x_trace  # model vs iteration
collect(m1[i] = reshape(m1[i], n) for i=1:length(ϕ));
relative evolution to small, exiting PARSDMM (iteration 15)
mf2 = reshape(prj(sol2.sol), n) # optimal solution
ϕ2 = sol2.f_trace   # cost vs iteration
m2 = sol2.x_trace  # model vs iteration
collect(m2[i] = reshape(m2[i], n) for i=1:length(ϕ2));
relative evolution to small, exiting PARSDMM (iteration 15)
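
To illustrate how a projector enters a constrained solver, here is a minimal sketch of plain projected gradient descent on a toy quadratic. This is deliberately simpler than PQN (no quasi-Newton model and no line search), and `projected_gradient`, `toy` and `box` are hypothetical names. The solver interface mirrors the one used above: a function returning the objective and gradient, a starting point, and a projection function.

```julia
using LinearAlgebra

# Projected gradient descent: take a gradient step, then project back onto the
# feasible set. A simplified stand-in for PQN, not the SlimOptim algorithm.
function projected_gradient(f_and_g, x0, proj; step=0.5, maxiter=50)
    x = proj(x0)
    trace = Float64[]
    for _ in 1:maxiter
        ϕ, g = f_and_g(x)
        push!(trace, ϕ)               # objective value before the update
        x = proj(x .- step .* g)
    end
    return x, trace
end

# Toy problem: minimize 0.5*||x - c||^2 subject to 0 <= x <= 1.
c = [2.0, -1.0, 0.5]
toy(x) = (0.5 * norm(x .- c)^2, x .- c)
box(x) = clamp.(x, 0.0, 1.0)

x_opt, trace = projected_gradient(toy, zeros(3), box)
@show x_opt trace[1] trace[end]
```

The unconstrained minimizer `c` lies outside the box, so the solution sits on the boundary in the first two coordinates; swapping `box` for a different projector changes the feasible set without touching the solver, which is how `prj` and `prj2` are used above.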

11. Visualize velocity models and objective function

figure(figsize=(8,9)); clf()

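# panel layout and imshow calls assumed; models are squared slowness
subplot(2,2,1); imshow(m0,aspect="auto",cmap="jet");
colorbar(orientation="vertical");clim(vmin,vmax);title("Initial squared slowness");

subplot(2,2,2); imshow(mf2,aspect="auto",cmap="jet");
colorbar(orientation="vertical");clim(vmin,vmax);title("FWI squared slowness");

subplot(2,2,3); imshow(mf,aspect="auto",cmap="jet");
colorbar(orientation="vertical");clim(vmin,vmax);title("FWI squared slowness with TV");

subplot(2,2,4); imshow(m,aspect="auto",cmap="jet");
colorbar(orientation="vertical");clim(vmin,vmax);title("True squared slowness")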



Display the velocity difference models

# RMS model errors; m is squared slowness, so the units are s²/km², not m/s
rms_v2 = @sprintf("%.4f s²/km²", sqrt(norm(m .- m0)^2 / length(m)))
rms_vf = @sprintf("%.4f s²/km²", sqrt(norm(m .- mf)^2 / length(m)))
rms_vf2 = @sprintf("%.4f s²/km²", sqrt(norm(m .- mf2)^2 / length(m)))

figure(figsize=(8,6)); clf()

subplot(3,1,1);imshow(m .- m0,aspect="auto",cmap="seismic");
title("True - initial difference, rms=$(rms_v2)");

subplot(3,1,2);imshow(m .- mf,aspect="auto",cmap="seismic");
title("True - FWI (bounds + TV) difference, rms=$(rms_vf)");

subplot(3,1,3);imshow(m .- mf2,aspect="auto",cmap="seismic");
title("True - FWI (bounds only) difference, rms=$(rms_vf2)");



Display the cost function

figure(figsize=(8,4)); clf()
iters = [0:1:niter;]
plot(ϕ[2:end] ./ ϕ[2], marker="o", label="FWI_TV")
plot(ϕ2[2:end] ./ ϕ2[2], marker="o", label="FWI")
legend()
xlabel("Nonlinear Iteration")
ylabel("Normalized cost ||f(v) - d||")
title(@sprintf("FWI Objective Function reduced %.1f percent with TV and %.1f percent without TV",
               100 * (ϕ[2] - ϕ[end]) / ϕ[2], 100 * (ϕ2[2] - ϕ2[end]) / ϕ2[2]));


Display data misfit vs model misfit

figure(figsize=(8,4)); clf()

c = [norm(m1[i] .- m, 2) for i in 1:length(m1)]
c2 = [norm(m2[i] .- m, 2) for i in 1:length(m2)]
loglog(c[2:end], ϕ[2:end], label="FWI_TV", marker="s", linewidth=1)
loglog(c2[2:end], ϕ2[2:end], label="FWI", marker="s", linewidth=1)
xlabel("Log Model residual")
ylabel("Log Data residual")
title("Misfit Trajectory, LOOK AT THAT TV MODEL ERROR");


12. Visualize data match

Generate data in the FWI velocity model

tf = @elapsed begin
    F0.model.m .= vec(mf)
    df = F0*fsrc;
end
@show tf;

tf2 = @elapsed begin
    F0.model.m .= vec(mf2)
    df2 = F0*fsrc;
end
@show tf2;
tf = 8.125829833
tf2 = 8.094312038

Compute residuals

rf = df - dobs;
rf2 = df2 - dobs;

Plot shot gathers for true, initial model, and fwi models

The table below describes the data images below. We will flip the direction of the residual and modeled data in order to help display the match with the true data. We include the initial data as shown above for easier comparison.

Initial Residual Data | True Data | Initial Data
FWI Residual Data | True Data | FWI Data

We first make a function to create the plots that we can re-use for the selected shots.

zsrc = trunc.(zsrc; digits=6)
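function make_plot(index)
    figure(figsize=(8,6)); clf()
    # each row: flipped residual | true data | flipped modeled data
    cat2 = hcat(reverse(r.data[index],dims=2), pad, dobs.data[index], pad, reverse(d0.data[index],dims=2))
    catf = hcat(reverse(rf.data[index],dims=2), pad, dobs.data[index], pad, reverse(df.data[index],dims=2))
    catf2 = hcat(reverse(rf2.data[index],dims=2), pad, dobs.data[index], pad, reverse(df2.data[index],dims=2))
    # display settings assumed: scaled gathers in grayscale, one panel per model
    subplot(3,1,1); imshow(scale .* cat2, cmap="gray", aspect="auto", clim=[-1,+1]);
    title(" Initial Residual sz=$(zsrc[index])   ||   True sz=$(zsrc[index])   ||   Initial sz=$(zsrc[index]) (flipped)");
    # df2 and rf2 come from mf2, the bounds-only result
    subplot(3,1,2); imshow(scale .* catf2, cmap="gray", aspect="auto", clim=[-1,+1]);
    title(" FWI Residual sz=$(zsrc[index])   ||   True sz=$(zsrc[index])   ||   FWI sz=$(zsrc[index]) (flipped)");
    # df and rf come from mf, the bounds + TV result
    subplot(3,1,3); imshow(scale .* catf, cmap="gray", aspect="auto", clim=[-1,+1]);
    title("TV FWI Residual sz=$(zsrc[index])   ||   True sz=$(zsrc[index])   ||   FWI sz=$(zsrc[index]) (flipped)");
end
make_plot (generic function with 1 method)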

Data for the 1st shot, generated in the initial and FWI models



Data for the 4th shot, generated in the initial and FWI models



Data for the 8th shot, generated in the initial and FWI models



14. Remove workers
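
A minimal sketch of this step, assuming the workers were added with `addprocs` from the Distributed standard library as in section 3. The guard is there because `workers()` returns `[1]` when no workers exist, and process 1 cannot be removed.

```julia
using Distributed

# Remove every worker process, leaving only the main process (id 1).
if nprocs() > 1
    rmprocs(workers())
end
@show nprocs()
```

Removing the workers releases the cores and memory they hold, which matters when the notebook kernel is left running after the inversion.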
