Monitor performance of target GPU simulations

try reproducing rdma error

Nathanael Efrat-Henrici canceled after 29m 19s
buildkite-agent pipeline upload .buildkite/gpu_pipeline/pipeline.yml
Waited 25s · Ran in 6s
init :GPU:
echo "--- Instantiate examples" && julia --project=examples -e 'using Pkg; Pkg.instantiate(;verbose=true)' && julia --project=examples -e 'using Pkg; Pkg.precompile()' && julia --project=examples -e 'using CUDA; CUDA.precompile_runtime()' && julia --project=examples -e 'using Pkg; Pkg.status()' && echo "--- Download artifacts" && julia --project=examples artifacts/download_artifacts.jl
Waited 52s · Ran in 6m 36s
dry baroclinic wave
mkdir -p target_gpu_implicit_baroclinic_wave && nsys profile --trace=nvtx,mpi,cuda,osrt --output=target_gpu_implicit_baroclinic_wave/output_active/report julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/target_gpu_implicit_baroclinic_wave.yml
Waited 24s · Ran in 1m 1s
moist Held-Suarez
mkdir -p gpu_hs_rhoe_equil_55km_nz63_0M && nsys profile --trace=nvtx,mpi,cuda,osrt --output=gpu_hs_rhoe_equil_55km_nz63_0M/output_active/report julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_hs_rhoe_equil_55km_nz63_0M.yml
Waited 24s · Ran in 1m 7s
moist Held-Suarez - 4 gpus
mkdir -p gpu_hs_rhoe_equil_55km_nz63_0M_4process && srun --cpu-bind=threads --cpus-per-task=4 nsys profile --trace=nvtx,mpi,cuda,osrt --output=gpu_hs_rhoe_equil_55km_nz63_0M_4process/output_active/report-%q{PMI_RANK} julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_hs_rhoe_equil_55km_nz63_0M_4process.yml
Waited 24s · Ran in 1m 9s
dry baroclinic wave - 4 gpus
mkdir -p target_gpu_implicit_baroclinic_wave_4process && srun --cpu-bind=threads --cpus-per-task=4 nsys profile --trace=osrt,nvtx,cuda,mpi,ucx --output=target_gpu_implicit_baroclinic_wave_4process/output_active/report-%q{PMI_RANK} julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/target_gpu_implicit_baroclinic_wave_4process.yml
Waited 1m 33s · Ran in 57s
gpu_aquaplanet_dyamond - strong scaling - 1 GPU
mkdir -p gpu_aquaplanet_dyamond_ss_1process && srun --cpu-bind=threads --cpus-per-task=4 nsys profile --trace=nvtx,mpi,cuda,osrt --output=gpu_aquaplanet_dyamond_ss_1process/output_active/report julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ss_1process.yml
Waited 1m 35s · Ran in 11m 55s
gpu_aquaplanet_dyamond - strong scaling - 2 GPUs
mkdir -p gpu_aquaplanet_dyamond_ss_2process && srun --cpu-bind=threads --cpus-per-task=4 julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ss_2process.yml
Waited 1m 35s · Ran in 8m 25s
gpu_aquaplanet_dyamond - strong scaling - 4 GPUs
mkdir -p gpu_aquaplanet_dyamond_ss_4process && srun --cpu-bind=threads --cpus-per-task=4 julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ss_4process.yml
Waited 2m 31s · Ran in 7m 40s
gpu_aquaplanet_dyamond - strong scaling plots
mkdir -p gpu_aquaplanet_dyamond_ss && julia --color=yes --project=examples post_processing/plot_gpu_strong_scaling.jl gpu_aquaplanet_dyamond_ss
Waited 53s · Ran in 15s
gpu_aquaplanet_dyamond - weak scaling - 1 GPU
mkdir -p gpu_aquaplanet_dyamond_ws_1process && srun --cpu-bind=threads --cpus-per-task=4 julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ws_1process.yml
Waited 2m 31s · Ran in 10m 7s
gpu_aquaplanet_dyamond - weak scaling - 2 GPUs
mkdir -p gpu_aquaplanet_dyamond_ws_2process && srun --cpu-bind=threads --cpus-per-task=4 julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ws_2process.yml
Waited 10m 1s · Ran in 10m 8s
gpu_aquaplanet_dyamond - weak scaling - 4 GPUs
mkdir -p gpu_aquaplanet_dyamond_ws_4process && srun --cpu-bind=threads --cpus-per-task=4 julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_dyamond_ws_4process.yml
Waited 10m 13s · Ran in 10m 34s
gpu_aquaplanet_dyamond - weak scaling plots
mkdir -p gpu_aquaplanet_dyamond_ws && julia --color=yes --project=examples post_processing/plot_gpu_weak_scaling.jl gpu_aquaplanet_dyamond_ws
Waited 36s · Ran in 15s
gpu_aquaplanet_diagedmf - 1 GPU
mkdir -p gpu_aquaplanet_diagedmf && nsys profile --trace=nvtx,mpi,cuda,osrt --output=gpu_aquaplanet_diagedmf/output_active/report julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_diagedmf.yml
Canceled · Waited 12m 40s · Ran in 9m 15s
gpu_aquaplanet_progedmf - 1 GPU
mkdir -p gpu_aquaplanet_progedmf && nsys profile --trace=nvtx,mpi,cuda,osrt --output=gpu_aquaplanet_progedmf/output_active/report julia --threads=3 --color=yes --project=examples examples/hybrid/driver.jl --config_file config/gpu_configs/gpu_aquaplanet_progedmf.yml
Canceled · Waited 13m 33s · Ran in 8m 29s
Total Job Run Time: 1h 28m
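
The total above appears to be the sum of the per-job "Ran in" values (wait times excluded), rounded to the nearest minute. The sketch below checks that reading. The `RAN_IN` list is copied by hand from the job list in this log, and `to_seconds` is a hypothetical helper written for this check, not part of Buildkite or the pipeline.

```python
import re

# "Ran in" durations copied from the job list in this log, in order.
RAN_IN = [
    "6s", "6m 36s", "1m 1s", "1m 7s", "1m 9s", "57s",
    "11m 55s", "8m 25s", "7m 40s", "15s",
    "10m 7s", "10m 8s", "10m 34s", "15s",
    "9m 15s", "8m 29s",
]

def to_seconds(text):
    """Convert a duration such as '1h 2m 3s' to a number of seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)\s*([hms])", text))

total = sum(to_seconds(d) for d in RAN_IN)
hours, rem = divmod(total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}h {minutes}m {seconds}s")  # prints "1h 27m 59s"
```

The result, 1h 27m 59s, is consistent with the reported 1h 28m. It also exceeds the 29m 19s wall-clock time of the build because jobs ran concurrently.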