3. feelpp.benchmarking for advanced performance analysis

1. What you will learn in this course

  • How to configure feelpp.benchmarking to overcome common benchmarking challenges

  • How to create custom figures to visualize benchmarking results

  • How to create a continuous benchmarking workflow

When benchmarking applications on HPC systems, multiple challenges arise. For example,

  • how does one guarantee the reproducibility of an application across different systems?

  • how does one compare the performance of an application across different systems?

  • how does one handle large input datasets and memory requirements?

This course will teach you how to use feelpp.benchmarking to overcome these challenges.

2. Configuring containers in feelpp.benchmarking

A good practice for guaranteeing reproducibility of an application is to containerize it. This way, the application can be run in the same environment across different systems. feelpp.benchmarking supports the use of containers to run benchmarks.

2.1. Specifying container platforms in ReFrame system settings

The first step to supporting containers is to modify ReFrame’s system configuration. This can be done by simply adding the following lines to your system partition configuration:

"container_platforms":[{ "type": "Apptainer" }]

Multiple container runtimes are supported:

  • Docker

  • Singularity

  • Apptainer

  • Sarus

  • Shifter

Custom modules and environment variables can be specified in this field.
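
For illustration, here is a minimal sketch of a container platform entry that also loads a module and sets an environment variable. The module name and variable value are placeholders, and the modules / env_vars field names assume ReFrame's usual partition schema:

"container_platforms":[
    {
        "type": "Apptainer",
        "modules": [ "apptainer/1.3.0" ],
        "env_vars": [ ["APPTAINER_CACHEDIR", "/tmp/apptainer_cache"] ]
    }
]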

2.2. Configuring containers in the Machine Specifications file

The next step is to configure the container settings in the machine specifications file.

feelpp.benchmarking's machine specification interface is equipped with a containers field that allows you to specify common directory paths, options, and settings for executing benchmarks in containers.

Container settings in the Machine Specification file
"containers":{
    "apptainer":{
        "image_base_dir":"/data/my_images/",
        "options":[ "--bind /opt/:/opt/" ],
        "cachedir": "/tmp/apptainer_cache",
        "tmpdir": "/tmp/apptainer_tmp"
    }
}

containers is a dictionary whose keys are container runtime names and whose values are the corresponding settings. The settings include:

  • image_base_dir: The base directory where the container images are stored, or will be stored.

  • options: A list of options to pass to the container execution command.

  • executable: If the command used for pulling images is different from the default, it can be specified here.

  • cachedir: The directory where the container cache is stored.

  • tmpdir: The directory where container temporary files are stored.

At the moment, only the Apptainer runtime is supported in the machine specification. Support for other container runtimes will be added in future releases.
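
For example, if images must be pulled with a command other than the default, the executable setting could be added to the block above; this is only a sketch, and the path is a placeholder:

"containers":{
    "apptainer":{
        "image_base_dir":"/data/my_images/",
        // placeholder path to the apptainer binary used for pulling images
        "executable":"/opt/apptainer/bin/apptainer",
        "options":[ "--bind /opt/:/opt/" ],
        "cachedir": "/tmp/apptainer_cache",
        "tmpdir": "/tmp/apptainer_tmp"
    }
}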

2.3. Configuring containers in the Benchmark Specification file

Concerning the benchmark specification file, the container settings are specified in the platforms field.

This field contains all possible platform configurations, including the built-in one (local). However, it is the machine configuration file that determines where tests are executed, by specifying it in its targets field, as illustrated in the sketch below.
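
For illustration only, a machine specification could select the Apptainer platform with a targets entry along the following lines. The partition and environment names are placeholders, and the "partition:platform:environment" string format is an assumption to be checked against the machine specification reference:

// format assumed: partition:platform:environment
"targets": ["production:apptainer:default"]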

Platform settings in the Benchmark Specification file
"platforms": {
    "apptainer":{
        "image": {
            "url":"oras://ghcr.io/feelpp/my_image:master-sif",
            "filepath":"{{machine.containers.apptainer.image_base_dir}}/my_image.sif"
        },
        "input_dir":"/input_data",
        "options": [
            "--home {{machine.output_app_dir}}",
            "--bind {{machine.input_dataset_base_dir}}/:{{platforms.apptainer.input_dir}}",
            "--env OMP_NUM_THREADS=1"
        ],
        "append_app_option":[]
    },
    "builtin":{
        "input_dir":"{{machine.input_dataset_base_dir}}",
        "append_app_option":[]
    }
}

This example will first pull the oras://ghcr.io/feelpp/my_image:master-sif image and store it at {{machine.containers.apptainer.image_base_dir}}/my_image.sif. Then, ReFrame will launch tests as:

apptainer exec --home /path/to/output --bind /path/to/input_data/:/input_data --env OMP_NUM_THREADS=1 /data/my_images/my_image.sif ...

Let’s break down the platforms field:

  • apptainer: The container runtime name. Must match the name specified in the machine configuration file.

  • image: Contains the path where the container image can be found, or will be stored. If url is specified, the image will be pulled and stored at filepath.

  • input_dir: The directory where input data is stored in the container.

  • options: A list of options to pass to the container execution command.

  • append_app_option: A list of options to append to the application command. This allows customizing the application execution depending on the platform.

Image pulling will be done ONCE, and usually from a head node.

3. Remote Input Files

It is possible to download input files for your application from a remote location. This is useful when input files are not yet available on the system where the benchmark is executed, and it spares users from having to upload the files manually.

To use remote input files, you can use the remote_input_dependencies field in the Benchmark Specification file.

The remote_input_dependencies field is a dictionary containing a custom name as the key and a RemoteData object as a value.

The RemoteData object can be constructed using the following syntax:

{
    // The data management platform, and the resource to fetch (file, folder or item) identified by its ID
    "my_data_management_platform": { "file": "my_id" },
    // The path to download the files to
    "destination": "/path/to/destination"
}
Using remote input files in the Benchmark Specification file
"remote_input_dependencies":{
    "my_input_file": { "girder":{"file":"abcdefg123456"},"destination":"/path/to/destination.txt"}
}

This example will download the file with the ID abcdefg123456 from the Girder platform and store it as /path/to/destination.txt.

Finally, the destination path can be accessed by other fields using the {{remote_input_dependencies.my_input_file.destination}} syntax.
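
For instance, the downloaded file could be forwarded to the application through a platform's append_app_option field. This is only a sketch: the --input flag is a hypothetical application option used for illustration.

"platforms": {
    "builtin":{
        "input_dir":"{{machine.input_dataset_base_dir}}",
        // --input is a hypothetical application flag
        "append_app_option":[ "--input {{remote_input_dependencies.my_input_file.destination}}" ]
    }
}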

4. Input File Dependencies

The feelpp.benchmarking framework allows you to specify input file dependencies in the Benchmark Specification file. This is notably useful for imposing validation checks on the input files before running the benchmark.

Additionally, this field can be used to handle data transfer between different disks.

For example, most HPC clusters have Large Capacity disks and High Performance disks. A common problem is that the application’s performance can be bottlenecked by the disk’s read/write speed if the input files are stored on the Large Capacity disk. In this case, you can use the input_file_dependencies field to copy the input files to the High Performance disk before running the benchmark, and delete them after the benchmark is completed so that the High Performance disk does not become cluttered.

4.1. How to specify file dependencies

The input_file_dependencies field is a dictionary containing a custom name as the key and an absolute or relative path to the file as the value.

However, this field is highly dependent on a special field in the Machine Specification file: input_user_dir.

  • If this field is not specified and the filepaths are relative, they are relative to machine.input_dataset_base_dir.

  • If this field is specified, the filepaths should be relative to machine.input_user_dir, and they will be copied from machine.input_user_dir to machine.input_dataset_base_dir keeping the same directory structure.

Using file dependencies

If we have the following directory paths set in the machine configuration file:

{
"input_dataset_base_dir":"/hpd/input_data",
"input_user_dir":"/lcd/input_data"
}

And the following input_file_dependencies field in the Benchmark Specification file:

"input_file_dependencies":{
    "my_input_file": "input_file.txt"
}
  1. First, the framework will check that the file /lcd/input_data/input_file.txt exists.

  2. Right before job submission, it will copy the file from /lcd/input_data/input_file.txt to /hpd/input_data/input_file.txt.

  3. The jobs will run using the file /hpd/input_data/input_file.txt.

  4. After the job is completed, the file /hpd/input_data/input_file.txt will be deleted.

The existence of all file dependencies will be verified.
Parameters can be referenced in the input_file_dependencies field to factor out repeated paths, as sketched below.
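
For example, assuming a mesh parameter such as the one used later in this course, a per-parameter dependency might look like this; the meshes/ subdirectory and .msh extension are placeholders:

"input_file_dependencies":{
    // meshes/ and .msh are placeholders for illustration
    "mesh_file": "meshes/{{parameters.mesh.value}}.msh"
}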

5. Custom Performance Variables

Custom performance variables can be created by performing operations on existing ones. This can be done in the feelpp.benchmarking framework by specifying a custom performance variable in the Benchmark Specification file.

The custom_variables field should be added in the scalability section. This field should contain a list of dictionaries, each dictionary containing the following fields:

  • name: The given name of the custom performance variable.

  • op: The operation to perform on the existing performance variables (available operations: sum, mean, max, min).

  • columns: The list of performance variables to be used in the operation.

  • unit: The unit of the custom performance variable.

Custom variables can be used in other custom variables. Recursion is allowed.
Custom performance variables

Let’s say that the /output/data.csv file contains the following performance variables:

initialization, simulation, postprocessing
0.1, 0.2, 0.3

Then, the total time custom performance variable can be specified as follows:

"scalability": {
    "directory":"/output",
    "stages": [
        {
            "name": "stage1",
            "filepath":"data.csv",
            "format":"csv"
        }
    ],
    "custom_variables":[
        {
            "name": "total_time",
            "op":"sum",
            "columns": ["stage1_initialization", "stage1_simulation", "stage1_postprocessing"],
            "unit":"s"
        }
    ]
}
Variables from different files (stages) can be aggregated in the same custom variable.
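
As a sketch of reusing custom variables, a hypothetical compute_time variable could first combine two stage columns and then be reused inside total_time:

"custom_variables":[
    {
        // hypothetical intermediate variable
        "name": "compute_time",
        "op":"sum",
        "columns": ["stage1_initialization", "stage1_simulation"],
        "unit":"s"
    },
    {
        "name": "total_time",
        "op":"sum",
        // reuses the custom variable defined above
        "columns": ["compute_time", "stage1_postprocessing"],
        "unit":"s"
    }
]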

6. Memory management

feelpp.benchmarking supports specifying the total memory required by your application under the resources field. Using the system’s available memory per node, the framework will ensure that the necessary resources are allocated for each test to run.

The system’s available memory per node should be specified in ReFrame’s system settings file, in GB, under an extras.memory_per_node field in the partition configuration.

Available Memory per Node in ReFrame system settings
"partitions":[{
    "extras":{
        "memory_per_node":256
    }
}]

6.1. Specifying memory in resources

If the memory requirements of your application are independent of the specified parameter space, you can simply specify the memory requirement (in GB) in the resources field of the Benchmark Specification file.

Static memory requirements
"resources": {
    "tasks": 128,
    "memory": 1024
}

If the system has 256GB of available memory per node and 128 tasks per node, the above configuration will still execute the application with 128 tasks, but will distribute them across 4 nodes: 32 tasks per node. This way, the application will use 256GB of memory per node, under the assumption that the workload is well partitioned.

Specify a little less memory than the available memory per node to avoid swapping.
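
For instance, on the 256GB-per-node system configured above, a request such as the following leaves some headroom for the operating system; the exact margin is a matter of judgment:

"resources": {
    "tasks": 128,
    // total memory in GB, kept slightly below the 256GB available per node
    "memory": 240
}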

6.2. Parameter dependent memory specifications

The memory field can be parametrized. This is quite useful if your application requires different memory for different parameter values. For example, if you want to compare the application’s performance for different mesh sizes.

Dynamic memory requirements

Let’s suppose that you want to benchmark an application using 3 different mesh resolutions: M1, M2, M3.

The benchmark specification will then have the following parameter

{
"parameters":[
    {
        "name":"mesh",
        "sequence":["M1", "M2", "M3"]
    }
]
}

However, these mesh resolutions require the following amounts of free memory: 512GB, 1024GB, and 2048GB, respectively.

Then, we should create an additional parameter and reference its value in the resources field:

{
"resources":{
    "tasks":128,
    "memory": "{{parameters.memory.value}}"
},
"parameters":[
    {
        "name":"memory",
        "sequence":[512, 1024, 2048]
    },
    {
        "name":"mesh",
        "sequence":["M1", "M2", "M3"]
    }
]
}

But doing just this will create 9 tests (3x3), and we do not want to run the same test with different memory values. To avoid this, we should constrain the memory parameter to depend on the mesh parameter:

{
"resources":{
    "tasks":128,
    "memory": "{{parameters.memory.value}}"
},
"parameters":[
    {
        "name":"memory",
        "sequence":[512, 1024, 2048]
    },
    {
        "name":"mesh",
        "sequence":["M1", "M2", "M3"],
        "conditions":{
            "M1":[{"memory":[512]}],
            "M2":[{"memory":[1024]}],
            "M3":[{"memory":[2048]}]
        }
    }
]
}

The previous configuration will ensure that the case M1 will be run with 512GB of memory, M2 with 1024GB, and M3 with 2048GB.

7. Website persistence and aggregating results

The framework allows users to create a dashboard that aggregates results from multiple benchmarks. This is useful for comparing different machines, applications, or use cases.

7.1. The dashboard configuration file

Whenever a benchmark is done, a file called website_config.json is created inside the directory specified under reports_base_dir on the machine specification file.

If this file already exists from a previous benchmark, it will be updated to include the latest results, for example when a benchmark runs on a new system.

This file describes the dashboard hierarchy and where to find the benchmark results for each case/machine/application.

Each website configuration file corresponds exactly to one dashboard, so users can have multiple dashboards by defining multiple website configuration files.

The file is structured as follows:

{
"execution_mapping":{
    "my_application":{
        "my_machine":{
            "my_usecase": {
                "platform": "local", //Or "girder"
                "path": "./path/to/use_case/results" //Directory containing reports for a single machine-application-usecase combination
            }
        }
    }
},
"machines":{
    "my_machine":{
        "display_name": "My Machine",
        "description": "A description of the machine",
    }
},
"applications":{
    "my_application":{
        "display_name": "My Application",
        "description": "A description of the application",
        "main_variables":[] //Used for aggregations
    }
},
"use_cases":{
    "my_usecase":{
        "display_name": "My Use Case",
        "description": "A description of the use case",
    }
}

}

The execution_mapping field describes the hierarchy of the dashboard, that is, which applications were run on which machines and for which use cases. The path field supports reports stored in remote locations, such as Girder.

The machines, applications, and use_cases fields define the display names and descriptions for each machine, application, and use case, respectively.

7.2. Aggregating results

If our benchmarks contain more than 2 parameters, it can be difficult to visualize results in a single figure. The framework allows users to aggregate results by reducing the number of dimensions in the data. In addition, the framework indexes reports by their date, the system they were executed on, the platform and environment, as well as the application and use case benchmarked.

7.2.1. The aggregations field on Figure Specifications

In order to aggregate data (and reduce the number of dimensions), the aggregations field can be used in the figure specification file. This field is a list of dictionaries, each containing the fields column and agg, which describe the column/parameter to aggregate and the operation to perform, respectively.

Available aggregation operations are:

  • mean: The mean of the values

  • sum: The sum of the values

  • min: The minimum value

  • max: The maximum value

  • filter:value : Filters the data based on the value of the column

Example of aggregating data

Let’s consider the benchmark with the following parameters:

  • tasks : The number of tasks used in the benchmark

  • mesh : The mesh size used in the benchmark

  • solver : The solver used in the benchmark

Then, if we want to plot execution times averaged over the tasks parameter, we can write:

{
    "title": "Avg Execution times by # tasks",
    "plot_types": [ "scatter" ],
    "transformation": "performance",
    "xaxis": { "parameter": "mesh", "label": "mesh levels" },
    "yaxis": { "label": "Time (s)" },
    "secondary_axis": {"parameter":"solver", "label":"Solver"},
    "color_axis":{ "parameter": "performance_variable", "label":"Performance variables" },
    "aggregations":[ {"column":"tasks","agg":"mean"} ]
}

This will generate the following figure:

[Figure: Aggregation Example]
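
If, instead of averaging over a parameter, we only want to keep a particular value, the filter operation can be used. For example, restricting the figure to a single solver might look as follows; the value gmres is purely hypothetical:

"aggregations":[
    {"column":"tasks","agg":"mean"},
    // gmres is a hypothetical solver value used for illustration
    {"column":"solver","agg":"filter:gmres"}
]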

7.2.2. Overview configuration file

It is possible to create overview pages that show the performance of a group of benchmarks. This is useful for comparing different machines, applications, or use cases.

Each performance value, for all reports, is indexed by:

  • The parameter space

  • The variable name

  • The benchmark date

  • The system where the benchmark was executed

  • The platform where the benchmark was executed

  • The environment where the benchmark was executed

  • The application

  • The use case

Depending on the dashboard page you are on, benchmarks are filtered by default. For example, if you are on the machine page, only benchmarks for that machine will be shown.

Accessible column names:

  • environment

  • platform

  • result

  • machine

  • usecase

  • date

Example: Plot the performance evolution in time of reports for a specific machine-application-use_case combination
{
    "title": "Performance over time",
    "plot_types": [ "scatter" ],
    "transformation": "performance",
    "xaxis": { "parameter": "date", "label": "Report Date" },
    "yaxis": { "label": "Performance (s)" },
    "color_axis":{ "parameter": "solver", "label":"Solver" },
    "aggregations":[
        {"column":"tasks","agg":"mean"},
        {"column":"mesh","agg":"mean"},
        {"column":"performance_variable","agg":"sum"}
    ]
}

The dashboard administrator can define the overview configuration file, which is a JSON file that describes the figures to be displayed on each overview page.

This overview configuration file is currently too extensive and verbose and needs to be simplified, so it will not be treated in this course. However, be aware that it is possible to create overview pages that show the performance of a group of benchmarks.

8. Continuous benchmarking

It is possible to configure a workflow for feelpp.benchmarking to continuously benchmark an application. This will allow users to:

  • Centralize benchmark results in a dashboard

  • Connect the CB pipeline to an existing CI pipeline

  • Ensure non-regression and analyse the performance over time

  • Compare the performance of different versions of the application

8.1. Launching a benchmark

Currently, the workflow is not directly available from feelpp.benchmarking. However, a template repository is coming soon to help users set up the workflow.

In order to have this workflow running, self-hosted runners for your machines are required, along with one "default" runner to orchestrate launching the application on the different machines.

Benchmark workflow
  1. The workflow takes as input: the benchmark configuration paths, a list of machines to run on, and optionally a Girder ID to upload the website to (more data management platforms will be supported in the future).

  2. On a default runner (which can even be a GitHub-hosted runner), machine specification files are read and filtered by the given machine name list. This makes it possible to launch the application on multiple machines. A "matrix" is then created to tell the workflow which runners to launch.

  3. Then, only the desired machine configurations are uploaded as GitHub artifacts.

  4. On each HPC system, the machine configuration is downloaded.

  5. A Python environment is then set up, depending on the system (e.g. loading the necessary modules, creating the Python virtual environment, downloading feelpp-benchmarking).

  6. feelpp.benchmarking launches all parametrized tests using the feelpp-benchmarking-exec command.

  7. When the benchmarks are done, results are uploaded as GitHub artifacts so that they can be passed back to the default runner.

  8. The default runner then collects all results and uploads them to the data management platform.

  9. If a custom remote ID is provided to upload the dashboard to, the dashboard is uploaded to the data management platform. Otherwise, a pull request tagged with new-benchmark is created to preview the results; the preview is served with Netlify.

This workflow requires setting up the Girder data management platform so that it contains the following folders: staging, production, and denied.

8.2. Deploying the dashboard

GitHub Pages can be configured to provide a persistent dashboard containing benchmarks. However, we only want pertinent benchmarks on this publicly available dashboard. To achieve this, the following workflow is proposed.

Dashboard workflow
  1. When a pull request tagged with new-benchmark is closed, the deploy.yml workflow is launched.

  2. If the pull request is closed without being merged, all benchmarks present in the staging folder are moved to the denied folder (on the data management platform).

  3. If the pull request is merged, staging and production dashboard configuration files are downloaded.

  4. Both configuration files are merged, so that staging benchmarks are added to the production dashboard.

  5. The new dashboard configuration is uploaded, and staging benchmarks are moved to production.

  6. This triggers the GitHub Pages deployment, which includes all production benchmarks.

9. Wrap-up and Q&A

feelpp.benchmarking has only recently made its first release. It is still under active development, and many features are being added and improved. Contributions and feature requests are more than welcome on this open-source project. Check out the GitHub repository.

  • Questions?

  • Suggestions? Feedback?

  • Star the project on GitHub!