The Domain Level

Neon Domain level’s goal is to provide users with simple mechanisms for some specific domains. Currently, Neon focus on those domains where a regular cartesian discretizations are leveraged. Using a simple example will look at how the level mechanisms can be used.

Working with dense domains

Let’s use aspects of implicit geometries and the finite difference method to check out some elements of Neon. Our goal for this tutorial is to first define a sphere on a discrete dense domain via the level set method, then design two operators: one to expand the sphere and the other to compute its gradient.

We divide the tutorial in the following steps steps:

Neon backend - Choosing the hardware for the computation
Neon grid - Setting up a dense cartesian discretization
Neon field - Initializing the level set of a sphere
Neon map containers Expanding the sphere
Neon stencil containers - Computing the grad of the sphere

The complete code of the tutorial is located in the Neon/tutorials/introduction/domainLevel directory. However, in the following, we’ll start with an empty main function and guide you step by step.

Neon backend: choosing the hardware for the computation

We start our program by specifying the hardware we want to use. The process is introduced in depth by the Neon Set Level section.

Neon/tutorials/introduction/domainLevel/domainLevel.cpp
int main(int, char**)
{
    // Step 1 -> Neon backend: choosing the hardware for the computation
    Neon::Backend backend = [] {
        Neon::init();
        // auto runtime = Neon::Runtime::openmp;
        auto runtime = Neon::Runtime::stream;
        // We are overbooking XPU 0 three times
        std::vector<int> xpuIds{0, 0, 0};
        Neon::Backend    backend(xpuIds, runtime);
        // Printing some information
        NEON_INFO(backend.toString());
        return backend;
    }();

    return 0;
}

Just briefly, through a Neon::Backend object, we declare the runtime type (CUDA streams or OpenMP) and a list of resources IDs. In our example, we target the first GPU in the system (GPU ID 0) as it’s a typical configuration; however, we overbook the GPU three times to showcase the multi-GPU capabilities.

Note

Overbooking a GPU can have a negative impact on performance.
However, checking out (or debugging) multi-GPU mechanisms using a single GPU is quite handy.

Finally, before returning from the main, we also log some hardware information on the terminal to check that everything is as expected.

Warning

Remember always to call Neon::init(); to ensure that the Neon runtime has been initialized. The function can be call more than once.

By testing out our first code draft, we get the following output on the terminal:

Execution output
$ ./tutorial-domainLevel 
[20:33:16] Neon: CpuSys_t: Loading info on CPU subsystem
[20:33:16] Neon: GpuSys_t: Loading info on GPU subsystem 2 GPUs were detected.
[20:33:16] Neon: Backend_t (0x7ffd171aad40) - [runtime:stream] [nDev:3] [dev0:0 NVIDIARTXA6000] [dev1:0 NVIDIARTXA6000] [dev2:0 NVIDIARTXA6000] 

In particular, the last line describes the selected backend by providing the type, the number of devices as well as the device name. In this case, we are working on an Nvidia A4000 GPU.

Neon grid: setting up the cartesian discretization

Let’s now extend our previous code by defining a cartesian discretization for our problem; for example, let’s create our sphere on top of a 100^3 grid.

Neon/tutorials/introduction/domainLevel/domainLevel.cpp
    // ...

    // Step 2 -> Neon grid: setting up a 100^3 dense cartesian domain
    const int32_t  n = 100;
    Neon::index_3d dim(n, n, n);     // Size of the domain
    const double   voxelEdge = 1.0;  // Size of a voxel edge

    using Grid = Neon::domain::eGrid;  // Selecting one of the grid provided by Neon
    Grid grid = [&] {
        Neon::domain::Stencil gradStencil([] {
            // We use a center difference scheme to compute the grad
            // The order of the points is important,
            // as we'll leverage the specific order when computing the grad.
            // First positive direction on x, y and z,
            // then negative direction on x, y, z respectively.
            return std::vector<Neon::index_3d>{
                {1, 0, 0},
                {0, 1, 0},
                {0, 0, 1},
                {-1, 0, 0},
                {0, -1, 0},
                {0, 0, -1}};
        }());

        // Actual Neon grid allocation
        Grid grid(
            backend,  // <- Passing the target hardware for the computation
            dim,      // <- Dimension of the regular grid used for the discretizasion.
            [&](const Neon::index_3d&) -> bool {
                // We are looking for a dense domain,
                // so we are interested in all the points in the grid.
                return true;
            },             // <-  defining the active cells.
            gradStencil);  // <- Stencil that will be used during computations on the grid

        // Exporting some information
        NEON_INFO(grid.toString());
        grid.ioDomainToVtk("domain");

        return grid;
    }();

    return 0;
}

Neon Domain provides various grid implementations, and they all share the same interface. Dense, element sparse and block sparse are some of the provided grids (description and comparison of all the grids are under work). In our example, we selected an element sparse grid (Neon::domain::eGrid); however, we create a C++ alias instead of directly using the grid type (line 43). This approach allows us to switch in the future from one grid to another by just redefining the alias.

Note

Switching from one grid to another, like in the example, is done at compile time. Neon does not directly support dynamic grid switching. However, it can be implemented by the users via std::variant.

The main information to provide is:

For the initialization of the grid (line 108), we provide the following information:

dimension of the discretization box,
the set of cells of the discretization box that we are interested in,
the hardware to be used for any computation,
and finally a stencil which is basically defined as a vector of offsets (line 44).

A stencil may be seen as an uncommon parameter to provide as part of a grid initialization. The stencil is the union of all the stencils that are used by any computation on the grid.

Neon uses the stencil information for many critical optimization aspects. An ad-hoc compiler could automatically extract what stencil are used. However, Neon is developed as a C++ library, so it must rely on the user to retrieve such information.

Running again the tutorial with the additional code lines we obtain the following output:

Execution output
$ ./tutorial-domainLevel 
[20:33:16] Neon: CpuSys_t: Loading info on CPU subsystem
[20:33:16] Neon: GpuSys_t: Loading info on GPU subsystem 2 GPUs were detected.
[20:33:16] Neon: Backend_t (0x7ffd171aad40) - [runtime:stream] [nDev:3] [dev0:0 NVIDIARTXA6000] [dev1:0 NVIDIARTXA6000] [dev2:0 NVIDIARTXA6000] 
[20:33:18] Neon: [Domain Grid]:{eGrid}, [Background Grid]:{(100, 100, 100)}, [Active Cells]:{1000000}, [Idx Distribution]:{(333334,333333,333333)}, [Backend]:{Backend_t (0x55ce949fd450) - [runtime:stream] [nDev:3] [dev0:0 NVIDIARTXA6000] [dev1:0 NVIDIARTXA6000] [dev2:0 NVIDIARTXA6000] }

By logging the grid information (NEON_INFO(grid.toString()); on line 71), we can check some information about the grid. The last terminal line shows the selected grid type (eGrid in this case), the dimension of the grid, the number of active cells, as well as the number of cells per XPU.

By calling ioDomainToVtk method, we can also inspect the created domain (grid) via Paraview as the code generates a vtk file (domain). With an application working on a dense domain, the Paraview will show all the cells in the discretization box as active. Moreover, it will show the mapping between cells and hardware devices as reported in the following picture:

Mapping between cells and hardware devices

Neon field: defining data over the cartesian discretization

Let’s now create some metadata on top of our 100^3 cartesian discretization to store the level set of a sphere. Neon fields, which are allocated from a Neon grid, are the tool for the task. A Neon field is characterized by the type of each cell and the number of components stored in each cell (we call it cardinality of the field). The cell type is defined at compile time, while cardinality can be defined either at compile time or at runtime. In our example, we use the runtime version.

Neon/tutorials/introduction/domainLevel/domainLevel.cpp
    //...

    // Step 3 -> Neon field: initializing a sphere through its signed distance function

    auto sphere = [dim, n, voxelEdge, &grid] {
        // Creating a scalar field over the grid.
        // Inactive cells will get associated with a default value of -100 */
        auto sphere = grid.newField<double>("sphere",  // <- Given name of the field.
                                            1,         // <- Number of field's component per grid point.
                                            -100);     // <- Default value for non active points.

        const double r = (n * voxelEdge / 2) * .5;

        // We initialize the field with the level set of a sphere.
        // We leverage the forEachActiveCell method to easily iterate over the active cells.
        sphere.forEachActiveCell([&](const Neon::index_3d& idx, int, double& value) {
            double sdf = sdfCenteredSphere(idx, dim, voxelEdge, r);
            value = sdf;
        });
        return sphere;
    }();

    // Exporting some information of the level set field on terminal and on a vtk file.
    NEON_INFO(sphere.toString());
    sphere.ioToVtk("sphere-levelSet", "levelSet");


    return 0;
}

Neon map containers: expanding the sphere via a level set

Let’s now manipulate the sdf to expand zero level set of the sphere. The operation can be simply implemented by adding a constant to all the cells of the grid. The result will be a level set field (not a signed distance function).

At the moment the values of the sphere field are stored on the host, indeed working on the host side is the only way we can do IO type of operations on Neon grids. To move the data to the XPUs for the computation we just need to call the updateIOmethod.

Warning

All Neon operations related to runnig computation or moving data have a asynchronous semantic. At the domain level it is up to the user to handle any sort of synchronization manually.

Neon/tutorials/introduction/domainLevel/domainLevel.cpp
    // ...

    // Step 4 -> Neon map containers: expanding the sphere via a level set

    {  // loading the sphere to XPUs
        sphere.updateCompute(Neon::Backend::mainStreamIdx);
    }
    // Run a container that adds a value to the sdf sphere.
    // The result is a level set of an expanded sphere (not more a sdf).
    // We run the container asynchronously on the main stream
    expandLevelSet(sphere, 9.0).run(Neon::Backend::mainStreamIdx);

    {  // Moving asynchronously the values of the newly computed
        // level set values to the host
        sphere.updateIO(Neon::Backend::mainStreamIdx);
        // Waiting for the transfer to complete.
        backend.sync(Neon::Backend::mainStreamIdx);
        // Exporting once again the field to vtk
        sphere.ioToVtk("extended-sphere-levelSet", "levelSet");
    }

    return 0;
}

The expandLevelSet method returns a Neon Container, which is the equivalent of a kernel in CUDA. A Neon Container executes in parallel a computation over all the XPU devices defined by the Backend. The following is the structure of the Container used to expand our sphere.

A Container can be generated from a Neon grid and is composed by two parts: a Loading Lambda where user declare what files are going to be used for the computation and a Compute Lambda, which contains the actual XPU code.

Once a field is loaded we can read and write its values by using a reference to a Idx (which is passed to the Compute Lambda by the Neon runtime) and specifying what component of the field we are interested in. Once a Container is created, it can be executed by calling the run method (see the above code).

Neon/tutorials/introduction/domainLevel/expandLevelSet.cu
template <typename Field>
auto expandLevelSet(Field& sdf,
                    double expansion)
    -> Neon::set::Container
{
    return sdf.getGrid().getContainer(
        "ExpandLevelSet",
        // Neon Loading Lambda
        [&, expansion](Neon::set::Loader& L) {
            auto& px = L.load(sdf);

            // Neon Compute Lambda
            return [=] NEON_CUDA_HOST_DEVICE(
                       const typename Field::Idx& cell) mutable {
                px(cell, 0) -= expansion;
            };
        });
}

Neon stencil containers: computing the grad of the level set field

Neon/tutorials/introduction/domainLevel/domainLevel.cpp
    // ...

    // Step 5 -> Neon stencil containers: computing the grad of the level set field
    auto grad = grid.newField<double>("grad",  // <- Given name of the field.
                                      3,       // <- Number of field's component per grid point.
                                      0);      // <- Default value for non active points.

    Neon::set::HuOptions huOptions(Neon::set::TransferMode::get,
                                   true);
    sphere.haloUpdate(huOptions);

    // Execution of a container that computes the gradient of the sphere
    computeGrad(sphere, grad, voxelEdge).run(Neon::Backend::mainStreamIdx);

    {  // Moving the grad data onto the host and exporting it to vtk
        grad.updateIO(Neon::Backend::mainStreamIdx);
        backend.sync(Neon::Backend::mainStreamIdx);
        grad.ioToVtk("extended-sphere-grad", "grad");
    }

    return 0;

Neon/tutorials/introduction/domainLevel/expandSphere.cu
/**
 * A function that generates a Neon Container to compute the grad over a scalar field.
 * Note: in Neon only constant field can be used as input for stencil computation
 */
template <typename Field>
auto computeGrad(const Field& levelSetField /** input scalar field we want to compute the grad.*/,
                 Field&       gradField /** input scalar field we want to compute the grad.*/,
                 double       h)
    -> Neon::set::Container
{
    if (levelSetField.getCardinality() != 1 || gradField.getCardinality() != 3) {
        // We check that the level set field is a scalar field,
        // while the gradient is a three component field.
        NEON_THROW_UNSUPPORTED_OPERATION("Wrong cardinality detected.");
    }

    // The following Neon compute-lambda works with the assumption that the first elements of the stencil
    // given to the grid initialization are as follow:
    //
    //      {1, 0, 0},
    //      {0, 1, 0},
    //      {0, 0, 1},
    //      {-1, 0, 0},
    //      {0, -1, 0},
    //      {0, 0, -1}
    return levelSetField.getGrid().getContainer(
        "computeGrad",
        // Neon Loading Lambda
        [&, h](Neon::set::Loader& L) {
            // Loading the sdf field for a stencil type of computation
            // as we will be using a 6 point stencil to compute the gradient
            auto& levelSet = L.load(levelSetField, Neon::Compute::STENCIL);
            auto& grad = L.load(gradField);

            // We can nicely compute the inverse of the spacing in the loading lambda
            const auto oneOverTwoH = 1. / (2.0 * h);

            // Neon Compute Lambda
            return [=] NEON_CUDA_HOST_DEVICE(
                       const typename Field::Idx& cell) mutable {
                // Central difference
                for (int i = 0; i < 3; i++) {
                    auto upIdx = i;
                    auto dwIdx = i + 3;

                    auto [valUp, isValidUp] = levelSet.nghVal(cell, upIdx, 0, 0);
                    auto [valDw, isValidDw] = levelSet.nghVal(cell, dwIdx, 0, 0);

                    if (!isValidUp || !isValidDw) {
                        grad(cell, 0) = 0;
                        grad(cell, 1) = 0;
                        grad(cell, 2) = 0;
                        break;
                    } else {
                        grad(cell, i) = (valUp - valDw) / oneOverTwoH;
                    }
                }
            };
        });
}

Staggered Grids