Optimization

To improve the performance of our program we would like to parallelize it as much as possible. Since the processing of trees (i.e. setting the foliage, computing the cartesian coordinates, scaling, translating…​) is independent from one tree to another, we can already parallelize this part of the program.

Do do so, we’ll use the async library from the C++ standard library for asynchronous tasks. We’ll create a task for each tree and run them in parallel.

We’ll to start by putting the logic of the for loop in a function that we’re going to pass to the std::async function.

processTree(
    Tree<MeshCgal, K> &tree, const MeshCgal &foliage_round,
    const MeshCgal &foliage_oval, const MeshCgal &foliage_cone,
    const std::array<double, 2> &M_origin,
    const std::array<double, 2> &M_height_range, int &M_nNoHeight,
    const Config &config, std::shared_ptr<MeshGeom> terrainMesh)

Then we’ll create a vector of std::future objects that will store the results of the asynchronous tasks and we’ll run the tasks in parallel.

std::vector<std::future<std::shared_ptr<MeshGeom>>> futures;

for (auto &tree : M_treeLibrary) {
	futures.push_back(
		std::async(std::launch::async, [this, &tree, &foliage_round,
										&foliage_oval, &foliage_cone] {
			std::shared_ptr<MeshGeom> localTerrainMesh =
				std::make_shared<MeshGeom>();
			this->processTree(tree, foliage_round, foliage_oval,
								foliage_cone, M_origin, M_height_range,
								M_nNoHeight, this->config(),
								localTerrainMesh);
			return localTerrainMesh;
		}));
}

Where :

  • std::async is a function that runs a function asynchronously (potentially in a separate thread) and returns a std::future object that will store the result of the function.

  • std::launch::async is a policy that tells std::async to run the function asynchronously.

  • The lambda function passed to std::async is the function that will be run asynchronously. It takes a tree and the foliage meshes as arguments and returns a std::shared_ptr<MeshGeom> object.

Note that we had to create a localTerrainMesh object for each task because the processTree function modifies the terrainMesh object passed as an argument. If we used the same terrainMesh object for all the tasks, we would have concurrency issues.

Finally, we will wait for all the tasks to finish, collect the results and merge them into the main terrain mesh.

for (auto &future : futures) {
	// Wait for task to complete and get result
	auto localTerrainMesh = future.get();
	// Merge result into the main terrain mesh
	terrainMesh->merge(*localTerrainMesh);
}

The improvement in performance is evaluated in the Complexity section.

To further improve performance, we could try to parallelize the autorefine function, which is responsible for merging meshes and excluding intersections between them. This function is one of the most time-consuming parts of the program, making it a good candidate for parallelization. However, the autorefine function is not inherently thread-safe.

The lack of thread safety arises because autorefine modifies shared data structures—like the mesh and its associated geometries—during its execution. When multiple threads attempt to modify the same data simultaneously, it can lead to data corruption, race conditions, or inconsistent results, as the threads may interfere with each other’s operations.

To make autorefine thread-safe, one approach could be to split the mesh into smaller, independent segments, with each segment processed in a separate thread. This way, threads would not interfere with each other, and the function could safely operate in a multi-threaded environment, ensuring both performance gains and data integrity.

stras cutted
Figure 1. Terrain mesh splitted into smaller parts, the red lines represent the boundaries of the parts and should not be parallelized

References