The parallel implementation of Mesquite was completed as part of SciDAC-2, and preliminary scaling studies are underway. In general, parallel smoothing uses the partitioning of vertices determined by the application program. At each smoothing step, communication is required to ensure that vertices that are adjacent in the graph are updated correctly and that the mesh does not become inverted. For most applications, this can be achieved with nearest-neighbor, point-to-point communication among a limited number of processors. It is also necessary to determine when all of the work is complete, which has traditionally been done using an MPI all-reduce at each iteration to check for remaining vertices to be smoothed. As the number of processors grows, this MPI all-reduce is expected to become increasingly expensive, so a new algorithm was created that pre-computes the expected work for each iteration using only nearest-neighbor communication.
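The traditional pattern described above (a nearest-neighbor exchange at each step, followed by a global all-reduce to test for completion) can be sketched in a few lines. This is a pure-Python simulation, not Mesquite code and not MPI: each list in `parts` stands in for one processor's vertices, the smoother is a simple Jacobi-style Laplacian average on a 1D chain chosen only for illustration, and the names `exchange_ghosts`, `allreduce_sum`, and `parallel_smooth` are hypothetical. The new pre-computation algorithm is not shown because the text does not describe its details.

```python
def exchange_ghosts(parts):
    """Stand-in for nearest-neighbor, point-to-point communication.

    Each simulated rank receives the boundary value of its left and right
    neighbor (None at the ends of the domain).
    """
    ghosts = []
    for rank, part in enumerate(parts):
        left = parts[rank - 1][-1] if rank > 0 else None
        right = parts[rank + 1][0] if rank + 1 < len(parts) else None
        ghosts.append((left, right))
    return ghosts


def allreduce_sum(local_values):
    """Stand-in for an MPI all-reduce (sum): every rank would see this total."""
    return sum(local_values)


def parallel_smooth(parts, tol=1e-6, max_iters=10_000):
    """Smooth a partitioned 1D chain until no vertex moves more than tol.

    Returns the smoothed partitions and the number of iterations, which is
    also the number of global all-reduce calls that were needed.
    """
    parts = [list(p) for p in parts]
    for iteration in range(1, max_iters + 1):
        ghosts = exchange_ghosts(parts)  # neighbor communication
        new_parts, remaining = [], []
        for part, (left, right) in zip(parts, ghosts):
            padded = [left] + part + [right]
            new_part, still_moving = [], 0
            for i in range(1, len(padded) - 1):
                lo, hi = padded[i - 1], padded[i + 1]
                if lo is None or hi is None:
                    x = padded[i]  # domain end point stays fixed
                else:
                    x = 0.5 * (lo + hi)  # average of the two graph neighbors
                if abs(x - padded[i]) > tol:
                    still_moving += 1  # this vertex still needs smoothing
                new_part.append(x)
            new_parts.append(new_part)
            remaining.append(still_moving)
        parts = new_parts
        # Global termination test: one all-reduce per iteration.
        if allreduce_sum(remaining) == 0:
            return parts, iteration
    return parts, max_iters


# Five vertices on a line, split across two simulated processors.
smoothed, iterations = parallel_smooth([[0.0, 0.1, 0.2], [0.9, 1.0]])
print(smoothed, iterations)
```

Because the ghost values are refreshed before every step, the partitioned run produces the same vertex positions as a single-partition run. The cost that motivates the new algorithm is the `allreduce_sum` call, which in a real MPI code involves every processor at every iteration.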
Preliminary weak scaling results on up to 128 processors show that nearly 90% efficiency can be obtained. Results on larger numbers of processors are being collected and will be posted soon.
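For reference, weak scaling efficiency is commonly defined as below. The symbols are assumptions introduced here, and the text does not state which baseline the 90% figure uses: $T(N)$ is the run time on $N$ processors when the work per processor is held fixed, and $N_0$ is the baseline processor count (often 1).

```latex
E_{\text{weak}}(N) = \frac{T(N_0)}{T(N)}
```

Under this definition, an efficiency of nearly 90% at $N = 128$ would mean the run takes about $1/0.9 \approx 1.11$ times as long as the baseline run, even though the total problem is $128/N_0$ times larger.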
Strong scaling studies of Mesquite on up to 128 processors show how the increasing surface-to-volume ratio of vertices that require smoothing affects the communication overhead. As the number of partition boundary (PB) vertices increases from 0 to 10% of the total number of vertices, the time spent smoothing the PB increases from 0 to 30% of the run time.
| Proc count | Time | Speed-up | PB smoothing time | PB smoothing (% of time) | Vtx per proc | PB vtx per proc | PB vtx (% of vtx) |
|---|---|---|---|---|---|---|---|
| 1 | 11.28 | 1.00 | 0.00 | 0 % | 186K | 0 | 0 % |
| 2 | 5.78 | 1.95 | 0.13 | 2 % | 93.3K | 465 | 0.5 % |
| 4 | 2.90 | 3.89 | 0.05 | 2 % | 46.7K | 434 | 1 % |
| 8 | 1.47 | 7.68 | 0.07 | 4 % | 23.3K | 438 | 2 % |
| 16 | 0.75 | 15.14 | 0.05 | 6 % | 11.7K | 331 | 3 % |
| 32 | 0.38 | 29.58 | 0.04 | 10 % | 5.8K | 227 | 5 % |
| 64 | 0.21 | 53.01 | 0.04 | 19 % | 2.9K | 192 | 7 % |
| 128 | 0.12 | 90.96 | 0.04 | 30 % | 1.4K | 150 | 10 % |
Table 1. Strong scaling of Mesquite on up to 128 processors.
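The derived columns in Table 1 can be checked with a short script. The sketch below assumes the usual strong scaling definitions, speed-up as $T_1 / T_p$ and parallel efficiency as speed-up divided by processor count, and takes the PB vertex percentage as PB vertices divided by vertices per processor. The function name `scaling_metrics` is hypothetical, and the input values are the rounded figures printed in the table.

```python
# Rows copied from Table 1:
# (processors, time, vertices per processor, PB vertices per processor)
rows = [
    (1, 11.28, 186_000, 0),
    (2, 5.78, 93_300, 465),
    (4, 2.90, 46_700, 434),
    (8, 1.47, 23_300, 438),
    (16, 0.75, 11_700, 331),
    (32, 0.38, 5_800, 227),
    (64, 0.21, 2_900, 192),
    (128, 0.12, 1_400, 150),
]


def scaling_metrics(rows):
    """Return (processors, speed-up, efficiency, PB vertex fraction) per row."""
    t1 = rows[0][1]  # single-processor time is the baseline
    metrics = []
    for procs, time, vtx, pb_vtx in rows:
        speedup = t1 / time
        efficiency = speedup / procs
        pb_fraction = pb_vtx / vtx
        metrics.append((procs, speedup, efficiency, pb_fraction))
    return metrics


for procs, speedup, efficiency, pb_fraction in scaling_metrics(rows):
    print(f"{procs:4d}  speed-up {speedup:6.2f}  "
          f"efficiency {efficiency:6.1%}  PB vertices {pb_fraction:5.1%}")
```

Speed-ups recomputed this way differ slightly from the table's speed-up column at higher processor counts (for example, 11.28 / 0.12 gives 94.0 rather than 90.96 at 128 processors), presumably because the published times are rounded to two decimal places.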