Parallel Mergesort
I have recently implemented 2 versions of parallel mergesort using OpenMP which can be found here.
The first uses OpenMP Sections
construct, and the second uses OpenMP’s newer, Task
construct.
Both are worksharing constructs that are used to divide work among threads when data-level parallelism or loop parallelism is not easily exploited. This is useful when there is some task level parallelism available. Divide and conquer algorithms like mergesort, are a perfect use case.
Sections
The sections
construct is a non-iterative worksharing construct that contains a set of structured blocks that are to be distributed among and executed by the threads in a team. Each structured block is executed once by one of the threads in the team in the context of its implicit task. There is an implicit barrier at the end of a sections
construct unless a nowait
clause is specified.
Tasks
Tasks are queued and executed whenever possible at the so-called task scheduling points. Under some conditions, the runtime could be allowed to move task between threads, even in the mid of their lifetime. Such tasks are called untied and an untied task might start executing in one thread, then at some scheduling point it might be migrated by the runtime to another thread.
Example
Tasks and sections are in many ways similar. For example, the following two code fragments achieve essentially the same result:
// sections
#pragma omp sections
{
#pragma omp section
foo();
#pragma omp section
bar();
}
// tasks
#pragma omp single nowait
{
#pragma omp task
foo();
#pragma omp task
bar();
}
#pragma omp taskwait
Mergesort
Similarly the only difference in my mergesort algorithms look like this:
// mergesort-tasks.c
#pragma omp parallel
{
#pragma omp single nowait
{
#pragma omp task
{
mergesort_parallel_omp(a, size / 2, temp, threads / 2);
}
#pragma omp task
{
mergesort_parallel_omp(a + size / 2, size - size / 2,
temp + size / 2, threads - threads / 2);
}
#pragma omp taskwait
{
merge(a, size, temp);
}
}
}
// mergesort-sections.c
#pragma omp parallel sections
{
#pragma omp section
{
mergesort_parallel_omp (a, size / 2, temp, threads / 2);
}
#pragma omp section
{
mergesort_parallel_omp (a + size / 2, size - size / 2,
temp + size / 2, threads - threads / 2);
}
}
merge(a, size, temp);
Note that because there is an implied barrier at the end of the sections
clause, the final merge can happen outside the sections clause with no explicit barrier like in the task example, which needed to use the taskwait
clause to ensure synchronization. Also note the single
and nowait
clauses in the task version. This is because we only want one thread to be responsible for spawning the tasks. The nowait
clause allows the calling thread to immediately spawn the second half of the mergesort call, without having to wait for the first task to finish.
My next post will show some performance results.
Thanks for Reading! --- @avcourt
Questions? Join my Discord server => discord.gg/5PfXqqr
Follow me on Twitter! => @avcourt