Parallel Mergesort

I have recently implemented 2 versions of parallel mergesort using OpenMP which can be found here.

The first uses OpenMP Sections construct, and the second uses OpenMP’s newer, Task construct.

Both are worksharing constructs that are used to divide work among threads when data-level parallelism or loop parallelism is not easily exploited. This is useful when there is some task level parallelism available. Divide and conquer algorithms like mergesort, are a perfect use case.

Sections

The sections construct is a non-iterative worksharing construct that contains a set of structured blocks that are to be distributed among and executed by the threads in a team. Each structured block is executed once by one of the threads in the team in the context of its implicit task. There is an implicit barrier at the end of a sections construct unless a nowait clause is specified.

Tasks

Tasks are queued and executed whenever possible at the so-called task scheduling points. Under some conditions, the runtime could be allowed to move task between threads, even in the mid of their lifetime. Such tasks are called untied and an untied task might start executing in one thread, then at some scheduling point it might be migrated by the runtime to another thread.

Example

Tasks and sections are in many ways similar. For example, the following two code fragments achieve essentially the same result:

// sections

#pragma omp sections
{
   #pragma omp section
   foo();
   #pragma omp section
   bar();
}

// tasks

#pragma omp single nowait
{
   #pragma omp task
   foo();
   #pragma omp task
   bar();
}
#pragma omp taskwait

Mergesort

Similarly the only difference in my mergesort algorithms look like this:

// mergesort-tasks.c
#pragma omp parallel
{
    #pragma omp single nowait
    {
        #pragma omp task
        {
            mergesort_parallel_omp(a, size / 2, temp, threads / 2);
        }
        #pragma omp task
        {
            mergesort_parallel_omp(a + size / 2, size - size / 2,
                                   temp + size / 2, threads - threads / 2);
        }
        #pragma omp taskwait
        {
            merge(a, size, temp);
        }
    }
}

// mergesort-sections.c
#pragma omp parallel sections
{
    #pragma omp section
    {
        mergesort_parallel_omp (a, size / 2, temp, threads / 2);
    }
    #pragma omp section
    {
        mergesort_parallel_omp (a + size / 2, size - size / 2,
                                        temp + size / 2, threads - threads / 2);
    }
}
merge(a, size, temp);

Note that because there is an implied barrier at the end of the sections clause, the final merge can happen outside the sections clause with no explicit barrier like in the task example, which needed to use the taskwait clause to ensure synchronization. Also note the single and nowait clauses in the task version. This is because we only want one thread to be responsible for spawning the tasks. The nowait clause allows the calling thread to immediately spawn the second half of the mergesort call, without having to wait for the first task to finish.

My next post will show some performance results.