Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
818 views
in Technique[技术] by (71.8m points)

for loop - gcc openmp thread reuse

I am using gcc's implementation of openmp to try to parallelize a program. Basically the assignment is to add omp pragmas to obtain speedup on a program that finds amicable numbers.

The original serial program was given(shown below except for the 3 lines I added with comments at the end). We have to parallize first just the outer loop, then just the inner loop. The outer loop was easy and I get close to ideal speedup for a given number of processors. For the inner loop, I get much worse performance than the original serial program. Basically what I am trying to do is a reduction on the sum variable.

Looking at the cpu usage, I am only using ~30% per core. What could be causing this? Is the program continually making new threads everytime it hits the omp parallel for clause? Is there just so much more overhead in doing a barrier for the reduction? Or could it be memory access issue(eg cache thrashing)? From what I read with most implementations of openmp threads get reused overtime(eg pooled), so I am not so sure the first problem is what is wrong.

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include <omp.h>
#define numThread 2
int main(int argc, char* argv[]) {
    int ser[29], end, i, j, a, limit, als;
    als = atoi(argv[1]);
    limit = atoi(argv[2]);
    for (i = 2; i < limit; i++) {
        ser[0] = i;
        for (a = 1; a <= als; a++) {
            ser[a] = 1;
            int prev = ser[a-1];
            if ((prev > i) || (a == 1)) {
                end = sqrt(prev);
                int sum = 0;//added this
                #pragma omp parallel for reduction(+:sum) num_threads(numThread)//added this
                for (j = 2; j <= end; j++) {
                    if (prev % j == 0) {
                        sum += j;
                        sum += prev / j;
                    }
                }
                ser[a] = sum + 1;//added this
            }
        }
        if (ser[als] == i) {
            printf("%d", i);
            for (j = 1; j < als; j++) {
                printf(", %d", ser[j]);
            }
            printf("
");
        }
    }
}
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

OpenMP thread teams are instantiated on entering the parallel section. This means, indeed, that the thread creation is repeated every time the inner loop is starting.

To enable reuse of threads, use a larger parallel section (to control the lifetime of the team) and specificly control the parallellism for the outer/inner loops, like so:

Execution time for test.exe 1 1000000 has gone down from 43s to 22s using this fix (and the number of threads reflects the numThreads defined value + 1

PS Perhaps stating the obvious, it would not appear that parallelizing the inner loop is a sound performance measure. But that is likely the whole point to this exercise, and I won't critique the question for that.

#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#include <omp.h>

#define numThread 2
int main(int argc, char* argv[]) {
    int ser[29], end, i, j, a, limit, als;
    als = atoi(argv[1]);
    limit = atoi(argv[2]);
#pragma omp parallel num_threads(numThread)
    {
#pragma omp single
        for (i = 2; i < limit; i++) {
            ser[0] = i;
            for (a = 1; a <= als; a++) {
                ser[a] = 1;
                int prev = ser[a-1];
                if ((prev > i) || (a == 1)) {
                    end = sqrt(prev);
                    int sum = 0;//added this
#pragma omp parallel for reduction(+:sum) //added this
                    for (j = 2; j <= end; j++) {
                        if (prev % j == 0) {
                            sum += j;
                            sum += prev / j;
                        }
                    }
                    ser[a] = sum + 1;//added this
                }
            }
            if (ser[als] == i) {
                printf("%d", i);
                for (j = 1; j < als; j++) {
                    printf(", %d", ser[j]);
                }
                printf("
");
            }
        }
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...