Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
723 views
in Technique[技术] by (71.8m points)

matlab - Questions regarding reorganizing loops for code's efficiency

Here's the problem

Consider the following function:

function A = plodding(N,d)
for ii = 1:N
   jj = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      jj = jj + 1;
      A(ii,jj) = randn;
   end
end

Rewrite this function to eliminate the allocation problem that is slowing it down. Call the new function, cruising. On a Dell Latitude E6410, with 7.8 gigabytes of usable memory, eliminating the allocation problem produces a speed-up factor of 7.

Here's my work:

The original version with rng(0)

function A = plodding(N,d)
rng(0); % To compare between the original and the modified version
for ii = 1:N
   jj = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      jj = jj + 1;
      A(ii,jj) = randn;
   end
end
end

The modified version

function A = cruising(N,d)
rng(0);
for jj = 1:N % Reorganize, so elems are added column-wise
   ii = 1;
   A(ii,jj) = randn;
   while abs(A(ii,jj)) < d
      ii = ii + 1;
      A(ii,jj) = randn;
   end
end
A = A'; % To get the matrix desired
end

But when I test:

tic; A = plodding(5,4.5); toc
Elapsed time is 0.176091 seconds.

tic; A1 = cruising(5,4.5); toc;
Elapsed time is 39.285447 seconds.

B = A - A1; sum(B(:))
ans = 0

So certainly A = A1

Based on what I learned from the lesson, my logic should be right, because MATLAB stores elements column-wise. Could someone please help me????

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Allocation

The differences are more than likely arising from when and how often Matlab has to reallocate the dynamically growing matrix. Consider the following output:

>> clear();n = 1E5;A = rand(n,1);tic;for k = 2:500; A(:,k) = rand(n,1);end;toc;whos();
Elapsed time is 1.294744 seconds.
  Name           Size                 Bytes  Class     Attributes

  A         100000x500            400000000  double              
  k              1x1                      8  double              
  n              1x1                      8  double              

>> clear();n = 1E5;A = rand(1,n);tic;for k = 2:500; A(k,:) = rand(1,n);end;toc;whos();
Elapsed time is 31.106700 seconds.
  Name        Size                    Bytes  Class     Attributes

  A         500x100000            400000000  double              
  k           1x1                         8  double              
  n           1x1                         8  double

Despite growing the array A to the same size, growing it column-wise is nearly 25 times faster. Since Matlab is a column-major language, the columns of its arrays are always contiguous in memory. So in the second loop, Matlab needs to reallocate every expanded column to a contiguous section of memory at the beginning of every iteration, which can have a huge overhead. Whereas, in the first loop, Matlab can address the new column right after the previous one, if space exists, or add a pointer connecting the end of the previous column to the new, if Matlab supports non-contiguous arrays (I don't know anything about Matlab internals, but I'd lean toward this method).

The function plodding grows column-wise (inner) then row-wise (outer) while cruising grows row-wise (inner) then grows column-wise. The running time differences depend on these two allocation methodologies but also on the condition used to terminate the allocation (i.e., the threshold value d).


Probability

The probability of receiving a number less than or equal to d is calculated from the cumulative distribution function of the normal distribution. This can be calculated using the Statistics Toolbox's normcdf or, if you do not have the toolbox, the following anonymous function:

normcdf = @(x,mu,sigma) 0.5*(1 + erf((x-mu)./(sigma*sqrt(2))));

The probability of receiving a 2 or less is normcdf(2,0,1) which is 97.7%, and the probability of receiving a value higher than 2 is one minus the probability: 1 - normcdf(2,0,1) or 2.28%. The probabilities can be used to estimate the number of elements in an array needed to get a number greater than d. For example, since there is a 2.28% chance of getting a value greater than 2, an array of 100 elements will more than likely produce at about two 2s on average.

For another example, the probability of getting a number greater than 5 is 0.000029%, and that means that an array of around 1 million elements is needed to get some 5s on average. Note that the formula I used to estimate the array size is

elemApprox = 10^(abs(log10(1 - normcdf(5,0,1))))

and then I divide it by the significant digits of the probability (2.8665 in this case) to get a rough estimate for the approximate number of elements to get one 5. It doesn't hurt to approximate like this since we are dealing with random numbers. For more certainty, you can always multiply the guess by 2 or 3 or more.


Results Discussion

Now let's discuss the results you got. From the comments, you presented

>> tic;A = plodding(10000,2);toc
Elapsed time is 9.289355 seconds.
>> tic;A1 = cruising(10000,2);toc;
Elapsed time is 0.078783 seconds.

As discussed above, the array will have dimensions of 10,000 by about 100 (due to 2's probability). The call plodding(10000,2) will then generate about 100 columns for its first iteration, and then grow the array row-wise for 10,000 iterations. As discussed above, that is a very slow process in Matlab and best avoided if possible. The call cruising(10000,2) will then generate about 100 rows for its first iteration, and then grow the array column-wise for 10,000 iterations. As discussed above, this is a far faster process than row-wise growth for constant row length but can be extremely slow in certain cases.

You also posted the results:

>> tic;A = plodding(5,5);toc
Elapsed time is 1.168961 seconds.
>> tic;A1 = cruising(5,5);toc;
% When I posted this thread, it's already more than 10 mins and MATLAB's still "busy"!

The call plodding(5,5) will generate about 1,000,000 columns for its first iteration, and then grow the array row-wise for 5 iterations. So after every successful inner-loop, Matlab must reallocate 1 million or-so 5 64-bit columns into contiguous memory slots. While not necessarily fast, finding a million separate locations of 40 bytes probably isn't the worst thing in the world with modern memory sizes. The call cruising(5,5) will generate about 1,000,000 rows for its first iteration, and then grow the array column-wise for 5 iterations. This appears better than the plodding call since the rows are being allocated first and then columns added; however, if the number of required rows for subsequent iterations is greater than the current row count, the array's columns must be reallocated after every inner-iteration until the break condition is met.

Diving a little deeper, the first iteration (on my computer) of cruising(5,5) takes about 2.8 seconds to run and allocates 756,670 rows. The second iteration takes about 2.6 seconds to iterate over the allocated rows, and after waiting for 60 seconds past that point, the second iteration has only advanced 20,366 rows. I would conclude that the repeated addition of rows and making Matlab re-allocate about 6MB contiguous blocks of memory per-column makes the function extremely slow.


Recommendation

My recommendation for performance improvement would be to allocate the number of rows of A before entering the loop. For plodding, this means adding A(N,elemApprox) = 0; before the loop, and for cruising, this means adding A(elemApprox,N) = 0; before the loop. elemApprox is the approximate number of rows/columns needed to consistently achieve the condition without growing the array.

Even though I approximated elemApprox to be 1,000,000 earlier for d = 5, it turns out that maximum number of rows with rng(0) specified is 3,151,807. Therefore, for a consistent upper bound on elemApprox, I'd amend the above approximation formula to be

elemApprox = 10^(ceil(abs(log10(1 - normcdf(5,0,1)))));

which is 10,000,000. That's a little steep if memory is limited, but it will greatly increase the running time. Using the allocation formula, the performance of cruising is greatly improved:

>> tic;plodding(5,5);toc;
Elapsed time is 0.576764 seconds.
>> tic;cruising(5,5);toc;
Elapsed time is 0.906496 seconds.

I'm not sure why plodding is performing better, but at least cruising now completes in a reasonable amount of time.

I'd also like to point out that adding just A(N,1) = 0 to plodding and letting Matlab know how many rows there will be results in a large performance gain for small d:

>> tic;plodding(1E4,2);toc; %   without A(N,1) = 0;
Elapsed time is 18.479698 seconds.
>> tic;plodding(1E4,2);toc; %   WITH A(N,1) = 0;
Elapsed time is 0.052307 seconds.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...