Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
449 views
in Technique[技术] by (71.8m points)

c++ - Efficient SSE NxN matrix multiplication

I'm trying to implement SSE version of large matrix by matrix multiplication. I'm looking for an efficient algorithm based on SIMD implementations.

My desired method looks like:

A(n x m) * B(m x k) = C(n x k)

And all matrices are considered to be 16-byte aligned float array.

I searched the net and found some articles describing 8x8 multiplication and even smaller. I really need it as efficient as possible and I don't want to use Eigen library or similar libraries. (Only SSE3 to be more specific).

So I'd appreciate if anyone can help me find some articles or resources on how to start implementing this.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The main challenge in implementation of arbitrary-size matrix-matrix multiplication is not the use of SIMD, but reuse of cached data. The paper Anatomy of High-Performance Matrix Multiplication by Goto and Van de Geijn is a must-read if you want to implement cache-friendly matrix-matrix multiplication, and it also discusses the choice of kernels to be SIMD-friendly. After reading this paper expect to achieve 50% of machine peak on matrix-matrix multiplication after two weeks of efforts.

However, if the purpose of this work is not pure learning, I strongly recommend to use a highly optimized library. On x86 your best options are OpenBLAS (BSD-licensed, supports dynamic CPU dispatching), BLIS (BSD-licensed, easily portable to new processors), and Intel MKL (commercial, supports dynamic CPU dispatching on Intel processors). For performance reasons it is better to avoid ATLAS unless you target a very exotic architecture which is not supported by other libraries.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...