What's my best bet for computing the dot product of a vector x with a large number of vectors y_i, where x and each y_i have length 10k or so?
- Shove the y's in a matrix and use an optimized
- Or maybe try handcoding an SSE2 solution (I don't have SSE3, according to cpuinfo).
I'm just looking for general guidance here, so any suggestions will be useful.
And yes, I do need the performance. Thanks for any light.
So you could probably make use of DirectX or OpenGL libraries to perform the vector operations.
D3DXVec2Dot, for example. This will also save you CPU time.
You will have to measure for yourself how much this gains over a BLAS routine. The greatest speedup comes from structuring the data in a layout that lets you exploit data parallelism and alignment.
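As a rough illustration of the layout point, here is a minimal sketch in C. It assumes double precision and a row-major matrix holding one y_i per row; neither is stated in the question. The names `padded_len`, `alloc_aligned` and `dot_many` are made up for this example. Each row is padded to an even number of doubles so that every row starts on a 16-byte boundary, which allows aligned SSE2 loads. Where SSE2 is not available, it falls back to a plain loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Round n up to an even count so each row spans a multiple of 16 bytes. */
static size_t padded_len(size_t n) { return (n + 1) & ~(size_t)1; }

/* Allocate `count` doubles, 16-byte aligned and zero-filled. Release with free().
   The zero fill matters: padding elements then contribute nothing to the sums. */
static double *alloc_aligned(size_t count) {
    size_t bytes = (count * sizeof(double) + 15) & ~(size_t)15;
    double *p = aligned_alloc(16, bytes); /* C11 */
    if (p) memset(p, 0, bytes);
    return p;
}

/* out[r] = dot(x, row r of Y). Y is row-major with `stride` doubles per row.
   Assumes x and Y are 16-byte aligned and stride is even (see padded_len). */
static void dot_many(const double *x, const double *Y,
                     size_t rows, size_t stride, double *out) {
    assert(stride % 2 == 0);
    for (size_t r = 0; r < rows; ++r) {
        const double *y = Y + r * stride;
#ifdef __SSE2__
        /* Two independent accumulators, two doubles each. */
        __m128d acc0 = _mm_setzero_pd();
        __m128d acc1 = _mm_setzero_pd();
        size_t i = 0;
        for (; i + 4 <= stride; i += 4) {
            acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_load_pd(x + i),
                                               _mm_load_pd(y + i)));
            acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_load_pd(x + i + 2),
                                               _mm_load_pd(y + i + 2)));
        }
        if (i < stride) { /* exactly one pair left, because stride is even */
            acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_load_pd(x + i),
                                               _mm_load_pd(y + i)));
        }
        acc0 = _mm_add_pd(acc0, acc1);
        double lanes[2];
        _mm_storeu_pd(lanes, acc0);
        out[r] = lanes[0] + lanes[1]; /* horizontal sum of the two lanes */
#else
        /* Scalar fallback when SSE2 is not enabled at compile time. */
        double sum = 0.0;
        for (size_t i = 0; i < stride; ++i) sum += x[i] * y[i];
        out[r] = sum;
#endif
    }
}
```

To use it, allocate x with `alloc_aligned(padded_len(n))` and Y with `alloc_aligned(rows * padded_len(n))`, fill the first n entries of x and of each row, then call `dot_many`. Time it against a BLAS matrix-vector routine on your own data before committing to either.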
These are high-performance kernel routines, many times faster than MKL and BLAS:
http://www.applied-mathematics.net/miniSSEL1BLAS/miniSSEL1BLAS.html
If you have an nVidia graphics card you can get cuBLAS, which will perform the operation on the graphics card:
http://developer.nvidia.com/cublas
For ATI (AMD) graphics cards:
http://developer.amd.com/libraries/appmathlibs/pages/default.aspx