Performance comparison with ndarray

Hello,
First of all, congratulations on this really nice library and its incredible genericity. Incredible work here.

I am currently playing a bit with nalgebra and ndarray. In both cases I only perform matrix multiplications and swaps of elements. While swaps have very comparable performance, I noticed a major difference in performance between the two for general matrix multiplication: for similar matrix sizes, ndarray with blas enabled is about 3 times faster than nalgebra. I guess the difference comes from the blas binding of ndarray, which (if I’m not mistaken) is not present in nalgebra. Is that correct? I saw that bindings exist for some factorizations/decompositions. Is there something similar for gemm? If not, are there plans to include it? I would be very happy to help. In that case, would you have any suggestions on where to start when adding the binding?

Thank you again for the great work.

Best regards

It’s true that nalgebra does not rely on any blas binding. That is most likely the reason for the performance gap with ndarray. Adding such bindings would be a great contribution! For example, we could replace the pure-Rust BLAS implementation in the blas.rs file with the corresponding binding whenever the user activates some feature (e.g. we could add various features to nalgebra’s Cargo.toml to activate the blas bindings with a specific backend).
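
To make the idea concrete, here is a minimal sketch of feature-gated dispatch, not the actual contents of blas.rs. The feature name `blas`, the free function `gemm`, and the choice of the `cblas` crate as backend are all assumptions for illustration; the feature would also have to be declared in Cargo.toml with `cblas` as an optional dependency. Matrices are stored column-major, as in nalgebra.

```rust
/// Computes C = alpha * A * B + beta * C on column-major slices.
/// A is m x k, B is k x n, C is m x n.
/// Pure-Rust fallback, compiled when the hypothetical `blas` feature is off.
#[cfg(not(feature = "blas"))]
pub fn gemm(m: usize, n: usize, k: usize, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for j in 0..n {
        // Scale column j of C first; beta == 0 overwrites C without reading it.
        for i in 0..m {
            c[i + j * m] = if beta == 0.0 { 0.0 } else { beta * c[i + j * m] };
        }
        // Accumulate alpha * A[:, p] * B[p, j] into column j of C.
        for p in 0..k {
            let s = alpha * b[p + j * k];
            for i in 0..m {
                c[i + j * m] += s * a[i + p * m];
            }
        }
    }
}

/// Same signature, forwarded to a BLAS backend when the hypothetical
/// `blas` feature is on. Assumes the `cblas` crate's `dgemm` binding.
#[cfg(feature = "blas")]
pub fn gemm(m: usize, n: usize, k: usize, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64]) {
    use cblas::{dgemm, Layout, Transpose};
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    // Leading dimensions equal the row counts for tightly packed column-major data.
    unsafe {
        dgemm(
            Layout::ColumnMajor, Transpose::None, Transpose::None,
            m as i32, n as i32, k as i32,
            alpha, a, m as i32, b, k as i32, beta, c, m as i32,
        );
    }
}

fn main() {
    // A = [1 2; 3 4] and B = [5 6; 7 8], both stored column by column.
    let a = [1.0, 3.0, 2.0, 4.0];
    let b = [5.0, 7.0, 6.0, 8.0];
    let mut c = [0.0; 4];
    gemm(2, 2, 2, 1.0, &a, &b, 0.0, &mut c);
    // A * B = [19 22; 43 50], i.e. [19, 43, 22, 50] in column-major order.
    println!("{:?}", c);
}
```

Because both variants share one signature, callers do not change when the feature is toggled; only the backend does. A real integration would also need to cover the other scalar types (f32, complex) and non-contiguous strides, which this sketch ignores.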

There is this issue about blas and lapack integration. Lapack integration is already done in the separate nalgebra-lapack crate. I think it is best to integrate blas directly into the main nalgebra crate, though.