Slow multiplication of the transpose of a sparse matrix

I'm having speed issues multiplying the transpose of a sparse matrix with a column vector.

In my code the matrix A is

501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries:

⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠈⠻⣾

These are the results I get when multiplying with f = rand(Float64,501,1):

Method 1

A_tr = transpose(A)

@benchmark A_tr*f

BenchmarkTools.Trial: 10000 samples with 1 evaluation.

Range (min … max): 350.083 μs … 9.066 ms ┊ GC (min … max): 0.00% … 95.44%

Time (median): 361.208 μs ┊ GC (median): 0.00%

Time (mean ± σ): 380.269 μs ± 355.997 μs ┊ GC (mean ± σ): 4.06% ± 4.15%

Memory estimate: 218.70 KiB, allocs estimate: 11736.

Method 2

A_tr = Matrix(transpose(A))

@benchmark A_tr*f

BenchmarkTools.Trial: 10000 samples with 1 evaluation.

Range (min … max): 87.375 μs … 210.875 μs ┊ GC (min … max): 0.00% … 0.00%

Time (median): 88.542 μs ┊ GC (median): 0.00%

Time (mean ± σ): 89.286 μs ± 3.266 μs ┊ GC (mean ± σ): 0.00% ± 0.00%

Memory estimate: 4.06 KiB, allocs estimate: 1.

Method 3

A_tr = sparse(Matrix(transpose(A)))

@benchmark A_tr*f

BenchmarkTools.Trial: 10000 samples with 9 evaluations.

Range (min … max): 2.102 μs … 1.017 ms ┊ GC (min … max): 0.00% … 99.40%

Time (median): 2.477 μs ┊ GC (median): 0.00%

Time (mean ± σ): 2.725 μs ± 13.428 μs ┊ GC (mean ± σ): 6.92% ± 1.41%

Memory estimate: 4.06 KiB, allocs estimate: 1.

Why doesn't Method 1 perform similarly to Method 3? I'm probably missing something basic here.

Thank you for your help!



Solution 1:[1]

Using the following MWE:

using LinearAlgebra, BenchmarkTools, SparseArrays

A = sprand(501,501,0.005)
At1 = transpose(A)
At2 = sparse(Matrix(transpose(A)))
f = rand(Float64,501,1)

you will find no significant performance difference between

@benchmark $At1*$f

and

@benchmark $At2*$f

As was pointed out by @SGJ, the trick is to have a concrete type as the index-type parameter of your container, i.e. SparseMatrixCSC{Float64, Int64} instead of SparseMatrixCSC{Float64, Integer}. The former is what sprand(501,501,0.005) generates.
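For instance, converting the existing matrix to a concrete index type should restore the fast sparse path without any dense detour. A minimal sketch, assuming A is the matrix from the question (the names B and f are illustrative, not from the original post):

using SparseArrays, BenchmarkTools

# A currently has the abstract index type SparseMatrixCSC{Float64, Integer};
# converting to a concrete index type lets the specialized kernels run at full speed.
B = SparseMatrixCSC{Float64, Int64}(A)

f = rand(Float64, 501)
@benchmark transpose($B) * $f   # should now be on par with Method 3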


In response to a comment by @CRJ:

"IIRC, transpose(A) makes a view of A through LinearAlgebra, which requires translating coordinates for every access. I don't think the fast ways of doing MV math will work through that interface. I'm not surprised that converting your transpose to a matrix object instead of trying to do math through the view is faster."

This is not quite right: transpose(A) yields Transpose(A), where Transpose is a lazy transpose wrapper. For sparse-matrix times dense-vector multiplication there are tailored methods that do not require materializing the transpose of A.
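As a quick illustration (not part of the original answer), you can check that taking the transpose copies nothing:

using LinearAlgebra, SparseArrays

A = sprand(501, 501, 0.005)
At = transpose(A)      # lazy wrapper; no data is copied
typeof(At)             # Transpose{Float64, SparseMatrixCSC{Float64, Int64}}
parent(At) === A       # true: the wrapper just points back at A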

Solution 2:[2]

501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries

Integer is an abstract type, and that is what is slowing your code down. See the Julia performance tips.
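The relevant tip is the one about avoiding containers with abstract type parameters. A small illustrative example of the general effect (not from the original answer):

# With an abstract element type, every element is boxed and accessed
# through dynamic dispatch; with a concrete type the code specializes.
xs_abstract = Real[1.0, 2.0, 3.0]      # Vector{Real}
xs_concrete = Float64[1.0, 2.0, 3.0]   # Vector{Float64}
sum(xs_abstract)   # type-unstable: slower and allocates
sum(xs_concrete)   # specialized: fast, allocation-free

The same applies to the index-type parameter of SparseMatrixCSC: with Integer, every access to the stored indices is boxed, which likely accounts for the thousands of allocations reported in Method 1.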

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Martin Möddel
[2] Solution 2: SGJ