'Julia vs c++ performance almost factor 30
The Julia program below takes about 6 seconds on my laptop (second test(n)). An equivalent C++ program (using Eigen) only 0.19 s. According to the results I have seen on https://programming-language-benchmarks.vercel.app/cpp, I expected a much smaller difference. What is wrong with my Julia program? I would be very appreciative for hints on how to improve my Julia program.
using StaticArrays
using Printf
struct CoordinateTransformation
b1::SVector{3,Float64}
b2::SVector{3,Float64}
b3::SVector{3,Float64}
r0::SVector{3,Float64}
mf::SMatrix{3,3,Float64}
mb::SMatrix{3,3,Float64}
end
function dot(a::SVector{3,Float64}, b::SVector{3,Float64})
a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
end
function CoordinateTransformation(b1::SVector{3,Float64}, b2::SVector{3,Float64}, b3::SVector{3,Float64}, r0::SVector{3,Float64})
mf = MMatrix{3,3,Float64}(undef)
e1::SVector{3, Float64} = [1.0, 0.0, 0.0]
e2::SVector{3, Float64} = [0.0, 1.0, 0.0]
e3::SVector{3, Float64} = [0.0, 0.0, 1.0]
mf[1, 1] = dot(b1, e1);
mf[1, 2] = dot(b1, e2);
mf[1, 3] = dot(b1, e3);
mf[2, 1] = dot(b2, e1);
mf[2, 2] = dot(b2, e2);
mf[2, 3] = dot(b2, e3);
mf[3, 1] = dot(b3, e1);
mf[3, 2] = dot(b3, e2);
mf[3, 3] = dot(b3, e3);
mb = inv(mf)
CoordinateTransformation(b1, b2, b3, r0, mf, mb)
end
@inline function transform_point_f(at::CoordinateTransformation, v::MVector{3,Float64})
at.mf * v + at.r0
end
@inline function transform_point_b(at::CoordinateTransformation, v::MVector{3,Float64})
at.mb * (v - at.r0)
end
@inline function transform_vector_f(at::CoordinateTransformation, v::MVector{3,Float64})
at.mf * v
end
@inline function transform_vector_b(at::CoordinateTransformation, v::MVector{3,Float64})
at.mb * v
end
function test(n)
theta = 1.0;
c = cos(1.0);
s = sin(1.0);
b1::SVector{3, Float64} = [c, 0.0, s]
b2::SVector{3, Float64} = [0.0, 1.0, 0.0]
b3::SVector{3, Float64} = [-s, 0.0, c]
r0::SVector{3, Float64} = [0.0, 0.0, 1.0]
at::CoordinateTransformation = CoordinateTransformation(b1, b2, b3, r0)
@printf("%e\n", n)
points = Array{MVector{3, Float64}, 1}(undef, n)
@inbounds for i in 1:n
points[i] = [1.0, 0.0, 0.0]
end
@inbounds for i in 1:n
points[i] = transform_point_f(at, points[i])
end
println(points[n])
@inbounds for i in 1:n
points[i] = transform_point_b(at, points[i])
end
println(points[n])
end
n = 10000000
@timev test(n)
@timev test(n)
Solution 1:[1]
A major issue with your test function is that a massive number of MVectors are allocated in the 3 loops. In addition, since MVectors are mutable structs, which are reference types, the points vector is a vector of references, which is not great for performance.
Instead, I recommend changing points to a vector of SVectors and modifying the code to accommodate this (e.g., replace every MVector with SVector). In the first loop, points[i] = [1.0, 0.0, 0.0] should be changed to points[i] = SA[1.0, 0.0, 0.0] to avoid allocations from creating temporary vectors. (See also Eric's comment on this.)
Implementing these simple changes, I see an improvement from
2.523284 seconds (40.00 M allocations: 1.714 GiB, 43.11% gc time)
to
0.171544 seconds (267 allocations: 228.891 MiB)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vincent Yu |
