Category "simd"

Program in assembly x86 [closed]

I recently made a program with C++ and ASM. Can anyone help me make this code more efficient, in the ASM part or both? I would really a

Efficiently shift-or large bit vector

I have a large in-memory array, given as a pointer uint64_t *arr (plus a size), which represents plain bits. I need to very efficiently (most performant/fast) shift th

Bit-twiddling Wizardry for Index of Min or Max Element in XMM/YMM/ZMM

Is there an instruction or efficient branchless sequence of instructions to figure out the INDEX of (not the value of) the largest (or smallest) element of an u

Implementing matrix operation using AVX in C

I'm trying to implement the following operation using AVX: for (i=0; i<N; i++) { for(j=0; j<N; j++) { for (k=0; k<K; k++) { d[i][j] += 2 *

Does gcc use Intel's SSE 4.2 instructions for text processing if available?

I read here that Intel introduced SSE 4.2 instructions for accelerating string processing. Quote from the article: The SSE 4.2 instruction set, first implement

SIMD intrinsic and memory bus size - How CPU fetches all 128/256 bits in a single memory read?

Hello Forum – I have a few similar/related questions about SIMD intrinsics, for which I searched online, including Stack Overflow, but did not find good answer

Fastest Implementation of the Natural Exponential Function Using SSE

I'm looking for an approximation of the natural exponential function operating on an SSE element, namely __m128 exp( __m128 x ). I have an implementation whic

fftw simd-altivec.h cannot compile

I'm using fftw on a Mac using Xcode 4.4. In my project, I added the whole fftw source code into the project and tried to compile it. It cannot compile successf