How do I lower the runtime of a linked-list function?
I have a remove_x() function which seems to have quadratic runtime. How do I lower its runtime to Θ(n) using recursion or iteration?
typedef struct node {
    int val;
    struct node *next;
} NODE;

typedef struct list_struct {
    NODE *front;
    NODE *back;
} LIST;

// Removes all occurrences of x from the given list and returns the number of occurrences
int remove_x(LIST *l, int x) {
    int n = 0;
    // remove_first(): removes the first occurrence of x and returns 0 or 1
    // depending on whether x was found. Each call rescans the list from the
    // front, so removing n matches costs O(n * list length) overall.
    while (remove_first(l, x))
        n++;
    return n;
}
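(For reference: the quadratic cost comes from remove_first() rescanning the list from the front on every call. A single forward pass that unlinks matching nodes as it goes runs in Θ(n). This is a minimal sketch against the NODE/LIST definitions above, assuming nodes were allocated with malloc(); remove_x_linear is an illustrative name, not from the question.)

#include <stdlib.h>

// Θ(n) alternative: one pass, unlinking matches via a pointer-to-pointer
int remove_x_linear(LIST *l, int x) {
    int n = 0;
    NODE *prev = NULL;          // last surviving node seen so far
    NODE **pp = &l->front;      // points at the link to the current node
    while (*pp != NULL) {
        if ((*pp)->val == x) {
            NODE *dead = *pp;
            *pp = dead->next;   // unlink without rescanning from the front
            free(dead);
            n++;
        } else {
            prev = *pp;
            pp = &(*pp)->next;
        }
    }
    l->back = prev;             // NULL if the list is now empty
    return n;
}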
Solution 1:[1]
Typically each node in a linked list is allocated from the heap (e.g. with malloc()) which means:
- a hidden header (e.g. probably 16 bytes) for heap management precedes the memory you asked for; so when you allocate small pieces of memory (e.g. 16-byte NODE structures) the amount of RAM consumed can be doubled.
- after a program has been running for a while (after lots of pieces of memory are allocated and freed), the memory you allocate ends up being scattered everywhere randomly and mixed in with other unrelated data.
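(As a quick illustration of the scattering, not code from the original answer: printing the addresses of a few malloc()'d nodes shows the gaps between allocations; the exact addresses and per-allocation overhead vary by allocator.)

#include <stdio.h>
#include <stdlib.h>

typedef struct node { int val; struct node *next; } NODE;

int main(void) {
    // The gaps between consecutive allocations expose the allocator's
    // per-allocation overhead; after many alloc/free cycles in a real
    // program, the addresses stop being adjacent at all.
    NODE *nodes[4];
    for (int i = 0; i < 4; i++) {
        nodes[i] = malloc(sizeof(NODE));
        printf("node %d at %p\n", i, (void *)nodes[i]);
    }
    for (int i = 0; i < 4; i++)
        free(nodes[i]);
    return 0;
}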
Modern CPUs are fast and (in comparison) RAM is slow (e.g. maybe 150 CPU cycles to access). To reduce this CPUs have multiple layers of caches ranging from "very fast" L1 (e.g. maybe 4 cycles) to "not so fast" L3 or L4 (e.g. maybe 50 cycles).
Modern CPUs also like doing lots of things in parallel (e.g. 4 or more instructions in parallel, with 100 or more instructions "in flight"). If a CPU has to wait to fetch something from caches or RAM, then the CPU stalls and does nothing, possibly for 100 cycles or more (which, if a CPU is capable of doing 4 instructions in parallel, works out to 400 instructions or more). To avoid some of the costs modern CPUs also have hardware prefetchers, to fetch data from RAM before your code asks for it so that it's already cached when your code asks for it.
Now let's look at how this all works when iterating through a linked list:
- the code accesses something (e.g. NODE.val), but the address is unpredictable (due to the way malloc() works) and there's no way for the CPU to prefetch it; and the chance that it's on the same cache line (or even in the same virtual memory page) as previously used data is nearly zero (because all the memory being used is scattered and mixed with memory used for other purposes); so the CPU stalls until the data arrives.
- the code does almost nothing (e.g. comparing NODE.val to x) so the CPU can be fast (do lots of things in parallel) for almost nothing.
- the code does some kind of currentNode = currentNode->next; causing the cycle to repeat (go back to the first step of having a huge stall).
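(In code, that serial dependency looks like the sketch below: each load of cur->next cannot begin until the previous node's cache line has arrived, so iterations cannot overlap. This is an illustrative traversal, not code from the question.)

// Pointer chasing: the address of the next node is only known after the
// current node arrives from memory, so the loads form a serial chain.
int count_x(const LIST *l, int x) {
    int n = 0;
    for (const NODE *cur = l->front; cur != NULL; cur = cur->next)
        if (cur->val == x)   // trivial work per node; the stall dominates
            n++;
    return n;
}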
The consequence is that performance is ruined by a factor of around 100.
How do I lower the runtime of a linked-list function?
Stop using a linked list.
For an array, all the memory is consecutive (high chance that array[n+1] is in the same cache line as array[n] and already in the cache) and iteration causes a nice sequential access pattern (that's easily prefetched). Even better, the compiler may be able to use SIMD so that each instruction can work on 4 or 8 pieces of data in parallel (in addition to doing 4 or more instructions in parallel). Compared to a linked list this can work out to be several hundred times faster.
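(For comparison, a sketch of the same removal over a plain int array: one sequential pass that compacts the surviving elements in place. The function name and parameters are illustrative.)

#include <stddef.h>

// Removes all occurrences of x from arr[0..len-1], compacting in place.
// Sequential access is easy to prefetch, and compilers can often vectorize
// this loop with SIMD. Returns the number of occurrences removed; the
// caller's new length is the old length minus the return value.
size_t remove_x_array(int *arr, size_t len, int x) {
    size_t out = 0;
    for (size_t i = 0; i < len; i++)
        if (arr[i] != x)
            arr[out++] = arr[i];
    return len - out;
}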
The problem with arrays is that they make other things (inserting and deleting items) slower, because the elements after the insertion/deletion point must be shifted; and how much slower depends on the size of the array.
That leads to a hybrid approach - a linked list of arrays, where each array is relatively small (e.g. maybe 4 KiB, or 1000 int values); insertion/deletion isn't slow because each array is small, and all the problems of a linked list (stalls, etc.) happen 1000 times less often. A sketch of such a node follows this paragraph.
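(A sketch of what one node of such a hybrid, "unrolled" list might look like; the capacity of 1000 is just the example figure from above.)

#define CHUNK_CAP 1000

typedef struct chunk {
    int vals[CHUNK_CAP];   // contiguous: cache- and prefetch-friendly
    int count;             // number of slots currently in use
    struct chunk *next;    // pointer chase only once per CHUNK_CAP values
} CHUNK;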
However, if you're primarily concerned with finding a value x, then maybe you could have a different list for each value of x, or a different list for each range of values of x (e.g. one list for nodes where NODE.val is from 0 to 999, a second list for nodes where NODE.val is from 1000 to 1999, etc). If you have 100 lists then each list will be (on average) 1/100th of the size, and searching through a list will be about 100 times faster. This is more often done with hashes (e.g. compute a hash, then use the hash to determine which list to use) to help ensure that the lists are likely to have similar sizes (so you don't just get a huge list for values from 0 to 999 while all the other lists are barely used); see the sketch below.
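(A minimal sketch of the bucketing idea, reusing the LIST type from the question; NUM_LISTS and the modulo hash are illustrative choices, not from the original answer.)

#define NUM_LISTS 100

LIST lists[NUM_LISTS];

// Pick which list a value belongs to; a hash spreads values out so that
// no single list grows much larger than the others.
LIST *list_for(int x) {
    size_t h = (size_t)(unsigned)x;   // trivial hash: the value itself
    return &lists[h % NUM_LISTS];
}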
If you can combine both of these approaches, then "(several hundred times faster) * (100 times faster) = several tens of thousands of times faster".
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Brendan |
