'Why does ARM have 16 registers?

Why does ARM have only 16 registers? Is that the ideal number?

Does distance of registers with more registers also increase the processing time/power ?



Solution 1:[1]

To choose one of 16 registers you would need 4bit therefore it could be that this is the best match for opcodes (machine commands) otherwise you would have to introduce a more complex instructions set, which would lead to bigger coder which implies additional costs (execution time).

Wikipedia says It has "Fixed instruction width of 32 bits to ease decoding and pipelining" so it is a reasonable tradeoff.

Solution 2:[2]

32-bit ARM has 16 registers because it only use 4 bits for encoding the register, not because 16 is the ideal number. Likewise x86 has only 8 registers because in history they used 3 bits to encode the register so that some instructions fit in a byte.

That's such a limited number so both x86 and ARM when going to 64-bit doubled the number to 16 and 32 registers respectively. The old ARM instruction encoding has no remaining bit left enough for the larger register number so they must do a trade-off by dropping the ability to execute almost every instruction conditionally and use the 4-bit condition for the new features (that's an oversimplification, in reality it's not exactly like that because the encoding is new, but you do need 3 more bits for the new registers).

Solution 3:[3]

Back in the 80's (IIRC) an academic paper was published that examined a number of different workloads, comparing expected performance benefits of different numbers of registers. This was at a time when RISC processors were transitioning from academic ideas to mainstream hardware, and it was important to decide what was optimal. CPUs were already pulling ahead of memory in speed, and RISC was making this worse by limiting addressing modes and having separate load and store instructions. Having more registers meant you could "cache" more data for immediate access and therefore access main memory less.

Considering only powers of two, it was found that 32 registers was optimal, although 16 wasn't terribly far behind.

Solution 4:[4]

ARM is unique in that each of the registers can have a conditional execution code avoiding tests & branches. Don't forget, many 32 register machines fix R0 to 0 so conditional tests are done by comparing to R0. I know from experience. 20 years ago I had to program a 'Mode 7' (from SNES terminology) floor. The CPUs were SH2 for the 32x (or rather 2 of them), MIPS3000 (Playstation) and 3DO (ARM), the inner loop of the code were 19,15 & 11. If the 3DO had been running at the same speed as the other 2, it would have been twice as fast. As it was, it was just a bit slower.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 stacker
Solution 2
Solution 3 Timothy Miller
Solution 4 Sean