LDUR and STUR in ARMv8

I've taken a couple of courses that touched on ARMv8 assembly, but the two teachers described the LDUR/STUR instructions differently, and now I'm pretty lost. Can someone help clarify?

If I had the instruction:

LDUR R3, [R1, #8]

I'll be putting the answer in R3, but what am I taking from R1 and how does the offset operate? Is it like a logical shift? The ARM manual describes it as "byte offset" but then doesn't describe how that offset functions on R1. Do I shift the value stored in R1 (say R1 has value 50 in it) or is there a memory address outside of the R1 that I need to be thinking about? Other sources say I need to think of R1 as an array somehow?



Solution 1:[1]

LDUR is Load Register (unscaled). It loads a 32-bit or 64-bit value from memory, at a base address plus an offset, into a register. "Unscaled" means that in the machine code the offset is not encoded as a scaled offset, as it is for LDR (immediate, unsigned offset); i.e. the immediate offset bits are not shifted (multiplied by the access size). The offset (simm, a signed immediate) is simply added to the base register Xn|SP.

Thus, with LDUR you can use displacements that aren't a multiple of 4 or 8, as well as small negative ones, which the unsigned-offset form of LDR cannot encode.
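A minimal Python sketch of the two immediate encodings, assuming the standard A64 field widths: LDR (immediate, unsigned offset) stores a 12-bit field that is multiplied by the access size (4 for Wt, 8 for Xt), while LDUR stores the byte offset itself in a 9-bit two's-complement field. The function names are hypothetical; each returns the raw field value, or None if the offset cannot be encoded.

```python
def ldr_uoff_field(offset: int, size: int):
    """imm12 field for LDR (immediate, unsigned offset); size is 4 (Wt) or 8 (Xt)."""
    if offset < 0 or offset % size != 0:
        return None                 # must be a non-negative multiple of the access size
    imm12 = offset // size          # scaled: the field holds offset / size
    return imm12 if imm12 < 4096 else None


def ldur_field(offset: int):
    """imm9 field for LDUR: the raw byte offset in 9-bit two's complement."""
    if not -256 <= offset <= 255:
        return None
    return offset & 0x1FF           # unscaled: no division by the access size


def decode_ldr_uoff(imm12: int, size: int) -> int:
    return imm12 * size             # the scale is applied back when computing the address


def decode_ldur(imm9: int) -> int:
    return imm9 - 0x200 if imm9 & 0x100 else imm9   # sign-extend the 9-bit field
```

For example, ldr_uoff_field(8, 8) gives 1 while ldur_field(8) gives 8: the same byte offset, stored scaled in one case and unscaled in the other.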

These are the prototypes for LDUR:

    // loads a 32-bit value
    LDUR <Wt>, [<Xn|SP>{, #<simm>}]

    // loads a 64-bit value
    LDUR <Xt>, [<Xn|SP>{, #<simm>}]

STUR is Store Register (unscaled). It works the same way, but it stores the value of a register to memory.

These are the prototypes for STUR:

    // stores a 32-bit register
    STUR <Wt>, [<Xn|SP>{, #<simm>}]

    // stores a 64-bit register
    STUR <Xt>, [<Xn|SP>{, #<simm>}]
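A small model of STUR, assuming memory is a Python bytearray, data is little-endian (the usual AArch64 configuration), and registers live in a dictionary. The function and register names are hypothetical. The low size bytes of the source register are written at base + simm, and the base register is not modified.

```python
def stur(regs: dict, mem: bytearray, t: str, n: str, simm: int, size: int) -> None:
    """Model STUR: store the low `size` bytes of regs[t] at regs[n] + simm."""
    if not -256 <= simm <= 255:
        raise ValueError("STUR offset must be in -256..255")
    address = regs[n] + simm                     # byte offset added to the base
    value = regs[t] & ((1 << (8 * size)) - 1)    # a 4-byte (Wt) store keeps the low 32 bits
    mem[address:address + size] = value.to_bytes(size, "little")


mem = bytearray(0x100)
regs = {"X1": 0x56, "X3": 0x1122334455667788}
stur(regs, mem, "X3", "X1", -3, 4)   # like STUR W3, [X1, #-3]
print(mem[0x53:0x57].hex(), hex(regs["X1"]))   # 88776655 0x56
```

Note that the offset -3 is neither positive nor a multiple of 4, so STR's unsigned-offset form could not express this store.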

LDUR/STUR allow accessing 32/64-bit values at offsets from the base register that are not a multiple of the operand size. For example, if the base register holds 0x50, a 32-bit value stored at address 0x52 can be reached with LDUR W3, [X1, #2].
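To make the 0x52 case concrete, here is a sketch that reads a 32-bit little-endian value at base 0x50 plus offset 2 from a bytearray standing in for memory (the memory contents are made up for illustration). It models only the address arithmetic; whether an unaligned access is actually permitted at run time depends on the memory type and on alignment checking being disabled.

```python
mem = bytearray(range(0x60))   # hypothetical memory: the byte at address a holds a
x1 = 0x50                      # base register
offset = 2                     # not a multiple of 4, so only LDUR can encode it
address = x1 + offset          # 0x52
w3 = int.from_bytes(mem[address:address + 4], "little")
print(hex(address), hex(w3))   # 0x52 0x55545352
```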


In your example,

    LDUR R3, [R1, #8]

this instruction loads into R3 the value stored at the address held in R1 plus 8 bytes. This is what the ARM Reference Manual means by byte offset. So if R1 holds the value 0x50, this loads the value stored at address 0x58. The value of R1 is not modified. (In actual A64 syntax the registers are written as X or W registers, e.g. LDUR X3, [X1, #8] or LDUR W3, [X1, #8].)
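The "array" view mentioned in the question can be sketched in a few lines, assuming R1 holds the base address of an array. Because #8 is a byte offset, which element it reaches depends on the element size: element 1 of an array of 64-bit values, or element 2 of an array of 32-bit values. The helper name is hypothetical.

```python
def element_address(base: int, index: int, elem_size: int) -> int:
    """Byte address of element `index` in an array starting at `base`."""
    return base + index * elem_size


r1 = 0x50
ea = r1 + 8                                 # [R1, #8]: plain addition, no shift
assert element_address(r1, 1, 8) == ea      # 64-bit elements: #8 reaches element 1
assert element_address(r1, 2, 4) == ea      # 32-bit elements: #8 reaches element 2
print(hex(ea), hex(r1))                     # 0x58 0x50 (R1 is unchanged)
```

The same arithmetic with the question's decimal value 50 gives 58.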


The instruction LDR R3, [R1, #8] (the Unsigned offset variant of LDR (immediate)) performs the same operation; however, its prototype is different:

    // loads a 32-bit value
    LDR <Wt>, [<Xn|SP>{, #<pimm>}]

    // loads a 64-bit value
    LDR <Xt>, [<Xn|SP>{, #<pimm>}]

Here the immediate offset is a pimm, whereas LDUR uses a simm, so the offset is interpreted differently. pimm is a non-negative offset, and its range differs between the 32-bit and the 64-bit variants.

In the 32-bit version:

  • It ranges from 0 to 16380 and must be a multiple of 4

In the 64-bit version:

  • It ranges from 0 to 32760 and must be a multiple of 8

This means that for some offsets (non-negative multiples of the access size, up to 255), LDUR and LDR (immediate) produce the same operation.
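A short sketch that enumerates that overlap, assuming the ranges above (LDUR: -256 to 255 in byte steps; LDR unsigned offset: 0 to 16380 in steps of 4, or 0 to 32760 in steps of 8). The predicate names are hypothetical.

```python
def ldur_ok(offset: int) -> bool:
    return -256 <= offset <= 255                                 # signed 9-bit, any byte offset


def ldr_uoff_ok(offset: int, size: int) -> bool:
    return 0 <= offset <= 4095 * size and offset % size == 0     # unsigned 12-bit, scaled


for size in (4, 8):
    both = [off for off in range(-256, 256) if ldur_ok(off) and ldr_uoff_ok(off, size)]
    print(size, len(both), both[0], both[-1])
# 4 64 0 252
# 8 32 0 248
```

Only these offsets can be written either way; everything else in LDUR's range needs LDUR, and everything above 255 needs LDR.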

Solution 2:[2]

From Arm® A64 Instruction Set Architecture: Armv8, for Armv8-A architecture profile:

    if HaveMTEExt() then
        boolean is_load_store = MemOp_LOAD IN {MemOp_STORE, MemOp_LOAD};
        SetNotTagCheckedInstruction(is_load_store && n == 31);

    bits(64) address;
    bits(datasize) data;

    if n == 31 then
        CheckSPAlignment();
        address = SP[];
    else
        address = X[n];

    address = address + offset;

    data = Mem[address, datasize DIV 8, AccType_NORMAL];
    X[t] = ZeroExtend(data, regsize);

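This pseudocode shows how the offset is applied: it is simply added to the base register (X[n], or SP when n == 31), and the data is loaded from the resulting address and zero-extended into X[t].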
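A rough Python transliteration of that pseudocode, assuming a bytearray for memory, little-endian data, and a list of 31 integers for X0..X30. The MTE tag-check lines are omitted, CheckSPAlignment is reduced to an unconditional 16-byte test (the real check applies only when SP alignment checking is enabled), and the function name is hypothetical. offset is taken to be the already sign-extended simm.

```python
MASK64 = (1 << 64) - 1


def ldur_exec(x: list, sp: int, mem: bytearray, t: int, n: int,
              offset: int, datasize: int) -> None:
    """Rough model of the LDUR pseudocode; x holds X0..X30, n == 31 selects SP."""
    if n == 31:
        if sp % 16 != 0:                      # simplified CheckSPAlignment()
            raise RuntimeError("SP alignment fault")
        address = sp
    else:
        address = x[n]
    address = (address + offset) & MASK64     # bits(64) arithmetic wraps around
    nbytes = datasize // 8                    # datasize DIV 8
    data = int.from_bytes(mem[address:address + nbytes], "little")
    if t != 31:                               # Rt == 31 is XZR/WZR: result discarded
        x[t] = data                           # ZeroExtend: upper bits become 0


x = [0] * 31
x[1] = 0x50
mem = bytearray(range(256))
ldur_exec(x, 0x80, mem, t=3, n=1, offset=8, datasize=64)   # LDUR X3, [X1, #8]
print(hex(x[3]), hex(x[1]))   # 0x5f5e5d5c5b5a5958 0x50
```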

Solution 3:[3]

In unsigned-offset mode, LDR's immediate must be aligned: a multiple of 4 for Wt or a multiple of 8 for Xt.

If 0 <= imm <= 255 and imm % 4 == 0, stur Wt, [Xn|SP, #imm] is equivalent to str Wt, [Xn|SP, #imm] (32-bit register variant).

If 0 <= imm <= 255 and imm % 8 == 0, stur Xt, [Xn|SP, #imm] is equivalent to str Xt, [Xn|SP, #imm] (64-bit register variant).

LDR cannot address offsets from -256 to 255 byte by byte while keeping the base register unchanged. That's what LDUR does.
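To show what "keeping the base register unchanged" means, here is a sketch (hypothetical function name) of the address accessed and the base register value afterwards for LDUR, LDR pre-index ([Xn, #simm]!) and LDR post-index ([Xn], #simm). The pre/post-index forms also accept byte offsets from -256 to 255, but they write the updated address back to the base register. Only the address arithmetic is modelled, not the memory access.

```python
def load_addressing(mode: str, base: int, simm: int) -> tuple:
    """Return (address accessed, base register afterwards) for an immediate load."""
    if mode == "ldur":                   # LDUR Xt, [Xn, #simm]
        return base + simm, base         # no writeback
    if mode == "pre":                    # LDR Xt, [Xn, #simm]!
        return base + simm, base + simm  # base updated, then used
    if mode == "post":                   # LDR Xt, [Xn], #simm
        return base, base + simm         # base used, then updated
    raise ValueError(f"unknown mode: {mode}")


for mode in ("ldur", "pre", "post"):
    address, new_base = load_addressing(mode, 0x50, -3)
    print(f"{mode:4} address={address:#x} base={new_base:#x}")
```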

Solution 4:[4]

Instruction Meaning

LDUR R3, [R1, #8]

if:

R1=50

then:

  • [R1, #8] = the memory at address 58 (= 50 + 8)
  • LDUR R3, [R1, #8] = load the value at address 58 into register R3

Q&A

but what am I taking from R1 and how does the offset operate?

The offset #8 is simply added to R1: R1 + 8 = 50 + 8 = 58.

Is it like a logical shift?

No, there is no logical shift involved.

The ARM manual describes it as "byte offset" but then doesn't describe how that offset functions on R1.

The ARM manual entry for LDUR gives the full description:

LDUR Wt, [Xn|SP{, #simm}] ; 32-bit general registers ... simm Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.

I had the same confusion as you; now it's clear:

Detail Explanation

LDR

Function: Load Register (immediate)

Syntax:

    LDR Wt, [Xn|SP], #simm   ; 32-bit general registers, Post-index
    LDR Xt, [Xn|SP], #simm   ; 64-bit general registers, Post-index
    LDR Wt, [Xn|SP, #simm]!  ; 32-bit general registers, Pre-index
    LDR Xt, [Xn|SP, #simm]!  ; 64-bit general registers, Pre-index

    LDR Wt, [Xn|SP{, #pimm}] ; 32-bit general registers, Unsigned offset
    LDR Xt, [Xn|SP{, #pimm}] ; 64-bit general registers, Unsigned offset

->

  • 32-bit general registers
    • simm
      • Post-index
        • LDR Wt, [Xn|SP], #simm
      • Pre-index
        • LDR Wt, [Xn|SP, #simm]!
    • pimm (Unsigned offset)
      • LDR Wt, [Xn|SP{, #pimm}]
  • 64-bit general registers
    • simm
      • Post-index
        • LDR Xt, [Xn|SP], #simm
      • Pre-index
        • LDR Xt, [Xn|SP, #simm]!
    • pimm (Unsigned offset)
      • LDR Xt, [Xn|SP{, #pimm}]

->

  • 32-bit / 64-bit general registers
    • simm
      • Post-index
        • LDR Wt/Xt, [Xn|SP], #simm
      • Pre-index
        • LDR Wt/Xt, [Xn|SP, #simm]!
    • pimm (Unsigned offset)
      • LDR Wt/Xt, [Xn|SP{, #pimm}]

Note:

  • value ranges
    • simm: -256 ~ 255
    • pimm
      • 32-bit: 0 ~ 16380
        • but it must be a multiple of 4, that is: pimm % 4 == 0
      • 64-bit: 0 ~ 32760
        • but it must be a multiple of 8, that is: pimm % 8 == 0

LDUR

Function: Load register (unscaled offset)

Syntax:

    LDUR  Wt, [Xn|SP{, #simm}]    ; 32-bit general registers
    LDUR  Xt, [Xn|SP{, #simm}]    ; 64-bit general registers

->

  • 32-bit / 64-bit general registers
    • LDUR Wt/Xt, [Xn|SP{, #simm}]

Note:

  • value range
    • simm: -256 ~ 255

LDUR vs LDR

Let's talk about LDR first:

It supports three addressing forms:

  • The first form: LDR Wt/Xt, [Xn|SP], #simm
    • called: Post-index
  • The second form: LDR Wt/Xt, [Xn|SP, #simm]!
    • called: Pre-index
  • The third form: LDR Wt/Xt, [Xn|SP{, #pimm}]
    • called: Unsigned offset

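Note:

The most commonly used form in practice is the third one, with the immediate inside the brackets, e.g.:

    ldur q0, [x19, #0xa8]

that is, the operand has the form:

    [register name, #immediate]

LDUR supports only this third form; it has no pre-index or post-index variants. (In this example, 0xa8 = 168 is not a multiple of 16, the access size of the 128-bit q0 register, so LDR's scaled offset could not encode it.)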

Then comes the difference:

  • The third LDR form: LDR Wt/Xt, [Xn|SP{, #pimm}]
    • pimm value range depends on the register size:
      • 32-bit: 0 ~ 16380, and pimm % 4 == 0 (a multiple of 4)
      • 64-bit: 0 ~ 32760, and pimm % 8 == 0 (a multiple of 8)
      • Key point: it MUST be a multiple of 4 or 8
  • LDUR: LDUR Wt/Xt, [Xn|SP{, #simm}]
    • simm value range: -256 ~ 255
      • Important: it does NOT need to be a multiple of 4 or 8

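-> The requirement that the offset be a multiple of 4 (or 8) is what is meant by:

offset (i.e. imm here) is scaled by 4 (or 8)

that is, the instruction encodes imm / 4 (or imm / 8), and the processor multiplies it back when computing the address.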

-> Here:

  • scaled = the offset must be a multiple of the access size
    • Must be a multiple of 4 or 8
      • So the offset can only move in steps of 4 or 8 bytes
  • unscaled = the offset need not be a multiple of the access size
    • Not necessarily a multiple of 4 or 8
      • So the offset can move in steps of a single byte, i.e. byte-by-byte addressing

->

  • LDR = LoaD Register; its unsigned-offset form takes a scaled immediate
  • LDUR = LoaD Register (Unscaled); it takes an unscaled immediate

-> that is, LDR and LDUR share the same syntax:

    LDR/LDUR Wt/Xt, [Xn|SP{, #imm}]

the core differences are:

  • value of imm
    • The value range is different
      • LDR: relatively large (32-bit 0~16380, 64-bit 0~32760), and never negative
      • LDUR: relatively small, only -256~255, and can be positive or negative
    • The alignment requirement is different
      • LDR: imm must be a multiple of the access size
        • Depending on the register size, 32-bit (Wt) or 64-bit (Xt): 4 or 8
          • 32-bit: imm%4==0
          • 64-bit: imm%8==0
      • LDUR: imm does not need to be a multiple of anything

What the Difference Means in Practice

So:

  • Q: What does the difference between LDR and LDUR mean in actual usage?
  • A: If you need to address byte by byte (within -256 to 255), or need a negative offset without changing the base register, you can only use LDUR: the smallest step of its imm is 1 byte.
    • LDR (unsigned offset) cannot be used there, because its imm must be a multiple of 4 or 8 and cannot be negative.
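That rule can be sketched as a hypothetical helper, assuming a 32-bit (size 4) or 64-bit (size 8) load and the ranges listed above. Preferring LDR when both forms fit is just a choice here, since both encode the same operation; None means neither immediate form fits, so the address would have to be formed another way (for example a register-offset load, or an ADD first).

```python
def pick_load(offset: int, size: int):
    """Immediate-offset load that can encode `offset` for a 4- or 8-byte access."""
    if 0 <= offset <= 4095 * size and offset % size == 0:
        return "LDR"     # scaled, unsigned 12-bit immediate
    if -256 <= offset <= 255:
        return "LDUR"    # unscaled, signed 9-bit immediate
    return None          # neither immediate form fits


# Stepping through a buffer one byte at a time with 32-bit loads:
print([pick_load(off, 4) for off in range(8)])
# ['LDR', 'LDUR', 'LDUR', 'LDUR', 'LDR', 'LDUR', 'LDUR', 'LDUR']
```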

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 schspa
Solution 3
Solution 4 crifan