
Bare metal, Assembly language • Re: SIMD LDR from device memory

Yeah, I agree that a Q sized LDR would be inappropriate here, but I only mentioned that as a side note and to further demonstrate the oddity.

The instructions GCC is generating are 32-bit, S-sized LDRs, and the size isn't the issue: it's the addressing. If you look at the example asm code above, an offset is included with the base address as the program attempts to read each of the subsequent 32-bit registers in turn (in the example, it actually begins at the last address and decrements to the first). The problem is that each read, even though it has a valid offset, always returns the first 32 bits of the 128-bit address range!

In other words, if the base address (loaded into X1) is 0x3f202010 and you do an LDR S1, [X1], that will work, but an LDR with a 4-byte offset will STILL return the first 32 bits. I.e., LDR S1, [X1, #4] will not load the value at 0x3f202014, it will STILL load the value at 0x3f202010! Taken from the opposite direction, if the base address is 0x3f20201c and you do an LDR S1, [X1], you will still get the data from 0x3f202010! This is the weirdness.

Again, this only seems to happen in device type memory as offsets work fine in regular memory or off the stack (as do D or Q sized loads but that's merely incidental).

I don't think this is a high level issue, or even a compiler issue - it's something specific to device memory type and SIMD LDR/LDUR instructions in ARMv8-A.

I guess I could just ask whether anyone else is successfully doing bare metal with -O2 (or -O3), or just with the -ftree-slp-vectorize optimization enabled, and happens to see any LDRs to SIMD registers from device memory regions?

My specific example is retrieving the SD card CID data from the RPi 3B SDHOST device after a valid CMD2, from the SDRSP0, SDRSP1, SDRSP2 and SDRSP3 registers. The card's OEM ID and Product Name are 7 ASCII characters in total (2 + 5 in the constructor below), and their contents are split across the SDRSP3 and SDRSP2 registers, so I have several shift-right and bit-mask operations in C++ to pull out the values. GCC is loading both registers into separate SIMD vectors and then performing those bitwise operations on the SIMD registers.

My SDCardCID constructor:

Code:

SDCardCID(unsigned int b127_96, unsigned int b95_64, unsigned int b63_32, unsigned int b31_0)
    : manuf_id(b127_96 >> 24),
      rev_major(b63_32 >> 28),
      rev_minor(b63_32 >> 24 & 0xF),
      serial_no((b63_32 << 8) | (b31_0 >> 24)),
      manuf_month(b31_0 >> 8 & 0xF),
      manuf_year(b31_0 >> 12 & 0xFF)
{
    oem_id[0] = b127_96 >> 16 & 0xFF;
    oem_id[1] = b127_96 >> 8 & 0xFF;
    prod_name[0] = b127_96 & 0xFF;
    prod_name[1] = b95_64 >> 24 & 0xFF;
    prod_name[2] = b95_64 >> 16 & 0xFF;
    prod_name[3] = b95_64 >> 8 & 0xFF;
    prod_name[4] = b95_64 & 0xFF;
}

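As a quick host-side sanity check of the shifts in the constructor, here is a minimal sketch that applies the same masks to four arbitrary 32-bit words. The names `CidFields`, `decode_cid` and `check_sample`, the field types, and the sample values are all assumptions made for illustration; they are not taken from the real class and are not real card data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain struct mirroring the fields the constructor fills in.
// The field types are assumed; the real class may declare them differently.
struct CidFields {
    std::uint8_t  manuf_id;
    char          oem_id[2];
    char          prod_name[5];
    std::uint8_t  rev_major;
    std::uint8_t  rev_minor;
    std::uint32_t serial_no;
    std::uint8_t  manuf_month;
    std::uint8_t  manuf_year;
};

// Same shifts and masks as the constructor, written as a free function.
CidFields decode_cid(std::uint32_t b127_96, std::uint32_t b95_64,
                     std::uint32_t b63_32, std::uint32_t b31_0)
{
    CidFields c{};
    c.manuf_id     = static_cast<std::uint8_t>(b127_96 >> 24);
    c.oem_id[0]    = static_cast<char>(b127_96 >> 16 & 0xFF);
    c.oem_id[1]    = static_cast<char>(b127_96 >> 8 & 0xFF);
    c.prod_name[0] = static_cast<char>(b127_96 & 0xFF);
    c.prod_name[1] = static_cast<char>(b95_64 >> 24 & 0xFF);
    c.prod_name[2] = static_cast<char>(b95_64 >> 16 & 0xFF);
    c.prod_name[3] = static_cast<char>(b95_64 >> 8 & 0xFF);
    c.prod_name[4] = static_cast<char>(b95_64 & 0xFF);
    c.rev_major    = static_cast<std::uint8_t>(b63_32 >> 28);
    c.rev_minor    = static_cast<std::uint8_t>(b63_32 >> 24 & 0xF);
    c.serial_no    = (b63_32 << 8) | (b31_0 >> 24);
    c.manuf_month  = static_cast<std::uint8_t>(b31_0 >> 8 & 0xF);
    c.manuf_year   = static_cast<std::uint8_t>(b31_0 >> 12 & 0xFF);
    return c;
}

// Arbitrary sample words (not real card data): OEM "AB", product name "HELLO".
void check_sample()
{
    const CidFields c = decode_cid(0x01414248u, 0x454C4C4Fu,
                                   0x12345678u, 0x9A018500u);
    assert(c.manuf_id == 0x01);
    assert(c.oem_id[0] == 'A' && c.oem_id[1] == 'B');
    assert(c.prod_name[0] == 'H' && c.prod_name[4] == 'O');
    assert(c.serial_no == 0x3456789Au);
}
```

If the four inputs all carry the same word, as happens with the bad loads described in this post, every field is decoded from that one word, which is why the result comes out wrong rather than merely shifted.
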
The resulting assembly (via objdump):

Code:

mov     x1, #0x201c
movk    x1, #0x3f20, lsl #16
mov     x0, x1
ldr     s2, [x0], #-4
ldur    s1, [x1, #-4]
ldur    w2, [x0, #-4]
ldur    w1, [x0, #-8]
adrp    x3, c5000 <irq_handlers+0x118>
add     x3, x3, #0x520
add     x0, x3, #0x10
ushr    v0.2s, v2.2s, #24
ushr    v7.2s, v2.2s, #16
ushr    v6.2s, v2.2s, #8
ushr    v5.2s, v1.2s, #24
ushr    v4.2s, v1.2s, #16
ushr    v3.2s, v1.2s, #8
mov     v0.b[1], v7.b[0]
mov     v0.b[2], v6.b[0]
mov     v0.b[3], v2.b[0]
mov     v0.b[4], v5.b[0]
mov     v0.b[5], v4.b[0]
mov     v0.b[6], v3.b[0]
mov     v0.b[7], v1.b[0]
str     d0, [x3, #16]
lsr     w3, w2, #28
strb    w3, [x0, #8]
ubfx    x3, x2, #24, #4
strb    w3, [x0, #9]
extr    w2, w2, w1, #24
str     w2, [x0, #12]
ubfx    x2, x1, #8, #4
strb    w2, [x0, #16]
lsr     w1, w1, #12
strb    w1, [x0, #17]

You can see gcc trying to vectorize the SDRSP3 (0x3f20201c) and SDRSP2 (0x3f202018) addresses into S2 and S1 respectively. From there, it performs various shift-rights on the SIMD registers before storing them. Problem is, S2 and S1 always get the value of SDRSP0 (0x3f202010) as described above so the result is always bad!
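One possible way to keep the loads in general-purpose registers is to do each register read through a small inline-assembly helper, since an `"=r"` output constraint cannot be assigned an S/D/Q register. The following is only a sketch under stated assumptions: `mmio_read32`, `CidWords` and `read_cid_words` are hypothetical names, the non-AArch64 branch exists only so the code builds on a host machine, and it has not been verified on hardware that this avoids the behaviour described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read one 32-bit register through a W register.
static inline std::uint32_t mmio_read32(std::uintptr_t addr)
{
#if defined(__aarch64__)
    std::uint32_t value;
    // "=r" requests a general-purpose register; %w0 prints its 32-bit name,
    // so the emitted instruction is "ldr wN, [xM]" rather than "ldr sN, ...".
    asm volatile("ldr %w0, [%1]" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    // Host fallback (assumption): a plain volatile read, for off-target builds.
    return *reinterpret_cast<volatile std::uint32_t*>(addr);
#endif
}

// The four response words, named after the registers in this post.
struct CidWords {
    std::uint32_t rsp0, rsp1, rsp2, rsp3;
};

// sdrsp0_addr is the address of SDRSP0 (0x3f202010 in this post); SDRSP1..3
// follow at 4-byte steps (0x3f202014, 0x3f202018, 0x3f20201c).
CidWords read_cid_words(std::uintptr_t sdrsp0_addr)
{
    assert(sdrsp0_addr % 4 == 0);  // 32-bit registers are word aligned
    CidWords w{};
    w.rsp0 = mmio_read32(sdrsp0_addr + 0x0);
    w.rsp1 = mmio_read32(sdrsp0_addr + 0x4);
    w.rsp2 = mmio_read32(sdrsp0_addr + 0x8);
    w.rsp3 = mmio_read32(sdrsp0_addr + 0xC);
    return w;
}
```

With this, the constructor call would look something like `SDCardCID cid(w.rsp3, w.rsp2, w.rsp1, w.rsp0);` after `CidWords w = read_cid_words(0x3f202010);`, matching the register-to-argument order visible in the assembly. Any vectorized shifts would then operate on values already loaded through W registers.
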

Cheers!

Statistics: Posted by willdieh — Thu May 16, 2024 10:34 pm


