My understanding is that it’s my responsibility as the callee to preserve r4, so the push/pop really IS needed. And as far as I can tell there is no way to convince C to only promote one of the operands, so any attempt to do 32x32=>64 turns into 64x64=>64 - the compiler keeps spitting out 6 multiplies no matter how I tweak the C, hence resorting to assembly code.
https://gcc.gnu.org/onlinedocs/gcc/Exte ... -Registers
You can also use inline assembler, get rid of the PUSH/POP, and let the compiler deal with register allocation (then you do not really need to preserve R4 yourself).
When assessing timings, you should take into account the cycles spent, not the number of instructions.
PUSH, POP, branches (call, return), and loads/stores to RAM take at least 2 cycles each.
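For comparison with the hand-written routine, here is a minimal plain-C sketch of a 32x32=>64 unsigned multiply built from four 16x16=>32 partial products. Neither operand is promoted to 64 bits before multiplying, so only 32-bit multiplies are requested, and register allocation is left entirely to the compiler. The name mul32x32_64 is hypothetical, and I have not checked what any particular compiler emits for it, so look at the disassembly and count cycles before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original code): 32x32=>64 unsigned
 * multiply using only 32-bit multiplies on 16-bit halves. */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    /* Four partial products; each one fits in 32 bits. */
    uint32_t ll = al * bl;
    uint32_t lh = al * bh;
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    /* Sum the pieces that land in bits 16..31; the total stays below 2^18,
     * so it cannot overflow, and its upper bits are the carry into hi. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

The final shift and OR only combine two 32-bit halves into the 64-bit return value; it is not a multiply.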
Statistics: Posted by gmx — Fri Sep 12, 2025 8:58 pm