Channel: Raspberry Pi Forums

General • TFLite micro on RP2040

Hey all,
I'm trying to implement the classic micro speech example from TFLite Micro on an RP2040.

I trained the model on different positive keywords and got it working well on an Arduino Nano 33 BLE. When I ported it to the RP2040, the accuracy dropped considerably: roughly ~20% compared to ~70% on the Arduino.

The general code is:
- get audio from mic
- FFT on audio
- pass this into Neural Net
- classify results (basic averaging and sliding window stuff)
(my code here: https://github.com/dmckinnon/mlcard)
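Roughly, the last step looks like this (a simplified sketch; the class name, window size, and threshold here are illustrative, not the exact code in the repo):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <deque>

// Sketch of the "basic averaging and sliding window" step: keep the last N
// score vectors from the model, average them per category, and report a
// keyword only when the averaged score clears a threshold.
constexpr int kCategoryCount = 13;       // matches the 13 scores in the logs below
constexpr int kWindowSize = 4;           // illustrative: number of recent inferences to average
constexpr int kDetectionThreshold = 120; // illustrative: tune per model

class SlidingWindowClassifier {
 public:
  // Push one raw score vector from an inference; returns the index of the
  // detected category, or -1 if nothing clears the threshold yet.
  int Push(const std::array<uint8_t, kCategoryCount>& scores) {
    window_.push_back(scores);
    if (static_cast<int>(window_.size()) > kWindowSize) window_.pop_front();

    int best_index = -1;
    int best_score = 0;
    for (int c = 0; c < kCategoryCount; ++c) {
      int sum = 0;
      for (const auto& s : window_) sum += s[c];
      int avg = sum / static_cast<int>(window_.size());
      if (avg > best_score) { best_score = avg; best_index = c; }
    }
    return best_score >= kDetectionThreshold ? best_index : -1;
  }

 private:
  std::deque<std::array<uint8_t, kCategoryCount>> window_;
};
```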

I have the same microphone setup on each board, and I have recorded sound from the Arduino mic and played it back on the RP2040, with the same degraded results. So the problem is in the FFT preprocessing, TFLite Micro itself, or the output classification.
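If the FFT preprocessing is the suspect, one way to isolate it is to compare the fixed-point spectrogram output against a slow floating-point ground truth on the same audio slice. A naive DFT cross-check (my own helper, not part of the micro_speech code; far too slow for real-time use):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Naive O(N^2) DFT magnitude spectrum. Feed the same PCM slice into this and
// into the fixed-point FFT path, then compare peak bins and relative shapes;
// large disagreement points at the preprocessing rather than the model.
std::vector<double> DftMagnitude(const std::vector<double>& x) {
  const std::size_t n = x.size();
  const double pi = std::acos(-1.0);
  std::vector<double> mag(n / 2 + 1);
  for (std::size_t k = 0; k <= n / 2; ++k) {
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < n; ++t) {
      const double angle = 2.0 * pi * static_cast<double>(k * t) / n;
      re += x[t] * std::cos(angle);
      im -= x[t] * std::sin(angle);
    }
    mag[k] = std::sqrt(re * re + im * im);
  }
  return mag;
}
```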

The code is identical for the entire stack apart from audio capture; I even copy-pasted the model binary. Both chips are Cortex-M based, although the Arduino Nano uses an nRF chip, which is a Cortex-M4 with CMSIS support; that is the only meaningful difference I can think of within TFLite Micro.
But there are reference implementations of each op, so while the RP2040 won't be as fast, it should be just as accurate.
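One cheap way to test that assumption: feed both boards the same hard-coded input tensor and hash the raw output tensor bytes; if the kernels really are bit-exact, the hashes match. A minimal FNV-1a sketch (my own helper, not a TFLM API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 32-bit FNV-1a hash over a byte buffer. Print this for the model's input
// and output tensor data on both boards after a single inference on
// identical, hard-coded input; differing hashes mean the kernels (or the
// tensor layout/quantization params) are not bit-exact between platforms.
uint32_t Fnv1a(const uint8_t* data, std::size_t len) {
  uint32_t h = 2166136261u;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= 16777619u;  // FNV prime
  }
  return h;
}
```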

My question, then: could CMSIS compute things sufficiently differently to account for this difference in output? If not, what else could it be? Are any TFLite Micro experts on here willing to assist?

EDIT:
I wrote all of the above before checking a few more things; now I have more data. On the Arduino, many sets of features are aggregated under PopulateFeatureData before any inference. The inference then gives:

Average scores: 54, 5, 1, 30, 1, 0, 2, 5, 3, 139, 0, 4, 6

Heard eight (139) @1248ms


Whereas on the RP2040, it performs inference on a single feature vector, gets scores, generates just one more feature vector (rather than aggregating many), and then gets:
Average scores: 101, 4, 0, 20, 1, 0, 4, 3, 3, 92, 0, 2, 19

General inference, heard silence (101) @1248ms

The fourth-from-last category ("eight") spikes on both, but on the RP2040 the first category, "silence", spikes too, and a little higher.
So there are two questions:
- Why, if the code is the same, are features being aggregated differently? I can't figure this one out.
- If inference is operating on the same feature vectors (given the first point, maybe it isn't), why are the outputs different?
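For context on the first question: as I understand it, the stock micro_speech feature provider decides how many 20 ms slices to refresh based on elapsed time since the last call, so loop timing alone can change how many features are aggregated per inference. A simplified sketch of that logic (names and constants are mine, not TFLM's):

```cpp
#include <algorithm>
#include <cassert>

// Sketch of micro_speech-style feature-provider timing: each call refreshes
// only the 20 ms slices that have elapsed since the previous call. A loop
// that runs every 40 ms refreshes ~2 slices per inference, while a loop that
// first runs after 1+ s refreshes the entire window at once - which could
// explain the Arduino aggregating many slices before its first inference.
constexpr int kSliceDurationMs = 20;
constexpr int kSliceCount = 49;  // ~1 s spectrogram window in the stock example

int NewSlicesSince(int last_time_ms, int current_time_ms) {
  const int slices = (current_time_ms / kSliceDurationMs) -
                     (last_time_ms / kSliceDurationMs);
  return std::min(slices, kSliceCount);
}
```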

Does anyone have any insight into TFLite Micro that could help? I know this is a broad question, but I have looked at the Gitter group for TFLM and it looks ... dead, so I'm hoping to find knowledge elsewhere.

Statistics: Posted by mckinnonbuilding — Thu May 15, 2025 4:13 am


