> CPU resource is finite. Does htop report the CPU being fully utilised?

Nope. And since NEON is just another instruction set, htop counts it as regular CPU usage, right? Surely the CPU can't be fully utilised on the NEON side but not on the "regular" side?
> I don't know how complex your NEON code is, but as a very quick initial test, try allocating a secondary buffer and doing a memcpy of the V4L2 data into it before calling the NEON code. Your secondary buffer will naturally be cached, so the NEON code will run more efficiently should it need to load the data more than once.
>
> If that does help, then you can look at doing the same tricks that libcamera does to allocate the buffers via the dma-heaps, and then import them into V4L2. Do be aware that you then need to use the dmabuf sync ioctls to ensure caches get flushed/invalidated at appropriate points.

I'm actually already using DMA buffers (from /dev/dma_heap/linux,cma), but I wasn't synchronizing them (I didn't know about that). The synchronization isn't about speed though, right? It's more about data integrity?
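For reference, the two ideas above (copying the frame into a cached secondary buffer before the NEON code runs, and bracketing CPU access to a dmabuf with the sync ioctls) can be combined roughly as below. This is a minimal sketch, assuming the frame already lives in a dmabuf that has been mmap()ed; `dmabuf_sync` and `copy_frame_to_scratch` are hypothetical helper names, while `DMA_BUF_IOCTL_SYNC`, `struct dma_buf_sync` and the `DMA_BUF_SYNC_*` flags come from the kernel header `<linux/dma-buf.h>`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC on a dmabuf fd. `flags` is DMA_BUF_SYNC_START
 * or DMA_BUF_SYNC_END combined with DMA_BUF_SYNC_READ/WRITE/RW.
 * Retries when the call is interrupted (EINTR) or asks to be retried (EAGAIN). */
static int dmabuf_sync(int fd, uint64_t flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* Hypothetical helper: copy one frame from a mapped dmabuf into an ordinary
 * (cached) scratch buffer, bracketing the CPU read with START/END syncs so
 * the CPU sees the data the device wrote. Pass dmabuf_fd = -1 to skip the
 * syncs when `mapped` is plain memory (e.g. in a test). */
static int copy_frame_to_scratch(int dmabuf_fd, const uint8_t *mapped,
                                 uint8_t *scratch, size_t len)
{
    if (dmabuf_fd >= 0 &&
        dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
        return -1;

    memcpy(scratch, mapped, len);

    if (dmabuf_fd >= 0 &&
        dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ) == -1)
        return -1;

    return 0;
}
```

Once `copy_frame_to_scratch` returns 0, the NEON routine would run on `scratch` instead of the mapped buffer. Allocating `scratch` once (e.g. with malloc) and reusing it for every frame avoids a per-frame allocation; whether the extra memcpy pays off depends on how many times the NEON code reads the data.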
Anyway, this is good enough for my use case.
Thanks a lot!
Posted by f_cam, Mon Jul 07, 2025 3:29 pm