30 sec boot time vs deep sleep
-
@Xykon @iwahdan Any tips on how to handle this?
The code that we run on the LoPy grew and grew over time as more features got added.
Now the boot time is a whopping 30 seconds.
The next request is to make it low-power, so I'm looking into sleep modes.
It looks like "light sleep" is not really very low power at 800uA or so.
"Deep sleep" is much better, but only the SRAM in the RTC domain is retained, so when waking up, we'll have a 30 second lag to load up all the code again, while all we might need to do is respond to an interrupt that can be handled in a couple milliseconds (or perhaps even less than that...)
I read in another thread (https://forum.pycom.io/topic/3708/improve-boot-time) that people are struggling as well, even with boot times as "short" as a couple seconds.
I'm already using mpy-cross to produce .mpy files, to skip parsing .py files on the device to shave off some of the load time.
Ideally, I'd just run directly from the file system without loading into the slow QSPI-PSRAM first. Unfortunately, even though the .mpy code could be read-only (and live in the QSPI flash / file system), MicroPython does not support running from ROM (see https://github.com/micropython/micropython/issues/4124 for reasons).
Another idea I had was to try to modify the firmware to keep the PSRAM alive while the ESP32 is in deep sleep (I'm assuming it cuts power to it, but I haven't checked). According to the datasheet (http://gamma.spb.ru/images/pdf/esp-psram32_datasheet_en.pdf), the standby current is only 50uA. A challenge there would be to get the main SRAM "in sync" again with the contents of the PSRAM after waking up from deep sleep.
-
Dear @martijnthe,
we have been able to get the boot time of our Terkin-Datalogger [1] from about 25 seconds down to about 3 seconds by freezing the modules as outlined by @robert-hh. The code base, including all third-party modules, is around 10 kLOC, and we have released the result as builds called Annapurna [2].
With kind regards,
Andreas.
[1] https://github.com/hiveeyes/terkin-datalogger
[2] https://community.hiveeyes.org/t/annapurna-firmware-for-pycom-esp32/2754
-
I'd like to take a look at this for you over the Christmas holidays, as 30 seconds really is abnormally long.
If your code could make its way over to paul@pycom.io with a link to this thread, I'll keep it private and see what I can do :)
-
@jcaron I do not know how large the code is. So 30k lines is just an extrapolation from the times I have seen. If the code is much smaller and still takes 30 seconds to start, it's likely not just loading, but also a lot of code being executed.
-
@robert-hh which brings me back to my earlier point: who needs 30K lines of code on a battery-powered IoT sensor? Does it really need to run on the device itself, rather than on a server?
Also note that the more code you have on the device, the more difficult updates are going to be, especially if you use very low bandwidth networks (especially on the downlinks) such as LoRaWAN.
@martijnthe the Pycom modules try to bring the advantages of (Micro)Python to an embedded platform (which usually makes development easier and quicker), but this will obviously have drawbacks if you compare that to raw assembly running directly "on the metal". If you want to get the highest performance (in terms of speed, code size, memory use...) out of the devices, and have full control of the ESP32, you'll probably have to switch to C and/or assembler directly. And if you have something around 30K lines of MicroPython, that would be an even larger (and more difficult to maintain) codebase in C or assembler...
But do you really need that on an IoT sensor?
-
@martijnthe About frozen bytecode: The 30 seconds boot time lets me guess that you have about 30,000 lines of code. Freezing that may overflow the partition size, which would require reworking the partition table.
-
@robert-hh said in 30 sec boot time vs deep sleep:
Frozen bytecode would be faster because that is not moved to RAM, but stays in flash.
Thanks for that tip.
That sounds much more attractive than breaking up the code to only load what's necessary.
@reidfo said:
My own code is too large to deep sleep, and even with sleeps, LTE connection, and MQTT connection, plus several imports it only takes about 5-10 seconds before everything is ready.
TBH, 5-10 seconds latency to be able to respond to an external interrupt is unacceptably long. I'm used to microcontrollers where this takes only a couple microseconds!
I'm also more used to microcontrollers where you (as developer) can decide what portions of RAM you want to keep alive during sleep so that you don't lose everything when going to sleep. The ESP32 has a couple KB of "RTC memory" which is kept alive in deep sleep, but I'm not sure 1) whether it's accessible somehow from .py and 2) whether/for what it's currently used.
Thanks for all the suggestions. Ideas I may try:
- Bake the code into the firmware using "frozen bytecode" (still does not )
- Investigate keeping PSRAM alive and re-using its contents + look into RTC RAM.
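On the RTC RAM idea: whatever the storage call turns out to be, the state has to fit in a few KB of raw bytes, so it needs a compact fixed layout. A minimal sketch of that packing step, where the field layout, the MAGIC value and the function names are all hypothetical and only illustrate the idea:

```python
import struct

# Hypothetical layout: magic marker, wake counter, last sensor reading, flags.
STATE_FMT = "<HIhB"
MAGIC = 0xA55A  # hypothetical marker to detect uninitialised or stale memory


def pack_state(wake_count, last_reading, flags):
    """Serialise the state into a small fixed-size bytes object."""
    return struct.pack(STATE_FMT, MAGIC, wake_count, last_reading, flags)


def unpack_state(blob):
    """Return (wake_count, last_reading, flags), or None if blob is not valid."""
    if blob is None or len(blob) != struct.calcsize(STATE_FMT):
        return None
    magic, wake_count, last_reading, flags = struct.unpack(STATE_FMT, blob)
    if magic != MAGIC:
        return None
    return wake_count, last_reading, flags
```

Upstream MicroPython's ESP32 port exposes this kind of storage as machine.RTC().memory(); whether the Pycom firmware offers the same call would need checking.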
-
@martijnthe said in 30 sec boot time vs deep sleep:
@crumble said in 30 sec boot time vs deep sleep:
Check your code for all sleeps. Sounds like you wait a lot for the network inits and other stuff.
There are no sleeps. The 30 secs is just for the import statements to complete (imports of modules that only contain function and class definitions).
That's really a huge amount of time and code for an embedded device, especially if you want it to sleep. As suggested earlier, can you break up your code and only load what is necessary on each wakeup? My own code is too large to deep sleep, and even with sleeps, LTE connection, and MQTT connection, plus several imports it only takes about 5-10 seconds before everything is ready.
-
@crumble said in 30 sec boot time vs deep sleep:
Check your code for all sleeps. Sounds like you wait a lot for the network inits and other stuff.
There are no sleeps. The 30 secs is just for the import statements to complete (imports of modules that only contain function and class definitions).
-
@martijnthe That seems to be a pretty large code base if it is pre-compiled and still requires 30 seconds to load. Frozen bytecode would be faster because that is not moved to RAM, but stays in flash.
-
@jcaron said in 30 sec boot time vs deep sleep:
@martijnthe Are you sure the 30 seconds are really just loading the code?
Pretty sure. I measured before and after the imports in boot.py and main.py and that's where the time is going. The imports themselves don't "do" anything but declare classes and functions.
There's very little to no processing going on.
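For anyone wanting to repeat this kind of measurement per module, here is a rough sketch. The helper name timed_import is made up, and the module names in the loop are placeholders to be replaced with the application's own imports. It uses time.ticks_ms where available (MicroPython) and falls back to time.perf_counter on a desktop Python.

```python
import time

try:
    # MicroPython: millisecond tick counter, compared safely with ticks_diff
    _now = time.ticks_ms
    _diff = time.ticks_diff
except AttributeError:
    # CPython fallback so the sketch also runs on a desktop
    _now = lambda: int(time.perf_counter() * 1000)
    _diff = lambda end, start: end - start


def timed_import(name):
    """Import a top-level module by name and return (module, elapsed_ms)."""
    start = _now()
    module = __import__(name)
    return module, _diff(_now(), start)


# Placeholder module names; substitute the modules imported in boot.py / main.py.
for name in ("json", "os"):
    _mod, ms = timed_import(name)
    print(name, ms, "ms")
```

A module that is already loaded returns almost instantly, so the numbers are only meaningful on the first import after a reset.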
-
@martijnthe Are you sure the 30 seconds are really just loading the code?
If you have network init, sleeps, or large computations, those can probably be optimised. 30 seconds seems like a lot of work for an IoT device.
Also, I don't know what your actual use case is, but most IoT devices (especially battery-powered, with a long life) should really have a very basic scenario: wake up, collect data, send it, go to sleep. CPU use should be very limited. In my experience, there's more time waiting (e.g. for the RX window in the case of LoRaWAN, or for an IP address if using DHCP over Wi-Fi) than anything else.
If you have lots of processing that is done locally, you should probably think about moving that to the server side of things if at all possible. That's really where things should happen, not on the device.
-
Check your code for all sleeps. Sounds like you wait a lot for the network inits and other stuff.
-
@martijnthe said in 30 sec boot time vs deep sleep:
while all we might need to do is respond to an interrupt that can be handled in a couple milliseconds (or perhaps even less than that...)
Could you not check the wake reason
machine.wake_reason()
as the first thing you do, before importing and loading any other code? Load and run only a subset of the app to handle just the interrupt if that's why it woke up.
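A minimal sketch of that idea, assuming the Pycom form where machine.wake_reason() returns a (reason, pins) tuple and machine.PIN_WAKE marks a pin wake-up. The names dispatch, handle_interrupt, start_full_app and the app module are hypothetical stand-ins for your own code.

```python
try:
    import machine  # available on the device only
except ImportError:
    machine = None  # lets the sketch be read and tested on a desktop


def dispatch(reason, pin_wake, handle_interrupt, start_full_app):
    """Run the cheap handler on a pin wake-up, else start the full application."""
    if reason == pin_wake:
        return handle_interrupt()
    return start_full_app()


def handle_interrupt():
    # Hypothetical fast path: keep this free of heavy imports.
    return "interrupt handled"


def start_full_app():
    # Hypothetical slow path: the expensive import happens only here.
    import app  # hypothetical module holding the rest of the application
    return app.main()


if machine is not None:
    # Assumed Pycom behaviour: wake_reason() returns (reason, list_of_pins).
    reason, _pins = machine.wake_reason()
    dispatch(reason, machine.PIN_WAKE, handle_interrupt, start_full_app)
```

The point of the structure is that the heavy import sits inside start_full_app, so a pin wake-up never pays for it.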