boot.py reliability



  • If I run my user deepsleep code as boot.py on a battery-powered GPy, it's reliable. But if I try to run it as m2m.py from a different folder in flash, with boot.py as

    import machine, time, pycom
    hook='client/m2m.py'
    print('trying for', hook)
    pycom.heartbeat(False)
    pycom.rgbled(0xffffff)
    machine.main(hook)
    

    it eventually stops working with the white light on, so I know boot.py failed to run it. I know I've introduced an extra step, but this should work consistently, shouldn't it?



  • @robert-hh Thanks for the reply! So basically an unstable power supply can lead to flash errors. Good to know.
    (And you had it, I was using a Li-Ion 3.6V battery. I'll make sure to mention that next time)



  • @alexpul It looks like an unstable supply problem, caused by the battery and the breadboard connections. USB supplies 5V, the battery (assuming Li-Ion) about 3.6V, so the headroom for dealing with power fluctuations is smaller. You also did not mention the type and capacity of the battery.
    Using machine.main() does not affect the file system at all. It just stores a string in memory with the file name, which is executed instead of main.py, similar to an "import filename" at the end of boot.py.



  • I recently ran a test with the WiPy where it woke up, executed boot.py, which directed it to run "filename" using machine.main(filename), then went to deepsleep. It was connected to a battery (through a breadboard, not the expansion board). After some number of runs it stalled and when I looked at the files it had the same long list of '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00' files. I had run the script dozens of times before with the WiPy connected to a wall socket using a USB to UART cable and have never seen corrupt files like that. Is it the battery and machine.main() combo that does this?



  • I get these CDUP folders too on 1.19.0.b4. os.fsformat('/flash') fixes it but then it gets all jacked up again.



  • @robert-hh I used the large hammer as per your suggestion & load user software as main.py with the boot.py below.

    import machine, pycom
    red=0x7f0000; green=0x007f00; orange=0xff5100; blue=0x00007f; off=0x000000; white=0x7f7f7f; magenta=0x7f007f; cyan=0x007f7f; lowhite=(0x101010); yellow=(0x7f5100)
    hook='main.py'
    print('trying for', hook)
    pycom.heartbeat(False)
    pycom.rgbled(white)
    cycle=pycom.nvs_get('cycleboot'); cycle+=1; pycom.nvs_set('cycleboot', cycle)
    wdt=machine.WDT(timeout=20000)  # 20s for main.py to run & extend the wdt timeout
    machine.main(hook)
    

    Because there are independent cycle counters in main.py & boot.py, I can see how often boot.py fails to run main.py:

    Timestamp observed    Timestamp received    Value
    2019-01-17 8:24 AM    2019-01-17 8:24 AM    1 cycle boot fail found @ cycle 161
    2019-01-17 1:17 AM    2019-01-17 1:17 AM    1 cycle boot fail found @ cycle 87

    Results much as before.
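
    The two-counter idea can be sketched roughly as follows. This is only an illustration, not the code actually running on the device: a plain dict stands in for the Pycom NVS so the sketch runs anywhere, the key 'cyclemain' and the helpers bump() and missed_runs() are made-up names, and only 'cycleboot' comes from the boot.py above.

```python
# Stand-in for the device's non-volatile storage (assumption: on a GPy the
# counters would be read and written with pycom.nvs_get()/pycom.nvs_set()).
nvs = {}

def bump(key):
    """Increment a persistent counter, starting from 0 if it is missing."""
    nvs[key] = (nvs.get(key) or 0) + 1
    return nvs[key]

def missed_runs(boot_key='cycleboot', main_key='cyclemain'):
    """Number of boots counted by boot.py that main.py never matched."""
    return (nvs.get(boot_key) or 0) - (nvs.get(main_key) or 0)

# Simulate three wake-ups where main.py fails to start on the second one.
for main_ran in (True, False, True):
    bump('cycleboot')        # would be done in boot.py
    if main_ran:
        bump('cyclemain')    # would be done at the top of main.py

print('boot fails so far:', missed_runs())
```

    Whenever the difference between the two counters grows, a boot happened in which main.py was never started.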



  • @kjm Yes, when the file system was corrupted. You can rebuild it with os.mkfs. I would prefer to take the larger hammer and erase the whole flash, reload the firmware and the files.



  • @robert-hh So I just went to do some more work on this & os.listdir() returns

    x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', 'main.py', 'sys', 'lib', 'cert', 'boot.py', 'project.pymakr']
    

    All these identical extra directories have mysteriously appeared. They are listed in Filezilla as CDUP 31 31:63 followed by a string of little y characters with 2 dots over them. Have you ever seen anything like this before?
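
    For anyone wanting to spot this from a script rather than by eye, here is a minimal sketch. It assumes the bogus entries always contain NUL characters, as in the listing above; find_corrupt_entries() is a hypothetical helper name, and the sample list is a shortened copy of that listing.

```python
def find_corrupt_entries(names):
    """Return the directory entries whose names contain NUL characters."""
    return [name for name in names if '\x00' in name]

# Shortened copy of the os.listdir() output shown above.
listing = ['\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00',
           'main.py', 'sys', 'lib', 'cert', 'boot.py', 'project.pymakr']

corrupt = find_corrupt_entries(listing)
print(len(corrupt), 'corrupt entries found')
```

    On the device the list would come from os.listdir() instead of the hard-coded sample.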



  • @robert-hh So I looked into it. Heartbeat is started in main.c, ~ line 125. The code itself is in util/mperror.c, function mperror_heartbeat_signal(). It is called directly by the freertos_idle_hook, and I cannot see that it affects the WDT. Finally, it uses the RMT for controlling the LED, like the machine.rmt module. The only difference I see in the RMT usage is that in machine.rmt the call to rmt_write is enclosed in a MP_THREAD_GIL_EXIT(), MP_THREAD_GIL_ENTER() pair.
    So it seems that something else fails. Did you try adding print statements to boot.py, main.py or the like upon entry and exit?



  • @robert-hh So just an update where I'm at with this:

    1. If user code is present in flash as boot.py or main.py it eventually fails to run (usually after several hundred deepsleeps). When this happens the gpy drops into zombie repl mode (blue light flashing but no usb or adhoc wifi & wdt_on_boot disabled).
    2. If user code is present in flash as main.py & is explicitly run with boot.py via machine.main('main.py') it eventually fails to run but gpy does not drop into zombie repl mode.
    3. Case 2 means wdt_on_boot reboots it & we're good for another few hundred cycles. Hey, it's not pretty but it works!
    4. I have a hunch that this has something to do with lte since I've never seen user code fail to run when it's using wifi instead of lte, will confirm when sure about this.


  • @kjm I do not know whether the heartbeat interrupt affects the WDT, and I actually doubt that it does. I will have to look into the code to analyse it.



  • @robert-hh Sure would be nice if zombie heartbeat didn't kill pycom.wdt_on_boot(True), makes it a real cul-de-sac!



  • @kjm The respective code should be in esp32/mptask.c. Looking at it, that may happen if there is a problem executing boot.py or main.py which causes a FORCED_EXIT, whatever the cause (lines 278 and 300 of mptask.c). More analysis to be done.



  • @robert-hh I still can't get to the bottom of this. A couple of observations about the 4s blue flash mode the gpy drops into:

    1. It looks like regular blue flash mode (4s blue flashes) except it isn't. Can't log in on USB or the gpy ad hoc wifi.
    2. In regular blue flash mode I have to connect P12 to 3v3 then press the reset button to get safe boot. In this mode, connecting P12 to 3v3 starts the orange safe boot light flashing, with no need to press the reset button.
      Does this tell us anything about what might be happening to stop it running our code after a deepsleep & dropping into this zombie state instead?


  • @robert-hh REPL does not seem to feed the WDT in my experience. When in REPL mode my FiPy and GPy units will still reset themselves when the timer has elapsed if I haven't manually fed it.



  • @kjm said in boot.py reliability:

    import pycom; pycom.wdt_on_boot(True); pycom.wdt_on_boot_timeout(10000)

    I do not know. I would have to check whether the WDT is perhaps re-fed in REPL mode; that seems reasonable. But couldn't you just re-enable the WDT with the line above in boot.py? Then at least you know that it is enabled. Still, that would not affect re-feeding by the REPL.



  • @robert-hh I used

    import pycom; pycom.wdt_on_boot(True); pycom.wdt_on_boot_timeout(10000)
    

    from the command line to set up a 10s reboot if our code is not feeding the dog. I figured that if the GPy booted into the default 4s flash mode after a deepsleep, the device would reboot after 10s. This doesn't happen, even though the flag is still set:

    >>> import pycom
    >>> pycom.wdt_on_boot()
    True
    

    Why is the default 4s blue flash mode immune to wdt_on_boot?



  • @kjm The same as now, except for the last statement, which tells to call client/m2m.py instead of main.py.



  • @robert-hh So what code should I have in boot.py in this instance?



  • @kjm No. boot.py always runs before main.py or what you define in boot.py to run instead of main.py. My suggestion was to skip that re-pointing of main.py in boot.py and copy the code from m2m.py into main.py, such that the sequence of execution is:

    boot.py
    main.py

    and not, as you tried to define it:

    boot.py
    client/m2m.py


