boot.py reliability



  • If I run my user deepsleep code as boot.py on a battery-powered gpy, it's reliable. But if I try to run it as m2m.py from a different folder in flash, with boot.py as

    import machine, time, pycom
    hook='client/m2m.py'
    print('trying for', hook)
    pycom.heartbeat(False)
    pycom.rgbled(0xffffff)
    machine.main(hook)
    

    it eventually stops working with the white light on, so I know boot.py failed to run it. I know I've introduced an extra step, but this should work consistently, shouldn't it?



  • @robert-hh I used the large hammer as per your suggestion & loaded the user software as main.py with the boot.py below.

    import machine, pycom

    # RGB LED colours used for diagnostics
    red = 0x7f0000; green = 0x007f00; orange = 0xff5100; blue = 0x00007f; off = 0x000000
    white = 0x7f7f7f; magenta = 0x7f007f; cyan = 0x007f7f; lowhite = 0x101010; yellow = 0x7f5100
    hook = 'main.py'
    print('trying for', hook)
    pycom.heartbeat(False)
    pycom.rgbled(white)
    # count every pass through boot.py in NVS
    cycle = pycom.nvs_get('cycleboot')
    cycle += 1
    pycom.nvs_set('cycleboot', cycle)
    wdt = machine.WDT(timeout=20000)  # 20s for main.py to run & extend the wdt timeout
    machine.main(hook)
    

    Because there are independent cycle counters in main.py & boot.py, I can see how often boot.py fails to run main.py:

    Timestamp observed    Timestamp received    Value
    2019-01-17 8:24 AM    2019-01-17 8:24 AM    1 cycle boot fail found @ cycle 161
    2019-01-17 1:17 AM    2019-01-17 1:17 AM    1 cycle boot fail found @ cycle 87

    Results much as before.
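
A minimal sketch of how two such counters could be turned into a failure count. It assumes boot.py increments its counter on every wake and main.py increments its own only when it actually starts. The function name and the sample readings are hypothetical, not taken from the post.

```python
def missed_main_runs(boot_cycles, main_cycles):
    """Return how many boots did not lead to main.py starting.

    boot_cycles: counter incremented at the top of boot.py
    main_cycles: counter incremented at the top of main.py
    """
    if main_cycles > boot_cycles:
        # main.py cannot have started more often than boot.py ran
        raise ValueError('main counter ahead of boot counter')
    return boot_cycles - main_cycles


# Hypothetical readings: 162 boots, 161 of which reached main.py
print(missed_main_runs(162, 161))
```

On the device the two arguments would come from the stored counters, for example the value read back with pycom.nvs_get('cycleboot') in boot.py.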



  • @kjm Yes, when the file system was corrupted. You can rebuild it with os.mkfs. I would prefer to take the larger hammer and erase the whole flash, reload the firmware and the files.



  • @robert-hh So I just went to do some more work on this & os.listdir() returns

    x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', '\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00', 'main.py', 'sys', 'lib', 'cert', 'boot.py', 'project.pymakr']
    

    All these identical extra directories have mysteriously appeared. They are listed in Filezilla as CDUP 31 31:63 followed by a string of little y characters with 2 dots over them. Have you ever seen anything like this before?
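
A small sketch for spotting entries like these automatically. It assumes that a name containing NUL or other control characters is a sign of file system damage; the helper name split_listing and the shortened sample list are made up for illustration.

```python
def split_listing(names):
    """Separate plausible file names from entries containing control bytes."""
    good, bad = [], []
    for name in names:
        # NUL or other control characters should not appear in a sane file name
        if any(ord(ch) < 0x20 for ch in name):
            bad.append(name)
        else:
            good.append(name)
    return good, bad


# Sample shaped like the listing above (shortened)
listing = ['\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00',
           'main.py', 'sys', 'lib', 'boot.py']
good, bad = split_listing(listing)
print(len(bad), 'suspect entries; usable:', good)
```

On the device the input would be os.listdir(). A non-empty second list could be used as the trigger for rebuilding the file system or reflashing.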



  • @robert-hh So I looked into it. Heartbeat is started in main.c, ~ line 125. The code itself is in util/mperror.c, function mperror_heartbeat_signal(). It is called directly by the freertos_idle_hook, and I cannot see that it affects the WDT. Finally, it uses the RMT for controlling the LED, like the machine.rmt module. The only difference I see in the use of the RMT is that in machine.rmt the call to rmt_write is enclosed in a MP_THREAD_GIL_EXIT(), MP_THREAD_GIL_ENTER() pair.
    So it seems that something else fails. Did you try to add print statements to boot.py, main.py or the like upon entry and exit?
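
One possible shape for such entry and exit markers, as a sketch only. The class name Trace is hypothetical, and it is assumed that the output is watched on the serial console.

```python
class Trace:
    """Print a marker when a block of code is entered and left."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('enter', self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            print('exit', self.name)
        else:
            # the exception still propagates; we only record that it happened
            print('exit', self.name, 'with', exc_type.__name__)
        return False


with Trace('main.py'):
    pass  # body of the script goes here
```

A call such as machine.deepsleep() does not return, so no exit marker is printed for a block that ends by going to sleep; a print placed just before that call would be needed instead.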



  • @robert-hh So just an update where I'm at with this:

    1. If user code is present in flash as boot.py or main.py it eventually fails to run (usually after several hundred deepsleeps). When this happens the gpy drops into zombie repl mode (blue light flashing but no usb or adhoc wifi & wdt_on_boot disabled).
    2. If user code is present in flash as main.py & is explicitly run with boot.py via machine.main('main.py') it eventually fails to run but gpy does not drop into zombie repl mode.
    3. Behaviour 2 means wdt_on_boot reboots it & we're good for another few hundred cycles. Hey, it's not pretty but it works!
    4. I have a hunch that this has something to do with lte since I've never seen user code fail to run when it's using wifi instead of lte, will confirm when sure about this.


  • @kjm I do not know whether the heartbeat interrupt affects the WDT, and actually I doubt that it does. I have to look into the code to analyse it.



  • @robert-hh Sure would be nice if zombie heartbeat didn't kill pycom.wdt_on_boot(True), makes it a real cul-de-sac!



  • @kjm The respective code should be in esp32/mptask.c. Looking at it, that may happen if there is a problem in executing boot.py or main.py which causes a FORCED_EXIT, whatever causes this (lines 278 and 300 of mptask.c). More analysis to be done.



  • @robert-hh I still can't get to the bottom of this. A couple of observations about the 4s blue flash mode the gpy drops into:

    1. It looks like regular blue flash mode (4s blue flashes) except it isn't. Can't log in on USB or the gpy ad hoc wifi.
    2. In regular blue flash mode I have to connect P12 to 3v3 then press the reset button to get safe boot. In this mode connecting P12 to 3v3 starts the orange safe boot light flashing, no need to press the reset button.

    Does this tell us anything about what might be happening to stop it running our code after a deepsleep & dropping into this zombie state instead?


  • @robert-hh REPL does not seem to feed the WDT in my experience. When in REPL mode my FiPy and GPy units will still reset themselves when the timer has elapsed if I haven't manually fed it.



  • @kjm said in boot.py reliability:

    import pycom; pycom.wdt_on_boot(True); pycom.wdt_on_boot_timeout(10000)

    I do not know. I would have to search whether the wdt is maybe re-fed in REPL mode. That seems reasonable. But couldn't you just re-enable the WDT with the line above in boot.py? Then at least you know that it is enabled. Still, that would not affect re-feeding by the REPL.
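
A sketch of what re-enabling the boot watchdog at the top of boot.py might look like. It relies only on the two pycom calls quoted above; the helper name arm_boot_watchdog and the FakePycom stand-in (used so the sketch runs off-device) are hypothetical.

```python
def arm_boot_watchdog(pycom_mod, timeout_ms=10000):
    """Make sure the boot watchdog is enabled; return the resulting flag."""
    if not pycom_mod.wdt_on_boot():
        pycom_mod.wdt_on_boot(True)
    pycom_mod.wdt_on_boot_timeout(timeout_ms)
    return pycom_mod.wdt_on_boot()


class FakePycom:
    """Stand-in for the pycom module, for trying the logic on a PC."""

    def __init__(self):
        self.enabled = False
        self.timeout = None

    def wdt_on_boot(self, enable=None):
        if enable is None:
            return self.enabled
        self.enabled = enable

    def wdt_on_boot_timeout(self, ms):
        self.timeout = ms


print(arm_boot_watchdog(FakePycom()))
```

On the device the call would be arm_boot_watchdog(pycom) after import pycom. As noted, this only guarantees the flag is set; it says nothing about whether the REPL feeds the watchdog.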



  • @robert-hh I used

    import pycom; pycom.wdt_on_boot(True); pycom.wdt_on_boot_timeout(10000)
    

    from the cmd line to set up a 10s reboot if our code is not feeding the dog. I figured if the gpy booted into the 4s flash default mode after a deepsleep, the device would reboot after 10s. This doesn't happen even though the flag is still set:

    >>> import pycom
    >>> pycom.wdt_on_boot()
    True
    

    Why is the default 4s blue flash mode immune to wdt_on_boot?



  • @kjm The same as now, except for the last statement, which tells to call client/m2m.py instead of main.py.



  • @robert-hh So what code should I have in boot.py in this instance?



  • @kjm No. boot.py always runs before main.py or what you define in boot.py to run instead of main.py. My suggestion was to skip that re-pointing of main.py in boot.py and copy the code from m2m.py into main.py, such that the sequence of execution is:

    boot.py
    main.py

    and not, as you tried to define it:

    boot.py
    client/m2m.py



  • @robert-hh So you're saying make my code main.py & leave boot.py empty? I don't see the point of this? I've already established that boot.py runs before main.py & is more reliable than main.py



  • @kjm There are both _boot.py and _main.py in frozen bytecode, which are executed before boot.py. _boot.py contains:

    # _boot.py -- always run on boot-up, even during safe boot
    import os
    from machine import UART
    os.dupterm(UART(0, 115200))
    

    _main.py is empty.
    About your setup with boot.py and m2m.py: that way you actually use the mechanism of boot.py and main.py, only then you tell the firmware to use client/m2m.py instead of main.py. You could try to copy the code from client/m2m.py into main.py and drop the reassignment of main.py. That way you can isolate the problem: whether it lies in your code or in assigning the name of a file that is executed instead of main.py.



  • @robert-hh I haven't tried that. boot.py seems bullet proof. If I load my code into flash as boot.py it always runs after deepsleep.

    main.py I'm less impressed with. It mostly runs if there is no boot.py, or nothing in boot.py other than the default 29 bytes, but not always. Mean time to not loading is around 300 x 6min deepsleep cycles.

    When I try to run my code from boot.py via machine.main, I'm sure the eventual failure is due to not loading, because the code uses the rgb led a lot for diagnosis & the colour I see when it stops working indicates the only thing that has run is boot.py, albeit unsuccessful in its attempt to run my code.

    I'm not sure if this is relevant but the gpy has this strange behaviour with new code after a power cycle that troubles me. If I load a new regimen into flash, say a change from my code called by boot.py to my code as boot.py, I see led patterns in the first few seconds of the first run that indicate it is trying the previous scheme briefly. It's like there is an area in flash, invisible to us users, that the gpy goes to first, like an echo of the previous setup, before it switches to the new one.



  • @kjm Are you sure it completely fails to load/run m2m.py? Or could it start and just crash a bit further during execution?

    Are you also sure it never fails with the script directly in boot.py?


 
