External Flash lose files



  • @tuftec Thanks. I'll give that a try.
    The long term test of simple writes are still running. both 3.5 Million on a WiPy3(-alike) and a LoPy4. I will stop it on the LoPy4 soon and only continue it on the WiPy. Then I can deal with your test.

    The heartbeat related crash did not re-appear after a code change now for ~1 Million cycels on a FIpy, where would not earlier never for more than 130_000 cycles, ~50_000 in average.



  • @robert-hh I have attempted to trim back my application so that you can run it at your end.
    It will consistently crash.
    Copy the 6 files onto a device (change the extension for DipConfig and NVvar to be .json) and it should run.
    Still a lot of extra stuff in there but you should be able to find your way.
    The program uses the .json files which it opens and reads/writes to.

    This code works fine on FAT but crashes with LFS.

    @peterp @Gijs your assistance here would also be appreciated.

    NVvar.py main.py DipConfig.py DipBoot.py DipApp.py boot.py



  • @robert-hh I have tested my code with the firmware build that has lfs 2.03.
    it crashes at the same sport.

    crash dump follows

    
    10:30:18: date/time =  20 12 26 9 26
    PurgeTimer =  0  PurgePeriod =  4
    backed up nv data
    closed file1
    10:30:20: abort() was called at PC 0x40099b38 on core 1
    
    ELF file SHA256: 0000000000000000000000000000000000000000000000000000000000000000
    
    Backtrace: 0x40091f2c:0x3ffe2c10 0x40092179:0x3ffe2c30 0x40099b38:0x3ffe2c50 0x40085476:0x3ffe2c80 0x40084f89:0x3ffe2ca0 0x4000bec7:0x3ffe2cc0 0x400f8b7a:0x3ffe2ce0 0x400f9cf6:0x3ffe2d00 0x400f9dbc:0x3ffe2d20 0x4010d081:0x3ffe2d40 0x40107e49:0x3ffe2d70 0x4010424d:0x3ffe2d90 0x401042dd:0x3ffe2db0 0x40111c03:0x3ffe2dd0 0x40107f5c:0x3ffe2e70 0x4010424d:0x3ffe2ea0 0x401042dd:0x3ffe2ec0 0x40111c03:0x3ffe2ee0 0x40107f5c:0x3ffe2f80 0x4010424d:0x3ffe2fe0 0x401042dd:0x3ffe3000 0x40111c03:0x3ffe3020 0x40107f5c:0x3ffe30c0 0x4010424d:0x3ffe3130 0x4010427a:0x3ffe3150 0x400e7369:0x3ffe3170 0x400e75f9:0x3ffe3210 0x400e6041:0x3ffe3230
    
    Rebooting...
    10:30:25: SYSTEM RESTART?
    10:30:27: start HW init
    disable the LTE modem for now
    [AT] 668231296 AT+CFUN?
    [AT-OK] +31 ```
    
    

    I will attempt to create test version of my code that can repeatedly produce the crash for you to test.



  • @robert-hh thanks for your help with the LFS and flash issues. I still plan to do some more testing and share the results, but for now I am focusing on trying to sort out some LTE reliability issues. Once that is bedded down, I will get back onto the flash life issues.
    Peter.



  • @serafimsaudade The core dump you showed is about different situation that the one I am hunting. No relation to the RGB led.



  • @robert-hh

    I also seeing some random coredumps. I create this thread.

    The heartbeat LED is disable and I also turn off the LED on boot.py:

    pycom.heartbeat(False)  # disable the heartbeat LED
    pycom.rgbled(0x000000) 
    

    Best Regard,



  • @serafimsaudade @gijs @tuftec I have a long term test running with the updated firmware, one WiPy3-alike and one LoPy4. The WiPy3 is at 2.5 Million writes, the LoP4 at 2 Million write. The LoPy4 is faster (for whatever reason), so it will overtake the WiPy in the next days. I plan to run in up to 10 Million cycles. That will take another 3 weeks.
    During the test I ran into core dumps, caused by the heartbeat LED flashing. So I have a FiPy running just for assessing that bug. I made an issue report for that: https://github.com/pycom/pycom-micropython-sigfox/issues/518
    It looks like a race condition between LED "On" and LED "Off", when due to the heavy load especially during file close the "On" cycle is delayed such that it collides with the "Off" Cycle. But I am pretty confident that I found a very simple bug fix. Similar to the one here by just changing a number. Note: Did not fix it. :-(



  • @robert-hh thanks. will looking into this further tomorrow.
    Peter.



  • @tuftec If you did not see it yet, Fipy tarball with lfs2.03 is here:
    https://github.com/robert-hh/Shared-Stuff/blob/master/FiPy-1.20.2.r4-lfs2.0.tar.gz



  • @tuftec said in External Flash lose files:

    Thanks. So it seems to happen in a free() call of memory management. Below is the expanded backtrace. That part is not new, and not different to lfs 2.03. And it is also different to the core dumps I see. What I can do is create an image with lfs 2.03 for comparison. Will be at the usual place in a few moments.

    BT-0: 0x40091f2c is in invoke_abort (/home/robert/Downloads/MicroPython/pycom-esp-idf/components/esp32/panic.c:156).
    156	        *((int *) 0) = 0;
    BT-1: 0x40092179 is in abort (/home/robert/Downloads/MicroPython/pycom-esp-idf/components/esp32/panic.c:171).
    171	    invoke_abort();
    BT-2: 0x40099b38 is in multi_heap_free_impl (/home/peter/docs/pycom-esp-idf/components/heap/multi_heap_platform.h:54).
    54	        abort();
    BT-3: 0x40085476 is in heap_caps_free (/home/peter/docs/pycom-esp-idf/components/heap/heap_caps.c:268).
    268	    multi_heap_free(heap->heap, ptr);
    BT-4: 0x40084f89 is in _free_r (/home/peter/docs/pycom-esp-idf/components/newlib/syscalls.c:42).
    42	    heap_caps_free( ptr );
    BT-6: 0x400f882a is in lfs_file_rawclose (littlefs/lfs_util.h:226).
    226	    free(p);
    BT-7: 0x400f93c5 is in lfs_file_close (littlefs/lfs.c:5089).
    5089	    err = lfs_file_rawclose(lfs, file);
    BT-8: 0x400fa076 is in littlefs_close_common_helper (littlefs/vfs_littlefs.c:326).
    326	    int res = lfs_file_close(lfs, fp);
    BT-9: 0x400fa13c is in file_obj_ioctl (littlefs/vfs_littlefs_file.c:105).
    105	            int res = littlefs_close_common_helper(&self->littlefs->lfs, &self->fp, &self->cfg, &self->timestamp_update);
    BT-10: 0x4010d399 is in mp_stream_close (../py/stream.c:422).
    422	    mp_uint_t res = stream_p->ioctl(stream, MP_STREAM_CLOSE, 0, &error);
    BT-11: 0x40108161 is in fun_builtin_1_call (../py/objfun.c:70).
    70	    return self->fun._1(args[0]);
    BT-12: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-13: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-14: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-15: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-16: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-17: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-18: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-19: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-20: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-21: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-22: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-23: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-24: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-25: 0x40104592 is in mp_call_function_0 (../py/runtime.c:598).
    598	    return mp_call_function_n_kw(fun, 0, 0, NULL);
    BT-26: 0x400e7351 is in parse_compile_execute (../lib/utils/pyexec.c:103).
    103	        mp_call_function_0(module_fun);
    BT-27: 0x400e75e1 is in pyexec_file (../lib/utils/pyexec.c:560).
    560	    return parse_compile_execute(filename, MP_PARSE_FILE_INPUT, EXEC_FLAG_SOURCE_IS_FILENAME);
    BT-28: 0x400e6029 is in TASK_Micropython (mptask.c:339).
    339	            int ret = pyexec_file(main_py);
    
    


  • @robert-hh I use the heartbeat led for various status indications.
    I do not currently have a cut-down test version of the script currently.

    Here is the crash dump.

    backed up nv data
    18:08:23: closed file1
    abort() was called at PC 0x40099b38 on core 1
    
    ELF file SHA256: 0000000000000000000000000000000000000000000000000000000000000000
    
    Backtrace: 0x40091f2c:0x3ffe2bf0 0x40092179:0x3ffe2c10 0x40099b38:0x3ffe2c30 0x40085476:0x3ffe2c60 0x40084f89:0x3ffe2c80 0x4000bec7:0x3ffe2ca0 0x400f882a:0x3ffe2cc0 0x400f93c5:0x3ffe2ce0 0x400fa076:0x3ffe2d00 0x400fa13c:0x3ffe2d20 0x4010d399:0x3ffe2d40 0x40108161:0x3ffe2d70 0x40104565:0x3ffe2d90 0x401045f5:0x3ffe2db0 0x40111f4b:0x3ffe2dd0 0x40108274:0x3ffe2e70 0x40104565:0x3ffe2ea0 0x401045f5:0x3ffe2ec0 0x40111f4b:0x3ffe2ee0 0x40108274:0x3ffe2f80 0x40104565:0x3ffe2fe0 0x401045f5:0x3ffe3000 0x40111f4b:0x3ffe3020 0x40108274:0x3ffe30c0 0x40104565:0x3ffe3130 0x40104592:0x3ffe3150 0x400e7351:0x3ffe3170 0x400e75e1:0x3ffe3210 0x400e6029:0x3ffe3230
    
    Rebooting...
    18:08:26: 
    

    Peter.



  • @tuftec Can you also send the shortest version of the script, that causes the bug? I have a FiPy here running w/o problems.
    Edit: Do you use the RGB led for anything else?



  • @robert-hh the heartbeat led is disable in my application already. I will try to capture and send the dump a little later. Thanks.



  • @tuftec Could you try to run the test with heartbeat LED off?



  • @tuftec I have looked at four trace dumps in my tests, and all happen during a file close operation. Somewhere down the line, a ISR Watchdog timer fires after entering a critical section. So it looks like a lock-up condition.
    At littlefs/vsf_littlefs_file.c, line 104-105 looks interesting. There, the semaphore is taken and then the close call proceeds, which finally ends up in the ISR Watchdog event. The ISR is caused by the rmt driver, which is called for the heartbeat LED!



  • @robert-hh I will try to capture this tomorrow when I have some time.



  • @tuftec Can you post the trace dump? Then I can loop the code with the elf file.



  • @robert-hh Yes. Happens every single time.



  • @tuftec Does your error happen every time you try?



  • @peterp do you have any thoughts on this one? Solving this one would make the system so much more reliable.


Log in to reply
 

Pycom on Twitter