External Flash lose files



  • Hi,
    In 2019 I installed 11 WiPy 3 boards and used the flash to store some config files. Everything ran fine until December 2020, when all 11 WiPys went offline. I brought them to the office to check the problem.
    The firmware on the devices could not find the config files saved in the flash.
    So I checked the flash with an FTP client, and it appeared to have been formatted.
    I uploaded the config files again over FTP and rebooted the WiPy, but the firmware did not boot because the config files were missing. When I checked the flash over FTP afterwards, it was empty again.
    All 11 WiPys have the same problem. This is very concerning.

    I'm afraid the external flash memory is faulty on all 11 WiPys.
    I used LFS on all devices.
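    Independently of the root cause, a device whose config files vanish does not have to refuse to boot. A minimal sketch of loading a JSON config with fallback defaults, so the application can start in a safe mode and report the problem instead. It was checked under CPython only; that it runs unchanged on MicroPython is an assumption, and the `DEFAULT_CONFIG` keys and file name are hypothetical:

```python
import json

# Hypothetical defaults; replace with the application's real settings.
DEFAULT_CONFIG = {"device_id": None, "interval_s": 60}

def load_config(path, defaults=DEFAULT_CONFIG):
    """Return (config, loaded_ok).

    Starts from a copy of `defaults` and overlays the keys found in the
    JSON file at `path`. If the file is missing, unreadable, not valid
    JSON, or not a JSON object, the defaults are returned with
    loaded_ok=False so the caller can boot in a safe mode and report it.
    """
    cfg = dict(defaults)
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError: missing/unreadable file; ValueError: malformed JSON
        return cfg, False
    if not isinstance(data, dict):
        return cfg, False
    cfg.update(data)
    return cfg, True
```

    On the device this would be called as something like `cfg, ok = load_config("/flash/config.json")`, with `ok` deciding whether to run normally or only report the missing file.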

    Best regards,
    Luís Santos



  • @robert-hh Thanks. I will look into this further tomorrow.
    Peter.



  • @tuftec If you have not seen it yet, the FiPy tarball with lfs2.03 is here:
    https://github.com/robert-hh/Shared-Stuff/blob/master/FiPy-1.20.2.r4-lfs2.0.tar.gz



  • @tuftec said in External Flash lose files:

    Thanks. So it seems to happen in a free() call in the memory management. The expanded backtrace is below. That code path is not new and is the same in lfs 2.03, and it differs from the core dumps I see. What I can do is build an image with lfs 2.03 for comparison. It will be at the usual place in a few moments.

    BT-0: 0x40091f2c is in invoke_abort (/home/robert/Downloads/MicroPython/pycom-esp-idf/components/esp32/panic.c:156).
    156	        *((int *) 0) = 0;
    BT-1: 0x40092179 is in abort (/home/robert/Downloads/MicroPython/pycom-esp-idf/components/esp32/panic.c:171).
    171	    invoke_abort();
    BT-2: 0x40099b38 is in multi_heap_free_impl (/home/peter/docs/pycom-esp-idf/components/heap/multi_heap_platform.h:54).
    54	        abort();
    BT-3: 0x40085476 is in heap_caps_free (/home/peter/docs/pycom-esp-idf/components/heap/heap_caps.c:268).
    268	    multi_heap_free(heap->heap, ptr);
    BT-4: 0x40084f89 is in _free_r (/home/peter/docs/pycom-esp-idf/components/newlib/syscalls.c:42).
    42	    heap_caps_free( ptr );
    BT-6: 0x400f882a is in lfs_file_rawclose (littlefs/lfs_util.h:226).
    226	    free(p);
    BT-7: 0x400f93c5 is in lfs_file_close (littlefs/lfs.c:5089).
    5089	    err = lfs_file_rawclose(lfs, file);
    BT-8: 0x400fa076 is in littlefs_close_common_helper (littlefs/vfs_littlefs.c:326).
    326	    int res = lfs_file_close(lfs, fp);
    BT-9: 0x400fa13c is in file_obj_ioctl (littlefs/vfs_littlefs_file.c:105).
    105	            int res = littlefs_close_common_helper(&self->littlefs->lfs, &self->fp, &self->cfg, &self->timestamp_update);
    BT-10: 0x4010d399 is in mp_stream_close (../py/stream.c:422).
    422	    mp_uint_t res = stream_p->ioctl(stream, MP_STREAM_CLOSE, 0, &error);
    BT-11: 0x40108161 is in fun_builtin_1_call (../py/objfun.c:70).
    70	    return self->fun._1(args[0]);
    BT-12: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-13: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-14: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-15: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-16: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-17: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-18: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-19: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-20: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-21: 0x401045f5 is in mp_call_method_n_kw (../py/runtime.c:640).
    640	    return mp_call_function_n_kw(args[0], n_args + adjust, n_kw, args + 2 - adjust);
    BT-22: 0x40111f4b is in mp_execute_bytecode (../py/vm.c:1002).
    1002	                    SET_TOP(mp_call_method_n_kw(unum & 0xff, (unum >> 8) & 0xff, sp));
    BT-23: 0x40108274 is in fun_bc_call (../py/objfun.c:287).
    287	    mp_vm_return_kind_t vm_return_kind = mp_execute_bytecode(code_state, MP_OBJ_NULL);
    BT-24: 0x40104565 is in mp_call_function_n_kw (../py/runtime.c:624).
    624	        return type->call(fun_in, n_args, n_kw, args);
    BT-25: 0x40104592 is in mp_call_function_0 (../py/runtime.c:598).
    598	    return mp_call_function_n_kw(fun, 0, 0, NULL);
    BT-26: 0x400e7351 is in parse_compile_execute (../lib/utils/pyexec.c:103).
    103	        mp_call_function_0(module_fun);
    BT-27: 0x400e75e1 is in pyexec_file (../lib/utils/pyexec.c:560).
    560	    return parse_compile_execute(filename, MP_PARSE_FILE_INPUT, EXEC_FLAG_SOURCE_IS_FILENAME);
    BT-28: 0x400e6029 is in TASK_Micropython (mptask.c:339).
    339	            int ret = pyexec_file(main_py);
    
    


  • @robert-hh I use the heartbeat LED for various status indications.
    I do not have a cut-down test version of the script at the moment.

    Here is the crash dump.

    backed up nv data
    18:08:23: closed file1
    abort() was called at PC 0x40099b38 on core 1
    
    ELF file SHA256: 0000000000000000000000000000000000000000000000000000000000000000
    
    Backtrace: 0x40091f2c:0x3ffe2bf0 0x40092179:0x3ffe2c10 0x40099b38:0x3ffe2c30 0x40085476:0x3ffe2c60 0x40084f89:0x3ffe2c80 0x4000bec7:0x3ffe2ca0 0x400f882a:0x3ffe2cc0 0x400f93c5:0x3ffe2ce0 0x400fa076:0x3ffe2d00 0x400fa13c:0x3ffe2d20 0x4010d399:0x3ffe2d40 0x40108161:0x3ffe2d70 0x40104565:0x3ffe2d90 0x401045f5:0x3ffe2db0 0x40111f4b:0x3ffe2dd0 0x40108274:0x3ffe2e70 0x40104565:0x3ffe2ea0 0x401045f5:0x3ffe2ec0 0x40111f4b:0x3ffe2ee0 0x40108274:0x3ffe2f80 0x40104565:0x3ffe2fe0 0x401045f5:0x3ffe3000 0x40111f4b:0x3ffe3020 0x40108274:0x3ffe30c0 0x40104565:0x3ffe3130 0x40104592:0x3ffe3150 0x400e7351:0x3ffe3170 0x400e75e1:0x3ffe3210 0x400e6029:0x3ffe3230
    
    Rebooting...
    18:08:26: 
    

    Peter.



  • @tuftec Can you also send the shortest version of the script that causes the bug? I have a FiPy here running without problems.
    Edit: Do you use the RGB LED for anything else?



  • @robert-hh The heartbeat LED is already disabled in my application. I will try to capture the dump and send it a little later. Thanks.



  • @tuftec Could you try running the test with the heartbeat LED off?



  • @tuftec I have looked at four trace dumps from my tests, and all of them happened during a file close operation. Somewhere down the line, an ISR watchdog timer fires after a critical section has been entered, so it looks like a lock-up condition.
    In littlefs/vfs_littlefs_file.c, lines 104-105 look interesting. There, the semaphore is taken and then the close call proceeds, which finally ends up in the ISR watchdog event. The interrupt is caused by the RMT driver, which drives the heartbeat LED!
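    The pattern described here is a lock taken around the close call, with something inside it stalling long enough for a watchdog to fire. The ESP-IDF interrupt watchdog works at the interrupt/hardware level in C, so the following is only a portable Python model of the idea, not the firmware's code: a hypothetical `WatchedLock` records when it was acquired, and a hypothetical `watchdog` thread flags it once it has been held longer than a limit.

```python
import threading
import time

class WatchedLock:
    """A lock that remembers when it was acquired (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._acquired_at = None

    def __enter__(self):
        self._lock.acquire()
        self._acquired_at = time.monotonic()
        return self

    def __exit__(self, *exc):
        self._acquired_at = None
        self._lock.release()
        return False

    def held_for(self):
        """Seconds the lock has been held, or 0.0 if it is free."""
        t = self._acquired_at
        return 0.0 if t is None else time.monotonic() - t


def watchdog(lock, limit_s, stop, tripped):
    """Poll `lock` until `stop` is set; record the hold time and return
    on the first observation of it being held longer than `limit_s`."""
    while not stop.is_set():
        held = lock.held_for()
        if held > limit_s:
            tripped.append(held)
            return
        time.sleep(limit_s / 10)
```

    In the model, a "close" that sleeps while holding the lock trips the watchdog, while a short critical section does not; on the real hardware the equivalent trip is the reset seen in the dumps.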



  • @robert-hh I will try to capture this tomorrow when I have some time.



  • @tuftec Can you post the trace dump? Then I can look up the code with the ELF file.
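    For that lookup, the PC half of each `PC:SP` pair in a `Backtrace:` line is passed to the toolchain's addr2line together with the ELF file of the exact firmware build that crashed. A small helper that extracts the PCs and builds the command; `xtensa-esp32-elf-addr2line` is the usual ESP32 toolchain binary name, and `application.elf` is a placeholder for whatever ELF the build produced:

```python
import re
import shlex

# One "PC:SP" pair as printed by the ESP32 panic handler,
# e.g. 0x40091f2c:0x3ffe2bf0; only the PC is captured.
_PAIR = re.compile(r"(0x[0-9a-fA-F]{8}):0x[0-9a-fA-F]{8}")

def backtrace_pcs(line):
    """Return the program-counter addresses from a 'Backtrace:' line."""
    return _PAIR.findall(line)

def addr2line_command(pcs, elf="application.elf",
                      tool="xtensa-esp32-elf-addr2line"):
    """Build a shell command mapping each PC to function, file and line.

    -p pretty-print, -f show function names, -i include inlined frames,
    -a print the address, -C demangle; -e selects the ELF file.
    """
    args = [tool, "-pfiaC", "-e", elf] + list(pcs)
    return " ".join(shlex.quote(a) for a in args)
```

    Running the resulting command against the dump posted in this thread should give an expansion similar to the BT-0 to BT-28 list quoted earlier, provided the ELF matches the firmware on the board.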



  • @robert-hh Yes. Happens every single time.



  • @tuftec Does your error happen every time you try?



  • @peterp Do you have any thoughts on this one? Solving it would make the system much more reliable.



  • @robert-hh Yes, I have seen core dumps with previous versions of LFS. That is why I moved back to FAT. But I need the longer flash life that LFS provides.



  • @robert-hh Adding 500 ms delays before and during the file close sequence made no difference.

    It appears to fail at the second close.

    The first file is only open for reading.
    The second file has been open for writing.

    There is a distinct possibility that the second file was already closed at an earlier stage of the process, before this close is called. But that should not cause a crash.

    Any ideas?
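    On the application side, one way to rule out a second close() reaching the file system driver is to wrap each file so that close() only ever runs once. A minimal sketch; the `CloseOnce` name is hypothetical. If the second file really was closed earlier, a repeated close ending in free() would fit the abort inside the heap code in the backtrace (a double free is one possibility), but that is a guess, and the firmware should not crash either way:

```python
class CloseOnce:
    """Wrap a file-like object so that close() reaches it at most once."""

    def __init__(self, f):
        self._f = f
        self.closed = False

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so read(), write(), flush() etc. go to the wrapped file.
        return getattr(self._f, name)

    def close(self):
        if self.closed:
            return  # already closed: do not call the driver again
        self.closed = True
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False
```

    Usage would be `f2 = CloseOnce(open("file2.json", "w"))` (placeholder name); any later extra `f2.close()` then becomes a no-op instead of a second call into littlefs.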



  • @tuftec On Saturday I started another long-term test (writing a single file millions of times). The test is now at ~800,000 writes. During that test I see regular core dumps, not at fixed intervals, and I'm working on finding the reason. Alternatively, I could go back to LFS2.03 with wear leveling enabled.
    So you have seen the same error with the previous LFS? Then going back might not solve the problem.
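    For anyone wanting to run a similar endurance test, a rough sketch of such a loop. robert-hh's actual test script is not shown in the thread, so the hypothetical `write_cycle_test` name, the payload and the cycle count are assumptions: each cycle rewrites one file and reads it back to check that the contents survived.

```python
def write_cycle_test(path, cycles, payload_size=64):
    """Rewrite one file `cycles` times, reading it back after each write.

    Returns the index of the first cycle whose read-back did not match,
    or None if every cycle passed.
    """
    for i in range(cycles):
        # Payload encodes the cycle number so stale data is detectable.
        data = ("%08d" % i).encode() * (payload_size // 8)
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            if f.read() != data:
                return i
    return None
```

    On a board the path would be something like `/flash/wear.bin`; a core dump during the run shows up as a reset rather than a return value, so printing progress every few thousand cycles helps locate when it happened.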

    P.S.: I used to prefer FAT because it is more reliable in normal operation. However, a crash during a file write has a high chance of corrupting the whole file system.



  • @robert-hh Unfortunately, the LFS 2.3 build you created does not work for me.
    It dumps core when trying to close two open JSON files, as did the previous LFS build that I tried.

    It appears to fail between the first and second file close.
    You mentioned that LFS is slower. Is there a chance that the second close interferes with the first one, which has not completed yet?

    Ultimately I have to run LFS to get the flash life that I need.

    Any clues or further ideas?

    I will try adding some delays to see what happens.
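    As a starting point for the shortest failing script asked for in this thread, a sketch of the pattern described here: one JSON file opened for reading, a second opened for writing, then both closed in that order with a log line after each close (the crash dump shows the application logging "closed file1" just before the abort). The function name and file paths are placeholders; the real script's logic is not known.

```python
import json

def two_file_close_repro(read_path, write_path, log=print):
    """Open one JSON file for reading and another for writing, copy the
    data across, then close the reader first and the writer second."""
    f1 = open(read_path)
    f2 = None
    try:
        f2 = open(write_path, "w")
        state = json.load(f1)
        json.dump(state, f2)
    finally:
        f1.close()
        log("closed file1")
        if f2 is not None:
            f2.close()
            log("closed file2")
    return state
```

    If "closed file2" never appears on the device, the crash is in the second close, which would match the reports above.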



  • @robert-hh Thanks. I will give it a try.



  • @tuftec it's here: https://github.com/robert-hh/Shared-Stuff.git
    Changes compared to the recent Dev version:

    • use lfs 2.3 with block-level wear leveling enabled
    • support UART inverted mode
    • enforce a guard time for the RGB LED
    • some minor tweaks, which may not apply to FiPy

    In addition, the frozen modules contain the upysh module (see: https://github.com/micropython/micropython-lib/tree/master/upysh) and the onboard editor pye (see: https://github.com/robert-hh/Micropython-Editor)

    Edit: LFS2.3 erases all files, so save them beforehand.
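    Since the update wipes the file system, here is a rough sketch for copying the files off `/flash` from a PC first, using Python's standard `ftplib` module. It only copies a flat directory (sub-directories are skipped), and the `backup_flash` name is hypothetical:

```python
import ftplib
import os
import posixpath

def backup_flash(ftp, dest_dir, remote_dir="/flash"):
    """Download the files listed in `remote_dir` into `dest_dir`.

    `ftp` is a logged-in ftplib.FTP object. Sub-directories are not
    descended into: entries the server refuses to RETR (typically
    directories) are skipped. Returns the list of names saved.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for entry in ftp.nlst(remote_dir):
        name = posixpath.basename(entry)
        if name in ("", ".", ".."):
            continue
        local = os.path.join(dest_dir, name)
        try:
            with open(local, "wb") as out:
                ftp.retrbinary("RETR " + posixpath.join(remote_dir, name),
                               out.write)
        except ftplib.error_perm:
            os.remove(local)  # drop the empty placeholder and skip it
            continue
        saved.append(name)
    return saved
```

    Typical use, assuming the board is in access-point mode at its default address 192.168.4.1 and still uses Pycom's default FTP login `micro`/`python`: create `ftp = ftplib.FTP("192.168.4.1")`, call `ftp.login("micro", "python")`, then `backup_flash(ftp, "flash_backup")`. Files in sub-directories such as `lib` need to be copied separately.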

