External flash loses files



  • Hi,
    In 2019 I installed 11 WiPy 3 devices and used the flash to store some config files. Everything ran fine until December 2020, when all 11 WiPys went offline. I brought them to the office to investigate.
    The firmware installed on the devices could not find the config files saved in the flash.
    So I checked the flash with an FTP client, and it seemed to have been formatted.
    I sent the config files again over FTP and rebooted the WiPy, but the firmware did not boot because the config files were missing. When I then opened the flash with the FTP client, it was empty again.
    All 11 WiPys have the same problem. This is very concerning.

    I'm afraid that the external flash memory is faulty on all 11 WiPys.
    I used LFS on all devices.

    Best regards,
    Luís Santos



  • @robert-hh Many thanks for this detailed analysis and testing. The results are promising. Based on your tests and on information from others, I think it is clear that enabling wear-leveling does not have any significant negative effect.



  • @Géza-Husi I had a test running with that setting on three devices:

    • LoPy4, 7.4 Million cycles, ~350_000 cycles/day
    • FiPy, 4 Million cycles, ~200_000 cycles/day
    • Genuine ESP32 with WiPy3 firmware, 10 Million cycles, ~180_000 cycles/day

    The test consisted of writing the same 2k = 1 block sized file over and over again. The test code was set up to continue the test after a reboot.
    Reboots had to be forced by pushing reset about every second day due to WiFi loss. The devices were not connected to a computer and I looked into the REPL once a day using telnet.
    At no time were there file system corruptions. No file size increase, no noticeable decrease in free file space. On the ESP32 with 1024 blocks I still see 1007 free blocks. 11 are used by the file system, and 6 blocks are used by the super block structure. That adds up to 1024. So LFS performed really well.
    It seemed that the ESP32 got a little bit slower over time, from about 200_000 cycles/day down to 180_000 cycles/day. But the first number was taken when connected to a PC with the REPL open, the second when running detached with just a power supply at USB, talking to no one.
    No clue why the LoPy4 was so much faster. Most likely a faster flash chip.
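
    The test code itself is not shown in this thread. A minimal sketch of such a loop, assuming the cycle count is kept in a second file so that the run can continue after a reboot, could look like the following. The file names, the fill pattern and the `save_every` parameter are assumptions made for illustration only:

```python
BLOCK_SIZE = 2048             # assumed: one 2k block, as in the test described above
DATA_FILE = "wear_test.bin"   # hypothetical file names
COUNT_FILE = "wear_count.txt"


def load_count():
    """Return the cycle count saved before the last reboot, or 0 if none."""
    try:
        with open(COUNT_FILE) as f:
            return int(f.read())
    except (OSError, ValueError):
        # no counter file yet, or it is empty/garbled
        return 0


def save_count(count):
    """Persist the running cycle count."""
    with open(COUNT_FILE, "w") as f:
        f.write(str(count))


def run_cycles(n, save_every=100):
    """Rewrite DATA_FILE n times, continuing from the saved count."""
    count = load_count()
    payload = b"\xa5" * BLOCK_SIZE
    for _ in range(n):
        # each pass rewrites the same one-block file
        with open(DATA_FILE, "wb") as f:
            f.write(payload)
        count += 1
        if count % save_every == 0:
            save_count(count)
    save_count(count)
    return count
```

    Calling `run_cycles(...)` from main.py would then pick up the count where it stopped before the reset. Note that in this sketch the counter file adds its own writes on top of the data file.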



  • @tuftec, @robert-hh: May I ask whether you have found any issues (apart from the LED/heartbeat topic) with wear-leveling enabled? E.g. a corrupted FS, reduced power-loss resilience, or increased file sizes? Thanks in advance!



  • @robert-hh good to hear. Thanks for the update.
    I am continuing, although slowly, with my developments. No other issues to speak of at this point. Peter.



  • @tuftec Test finished.
    The WiPy3 stopped as planned after 10 Million write cycles without problems using LFS with wear leveling ENABLED. I had to reboot it a few times manually since it lost the WiFi connection. That relates to my AP.
    I stopped the test with the LoPy4 after 7.5 Million cycles, and with the FiPy after 4 Million cycles. None of them had an issue with writing. The FiPy was in the game because it had core dumps caused by some interaction with the heartbeat flash every ~100_000 cycles. After changing code, that went well for 3 Million cycles.



  • @robert-hh Thanks. All good. I have tidied up my code and incorporated your suggestions. It seems to be working now with reliable LTE and LFS. I will let it run for some time to confirm. I can now move on to the next level of test and debug. Thanks again.



  • @serafimsaudade @Gijs @tuftec OK. This thread is overloaded now with the discussion of three different topics:

    a) Flash wear. For that I made a PR: https://github.com/pycom/pycom-micropython-sigfox/pull/516 Tests are still running well, with two devices beyond 4 Million writes of a single short file. All well. I will continue up to 10 Million writes.

    b) During that wear levelling test, I had core dumps at about every 50_000 writes. They were related to the heartbeat flash. I made a PR which changed that heartbeat timing a little bit. After that, the crashes disappeared, at least in that test. I am not 100% confident that the change cured the initial problem, which could also be a Core-0/Core-1 collision. But the change is also not intrusive and saves a few clock cycles. PR here: https://github.com/pycom/pycom-micropython-sigfox/pull/525

    c) There was a core dump when one attempted to close a file twice using littlefs. Even if that is a bad practice, it should not crash the firmware. I copied over the mechanism used by Damien George to ignore the succeeding closes. PR here: https://github.com/pycom/pycom-micropython-sigfox/pull/524

    Maybe some of that will eventually find its way into the official build.
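
    The idea behind c), making every close after the first one a harmless no-op, can be illustrated at Python level. This is only a sketch of the principle and not the code from the PR, which changes the firmware itself. `GuardedFile` and `close_calls` are hypothetical names:

```python
class GuardedFile:
    """Wrap a file object so that close() after the first one does nothing."""

    def __init__(self, path, mode="r"):
        self._f = open(path, mode)
        self.close_calls = 0  # number of closes forwarded to the real file

    def write(self, data):
        return self._f.write(data)

    def read(self, *args):
        return self._f.read(*args)

    def close(self):
        if self._f is None:
            # already closed: ignore the succeeding close
            return
        self._f.close()
        self._f = None
        self.close_calls += 1
```

    With such a wrapper, a sequence of open, write, close, close reaches the underlying file system with a single close only.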



  • @robert-hh thanks



  • @tuftec The editor is here: https://github.com/robert-hh/Micropython-Editor with a README and a doc-file.
    I typically import it in boot.py or main.py with:
    from pye_mp import pye
    and call it with:
    pye(file_name)

    The upysh module is here (more or less): https://github.com/micropython/micropython-lib/tree/master/upysh
    I usually import it with:
    from upysh import *
    which creates a list of names with commands. Then there is a man command, which lists all available commands: Unix-like commands such as ls or ls("name"), rm, mv, mkdir, rmdir, cat, cp, head, and so on.

    Note: Loading files to the editor is not fast. Your DipApp.py takes a while to load. And the file size is limited by RAM. With SPIRAM, that should not be the limiting issue. Not all keystrokes of the editor are available with all terminal emulators. It works best with Linux and picocom, or tio, or screen. On Windows, you may use Putty or TeraTerm.



  • @robert-hh thanks for the pointers. I will rework my code to see if I can get it to work reliably with lfs.
    I will provide feedback on my progress.

    As an aside, how do you use editor and other frozen bits that you typically add to your builds? Is there a user document somewhere?

    Thanks again.



  • @tuftec You may work with global variables. Or you can use classes and their bound variables. It is just a matter of preference. You only have to avoid closing files twice. Even if the latter should not hurt, there seems to be a special problem with MicroPython here, which has to be addressed separately.

    About files:

    • remove the isolated close statements. You do not need them.
    • for cases when you open a file for reading/writing, better use something like:
    			# the with block closes the file on exit, so no separate close() is needed
    			with open('DipConfig.json', 'w') as f:
    			    f.write(json.dumps(dipconfig))
    			time.sleep(1)
    
    • I have only seen one open where the close did not immediately follow, and that is in line 657. That one may be obsolete.
    • There is an open with a following load in line 830. It looks as if you could close the file there (or better, use the with ... structure, which will take care of the close).

    Yes, I understand that you have to communicate with a timer module, hence the global variables. But the structure of the whole thing seems overly complicated, with redundant definitions and imports.
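
    As an illustration of the "classes and bound variables" option for sharing values, here is a minimal sketch. The names `Shared`, `boot` and `runtime` are hypothetical, and everything sits in one file only to keep the example self-contained. In a real project the class would live in its own small module that both DipBoot and DipApp import:

```python
import time


class Shared:
    """Hypothetical container for values used by several modules."""
    start_time = None  # set once at start-up
    wake_count = 0


def boot():
    # would live in the boot module: record the start time
    Shared.start_time = time.time()
    Shared.wake_count += 1


def runtime():
    # would live in the application module: read the shared value
    return time.time() - Shared.start_time
```

    Because both functions refer to the same class object, no `global` statements and no parameter passing are needed.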



  • @robert-hh I do not have enough experience with MicroPython. Is there a defined technique for sharing variables between modules? I have a timer that I share in an attempt to work out runtime, for instance. I also found that I had to make some variables global so that my program would work correctly after coming out of deepsleep.
    I am probably doing something wrong, but making variables truly global seemed to stop MicroPython from complaining.



  • @robert-hh thanks for looking at this.
    Yes, I suspected the multiple closes might be an issue. I might need to find a way to test whether a file is open before closing it. It does seem a bit pointless though. There are multiple entry points in some areas and I need to ensure the files are closed correctly before going to deepsleep.
    Yes, I struggled with the global/local thing. I need to be able to access variables at the highest level and even between modules (DipBoot and DipApp as examples) without the inefficiencies of having to pass parameters.
    Maybe I can just catch the close error (assuming there is one and it doesn't just crash).

    As long as this is known and then well documented we can probably work around it. Although it would be nice if the behaviour was the same as for FAT.

    What would you suggest next? Clearly I need to use lfs to maximise flash life.

    Thanks.



  • @tuftec OK. I can replicate the crash with a simple test at global level, pasted into repl or executed as a small module:

    f = open("test", "w")
    f.write("Nonsense")
    f.close()
    f.close()
    

    The same happens if you write, for instance, f = None instead of global f.
    What causes the crash in your code is:
    a) defining fp and nvfp as global in line 40. Or rather: defining variables as global at module level is strange.
    b) trying to close files twice. Even without the global definition you would get an error at the second close, albeit not a crash.



  • @tuftec I found an inconsistency in DipApp.py:
    At line 1044, you close nvfp. The file seems to be closed already, and so lfs goes haywire about it. FAT seems not to care.
    Edit. There are quite a few single fp.close() and nvfp.close() calls in the app. As far as I could see, these are not required, since all opens are followed by close calls. Anyhow, it's better to use the with statement, which will take care of the close, even in case of an error.
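
    A small sketch of that last point: the file is closed when the with block is left, even if an exception is raised inside it. The function name and the simulated error are made up for the demonstration, and the check uses the `closed` attribute of CPython file objects, which may not be available on every MicroPython port:

```python
def write_then_fail(path):
    """Open a file with `with`, raise inside the block, return the file object."""
    f = None
    try:
        with open(path, "w") as f:
            f.write("partial")
            raise ValueError("simulated error while writing")
    except ValueError:
        # the with statement has already closed the file at this point
        pass
    return f
```

    After the call, `write_then_fail("with_test.txt").closed` can be inspected on CPython to confirm that no explicit close() was needed.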

    Edit2: Using a simple test case, double close is not an issue. mmhh? So it's not clear why it fails here.



  • @tuftec More questions:
    a) The imports are case sensitive. That got lost on download and had to be adapted.
    b) DipConfig.py must be DipConfig.json and NNvar.py must be NVvar.json.

    I started with erasing a FiPy module first.
    Then I uploaded a flash image with lfs2.3 enabled.
    Then I uploaded your files, name corrected.
    Then, after reboot, the code starts, flashes blue-yellow-blue & stops soon after without any error message. What is the code expecting?

    Edit: OK. Got it. P16 must be pulled high to start the code and get the crash



  • @tuftec about the test files: did you run them with or w/o REPL (P16 low or high)?



  • @robert-hh possibly. My interest is in wear levelling to get the best life from the flash. If the lfs is not reliable then it is kind of pointless.



  • @tuftec Maybe you should open a new thread for this discussion, because this one was initially about flash wear.

