External Flash lose files



  • @robert-hh Power-fail corruption is less of an issue for me since I have permanent solar/battery power. A short flash life is a major drama, however.
    I might be able to reconfigure my code to use the NVRAM methods, but I am not sure that will improve the situation.

    Thanks.



  • @tuftec said in External Flash lose files:

    If I change back to LFS, assuming my original crashing problem does not resurface, can I then get wear levelling to increase this to potentially 6 years?

    When changing back to LFS, and since you have to re-compile the code anyhow, you may consider updating LFS from the currently used version 2.0 to the most recent version. Maybe the problem will disappear then.
    P.S.: I am not overly happy with LFS. FAT is faster and seems more reliable under normal conditions. However, with FAT you will end up with a corrupted file system after a power failure during a write.



  • @tuftec said in External Flash lose files:

    I have a current product in the field that uses a PIC micro at its heart.

    Even if flash vendors guarantee at least 100000 write cycles, the actual number that can be achieved is often several million; it depends a lot on the operating conditions. So 6 years of operation with an update every 2 minutes means ~1.6 million cycles. The devices of @serafimsaudade broke after ~3 million cycles.
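    The arithmetic above can be sketched in a few lines (the 6-year lifetime and 2-minute interval are the figures from this thread):

```python
# Rough endurance arithmetic for the numbers discussed above.
SECONDS_PER_YEAR = 365 * 24 * 3600

def write_cycles(years, interval_s):
    """Total write cycles after `years` of one write every `interval_s` seconds."""
    return years * SECONDS_PER_YEAR // interval_s

# 6 years of operation, one update every 2 minutes:
print(write_cycles(6, 120))   # -> 1576800, i.e. ~1.6 million cycles
```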



  • @tuftec With respect to flash wear, FAT is bad. The FAT file system keeps the file allocation table (FAT) at a fixed location, and the master directory at a fixed location too. The FAT is written whenever a file is created or extended, and the directory is written on every action that changes a file, at least on file close. So in your case the FAT may not change on every file write, but the directory does. The file data may be written to different places, but not necessarily: with a block size of 4 KB and a file size of 2 KB, the data may well always be written to the same block.
    LFS at the moment has wear leveling at least for the file data, but not for the superblock.
    The change to enable block-level wear leveling requires re-compiling the code, but the file system may stay.
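    The effect can be illustrated with a toy model (the block numbers and layout are made up for illustration, not the real FAT on-disk layout): a fixed directory block takes one write per file operation, while the data writes may be spread out:

```python
from collections import Counter

# Toy model (hypothetical layout): FAT table in block 1, directory in block 2,
# file data rotating over data blocks 3..1023.
FAT_BLOCK, DIR_BLOCK = 1, 2
wear = Counter()

for i in range(10000):                # 10000 file writes
    if i == 0:
        wear[FAT_BLOCK] += 1          # FAT changes only when the file is created/extended
    wear[3 + i % 1021] += 1           # data may land in varying blocks...
    wear[DIR_BLOCK] += 1              # ...but the directory is rewritten every time

print(wear[DIR_BLOCK])                # -> 10000: the fixed block takes all the wear
print(max(wear[b] for b in range(3, 1024)))  # -> 10: data wear is spread out
```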



  • I have a current product in the field that uses a PIC micro at its heart. This product writes data back to flash every 2 mins. I have a sizable number of units in the field, in remote applications, that have been operating fault free for 6 years. Many of them have not been powered down or reset for over 2 years.
    I do not do anything special to preserve the flash memory on these units. They just work.
    I had expected to have a similar experience with the FiPy devices that I am planning to deploy into similar applications.



  • @robert-hh So what are you saying? With a FAT file system, does it keep writing the file to the same place, or will it exercise some degree of levelling? If not, it will wear out that block of flash in 70 days!

    If I change back to LFS, assuming my original crashing problem does not resurface, can I then get wear levelling to increase this to potentially 6 years?

    How do I make the changes to ensure the levelling is activated correctly? Is this something that can be done at run time, or does it need a recompile of the Pycom OS (I have never done this before and am not keen on starting)?



  • @tuftec The experience of @serafimsaudade indicates that.
    But enabling block wear protection just requires setting that value from 0 to e.g. 100000, which is what the flash supplier guarantees.
    With an update every minute you will reach 100000 write cycles after 70 days. With the numbers seen by @serafimsaudade it would be ~6 years.
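    A quick sketch of that arithmetic, using the figures from this thread:

```python
def days_until(limit_cycles, interval_s=60):
    """Days until `limit_cycles` writes are reached at one write per `interval_s` seconds."""
    return limit_cycles * interval_s / 86400

print(days_until(100000))            # ~69.4 days at the guaranteed 100000 cycles
print(days_until(3000000) / 365)     # ~5.7 years at the ~3 million cycles observed
```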



  • I have been watching this thread with interest.
    What does it all mean in terms of realistic flash file system life?
    I have an application where I am updating the data in a file (2K bytes) every minute.
    I am now concerned that the flash will wear out relatively quickly.
    I was using LFS but had to switch back to FAT due to some peculiarities with the way my code was functioning.
    I feel that I am going to need wear levelling, otherwise the memory is likely to wear out in a few years. I need my devices to have a life expectancy of at least 10 years.
    Any ideas on the best approach to ensure maximum flash life?

    @peterp & @Gijs do you have any information that you can share on flash life, filesystem types, wear levelling and how to optimise their use?



  • @serafimsaudade So it looks like the option of block-level wear-out protection works for V1 too. I tested it with a setting of 100. But I imagine that this setting eats up superblocks (and space) permanently, so a level of 10000 could be more appropriate.

    Edit: The file & line to be changed: littlefs/sflash_diskio_littlefs.c, line 48

    Edit2: Looking into the code, the actual LFS version used by Pycom is 2.0, not 1.x as I assumed earlier. The other version tried is 2.3. So not much difference then; it was just the non-enabled block-wear protection.



  • @serafimsaudade Some results:
    a) Using LFS2 just means replacing four files of the source by the newer version. But the file system format changes.
    b) Even then, block-level wear leveling has to be enabled. And that flag exists in V1 too, so I will go back to V1 and test whether that works in V1 as well.



  • @serafimsaudade said in External Flash lose files:

    The small number of superblock pairs used suggests that the flash will wear out much faster than expected?

    Yes, that's my fear. The interesting thing is that even if the chip is specified to withstand 100000 cycles, it lasted several million cycles. But that is not something one can rely on. Technically there are only a few options:

    • Using a more robust memory type. That could be a CF card, which has wear leveling embedded, or something like an F-RAM module.
    • Using LFS version 2. I will check a) whether LFS2 behaves differently, and b) how much work it is to replace the existing LFS1 by LFS2. As far as I recall, the file systems are not compatible, so there is no OTA update, but at least the hardware stays the same. The micropython.org version of MicroPython uses LFS2, so I can use that one for testing.


  • @robert-hh

    Thanks very much for the work you have been doing.
    Checking the results you have posted so far: the small number of superblock pairs used suggests that the flash will wear out much faster than expected?

    Best Regards,



  • @serafimsaudade Using a set of 512 different file names, the number of superblock pairs went up to 8. In a sequence of about 10000 file writes, the following frequencies of these block pairs occurred:

    Block     Frequency
    (0,1)       1120
    (44,45)     2028
    (755,756)   1200
    (817,818)   1170
    (828,829)   1000
    (881,882)   1147
    (943,944)   1145
    (1004,1005) 1147
    

    Interestingly, one block pair was used about twice as often as the others, like in the previous test with 128 different file names.
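    A quick sanity check of the tabulated counts (the numbers are copied from the table above):

```python
# Superblock-pair write frequencies from the 512-file-name test.
freq = {(0, 1): 1120, (44, 45): 2028, (755, 756): 1200, (817, 818): 1170,
        (828, 829): 1000, (881, 882): 1147, (943, 944): 1145, (1004, 1005): 1147}

total = sum(freq.values())
others = [v for k, v in freq.items() if k != (44, 45)]
print(total)                                  # -> 9957, matching the ~10000 writes
print(round(freq[(44, 45)] / (sum(others) / len(others)), 2))  # -> 1.79, i.e. roughly twice the rest
```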



  • @serafimsaudade There seems to be a discussion about superblock wear: https://github.com/littlefs-project/littlefs/issues/376
    But this is from early 2020, and the Pycom firmware uses littlefs 1.x, which is from before April 2019, when 2.0 was published.

    Version 2 improved wear leveling and prevents wear of the superblocks. See: https://github.com/littlefs-project/littlefs/releases/tag/v2.0.0

    P.S.: After 3300 file writes, still only 2 superblock pairs are used, and not even at the same ratio: blocks (0, 1) and (828, 829) are at a ratio of about 2:1.



  • @serafimsaudade No change so far after 5000 writes. I changed the scheme to write to 128 different file names, and then alternating superblock numbers were used. A small section from the log:

    47 -------------------------------------
    *** block erase 427 ***
    *** block prog 427 ***
    *** block erase 0 ***
    *** block prog 0 ***
    48 -------------------------------------
    *** block erase 428 ***
    *** block prog 428 ***
    *** block erase 1 ***
    *** block prog 1 ***
    49 -------------------------------------
    *** block erase 429 ***
    *** block prog 429 ***
    *** block erase 0 ***
    *** block prog 0 ***
    50 -------------------------------------
    *** block erase 430 ***
    *** block prog 430 ***
    *** block erase 1 ***
    *** block prog 1 ***
    51 -------------------------------------
    *** block erase 431 ***
    *** block prog 431 ***
    *** block erase 0 ***
    *** block prog 0 ***
    52 -------------------------------------
    *** block erase 432 ***
    *** block prog 432 ***
    *** block erase 1 ***
    *** block prog 1 ***
    53 -------------------------------------
    *** block erase 433 ***
    *** block prog 433 ***
    *** block erase 0 ***
    *** block prog 0 ***
    54 -------------------------------------
    *** block erase 434 ***
    *** block prog 434 ***
    *** block erase 829 ***
    *** block prog 829 ***
    55 -------------------------------------
    *** block erase 435 ***
    *** block prog 435 ***
    *** block erase 828 ***
    *** block prog 828 ***
    56 -------------------------------------
    *** block erase 436 ***
    *** block prog 436 ***
    *** block erase 829 ***
    *** block prog 829 ***
    57 -------------------------------------
    *** block erase 437 ***
    *** block prog 437 ***
    *** block erase 828 ***
    *** block prog 828 ***
    58 -------------------------------------
    *** block erase 438 ***
    *** block prog 438 ***
    *** block erase 829 ***
    *** block prog 829 ***
    59 -------------------------------------
    *** block erase 439 ***
    *** block prog 439 ***
    *** block erase 828 ***
    *** block prog 828 ***
    60 --------------------------
    

    But it seems to be just another two block numbers.



  • @robert-hh

    Ok, thanks.
    I will also try to run some tests.

    Best Regards,



  • @robert-hh So I wrote a little python script:

    # Write a ~2 KB buffer to the same file on every call.
    def run():
        b = bytearray(2038)

        f = open("logdata", "wb")  # binary mode, since b is a bytearray
        f.write(b)
        f.close()
    

    And I added a few printf statements to sflash_diskio_littlefs.c, which print a line every time erase or write is called. Then I ran the little script a few times. This is the output:

    >>> wtest.run()
    *** block erase 0 ***
    *** block prog 0 ***
    *** block erase 235 ***
    *** block prog 235 ***
    *** block erase 1 ***
    *** block prog 1 ***
    >>> wtest.run()
    *** block erase 236 ***
    *** block prog 236 ***
    *** block erase 0 ***
    *** block prog 0 ***
    >>> wtest.run()
    *** block erase 237 ***
    *** block prog 237 ***
    *** block erase 1 ***
    *** block prog 1 ***
    >>> wtest.run()
    *** block erase 238 ***
    *** block prog 238 ***
    *** block erase 0 ***
    *** block prog 0 ***
    >>> wtest.run()
    *** block erase 239 ***
    *** block prog 239 ***
    *** block erase 1 ***
    *** block prog 1 ***
    >>> wtest.run()
    *** block erase 240 ***
    *** block prog 240 ***
    *** block erase 0 ***
    *** block prog 0 ***
    

    The result is confusing. It shows some kind of wear leveling in the block with the changing numbers, from 235 to 240 (going on to 1024, as I noticed, and then skipping back to 2), but also that blocks 0 and 1 are alternately written and erased. These are so-called "superblocks". According to the spec, they should change after a while. I will let that run to see whether it happens. If not, then that is a good candidate for flash failure.

    P.S.: If you worry about me killing a device: no problem. It is an open one, on which I had already replaced a flash chip, and I have spare chips here.



  • @serafimsaudade Since the block size is 4 KB, writing 200 bytes or 2 KB does not make a difference.



  • @robert-hh

    Sorry, I mistyped the file size. The WiPy3 writes a file of about 2 KB to flash.

    "How is the actual file write done. Do you write a single short file every time, or do you append the data to an ever increasing file. Or do you write to alternate files, ...."

    I always open the same file, write it without appending, and close it.

    Best Regards,



  • @serafimsaudade OK. So if it is a flash wear problem, you have about 6 years of time. Besides that....

    Doing a quick calc: the WiPy3 flash file system is 4 MB, or 1024 (about 10^3) blocks of 4 KB. With well-distributed wear leveling, one could make at least 10^8 block writes. That number has to be divided by the number of block writes caused by your file operation to get the number of safe file writes. I do not know this number; to get it, I could add debug messages to the LFS. For that, I have another question:

    • How is the actual file write done? Do you write a single short file every time, do you append the data to an ever-increasing file, or do you write to alternate files, ...?
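    The estimate can be written out as a short sketch (the block-writes-per-file-operation divisor is a hypothetical placeholder, since the post says it is not yet known):

```python
# Rough version of the calculation above.
blocks = 4 * 1024 * 1024 // 4096       # 4 MB file system in 4 KB blocks -> 1024
cycles_per_block = 100000              # vendor-guaranteed endurance per block
total_block_writes = blocks * cycles_per_block
print(total_block_writes)              # -> 102400000, i.e. ~10^8

# Safe file writes = total block writes / block writes per file operation;
# that divisor is exactly what the debug messages are meant to measure.
blocks_touched_per_write = 3           # hypothetical value
print(total_block_writes // blocks_touched_per_write)
```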
