UART RX buffer issue after buffer overflow



  • Summary: overflowing the UART receive buffer causes an offset in the ring buffer pointer so further use of the UART returns incorrect data.

    From this thread. Reproduced by me this morning without difficulty. The GPS module discussed in the thread is not relevant/required to reproduce the issue.

    Test setup: External device (I used an Arduino) setup to send n chars to the FiPy UART. I've only tested this on UART 1 so far (P3/P4).
    Test init: UART(1, baudrate=9600, bits=8, parity=None, stop=1, timeout_chars=10, pins=('P3', 'P4'))

    (sysname='FiPy', nodename='FiPy', release='1.15.0.b1', version='v1.8.6-849-baa8c33 on 2018-01-29', machine='FiPy with ESP32', lorawan='1.0.0', sigfox='1.0.1')
    

    Sending 50 bytes results in any() showing 50 and readall() returning correct data (as expected).

    Sending 999 bytes results in any() showing 480 and readall() returning truncated data (as expected?).

    However, this overflow appears to cause some corruption or offset in the ring buffer pointer. Any subsequent data is received and the bytes correctly counted when checking any(), BUT reading the data with read(n) or readall() returns previous data from the buffer before some of the new data. eg (sent 20 chars):

    >>> uart.any()
    20
    >>> uart.readall()
    b'xxx01234567890123456'
    

    In this case, "xxx" was previous data still in the buffer, the data actually received and counted was "01234567890123456789".

    De (&re) init of the UART does not fix it once the buffer is in this state. Calling machine.reset() and initializing the UART from fresh seems to be the only way to sort it.

    This exact UART behaviour is discussed here (different ESP32 OS but they do come up with a fix), and perhaps more usefully here (recent ESP-IDF bugfix).



  • @paulm I made the test with a baud rate of 115200 and 2048 byte and 10k data. With flow control enabled, nothing is lost. Without flow control, data is lost. But that is expected without flow control. If you send more data than the input ring buffer can take, while you not fetching it, then excess data is lost.
    However, what I get is strange. I wait with reading until all data is sent. What I get then is 607 bytes from the first part of the message and about 100 bytes from the tail. And that is repeatable.
    The other thing that puzzles me was that note,. that the UART buffer was extended to 4k. Looking into the code, mods/machuart.c, line 64, it says 512 byte plus the 128 byte FIFO of the chip. But still that should be OK for 480 bytes.

    The Pycom code does nothing to UART besides the API. The actual data transfer is done through the esp-idf code. The only thing that could be changed is the RX buffer size.



  • @robert-hh Looking over the code of your test, I don't believe that would reproduce the issue. Since there's a perpetual while True loop, you were likely performing the uart reads fast enough that they never went over the 480 buffer size, never triggering the issue.

    To reproduce my test generating the error, create the UART object on Pycom, wait for 5 seconds doing nothing, send over 600 bytes of data to the UART during this time, and THEN, after the send is done, start reading the UART on the Pycom.

    If you could do that, and not get the problem, then that would show a very interesting and unexplainable difference between my Pycom board and yours.



  • @robert-hh Thanks for the information. I can confirm that, within the last 24 hours, with a new GPy running the latest development 1.20 firmware, baud rate 4800, I was observing a confirmed behavior where I would send 665 bytes to UART1 and the following would happen:

    uart.any() #Returned 480
    uart.read(uart.any()) #Take the 480
    uart.any() #Always returned 205, should have been 185
    uart.read(uart.any()) #The first 20 bytes read were exactly the last 20 bytes of the first read

    I later redesigned my code to write my individual 133 byte packets to the GPy individually, with my other MCU flushing and waiting 100ms in between packets to ensure the GPy never went over 480.

    That fixed the issue.

    It is strange that this issue was still happening to my Pycom running the latest firmware, since what you posted suggest that it might be in a "ought to have already been fixed" state.



  • @paulm Increasing the mssage size ight be cause by intrinsic encoding/decoding. See this example:

    s='äöü'
    print(len(s))
    b=b'öäü'
    print(len(b))
    print(len(b.decode()))
    

    outputs:
    3
    6
    3
    I agree that THIS would be an example of Python hell.



  • @paulm I just made a quick test with data sent from my PC to a LoPy4, which had the following test script:

    from machine import UART
    import sys
    
    uart=UART(1, 921600, pins=('P3', 'P4', 'P8', 'P9'))
    
    
    while True:
        if uart.any():
            _ = sys.stdout.write(uart.read())
    

    Up to 115200 all data was received even without flow control. At 230400, data was lost without flow control. With RTS@Lopy/CTS@FT232 handshaking. I had no data loss up to a speed of 921600, which was the fastest I tested.



  • @paulm I cannot find an words mentioning, that this has been fixed. Just the post below pointing at the IDF and that using the new IDF would fix it. The only change to UARt I found was the increase of RX buffer size to 4k for GPY and FIPY.
    In the esp-idf there are plenty changes for the UART in the last 11 months. Hard to tell which should fix that buffer problem.



  • I will eat my words if someone can prove to me this has been decisively fixed. But Occam's Razor tells me this unfixed Pycom software error is presently the cause of yet another extremely frustrating "why the hell isn't my project working properly" moment currently affecting my development. I have an external MCU communicating with GPy - no problems when one 133 byte packets is sent over UART, but corrupted data structures consistently are observable when I send 5 packets (i.e., over 480 buffer size).

    In my case, I read uart.read(uart.any()) once, and then the second call returns 205 bytes - even though my other MCU is incontrovertibly sending only 665 bytes (not 205+480=685).

    Pycom would have such a great product if the quality of the software matched that of the hardware. Implementation-wise, it has caused more delays in development than the time savings even the most vociferous Python-advocate could claim over C/Arduino. Now I have to totally change the architecture of my MCU->GPy communication to do handshakes between small packets. In a low-power environment, this means slower execution and shorter battery life. Too bad 11 months after this thread was made, it remains unfixed as the source of time-consumptive surprise-problems.



  • I would also be interested if this problem was fixed or not?

    I'm working on a project where i need to read data with up to 2100 bytes at once and it seems that data (about 500 bytes) gets lost somewhere.
    I have no problems with smaller chunks with about 80bytes.



  • @daniel was this problem actually fixed?

    I think I'm having a similar issue on the latest LoPy4 firmware.



  • @daniel

    Awesome thanks!



  • @nathanh it will be a standard firmware update.



  • @daniel

    Sorry for the dumb question, but will this be a firmware update which we simply use the firmware updater tool on our devices or will we need to build our own off the repo?

    Also, thanks a lot to everyone involved in getting this sorted, will save me a lot of headaches going forward!



  • daniel ADMINISTRATORS less than a minute ago
    Hello guys,

    The issue fixed in the IDF is definitely affecting us. We will merge that patch and will be available for our next release this week.

    Cheers,
    Daniel



  • I believe I am experiencing the same issue:
    https://github.com/pycom/pycom-micropython-sigfox/issues/129


Log in to reply
 

Pycom on Twitter