Der's blag

ESP32 is crashing when using WiFi and LVGL at the same time

Hardware

Context

I was playing around with a neat integrated WT32-SC01 development board (don’t buy this one, there are newer and better options right now). It is perfect for use with LVGL: you can create user interfaces that adapt to different screen sizes with relatively little difficulty. Historically, this was not really possible, and embedded devices had to either invent their own GUI from scratch (which takes a lot of time), settle for an ugly interface, or use Android or Linux, which can be expensive.

Obviously, LVGL comes with its own set of compromises. Unlike some other libraries, its rendering is entirely software-based, which means that you need to send all those rendered bytes down the narrow bus. Full-screen animations may lag, even on boards with parallel interfaces. The WT32-SC01 uses an SPI interface, which has even lower potential bandwidth.

Fortunately, LVGL does not require you to have a full frame buffer somewhere in memory. Everything is drawn in chunks, so you only need to allocate a small amount of memory. It seems that a buffer of about 10% of the full frame buffer is optimal for performance, but you can go even lower if needed.
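To get a feel for the numbers, here is a small sketch of the buffer arithmetic. The 480x320 resolution and the 2 bytes per pixel (RGB565) are my assumptions for this board, and the helper names are made up for illustration; they are not part of LVGL.

```cpp
#include <cassert>
#include <cstddef>

// Number of pixels in a draw buffer that covers `percent` of the screen.
inline std::size_t draw_buffer_pixels(std::size_t hor_res,
                                      std::size_t ver_res,
                                      std::size_t percent) {
    assert(percent >= 1 && percent <= 100);
    return hor_res * ver_res * percent / 100;
}

// Size of that buffer in bytes for a given colour depth.
inline std::size_t draw_buffer_bytes(std::size_t hor_res,
                                     std::size_t ver_res,
                                     std::size_t percent,
                                     std::size_t bytes_per_pixel) {
    return draw_buffer_pixels(hor_res, ver_res, percent) * bytes_per_pixel;
}
```

Under those assumptions a full frame would need 307200 bytes, while a 10% buffer needs 30720 bytes, which fits comfortably in internal RAM.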

PlatformIO + FreeRTOS + LVGL setup

Initially, I based my project on this example from littleyoda. In my opinion, it’s absolutely critical to use PlatformIO for any Arduino-related development. PlatformIO is great at fixing and pinning all your state, and in the case of LVGL, it’s useful for managing configurations. That way, when you return to the project a year later, it still compiles and flashes without any effort or tons of manual library installation.

I have pushed some changes to that template in my fork here. Only useful for WT32-SC01.

However, to implement some complex logic, I needed the ability to run my custom code in a separate, independent task. Fortunately, lvgl/lv_port_esp32 provides exactly that. LVGL code is started in a separate FreeRTOS task, which should not block any other tasks due to its low priority.

There was one major problem with that code: drawing was visibly slower compared to what I had seen before. The fix was quite simple: raise the task priority whenever LVGL is actively drawing and set it back to a low priority when LVGL is done with drawing.

Be careful with that. In FreeRTOS, a task with a lower priority will not be scheduled as long as there is a task with a higher priority that needs to be executed.
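The scheduling rule above can be shown with a toy model. This is not FreeRTOS code: the Task struct and pick_next function are hypothetical and only mimic the "highest-priority ready task wins" rule, so it can be compiled and tried on a desktop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a task: a priority and whether it is ready to run.
struct Task {
    int priority;  // higher number = more urgent, as in FreeRTOS
    bool ready;    // false while the task is blocked (delay, queue wait, ...)
};

// Returns the index of the task a fixed-priority scheduler would run,
// or -1 if every task is blocked. Ties go to the lowest index here;
// this toy ignores time slicing between equal priorities.
inline int pick_next(const std::vector<Task>& tasks) {
    int best = -1;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        if (!tasks[i].ready) {
            continue;  // blocked tasks are never candidates
        }
        if (best < 0 ||
            tasks[i].priority > tasks[static_cast<std::size_t>(best)].priority) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

If the boosted LVGL task never blocks, pick_next keeps returning it and the low-priority task starves. That is why the priority has to drop again, or the task has to block, as soon as drawing is finished.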

The problem

Now, I have written quite a lot of code, and it works great. However, when I either start writing to flash (using the Arduino Preferences library) or connect to WiFi, random crashes start to happen. They do not occur every time, but they are still unpleasant. They look something like this:

Guru Meditation Error: Core  1 panic'ed (Interrupt wdt timeout on CPU1).
... a long and unrelated stack-trace follows

I suspected memory corruption, as the stack trace made absolutely no sense (spoiler alert: that was not the case).

I tried a lot of things, but eventually these steps helped me:

  • I tried putting the code that caused the problem between noInterrupts() and interrupts() to see if that would help, but it was not really helpful. However, the crashes did change a bit, and that allowed me to find a lot of interesting information through Google.
  • I modified the application so that the crash would occur every time. In my case, I had a function that saved settings to flash every minute, but I modified it to run 10 times per second, so the random crash occurred almost instantly after boot. This made it so much easier to identify the cause, as before, I had to wait around a minute to reproduce the problem.
  • I disabled almost every feature that I could, one by one, until the crashing stopped. This quickly showed that something in LVGL was causing the problem. After trying many other things, it turned out that the lv_tick_task function was the culprit.

What is that function even doing? Well, LVGL needs to keep track of time. For example, if an animation is supposed to take 50 ms, something needs to be done at even intervals during that time. In microcontrollers, that’s usually not as simple as in system-level code. Therefore, a timer is created. The timer fires a periodic interrupt, which might occur while some other part of the code is being executed (unless interrupts are disabled). Whenever you configure that timer manually, you have to call this function from your interrupt handler.
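To illustrate the idea, here is a desktop-only toy model of such a tick counter. The names tick_inc, tick_get, tick_elapsed and animation_progress are made up for this sketch; they only imitate the role that lv_tick_inc plays and are not LVGL's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the millisecond counter that a tick function advances.
// volatile because, on real hardware, it is written from an interrupt.
static volatile std::uint32_t g_elapsed_ms = 0;

// What the periodic timer callback would call, e.g. every 1 ms.
inline void tick_inc(std::uint32_t period_ms) {
    g_elapsed_ms = g_elapsed_ms + period_ms;
}

inline std::uint32_t tick_get() { return g_elapsed_ms; }

// Milliseconds since `start_ms`. Unsigned subtraction stays correct
// even when the 32-bit counter wraps around.
inline std::uint32_t tick_elapsed(std::uint32_t start_ms) {
    return tick_get() - start_ms;
}

// Progress of an animation started at `start_ms`, as 0..100 percent.
inline std::uint32_t animation_progress(std::uint32_t start_ms,
                                        std::uint32_t duration_ms) {
    assert(duration_ms > 0);
    const std::uint32_t elapsed = tick_elapsed(start_ms);
    if (elapsed >= duration_ms) {
        return 100;
    }
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(elapsed) * 100 / duration_ms);
}
```

With a 1 ms tick, a 50 ms animation is halfway done after 25 calls to tick_inc. If the tick function stops being called, or crashes, time stands still for the whole GUI.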

Now, onto the fun part: the ESP32 uses SPI for a variety of functions. It is required for reading from flash, as well as for communication with the second core (though I’m not entirely certain, so please don’t quote me on that). Even reading from the PSRAM requires SPI, as the ESP32 only has 320 KiB of internal RAM.

Here comes our problem: When some other code is using SPI, external RAM is temporarily disabled. You can only access what’s inside the internal RAM. However, the interrupt that fired and executed the code was calling the lv_tick_inc function, which in turn incremented the internal LVGL counter. This counter is, by default, stored in the external RAM and is not always available.

Fortunately, LVGL lets you attach a custom attribute to that tick function. A simple build flag, -DLV_ATTRIBUTE_TICK_INC=IRAM_ATTR, added to platformio.ini instantly fixed the problem.
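For reference, this is roughly where the flag goes. The environment name below is hypothetical, and your file will contain other settings (platform, board, framework) that are left out here.

```ini
; platformio.ini (environment name is an assumption, use your own)
[env:wt32-sc01]
build_flags =
    -DLV_ATTRIBUTE_TICK_INC=IRAM_ATTR
```

If you already have a build_flags entry, add the flag as another line in it instead of creating a second entry.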

tl;dr

When:

  • You are running LVGL in one task and your own custom code in another.
  • Your custom code is doing SPI access, such as:
    • starting/stopping a wireless connection with WiFi.begin(...)
    • writing to flash memory, for example with Preferences.
  • You are experiencing random crashes.

Do:

  • Add the IRAM_ATTR attribute to lv_tick_task.
  • Add -DLV_ATTRIBUTE_TICK_INC=IRAM_ATTR to the build_flags in platformio.ini (or set LV_ATTRIBUTE_TICK_INC to IRAM_ATTR in your LVGL config).

More reading material: