Based on experience, I fully expected my first venture into Python multi-threading to run into problems. I just didn’t know exactly what that problem would look like. I had hoped that my first Python threading bug would be friendlier to understand and debug, just as the Python programming language has been friendlier for a beginner to write.
Sadly, no such luck. After I started using my GPIO class, with the debounce routine powered by threading.Timer, the crash is a very unfriendly
Segmentation Fault. Surprisingly, it’s not the full “
Segmentation Fault (core dumped)” so there was no core for me to debug with. No matter, it’s my program and I can relaunch it under gdb and try to dissect each instance of the crash.
The symptoms are consistent with a multi-thread timing issue:
- It is unpredictable. I could use my program for several minutes without a problem, or the program might die within a few seconds of launch.
- When something does go wrong, the actual point of failure that crashed in gdb is deep in the bowels of the system far from the actual problem.
- In one case, a Python thread worker without any of my code on the stack.
- In another case, internal system corruption error “
Error in `python3': corrupted size vs. prev_size“
- The state of the rest of the system is also unpredictable. Using gdb’s “
info threads” command I could get an overlook of the rest of the program, and it looks different every time.
I thought my timer-related code was simple: it checks the state of the input pin, and if the pin level has changed (after the debounce wait) also change the appearance of the visual indicator. The visual indication change was done by changing the stylesheet on the QWidget on screen. One of the error messages gave me my first clue that was my bug.
Could not parse stylesheet of object 0x21f6f58
Going back to look at the state of the other threads with this in mind, I saw they all had something in common: the main UI thread at the point of the crash is working on some part of visual rendering.
So we have a candidate explanation: I had been updating the QWidget stylesheet in the timer callback, which executes on a thread that is not the main Qt UI thread. So when this occurs when the UI thread is in the middle or rendering the QWidget, it would see the data structure change out from under it by the timer thread. This is not good and could explain the crash.
To test the hypothesis, I modified my program to perform stylesheet updates from the timer callback thread and do it frequently. Tens of times per second versus once every few seconds in my normal execution. The program now crashes consistently within a few seconds of launch which I saw as good confidence builder for my hypothesis.
The next thing we need is a fix. I need to make sure the stylesheet update occurs on the UI thread, and it seems simplest to post a message to the UI thread event queue. I can accomplish this by emitting a signal from the timer thread and have that go to a slot on the UI thread via a Queued Connection. The slot executes on the UI thread and can safely update the stylesheet there.
After this modification, the test program is now only emitting a signal from the timer callback thread to queue a stylesheet update, instead of performing the actual update itself on the timer thread. And now the test program no longer crashes.
I started debugging this believing I ran into a problem deep in the bowels of Python implementation internals. It turns out my clumsy use of threads had confused Qt.
Lesson learned, moving on to the next lesson…