2021-12-042021-12-06 Roger Cheng

Screen Rotation Support for ESP_8_BIT_Composite Arduino Library

I’ve had my head buried in modern LED-illuminated digital panels, so it was a good change of pace to switch gears to old school CRTs for a bit. Several months have passed since I added animated GIF support to my ESP_8_BIT_Composite video out Arduino library for ESP32 microcontrollers. I opened up the discussion forum option for my GitHub repository and a few items have been raised, sadly I haven’t been able to fulfill the requests ranging from NTSC-J support (I don’t have a corresponding TV) to higher resolutions (I don’t know how). But one has just dropped in my lap, and it was something I can do.

Issue #21 was a request for the library to implement Adafruit GFX capability to rotate display orientation. When I first looked at rotation, I had naively thought Adafruit GFX would handle that above drawPixel() level and I won’t need to write any logic for it. This turned out to be wrong: my code was expected to check rotation and alter coordinate space accordingly. I looked at the big CRT TV I had sitting on my workbench and decided I wasn’t going to sit that beast on its side, and then promptly forgot about it until now. Whoops.

Looking into Adafruit’s generic implementation of drawPixel(), I saw a code fragment that I could copy:

  int16_t t;
  switch (rotation) {
  case 1:
    t = x;
    x = WIDTH - 1 - y;
    y = t;
    break;
  case 2:
    x = WIDTH - 1 - x;
    y = HEIGHT - 1 - y;
    break;
  case 3:
    t = x;
    x = y;
    y = HEIGHT - 1 - t;
    break;
  }

Putting this into my own drawPixel() was a pretty straightforward way to handle rotated orientations. But I had overridden several other methods for the sake of performance, and they needed to be adapted as well. I had drawFastVLine, drawFastHLine, and fillRect, each optimized for their specific scenario with minimal overhead. But now the meaning of a vertical or horizontal line has become ambiguous.

Looking over at what it would take to generalize the vertical or horizontal line drawing code, I realized they have become much like fillRect(). So instead of three different functions, I only need to make fillRect() rotation aware. Then my “fast vertical line” routine can call into fillRect() with a width of one, and similarly my “fast horizontal line” routine calls into fillRect() with a height of one. This invokes some extra computing overhead relative to before, but now the library is rotation aware and I have less code to maintain. A tradeoff I’m willing to make.

While testing behavior of this new code, I found that Adafruit GFX library uses different calls when rendering text. Text size of one uses drawPixel() for single-pixel manipulation. For text sizes larger than one, they switch to using fillRect() to draw more of the screen at a time. I wrote a program to print text at all four orientations, each at three different sizes, to exercise both code paths. It has been added to the collection of code examples as GFX_RotatedText.

Satisfied that my library now supports screen rotation, I published it as version 1.3.0. But that turned out to be incomplete, as I neglected to update the file library.properties.

2021-05-252021-05-24 Roger Cheng

Cat and Galactic Squid

Emily Velasco whipped up some cool test patterns to help me diagnose problems with my port of AnimatedGIF Arduino library example, rendering to my ESP_8_BIT_composite color video out library. But that wasn’t where she first noticed a problem. That honor went to the new animated GIF she created upon my request for something nifty to demonstrate my library.

This started when I copied an example from the AnimatedGIF library for the port. After I added the code to copy between my double buffers to keep them consistent, I saw it was a short clip of Homer Simpson from The Simpsons TV show. While the legal department of Fox is unlikely to devote resources to prosecute authors of an Arduino library, I was not willing to take the risk. Another popular animated GIF is Nyan Cat, which I had used for an earlier project. But despite its online spread, there is actual legal ownership associated with the rainbow-pooping pop tart cat. Complete with lawsuits enforcing that right and, yes, an NFT. Bah.

I wanted to stay far away from any legal uncertainties. So I asked Emily if she would be willing to create something just for this demo as an alternate to Homer Simpson and Nyan Cat. For the inspirational subject, I suggested a picture she posted of her cat sleeping on her giant squid pillow.

A wider view pic.twitter.com/6BHmLRbNW1
— Emily Velasco (@MLE_Online) April 30, 2021

A few messages back and forth later, Emily created Cat and Giant Squid complete with a backstory of an intergalactic adventuring duo.

In gratitude, the squid rescued the cat from its life of eating scraps in alleys and, they've been traveling the universe ever since, discovering new things, helping others, and generally enjoying each other's company.
— Emily Velasco (@MLE_Online) May 14, 2021

Here they are on an seamlessly looping background, flying off to their next adventure. Emily has released this art under the CC BY-SA (Creative Commons Attribution-ShareAlike) 4.0 license. And I have happily incorporated it into ESP_8_BIT_composite library as an example of how to show animated GIFs on an analog TV. When I showed the first draft, she noticed a visual artifact that I eventually diagnosed to missing X-axis offsets. After I fixed that, the animation played beautifully on my TV. Caveat: the title image of this post is hampered by the fact it’s hard to capture a CRT on camera.

2021-05-242021-05-25 Roger Cheng

Finding X-Offset Bug in AnimatedGIF Example

Thanks to a little debugging, I figured out my ESP_8_BIT_composite color video out Arduino library required a new optional feature to make my double-buffering implementation compatible with libraries that rely on a consistent buffer such as AnimatedGIF. I was happy that my project, modified from one of the AnimatedGIF examples, was up and running. Then I swapped out its test image for other images, and it was immediately clear the job is not yet done. These test images were created by Emily Velasco and released under Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

This image resulted in the flawed rendering visible as the title image of this post. Instead of numbers continously counting upwards in the center of the screen, various numbers are rendered at wrong places and not erased properly in the following screens. Here is another test image to get more data

Between the two test images and observing where they were on screen, I narrowed the problem. Animated GIF files might only update part of the frame and when that happens, the frame subset is to be rendered at a X/Y offset relative to the origin. The Y offset was accounted for correctly, but the X offset went unused meaning delta frames were rendering against the left edge rather than the correct offset. This problem was not in my library, but inherited from the AnimatedGIF example. Where it went unnoticed because the trademark-violating animated GIF used by that example didn’t have an X-axis offset. Once I understood the problem, I went digging into AnimatedGIF code. Where I found the unused X-offset, and added it into the example where it belonged. These test images now display correctly, but they’re not terribly interesting to look at. What we need is a cat with galactic squid friend.

2021-05-232021-05-24 Roger Cheng

Animated GIF Decoder Library Exposed Problem With Double Buffering

Once I resolved all the problems I knew existed in version 1.0.0 of my ESP_8_BIT_composite color video out Arduino library, I started looking around for usage scenarios that would unveil other problems. In that respect, I can declare my next effort a success.

My train of thought started with ease of use. Sure, I provided an adaptation of Adafruit’s GFX library designed to make drawing graphics easy, but how could I make things even easier? What is the easiest way for someone to throw up a bit of colorful motion picture on screen to exercise my library? The answer came pretty quickly: I should demonstrate how to display an animated GIF on an old analog TV using my library.

This is a question I’ve contemplated before in the context of the Hackaday Supercon 2018 badge. Back then I decided against porting a GIF decoder and wrote my own run-length encoding instead. The primary reason was that I was short on time for that project and didn’t want to risk losing time debugging an unfamiliar library. Now I have more time and can afford the time to debug problems porting an unfamiliar library to a new platform. In fact, since the intent was to expose problems in my library, I fully expected to do some debugging!

I looked around online for an animated GIF decoder library written in C or C++ code with the intent of being easily portable to microcontrollers. Bonus if it has already been ported to some sort of Arduino support. That search led me to the AnimatedGIF library by Larry Bank / bitbank2. The way it was structured made input easy: I don’t have to fuss with file I/O or SPIFFS, I can feed it a byte array. The output was also well matched to my library, as the output callback renders the image one horizontal line at a time, a great match for the line array of ESP_8_BIT.

Looking through the list of examples, I picked ESP32_LEDMatrix_I2S as the most promising starting point for my test. I modified the output call from the LED matrix I2S interface to my Adafruit GFX based interface, which required only minor changes. On my TV I can almost see a picture, but it is mostly gibberish. As the animation progressed, I can see deltas getting rendered, but they were not matching up with their background.

After chasing a few dead ends, the key insight was noticing my noisy background of uninitialized memory was flipping between two distinct values. That was my reminder I’m performing double-buffering, where I swap between front and back buffers for every frame. AnimatedGIF is efficient about writing only the pixels changed from one frame to the next, but double buffering meant each set of deltas was written over not the previous frame, but two frames prior. No wonder I ended up with gibberish.

Aside: The gibberish amusingly worked in my favor for this title image. The AnimatedGIF example used a clip from The Simpsons, copyrighted material I wouldn’t want to use here. But since the image is nearly unrecognizable when drawn with my bug, I can probably get away with it.

The solution is to add code to keep the two buffers in sync. This way libraries minimizing drawing operations would be drawing against the background they expected instead of an outdated background. However, this would incur a memory copy operation which is a small performance penalty that would be wasted work for libraries that don’t need it. After all of my previous efforts to keep API surface area small, I finally surrendered and added a configuration flag copyAfterSwap. It defaults to false for fast performance, but setting it to true will enable the copy and allow using libraries like AnimatedGIF. It allowed me to run the AnimatedGIF example, but I ran into problems playing back other animated GIF files due to missing X-coordinate offsets in that example code.

2021-05-222021-05-23 Roger Cheng

TIL Some Video Equipment Support Both PAL and NTSC

Once I sorted out memory usage of my ESP_8_BIT_composite Arduino library, I had just one known issue left on the list. In fact, the very first one I filed: I don’t know if PAL video format is properly supported. When I pulled this color video signal generation code from the original ESP_8_BIT project, I worked to keep all the PAL support code intact. But I live in NTSC territory, how am I going to test PAL support?

This is where writing everything on GitHub paid off. Reading my predicament, [bootrino] passed along a tip that some video equipment sold in NTSC geographic regions also support PAL video, possibly as a menu option. I poked around the menu of the tube TV I had been using to develop my library, but didn’t see anything promising. For the sake of experimentation I switched my sketch into PAL mode just to see what happens. What I saw was a lot of noise with a bare ghost of the expected output, as my TV struggled to interpret the signal in a format it could almost but not quite understand.

I knew the old Sony KP-53S35 RPTV I helped disassemble is not one of these bilingual devices. When its signal processing board was taken apart, there was an interface card to host a NTSC decoder chip. Strongly implying that support for PAL required a different interface card. It also implies newer video equipment have a better chance of having multi-format support, as they would have been built in an age when manufacturing a single worldwide device is cheaper than manufacturing separate region-specific hardware. I dug into my hardware hoard looking for a relatively young piece of video hardware. Success came in the shape of a DLP video projector, the BenQ MS616ST.

I originally bought this projector as part of a PC-based retro arcade console with a few work colleagues, but that didn’t happen for reasons not important right now. What’s important is that I bought it for its VGA and HDMI computer interface ports so I didn’t know if it had composite video input until I pulled it out to examine its rear input panel. Not only does this video projector support composite video in both NTSC and PAL formats, it also had an information screen where it indicates whether NTSC or PAL format is active. This is important, because seeing the expected picture isn’t proof by itself. I needed the information screen to verify my library’s PAL mode was not accidentally sending a valid NTSC signal.

Further proof that I am verifying a different code path was that I saw a visual artifact at the bottom of the screen absent from NTSC mode. It looks like I inherited a PAL bug from ESP_8_BIT, where rossumur was working on some optimizations for this area but left it in a broken state. This artifact would have easily gone unnoticed on a tube TV as they tend to crop off the edges with overscan. However this projector does not perform overscan so everything is visible. Thankfully the bug is easy to fix by removing an errant if() statement that caused PAL blanking lines to be, well, not blank.

Thanks to this video projector fluent in both NTSC and PAL, I can now confidently state that my ESP_8_BIT_composite library supports both video formats. This closes the final known issue, which frees me to go out and find more problems!

[Code for this project is publicly available on GitHub]

2021-05-212021-05-22 Roger Cheng

Allocating Frame Buffer Memory 4KB At A Time

Getting insight into computational processing workload was not absolutely critical for version 1.0.0 of my ESP_8_BIT_composite Arduino library. But now that the first release is done, it was very important to get those tools up and running for the development toolbox. Now that people have a handle on speed, I turned my attention to the other constraint: memory. An ESP32 application only has about 380KB to work with, and it takes about 61K to store a frame buffer for ESP_8_BIT. Adding double-buffering also doubled memory consumption, and I had actually half expected my second buffer allocation to fail. It didn’t, so I got double-buffering done, but how close are we skating to the edge here?

Fortunately I did not have to develop my own tools here to gain insight into memory allocation, ESP32 SDK already had one in the form of heap_caps_print_heap_info() For my purposes, I called it with the MALLOC_CAP_8BIT flag because pixels are accessed at the single byte (8 bit) level. Here is the memory output running my test sketch, before I allocated the double buffers. I highlighted the blocks that are about to change in red:

Heap summary for capabilities 0x00000004:
  At 0x3ffbdb28 len 52 free 4 allocated 0 min_free 4
    largest_free_block 4 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffb8000 len 6688 free 5872 allocated 688 min_free 5872
    largest_free_block 5872 alloc_blocks 5 free_blocks 1 total_blocks 6
  At 0x3ffb0000 len 25480 free 17172 allocated 8228 min_free 17172
    largest_free_block 17172 alloc_blocks 2 free_blocks 1 total_blocks 3
  At 0x3ffae6e0 len 6192 free 6092 allocated 36 min_free 6092
    largest_free_block 6092 alloc_blocks 1 free_blocks 1 total_blocks 2
  At 0x3ffaff10 len 240 free 0 allocated 128 min_free 0
    largest_free_block 0 alloc_blocks 5 free_blocks 0 total_blocks 5
  At 0x3ffb6388 len 7288 free 0 allocated 6784 min_free 0
    largest_free_block 0 alloc_blocks 29 free_blocks 1 total_blocks 30
  At 0x3ffb9a20 len 16648 free 5784 allocated 10208 min_free 284
    largest_free_block 4980 alloc_blocks 37 free_blocks 5 total_blocks 42
  At 0x3ffc1f78 len 123016 free 122968 allocated 0 min_free 122968
    largest_free_block 122968 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffe0440 len 15072 free 15024 allocated 0 min_free 15024
    largest_free_block 15024 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffe4350 len 113840 free 113792 allocated 0 min_free 113792
    largest_free_block 113792 alloc_blocks 0 free_blocks 1 total_blocks 1
  Totals:
    free 286708 allocated 26072 min_free 281208 largest_free_block 122968

I was surprised at how fragmented the memory space already was even before I started allocating memory in my own code. There are ten blocks of available memory, only two of which are large enough to accommodate an allocation for 60KB. Here is the memory picture after I allocated the two 60KB frame buffers (and two line arrays, one for each frame buffer.) With the changed sections highlighted in red.

Heap summary for capabilities 0x00000004:
  At 0x3ffbdb28 len 52 free 4 allocated 0 min_free 4
    largest_free_block 4 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffb8000 len 6688 free 3920 allocated 2608 min_free 3824
    largest_free_block 3920 alloc_blocks 7 free_blocks 1 total_blocks 8
  At 0x3ffb0000 len 25480 free 17172 allocated 8228 min_free 17172
    largest_free_block 17172 alloc_blocks 2 free_blocks 1 total_blocks 3
  At 0x3ffae6e0 len 6192 free 6092 allocated 36 min_free 6092
    largest_free_block 6092 alloc_blocks 1 free_blocks 1 total_blocks 2
  At 0x3ffaff10 len 240 free 0 allocated 128 min_free 0
    largest_free_block 0 alloc_blocks 5 free_blocks 0 total_blocks 5
  At 0x3ffb6388 len 7288 free 0 allocated 6784 min_free 0
    largest_free_block 0 alloc_blocks 29 free_blocks 1 total_blocks 30
  At 0x3ffb9a20 len 16648 free 5784 allocated 10208 min_free 284
    largest_free_block 4980 alloc_blocks 37 free_blocks 5 total_blocks 42
  At 0x3ffc1f78 len 123016 free 56 allocated 122880 min_free 56
    largest_free_block 56 alloc_blocks 2 free_blocks 1 total_blocks 3
  At 0x3ffe0440 len 15072 free 15024 allocated 0 min_free 15024
    largest_free_block 15024 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffe4350 len 113840 free 113792 allocated 0 min_free 113792
    largest_free_block 113792 alloc_blocks 0 free_blocks 1 total_blocks 1
  Totals:
    free 161844 allocated 150872 min_free 156248 largest_free_block 113792

The first big block, which previously had 122,968 bytes available, became the home of both 60KB buffers leaving only 56 bytes. That is a very tight fit! A smaller block, which previously had 5,872 bytes free, now had 3,920 bytes free indicating that’s where the line arrays ended up. A little time with the calculator with these numbers arrived at 16 bytes of overhead per memory allocation.

This is good information to inform some decisions. I had originally planned to give the developer a way to manage their own memory, but I changed my mind on that one just as I did for double buffering and performance metrics. In the interest of keeping API simple, I’ll continue handling the allocation for typical usage and trust that advanced users know how to take my code and tailor it for their specific requirements.

The ESP_8_BIT line array architecture allows us to split the raw frame buffer into smaller pieces. Not just a single 60KB allocation as I have done so far, it can accommodate any scheme all the way down to allocating 240 horizontal lines individually at 256 bytes each. That will allow us to make optimal use of small blocks of available memory. But doing 240 instead of 1 allocation for each of two buffers means 239 additional allocations * 16 bytes of overhead * 2 buffers = 7,648 extra bytes of overhead. That’s too steep of a price for flexibility.

As a compromise, I will allocate in the frame buffer in 4 kilobyte chunks. These will fit in seven out of ten available blocks of memory, an improvement from just two. Each frame would consist of 15 chunks. This works out to an extra 14 allocations * 16 bytes of overhead * 2 buffers = 448 bytes of overhead. This is a far more palatable price for flexibility. Here are the results with the frame buffers allocated in 4KB chunks, again with changed blocks in red:

Heap summary for capabilities 0x00000004:
  At 0x3ffbdb28 len 52 free 4 allocated 0 min_free 4
    largest_free_block 4 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffb8000 len 6688 free 784 allocated 5744 min_free 784
    largest_free_block 784 alloc_blocks 7 free_blocks 1 total_blocks 8
  At 0x3ffb0000 len 25480 free 724 allocated 24612 min_free 724
    largest_free_block 724 alloc_blocks 6 free_blocks 1 total_blocks 7
  At 0x3ffae6e0 len 6192 free 1004 allocated 5092 min_free 1004
    largest_free_block 1004 alloc_blocks 3 free_blocks 1 total_blocks 4
  At 0x3ffaff10 len 240 free 0 allocated 128 min_free 0
    largest_free_block 0 alloc_blocks 5 free_blocks 0 total_blocks 5
  At 0x3ffb6388 len 7288 free 0 allocated 6776 min_free 0
    largest_free_block 0 alloc_blocks 29 free_blocks 1 total_blocks 30
  At 0x3ffb9a20 len 16648 free 1672 allocated 14304 min_free 264
    largest_free_block 868 alloc_blocks 38 free_blocks 5 total_blocks 43
  At 0x3ffc1f78 len 123016 free 28392 allocated 94208 min_free 28392
    largest_free_block 28392 alloc_blocks 23 free_blocks 1 total_blocks 24
  At 0x3ffe0440 len 15072 free 15024 allocated 0 min_free 15024
    largest_free_block 15024 alloc_blocks 0 free_blocks 1 total_blocks 1
  At 0x3ffe4350 len 113840 free 113792 allocated 0 min_free 113792
    largest_free_block 113792 alloc_blocks 0 free_blocks 1 total_blocks 1
  Totals:
    free 161396 allocated 150864 min_free 159988 largest_free_block 113792

Instead of almost entirely consuming the block with 122,968 bytes leaving just 56 bytes, the two frame buffers are now distributed among smaller blocks leaving 28,329 contiguous bytes free in that big block. And we still have anther big block free with 113,792 bytes to accommodate large allocations.

Looking at this data, I could also see allocating in smaller chunks would have led to diminishing returns. Allocating in 2KB chunks would have doubled the overhead but not improved utilization. Dropping to 1KB would double the overhead again, and only open up one additional block of memory for use. Therefore allocating in 4KB chunks is indeed the best compromise, assuming my ESP32 memory map is representative of user scenarios. Satisfied with this arrangement, I proceeded to work on my first and last bug of version 1.0.0: PAL support.

[Code for this project is publicly available on GitHub]

2021-05-202021-05-21 Roger Cheng

Lightweight Performance Metrics Have Caveats

Before I implemented double-buffering for my ESP_8_BIT_composite Arduino library, the only way we know we’re overloaded is when we start seeing visual artifacts on screen. After I implemented double-buffering, when we’re overloaded we’ll see the same data shown for two or more frames because the back buffer wasn’t ready to be swapped. A binary good/no-good feedback is better than nothing but it would be frustrating to work with and I knew I could do better. I wanted to collect some performance metrics a developer can use to know how close they’re running to the edge before going over.

This is another feature I had originally planned as some type of configurable data. My predecessor ESP_8_BIT handled it as a compile-time flag. But just as I decided to make double-buffering run all the time in the interest of keeping the Arduino API easy to use, I’ve decided to collect performance metrics all the time. The compromise is that I only do so for users of the Adafruit GFX API, who have already chosen ease of use over maximum raw performance. The people who use the raw frame buffer API will not take the performance hit, and if they want performance metrics they can copy what I’ve done and tailor it to their application.

The key counter underlying my performance measurement code goes directly down to a feature of the Tensilica CPU. CCount, which I assume to mean cycle count, is incremented at every clock cycle. When the CPU is running at full speed of 240MHz, it increments by 240 million within each second. This is great, but the fact it is a 32-bit unsigned integer limits its scope, because that means the count will overflow every 2³² / 240,000,000 = 17.895 seconds.

I started thinking of ways to keep a 64-bit performance counter in sync with the raw CCount, but in the interest of keeping things simple I abandoned that idea. I will track data through each of these ~18 second periods and, as soon as CCount overflows, I’ll throw it all out and start a new session. This will result in some loss of performance data but it eliminates a ton of bookkeeping overhead. Every time I notice an overflow, statistics from the session is output to logging INFO level. The user can also query the running percentage of the session at any time, or explicitly terminate a session and start a new one for the purpose of isolating different code.

The percentage reported is the ratio of of clock cycles spent in waitForFrame() relative to the amount of time between calls. If the drawing loop does no work, like this:

void loop() {
  videoOut.waitForFrame();
}

Then 100% of the time is spent waiting. This is unrealistic because it’s not useful. For realistic drawing loops that does more work, the percentage will be lower. This number tells us roughly how much margin we have to spare to take on more work. However, “35% wait time” does not mean 35% CPU free, because other work happens while we wait. For example, the composite video signal generation ISR is constantly running, whether we are drawing or waiting. Actual free CPU time will be somewhere lower than this reported wait percentage.

The way this percentage is reported may be unexpected, as it is an integer in the range from 0 to 10000 where each unit is a percent or a percent. The reason I did this is because the floating-point unit on an ESP32 imposes its own overhead that I wanted to avoid in my library code. If the user wants to divide by 100 for a human-friendly percentage value, that is their choice to accept the floating-point performance overhead. I just didn’t want to force it on every user of my library.

Lastly, the session statistics include frames rendered & missed, and there is an overflow concern for those values as well. The statistics will be nonsensical in the ~18 second session window where either of them overflow, though they’ll recover by the following session. Since these are unsigned 32-bit values (uint32_t) they will overflow at 2³² frames. At 60 frames per second, that’s a loss of ~18 seconds of data once every 2.3 years. I decided not to worry about it and turn my attention to memory consumption instead.

[Code for this project is publicly available on GitHub]

2021-05-192021-05-20 Roger Cheng

Double Buffering Coordinated via TaskNotify

Eliminating work done for pixels that will never been seen is always a good change for efficiency. Next item on the to-do list is to work on pixels that will be seen… but we don’t want to see them until they’re ready. Version 1.0.0 of ESP_8_BIT_composite color video out library used only a single buffer, where code is drawing to the buffer at the same time the video signal generation code is reading from the buffer. When those two separate pieces of code overlap, we get visual artifacts on screen ranging from incomplete shapes to annoying flickers.

The classic solution to this is double-buffering, which the precedent ESP_8_BIT did not do. I hypothesize there were two reasons for this: #1 emulator memory requirements did not leave enough for a second buffer and #2 emulators sent its display data in horizontal line order, managing to ‘race ahead” of the video scan line and avoid artifacts. But now both of those are gone. #1 no longer applies because emulators had been cut out, freeing memory. And we lost #2 because Adafruit GFX is concentrated around vertical lines so it is orthogonal to scan line and no longer able to “race ahead” of it resulting in visual artifacts. Thus we need two buffers. A back buffer for the Adafruit GFX code to draw on, and a front buffer for the video signal generation code to read from. At the end of each NTSC frame, I have an opportunity to swap the buffers. Doing it at that point ensures we’ll never try to show a partially drawn frame.

I had originally planned to make double-buffering an optional configurable feature. But once I saw how much of an improvement this was, I decided everyone will get it all of the time. In the spirit of Arduino library style guide recommendations, I’m keeping the recommended code path easy to use. For simple Arduino apps the memory pressure would not be a problem on an ESP32. If someone wants to return to single buffer for memory needs, or maybe even as a deliberate artistic decision to have flickers, they can take my code and create their own variant.

Knowing when to swap the buffer was easy, video_isr() had a conveniently commented section // frame is done. At that point I can swap the front and back buffers if the back buffer is flagged as ready to go. My problem was that I didn’t know how to signal the drawing code they have a new back buffer and they can start drawing the next frame. The existing video_sync() (which I use for my waitForFrame() API) forecasts the amount of time to render a frame and uses vTaskDelay() which I am somewhat suspicious of. FreeRTOS documentation has the disclaimer that vTaskDelay() has no guarantee that it will resume at the specified time. The synchronization was thus inferred rather than explicit, and I wanted something that ties the two pieces of code more concretely together. My research eventually led to vTaskNotifyGiveFromISR() I can use in video_isr() to signal its counterpart ulTaskNotifyTake() which I will use for a replacement implementation of video_sync(). I anticipate this will prove to be a more reliable way for the application code to know they can start working on the next frame. But how much time do they have to spare between frames? That’s the next project: some performance metrics.

[Code for this project is publicly available on GitHub]

2021-05-182021-05-19 Roger Cheng

The Fastest Pixels Are Those We Never Draw

It’s always good to have someone else look over your work, they find things you miss. When Emily Velasco started writing code to run on my ESP_8_BIT_composite library, her experiment quickly ran into flickering problems with large circles. But that’s not as embarrassing as another problem, which triggered ESP32 Core Panic system reset.

When I started implementing a drawing API, I specified X and Y coordinates as unsigned integers. With a frame buffer 256 pixels wide and 240 pixels tall, it was a great fit for 8-bit unsigned integers. For input verification, I added a check to make sure Y did not exceed 240 and left X along as it would be a valid value by definition.

When I put Adafruit’s GFX library on top of this code, I had to implement a function with the same signature as Adafruit used. The X and Y coordinates are now 16-bit numbers, so I added a check to make sure X isn’t too large either. But these aren’t just 16-bit numbers, they are int16_t signed integers. Meaning coordinate values can be negative, and I forgot to check that. Negative coordinate values would step outside the frame buffer memory, triggering an access violation, hence the ESP32 Core Panic and system reset.

I was surprised to learn Adafruit GFX default implementation did not have any code to enforce screen coordinate limits. Or if they did, it certainly didn’t kick in before my drawPixel() override saw them. My first instinct is to clamp X and Y coordinate values within the valid range. If X is too large, I treat it as 255. If it is negative, I treat it as zero. Y is also clamped between 0 and 239 inclusive. In my overrides of drawFastHLine and drawFastVLine, I also wrote code to gracefully handle situations when their width or heights are negative, swapping coordinates around so they remain valid commands. I also used the X and Y clamping functions here to handle lines that were partially on screen.

This code to try to gracefully handle a wide combination of inputs added complexity. Which added bugs, one of which Emily found: a circle that is on the left or right edge of the screen would see its off-screen portion wrap around to the opposite edge of the screen. This bug in X coordinate clamping wasn’t too hard to chase down, but I decided the fact it even exists is silly. This is version 1.0, I can dictate the behavior I support or not support. So in the interest of keeping my code fast and lightweight, I ripped out all of that “plays nice” code.

A height or a width is negative? Forget graceful swapping, I’m just not going to draw. Something is completely off screen? Forget clamping to screen limits, stuff off-screen are just not going to get drawn. Lines that are partially on screen still need to be gracefully handled via clamping, but I discarded all of the rest. Simpler code leaves fewer places for bugs to hide. It is also far faster, because the fastest pixels are those that we never draw. These optimizations complete the easiest updates to make on individual buffers, the next improvement comes from using two buffers.

[Code for this project is publicly available on GitHub]

2021-05-172021-05-18 Roger Cheng

Overriding Adafruit GFX HLine/VLine Defaults for Performance

I had a lot of fun building a color picker for 256 colors available in the RGB332 color space, gratuitous swoopy 3D animation and all. But at the end of the day it is a tool in service of the ESP_8_BIT_composite video out library. Which has its own to-do list, and I should get to work.

The most obvious work item is to override some Adafruit GFX default implementations, starting with the ones explicitly recommended in comments. I’ve already overridden fillScreen() for blanking the screen on every frame, but there are more. The biggest potential gain is the degenerate horizontal-line drawing method drawFastHLine() because it is a great fit for ESP_8_BIT, whose frame buffer is organized as a list of horizontal lines. This means drawing a horizontal line is a single memset() which I expect to be extremely fast. In contrast, vertical lines via drawFastVLine() would still involve a loop iterating over the list of horizontal lines and won’t be as fast. However, overriding it should still gain benefit by avoiding repetitious work like validating shared parameters.

Given those facts, it is unfortunate Adafruit GFX default implementations tend to use VLine instead of the HLine that would be faster in my case. Some defaults implementations like fillRect() were easy to switch to HLine, but others like fillCircle() is more challenging. I stared at that code for a while, grumpy at lack of comments explaining what it is doing. I don’t think I understand it enough to switch to HLine so I aborted that effort.

Since VLine isn’t ESP_8_BIT_composite’s strong suit, these default implementations using VLine did not improve as much as I had hoped. Small circles drawn with fillCircle() are fine, but as the number of circles increase and/or their radius increase, we start seeing flickering artifacts on screen. It is actually a direct reflection of the algorithm, which draws the center vertical line and fills out to either side. When there is too much to work to fill a circle before the video scanlines start, we can see the failure in the form of flickering triangles on screen, caused by those two algorithms tripping over each other on the same frame buffer. Adding double buffering is on the to-do list, but before I tackle that project, I wanted to take care of another optimization: clipping off-screen renders.

[Code for this project is publicly available on GitHub]

2021-05-162021-05-28 Roger Cheng

Notes Of A Three.js Beginner: QuaternionKeyframeTrack Struggles

When I started researching how to programmatically animate object rotations in three.js, I was warned that quaternions are hard to work with and can easily bamboozle beginners. I gave it a shot anyway, and I can confirm caution is indeed warranted. Aside from my confusion between Euler angles and quaternions, I was also ignorant of how three.js keyframe track objects process their data arrays. Constructor of keyframe track objects like QuaternionKeyframeTrack accept (as their third parameter) an array of key values. I thought it would obviously be an array of quaternions like [quaterion1, quaternion2], but when I did that, my CPU utilization shot to 100% and the browser stopped responding. Using the browser debugger, I saw it was stuck in this for() loop:

class QuaternionLinearInterpolant extends Interpolant {
  constructor(parameterPositions, sampleValues, sampleSize, resultBuffer) {
    super(parameterPositions, sampleValues, sampleSize, resultBuffer);
  }
  interpolate_(i1, t0, t, t1) {
    const result = this.resultBuffer, values = this.sampleValues, stride = this.valueSize, alpha = (t - t0) / (t1 - t0);
    let offset = i1 * stride;
    for (let end = offset + stride; offset !== end; offset += 4) {
      Quaternion.slerpFlat(result, 0, values, offset - stride, values, offset, alpha);
    }
    return result;
  }
}

I only have two quaterions in my key frame values, but it is stepping through in increments of 4. So this for() loop immediately shot past end and kept looping. The fact it was stepping by four instead of by one was the key clue. This class doesn’t want an array of quaternions, it wants an array of quaternion numerical fields flattened out.

Wrong: [quaterion1, quaternion2]
Right: [quaterion1.x, quaterion1.y, quaterion1.z, quaterion1.w, quaternion2.x, quaternion2.y, quaternion2.z, quaternion2.w]

The latter can also be created via quaterion1.toArray().concat(quaternion2.toArray()).

Once I got past that hurdle, I had an animation on screen. But only half of the colors animated in the way I wanted. The other half of the colors went in the opposite direction while swerving wildly on screen. In a HSV cylinder, colors are rotated across the full range of 360 degrees. When I told them to all go to zero in the transition to a cube, the angles greater than 180 went one direction and the angles less than 180 went the opposite direction.

Having this understanding of the behavior, however, wasn’t much help in trying to get things working the way I had it in my head. I’m sure there are some amateur hour mistakes causing me grief but after several hours of ever-crazier animations, I surrendered and settled for hideous hacks. Half of the colors still behaved differently from the other half, but at least they don’t fly wildly across the screen. It is unsatisfactory but will have to suffice for now. I obviously don’t understand quaternions and need to study up before I can make this thing work the way I intended. But that’s for later, because this was originally supposed to be a side quest to the main objective: the Arduino color composite video out library I’ve released with known problems I should fix.

My interactive RGB332 8-bit color picker can now be toggled between HSV cylinder and RGB cube modes. Half of the cubes (green-yellow) aren't doing what I want because I clearly don't understand quaternions. Still a nifty animation, though. pic.twitter.com/jtRMMVNboP
— Roger Cheng (@Regorlas) April 30, 2021

[Code for this project is publicly available on GitHub]

2021-05-152021-05-16 Roger Cheng

Notes Of A Three.js Beginner: Euler Angles vs. Quaternions

I was pretty happy with how quickly I was able to get a static 3D visualization on screen with the three.js library. My first project to turn the static display into an interactive color picker also went smoothly, giving me a great deal of self confidence for proceeding to the next challenge: adding an animation. And this was where three.js put me in my place reminding me I’m still only a beginner in both 3D graphics and JavaScript.

Before we get to details on how I fell flat on my face, to be fair three.js animation system is optimized for controlling animations created using content creation tools such as Blender. In this respect, it is much like Unity 3D. In both of these tools, programmatically generated animations are not the priority. In fact there weren’t any examples for me to follow in the manual. I hunted around online and found DISCOVER three.js, which proclaimed itself as “The Missing Manual for three.js”. The final chapter (so far) of this online book talks about animations. This chapter had an ominous note on animation rotations:

As we mentioned back in the chapter on transformations, quaternions are a bit harder to work with than Euler angles, so, to avoid becoming bamboozled, we’ll ignore rotations and focus on position and scale for now.

This is worrisome, because my goal is to animate the 256 colors between two color model layouts. From the current layout of a HSV cylinder, to a RGB cube. This required dealing with rotations and just as the warning predicted that’s what kicked my butt.

The first source confusion is between Euler angles and quaternions when dealing with three.js 3D object properties. Object3D.rotation is an object representing Euler angles, so trying to use QuaternionKeyframeTrack to animate object rotation resulted in a lot of runtime errors because the data types didn’t match. This problem I blame on JavaScript in general and not three.js specifically. In a strongly typed language like C there would be an error indicating I’ve confused my types. In JavaScript I only see errors at runtime, in this case one of these two:

When the debug console complains “NaN error” it probably meant I’ve accidentally used Euler angles when quaternions are expected. Both of those data types have fields called x, y, and z. Quaterions have a fourth numeric field named w, while Euler angles have a string indicating order. Trying to use an Euler angle as quaternion would result in the order string trying to fit in w, which is not a number hence the NaN error.
When the debug console complains “THREE.Quaternion: .setFromEuler() encountered an unknown order:” it means I’ve done the reverse and accidentally used Quaternion when Euler angles are expected. This one is fortunately a bit more obvious: numeric value w is not a string and does not dictate an order.

Getting this sorted out was annoying, but this headache was nothing compared to my next problem: using QuaternionKeyframeTrack to animate object rotations.

[Code for this project is publicly available on GitHub]

2021-05-142021-05-16 Roger Cheng

Notes Of A Three.js Beginner: Color Picker with Raycaster

I was pleasantly surprised at how easy it was to use three.js to draw 256 cubes, each representing a different color from the 8-bit RGB332 palette available for use in my composite video out library. Arranged in a cylinder representing the HSV color model, it failed to give me special insight on how to flatten it into a two-dimension color chart. But even though I didn’t get what I had originally hoped for, I thought it looked quite good. So I decided to get deeper into three.js to make this more useful. Towards the end of three.js getting started guide is a list of Useful Links pointing to additional resources, and I thought the top link Three.js Fundamentals was as good of a place to start as any. It gave me enough knowledge to navigate the rest of three.js reference documentation.

After several hours of working with it, my impression is that three.js is a very powerful but not very beginner-friendly library. I think it’s reasonable for such a library to expect that developers already know some fundamentals of 3D graphics and JavaScript. From there it felt fairly straightforward to start using tools in the library. But, and this is a BIG BUT, there is a steep drop if we should go off the expected path. The library is focused on performance, and in exchange there’s less priority on fault tolerance, graceful recovery, or even helpful debugging messages for when things go wrong. There’s not much to prevent us from shooting ourselves in the foot and we’re on our own to figure out what went wrong.

The first exercise was to turn my pretty HSV cylinder into a color picker, making it an actually useful tool for choosing colors from the RGB332 color palette. I added pointer down + pointer up listeners and if they both occurred on the same cube, I change the background color to that color and display the two-digit hexadecimal code representing that color. Changing the background allows instant comparison to every other color in the cylinder. This functionality requires the three.js Raycaster class, and the documentation example translated across to my application without much fuss, giving me confidence to tackle the next project: add the ability to switch between HSV color cylinder and RGB color cube, where I promptly fell on my face.

[Code for this project is publicly available on GitHub]

2021-05-132021-05-15 Roger Cheng

HSV Color Wheel of 256 RGB332 Colors

I have a rectangular spread of all 256 colors of the 8-bit RGB332 color cube. This satisfies the technical requirement to present all the colors possible in my composite video out library, built by bolting the Adafruit GFX library on top of video signal generation code of rossumur’s ESP_8_BIT project for ESP32. But even though it satisfies the technical requirements, it is vaguely unsatisfying because making a slice for each of four blue channel values necessarily meant separating some similar colors from each other. While Emily went to Photoshop to play with creative arrangements, I went into code.

I thought I’d look into arranging these colors in the HSV color space, which I was first exposed to via Pixelblaze I used in my Glow Flow project. HSV is good for keeping similar colors together and is typically depicted as a wheel of colors with the angles around the circle corresponding to the H or hue axis. However, that still leaves two more dimensions of values: saturation and value. We still have the general problem of three variables but only two dimensions to represent them, but again I hoped the limited set of 256 colors could be put to advantage. I tried working through the layout on paper, then a spreadsheet, but eventually decided I need to see the HSV color space plotted out as a cylinder in three dimensional space.

I briefly considered putting something together in Unity3D, since I have a bit of familiarity with it via my Bouncy Bouncy Lights project. But I thought Unity would be too heavyweight and overkill for this project, specifically because I didn’t need a built-in physics engine for this project. Building a Unity 3D project takes a good chunk of time and imposes downtime breaking my trains of thought. Ideally I can try ideas and see them instantly by pressing F5 like a web page.

Which led me to three.js, a JavaScript library for 3D graphics in a browser. The Getting Started guide walked me through creating a scene with a single cube, and I got the fast F5 refresh that I wanted. In addition to rendering, I wanted a way to look around a HSV space. I found OrbitControls in the three.js examples library, letting us manipulate the camera view using a pointer device (mouse, touchpad, etc.) and that was enough for me to get down to business.

I wrote some JavaScript to convert each of the 256 RGB values into their HSV equivalents, and from there to a HSV coordinate in three dimensions. When the color cylinder popped up on screen, I was quite disappointed to see no obvious path to flatten that to two dimensions. But even though it didn’t give me the flash of insight I sought, the layout is still interesting. I see a pattern, but it is not consistent across the whole cylinder. There’s something going on but I couldn’t immediately articulate what it is.

Independent of those curiosities, I decided the cylinder looks cool all on its own, so I’ll keep working with it to make it useful.

HSV color cylinder with just 8-bit RGB332 colors. Original goal was to see if this can be flattened into a usable color chart in 2D. "Just 256, how hard can it be?" pic.twitter.com/lsTQJTO6x8
— Roger Cheng (@Regorlas) April 28, 2021

[Code for this project is publicly available on GitHub]

2021-05-122021-05-13 Roger Cheng

Brainstorming Ways to Showcase RGB332 Palette

It’s always good to have another set of eyes to review your work, a lesson reinforced when Emily pointed out I had forgotten not everyone would know what 8-bit RGB332 color is. This is a rather critical part of using my color composite video out Arduino library, assembled with code from rossumur’s ESP_8_BIT project and Adafruit’s GFX library. I started with the easy part of amending my library README to talk about RGB332 color and list color codes for a few common colors to get people started, but I want to give them access to the rest of the color palette as well.

Which led to the next challenge: What’s the best way to present this color palette? There is a fairly fundamental part of this challenge: there are three dimensions to a RGB color value: Red, Green, and Blue. But a chart on a web page only has two dimensions. Dictating that diagrams illustrating color spaces can only show a two-dimensional slice out of a three dimension volume.

For this challenge, being limited to just 256 colors became an advantage. It’s a small enough number that I could show all 256 colors by putting a few such slices side by side. The easiest one is to slice among the blue axis, since it only had two bits in RGB332 so there are only four slices of blue. Each slice of blue shows all combinations of the red and green channels. They had three bits each for 2³ = 8 values, and combining them means 8 * 8 = 64 colors in each slice of blue. The header image for this post arranges the four blue slices side-by-side. This is a variant of my Arduino library example RGB332_pulseB which changes the blue channel over time instead of laying them out side by side.

But even though this was a complete representation of the palette, Emily and I were unsatisfied with this layout. Too many similar colors were separated by this layout. Intuitively it feels like there should be a potential arrangement for RGB332 colors that would allow similar colors to always be near each other. It wouldn’t apply in the general case, but we only have 256 colors to worry about, and that might work to our advantage again. Emily dived into Photoshop because she’s familiar with that tool, and designed some very creative approaches to the idea. I’m not as good with Photoshop, so I dived into what I’m familiar with: writing code.

Me: This 8-bit color chart sucks. I can make one that makes more sense.

Also me: pic.twitter.com/i7uYBIbPya
— Emily Velasco (@MLE_Online) April 27, 2021

2021-05-112021-05-12 Roger Cheng

My RGB332 Color Code Oversight

I felt a certain level of responsibility after my library was accepted to the Arduino library manager, and wrote down all the things I knew I could work on under GitHub issues for the project. I also filled out the README file with usage basics, and I had felt pretty confident I have enabled people to generate color composite video out signals from their ESP32 projects. This confidence was short-lived. My first ~~guinea pig~~ volunteer for a test drive was Emily Velasco, who I credit for instigating this quest. After she downloaded the library and ran my examples, she looked at the source code and asked a perfectly reasonable question: “Roger, what are these color codes?”

Oh. CRAP.

Having many years of experience playing in computers graphics, I was very comfortable with various color models for specifying digital image data. When I read niche jargon like “RGB332”, I immediately know it meant an 8-bit color value with most significant three bits for red channel, three bits for green, and least significant two bits for blue. I was so comfortable, in fact, that it never occurred to me that not everybody would know this. And so I forgot to say anything about it in my library documentation.

I thanked Emily for calling out my blind spot and frantically got to work. The first and most immediate task was to update the README file, starting with a link to the Wikipedia section about RGB332 color. I then followed up with a few example values, covering all the primary and secondary colors. This resulted in a list of eight colors which can also be minimally specified with just 3 bits, one for each color channel. (RGB111, if you will.)

I thought about adding some of these RGB332 values to my library as #define constants that people can use, but I didn’t know how to name and/or organize them. I don’t want to name them something completely common like #define RED because that has a high risk of colliding with a completely unrelated RED in another library. Technically speaking, the correct software architecture solution to this problem is C++ namespace. But I see no mention of namespaces in the Arduino API Style Guide and I don’t believe it is considered a simple beginner-friendly construct. Unable to decide, I chickened out and did nothing in my Arduino library source code. But that doesn’t necessarily mean we need to leave people up a creek, so Emily and I set out to build some paddles for this library.

2021-05-102021-05-11 Roger Cheng

Initial Issues With ESP_8_BIT Color Composite Video Out Library

I honestly didn’t expect my little project to be accepted into the Arduino Library Manager index on my first try, but it was. Now that it is part of the ecosystem, I feel obligated to record my mental to-do list in a format that others can reference. This lets people know that I’m aware of these shortcomings and see the direction I’ve planned to take. And if I’m lucky, maybe someone will tackle them before I do and give me a pull request. But I can’t realistically expect that, so putting them down on record would at least give me something to point to. “Yes, it’s on the to-do list.” So I wrote down the known problems in the issues section of the project.

First and foremost problem is that I don’t know if PAL code still works. I intended to preserve all the PAL functionality when I extracted the ESP_8_BIT code, but I don’t know if I successfully preserved it all. I only have a NTSC TV so I couldn’t check. And even if someone tells me PAL is broken, I wouldn’t be able to do anything about it. I’m not dedicated enough to go out and buy a PAL TV just for testing. [bootrino] helpfully tells me there are TV that understand both standards, which I didn’t know. I’m not dedicated enough to go out and get one of those TV for the task, but at least I know to keep an eye open for such things. This one really is waiting for someone to test and, if there are problems, submit a pull request.

The other problems I know I can handle. In fact, I had a draft of the next item: give the option to use caller-allocated frame buffer instead of always allocating our own. I had this in the code at one point, but it was poorly tested and I didn’t use it in any of the example sketches. The Arduino API Style Guide suggests trimming such extraneous options in the interest of keeping the API surface area simple, so I did that for version 1.0.0. I can revisit it if demand comes back in the future.

One thing I left behind in ESP_8_BIT and want to revive is a performance metric of some sort. For smooth display the developer must perform all drawing work between frames. The waitForFrame() API exists so drawing can start as soon as one frame ends, but right now there’s no way to know how much room was left before next frame begins. This will be useful as people start to probe the limits.

After performance metrics are online, that data can be used to inform the next phase: performance optimizations. The only performance override I’ve done over the default Adafruit GFX library was fillScreen() since all the examples call that immediately after waitForFrame() to clear the buffer. There are many more candidates to override, but we won’t know how much benefit they give unless we have performance metrics online.

The final item on this initial list of issues is support for double- or triple-buffering. I don’t know if I’ll ever get to it, but I wrote it down because it’s such a common thing to want in a graphics stack. This is a rather advanced usage and it consumes a lot of memory. At 61KB per buffer, the ESP32 can’t really afford many of them. At the very least this needs to come after the implementation of user-allocated buffers, because it’s going to be a game of Tetris to find enough memory in between developer code to create all these buffers and they know best how they want to structure their application.

I thought I had covered all the bases and was feeling pretty good about things… but I had a blind spot that Emily Velasco spotted immediately.

2021-05-092021-05-10 Roger Cheng

ESP_8_BIT Color Composite Video Out On Arduino Library Manager

I was really happy to have successfully combined two cool things: (1) the color composite video out code from rossumur’s ESP_8_BIT project, and (2) the Adafruit GFX graphics library for Arduino projects. As far as my research has found, this is a unique combination. Every other composite video reference I’ve found are either lower resolution and/or grayscale only. So this project could be an interesting complement to the venerable TVout library. Like all of my recent coding projects they’re publicly available on GitHub, but I thought it would have even better reach if I can package it as an Arduino library.

Creating an Arduino library was a challenge that’s been on my radar, but I never had anything I thought would be an unique contribution to the ecosystem. But now I do! I started with the tutorial for building an Arduino library with a morse code example. It set the foundation for me to understand the much more detailed Arduino library specification. Once I had the required files in place, my Arduino IDE recognized my code as an installed library on the system and I could create new sketches that pull in the library with a single #include.

One of my worries is the fact that this was a very hardware-specific library. It would run only on ESP32 running the Arduino Core and not any other hardware. Not the Teensy, not the SAMD, and definitely not the ATmega328. There are two layers to this protection: first, I add architectures=esp32 to my library.properties file. This will inform Arduino IDE to disable this library as “incompatible” when another architecture is selected. But I knew it was possible for someone to include the library and switch hardware target, and they would be mystified by the error messages that would follow. So the second layer of protection is this #ifdef I added that would cause a compiler error with a human-readable explanation.

#ifndef ARDUINO_ARCH_ESP32
#error This library requires ESP32 as it uses ESP32-specific hardware peripheral
#endif

I was pretty pleased by that library and set my eyes on the next level: What if this can be in the Arduino IDE Library Manager? This way people don’t have to find it on GitHub and download into their Arduino library directory, they can download directly from within the Arduino IDE. There is a documented procedure for submission to the Library Manager, but before I submitted, I made the changes to ensure my library conforms to the Arduino API style guide. Once everything looks like they’re lined up, I submitted my request to see what happens. I expected to receive feedback on problems I need to fix before my submission would be accepted, but it was accepted on the first try! This was a pleasant surprise, but I’ll be the first to admit there is still more work to be done.

2021-05-082021-05-09 Roger Cheng

Adapting Adafruit GFX Library to ESP_8_BIT Composite Video Output

I looked at a few candidate graphics libraries to make working with ESP_8_BIT video output frame buffer easier. LVGL offered a huge feature set for building user interfaces, but I don’t that is a good match for my goal. I could always go back to that later if I felt it would be worthwhile. FabGL was excluded because I couldn’t find documentation on how to adapt it to the code I have on hand, but it might be worth another look if I wanted to use its VGA output ability.

Examining those options made me more confident in my initial choice of Adafruit GFX Library. I think it is the best fit for what I want right now: Something easy to use by Arduino developers, with a gentle on-ramp thanks to the always-excellent documentation Adafruit posts for their stuff including the GFX Library. It also means there are a lot of existing code floating around out there for people to use and play with the ESP_8_BIT video generation code.

I started modifying my ESP_8_BIT wrapper class directly, removing the frame buffer interface, but then I changed my mind. I decided to leave the frame buffer option in place for people who are not afraid of byte manipulation. Instead, I created another class ESP_8_BIT_GFX that derives from Adafruit_GFX. This new class will be the developer-friendly class wrapper, and it will internally hold an instance of my frame buffer wrapper class.

When I started looking at what it would take to adapt Adafruit_GFX, I was surprised to see the list is super short. The most fundamental requirement is that I must implement a single pure virtual method: drawPixel(). I was up and running after calling the base constructor with width (256) and height (240) and implementing a single method. The rest of Adafruit_GFX base class has fallback implementations of every API that eventually boils down to a series of calls into drawPixel().

Everything beyond drawPixel() are icing on the cake, giving us plenty of options for performance improvements. I started small by overriding just the fillScreen() class, because I intend to use that to erase the screen between every frame and I wanted that to be fast. Due to how ESP_8_BIT organized the frame buffer into an array of pointers into horizontal lines, I can see drawFastHLine() as the next most promising thing to override. But I’ll resist that temptation for now, I need to make sure I can walk before I run.

[Code for this project is publicly available on GitHub]

2021-05-072021-05-08 Roger Cheng

Window Shopping: LVGL

I have the color composite video generation code of ESP_8_BIT repackaged into an Arduino display library, but right now the only interface is a pointer to raw frame buffer memory and that’s not going to cut it on developer-friendliness grounds. At the minimum I want to match the existing Arduino TVout library precedent, and I think Adafruit’s GFX Library is my best call, but I wanted to look around at a few other options before I commit.

I took a quick look at FabGL and decided it would not serve for my purpose, because it seemed to lack the provision for using an external output device like my code. The next candidate is LVGL, the self-proclaimed Light and Versatile Graphics Library. It is designed to run on embedded platforms therefore I was confident I could port it to the ESP32. And that was even before I found out there was an existing ESP32 port of LVGL so that’s pretty concrete proof right there.

Researching how I might adapt that to ESP_8_BIT code, I poked around the LVGL documentation and I was pleased it was organized well enough for me to quickly find what I was looking for: Display Interface section in the LVGL Porting Guide. We are already well ahead of where I was with FabGL. The documentation suggested allocating at least one working buffer for LVGL with a size at least 1/10th that of the frame buffer. The porting developer is then expected to register a flush callback to copy data from that LVGL working buffer to the actual frame buffer. I understand LVGL adopted this pattern because it needs RAM on board the microcontroller core to do its work. And since the frame buffer is usually attached to the display device, off the microcontroller memory bus, this pattern makes sense. I had hoped LVGL would be willing to work directly with the ESP_8_BIT buffer, but it doesn’t seem to be quite that friendly. Still, I can see a path to putting LVGL on top of my ESP_8_BIT code.

As a side bonus, I found an utility that could be useful even if I don’t use LVGL. The web site offers an online image converter to translate images to various formats. One format is byte arrays in multiple pixel formats, downloadable as C source code file. One of the pixel formats is the same 8-bit RGB332 color format used by ESP_8_BIT. I could use that utility to convert images and cut out the RGB332 section for pasting into my own source code. This converter is more elegant than that crude JavaScript script I wrote for my earlier cat picture test.

LVGL offers a lot of functionality to craft user interfaces for embedded devices, with a sizable library of control elements beyond the usual set of buttons and lists. If sophisticated UI were my target, LVGL would be an excellent choice. But I don’t really expect people to be building serious UI for display on an old TV via composite video, I expect people using my library to create projects that exploit the novelty of a CRT in this flat panel age. Simple drawing primitives like draw line and fill rectangle is available as part of LVGL, but they are not the focus. In fact the Drawing section of the documentation opens by telling people they don’t need to draw anything. I think the target audience of LVGL is not a good match for my intended audience.

Having taken these quick looks, I believe I will come back to FabGL if I wanted to build an ESP32 device that outputs to VGA. If I wanted to build an embedded brain for a device with a modern-looking user interface, LVGL will be something I re-evaluate seriously. However, when my goal is to put together something that will be quick and easy to play with throwing something colorful on screen over composite video, neither are a compelling choice over Adafruit GFX Library.

New Screwdriver

My Project Diary of Coding, Making, and Tinkering

NTSC

Screen Rotation Support for ESP_8_BIT_Composite Arduino Library

Cat and Galactic Squid

Finding X-Offset Bug in AnimatedGIF Example

Animated GIF Decoder Library Exposed Problem With Double Buffering

TIL Some Video Equipment Support Both PAL and NTSC

Allocating Frame Buffer Memory 4KB At A Time

Lightweight Performance Metrics Have Caveats

Double Buffering Coordinated via TaskNotify

The Fastest Pixels Are Those We Never Draw

Overriding Adafruit GFX HLine/VLine Defaults for Performance

Notes Of A Three.js Beginner: QuaternionKeyframeTrack Struggles

Notes Of A Three.js Beginner: Euler Angles vs. Quaternions

Notes Of A Three.js Beginner: Color Picker with Raycaster

HSV Color Wheel of 256 RGB332 Colors

Brainstorming Ways to Showcase RGB332 Palette

My RGB332 Color Code Oversight

Initial Issues With ESP_8_BIT Color Composite Video Out Library

ESP_8_BIT Color Composite Video Out On Arduino Library Manager

Adapting Adafruit GFX Library to ESP_8_BIT Composite Video Output

Window Shopping: LVGL