Last time, we experimented with SPI blind-sending as a way to theoretically speed up SPI on AVR. While there were lots of fancy oscilloscope traces and impressive demo code, there is nothing like an actual, real, practical application to get people excited. Read on to see how much faster we can make the already highly optimized AdaFruit DotStar library with a little blind-sending action… (spoiler alert – the answer is lots more faster!)
DotStars are Adafruit’s branded line of APA102 LED strips. They are a lot like Neopixel strips, except they are much faster and so don’t have the same flicker and jitter problems. If fast is what makes Dotstars good, then making them even faster should make them even better!
If you just want your Dotstars to refresh 18-23% faster and don’t care about the magic that makes that happen, you can use this drop-in fork of the Adafruit library…
To get the juicy speed improvements, you must…
- use a chip with dedicated SPI hardware. This includes the ATMEGA328 chip on the Arduino UNO. This does not include the ATTINY85 on the Trinket. The library will still work with non-SPI-enabled chips; you just won’t get the extra speed.
- connect your Dotstar strip to the dedicated SPI pins. On the Arduino UNO, this means that the Dotstar clock line goes to digital pin #13 and the data line goes to digital pin #11.
- use the hardware Adafruit_DotStar(NUMPIXELS, DOTSTAR_BRG) constructor. This is the one that does not include data and clock pin arguments.
Note that with the new library, the global brightness setting is free! The code runs the same speed with brightness controlled as it does without it.
To test, I used the strandtest demo program included in the library with a 60 pixel long string. Times listed are how long it took to execute a single refresh of the whole string.
|SPI Method||Without Brightness||With Brightness|
- The code manually toggles the clock and data pins to shift out the bitstream.
- This code uses the datasheet SPI sending code that waits for each byte to complete before computing and sending the next one.
- This code is smart enough to start computing the next byte to be transmitted while the current byte is still being shifted out by the SPI hardware. Once the new byte is computed, it polls the SPI hardware to determine when it is ready to accept the next byte.
- Blind Send: This mysterious and edgy code counts every cycle used while computing bytes to ensure that it proffers a new byte at exactly the clock tick when the hardware is finished sending the previous one.
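To see why the overlapped and blind-send approaches pull ahead, here is a back-of-the-envelope cycle model of the three hardware SPI methods described above. It is only a sketch: the cycle counts you feed it are made-up round numbers, not measurements, and the names ByteCost, waitThenCompute, overlappedPoll, blindSendPadding, and blindSend are hypothetical.

```cpp
#include <assert.h>

// Hypothetical per-byte cycle costs. Illustrative values only, not measurements.
struct ByteCost {
  int transfer;  // cycles the SPI hardware needs to shift out one byte
  int compute;   // cycles the CPU needs to prepare the next byte
  int poll;      // extra cycles lost noticing that the hardware is ready
};

// Datasheet style: wait for the byte to finish, then compute the next one.
inline int waitThenCompute(const ByteCost& c) {
  return c.transfer + c.poll + c.compute;
}

// Overlapped: compute while the byte shifts out, then poll until ready.
inline int overlappedPoll(const ByteCost& c) {
  int busy = (c.compute > c.transfer) ? c.compute : c.transfer;
  return busy + c.poll;
}

// Blind send: idle cycles needed so the next write lands exactly when the
// hardware finishes. There is no polling loop at all.
inline int blindSendPadding(const ByteCost& c) {
  assert(c.compute <= c.transfer);  // otherwise the CPU is the bottleneck
  return c.transfer - c.compute;
}

// Blind send cost per byte: compute plus padding, which equals c.transfer.
inline int blindSend(const ByteCost& c) {
  return c.compute + blindSendPadding(c);
}
```

With the made-up costs {16, 10, 3}, the model gives 29, 19, and 16 cycles per byte respectively. The point is the shape of the result rather than the numbers: blind sending removes the polling overhead entirely, so the per-byte cost collapses to the transfer time itself.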
The Blind Send code is not only faster, it is 20 bytes smaller too! Cake time!
You can see the code changes in the new version here…
Everything is local to the USE_HW_SPI section of the library source.
Q: Could running things so much faster lead to signal transmission problems, like if my cables are not so great?
A: Probably not since the speed improvements here come completely from reducing the idle time between bytes rather than changing the speed of bits inside each byte.
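To put rough numbers on that, here is a small sketch that separates the bit rate (which depends only on the SPI clock divider) from the refresh time (which also depends on the idle gap after each byte). Everything in it is an assumption for illustration: the helper names are hypothetical, the SPI clock is assumed to run at half the CPU clock, and the frame layout is assumed to be a 4-byte start frame, 4 bytes per pixel, and an end frame of one byte per 16 pixels, rounded up.

```cpp
#include <assert.h>
#include <stdint.h>

// Bit rate on the wire. Depends only on the CPU clock and the SPI divider.
inline uint32_t bitRateHz(uint32_t cpuHz, uint32_t spiDivider) {
  assert(spiDivider > 0);
  return cpuHz / spiDivider;
}

// Assumed framing: 4-byte start frame, 4 bytes per pixel, and an end frame
// of one byte per 16 pixels (rounded up).
inline uint32_t frameBytes(uint32_t pixels) {
  return 4 + 4 * pixels + (pixels + 15) / 16;
}

// CPU cycles the hardware needs to shift out one byte with no gap after it.
inline uint32_t cyclesPerByte(uint32_t spiDivider) {
  return 8 * spiDivider;
}

// Refresh time in microseconds when every byte is followed by gapCycles of
// idle time. With gapCycles == 0 this is the floor set by the SPI clock.
inline double refreshMicros(uint32_t pixels, uint32_t cpuHz,
                            uint32_t spiDivider, uint32_t gapCycles) {
  double cycles = static_cast<double>(frameBytes(pixels)) *
                  (cyclesPerByte(spiDivider) + gapCycles);
  return cycles * 1e6 / cpuHz;
}
```

Under these assumptions, a 60 pixel strip on a 16 MHz UNO has a floor of 248 µs with zero gaps, and a hypothetical 4 cycle gap per byte would stretch that to 310 µs. In both cases bitRateHz() still reports 8 MHz, so the strip sees exactly the same edges inside each byte.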
Q: Why not save some space and use _delay_loop_2() for your delays?
A: The docs for these functions are just too squishy. “The loop executes three CPU cycles per iteration, not including the overhead the compiler needs to setup the counter register.” How many cycles is that? Why don’t you want to tell me so I know how long it will take? I know you can just read the code, but it is easier and safer to write my code than to read someone else’s code. Plus, don’t you like my handy multiple entry point subroutine trick for having multiple delays possible from a single call?
Q: Wouldn’t completely disabling the global brightness setting speed things up even more?
A: No. With the current code, the hardware SPI clock is the limiting factor for maximum speed. We have about 16 instructions between each SPI byte to do with what we please, and it turns out that is plenty of time to do the brightness transformation, so it is effectively free.
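For a feel of how little work a brightness transformation needs, here is a minimal sketch of one common approach: multiply each color byte by a scale factor and keep the high byte. Whether the library does exactly this is an assumption, and scaleChannel is a hypothetical helper, not a library function.

```cpp
#include <assert.h>
#include <stdint.h>

// Scale one 8-bit color channel.
// scale runs from 1 to 256, where 256 means full brightness.
inline uint8_t scaleChannel(uint8_t value, uint16_t scale) {
  assert(scale >= 1 && scale <= 256);
  // The product fits in 16 bits (max 255 * 256 = 65280), so keeping the
  // high byte is all the "division" that is needed.
  return static_cast<uint8_t>((static_cast<uint16_t>(value) * scale) >> 8);
}
```

On a chip with a hardware multiplier like the ATMEGA328, the multiply is a two-cycle instruction and the shift is just a matter of picking the upper result byte, so a transformation along these lines fits comfortably inside the idle cycles between bytes.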
Q: Can we make it faster?
A: I do not think you could squeeze even 1 clock cycle of extra performance out of this SPI sending code. That said, there are probably 10-20 cycles wasted in avoidable preambles and compares in the show() function. If you care enough, you could pull the blind send code out of the show() function and inline it directly into the code that calls it.
Q: I want EVEN faster!
A: Well, you could overclock your Arduino with a faster crystal and some liquid nitrogen. Or just get a Raspberry Pi or Beagle Bone since these can SPI much, much faster than our humble Arduino.
Q: What’s the point? Wasn’t it fast enough before?
A: If you have to ask… If, however, you are doing hardcore light painting with temporally dithered colors, the extra performance could make a big difference, especially considering that Dotstar pixels update asynchronously, so the longer the delay between the first and last pixel, the more visible tearing will be. Keep in mind that all this extra speed is completely free, so why not use it? (It is actually cheaper than free, because it uses less memory too!)
Q: You expect me to believe that your massive ~100 lines of ASM takes up 20 bytes less flash than the terse ~10 lines of C it replaces?
A: Disassemble and find out for yourself! (Or just compile the old version and then the new version and compare the “bytes used” message).
Q: I still see random glitching and tearing on my strips!
A: I bet your string refresh is getting interrupted by an interrupt. Try adding a cli() before and an sei() after the call that refreshes the strip.
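The cli()/sei() suggestion can be sketched as a tiny wrapper. This is only an illustration: refreshUninterrupted is a hypothetical helper, and the non-AVR branch exists purely so the snippet also compiles on a desktop, where cli() and sei() are replaced by stand-ins that flip a flag.

```cpp
#include <assert.h>

#if defined(__AVR__)
#include <avr/interrupt.h>  // the real cli() and sei()
#else
// Host stand-ins so the sketch also compiles off-target.
// They only track a flag; no real interrupts are involved.
static bool g_irqEnabled = true;
static inline void cli() { g_irqEnabled = false; }
static inline void sei() { g_irqEnabled = true; }
#endif

// Hypothetical helper: run one strip refresh with interrupts masked.
// refresh is any function that pushes the pixel data out to the strip.
inline void refreshUninterrupted(void (*refresh)()) {
  assert(refresh != nullptr);
  cli();      // mask interrupts so nothing can stall the byte stream
  refresh();  // send the whole strip in one uninterrupted burst
  sei();      // unconditionally turn interrupts back on
}
```

Two caveats with this simple version: sei() re-enables interrupts even if they were already off when you called the wrapper, and timer-based timekeeping such as millis() can fall behind if the refresh runs long with interrupts masked.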