Getting Real – Using Blind Send SPI to Turbocharge the Adafruit DotStar Library

Last time, we experimented with blind-sending as a way to theoretically speed up SPI on AVR. While there were lots of fancy oscilloscope traces and impressive demo code, there is nothing like an actual, real, practical application to get people excited. Read on to see how much faster we can make the already highly optimized Adafruit DotStar library with a little blind-sending action… (spoiler alert – the answer is lots more faster!)

DotStars are Adafruit’s branded line of APA102 LED strips. They are a lot like Neopixel strips, except they are much faster and so don’t have the same flicker and jitter problems. If fast is what makes Dotstars good, then making them even faster should make them even better!

Punchline

If you just want your Dotstars to refresh 18-23% faster and don’t care about the magic that makes that happen, you can use this drop-in fork of the Adafruit library…

https://github.com/bigjosh/Adafruit_DotStar

To get the juicy speed improvements, you must…

  1. use a chip with dedicated SPI hardware. This includes the ATMEGA328 chip on the Arduino UNO. This does not include the ATTINY85 on the Trinket. The library will still work with non-SPI-enabled chips, you just won’t get the extra speed.
  2. connect your Dotstar strip to the dedicated SPI pins. On the Arduino UNO, this means that the Dotstar clock line goes to digital pin #13 and the data line goes to digital pin #11.
  3. use the hardware Adafruit_DotStar(NUMPIXELS, DOTSTAR_BRG) constructor. This is the one that does not include data and clock pin arguments.

Note that with the new library, the global brightness setting is free! The code runs at the same speed with brightness controlled as it does without it.

Benchmarks

[Chart: Blind Send benchmark results]

To test, I used the strandtest demo program included in the library with a 60 pixel long string. Times listed are how long it took to execute a single refresh of the whole string.

SPI Method        Without Brightness    With Brightness
Soft (bitbang)    5,690us               5,730us
Standard          573us                 625us
Pipelined         341us                 364us
Blind Send        279us                 279us
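
To see where the 18-23% figure comes from, here is a small sketch that turns the table into percentages and refresh rates. The function names are made up for illustration; only the microsecond values come from the table above.

```cpp
#include <cassert>

// Percent reduction in refresh time when going from baseline_us to faster_us.
double percentFaster(double baseline_us, double faster_us) {
    assert(baseline_us > 0.0);
    return 100.0 * (baseline_us - faster_us) / baseline_us;
}

// Whole-strip refreshes per second for a given refresh time in microseconds.
double refreshesPerSecond(double refresh_us) {
    assert(refresh_us > 0.0);
    return 1e6 / refresh_us;
}
```

Feeding in the Pipelined and Blind Send rows, percentFaster(341, 279) comes out at about 18% and percentFaster(364, 279) at about 23%. refreshesPerSecond(279) is roughly 3,580 back-to-back refreshes per second for the 60 pixel string, ignoring any time spent computing the next frame.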

Cases:

Soft
The code manually toggles the clock and data pins to shift out the bitstream.
Standard
This code uses the datasheet SPI sending code that waits for each byte to complete before computing and sending the next one.
Pipelined
This code is smart enough to start computing the next byte to be transmitted while the current byte is still being shifted out by the SPI hardware. Once the new byte is computed, it polls the SPI hardware to determine when it is ready to accept the next byte.
Blind Send
This mysterious and edgy code counts every cycle used while computing bytes to ensure that it proffers a new byte at exactly the clock tick when the hardware is finished sending the previous one.
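
To make the difference between the three hardware cases concrete, here is a toy cycle-count model. It is only a sketch: the struct, the function names, and every cycle count you feed it are assumptions for illustration, not the library's code and not measured values.

```cpp
#include <algorithm>
#include <cassert>

// All fields are illustrative assumptions, not measurements.
struct ByteCost {
    unsigned compute;   // CPU cycles needed to prepare the next byte
    unsigned transfer;  // cycles before the SPI hardware can accept another byte
    unsigned poll;      // cycles per pass through the "done yet?" polling loop
};

// A polling loop that starts at `start` checks the flag once per `poll` cycles.
// Returns the time of the first check at or after `done` (always at least one check).
unsigned pollUntil(unsigned start, unsigned done, unsigned poll) {
    assert(poll > 0);
    unsigned t = start + poll;
    while (t < done) t += poll;
    return t;
}

// Standard: compute the byte, write it, then poll until the transfer finishes.
unsigned standardCyclesPerByte(const ByteCost& c) {
    return pollUntil(c.compute, c.compute + c.transfer, c.poll);
}

// Pipelined: the transfer starts at time 0 and the computing overlaps it,
// but the code still has to poll before it may write the next byte.
unsigned pipelinedCyclesPerByte(const ByteCost& c) {
    return pollUntil(c.compute, c.transfer, c.poll);
}

// Blind send: no polling at all. The code is padded so the next write lands
// exactly when the hardware is ready; if computing takes longer than the
// transfer, computing becomes the limit instead.
unsigned blindCyclesPerByte(const ByteCost& c) {
    return std::max(c.compute, c.transfer);
}
```

With made-up inputs of 12 cycles of computing, 18 cycles per transfer, and a 4 cycle polling loop, the model gives 32 cycles per byte for Standard, 20 for Pipelined, and 18 for Blind Send. The ordering is the point; the real numbers depend on the actual code.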

Code Size

The Blind Send code is not only faster, it is 20 bytes smaller too! Cake time!

Changes

You can see the code changes in the new version here…

https://github.com/bigjosh/Adafruit_DotStar/commit/52f573a9681909261029f149af785c539756ec69

Everything is local to the USE_HW_SPI section of the show() function.

FAQ

Q: Could running things so much faster lead to signal transmission problems, like if my cables are not so great?

A: Probably not since the speed improvements here come completely from reducing the idle time between bytes rather than changing the speed of bits inside each byte.

Q: Why not save some space and use _delay_loop_1() for your delays?

A: The docs for these functions are just too squishy. “The loop executes three CPU cycles per iteration, not including the overhead the compiler needs to setup the counter register.” How many cycles is that? Why don’t you want to tell me so I know how long it will take? I know you can just read the code, but it is easier and safer to write my code than to read someone else’s code. Plus, don’t you like my handy multiple entry point subroutine trick for having multiple delays possible from a single call?

Q: Wouldn’t completely disabling the global brightness setting speed things up even more?

A: No. With the current code, the hardware SPI clock is the limiting factor for maximum speed. We have about 16 instructions between each SPI byte to do with what we please, and it turns out that is plenty of time to do the brightness transformation, so it is effectively free.
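
A back-of-the-envelope check of that budget, as a sketch. Everything here beyond the 279us measurement is an assumption: a 16 MHz CPU clock (as on a stock UNO), the SPI clock running at the CPU clock divided by 2, and an APA102 frame of 4 start bytes, 4 bytes per pixel, and one end byte per 16 pixels. The function names are hypothetical.

```cpp
#include <cassert>

// CPU cycles the SPI hardware spends shifting out one 8-bit byte
// when the SPI clock is the CPU clock divided by `divider`.
unsigned cyclesPerSpiByte(unsigned divider) {
    return 8u * divider;
}

// ASSUMED frame layout: 4 start bytes, 4 bytes per pixel,
// and one end byte for every 16 pixels (rounded up).
unsigned frameBytes(unsigned pixels) {
    return 4u + 4u * pixels + (pixels + 15u) / 16u;
}

// Average CPU cycles per byte implied by a measured refresh time.
double measuredCyclesPerByte(double refresh_us, double cpu_mhz, unsigned bytes) {
    assert(bytes > 0);
    return refresh_us * cpu_mhz / bytes;
}
```

Under those assumptions, cyclesPerSpiByte(2) is 16, frameBytes(60) is 248, and measuredCyclesPerByte(279, 16, 248) works out to 18 cycles per byte. That puts the measured refresh within a couple of cycles per byte of the raw shifting time, which fits the claim that the SPI clock is the limit.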

Q: Can we make it faster?

A: I do not think you could squeeze even 1 clock cycle of extra performance out of this SPI sending code. That said, there are probably 10-20 cycles wasted in avoidable preambles and compares in the show() function. If you care enough, you could pull the blind send code out of the show() function and inline it into your code that was calling show().

Q: I want EVEN faster!

A: Well, you could overclock your Arduino with a faster crystal and some liquid nitrogen. Or just get a Raspberry Pi or BeagleBone since these can SPI much, much faster than our humble Arduino.

Q: What’s the point? Wasn’t it fast enough before?

A: If you have to ask… If, however, you are doing hardcore light painting with temporally dithered colors, the extra performance could make a big difference, especially considering that Dotstar pixels update asynchronously, so the longer the delay between the first and last pixel, the more visible tearing will be. Keep in mind that all this extra speed is completely free, so why not use it? (It is actually cheaper than free, because it uses less memory too!)

Q: You expect me to believe that your massive ~100 lines of ASM takes up 20 bytes less flash than the terse ~10 lines of C it replaces?

A: Disassemble and find out for yourself! (Or just compile the old version and then the new version and compare the “bytes used” message.)

Q: I still see random glitching and tearing on my strips!

A: I bet your string refresh is getting interrupted by an interrupt. Try adding a cli() before and an sei() after your call to show().
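
A minimal sketch of that suggestion, wrapped in a helper. The helper name showUninterrupted and the host-side stand-ins are hypothetical and only there so the wrapper can be tried off-target; on an AVR the real cli() and sei() from avr/interrupt.h are used.

```cpp
#ifdef __AVR__
#include <avr/interrupt.h>   // real cli() / sei()
#else
#include <cassert>
// Host-side stand-ins (hypothetical), for trying the wrapper without an AVR.
static bool g_interrupts_enabled = true;
static inline void cli() { g_interrupts_enabled = false; }
static inline void sei() { g_interrupts_enabled = true; }

// Hypothetical stand-in for a strip: checks that show() runs with interrupts masked.
struct FakeStrip {
    unsigned shows = 0;
    void show() { assert(!g_interrupts_enabled); ++shows; }
};
#endif

// Call strip.show() with interrupts masked so nothing can stretch
// the carefully counted gap between bytes.
template <typename Strip>
void showUninterrupted(Strip& strip) {
    cli();
    strip.show();
    sei();   // unconditionally re-enables interrupts, as in the answer above
}
```

In a sketch you would call showUninterrupted(strip) wherever you currently call strip.show(). If your code might already be running with interrupts disabled, the unconditional sei() would turn them back on, so a version that saves and restores the interrupt state would be the safer variation.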

5 comments

  1. David Grayson

    The APA102 has a 5-bit brightness setting you can send in the first byte which allows for dimmer colors than would otherwise be possible. Is there any particular reason that your library and Adafruit’s library do not expose that as a feature to the user?

    • bigjosh2

      The master 5-bit brightness setting on these pixels uses a different, much slower PWM generator than the one for the 8-bit RGB color brightnesses. It is so slow that it pretty much ruins the advantage of using the APA102. It is much better to adjust the brightness of the RGB values before you send them to the strip to preserve the high PWM rate. It would have been better if they had omitted those bits so we could have slightly faster refresh rates. It would have been *much* better if they had used those extra bits to give us slightly more dynamic range on the RGB values!
