NeoPixels Revealed: How to (not need to) generate precisely timed signals

There is an easier way to drive NeoPixels using code that…

  1. is simple to understand
  2. is easy to change without breaking
  3. allows indefinitely long pixel strings
  4. addresses the root cause of signal reshaping glitches
  5. needs only a trivial amount of memory regardless of string length

Here is a demo of a 1,000+ pixel string being driven by a vintage Arduino Duemilanove at about 30 frames per second…

The program only uses about 5% of this Arduino’s 1K RAM. Since the amount of RAM used does not grow with the number of pixels driven, the limiting factor for this string length was the length of my apartment.

NeoPixels are not that hard

Much has been written about how picky NeoPixels are about timing. According to the excellent AdaFruit Uberguide, “the control signal has very strict timing requirements”, and people have used many exotic and complex methods to meet these strict requirements, including cycle counting, PWM, SPI, and even UARTs with extra inverter hardware.

The standard way to drive NeoPixels with an Arduino is using the AdaFruit library, which has some very fancy assembly code that was meticulously hand crafted to get every tick to land in precisely the right place. This time-tested library works great and if it suits your needs then you should by all means use it, but woe to anyone who tries to change even one line of it.

Unfortunately I could not use this library for my project anyway because it needs 3 bytes of RAM for each pixel (one byte for each R, G, and B value) in the string. It needs all that memory to get everything ready so that it can dump all the pixels as one giant, perfectly timed bit squirt. My display had 1,440 pixels, so there was no way this was going to work on my humble Arduino with 1K of RAM.
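To put rough numbers on that, here is a small back-of-the-envelope sketch in plain C++, meant to be compiled on a PC rather than on the Arduino. The names `frameBufferBytes`, `bufferFits`, and `maxBufferedPixels` are made up for illustration, and the 1,024-byte figure is simply the 1K of RAM mentioned above with nothing set aside for the stack or other variables.

```cpp
#include <cstddef>

// Bytes a full frame buffer needs: 3 bytes (R, G, B) per pixel.
constexpr std::size_t frameBufferBytes(std::size_t pixels) {
    return pixels * 3;
}

// Would a full frame buffer fit in the given amount of RAM?
constexpr bool bufferFits(std::size_t pixels, std::size_t ramBytes) {
    return frameBufferBytes(pixels) <= ramBytes;
}

// Largest string a buffered approach could handle if ALL of the RAM
// went to the buffer (an optimistic assumption).
constexpr std::size_t maxBufferedPixels(std::size_t ramBytes) {
    return ramBytes / 3;
}

constexpr std::size_t kRamBytes = 1024;  // the 1K of RAM mentioned above
constexpr std::size_t kPixels   = 1440;  // the display size mentioned above
```

With these figures the buffer alone would need 4,320 bytes, and even with every byte of RAM given over to it a buffered approach would top out at 341 pixels.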

Luckily, it turns out that NeoPixels are not really that picky about timing once you get to know them.

Here is the canonical WS2812 timing diagram from the datasheet…

[Figure: WS2812 timing diagram]

[Figure: Official WS2812 timing specs]

Let’s restate these as normalized values…

Datasheet timing constraints for driving a NeoPixel

| Symbol | Parameter | Min | Typical | Max | Units |
|--------|-----------|-----|---------|-----|-------|
| T0H | 0 code, high voltage time | 200 | 350 | 500 | ns |
| T1H | 1 code, high voltage time | 550 | 700 | 850 | ns |
| T0L | 0 code, low voltage time | 650 | 800 | 950 | ns |
| T1L | 1 code, low voltage time | 450 | 600 | 750 | ns |
| RES | low voltage time | 50,000 | | | ns |

 

This looks pretty constraining, but if we look closely we can see that things are not as bad as they seem. In fact, we’ll see that most of these apparent constraints are irrelevant.

If we instead think about how the different parts of the signal relate to each other, we can deduce that the important timing constraints for driving a single NeoPixel are…

  • There is a minimum width of a 0-bit pulse (T0H) to ensure it is detected
  • There is a maximum width of a 0-bit pulse (T0H) to ensure that it does not become a 1-bit
  • There is a minimum width of a 1-bit pulse (T1H) since it must be long enough to not be a 0-bit pulse.
  • There is a minimum time the level must stay low between consecutive bits (TxL), which ensures that the chip sees separate bits rather than one long bit.
  • There is a maximum time the level can stay low (TxL) before a reset is triggered and any loaded data is latched and displayed on the LED.

That’s it! Let’s update our timing diagrams based on this new perspective…

[Figure: Simplified timing constraints for a NeoPixel]

 

Simplified timing constraints for driving a NeoPixel

| Symbol | Parameter | Min | Typical | Max | Units |
|--------|-----------|-----|---------|-----|-------|
| T0H | 0 code, high voltage time | 200 | 350 | 500 | ns |
| T1H | 1 code, high voltage time | 550 | 700 | | ns |
| TLD | data, low voltage time | 450 | 600 | 5,000 | ns |
| TLL | latch, low voltage time | 6,000 | | | ns |
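One way to read the simplified table is as a pair of classifiers: one for how long the line stayed high, one for how long it stayed low. The sketch below is only a model of the table in plain C++ for a PC. The enum and function names are invented for illustration, and the "Ambiguous" bands are simply the ranges the table leaves unspecified, not measured chip behavior.

```cpp
// Limits in nanoseconds, copied from the simplified table above.
constexpr long T0H_MIN_NS = 200;
constexpr long T0H_MAX_NS = 500;
constexpr long T1H_MIN_NS = 550;
constexpr long TLD_MIN_NS = 450;
constexpr long TLD_MAX_NS = 5000;
constexpr long TLL_MIN_NS = 6000;

enum class High { TooShort, Zero, Ambiguous, One };
enum class Low  { TooShort, DataGap, Ambiguous, Latch };

// What a high pulse of the given width means according to the table.
constexpr High classifyHigh(long ns) {
    return ns <  T0H_MIN_NS ? High::TooShort
         : ns <= T0H_MAX_NS ? High::Zero
         : ns <  T1H_MIN_NS ? High::Ambiguous
         :                    High::One;      // no upper bound for a single pixel
}

// What a low period of the given width means according to the table.
constexpr Low classifyLow(long ns) {
    return ns <  TLD_MIN_NS ? Low::TooShort
         : ns <= TLD_MAX_NS ? Low::DataGap
         : ns <  TLL_MIN_NS ? Low::Ambiguous
         :                    Low::Latch;
}
```

Note how lopsided the result is: only `High::Zero` has a tight window on both sides. Every other outcome is reached simply by waiting long enough.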

 

It is easy to comply with minimums because all we have to do is not be too fast. We can waste time in delay loops, or just do other things that take a while, to make sure that we don’t arrive at the party too early.

Of the two maximums, the data low time (TLD) is very easy to comply with since 5,000ns is an eternity for a program.

So, we are left with just the single tight maximum of 500ns for a 0-bit pulse (T0H). This is 8 cycles on a 16MHz CPU. That is a pretty tight constraint, but we can do it, especially since it is completely deterministic and we do not need to do any computation between the start and the end of the pulse. Heck, the AVR in an Arduino can toggle a bit from low to high and back again in 4 cycles if you are careful, so we have time.

The only tight timing parameter for NeoPixel signaling is the maximum width of a 0-bit pulse

Seriously – that is all you need to do to drive a NeoPixel!
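The cycle budget quoted above is easy to check. This is a minimal sketch in plain C++ for a PC. The helper name `cyclesInNs` is made up, and the clock speeds other than 16MHz are only there for comparison.

```cpp
// Whole CPU cycles that fit inside `ns` nanoseconds at a clock of `cpuHz`,
// rounded down because a partial cycle cannot be used.
constexpr long long cyclesInNs(long long ns, long long cpuHz) {
    return ns * cpuHz / 1000000000LL;
}

constexpr long long kT0hMaxNs = 500;  // the one tight maximum from the table

constexpr long long kBudgetAt16MHz = cyclesInNs(kT0hMaxNs, 16000000LL);  // 8 cycles
constexpr long long kBudgetAt8MHz  = cyclesInNs(kT0hMaxNs,  8000000LL);  // 4 cycles
```

At 16MHz the 8-cycle budget is twice the 4-cycle toggle mentioned above. At 8MHz the same 500ns is only 4 cycles, which by that same figure would leave no slack at all.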

You expect me to believe there is NO maximum width for a 1-bit (T1H)?

There is no bound for the maximum width of a 1-bit that a NeoPixel will accept. I’ve tested 5-second long bits, but I am confident that you can send a full pixel with 2-week long 1-bits into a NeoPixel and it will (eventually) display it just fine. Test it yourself.

However, if you try to drive a string of NeoPixels, group dynamics come into play and these jumbo-sized 1-bits will not work. To see why, check out…

NeoPixels Revealed: Going NSA on pixel-to-pixel conversations

The short answer is that the first NeoPixel will “reshape” the oversized 1-bit by shrinking it down to a normal-sized 1-bit. This leaves a low gap after the bit. If this gap is longer than ~6,000ns, it will become a reset pulse (TLL) to the next pixel in the string. So, while there is no maximum width for T1H for a single NeoPixel, if we want to have a string of NeoPixels then we need to make sure that no chip in the chain sees an inadvertent reset pulse. This creates a dynamic maximum for T1H in a string of pixels of 5,000ns (TLD, the maximum low time before causing a reset) + 550ns (T1H, the minimum width of the reshaped 1-bit) = 5,550ns, which the table below rounds down to 5,500ns. Luckily, that is still a very long time and so very easy to comply with.

Updated simplified timing constraints for NeoPixel strings

| Symbol | Parameter | Min | Typical | Max | Units |
|--------|-----------|-----|---------|-----|-------|
| T0H | 0 code, high voltage time | 200 | 350 | 500 | ns |
| T1H | 1 code, high voltage time | 550 | 700 | 5,500 | ns |
| TLD | data, low voltage time | 450 | 600 | 5,000 | ns |
| TLL | latch, low voltage time | 6,000 | | | ns |
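The string limit for T1H follows from a two-term sum, sketched here in plain C++ for a PC. The function names are invented for illustration, and the model assumes, as the arithmetic above does, that the first pixel reshapes any long 1-bit down to 550ns and that only the excess turns into extra low time. It ignores the normal low time that follows the bit, so the real limit is, if anything, a little lower.

```cpp
constexpr long long kReshapedT1hNs = 550;   // width of a 1-bit after reshaping (minimum T1H)
constexpr long long kTldMaxNs      = 5000;  // longest low time that still counts as data

// Extra low time the next pixel sees after the first pixel shrinks the bit.
constexpr long long extraGapNs(long long sentHighNs) {
    return sentHighNs > kReshapedT1hNs ? sentHighNs - kReshapedT1hNs : 0;
}

// True if a 1-bit of this width cannot turn into an accidental reset downstream.
constexpr bool safeForString(long long sentHighNs) {
    return extraGapNs(sentHighNs) <= kTldMaxNs;
}

// The dynamic maximum for T1H on a string: 5,000ns + 550ns.
constexpr long long kMaxT1hStringNs = kTldMaxNs + kReshapedT1hNs;
```

A 900ns 1-bit adds only 350ns of gap under this model, while the 5-second bit that a single pixel tolerates fails the check by a very wide margin.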

But wait – what is this 6us minimum for a reset latch about? The datasheet says a reset takes 50us!

I know… and 44,000ns is a pretty big signal to lose!!

If you look at the timing diagram, you should see that the maximum width for TLD is really the same thing as the minimum for the TLL latching reset code. If the signal stays low for too long after one bit is finished and before the next bit starts, then the chip resets and latches. It turns out empirically that the maximum TLD width is about 5-6us despite what the datasheet says. Really. Leave the signal low for just 6us and the LED will latch.

Just to be sure I wasn’t missing something, I put a photo-detector in front of a NeoPixel to see exactly when and how it was *really* turning on.

Here is a typical shot of a pixel getting latched…

[Figure: TxL time needed to latch new color]

| Channel | Color | Connection |
|---------|-------|------------|
| 1 | Yellow | TRESET trigger (~6us wide) |
| 2 | Blue | DIN Pixel Data |
| 3 | Purple | Actual LED output |

The yellow trace is a trigger I put in so I could catch just the right moment. It signals the end of a full frame of pixel data. The actual data bits are in the middle trace. In this case, you are seeing the trailing end of a frame that turns the pixel from off to on. The top trace is the measured light output from the LED. You’ll see there is a pause in the data of about 6us, followed about 30us later by the LED actually displaying that new data and turning on. No 50us reset anywhere. All it takes is at least 5-6us (I use 6us to be safe) of low time to latch a new color.

But what is that 30us delay between the latch and the actual display of the new color? Maybe that is where the 50us in the datasheet comes from? Maybe, but it turns out that 30us has nothing to do with the reset timeout – instead it actually comes from the chip’s PWM circuit. If you want to learn more about that, you’ll have to read…

NeoPixels Revealed: Getting physical to uncover PWM Secrets

And now for the code…

With all this in mind, we can write some very simple Arduino code to drive NeoPixels…

#define PIXELS (96*11) // Number of pixels in the string (parenthesized so the macro is safe inside larger expressions)

// These values depend on which pin your string is connected to and what board you are using
// More info on how to find these at http://www.arduino.cc/en/Reference/PortManipulation

// These values are for digital pin 8 on an Arduino Yun or digital pin 12 on a Duemilanove
// Note that you could also include the DigitalWriteFast header file to not need to do this lookup.

#define PIXEL_PORT PORTB // Port of the pin the pixels are connected to
#define PIXEL_DDR DDRB // Data direction register of the pin the pixels are connected to
#define PIXEL_BIT 4 // Bit of the pin the pixels are connected to

// These are the timing constraints taken mostly from the WS2812 datasheets
// These are chosen to be conservative and avoid problems rather than for maximum throughput 

#define T1H  900    // Width of the high pulse of a 1 bit in ns
#define T1L  600    // Width of the low gap after a 1 bit in ns

#define T0H  400    // Width of the high pulse of a 0 bit in ns
#define T0L  900    // Width of the low gap after a 0 bit in ns

#define RES 7000    // Width of the low gap between bits to cause a frame to latch

// Here are some convenience defines for using nanoseconds specs to generate actual CPU delays

#define NS_PER_SEC (1000000000L) // Note that this has to be SIGNED since we want to be able to check for negative values of derived quantities

#define CYCLES_PER_SEC (F_CPU)

#define NS_PER_CYCLE ( NS_PER_SEC / CYCLES_PER_SEC )

#define NS_TO_CYCLES(n) ( (n) / NS_PER_CYCLE )

#define DELAY_CYCLES(n) ( ((n)>0) ? __builtin_avr_delay_cycles( n ) : __builtin_avr_delay_cycles( 0 ) ) // Make sure we never have a delay less than zero

// Actually send a bit to the string. We turn off optimizations to make sure the compiler does
// not reorder things and make it so the delay happens in the wrong place.

void sendBit(bool) __attribute__ ((optimize(0)));

void sendBit( bool bitVal ) {

    if ( bitVal ) {      // 1-bit

      bitSet( PIXEL_PORT , PIXEL_BIT );

      DELAY_CYCLES( NS_TO_CYCLES( T1H ) - 2 ); // 1-bit width less overhead for the actual bit setting
                                                     // Note that this delay could be longer and everything would still work
      bitClear( PIXEL_PORT , PIXEL_BIT );

      DELAY_CYCLES( NS_TO_CYCLES( T1L ) - 10 ); // 1-bit gap less the overhead of the loop

    } else {             // 0-bit

      cli();                                       // We need to protect this bit from being made wider by an interrupt 

      bitSet( PIXEL_PORT , PIXEL_BIT );

      DELAY_CYCLES( NS_TO_CYCLES( T0H ) - 2 ); // 0-bit width less overhead
                                                    // **************************************************************************
                                                    // This line is really the only tight goldilocks timing in the whole program!
                                                    // **************************************************************************
      bitClear( PIXEL_PORT , PIXEL_BIT );

      sei();

      DELAY_CYCLES( NS_TO_CYCLES( T0L ) - 10 ); // 0-bit gap less overhead of the loop

    }

    // Note that the inter-bit gap can be as long as you want as long as it doesn't exceed the 5us reset timeout (which is a long time).
    // Here I have been generous and not tried to squeeze the gap tight but instead erred on the side of lots of extra time.
    // This has the nice side effect of avoiding glitches on very long strings.

}

void sendByte( unsigned char byte ) {

    for( unsigned char bit = 0 ; bit < 8 ; bit++ ) {

      sendBit( bitRead( byte , 7 ) ); // Neopixel wants bit in highest-to-lowest order
                                                     // so send highest bit (bit #7 in an 8-bit byte since they start at 0)
      byte <<= 1; // and then shift left so bit 6 moves into 7, 5 moves into 6, etc

    }
}

/*

The following three functions are the public API:
  ledsetup() - set up the pin that is connected to the string. Call once at the beginning of the program.
  sendPixel( r , g , b ) - send a single pixel to the string. Call this once for each pixel in a frame.
  show() - latch the recently sent pixels on the LEDs. Call once per frame.
*/

// Set the specified pin up as digital out

void ledsetup() {

  bitSet( PIXEL_DDR , PIXEL_BIT );

}

void sendPixel( unsigned char r, unsigned char g , unsigned char b ) {

  sendByte(g); // Neopixel wants colors in green-then-red-then-blue order
  sendByte(r);
  sendByte(b);

}

// Just wait long enough without sending any bits to cause the pixels to latch and display the last sent frame

void show() {
    DELAY_CYCLES( NS_TO_CYCLES(RES) );
}
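To see what delays the macros above actually request, the same integer arithmetic can be reproduced on a PC. This is a sketch in plain C++ that assumes F_CPU is 16MHz. The `k`-prefixed names and the two helper functions are made up for illustration and mirror `NS_TO_CYCLES` and the clamp in `DELAY_CYCLES`. Compile it on its own rather than pasting it into the sketch.

```cpp
// Host-side mirror of the delay macros above, assuming a 16 MHz clock.
constexpr long kCpuHz      = 16000000L;
constexpr long kNsPerSec   = 1000000000L;
constexpr long kNsPerCycle = kNsPerSec / kCpuHz;   // integer division: 62, not 62.5

// Same arithmetic as NS_TO_CYCLES(n).
constexpr long nsToCycles(long ns) { return ns / kNsPerCycle; }

// Same clamping as DELAY_CYCLES(n): anything at or below zero becomes no delay.
constexpr long clampedDelay(long cycles) { return cycles > 0 ? cycles : 0; }

constexpr long kT1hDelay = clampedDelay(nsToCycles(900) - 2);    // 14 - 2  = 12 cycles
constexpr long kT1lDelay = clampedDelay(nsToCycles(600) - 10);   //  9 - 10 = -1, clamped to 0
constexpr long kT0hDelay = clampedDelay(nsToCycles(400) - 2);    //  6 - 2  = 4 cycles
constexpr long kT0lDelay = clampedDelay(nsToCycles(900) - 10);   // 14 - 10 = 4 cycles
constexpr long kResDelay = clampedDelay(nsToCycles(7000));       // 112 cycles
```

Two things stand out under these assumptions. The truncated 62ns-per-cycle figure makes every computed cycle count slightly generous, which errs on the safe side for the minimums. And the requested T1L delay works out to zero, so the clamp in `DELAY_CYCLES` is doing real work there and the low time after a 1-bit comes from the overhead of the surrounding code.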

You can download the full demo program from GitHub. Open it up in your Arduino IDE and change the #define PIXELS to match your string length and update the PORT, DDR, and BIT defines to match your board and pin.

I’m sorry that you have to look up the PORT values for your Arduino (which turns out to be non-trivial!), but there does not appear to be a clean way for the code to do this without bringing in something like the DigitalWriteFast library, which would overly complicate things.

The top part of the demo program is the same code shown above and the bottom part is a rework of the AdaFruit strandtest program just to give you something familiar to start with. Keep in mind that I picked values that would look good on a huge string, so if you connect it to a 60 pixel strip you might not be visually impressed.

This code should work unchanged on any AVR CPU that can toggle the 0-bit fast enough to meet the T0H maximum (that includes all Arduinos). This code is optimized for readability, simplicity, and changeability rather than speed. That said, it is still plenty fast and probably within 25% of being as fast as possible. I’ve also found that this technique of generating pixels on the fly encourages a more functional-ish programming style than working with buffers. This style often leads to code that ends up having faster refresh rates overall even though the per-bit times might not be as fast as the hand optimized bit-banging out of a buffer.  It also turns out that having some slop time after each bit is actually helpful for making very long strings stable…

NeoPixels Revealed: Why you should give your bits room to breathe
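As a rough check on the speed of the code above, the nominal bit times can be turned into a frame rate estimate. This is a sketch in plain C++ for a PC with made-up function names. It assumes each bit takes exactly its nominal high plus low time from the defines above and that generating the pixel data costs nothing, so real numbers will differ somewhat.

```cpp
constexpr long long kBitsPerPixel = 24;          // 8 bits each for G, R, B
constexpr long long kOneBitNs     = 900 + 600;   // T1H + T1L from the defines above
constexpr long long kZeroBitNs    = 400 + 900;   // T0H + T0L from the defines above
constexpr long long kResNs        = 7000;        // RES latch gap from the defines above

// Nominal time to send one frame and latch it.
constexpr long long frameNs(long long pixels, long long avgBitNs) {
    return pixels * kBitsPerPixel * avgBitNs + kResNs;
}

// Whole frames per second at that frame time.
constexpr long long framesPerSecond(long long pixels, long long avgBitNs) {
    return 1000000000LL / frameNs(pixels, avgBitNs);
}

constexpr long long kPixels = 96 * 11;           // the PIXELS value used above

constexpr long long kFpsAllZeros = framesPerSecond(kPixels, kZeroBitNs);                   // 30
constexpr long long kFpsAllOnes  = framesPerSecond(kPixels, kOneBitNs);                    // 26
constexpr long long kFpsMixed    = framesPerSecond(kPixels, (kOneBitNs + kZeroBitNs) / 2); // 28
```

For the 1,056-pixel string this lands between 26 and 30 frames per second depending on the mix of bits, which is in line with the roughly 30 frames per second of the demo. The latch gap is a negligible part of the total: 7us out of about 35ms.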

FAQ

Q: In the video, when the entire strip is lit full white, it looks like the far end has an orange tint to it. Is this an artifact of the camera or something?
A: Wow, you are very perceptive! Yes, the string did start to get orange at the end at full power. I actually didn’t just run out of apartment when making this long string, I also ran out of power supplies. I had 3 supplies, each 10 amps. This is *almost* enough, but not when every single LED is full brightness white. When that happens, the voltage at the very end of the strip starts to get too low and the LEDs start looking pallid. Why orange? Because blue LEDs need the highest voltage to light, so when the voltage sags they are the first to go, followed by the greens, leaving yellows to oranges and ultimately red.

Q: Are you really, really sure that the reset pulse doesn’t have to be 50us long? ‘Cause that could make my display silky smooth!
A: I’ve tried lots and lots of NeoPixels and WS2812’s and I can not find one that needs more than 6us of low signal to latch. If you find one that needs longer, please let me know!

Q: I have some interrupt timing sensitive code that I have always wanted to run while also driving NeoPixels. Are you saying this might be possible?
A: No problem! As long as your interrupt service time is shorter than ~5us, you can leave interrupts on while driving your NeoPixels with the above code! The only place where we turn them off is during the 0-bit pulse itself (T0H, 400ns in the code above, plus a few cycles of overhead), to make sure that it stays below the maximum T0H length.

Q: Can I call sendPixel() from inside a blocking interrupt routine?
A: Since sendBit() blindly disables and re-enables interrupts, it will blindly turn interrupts back on every time it sends a 0-bit. This is easy to fix. If you are going to always be sending bits from inside a blocking interrupt routine, you can just totally delete the cli() and sei() since interrupts will already be off. If you want to send both normally and from a blocking interrupt, you could store the interrupt state and then restore it rather than blindly sei()’ing.
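The save-and-restore idea from the answer above could look something like this. It is a minimal sketch, not code from the demo: `sendZeroBitPulseGuarded` and `zeroBitPulse` are hypothetical names, and the `#else` branch contains stand-ins for `SREG` and `cli()` that exist only so the logic can be compiled and tried on a PC. On the Arduino, `zeroBitPulse()` would hold the bitSet, DELAY_CYCLES, and bitClear lines from the 0-bit branch of sendBit().

```cpp
#include <stdint.h>

#ifdef __AVR__
  #include <avr/io.h>
  #include <avr/interrupt.h>
  // Supply this from the bitSet / DELAY_CYCLES / bitClear lines of sendBit().
  void zeroBitPulse();
#else
  // Host stand-ins (assumptions for illustration only).
  #include <assert.h>
  static uint8_t SREG = 0x80;                  // bit 7 = global interrupt enable
  static inline void cli() { SREG &= 0x7F; }   // clear the interrupt enable bit
  static unsigned pulsesSent = 0;
  static void zeroBitPulse() {
      assert((SREG & 0x80) == 0);              // the pulse must run with interrupts off
      ++pulsesSent;
  }
#endif

// Hypothetical replacement for the cli() ... sei() pair around the 0-bit pulse.
static void sendZeroBitPulseGuarded() {
    const uint8_t savedSreg = SREG;   // remember whether interrupts were enabled
    cli();                            // block interrupts for the tight pulse
    zeroBitPulse();
    SREG = savedSreg;                 // put things back as they were instead of sei()
}
```

Called from normal code, this behaves like the original cli()/sei() pair. Called from inside a blocking interrupt routine, it leaves interrupts off afterwards. avr-libc also provides `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` in `<util/atomic.h>`, which wraps the same save-and-restore pattern.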

Q: What NeoPixels are you testing with?
A: I’ve tried every WS2812 I could get my hands on, including lots of NeoPixels from AdaFruit, eBay, and Alibaba. If you find some picky NeoPixels that I missed, please let me know!

Q: Where are your favorite places to get your NeoPixels?
A: Of course I love AdaFruit because they are awesome, have great prices, and ship from NYC in 1-2 days. For huge bulk orders or when I need something special that AdaFruit doesn’t sell, I also have been super-happy with Mr-Right-LED because Mr. Zhang can make me 96 LED-per-meter strings just the way I like them with no silicone, heat shrink,  or adhesive – and they also have great prices and ship from Hong Kong in 2-3 days.

Q: Could I save time by using the gaps in between individual bits for image computation rather than waiting until the end of a pixel?
A: If your algorithm is fine grained enough to be efficiently broken into 24 very short work units per pixel, then this could be tremendously efficient, and impressive! Please send what you come up with!

Q: Silly question, but what if I want to show video!?!
A: No problem, just use Flash! Arduinos have way, way more Flash than they do RAM so there is plenty of room for video data especially if compressed. Read the video data out of flash, decompress it, and send it to the NeoPixels on the fly. No frame buffer needed as long as you can beat the generous 5us timeout. You will probably want to write a program in Java or Python to generate the PROGMEM C code with your compressed video data in it. Ognite uses this read-video-data-on-the-fly-from-flash technique, so is a good place to look for example code. If someone comes up with some super awesome video source material maybe I’ll build an example project to show exactly how it is done.
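Here is a minimal sketch of what reading compressed frames out of flash on the fly could look like. Everything about the data format is an assumption made up for this example (simple run-length groups of count, r, g, b ended by a zero count). It is not the format Ognite uses, and `playFrame` and `demoFrame` are hypothetical names. The `#else` branch provides stand-ins for `PROGMEM`, `pgm_read_byte`, and `sendPixel()` so the decoder can be tried on a PC.

```cpp
#include <stdint.h>

#ifdef __AVR__
  #include <avr/pgmspace.h>
  // sendPixel() is the function from the code above.
  void sendPixel(unsigned char r, unsigned char g, unsigned char b);
#else
  // Host stand-ins (assumptions for illustration only).
  #include <assert.h>
  #define PROGMEM
  #define pgm_read_byte(addr) (*(const uint8_t *)(addr))
  static uint8_t sentPixels[16][3];   // small capture buffer in place of real LEDs
  static unsigned pixelsSent = 0;
  static void sendPixel(unsigned char r, unsigned char g, unsigned char b) {
      assert(pixelsSent < 16);        // the capture buffer is deliberately tiny
      sentPixels[pixelsSent][0] = r;
      sentPixels[pixelsSent][1] = g;
      sentPixels[pixelsSent][2] = b;
      ++pixelsSent;
  }
#endif

// Hypothetical run-length format: (count, r, g, b) groups, ended by count == 0.
static const uint8_t demoFrame[] PROGMEM = {
    3, 255,   0,   0,   // three red pixels
    2,   0,   0, 255,   // two blue pixels
    0                   // end of frame
};

// Decode one frame straight out of flash and stream it, one pixel at a time.
static void playFrame(const uint8_t *frame) {
    for (;;) {
        const uint8_t count = pgm_read_byte(frame);
        if (count == 0) {
            return;                               // end-of-frame marker
        }
        const uint8_t r = pgm_read_byte(frame + 1);
        const uint8_t g = pgm_read_byte(frame + 2);
        const uint8_t b = pgm_read_byte(frame + 3);
        frame += 4;
        for (uint8_t i = 0; i < count; ++i) {
            sendPixel(r, g, b);                   // no frame buffer anywhere
        }
    }
}
```

On the Arduino you would call `playFrame(demoFrame)` followed by `show()`. The decode work between pixels here is only a few flash reads, which should sit comfortably inside the ~5us timeout, but a heavier decompressor would need to be checked against that limit.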

Q: When I try to compile your code, I get a “SimpleNeopixelDemo:55: error: ‘__builtin_avr_delay_cycles’ was not declared in this scope” error!
A: You are on a Mac. Sorry. I know it sounds crazy, but it seems that the Arduino IDE on the Mac uses a version of the avr-gcc compiler that does not understand this function. I do not have a Mac, so I can’t help debug this problem.

UPDATES:

6/12/2014- Updated the code snippet on this page to match the code in the demo project on GitHub. The timing constants have been relaxed even further (things that have minimums are longer and things that have maximums are shorter) based on feedback I’ve gotten from people running the code on lots of different WS2812’s from lots of places. The code overall now takes slightly longer, but should work with any chip no matter how sloppy they are.

7/8/2014- Thanks to Hack-a-day, I found Tim’s similar work on WS2812Bs. He was motivated to optimize for speed and efficiency, but ends up with code that is also simple and elegant. If you are looking for code that is all C and easy to play with and understand, then my code is good – but if you want a fast library to use in production then you should check out his excellent library.

2 comments

  1. Kevin Wilson

    Thanks for all the great info! I am looking into using a lot of NeoPixels to indirectly light a coffered ceiling in a room addition we are planning. I do software and know too little about the hardware side, so I am a little concerned about doing the right thing for powering this project… that’s a lot of current! I may put 5V power supplies around the loop, or I was thinking about running 12 or 24VDC and using buck converters for the 5V. So I am thinking I want a continuous run for the ground and data lines, but to break the 5V line where each power supply is inserted. Is this the right way to power a 30+ foot string? Can you point me somewhere to understand how this is done?
    Thanks for the help and all the great content!

    • bigjosh2

      I prefer to keep the distribution voltage as high as possible and then drop it down when it gets to where I need it, so I’d use 5 volt power supplies near the pixels they are powering. This lets you use much smaller distribution wires (12 volts needs 10x more current than 120 volts to transfer the same power), and also skips the extra equipment and losses of the buck converters.

      You do want to connect ground and data lines between adjacent strips (the data line is not really “continuous” because it is regenerated at the output of each pixel). You could split the power segments, but this could lead to a problem if two adjacent strips end up at significantly different voltage levels because, say, one happens to have all LEDs on bright while the other happens to have all LEDs off.

      Long story short, you can end up with a case where the voltage of a data in pin on the pixel next to the break is much higher than the power supply voltage that pixel is getting. This is not good.

      If you always connect the power supply pins together, then adjacent pixels will always have power voltages (and thus data voltage levels) that are very close to each other.

      Luckily, NeoPixel strips have pretty high resistance so as long as there are a couple of strips between power supplies, then the resistance of the strips will prevent the supplies from fighting each other too much even if they have slightly different voltages.

      Let me know if this doesn’t make sense and I’ll do a diagram that will hopefully explain better.
