Parallel Processing Arduino Style – Make Massive NeoPixel Displays With Nanoscale Concurrent Computing

We’ve already seen that it is possible to drive thousands of WS2812B NeoPixels with a lowly Arduino using careful bit-banging. But what if we could bang out 8 bits at a time rather than sending them single file? Could it be possible to drive 8 times as many strings (or get 8 times the refresh rate) from our Arduino by processing bits in parallel? It would be like having a tiny pipelined GPU render engine inside our Arduino!

Read on to find out the results of a quick proof-of-concept test!

Perfunctory Video

Spoiler alert: If you want to keep up the suspense, read the article first and then come back and watch the video.

Pins and Ports

Each output pin on the Arduino maps to a single bit of a “port”. A port is just an internal register that happens to be connected to the pins, so writing to the port can change the output of the pins it is connected to. Ports are named B, C, D, etc. The pin-to-port mappings for an Uno are shown here…


For example, Digital Pin 4 maps to bit 4 of port D (shown as PD4), so if we set bit 4 of port D then digital pin #4 will go high (assuming it is set to output mode). You can read more about pins and ports here.

By writing a full byte to a port, we can set all the pins at once in one very quick step. If we can compound these gains by doing all of our computations and signal generation using 8 bits in parallel, we should be able to drive nearly 8 times as many pixels or get 8 times the refresh rate when compared to individual bit banging.
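To make the difference concrete, here is a rough host-side sketch (plain C++ that runs on a PC, no Arduino needed) that models port D as an ordinary byte and counts writes. The `portD` variable, the `portWrites` counter, and all three function names are made up for this illustration; on a real Uno the writes would go to the `PORTD` register instead.

```cpp
#include <cstdint>

// Stand-in for the real PORTD register so the sketch runs on a PC.
// (Assumption: on an actual Uno this would be the PORTD register itself.)
uint8_t portD = 0;

// Counts how many separate writes have hit the simulated port.
unsigned portWrites = 0;

// Set or clear one pin: a read-modify-write that changes a single bit.
void writePinBit(uint8_t bit, bool high) {
    if (high) {
        portD |= static_cast<uint8_t>(1u << bit);
    } else {
        portD &= static_cast<uint8_t>(~(1u << bit));
    }
    ++portWrites;
}

// Set all 8 pins at once with a single write of a full byte.
void writePortByte(uint8_t value) {
    portD = value;
    ++portWrites;
}

// The slow way: eight separate writes to get one pattern onto the pins.
void writePatternOnePinAtATime(uint8_t pattern) {
    for (uint8_t bit = 0; bit < 8; ++bit) {
        writePinBit(bit, ((pattern >> bit) & 1u) != 0);
    }
}
```

Calling `writePinBit(4, true)` on a cleared port leaves it at `0x10`, which is the "bit 4 of port D" case from above. Putting a full pattern on the pins costs eight writes with `writePatternOnePinAtATime()` and one write with `writePortByte()`.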

Picking Our Port

Looking carefully at the above map, you’ll see that only Port D has all of its bits mapped to accessible pins, so that is the one we will use. Might as well get maximum bang for our bit-bang-buck!

To test, we very simply connect up 8 strings to the 8 pins of port D like this…


Real life is a bit messier, but still recognizable…


Pushing Parallel Pixels

To drive a WS2812B Neopixel strip, we need to generate a sequence of specially timed signals. To drive 8 strips, we need to generate 8 of these sequences – and all at the same time. This is not as hard as it sounds since each data bit in the generated signal can be neatly divided into three phases…


Signals for sending one WS2812B data bit

Step  Color   Output  Description      Duration
1     GREEN   HIGH    always high      T0H
2     YELLOW  DATA    the data itself  T1H-T0H
3     RED     LOW     always low       T1L

(The exact timings for each phase are described here)

See how the only difference between a 1 bit and a 0 bit is the level in the time period shown in yellow? This makes things much simpler for us since the beginning and ending of each bit are always the same.

So, to transmit a set of 8 encoded data bits (1 data bit to each string), all we need to do is…

  1. set all bits in the port to 1 (which is a single write of 0xff to the port)
  2. wait the right amount of time
  3. set the bits in the port to the data we want to send to each string (also a single write of a byte with all the 8 bits set to the correct data)
  4. wait the right amount of time
  5. set all the bits in the port to 0 (again, a single write of 0x00 to the port)
  6. wait the right amount of time and repeat

Translated into pseudo assembly code, that looks like…

out PORTD, 0xff ; set all pins on port D to 1
delay T0H ; complete 1st phase of an encoded bit
out PORTD, data ; set all pins on port D to their data values
delay T1H-T0H ; complete 2nd phase of an encoded bit
out PORTD, 0x00 ; set all pins on port D to 0
delay T1L ; complete last phase of an encoded bit
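As a sanity check on the three-phase idea, here is a small host-side model (plain C++, runs on a PC) that records the three port writes for one set of 8 bits and then adds up how long each pin stays high. The struct, the function names, and the nanosecond constants are assumptions for illustration only; the constants are borrowed from the timing defines in a sketch posted in the comments below, not from a datasheet.

```cpp
#include <cstdint>
#include <vector>

// One write to the (simulated) port, plus how long that level is held.
struct PortWrite {
    uint8_t value;    // byte placed on the 8 pins
    unsigned holdNs;  // how long it stays there, in nanoseconds
};

// Assumed timing values in nanoseconds (illustrative only).
constexpr unsigned T0H_NS = 312;
constexpr unsigned T1H_NS = 814;
constexpr unsigned T1L_NS = 438;

// Produce the three port writes that encode one data bit on each of 8 strips.
std::vector<PortWrite> encodeBitX8(uint8_t data) {
    return {
        {0xFF, T0H_NS},           // phase 1: every pin high
        {data, T1H_NS - T0H_NS},  // phase 2: each pin carries its own data bit
        {0x00, T1L_NS},           // phase 3: every pin low
    };
}

// Total time a given pin spends high across a list of writes.
unsigned highTimeNs(const std::vector<PortWrite>& writes, uint8_t pin) {
    unsigned total = 0;
    for (const PortWrite& w : writes) {
        if ((w.value >> pin) & 1u) {
            total += w.holdNs;
        }
    }
    return total;
}
```

With `encodeBitX8(0x05)`, pins 0 and 2 are high for T1H in total (a 1 bit) while every other pin is high only for T0H (a 0 bit), even though all eight pins were driven by the same three writes.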

…things get a bit messier converting to real Arduino C code, but the steps are still recognizable…

[code lang="cpp"]
// Actually send the next set of 8 WS2812B encoded bits to the 8 pins.
// We must drop to asm to ensure that the compiler does
// not reorder things and make it so the delay happens in the wrong place.

static inline __attribute__ ((always_inline)) void sendBitX8( uint8_t bits ) {

  const uint8_t onBits = 0xff; // We need to send all bits on on all pins as the first 1/3 of the encoded bits

  asm volatile (

    "out %[port], %[onBits] \n\t" // 1st step - send T0H high

    ".rept %[T0HCycles] \n\t" // Execute NOPs to delay exactly the specified number of cycles
    "nop \n\t"
    ".endr \n\t"

    "out %[port], %[bits] \n\t" // set the output bits to their values for T0H-T1H
    ".rept %[dataCycles] \n\t" // Execute NOPs to delay exactly the specified number of cycles
    "nop \n\t"
    ".endr \n\t"

    "out %[port],__zero_reg__ \n\t" // last step - T1L all bits low

    // Don't need an explicit delay here since the overhead that follows will always be long enough

    :: // no outputs, inputs follow
    [port] "I" (_SFR_IO_ADDR(PIXEL_PORT)),
    [bits] "d" (bits),
    [onBits] "d" (onBits),

    [T0HCycles] "I" (NS_TO_CYCLES(T0H) - 2), // 1-bit width less overhead for the actual bit setting, note that this delay could be longer and everything would still work

    [dataCycles] "I" (NS_TO_CYCLES((T1H-T0H)) - 2) // Minimum interbit delay. Note that we probably don't need this at all since the loop overhead will be enough, but here for correctness

  );

}
[/code]

Note that we do not explicitly wait for the T1L delay during the final phase since the overhead of calling the function will add enough time of low level between bits.
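For a feel of the numbers behind `NS_TO_CYCLES()`, here is a host-side worked example (plain C++, runs on a PC). It assumes a 16 MHz clock, integer division, and the nanosecond values from the timing defines in a sketch posted in the comments below; the lowercase names are invented for this illustration and are not part of the real sketch.

```cpp
#include <cstdint>

// Assumed clock: 16 MHz, as on a stock Arduino Uno.
constexpr long F_CPU_HZ = 16000000L;
constexpr long NS_PER_SEC = 1000000000L;

// Integer nanoseconds per CPU cycle: 62 (the true value, 62.5, is truncated).
constexpr long NS_PER_CYCLE = NS_PER_SEC / F_CPU_HZ;

// Convert a duration in nanoseconds to whole CPU cycles (rounds down).
constexpr long nsToCycles(long ns) {
    return ns / NS_PER_CYCLE;
}

// Assumed timing values in nanoseconds (illustrative only).
constexpr long T0H = 312;
constexpr long T1H = 814;
constexpr long T1L = 438;

// NOP counts for the two delays, using the same "minus 2" as the listing above.
constexpr long t0hNops = nsToCycles(T0H) - 2;
constexpr long dataNops = nsToCycles(T1H - T0H) - 2;
```

Under these assumptions T0H works out to 5 cycles, the data phase (T1H-T0H) to 8 cycles, and T1L to 7 cycles, so the two `.rept` blocks would expand to 3 and 6 NOPs respectively.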

A Pixel Is More Than Just a Bit

We are using color strips, so each pixel is a total of 24 bits long: 8 bits each for Red, Green, and Blue brightness. To keep things simple for this test, we will just send each pixel as either 24 1s for on or 24 0s for off. The 24 1s encode a brightness of 255 for all three colors, which corresponds to a full-brightness white pixel (visually “ON”). The 24 0s encode a brightness of 0 for all three colors, which corresponds to a black pixel (visually “OFF”).
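Here is a small host-side model of that on/off encoding (plain C++, runs on a PC). Instead of driving pins it records the data-phase byte for each of the 24 bits and then rebuilds the 24-bit value each strip would see. Both function names are hypothetical and exist only for this sketch.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Host-side stand-in for sending one pixel to all 8 strips: returns the
// data-phase byte used for each of the 24 bits instead of driving real pins.
std::vector<uint8_t> pixelRowDataBytes(uint8_t row) {
    return std::vector<uint8_t>(24, row);  // the same byte for all 24 bits
}

// Rebuild the 24-bit color each strip would receive (first bit sent is the MSB).
std::array<uint32_t, 8> decodeStripColors(const std::vector<uint8_t>& dataBytes) {
    std::array<uint32_t, 8> colors{};
    for (uint8_t byte : dataBytes) {
        for (unsigned strip = 0; strip < 8; ++strip) {
            colors[strip] = (colors[strip] << 1) | ((byte >> strip) & 1u);
        }
    }
    return colors;
}
```

Decoding `pixelRowDataBytes(0x81)` gives `0xFFFFFF` (white) for strips 0 and 7 and `0x000000` (black) for the other six.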

[code lang="cpp"]
// Send a single pixel out to each of the 8 strings
// Each bit in `row` indicates if the pixel in the corresponding string should be on or off

static inline void __attribute__ ((always_inline)) sendPixelRow( uint8_t row ) {

  // Send the bit 24 times down every row.
  // This ends up as 100% white if the bit in row is 1, or black (off) if the bit is 0.
  // Remember that each pixel is 24 bits wide (8 bits each for R,G, & B)

  uint8_t bit=24;

  while (bit--) {

    sendBitX8( row );

  }

}
[/code]

Commence Test Data Transmission!

Now we are ready to test!

Remember that each call to sendPixelRow() will send one full pixel to each of the 8 attached strips. Each bit in the passed byte corresponds to one of the strips. Bit 0 goes to the bottom strip, bit 7 to the top one, etc…
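If you want to preview what a sequence of row bytes should look like before wiring anything up, a host-side sketch like this one (plain C++, runs on a PC) draws the strips as lines of text. The `renderRows()` name and the `#`/`.` characters are my own choices for illustration. It assumes the first byte sent ends up on the pixel nearest the Arduino, so columns read left to right in send order.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Draw the strips as lines of text: one line per strip, one character per
// pixel. Bit 7 is drawn first so it appears as the top line, bit 0 last.
std::string renderRows(const std::vector<uint8_t>& rows) {
    std::string out;
    for (int bit = 7; bit >= 0; --bit) {
        for (uint8_t row : rows) {
            out += ((row >> bit) & 1u) ? '#' : '.';
        }
        out += '\n';
    }
    return out;
}
```

For example, `renderRows({0x80, 0x40, 0x01})` lights the first pixel of the top strip, the second pixel of the next strip down, and the third pixel of the bottom strip.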

Let’s send an interesting pattern so we can tell if it works…

[code lang="cpp"]
sendPixelRow( 0b10000000 ); // Send an interesting and challenging pattern
sendPixelRow( 0b01000000 );
sendPixelRow( 0b00100000 );
sendPixelRow( 0b00010000 );
sendPixelRow( 0b00001000 );
sendPixelRow( 0b00000100 );
sendPixelRow( 0b00000010 );
sendPixelRow( 0b00000001 );
sendPixelRow( 0b00000010 );
sendPixelRow( 0b00000100 );
sendPixelRow( 0b00001000 );
sendPixelRow( 0b00010000 );
sendPixelRow( 0b00100000 );
sendPixelRow( 0b01000000 );
sendPixelRow( 0b10000000 );
sendPixelRow( 0b00000000 );
sendPixelRow( 0b01010101 );
sendPixelRow( 0b10101010 );
sendPixelRow( 0b01010101 );
sendPixelRow( 0b10101010 );
sendPixelRow( 0b00000000 );
sendPixelRow( 0b11111111 );
sendPixelRow( 0b00000000 );
sendPixelRow( 0b11111111 );
sendPixelRow( 0b00000000 );
[/code]

Success! Hopefully you can make out the test pattern on the 8 strips…


This is really cool! We updated 8 strips in about the same amount of time as it takes to update just one! The power of parallel bit-banging!
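Some back-of-the-envelope arithmetic shows what that buys us. The host-side sketch below (plain C++, runs on a PC) assumes each encoded bit takes T1H + T1L = 814 + 438 = 1252 ns, using the timing values from a sketch posted in the comments below, and ignores loop overhead and the reset gap. The function names are made up for this illustration.

```cpp
#include <cstdint>

// Assumed time to send one encoded bit: T1H + T1L in nanoseconds.
constexpr uint64_t BIT_PERIOD_NS = 814 + 438;
constexpr uint64_t BITS_PER_PIXEL = 24;

// Time to clock a given number of pixels out of a single pin.
constexpr uint64_t frameTimeNs(uint64_t pixelsOnOnePin) {
    return pixelsOnOnePin * BITS_PER_PIXEL * BIT_PERIOD_NS;
}

// Time to send the same total number of pixels when they are split evenly
// across several strips that are all driven at the same time.
constexpr uint64_t parallelFrameTimeNs(uint64_t totalPixels, uint64_t strips) {
    return frameTimeNs(totalPixels / strips);
}
```

Under these assumptions, 3072 pixels on a single pin take about 92.3 ms per frame, while the same 3072 pixels spread over 8 strips of 384 take about 11.5 ms, which is the 8x we were hoping for.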


Now that we have proof of concept, we need to figure out a way to put all this extra bandwidth to good use.  Ideally, we want to find an application that also lets us do our display computations in parallel so we effectively have a (tiny!) 8-way parallel processor generating our display.

Stay tuned, because I know a perfect project to make the most of our newfound power. This is going to be big…

Code Drop

Complete working sketch for an Arduino Uno here…


Q: Why?

When the question came up, I thought it was interesting enough to justify a proof of concept test.

Q: Why bother doing this on an Arduino? Just get a Beaglebone/Teensy/RaspPi/Fadecandy!

All of these platforms have more horsepower/memory/swagger than a lowly Arduino. The BeagleBone’s PRU is especially well suited to driving lots and lots of Neopixel strings in parallel.

That said, the Arduino is a widely available and popular platform, and lots and lots of people use them for driving WS2812B NeoPixels. The Arduino natively runs at 5 volts, so you can connect the strings directly to it. The Arduino is bare metal, so generating the precise timing needed is straightforward (although not necessarily easy).

Using an Arduino (or cheap clone) can be an order of magnitude cheaper than the above platforms and there is an aesthetic beauty to using the minimum hardware necessary to solve a problem.

The code presented here can even run directly on a $2 naked Arduino chip connected directly to the strips, even getting its power from them. All that’s needed is a bit of tweaking to the clock speed to avoid needing the 16MHz crystal.

Q: Can you do color? Animation? Scrolling? Video Games? Ahhhh- head exploding with ideas!!!!

All this and more. Just wait until next time…


  1. dntruong

    I’m glad you got this working: The main concern is not actually driving 8 pins, but getting 8bits repeatedly from 8 places in your pixel array.
    I’ve managed getting up to two bits working, hence I have a 2bit bitbang mode on FAB_LED, but not eight…
    The question is: can we read 8 bytes to push before the strip times out and resets. Now I have not tried to buffer the 8 bytes but I don’t see how that makes things faster and consumes RAM.

    • bigjosh2

      I have an application that is ideally suited to processing the pixels 8 bits at a time and can generate the display pixels and resulting signals in real-time fast enough to avoid inadvertent resets. It is even super memory efficient, needing only a single byte to store 6 rows of the display. Stay tuned for some very long (and actually useful) displays!

      • dntruong

        Note: I’ve implemented blindly APA102 support and ARM support, but IDK how buggy it is. Don’t play with it yet unless you wanna debug it :).

  2. dntruong

    How to pull it off? pipelining.
    First, offset the display time to each strip by 8.
    Create a loop that loads ONE byte in a buffer register from memory at every iteration.
    Use 8 registers as buffers.
    Now pull a bit from each register to form the port’s next value. display. move bit pointer. The trick is each register displays a different bit and loads when its bit counter hits 8.
    Makes sense?
    Question : are there enough registers to hold 8 buffered bytes, indices, addresses do the math, etc. for the loop to work.

    • dntruong

      I have pushed support for 8 ports on FAB_LED. However it’s a dumb approach that won’t work on Uno.
      I have implemented a pipelined version that does one memory access per LED bit push, but it doesn’t quite work:

      inline void
      avrBitbangLedStrip<FAB_TVAR>::eightPortSoftwareSendBytes(const uint16_t count, const uint8_t * array)
      {
        const uint16_t blockSize = count / 8;
        uint8_t r0 __asm__("r2") = 0;
        uint8_t r1 __asm__("r3") = 0;
        uint8_t r2 __asm__("r4") = 0;
        uint8_t r3 __asm__("r5") = 0;
        uint8_t r4 __asm__("r6") = 0;
        uint8_t r5 __asm__("r7") = 0;
        uint8_t r6 __asm__("r8") = 0;
        uint8_t r7 __asm__("r9") = 0;
        for(register uint16_t c = 0; c < blockSize + 8; c++) {
          for(register int8_t b = 7; b >= 0; b--) {
            uint8_t bitmask __asm__("r10");
            // bitmask |= r0 & 1 ; r0 >>= 1;
            // bitmask |= (r1 & 1) << 1; r1 >>= 1;
            // bitmask |= (r2 & 1) << 2; r2 >>= 1;
            // bitmask |= (r3 & 1) << 3; r3 >>= 1;
            // bitmask |= (r4 & 1) << 4; r4 >>= 1;
            // bitmask |= (r5 & 1) << 5; r5 >>= 1;
            bitmask |= (r6 & 1) << 6; r6 >>= 1;
            bitmask |= (r7 & 1) << 7; r7 >>= 1;
            // Load ONE byte per iter.
            // Skip end condition check to reduce CPU cycles.
            // It is expected that the LED strip won't have extra pixels.
            switch (b) {
              case 0:
                // if (c < blockSize)
                r0 = array[c];
                break;
              case 1:
                // if (c < blockSize+1)
                r1 = array[c + 1 * blockSize];
                break;
              case 2:
                r2 = array[c + 2 * blockSize];
                break;
              case 3:
                r3 = array[c + 3 * blockSize];
                break;
              case 4:
                r4 = array[c + 4 * blockSize];
                break;
              case 5:
                r5 = array[c + 5 * blockSize];
                break;
              case 6:
                r6 = array[c + 6 * blockSize];
                break;
              case 7:
                r7 = array[c + 7 * blockSize];
                break;
            }
            // Set all HIGH, set LOW all zeros, set LOW zeros and ones.
            FAB_PORT(dataPortId, 0xFF);
            DELAY_CYCLES(4); //high0 - sbiCycles);
            // FAB_PORT(dataPortId, bitmask);
            AVR_PORT(dataPortId) &= bitmask;
            DELAY_CYCLES(4); //high1 - sbiCycles - high0);
            FAB_PORT(dataPortId, 0x00);
            // Let's assume we'll spend enough time doing math to not need to wait
            //DELAY_CYCLES(20); //low0 - cbiCycles);
          }
        }
      }


      Maybe you can spot what I did wrong?

      • dntruong

        Well, bitmask was not initialized. :/

        So this works for 4 ports, but it blows my mind, it doesn’t with 8, though the loop should be perfectly balanced at 8. :/

        • bigjosh2

          After a quick look, (1) I think you can make the bit scatter gather much faster with a sprinkle of ASM. Instead of all that ANDing, use only a ROR and ROL for each bit, (2) change the layout of the buffer so that each block is always 8 bytes long, this avoids any multiplies and the only overhead for each byte in the buffer is a MOV Rx, Z+. You have plenty of time to deal with the shuffled 8x buffer blocks in the foreground thread, no reason to spend time on it when in a rush to get the bits out.

          • dntruong

            I checked in code in FAB_LED, and so far it can drive max 6 pins in parallel on a 16MHz Uno.
            Example F demos it.
            Daniel Garcia, the FastLED guy, helped me iron out some of the bugs in the code.

            BTW I usually rely on gcc to use optimal instructions, just helping it getting it right with proper coding that makes it go the right path.

            I admit I’m doing it blind still, as I don’t look at the ASM (partly IDK where to find it with the IDE :P ). I should do that to check if code already generates a ROL to save a couple of cycles per register and make this handle 8 ports with spare cycles.

            I’d use 8B only for rgbw, as I want to keep the flexibility for users. I suspect the * will be replaced by a shift right.

            IDK what you mean by “foreground”. In FAB_LED the idea is I don’t buffer anything. There’s one array of data owned by the user which may hold 1 to 32 bits per pixel.

            Current working code (max 6 ports @16MHz):


  3. Nikos

    Well done, very interesting concept.
    I was, however, confused by the name
    “sendPixelRow( 0b10000000 );”

    Would it be more precise if it was sendPixelColumn? This is what it does, isn’t it?

    • bigjosh2

      Yes, I struggled with descriptive names for these functions, and sendPixelRow() probably ended up being the worst possible name for this function in this context where it is sending a column of pixels to the display. I’ll fix next time I rev the code. Thanks!

  4. CW&T (@cwandt)

    I dropped in an Arduino Yun (wifi) to make a simple, but very very long weather display. It uses curl to hit a custom domain that returns a text string for the display to scroll.

    // This simplified demo scrolls the text of the Jabberwocky poem directly from flash memory
    // Full article at
    // Change this to be at least as long as your pixel string (too long will work fine, just be a little slower)
    #define PIXELS 96*4 // Number of pixels in the string. I am using 4 meters of 96LED/M
    // These values depend on which pins your 8 strings are connected to and what board you are using
    // More info on how to find these at
    // PORTD controls Digital Pins 0-7 on the Uno
    // You'll need to look up the port/bit combination for other boards.
    // Note that you could also include the DigitalWriteFast header file so you do not need to do this lookup.
    #define PIXEL_PORT PORTB // Port of the pin the pixels are connected to
    #define PIXEL_DDR DDRB // Port of the pin the pixels are connected to
    //connecting to PORTB on an arduino YUN
    //11 10 9 8 MISO MOSI SCK
    //static const uint8_t onBits = 0b11011111; // Bit pattern to write to port to turn on all pins connected to LED strips.
    static const uint8_t onBits = 0b11111110; // Bit pattern to write to port to turn on all pins connected to LED strips.
    // If you do not want to use all 8 pins, you can mask off the ones you don't want
    // Note that these will still get 0 written to them when we send pixels
    // TODO: If we have time, we could even add a variable that will and/or into the bits before writing to the port to support any combination of bits/values
    // These are the timing constraints taken mostly from
    // empirically measuring the output from the Adafruit library strandtest program
    // Note that some of these defined values are for reference only - the actual timing is determined by the hard code.
    #define T1H 814 // High time of a 1 bit in ns - 13 cycles
    #define T1L 438 // Low time of a 1 bit in ns - 7 cycles
    #define T0H 312 // High time of a 0 bit in ns - 5 cycles
    #define T0L 936 // Low time of a 0 bit in ns - 15 cycles
    // Phase #1 - Always 1 - 5 cycles
    // Phase #2 - Data part - 8 cycles
    // Phase #3 - Always 0 - 7 cycles
    #define RES 500000 // Width of the low gap between bits to cause a frame to latch
    // Here are some convenience defines for using nanoseconds specs to generate actual CPU delays
    #define NS_PER_SEC (1000000000L) // Note that this has to be SIGNED since we want to be able to check for negative values of derivatives
    #define CYCLES_PER_SEC (F_CPU)
    #define NS_PER_CYCLE ( NS_PER_SEC / CYCLES_PER_SEC )
    #define NS_TO_CYCLES(n) ( (n) / NS_PER_CYCLE )
    // Sends a full 8 bits down all the pins, representing a single color of 1 pixel
    // We walk through the 8 bits in colorbyte one at a time. If the bit is 1 then we send the 8 bits of row out. Otherwise we send 0.
    // We send onBits at the first phase of the signal generation. We could just send 0xff, but that might enable pull-ups on pins that we are not using.
    // Unfortunately we have to drop to ASM for this so we can interleave the computations during the delays, otherwise things get too slow.
    // OnBits is the mask of which bits are connected to strips. We pass it on so that we
    // do not turn on unused pins because this would enable the pullup. Also, hopefully passing this
    // will cause the compiler to allocate a Register for it and avoid a reload every pass.
    static inline void sendBitx8( const uint8_t row , const uint8_t colorbyte , const uint8_t onBits ) {
    asm volatile (
    "L_%=: \n\r"
    "out %[port], %[onBits] \n\t" // (1 cycles) – send either T0H or the first part of T1H. Onbits is a mask of which bits have strings attached.
    // Next determine if we are going to be sending 1s or 0s based on the current bit in the color….
    "mov r0, %[bitwalker] \n\t" // (1 cycles)
    "and r0, %[colorbyte] \n\t" // (1 cycles) – is the current bit in the color byte set?
    "breq OFF_%= \n\t" // (1 cycles) – bit in color is 0, then send full zero row (takes 2 cycles if branch taken, count the extra 1 on the target line)
    // If we get here, then we want to send a 1 for every row that has an ON dot…
    "nop \n\t " // (1 cycles)
    "out %[port], %[row] \n\t" // (1 cycles) – set the output bits to [row] This is phase for T0H-T1H.
    // ==========
    // (5 cycles) – T0H (Phase #1)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t " // (1 cycles)
    "out %[port], __zero_reg__ \n\t" // (1 cycles) – set the output bits to 0x00 based on the bit in colorbyte. This is phase for T0H-T1H
    // ==========
    // (8 cycles) - Phase #2
    "ror %[bitwalker] \n\t" // (1 cycles) - get ready for next pass. On last pass, the bit will end up in C flag
    "brcs DONE_%= \n\t" // (1 cycles) Exit if carry bit is set as a result of us walking all 8 bits. We assume that the process around us will take long enough to cover the phase 3 delay
    "nop \n\t \n\t " // (1 cycles) - When added to the 5 cycles in S:, we get the 7 cycles of T1L
    "jmp L_%= \n\t" // (3 cycles)
    // (1 cycles) - The OUT on the next pass of the loop
    // ==========
    // (7 cycles) - T1L
    "OFF_%=: \n\r" // (1 cycles) Note that we land here because of breq, which takes 2 cycles
    "out %[port], __zero_reg__ \n\t" // (1 cycles) - set the output bits to 0x00 based on the bit in colorbyte. This is phase for T0H-T1H
    // ==========
    // (5 cycles) - T0H
    "ror %[bitwalker] \n\t" // (1 cycles) - get ready for next pass. On last pass, the bit will end up in C flag
    "brcs DONE_%= \n\t" // (1 cycles) Exit if carry bit is set as a result of us walking all 8 bits. We assume that the process around us will take long enough to cover the phase 3 delay
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t nop \n\t " // (2 cycles)
    "nop \n\t " // (1 cycles)
    "jmp L_%= \n\t" // (3 cycles)
    // (1 cycles) - The OUT on the next pass of the loop
    // ==========
    //(15 cycles) - T0L
    "DONE_%=: \n\t"
    // Don't need an explicit delay here since the overhead that follows will always be long enough
    :: // no outputs, inputs follow
    [port] "I" (_SFR_IO_ADDR(PIXEL_PORT)),
    [row] "d" (row),
    [onBits] "d" (onBits),
    [colorbyte] "d" (colorbyte ), // Phase 2 of the signal where the actual data bits show up.
    [bitwalker] "r" (0x80) // Allocate a register to hold a bit that we will walk down through the color byte
    );
    // Note that the inter-bit gap can be as long as you want as long as it doesn't exceed the reset timeout (which is a long time)
    }
    // Just wait long enough without sending any bits to cause the pixels to latch and display the last sent frame
    void show() {
      delayMicroseconds( (RES / 1000UL) + 1); // Round up since the delay must be _at_least_ this long (too short might not work, too long not a problem)
    }
    // Send 3 bytes of color data (R,G,B) for a single pixel down all the connected strings at the same time
    // A 1 bit in "row" means send the color, a 0 bit means send black.
    static inline void sendRowRGB( uint8_t row , uint8_t r, uint8_t g, uint8_t b ) {
      //if(row>0 && row < 64) row--;
      sendBitx8( row , g , onBits); // WS2812 takes colors in GRB order
      sendBitx8( row , r , onBits); // WS2812 takes colors in GRB order
      sendBitx8( row , b , onBits); // WS2812 takes colors in GRB order
    }
    // This nice 5x7 font from here…
    // Font details:
    // 1) Each char is fixed 5x7 pixels.
    // 2) Each byte is one column.
    // 3) Columns are left to right order, leftmost byte is leftmost column of pixels.
    // 4) Each column is 8 bits high.
    // 5) Bit #7 is top line of char, Bit #1 is bottom.
    // 6) Bit #0 is always 0, because this pin is used as serial input and setting to 1 would enable the pull-up.
    // defines ascii characters 0x20-0x7F (32-127)
    // PROGMEM after variable name as per
    #define FONT_WIDTH 5
    #define INTERCHAR_SPACE 1
    #define ASCII_OFFSET 0x20 // ASCII code of 1st char in font array
    const uint8_t Font5x7[] PROGMEM = {
    0x00, 0x00, 0x00, 0x00, 0x00, //
    0x00, 0x00, 0xfa, 0x00, 0x00, // !
    0x00, 0xe0, 0x00, 0xe0, 0x00, // "
    0x28, 0xfe, 0x28, 0xfe, 0x28, // #
    0x24, 0x54, 0xfe, 0x54, 0x48, // $
    0xc4, 0xc8, 0x10, 0x26, 0x46, // %
    0x6c, 0x92, 0xaa, 0x44, 0x0a, // &
    0x00, 0xa0, 0xc0, 0x00, 0x00, // '
    0x00, 0x38, 0x44, 0x82, 0x00, // (
    0x00, 0x82, 0x44, 0x38, 0x00, // )
    0x10, 0x54, 0x38, 0x54, 0x10, // *
    0x10, 0x10, 0x7c, 0x10, 0x10, // +
    0x00, 0x0a, 0x0c, 0x00, 0x00, // ,
    0x10, 0x10, 0x10, 0x10, 0x10, // -
    0x00, 0x06, 0x06, 0x00, 0x00, // .
    0x04, 0x08, 0x10, 0x20, 0x40, // /
    0x7c, 0x8a, 0x92, 0xa2, 0x7c, // 0
    0x00, 0x42, 0xfe, 0x02, 0x00, // 1
    0x42, 0x86, 0x8a, 0x92, 0x62, // 2
    0x84, 0x82, 0xa2, 0xd2, 0x8c, // 3
    0x18, 0x28, 0x48, 0xfe, 0x08, // 4
    0xe4, 0xa2, 0xa2, 0xa2, 0x9c, // 5
    0x3c, 0x52, 0x92, 0x92, 0x0c, // 6
    0x80, 0x8e, 0x90, 0xa0, 0xc0, // 7
    0x6c, 0x92, 0x92, 0x92, 0x6c, // 8
    0x60, 0x92, 0x92, 0x94, 0x78, // 9
    0x00, 0x6c, 0x6c, 0x00, 0x00, // :
    0x00, 0x6a, 0x6c, 0x00, 0x00, // ;
    0x00, 0x10, 0x28, 0x44, 0x82, // <
    0x28, 0x28, 0x28, 0x28, 0x28, // =
    0x82, 0x44, 0x28, 0x10, 0x00, // >
    0x40, 0x80, 0x8a, 0x90, 0x60, // ?
    0x4c, 0x92, 0x9e, 0x82, 0x7c, // @
    0x7e, 0x88, 0x88, 0x88, 0x7e, // A
    0xfe, 0x92, 0x92, 0x92, 0x6c, // B
    0x7c, 0x82, 0x82, 0x82, 0x44, // C
    0xfe, 0x82, 0x82, 0x44, 0x38, // D
    0xfe, 0x92, 0x92, 0x92, 0x82, // E
    0xfe, 0x90, 0x90, 0x80, 0x80, // F
    0x7c, 0x82, 0x82, 0x8a, 0x4c, // G
    0xfe, 0x10, 0x10, 0x10, 0xfe, // H
    0x00, 0x82, 0xfe, 0x82, 0x00, // I
    0x04, 0x02, 0x82, 0xfc, 0x80, // J
    0xfe, 0x10, 0x28, 0x44, 0x82, // K
    0xfe, 0x02, 0x02, 0x02, 0x02, // L
    0xfe, 0x40, 0x20, 0x40, 0xfe, // M
    0xfe, 0x20, 0x10, 0x08, 0xfe, // N
    0x7c, 0x82, 0x82, 0x82, 0x7c, // O
    0xfe, 0x90, 0x90, 0x90, 0x60, // P
    0x7c, 0x82, 0x8a, 0x84, 0x7a, // Q
    0xfe, 0x90, 0x98, 0x94, 0x62, // R
    0x62, 0x92, 0x92, 0x92, 0x8c, // S
    0x80, 0x80, 0xfe, 0x80, 0x80, // T
    0xfc, 0x02, 0x02, 0x02, 0xfc, // U
    0xf8, 0x04, 0x02, 0x04, 0xf8, // V
    0xfe, 0x04, 0x18, 0x04, 0xfe, // W
    0xc6, 0x28, 0x10, 0x28, 0xc6, // X
    0xc0, 0x20, 0x1e, 0x20, 0xc0, // Y
    0x86, 0x8a, 0x92, 0xa2, 0xc2, // Z
    0x00, 0x00, 0xfe, 0x82, 0x82, // [
    0x40, 0x20, 0x10, 0x08, 0x04, // (backslash)
    0x82, 0x82, 0xfe, 0x00, 0x00, // ]
    0x20, 0x40, 0x80, 0x40, 0x20, // ^
    0x02, 0x02, 0x02, 0x02, 0x02, // _
    0x00, 0x80, 0x40, 0x20, 0x00, // `
    0x04, 0x2a, 0x2a, 0x2a, 0x1e, // a
    0xfe, 0x12, 0x22, 0x22, 0x1c, // b
    0x1c, 0x22, 0x22, 0x22, 0x04, // c
    0x1c, 0x22, 0x22, 0x12, 0xfe, // d
    0x1c, 0x2a, 0x2a, 0x2a, 0x18, // e
    0x10, 0x7e, 0x90, 0x80, 0x40, // f
    0x10, 0x28, 0x2a, 0x2a, 0x3c, // g
    0xfe, 0x10, 0x20, 0x20, 0x1e, // h
    0x00, 0x22, 0xbe, 0x02, 0x00, // i
    0x04, 0x02, 0x22, 0xbc, 0x00, // j
    0x00, 0xfe, 0x08, 0x14, 0x22, // k
    0x00, 0x82, 0xfe, 0x02, 0x00, // l
    0x3e, 0x20, 0x18, 0x20, 0x1e, // m
    0x3e, 0x10, 0x20, 0x20, 0x1e, // n
    0x1c, 0x22, 0x22, 0x22, 0x1c, // o
    0x3e, 0x28, 0x28, 0x28, 0x10, // p
    0x10, 0x28, 0x28, 0x18, 0x3e, // q
    0x3e, 0x10, 0x20, 0x20, 0x10, // r
    0x12, 0x2a, 0x2a, 0x2a, 0x04, // s
    0x20, 0xfc, 0x22, 0x02, 0x04, // t
    0x3c, 0x02, 0x02, 0x04, 0x3e, // u
    0x38, 0x04, 0x02, 0x04, 0x38, // v
    0x3c, 0x02, 0x0c, 0x02, 0x3c, // w
    0x22, 0x14, 0x08, 0x14, 0x22, // x
    0x30, 0x0a, 0x0a, 0x0a, 0x3c, // y
    0x22, 0x26, 0x2a, 0x32, 0x22, // z
    0x00, 0x10, 0x6c, 0x82, 0x00, // {
    0x00, 0x00, 0xfe, 0x00, 0x00, // |
    0x00, 0x82, 0x6c, 0x10, 0x00, // }
    0x10, 0x10, 0x54, 0x38, 0x10, // ~
    0x10, 0x38, 0x54, 0x10, 0x10, // (DEL)
    };
    // Send the pixels to form the specified char, not including interchar space
    // skip is the number of pixels to skip at the beginning to enable sub-char smooth scrolling
    // TODO: Subtract the offset from the char before starting the send sequence to save time if necessary
    // TODO: Also could pad the beginning of the font table to avoid the offset subtraction at the cost of 20*8 bytes of progmem
    // TODO: Could pad all chars out to 8 bytes wide to turn the multiply by FONT_WIDTH into a shift
    static inline void sendChar( uint8_t c , uint8_t skip , uint8_t r, uint8_t g, uint8_t b ) {
      const uint8_t *charbase = Font5x7 + (( c - ' ') * FONT_WIDTH ) ;
      uint8_t col = FONT_WIDTH;
      // Loop bodies here were lost in the paste; these are assumed from the comments above
      while (skip--) {
        charbase++; // skip a column of the char
        col--;
      }
      while (col--) {
        sendRowRGB( pgm_read_byte_near(charbase++) , r , g , b );
      }
      col = INTERCHAR_SPACE; // assumed: reset the counter for the gap columns
      while (col--) {
        sendRowRGB( 0 , r , g , b ); // Interchar space
      }
    }
    // Show the passed string. The last letter of the string will be in the rightmost pixels of the display.
    // Skip is how many cols of the 1st char to skip for smooth scrolling
    static inline void sendString( const char *s , uint8_t skip , const uint8_t r, const uint8_t g, const uint8_t b ) {
      unsigned int l = PIXELS / (FONT_WIDTH + INTERCHAR_SPACE);
      sendChar( *s , skip , r , g , b ); // First char is special case because it can be stepped for smooth scrolling
      while ( *(++s) && l--) {
        sendChar( *s , 0, r , g , b );
      }
    }
    #include <Process.h>
    char c [] = " "
    " "
    " "
    " "
    " "
    " "
    " "
    " "
    " ";
    unsigned long previousMillis = 0;
    unsigned long interval = 150000L;//5*60*1000L;
    void setup() {
      PIXEL_DDR |= onBits; // Set used pins to output mode
      Bridge.begin(); //disables pin 0, 1
      //while (!Serial);
    }
    void loop() {
      unsigned long currentMillis = millis();
      if (currentMillis - previousMillis > interval) {
        previousMillis = currentMillis;
        runCurl(); // assumed: the call itself is missing from the pasted code
      }
      //if (millis() > 5 * 60 * 1000 && millis() % int(20 * 60 * 1000) < 500) runCurl();
      const char *m = c;
      while (*m) {
        for ( uint8_t step = 0; step < FONT_WIDTH + INTERCHAR_SPACE ; step++ ) { // step through each column of the 1st char for smooth scrolling
          sendString( m , step , 30, 30, 7 );
        }
        m++; // assumed: advance to the next char, otherwise the loop never ends
        if (millis() % 2000 < 1000) digitalWrite(14, HIGH);
        else digitalWrite(14, LOW);
      }
    }
    void runCurl() {
      // curl is command line program for transferring data using different internet protocols
      Process p; // Create a process and call it "p"
      p.begin("curl"); // Process that launch the "curl" command
      p.addParameter(""); // Add the URL parameter to "curl"
      p.run(); // Run the process and wait for its termination
      int j = 0;
      while (p.available() > 0 && j < (int)sizeof(c) - 1) { // bound added so we never write past the end of c
        char d = p.read(); // assumed: the read call is missing from the pasted code
        c[j] = d;
        j++;
      }
      c[j] = 0; // null terminate string
    }


    • bigjosh2

      Thanks CW! As much as I hate the Yun, this is a solution that works and I know that lots of people will use it. The Yun has an additional Serial port that does not use the D0-D7 pins at all, so there is no conflict with the NeoPixel code. (It uses this extra serial port to talk to the little onboard Linux computer.)

      Got any suggestions for good test websites that return interesting text to use for “”?
      Got a video of your setup to share?!

      Thanks again!
