Driving a WS2812 RGB LED with an STM32

As seen on Hackaday!

If you like blinky LEDs – especially the full color RGB ones – you might have already come across the WS2812 LEDs.
These little suckers are extremely bright and can be purchased rather cheaply from eBay, 50 or 100 pieces at a time.
You can get them as bare LEDs, as strips with 60 LEDs per meter, or even as breadboard-friendly breakout boards from Adafruit (NeoPixel) and SparkFun.

These devices can be chained together into practically endless strings (provided you can supply enough power) and are therefore perfect for making big RGB screens/matrices.

As it turns out all the awesomeness comes with a downside: the digital interface used is not a standard interface. It’s an 800kHz data stream (1.25us per bit) with different duty cycles (high/low times) to represent a ‘0’ or a ‘1’ bit.

WS2812 timing diagram

Image source: SparkFun Electronics

WS2812 data packet format

Image source: SparkFun Electronics

Each LED takes a data sequence of 24 bits. After the data sequence there is a 50us dead time, which the LED uses to update its PWM output for each emitter.

Bit banging

Because the WS2812 protocol is not a standard interface, most microcontrollers have no hardware peripheral that supports it.
On most microcontrollers this leaves only one option: bit banging.

This means that the micro has to fetch the next bit, figure out whether it’s a ‘0’ or a ‘1’ and then set the GPIO pin low at the correct time, all within 1.25us.

The timing requirements of the protocol are rather tough to meet for most 8 bit micros clocked at 16MHz (depending on architecture).
It’s almost impossible to bit bang the protocol without coding directly in assembly, optimizing for execution time and paying close attention to the number of instructions (which is what the Adafruit NeoPixel library does).

A different approach

Therefore I decided to try my luck using an STM32VL Discovery board to drive the LED. As mentioned earlier, the protocol is non-standard, but as we can see from the timing diagram above, it basically is a crude form of PWM (pulse width modulation). So I’m leveraging the PWM and DMA capabilities of the STM32 to generate the needed signal.

The trick with DMA (direct memory access) is that a number of data bytes (in this case a buffer) can be transferred from a memory location to the compare register of a timer without CPU intervention. The DMA controller will listen for an event, in this case the counter register reaching the compare value (i.e. the PWM pin going low) and then send the next byte to the timer’s compare register.

It might seem a bit overkill at first, but the cool thing about DMA is that the compare register gets updated with a new value before the current PWM cycle is over. The next PWM cycle therefore already uses the updated compare value, and no bits get duplicated because the CPU could not keep up with the timer.

This is an outline of what my code does:

  1. Bring the 8 bit R, G and B values into the correct order (LED wants G-R-B)
  2. Figure out the bit sequence from the ordered bytes and create a buffer with 24 bytes of compare values to give the correct pulse width
  3. Append a number of 0 bytes to the buffer in order to create 50us dead time between data packets (pulse width = 0)
  4. Configure the DMA buffer size, memory location of the buffer and enable DMA channel for the timer
  5. Start the timer configured to create an 800kHz PWM signal and wait until the DMA buffer is empty
  6. Stop the timer and return to main program
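
Steps 1 to 3 above can be sketched in plain C roughly like this. This is only an illustration, not the code from the repository: the names `rgb_t` and `ws2812_fill_buffer` are made up for this example, the compare values 9 and 17 and the 42 trailing zero slots are the figures mentioned in the comments below (for a 24MHz timer clock), and sending the most significant bit of each byte first is what the WS2812 expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for a 24 MHz timer clock (30 ticks per 1.25 us bit). */
#define WS2812_CCR_ZERO   9u   /* compare value for a '0' bit (short high time) */
#define WS2812_CCR_ONE   17u   /* compare value for a '1' bit (long high time)  */
#define WS2812_TRAILING  42u   /* zero slots that form the >50 us dead time     */

/* Hypothetical color type for this sketch. */
typedef struct { uint8_t r, g, b; } rgb_t;

/*
 * Fills buf with one compare value per bit for len LEDs, followed by the
 * trailing zero slots. buf must hold at least len * 24 + WS2812_TRAILING
 * entries. Returns the number of entries written.
 */
size_t ws2812_fill_buffer(const rgb_t *leds, size_t len, uint16_t *buf)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        /* Step 1: the LED wants the bytes in G-R-B order. */
        const uint8_t grb[3] = { leds[i].g, leds[i].r, leds[i].b };

        /* Step 2: one compare value per bit, most significant bit first. */
        for (size_t byte = 0; byte < 3; byte++) {
            for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
                buf[n++] = (grb[byte] & mask) ? WS2812_CCR_ONE
                                              : WS2812_CCR_ZERO;
            }
        }
    }

    /* Step 3: pulse width 0 keeps the line low for the dead time. */
    for (size_t i = 0; i < WS2812_TRAILING; i++) {
        buf[n++] = 0;
    }

    return n;
}
```

For a single pure red LED this gives 8 short pulses (green), 8 long pulses (red), 8 short pulses (blue) and then 42 slots with no pulse at all, 66 entries in total. The returned count is what would be handed to the DMA controller as the buffer size in step 4.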

The code can be found on GitHub as always.

Thanks for reading!

41 thoughts on “Driving a WS2812 RGB LED with an STM32”

  1. Strange, but on my STM32F103 your code does not work entirely correctly.
    Some colors are not transferred to the LED correctly.

    Does your controller run at 24 MHz or 72 MHz?

  2. By the way, why is DMA_BufferSize = 42?
    We have 2 LEDs, 2*24 = 48 ….

    “(len*24)+42; // number of bytes needed is #LEDs * 24 bytes + 42 trailing bytes”

    What are the trailing bytes?

    1. I have found the timing not to be too critical on these LEDs, I have even managed to use a PIC microcontroller with 4MHz SPI to drive those LEDs.
      I have tried multiple things to get rid of the wrong timing on the first bit but unfortunately I have not found out why it would be shorter than the other bits.

      I am using an STM32F100RBT6B with an HSE frequency of 24MHz (8MHz crystal with *4 PLL).

      The 42 trailing bytes are set to 0 to ensure that the next stream of data will be transmitted after a delay of more than 50µs (I know it’s actually 40 bytes at 1.25µs but I use 42 instead to add a margin for timing that’s slightly off).
      Remember that these LEDs will shift through all data bytes until the data line goes low for more than 50µs, then the PWM of all LEDs is updated at once. So in this case the 42 trailing bytes are sent at the very end of the transmission.

      In the DMA initialization code I used 42 bytes at some point during development. That figure is irrelevant because the DMA buffer size is reloaded every time before a transmission (after it has been calculated on the fly).

      I hope this clears things up a bit ;)
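
The arithmetic behind the 42 trailing bytes can be written down as a small helper. This is just a sketch; the function name `ws2812_reset_slots` and the macro names are made up for this example, and the numbers are the 50µs dead time and 1.25µs bit time from the article.

```c
#include <stdint.h>

#define WS2812_BIT_NS    1250u   /* one bit slot at 800 kHz           */
#define WS2812_RESET_NS 50000u   /* minimum low time to latch the data */

/*
 * Number of zero-width PWM slots needed to keep the data line low for at
 * least the reset time, plus a caller-chosen safety margin.
 */
uint32_t ws2812_reset_slots(uint32_t margin_slots)
{
    /* Round up so the low time is never shorter than the minimum. */
    uint32_t min_slots = (WS2812_RESET_NS + WS2812_BIT_NS - 1u) / WS2812_BIT_NS;

    return min_slots + margin_slots;
}
```

With no margin this gives 40 slots (40 * 1.25µs = 50µs), and `ws2812_reset_slots(2)` gives the 42 used in the code.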

  3. Hi,
    which toolchain are you using? I’ve tried compiling/linking the code with emIDE (GCC toolchain) and get 16 error messages.

    1. Hey again,

      I have fixed the Makefile to include files inside the project directory.
      Strangely enough I was only getting the error on my Windows machine, Linux box was working fine and dandy.

      Anyway, let me know if it has worked for you!


  4. Note that the timing on the WS2812/WS2812B LEDs has changed as of batches manufactured by WorldSemi in October 2013, and the timing tolerance for approx. 10-30% of parts is very small.
    Recommendation from WorldSemi is now: 0 = 400ns high/850ns low, and 1 = 850ns high, 400ns low.
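
To turn recommended high times like these into timer compare values, the nanoseconds have to be converted into timer ticks. A minimal sketch, assuming the timer counts at the 24MHz clock mentioned in the reply above (the helper name `ns_to_ticks` is made up for this example):

```c
#include <stdint.h>

/*
 * Converts a duration in nanoseconds into timer ticks, rounded to the
 * nearest tick. timer_hz is the frequency the timer counts at.
 */
uint32_t ns_to_ticks(uint32_t ns, uint32_t timer_hz)
{
    /* 64 bit intermediate: ns * timer_hz does not fit into 32 bits. */
    uint64_t scaled = (uint64_t)ns * timer_hz;

    return (uint32_t)((scaled + 500000000ULL) / 1000000000ULL);
}
```

At 24MHz the 1.25µs bit period is 30 ticks, a 400ns high time rounds to 10 ticks (about 417ns) and an 850ns high time rounds to 20 ticks (about 833ns). The low times follow from the period, so only the two high-time values end up in the compare buffer.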

  5. Hey,
    thanks for the article! One thing: I ported the code to an STM32F0 discovery board. Now I get a strange behaviour: sending all zeroes and sending all ones works fine at ~800kHz with the expected duty cycles. But as soon as you switch from a “0” to a “1”, the PWM period for the “1” is shortened by a factor of two or so.

    Any idea on that? I tried resetting the CCR register after disabling the DMA and the TIM3, but that didn’t change it.

    1. I have had erroneous behaviour on PWM every time I wrote a new set of 24 bits. I don’t know if that is due to the timer already counting the next period as the first byte is loaded into the compare register.

      I haven’t had any other problems like the one you mention. Have you made sure that your compare values are correct and that the DMA is updating the compare register as soon as the compare match has occurred?

  6. I tried again to
    For testing I fixed the compare values at G=0, R=FF, B=0. The output stream is correct up to the bit where 0 changes to 1. It looks like the compare register is updated too soon because the high pulse of the last 0 is 350ns, then there is a low time of only 187ns, then it goes high for 150ns and then low again for 560ns. After that the stream is ok again.

    The first compare is at 9 (=0). It resets the output, then the register is set to 17 by the DMA, but the counter is still at ten or so, so the output is set high again. Then the counter reaches 17, so the output goes low.

    I need to figure out how to let the DMA update the cycle after the counter is reset, not when the compare event happens.

  7. Got it: I used the TIM3_UPDATE on DMA channel 3. This way the CCR is updated when the Timer is reset.

    Found another bug though: The first bit of a stream was longer than the next (around 1µs high time). I fixed that by adding a CCR=0 in front of the data, so the buffersize is now (len*24)+43.

    So far I only used a scope to check if it’s working but the LEDs should be delivered today so I’ll be able try for real :)
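
The double pulse described in the comments above can be reproduced with a small host-side model of the timer. This is an idealized sketch, not real peripheral code: it assumes a 30 tick period, an output that is high while the counter is below the compare value, a compare register without preload that is already loaded with the first value before the timer starts, and a DMA write that takes effect one tick after its trigger. All names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define PWM_PERIOD_TICKS 30u   /* assumed: 24 MHz timer clock / 800 kHz */

typedef enum { DMA_ON_COMPARE, DMA_ON_UPDATE } dma_trigger_t;

/*
 * Simulates n PWM periods. values[] are the compare values the DMA feeds
 * to the timer, high_ticks[] receives the number of high ticks per period.
 * Returns the total number of rising edges on the output.
 */
size_t simulate_pwm(const uint8_t *values, size_t n,
                    dma_trigger_t trigger, uint8_t *high_ticks)
{
    size_t next = 1;           /* next value the DMA will transfer */
    size_t rising_edges = 0;
    int prev_out = 0;
    uint8_t ccr;

    if (n == 0) {
        return 0;
    }
    ccr = values[0];           /* assumed to be loaded before the start */

    for (size_t period = 0; period < n; period++) {
        high_ticks[period] = 0;

        for (uint8_t cnt = 0; cnt < PWM_PERIOD_TICKS; cnt++) {
            int out = cnt < ccr;           /* high while counter < compare */

            if (out && !prev_out) {
                rising_edges++;
            }
            if (out) {
                high_ticks[period]++;
            }
            prev_out = out;

            /* Compare match: the DMA overwrites CCR in mid-period. */
            if (trigger == DMA_ON_COMPARE && cnt == ccr && next < n) {
                ccr = values[next++];
            }
        }

        /* Update event (counter wraps): CCR changes between periods. */
        if (trigger == DMA_ON_UPDATE && next < n) {
            ccr = values[next++];
        }
    }

    return rising_edges;
}
```

Feeding the values 9, 17, 17, 9 into this model with `DMA_ON_COMPARE` gives two high pulses in the first period (9 ticks, one tick low, then 7 more ticks high), because the counter is still below the freshly written 17. With `DMA_ON_UPDATE` every period gets exactly one pulse with the intended width. The real hardware has a longer DMA latency than one tick, so the measured pulse lengths differ, but the mechanism is the same.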

      1. I have since rewritten the code and now use a slightly different approach.

        The code in that repository is still a work in progress and not encapsulated in its own library but it does output the correct waveforms for the WS2812 LEDs and automatically appends the 50us dead time at the end of the complete data frame.


  8. Sorry, me again: I did another optimization on your code:
    You had the LED_BYTE_Buffer declared as uint16_t which takes twice the RAM space you actually need. Due to the fact that I’ll drive over 1500 LEDs with the processor I ran out of memory.
    The fix is to declare the buffer as uint8_t and set the DMA_MemoryDataSize = DMA_MemoryDataSize_Byte.


    1. Hi Dadita

      I guess I’m having the same trouble you describe. When I measure the signal with my scope, it is jittering. It is not as stable as seen in the movie. Can you describe the steps you took?

      Yours sincerely,


      1. Hi LeddeL,

        I stopped writing down all the steps I took because I didn’t want to spam this comment section. The answer to your question is 2 posts above yours. I used the TIM3_UPDATE on DMA channel 3. This way the CCR is updated when the Timer is reset, instead of when the Compare-Event is set.

        BUT: I ran into the next thing which is memory. I want to drive over 1500 LEDs which means with the solution proposed in this article I would need ~12kBytes of RAM. My STM32F0 doesn’t have that much. So I built a small breadboard with two D-Flip-Flops and an OR-Gate and now I can use the SPI as it should be (every SPI-Bit is one WS2812-Bit). If anyone is interested, I can try to put a small tutorial together.

        Thanks to Elia nonetheless, because without him I’d still be stuck in DMA routines :)

        1. The idea of using two D-Flip Flops and an OR gate sounds interesting.
          Would you mind posting a schematic?


        2. Hi Dadita,

          I’m interested in the hardware/software setup you use to create the WS2812 signal out of your SPI Interface. Maybe you could post a code snippet and a schematic?


        3. Hi,

          Is it possible to get the schematic with the flip flops and gates, please? I guess you need a clock source? From the MCU or from an RC oscillator?

  9. I am developing a controller for an RGB led strip based on WS2812B ( http://www.ledlightinghut.com/144-led-m-ws2812-digital-intelligent-rgb-led-strip-light.html ) using STM32F373, which requires very specific timing. I decided to use DMA and timers for the tight control, using the ideas described by the OctoWS2811 library (http://www.pjrc.com/teensy/td_libs_OctoWS2811.html).

    I am having issues with getting the PWM channels to trigger DMA transfers as the DIER register is being reset in the HAL PWM initialize code.

  10. Dear Elia,
    Thank you for this example.
    Could you please provide an example for a single LED?
    Thanks in advance

    1. A single LED would be easier, just call the function appropriately: WS2812_send(&eightbit[i], 1);.

      Happy hacking,

  11. Thank you for the trick with PWM & DMA! Is it specific to this CPU or does it work the same way on other STM32s? I plan to use it to output a long scanline of bits.

    1. This should work for basically all STM32 with DMA as it uses standard ST peripheral library functions. There might be some differences between families though, you might need to try it out ;)

      Note that this is not exactly memory efficient or fast, you’d be hard pressed to be able to drive a lot of LEDs (memory limitation) at a high update rate for a display or something (because of the preprocessing).

      There are other solutions, like the approach used by the OctoWS2811 library, which might be a better fit for what you’re doing.


  12. This is a good approach. Basically, a fixed buffer for DMA and circular DMA mode with double buffering are still missing to make it as memory friendly as possible. I use that approach on my STM32F0 to convert any number of RGB values to the PWM values in the DMA buffer. It’s a bit tricky to get right, but it works like a charm. Special handling has to be implemented for adding the 50 us of 0 values.
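
The refill logic for such a circular double buffer could look roughly like the sketch below. It is not the commenter's code: all names are made up for this example, the compare values 9 and 17 are the ones from the article's buffer, and it assumes the refill function gets called from the DMA half-transfer interrupt (for half 0) and the transfer-complete interrupt (for half 1).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS_PER_LED 24u
#define CCR_ZERO       9u   /* assumed compare value for a '0' bit */
#define CCR_ONE       17u   /* assumed compare value for a '1' bit */
#define RESET_HALVES   2u   /* 2 * 24 slots * 1.25 us = 60 us of low */

typedef struct {
    const uint8_t *grb;     /* 3 bytes per LED, already in G-R-B order   */
    size_t led_count;
    size_t next_led;        /* next LED to convert into compare values   */
    unsigned reset_left;    /* zero halves still to be queued            */
    uint8_t dma_buf[2 * SLOTS_PER_LED];  /* what circular DMA would read */
} ws2812_stream_t;

/*
 * Refills one half (0 or 1) of the DMA buffer. Returns false once all LED
 * data and the reset halves have been queued; the half is then just zeros,
 * which keeps the line low.
 */
bool ws2812_stream_refill(ws2812_stream_t *s, unsigned half)
{
    uint8_t *dst = &s->dma_buf[half * SLOTS_PER_LED];

    if (s->next_led < s->led_count) {
        const uint8_t *px = &s->grb[3 * s->next_led++];

        for (unsigned i = 0; i < SLOTS_PER_LED; i++) {
            uint8_t byte = px[i / 8];
            dst[i] = (byte & (0x80u >> (i % 8))) ? CCR_ONE : CCR_ZERO;
        }
        return true;
    }

    /* No LED data left: special handling for the 50 us of zeros. */
    memset(dst, 0, SLOTS_PER_LED);
    if (s->reset_left > 0) {
        s->reset_left--;
        return true;
    }
    return false;
}

/* Prepares both halves before the timer and DMA are started. */
void ws2812_stream_start(ws2812_stream_t *s, const uint8_t *grb,
                         size_t led_count)
{
    s->grb = grb;
    s->led_count = led_count;
    s->next_led = 0;
    s->reset_left = RESET_HALVES;
    ws2812_stream_refill(s, 0);
    ws2812_stream_refill(s, 1);
}
```

The point of this layout is that the DMA buffer stays at 48 bytes no matter how many LEDs are driven, instead of growing with (len*24)+42. The price is that each refill has to finish within the 30us it takes the DMA to send the other half.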

  13. If the timing tolerance is so low then why does everyone say it runs at 800kbps? The bit duration is different for HIGH and LOW. Do you run the PWM at 800kHz?

  14. Hi, thanks for sharing your DMA/PWM method.
    I rewrote the code so it would work with libopencm3 on an STM32F103. I also used the circular buffer mentioned by Daniel Schnell in the comments. Works like a charm. If anyone is interested the code can be found here:

  15. Hi, your library works great and I tried to reduce the memory consumption.
    Your buffer contains only two different values: 17 and 9
    I changed the buffer type to uint8_t and the DMA memory data size to DMA_MemoryDataSize_Byte, but now the lower LEDs are blinking weirdly and the others remain switched off.
    (The CCR-Register is initialized with 0)
    Can someone explain why it doesn’t work correctly?

    1. Now I have made some additional tests on my STM32F407-Discovery.
      It looks like the chip ignores the DMA_MemoryDataSize_Byte and as long as I’m using a uint16_t buffer, everything works fine.
      I tried out the newest peripheral library (1.5.0), but there was no difference

      uint16_t buffer[BUFFERSIZE] = { 0 }; // WORKING
      uint8_t buffer[BUFFERSIZE] = { 0 }; // FAILS

      DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
      DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
      DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_HalfWord;
      DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;

      1. I am not sure about this because I did not test it, but you should set the destination address to TIMx->CCRy + byte offset because you want to program the lowest byte of the register. For example if the CCR is 16 bits wide you should have a byte offset of one to address the lower byte.
