We experienced an outage across all stores last night, starting at 5:26pm CST and ending at 7:36pm CST for 99% of stores.
Most stores started experiencing errors around 5:26pm and returned to normal by 7:36pm. Blue = successful visits; orange = error visits.
At Cratejoy, we’ve built our system around redundancy. We’ve handled many merchants who have been featured on Shark Tank, CNN, Good Morning America, and many other high-profile promotions.
Last night we saw 5x more traffic than we’ve ever seen from any promotion, due to highly successful marketing campaigns. Despite being prepared for large spikes, this was much larger than anything we had seen before or had been prepared to handle.
Our site reliability engineering team was able to isolate the problem and return nearly all storefronts to a healthy state. We designed techniques on the fly to mitigate the problem and restore service to normal, and we can apply those techniques much faster the next time this happens.
We get as excited about these promotions as you do and want to make sure we can meet these critical high-traffic moments. Making the 2017 holiday season a blow-out for all merchants is critical to our mission. We don’t take failures like this lightly.
Last night’s incident exposed more work that we need to do to be fully prepared for the holiday season. The good news: much of this work was already underway, and we expect to have a smooth holiday season with fast storefronts. We’re working on changes that will let us handle much more traffic than we saw last night, making Cratejoy faster and more reliable for all merchants.