Between 1:09 and 2:21 PM CST we suffered an increased error rate across functions affecting both the Merchant Portal and Marketplace.
The Merchant dashboard was hardest hit, impacting 10% of requests made, while the Marketplace only saw around 4%. The Merchant Portal uses this service heavily, including in the auto-suggest global search bar.
This disruption was caused by a processing anomaly on one node of our primary search cluster. While this service already has built in failover should a node fail, the particular problem did not disable the node but maximized it’s usage drastically lowering the throughput. The result as a user is if you had the luck of being routed to that node, you’d have an extended wait, error or timeout.
We’ve already reached out to our cloud provider about the issue of the problem with failover.
We are expanding our capacity in our search cluster so that we have extra room for the unexpected going forward.