In this post I want to talk about a lesser-known pattern: the Governor pattern. I first came across it in the book Release It!
Although this pattern is not very common, it is very effective, especially in an era of automation where things can get out of hand very quickly because of a faulty script, configuration, or piece of code.
The pattern takes its name from steam engines. When engineers realized the power of steam, they needed a way to control it so that the machines harnessing that power did not get out of hand.
With this mechanism they were able to govern the speed of the engines and keep the machines under control.
How did we use it?
In financial systems there are the concepts of day end, month end, half-year end, and year end.
Each transaction needs to happen on a specific date and at a specific time. For example, if interest application does not run successfully at month end and the date changes automatically anyway, it can cause problems in the profit and loss statement, the balance sheet, and the accounting in general.
For every month-end, half-year-end, and year-end task we did the following:
- Before and after every critical task we added verification tasks. These verifications were based on rules such as counts, amounts, and timings.
- The system halts for a predefined duration before and after the task, so that a human can monitor, tally, or intervene if needed.
- An interface is provided for the administrator to take corrective action, or to reach out to support if they find an issue they cannot fix themselves.
- Finally, before the date change, the system halts completely until someone asks it to proceed.
In this way, with the right checks and controls in place, the system doesn't get out of control.
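The steps above can be sketched roughly as follows. This is only an illustration, not our actual implementation: `GovernedTask`, `HaltForOperator`, `run_period_end`, and the `pause`, `operator_approves`, and `change_date` callables are all hypothetical names, and the administrator interface is reduced to a single yes/no callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[], bool]


class HaltForOperator(Exception):
    """The run stopped and needs a human decision before continuing."""


@dataclass
class GovernedTask:
    """A critical task wrapped with verification rules (hypothetical shape)."""
    name: str
    run: Callable[[], None]
    pre_checks: List[Check] = field(default_factory=list)
    post_checks: List[Check] = field(default_factory=list)


def run_period_end(
    tasks: List[GovernedTask],
    operator_approves: Callable[[], bool],
    change_date: Callable[[], None],
    pause: Callable[[], None] = lambda: None,
) -> None:
    for task in tasks:
        pause()  # window for a human to inspect before the task
        if not all(check() for check in task.pre_checks):
            raise HaltForOperator(f"pre-check failed before {task.name}")
        task.run()
        if not all(check() for check in task.post_checks):
            raise HaltForOperator(f"post-check failed after {task.name}")
        pause()  # window for a human to tally after the task
    # Final gate: the date never changes without explicit approval.
    if not operator_approves():
        raise HaltForOperator("date change not approved")
    change_date()


# Hypothetical usage: the interest task reports nothing wrong but does no
# work, so the post-check stops the run before the date can change.
state = {"interest_applied": False, "date_changed": False}

broken_interest = GovernedTask(
    name="interest application",
    run=lambda: None,  # simulates a task that silently did nothing
    post_checks=[lambda: state["interest_applied"]],
)

try:
    run_period_end(
        [broken_interest],
        operator_approves=lambda: True,
        change_date=lambda: state.update(date_changed=True),
    )
except HaltForOperator as halt:
    print("halted:", halt)
```

The point of the design is that a failed verification or a missing approval stops everything, so the date change can only happen after both the rules and a human have agreed.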
Why did we do it?
There was an incident in which one of our monthly tasks indicated that it had run, and the system date changed to the next day. After the date change, new transactions started flowing in from different channels such as ATM and mobile.
However, after some time someone noticed that the monthly task had actually failed and, consequently, interest application hadn't run as expected.
At this stage we had to halt the system, change the system date back, re-run the task that had not run correctly, and then retrofit all the new transactions into the previous date. Rest assured, it became an all-nighter.
In the same book, Release It!, the author discusses this pattern in the context of Reddit's autoscaler.
There was a famous Amazon outage in 2019. If there had been a check like this, one that monitors the rate at which machines or infrastructure are being shut down, the losses and damage could have been averted.
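A check on the rate of shutdowns could look something like the sketch below. It is a minimal sliding-window limiter, not what Reddit or Amazon actually run; `ShutdownGovernor`, its parameters, and the host names are made up for illustration. Only the destructive direction is limited here, on the assumption that starting machines is the safe direction and does not need a brake.

```python
import time
from collections import deque


class ShutdownGovernor:
    """Allows at most `max_actions` shutdowns per `window_seconds`."""

    def __init__(self, max_actions, window_seconds, clock=time.monotonic):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.clock = clock
        self._recent = deque()  # timestamps of recently allowed shutdowns

    def allow(self):
        now = self.clock()
        # Forget shutdowns that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.window_seconds:
            self._recent.popleft()
        if len(self._recent) >= self.max_actions:
            return False  # over the limit: hold the action for a human
        self._recent.append(now)
        return True


# Hypothetical usage: automation asks the governor before each shutdown.
governor = ShutdownGovernor(max_actions=2, window_seconds=60)
for host in ["app-1", "app-2", "app-3"]:
    if governor.allow():
        print("shutting down", host)
    else:
        print("held for operator review:", host)
```

A faulty script can still ask to shut down everything, but it can only do so slowly, which gives a human time to notice.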
Other places where it can be used:
- Fraud monitoring: unusually high transaction activity in an account
- Inventory management: a sudden change in product count
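For the inventory case, the check can be as simple as comparing the new count with the previous one and holding the update when the jump is too large. The function name `needs_review` and the 50% default threshold below are assumptions for illustration only; a real system would pick the rule from its own data.

```python
def needs_review(previous_count: int, new_count: int,
                 max_change_ratio: float = 0.5) -> bool:
    """Return True when the change is large enough to hold for a human."""
    if previous_count == 0:
        # No baseline to compare against: any non-zero count is reviewed.
        return new_count != 0
    change = abs(new_count - previous_count) / previous_count
    return change > max_change_ratio


# Hypothetical usage: a drop from 100 units to 10 is held, 100 to 120 is not.
for old, new in [(100, 120), (100, 10)]:
    action = "hold for review" if needs_review(old, new) else "apply"
    print(old, "->", new, ":", action)
```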