In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outpu
What I learnt at Amazon AWS
- If something happens 1% of the time during validation tests, it’s going to be painful when go to production (imagine what is 1% of requests at AWS)
- Every time you plan for a new feature you should also add what metrics you are going to emit, what you are going to page for and how people that are paged will know what’s the issue / solve the issue
- If you page for things humans have no control, you are going to be hated by your team
- Metrics are good for an overview of the health of your system, they are not a replacement for decent logging.