§ What I learnt working for Amazon
- Don’t just restart things to “fix issues”. You need to know what’s wrong: debug the issue and understand it.
- Humans should not be paged in the middle of the night if they can’t do anything about the problem
- If you just want someone to babysit an application you are doing something wrong. Please, have proper metrics and automation instead of human beings looking at charts.
- If something goes wrong 0.1% of the time, for something as huge as AWS that means several times a day! Fix all the bugs or build automation to recover your software automatically
- If a human has to type a command, one day they will do it wrongly and destroy something
- You automate with scripts (you have code review, you have exactly the same process every time). You don’t automate humans with documentation
- Code is more important than documentation
- Documentation is more important than tribal knowledge
- If it can wait a few hours, push back!
- You need to understand how Linux/Unix works!!!
- You must know how shell scripts work and build a couple yourself. Automate everything!! You need to start your local env every morning? Better have a script for that! You need to run tests before pushing code? Automate it! If you type the same set of commands twice, automate it!
- If you have shell scripts that will help your team, share them!!
- Share your automations!
- Ask people to share their automations!
- Build tools to build automation!
- Build tools to build tools!
- Try first to build frameworks / generic software
- Better to recover from a problem automatically than to ask a human to evaluate it!
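
To make two of the points above more concrete, here is a minimal Python sketch. It is only an illustration, not how any Amazon system actually works. The names `expected_failures_per_day` and `run_with_recovery`, and the figure of 10,000 operations per day, are made up for the example. The first function does the "0.1% of the time" arithmetic; the second retries a task and runs a recovery step between attempts, so a human only gets involved once automation has given up.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def expected_failures_per_day(operations_per_day: int, failure_rate: float) -> float:
    """Average number of failures per day for a per-operation failure rate."""
    return operations_per_day * failure_rate


def run_with_recovery(
    task: Callable[[], T],
    recover: Callable[[Exception], None],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run `task`; after each failure call `recover`, then try again.

    Raises RuntimeError once `max_attempts` attempts have all failed.
    That is the point where escalating to a human makes sense.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as error:  # broad on purpose: this is a sketch
            last_error = error
            recover(error)  # hypothetical hook: restart a process, clear a cache, ...
            time.sleep(delay_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Assumed workload: 10,000 operations per day, failing 0.1% of the time.
print(expected_failures_per_day(10_000, 0.001))
```

With the assumed 10,000 operations per day, a 0.1% failure rate already works out to about ten failures every day, which is why recovery has to live in something like `run_with_recovery` rather than in a runbook that someone reads at 3am.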
§ [DRAFT] Working for Amazon AWS (2015 - 2019)
§ First Contact
It was January 2015; I was in the middle of my master’s. I had just started working again for CI&T when I received a message from an Amazon recruiter. There would be a hiring event in São Paulo, and they wanted to invite me to the process. To be honest, I was not interested in moving to the USA, but working for a big company would allow me to work in a different environment: instead of using frameworks to build services for banks, I would be able to write those frameworks/tools myself.
I decided to try out the tests/interviews.
I ended up passing, and the salary they offered would allow me to live a completely different life. And I don’t mean a life with cars, houses and that type of shit, but rather I would have some real financial independence. I decided to accept the offer.
Because the whole USA work visa process is random, I ended up not being selected. So Amazon offered me a job in Europe instead (YAY): I could pick Germany or Ireland. I did some research and, at that time, the work in Ireland seemed more interesting to me.
That’s how I ended up working on the Redshift team.
§ First year
The first year was a hard one: not only was I starting a new job, but I was outside my home country, speaking a foreign language. But at the same time, I was lucky. Almost everybody on my team was new as well. We were all getting used to the whole thing, so we bonded. The team on our side was also pretty new, so we ended up spending time together outside work too. Those people were amazing and I owe them a lot. Also, during our oncalls I had to deal mostly with hardware/system issues, so I learnt a lot during that time.
In my first two months, our team was basically doing oncall and trying to deliver something between the shifts. Every member was primary oncall for a whole week every month and “ops” oncall for another week (which meant working on the tasks no one wanted, to “keep the lights on”). Oncall was heavy, a couple of tickets per hour, but our shift would only last from 8:00 to 16:00 every day. A bit better than a 24-hour setup.
The environment was very nice: 8 hours of work and that’s it, no long hours, no crazy amount of performance reviews. Everybody was entitled to make mistakes and learn from them.
§ 2017 - 2019
As is publicly known, life at Amazon is not always easy. The team on our side had pretty bad managers. People were having a lot of trouble and it was no secret that the whole team was unhappy. Most of them ended up leaving the company, including very good friends of mine.
Our team had some problems in the past as well, especially with people in the USA. At the beginning, they saw our team as “the folks who do oncall during our night”, and for a long time we had to fight for good projects. This made some folks on our team leave the company or look for new organizations inside Amazon.
That changed with new upper management; even the product changed. When I joined, the engineers were 100% focused on delivering new features. With the new managers, we started focusing on making Redshift stable first (our oncalls were pretty hard, with hundreds of tickets per week). Also, automation was finally a priority and we were given the green light to implement our ideas for improving oncall.
Our local manager was also pretty good, so we never had issues similar to those of the “team on our side”. He was the type of manager that you can trust.
Our team also increased in size and a new office was opened in Berlin. I had the opportunity to train some folks there. Oncall was reduced to a few days every 2 months. Sometimes not even that, since we were onboarding more and more people in Berlin.