Required Skills:
- 4+ years of running services in a large-scale *nix environment.
- Understanding of SRE principles and goals, along with solid on-call experience.
- Experience with and understanding of scaling, capacity planning, and disaster recovery.
- Fast learner with excellent analytical, problem-solving, and communication skills.
- The ability to design, author, and release code in any language (Python or Java is a plus).
- Deep understanding of and experience in administering and using Apache Druid at scale.
- Deep understanding of and experience in one or more of the following: Kubernetes, AWS, Hadoop, Flink, Docker, Spinnaker, Helm.
- Experience supporting Java applications is a plus.
- Experience using monitoring and logging solutions such as Prometheus, Grafana, and Splunk.