Are you ready to grow your career in the cloud? Do you like the feeling that you are making a difference? This is your chance to be an integral part of a dynamic team of talented professionals deploying and maintaining innovative, industry-leading, cloud-based software.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. This technical role focuses on deploying, maintaining, and automating a wide range of operational tasks for the Software as a Service (SaaS) environments of the Instana observability and application performance monitoring (APM) tool on AWS, Google Cloud, and IBM Cloud. You will work collaboratively with the entire cloud organization and IBM vendors to support, maintain, and improve the availability and reliability of the Instana offerings.
Your Role and Responsibilities
Instana is a leading observability and application performance monitoring (APM) tool. Our mission is to be the best-in-class tool for cloud-native microservice architecture observability. To achieve this, we receive and process billions of data points every day. We are looking for an experienced engineer to join our globally distributed Site Reliability Engineering (SRE) team that operates Instana’s SaaS platforms.
As a member of the Instana SRE team, you will:
Required Technical and Professional Expertise
You should demonstrate a mix of experience and skills in the following areas:
5+ years of software development, software engineering and/or system operations experience supporting cloud offerings
System administration/engineering experience (Ubuntu and Red Hat)
Experience with at least one of these datastores:
Experience with at least one of these clouds:
Experience with cloud technologies such as Docker, Kubernetes, and OpenShift
Experience with infrastructure as code and configuration management tools (e.g. Terraform, Chef, Ansible)
A systematic approach to troubleshooting and a deep sense of ownership for your work
Passion for resolving reliability issues and identifying strategies to prevent their recurrence
Preferred Technical and Professional Expertise
In addition, knowledge of or experience in any of the following would be an advantage:
Experience with DevOps engineering or SRE
Networking knowledge (HTTP, TLS, DNS, Cloudflare, Akamai) for troubleshooting network and load balancer issues
Source control (Git, GitHub) and CI/CD pipelines (Jenkins)
Software development experience (Golang and Java preferred)
Experience developing monitoring for production components and instrumenting code for observability using Instana or LogDNA
Motivated to learn new technologies
Strong verbal and written communication skills
Capability to work in a global, multicultural and diverse environment