Instana is seeking experts to lead and provide architecture consultancy, sizing advisory, and installation efforts for large-scale ingress K8s platforms for our largest accounts around the globe. In this role, you will work with our largest clients that need to run a self-hosted offering. You will help with architecture, sizing, installation, and the hand-over to the customer's or Instana/IBM operations teams. In addition to working directly with customers and Technical Account Managers, you will work closely with the Technical Support team on escalated issues, sharing knowledge and best practices. You will also collaborate with the Engineering and Product Management teams on how these systems can be further improved and extended.

Areas of Responsibility Include:

  • Ensure large-scale, on-premises, self-hosted Instana backends are designed, installed, and functioning optimally before handing them over to operations
  • Work with customers' SRE teams to define the architecture and set up the platform
  • Automate Instana Agent roll-out and plan for enterprise readiness
  • Develop, set up, and optimize self-monitoring for self-hosted systems
  • Improve and develop best practices and documentation for large-scale setups
  • Execute update cycles of the Instana backend and datastores
  • Facilitate migrations to, and updates of, distributed datastores
  • Develop and maintain automation scripts and tools that simplify installation
  • Assist in troubleshooting priority incidents and perform detailed root cause analysis

Skills and Experience Needed:

  • Strong written and verbal communication skills
  • Experience with distributed datastores and queues such as Cassandra, Elasticsearch, ZooKeeper, ClickHouse, and Apache Kafka is preferred
  • Experience with K8s and Docker, including networking, storage, and firewall components
  • Familiarity with Java and Golang
  • Excellent skills debugging cloud-based distributed systems
  • Experience with container orchestration in microservice architectures
  • Experience with Jenkins CI/CD pipelines and proficiency with Git
  • Broad cloud provider experience (AWS, GCP, Azure, IBM Cloud)
  • Experience with Infrastructure as Code, e.g. CloudFormation, Pulumi, or Terraform
  • Experience with configuration management tooling, e.g. Ansible, Chef, Puppet, or Salt
  • Experience and familiarity with APM products and services is highly preferred
  • 2+ years of experience in a Site Reliability Engineering or DevOps environment
  • Bachelor of Science degree in Computer Science or a related technical discipline