Responsibilities
This is a key role that requires engineering knowledge, production experience, and hands-on implementation ability.
You will contribute in areas such as:
Ensure the highest levels of our system performance, availability, and scalability.
Work closely with the development team to integrate new deployment processes and strategies.
Seek out problems and opportunities in DevOps enablement and infrastructure, and solve them.
Help develop and maintain a state-of-the-art platform-as-a-service solution, using the latest and greatest technologies and approaches (e.g. Kubernetes, Docker, Microservices, etc.).
Help develop the best possible continuous delivery pipelines supporting features like automated promotion to production, automated canary releasing, or blue-green deployments.
Implement monitoring and logging solutions that enable the production systems to be monitored 24/7.
Respond to requests from engineering by building self-service solutions.
Make sure that any tech solution that you put in place is robust, will scale, and that failover/BCP systems are in place.
Implement robust security measures for infrastructure, including monitoring and responding to attacks on our systems.
Guide other SRE members on large, complex projects.
Work collaboratively with the engineering team, proposing technical solutions and taking on the challenges they raise.
Requirements
Strong computer engineering foundation from work experience and a related academic degree.
5+ years of experience in IT Operations, infrastructure, and system engineering.
Must have experience in maintaining data centers and managing large networks.
Hands-on experience with containerization and container orchestration (e.g. Docker and Kubernetes).
Must have experience in Linux system administration and performance tuning on RHEL/CentOS/Debian/Ubuntu distributions.
Must have experience in load balancers, clusters, and failover technologies.
Must have experience in CI/CD and configuration management tools such as Jenkins, GitLab, and Ansible.
Must have scripting skills in Bash.
Must have experience in configuring service discovery and both cloud-based and non-cloud monitoring tools (Consul, Nagios/Icinga, Cacti, Stackdriver, New Relic).
Experience in designing failover clusters using Nginx, Varnish, MongoDB, Postgres, TimescaleDB, Redis, Kafka, and ElasticSearch will be a plus.
Hands-on experience with AWS or GCP will be a plus.
Strong team player with the capability to learn, communicate, and guide other members on new technology.