Chaos Engineering Specialist

Details of the offer

We are looking for a skilled and passionate Platform Engineer with expertise in Chaos Engineering and resiliency testing.
The ideal candidate will have a strong background in distributed systems, cloud infrastructure, and container orchestration.
You will be responsible for designing, implementing, and managing chaos experiments to test the resilience of our platform.
Your work will directly contribute to our platform's ability to withstand and recover from unexpected failures, ensuring continuous and reliable service for our clients.

Responsibilities

Develop and implement chaos engineering strategies to test the resilience of our platform infrastructure.
Design, execute, and automate chaos experiments using tools such as Gremlin, Chaos Mesh, Litmus, or similar.
Collaborate with platform engineering and DevOps teams to identify critical systems and components for testing.
Build and maintain a robust monitoring and observability framework to analyze the impact of chaos experiments.
Identify weaknesses in the current infrastructure and provide recommendations for improvement.
Integrate chaos engineering practices into CI/CD pipelines using GitOps tools like ArgoCD and Atlantis.
Contribute to the development and maintenance of Kubernetes clusters, AWS EMR, AWS MSK Kafka, and vSphere environments.
Utilize Terraform for infrastructure as code (IaC) to manage cloud resources.
Participate in the on-call rotation and assist in incident management and root cause analysis.
Stay up to date with the latest trends and best practices in chaos engineering, resiliency testing, and cloud infrastructure.

Functional Competencies

Strong understanding of Kubernetes, Docker, and container orchestration.
Proficiency in AWS services, including EMR and MSK Kafka, and experience with vSphere.
Experience with infrastructure as code (IaC) tools, particularly Terraform.
Familiarity with GitOps practices and tools such as ArgoCD and Atlantis.
Hands-on experience with chaos engineering tools (e.g., Gremlin, Chaos Mesh, Litmus).
Solid understanding of distributed systems, microservices architecture, and cloud-native technologies.
Excellent problem-solving skills and a proactive approach to identifying and addressing potential issues.
Strong communication skills and the ability to work effectively in a collaborative team environment.

Qualifications & Experience

Minimum Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3+ years of experience in platform engineering, site reliability engineering (SRE), or DevOps roles, with a focus on chaos engineering.


Nominal Salary: To be agreed


