Cyberjaya, Malaysia | Posted on 17/05/2024
As a Principal Engineer - Disaster Recovery at Deriv, you'll play a critical role in ensuring the resilience and continuity of our trading platform in the face of potential disruptions and disasters. This position requires a deep understanding of disaster recovery principles, technologies, and challenges specific to the trading industry.
Develop and execute comprehensive disaster recovery strategies tailored to the unique needs and challenges of the trading industry.
Identify and prioritise critical systems, applications, and data streams for rapid recovery in the event of disruptive incidents.
Design, implement, and maintain robust disaster recovery solutions leveraging cutting-edge technologies and best practices.
Collaborate with internal teams to integrate disaster recovery capabilities seamlessly into our trading platform architecture.
Conduct thorough risk assessments to identify potential vulnerabilities and single points of failure within our infrastructure and operations.
Implement proactive measures and safeguards to mitigate risks and enhance overall resilience.
Plan, coordinate, and execute regular disaster recovery testing exercises to validate the effectiveness and reliability of recovery procedures.
Optimise testing methodologies to simulate real-world scenarios and uncover potential weaknesses.
Lead incident response efforts during crisis situations, providing technical expertise and guidance to minimise downtime and mitigate the impact.
Requirements
15+ years of experience in disaster recovery planning, implementation, and management, preferably within the trading or financial services industry
Bachelor's or Master's degree in Computer Science, Information Technology, or related field
Deep technical expertise in infrastructure architecture, high availability systems, data replication, and disaster recovery technologies
Proven track record of designing and implementing scalable disaster recovery solutions in complex, high-volume trading environments
Excellent problem-solving skills with the ability to anticipate and address technical challenges proactively
Strong communication and collaboration skills, with the ability to effectively engage with cross-functional teams and senior stakeholders
Deep knowledge of the following technologies:
- Servers: Linux, Windows, IIS, and SQL
- Cloud: AWS and GCP
- Database: Postgres and MySQL
- Monitoring: Datadog and Grafana
- AWS services: Fault Injection Service, S3, Route 53, EC2, and security groups
- Chef
- Load testing
- Terraform
- Languages: Perl, Python, and Go
Excellent spoken and written English communication skills
What's good to have
Familiarity with International Standards ISO 27001, 22301, NIST SP-800, and FFIEC
Certifications related to Disaster Recovery and Business Continuity (CBCP, CBCI, CDRE, ITIL4, etc.)
Knowledge of Kubernetes and microservices
The best workplace you can possibly imagine: a gorgeous 5-storey building including a rooftop garden, a gym, squash court, yoga room, barbecue pit, jam studio, and a lot more!
A chance to work with top talent from across the globe (70+ nationalities)
Ample team-building and bonding activities
Great overseas travel opportunities
Competitive salary and annual performance bonus