Site Reliability Engineer (SRE)

Details of the offer

Engineering - Software (Information & Communication Technology)
As a Site Reliability Engineer (SRE), your role is to support and enhance the reliability and performance of vital services, bridging development and operations. You'll contribute to building a stable, scalable infrastructure, emphasizing solid system architecture and SRE best practices like Service Level Objectives (SLOs), Service Level Indicators (SLIs), and reducing manual operational tasks. This position involves collaborating with cross-functional teams to promote continuous learning and accountability while driving reliability improvements.
Responsibilities:
Architect and deploy robust systems designed for high availability and scalability.
Develop automation scripts and tools to streamline operations and minimize manual intervention.
Set, monitor, and analyze SLOs and SLIs to ensure systems align with business performance standards.
Perform in-depth post-incident reviews to identify root causes and implement improvement measures.
Work with development and operations teams to establish reliable system practices and effective incident management strategies.
Diagnose and resolve issues related to databases, network connectivity, and deployments, including underlying platform issues (e.g., Kubernetes, virtual machines).
Ensure adherence to Service Level Agreements (SLAs), maintaining high standards in service delivery.
Identify and resolve system performance bottlenecks, offering recommendations for optimization.
Qualifications:
3 to 8 years of experience in IT, including around 1 year of relevant SRE experience.
Proficient in languages like Python, Golang, or Java, with a focus on improving operational processes.
Proven experience in system design and architecture, emphasizing reliability and scalability.
Strong knowledge of SRE practices, including SLOs, SLIs, toil reduction, and post-incident analysis.
Familiarity with cloud environments (e.g., AWS, Azure, Google Cloud) and their management.
Solid background in Linux system administration.
Skilled in diagnosing and troubleshooting performance and connectivity issues.
Understanding of networking principles and effective troubleshooting strategies.
Excellent analytical skills and a proactive approach to operational challenges.
Able to work independently while collaborating effectively within a team setting.
Preferred Qualifications:
Experience with monitoring tools and performance optimization.
Skilled in scripting or automating administrative tasks.
Knowledge of networking principles and troubleshooting techniques.
Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud).
Familiar with DevOps frameworks and practices, including CI/CD, infrastructure as code, and containerization.


Nominal Salary: To be agreed

Source: Whatjobs_Ppc
