P-9025914 Site Reliability Engineer, Principal
At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone.
As pioneering innovators for over 100 years, we're now transforming our organisation to be faster, simpler and more connected. Because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.
About the Role
We are looking for a Site Reliability Engineer (SRE) to ensure that our cloud application systems are reliable and available to users. The SRE will monitor application systems, establish automated detection and root cause analysis, and formulate preventive actions. They will gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding. They will partner with development teams to improve services.
Functional Duties:
Set up and maintain monitoring of infrastructure and applications
Build alerts and auto recovery for various operational issues
Gather and analyze metrics from operating systems as well as applications
Advise in performance tuning and fault finding
Partner with development teams to improve services
Assist in formulating preventive actions where possible, lead studies of potential failure scenarios, and formulate automated recovery methods
Be comfortable working with new tools, e.g. Azure DevOps, Grafana, ELK, and Dynatrace
People Management Duties:
Train and coach other consultants & teammates on your specialties
Act as an advisor to application teams and assist them in establishing recovery processes
Requirements:
Programming Languages: Java 8 or above (must have)
Experience in developing and optimizing stored procedures for MySQL and MSSQL databases
OS: Linux (RHEL or SUSE) or Windows Server
Scripting (must have at least one): Shell, Bash, PowerShell
Knowledge of Git, the open-source distributed version control system
Sound knowledge of how REST APIs work
Experience with Atlassian tools (Jira, Bitbucket, and Confluence)
Familiarity with Azure Cloud services
Working experience with ITIL in an Agile environment
Good to have:
Experience with Python programming language
Experience with containerization (Docker, AKS, ACR, EKS, ECS)
Experience in CI/CD with Azure DevOps
Experience in dashboard development with Grafana, Azure Monitor, or Dynatrace
Experience in infrastructure management with Terraform or Ansible
Azure or AWS cloud certification would be an added advantage
Build a career with us as we help our customers and the community live Healthier, Longer, Better Lives.
You must provide all requested information, including Personal Data, to be considered for this career opportunity. Failure to provide such information may influence the processing and outcome of your application. You are responsible for ensuring that the information you submit is accurate and up-to-date.