MSys Inc. Plano, AZ
Job summary:
Title:
Site Reliability Engineer - Hybrid
Location:
Plano, TX; Chicago, IL; and Chandler, AZ, USA
Length and terms:
Long term - W2
Position created on 07/01/2025 06:52 pm
Job description:
*** Webcam interview *** Long term project *** Hybrid *** W2 ***
Description:
+ Responsible for the reliability and support of the container platform on premises and in external clouds (Azure/AWS/Google)
+ Monitor and troubleshoot performance, connectivity, security, and other issues in container platform (OpenShift), Rancher (RKE), and Azure (AKS) environments.
+ Perform deep dives into systemic and latent reliability issues; handle incident management and problem management
+ Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
+ Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
+ Onboard applications and provide troubleshooting support throughout the lifecycle of the applications on the container platform.
+ Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence.
+ Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
+ Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams.
+ Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams
+ Participate in 24x7 on-call coverage under a follow-the-sun model
+ BS/MS degree in Computer Science or a related technical field involving systems, or equivalent practical experience.
+ Minimum of 5 years of hands-on experience supporting Kubernetes/OpenShift/RKE/EKS container platforms.
+ Experience with Python, Ansible, Golang, and shell scripting
+ Kubernetes/OpenShift/Terraform certifications are a plus
+ Strong experience in major services related to Compute, Storage, Network and Security
+ Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics
+ Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
+ Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication
+ Experience with CI/CD tools (Git/Jenkins) and the GitOps model
+ Excellent understanding of Linux/Windows operating systems administration
+ Experience in container security and vulnerability remediation.
+ Systematic problem-solving approach, sense of ownership, and drive
+ Ability to juggle competing priorities and adapt to changes in project scope.
+ Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
+ Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.
+ Experience in OpenShift, RKE, and CSP Kubernetes services such as AKS and EKS
+ Experience in Terraform, ArgoCD, Tekton, and Knative technologies.
+ Experience in agile deployment methodologies (GitOps)
+ Knowledge of various container runtimes
+ Familiarity with the operator deployment pattern.
+ Experience working in a highly available, multi-datacenter environment
+ Experience working with monitoring tools such as Prometheus, Splunk, Dynatrace, Sysdig, or similar tools.
+ Understanding of cost management, inventory management, and the FinOps model
Contact the recruiter working on this position:
The recruiter working on this position is Rohit (Shaji Team) Bala
His/her contact email is rohit@msysinc.com
Our recruiters will be more than happy to help you get this contract.