Position: Site Reliability Engineer
Location: Chennai, Bangalore
Experience: 10-12 Years
Skills: SRE, DataDog, Azure DevOps, Jenkins, Octopus, Cloud, Python

Job Description:

• Keep production workloads on AWS and GCP available, scalable, and reliable using Kubernetes (GKE), AWS ECS, and cloud-native services, with a focus on uptime.
• Build and automate infrastructure with Terraform, CI/CD tools, and Python scripts to reduce manual tasks, lower error rates, and improve throughput.
• Provide 24×7 on-call support to ensure system availability and quick issue resolution, minimizing MTTR.
• Monitor infrastructure with telemetry, tracking latency, error rates, and other SLIs to ensure seamless operations.
• Improve system performance by analyzing metrics from OS, containers, APIs, and apps to address issues early, focusing on response times and resource usage.
• Automate deployments using CI/CD, ensuring performance, compliance, and cost efficiency.
• Plan immutable infrastructure deployments with automated pipelines, ensuring low latency and cost optimization.
• Collaborate with teams (.NET, Java, APIs, Python) to optimize testing and automate deployments for reliable releases, managing error budgets.
• Design scalable systems, manage platforms for high demand, and monitor capacity and throughput.
• Automate processes for efficiency and resource management, reducing saturation. Ensure binaries and configurations work across environments, with a focus on scalability.
• Balance feature development with system stability by managing SLOs and error budgets.
• Experience managing cloud infrastructure on AWS, GCP, and Kubernetes with a focus on scalability and SLO-driven performance.
• Proficiency with tools like DataDog, Azure DevOps, Jenkins, and Octopus for code deployment and monitoring throughput and latency.
• Strong background in software development, test automation, and infrastructure as code (Terraform) for efficient deployments.
• Expertise in Python, .NET, or Java for automating tasks and optimizing performance and latency.
• Familiarity with distributed storage systems, RPA toolsets, and large datasets, with a focus on cost efficiency and data throughput.
• Experience with Kubernetes and AWS/GCP services for managing and tracking resource usage.
• Proactive in identifying bottlenecks, troubleshooting, and improving system performance.
• Ability to design scalable systems to support business growth, ensuring SLO adherence.

Interested candidates may share their resume at rubi.jena@mnrsolutions.in

#SiteReliabilityEngineer #SRE #CloudComputing #AWS #GCP #Kubernetes #DataDog #AzureDevOps #Terraform #Python #DevOps #CI_CD #Jenkins #Octopus #InfrastructureAsCode #Automation #Monitoring #CloudInfrastructure #SoftwareDevelopment #Chennai #Bangalore #JobOpening #TechJobs #Hiring #ITCareers
