Fortive Jobs

Site Reliability Engineer

Fortive

Reposted 15 Days Ago

In-Office or Remote

Hiring Remotely in India

Senior level

This role involves managing and automating infrastructure, deploying applications, ensuring system availability on cloud platforms, and collaborating with cross-functional teams.

The summary above was generated by AI

Position: Site Reliability Engineer

Job Summary

As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows. Your contributions will have a direct impact on customer satisfaction and overall experience.

Responsibilities and Deliverables

Manage, monitor, and maintain highly available systems (Windows and Linux).
Analyze metrics and trends to ensure rapid scalability.
Address routine service requests while identifying ways to automate and simplify.
Create infrastructure as code using Terraform, ARM templates, and CloudFormation.
Maintain data backups and disaster recovery plans.
Design and deploy CI/CD pipelines using GitHub Actions, Octopus, Ansible, Jenkins, and Azure DevOps.
Adhere to security best practices through all stages of the software development lifecycle.
Follow and champion ITIL best practices and standards.
Become a resource for emerging and existing cloud technologies with a focus on AWS.

Organizational Alignment

Reports to the Senior SRE Manager.
This role involves close collaboration with DevOps, DBA, and security teams.

Technical Proficiencies

Hands-on experience with AWS is a must-have.
Proficiency in analyzing application, IIS, system, and security logs and CloudTrail events.
Practical experience with CI/CD tools such as GitHub Actions, Jenkins, and Octopus.
Experience with observability tools such as New Relic, Application Insights, AppDynamics, or DataDog.
Experience maintaining and administering Windows, Linux, and Kubernetes.
Experience in automation using scripting languages such as Bash, PowerShell, or Python.
Configuration management experience using Ansible, Terraform, Azure Automation runbooks, or similar.
Experience with SQL Server database maintenance and administration is preferred.
Good understanding of networking (VNet, subnet, Private Link, VNet peering).
Familiarity with cloud concepts including certificates, OAuth, Azure AD, ASE, ASP, AKS, Azure Apps, Load Balancers, Application Gateway, Firewall, API Management, SQL Server, and databases on Azure.

Experience

5+ years of experience in an SRE or System Administration role.
Demonstrated ability building and supporting high-availability Windows/Linux servers, with emphasis on the WISA stack (Windows/IIS/SQL Server/ASP.NET).
3+ years of experience with CI/CD tools.
3+ years of experience working with cloud technologies including AWS and Azure.
1+ years of experience working with container technology including Docker and Kubernetes.
Comfortable using Scrum, Kanban, or Lean methodologies.

Education

Bachelor’s Degree or College Diploma in Computer Science, Information Systems, or equivalent experience.
