The Senior IO Engineering Analyst manages incident queues, monitors infrastructure, troubleshoots compute and network issues, and coordinates escalations while ensuring service continuity and compliance with SLAs.
Requisition Number: 2352102
Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
The IO Engineer - Senior Analyst is responsible for proactive queue management, infrastructure monitoring, and L1.5/L2 incident handling across enterprise infrastructure environments.
This role serves as a critical operational backbone: triaging alerts, managing incident queues, performing initial diagnostics, and coordinating escalations across compute, virtualization, and basic network domains. The position ensures service continuity, SLA adherence, and accurate problem escalation to engineering teams.
The ideal candidate has solid foundational experience with Windows/Linux systems, virtualization technologies, basic networking concepts, and hands-on exposure to enterprise monitoring platforms such as WhatsUp Gold, SolarWinds, SCOM, and Nagios. Success in this role requires solid operational discipline, analytical thinking, and effective communication in a 24x7 production environment.
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.
Primary Responsibilities:
- Queue Management & Incident Coordination
  - Monitor, triage, and manage daily incident, request, and problem queues
  - Validate ticket severity, categorization, prioritization, and routing to support SLA compliance
  - Provide L1.5/L2-level troubleshooting for compute, virtualization, and basic network issues
  - Escalate high-severity or complex issues to L2/L3 engineering teams following defined runbooks
  - Maintain clear, timely status updates to stakeholders during active incidents
  - Ensure ticket quality, documentation accuracy, and proper closure
- Monitoring & Event Response
  - Continuously monitor infrastructure alerts and dashboards using enterprise tools including:
    - WhatsUp Gold
    - SolarWinds
    - System Center Operations Manager (SCOM)
    - Nagios
  - Perform initial diagnostics for alerts related to:
    - Server health
    - Virtual machines
    - Basic network performance
  - Review logs, system metrics, and health indicators prior to escalation
  - Identify recurring alerts and patterns and recommend alert tuning or noise reduction opportunities
  - Escalate validated issues with appropriate diagnostic data and context
- Compute Support (Windows / Linux)
  - Validate OS-level health indicators including:
    - CPU, memory, disk utilization
    - Services and processes
  - Assist with operational tasks such as:
    - VM restarts (per runbook)
    - Basic patch validation
    - Log and diagnostic data collection
  - Perform initial troubleshooting of:
    - Service failures
    - Access issues
    - Account-related incidents
- Virtualization & Platform Support
  - Monitor and validate VMware platform alerts such as:
    - VM unresponsiveness
    - Datastore space warnings
    - vCenter or ESXi connectivity issues
  - Support basic VM lifecycle activities under documented procedures
  - Capture and provide diagnostic artifacts for virtualization-related escalations
- Basic Network Troubleshooting
  - Perform preliminary network diagnostics including:
    - Ping
    - Traceroute
    - Nslookup
    - Port connectivity checks
  - Identify symptoms of connectivity, firewall, or routing issues
  - Escalate efficiently with clear problem descriptions and diagnostic evidence
  - Collaborate with network teams during active incident resolution
- Operational Excellence
  - Follow ITIL-aligned processes for Incident, Change, and Problem Management
  - Maintain accurate documentation:
    - SOPs
    - Runbooks
    - Escalation paths
  - Participate in 24x7 operations and rotational shift schedules as required
  - Contribute to continuous improvement efforts, including:
    - Alert reduction
    - Workflow optimization
    - Queue hygiene and operational efficiency
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
Required Qualifications:
- Bachelor's degree in Computer Science or Engineering
- 3+ years of experience in:
  - Infrastructure monitoring
  - Queue management
  - Infrastructure or NOC operations
- Experience using ticketing and ITSM tools, including:
  - ServiceNow (preferred)
  - JIRA or Rally
- Working knowledge of:
  - Windows Server fundamentals
  - Linux OS fundamentals
- Hands-on exposure to enterprise monitoring tools:
  - WhatsUp Gold
  - SolarWinds
  - SCOM
  - Nagios
- Familiarity with VMware vSphere concepts:
  - vCenter
  - ESXi
  - Virtual machines
  - Datastores
- Understanding of networking fundamentals:
  - TCP/IP
  - DNS
  - Routing
  - Ports and firewalls
- Proven analytical thinking, communication, and incident coordination skills
- Demonstrated ability and willingness to support a global 24x7 production environment
Preferred Qualifications:
- ITIL certification or formal ITIL process exposure
- Exposure to scripting or automation:
  - PowerShell
  - Python
  - Shell scripting
- Familiarity with cloud platforms:
  - Microsoft Azure
  - AWS
  - GCP
Top Skills
AWS
Azure
GCP
JIRA
Linux
Nagios
PowerShell
Python
SCOM
ServiceNow
SolarWinds
VMware
WhatsUp Gold
Windows
Optum Chennai, Tamil Nadu, IND Office
Chennai, Tamil Nadu, India

