Incident Manager

ID: 2025-2377
Job Locations: US
Category: Information Technology
Type: Full Time

Overview

HaloMD

 

Who We Are

HaloMD is a leader in Independent Dispute Resolution (IDR) under the No Surprises Act and state regulations, empowering out-of-network healthcare providers with cutting-edge technology and industry expertise to maximize reimbursements.

 

Job Summary

The Incident Manager is responsible for coordinating the real-time response to incidents and maintaining production change integrity across data, applications, automations, and infrastructure. This role ensures disruptions are handled quickly and effectively while upholding operational standards and business continuity. Outside of incidents, this person works closely with business and technical stakeholders to monitor high-impact areas, improve operational readiness, and ensure that all production changes follow governance and compliance requirements. 

Responsibilities

Essential Job Duties and Responsibilities

 

Incident Management

  • Coordinate the live response to incidents impacting data systems, applications, infrastructure, automations, and network operations.
  • Evaluate incident severity, align appropriate resources (including security or networking stakeholders), and manage resolution efforts.
  • Support triage of basic network alerts and connectivity issues in coordination with infrastructure teams.
  • Ensure accurate documentation of incident activity, response steps, and decision points.
  • Apply a strong understanding of operational SLAs, incident classification criteria, and escalation protocols.

 

Root Cause & Remediation

  • Lead post-incident reviews and ensure root causes are identified with assigned corrective actions.
  • Track remediation progress and collaborate with teams to prevent recurrence.
  • Analyze incident patterns and recommend improvements to reduce impact and frequency.
  • Partner with security and infrastructure teams when root causes involve access controls, network paths, or unauthorized activity.

 

 

Communication & Documentation

  • Communicate clearly and promptly with internal stakeholders during active incidents, including escalations related to network or security.
  • Maintain and update incident response procedures, SLAs, runbooks, and operational documentation in tools like Confluence.
  • Ensure historical records of incidents are complete for audit, compliance, and trend analysis.

 

Change Management

  • Review and validate production change requests to confirm all requirements and safeguards are in place.
  • Maintain a current and complete change log across environments, including changes impacting network routing or firewall rules.
  • Collaborate with DevOps, data, application, and infrastructure teams to reduce deployment risk and improve release consistency.

 

Continuous Improvement & Tooling

  • Evolve and enforce standards for incident and change processes across the technology landscape.
  • Manage and enhance tooling that supports incident response and change control (e.g., Jira, Grafana, network monitors, or endpoint detection tools).
  • Partner with teams to improve observability, alerting, and resilience across systems, with some awareness of network health and endpoint security triggers.

 

Security & Network Awareness

  • Maintain basic familiarity with incident types involving endpoint protection, identity access, or firewall policy violations.
  • Coordinate comfortably with security analysts during potential data loss, suspicious login activity, or threat detection alerts.
  • Maintain awareness of common networking protocols and how they affect system availability or user experience during outages.

Qualifications

Education and/or Experience

  • 2–4 years of experience in incident management, production operations, or change governance.
  • Strong background across data, application, infrastructure, or automation platforms.
  • Familiarity with observability tools and incident management platforms.
  • Ability to analyze logs, identify issues using monitoring data (familiarity with Grafana or a similar tool), and write or interpret SQL.
  • Experience applying ITIL, SRE, or DevOps principles in real-time operations.
  • Excellent communication skills, especially in time-sensitive scenarios.
  • Detail-oriented and organized with a high sense of ownership.
  • Willingness to support after-hours incident response when needed.
  • Ability and willingness to learn end-to-end business processes, which are essential to effectively supporting and executing responsibilities in this role.
  • Preferred: Awareness of basic security incident workflows, common network protocols, and coordination with infrastructure or security teams during cross-domain events.

 

Perks & Benefits:

  • Remote & Hybrid opportunities – Work from anywhere within the United States with reliable high-speed internet
  • Multiple medical plan options
  • Health Savings Account with company contributions
  • Dental & vision coverage for you and your dependents
  • 401(k) with company match
  • Vacation, sick time & company-paid holidays
  • Company wellbeing program with health insurance incentives

 

What’s Next?

If you’re ready to bring your skills and passion to our growing team, we want to hear from you! Apply today and help us create a future where success is the standard. 

 

