Resolver (formerly Crisp)
Provide third-line support for infrastructure and applications within the SRE team.
Monitor and maintain cloud-based environments (AWS, Azure, GCP) to ensure high availability and resilience.
Respond to alerts from monitoring platforms and follow runbooks/SOPs for resolution.
Assist in incident management, including root cause analysis (RCA) and postmortem documentation for P1/P2 incidents.
Maintain and tune alerting thresholds to reduce false positives.
Contribute to the creation and updating of SOPs/runbooks for repeatable processes.
Support BAU activities, such as Elastic index snapshots, security alert reviews, daily volume alert triage, and reprocessing of persistent failures.
Collaborate with engineering teams to uphold Service Level Objectives (SLOs).
Adhere to ITIL practices and assist in problem management processes.
Basic understanding of cloud platforms (AWS, Azure, GCP).
Strong troubleshooting and fault-finding skills.
Familiarity with monitoring solutions and alert management.
Knowledge of incident management processes.
Exposure to ITIL-based environments.
Willingness to work in a 24/7 operational support model (rotational shifts).