This role sits within the Site Reliability Engineering team and is part of the wider Crisp Technical Services business unit. Our suite of SaaS, distributed systems and product integrations helps our internal stakeholders run their critical business operations and, in turn, provides customers with industry-leading threat detection technology products. You’ll play a key role in the formation of a new area within Crisp that aims to drive operational excellence and customer focus into the operation of our SaaS hosted application suite.
As a Cloud Engineer, you will use your skills and expertise on cloud platforms to maintain and improve our cloud infrastructure running on Google Cloud Platform, orchestrate deployments and support our industry-leading SaaS solution. As part of the SRE team, you will be integral to ensuring our platforms are highly available and resilient, through continual monitoring and suggestions for improvement. You will work closely with engineering teams in Development and Delivery to uphold contracted Service Level Objectives (SLOs). You will be tasked with ensuring our internally and externally available systems have reliability and uptime appropriate to user needs.
ROLE DUTIES AND REQUIREMENTS
- Work in a team to provide third-line support for the infrastructure and application
- Take ownership of and coordinate fault resolution, working alongside a team of engineers where necessary to fix faults raised against the supported elements, networks, or applications
- Own the deployment process to provide regular service improvements delivered by our engineering team
- For service-impacting incidents, lead the root cause analysis (RCA), producing any reports and coordinating the delivery of any fixes to mitigate further occurrences
- Use and maintain our monitoring platforms
Essential Experience
- Experience working in a distributed cloud environment using Azure, AWS or GCP
- Excellent fault-finding ability
- Linux (Debian/Ubuntu)
- Windows Server
- SQL
- Software Defined Networking
- Cloud & Platform Security
- Monitoring Solutions
- Incident Management
Desirable Experience
- AWS/Google Cloud Certifications
- Release and Deployment Tooling
- New Relic
- Octopus Deploy
- Elasticsearch
- Consul
- Docker