Ensure the reliability and uptime of our gaming platform through SRE practices
About the role
Drive the reliability and performance of our gaming platform through SRE practices. You'll implement monitoring, respond to incidents, conduct post-mortems, and continuously improve platform stability.
Responsibilities
- Monitor platform reliability and SLA compliance
- Respond to incidents and outages
- Implement observability and monitoring
- Conduct post-incident reviews
- Automate operational tasks
- Improve system reliability and performance
- Participate in on-call rotation
- Work with engineering on reliability improvements
Requirements
- 5+ years of SRE or DevOps experience
- Strong experience with monitoring tools (Prometheus, Grafana)
- Knowledge of distributed systems and microservices
- Experience with incident management
- Scripting and automation skills
- Understanding of cloud platforms (AWS/GCP)
- Excellent troubleshooting abilities
- Gaming or high-availability platform experience
Benefits
- Salary: €45,000–€58,000
- On-call compensation
- Hybrid work
- Health insurance
- Work on platform reliability
- Professional development