Site Reliability Engineer - Platform Team
Soundtrack Your Brand is a Spotify-backed company that offers music streaming services for businesses. We serve small customers, like the hairdresser around the corner, and large enterprises like McDonald's, Toni & Guy and TAG Heuer.
Playing music in a business is very different from playing music at home. A business music service is essentially a collaborative, multi-user, multi-location game in which enterprises need a vast selection of music that's continually updated. The music itself also profoundly influences consumers' experience and behavior. We have repeatedly shown this in our research, and as thought leaders in in-store music research, we try to educate the world about the impact of music.
Our development team consists of 30 very talented, motivated and humble engineers with experience from EA, Spotify, Skype, iZettle, Viaplay, Blocket, and Aftonbladet. In total, we are about 90 people working from Birger Jarlsgatan in Stockholm, Seattle, and London.
After building an impressive Nordic customer base in the last couple of years, we have now expanded internationally and are currently live in Europe, America, Asia, and Africa.
Site Reliability Engineers (SREs) at Soundtrack Your Brand ensure that we always play the right music, in the right place, at the right time. We accomplish this through automation, machine learning, monitoring and a constant focus on reliability and scalability.
As an SRE in the Platform Team, you will be responsible for:
- Monitoring and reliability of
• our infrastructure running on the Google Cloud Platform, using Google App Engine and Container Engine (Docker/Kubernetes)
• our shared services written in Go, Scala and Elixir
• our data processing, analysis and reporting services built with Google Dataflow, Bigtable, BigQuery, and Tableau
• external and internal SLAs
- Development of tools for auto-healing, anomaly detection and resolution, and management of external risks
- Proactively preventing outages through tight collaboration with the platform developers
- Incident management processes, planning and execution of regular fire drills
- Backup systems and processes
You're friendly, pragmatic, professional, communicative, precise and fun to work with. Also, you are probably comfortable with describing yourself as:
- An analytic problem solver always looking to learn more.
- A great communicator – inside the team, as well as outside of it.
- An open person who says what you mean and means what you say.
- Ready to get your hands dirty and join a team of doers!
- Organized, detail-oriented, and thorough in every undertaking.
- Accustomed to working in a cloud-native, containerized environment.
- Strongly experienced with modern cloud services such as Amazon, Azure or, preferably, Google Cloud.
- Experienced in monitoring large distributed systems.
- Experienced in managing the ingestion, indexing and searching of large quantities of logs in Elastic and Kibana.
- Experienced in managing large Prometheus and Grafana projects.
- A Kubernetes and/or Docker ninja.
- Familiar with:
• Agile development methods like Scrum.
• Web-based services and APIs.
• Networking and the TCP/IP protocol suite, with strong knowledge of both.
- Able to participate actively in infrastructure design and implementation.
- Contributions to open source projects are a plus.
Checked all the boxes above?
We believe that diversity of perspective and experience makes for a better workplace for our employees and a better product for our users. We’d like new employees to contribute to such a diverse workplace.
Apply to: firstname.lastname@example.org