

Duties and Responsibilities:

• Build, operate, and maintain scalable, highly-available, and resilient distributed systems

• Participate in chaos engineering exercises to help increase the reliability of systems

• Participate in on-call rotation to ensure critical services are available

• Analyze metrics to create actionable monitors or alerts to ensure critical services are available and performing well

• Work with product teams to ensure SLO targets are met or exceeded

• Participate in root cause analysis after incidents to ensure they won’t happen again

• Identify and implement improvements to Cambria’s CI/CD deployment pipeline to increase the speed and efficiency of code deployments

• Maintain excellent documentation and diagrams of the complex systems supporting the microservices environment

Qualifications:


• Bachelor’s degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience

• Motivated self-learner who pushes technology solutions forward and anticipates problems and challenges

• Solid understanding of Software Engineering and Computer Science principles, with a security-minded approach

• Mastery of at least one programming language; Python, PowerShell, Go, or Bash preferred

• 3-5 years of experience building, operating, and maintaining complex, highly available, and scalable systems

• Solid foundation in Linux administration and troubleshooting

• Experience working with Agile delivery methodologies such as Scrum and Kanban, plus familiarity with ALM toolsets (Cambria uses Jira) and collaboration software (such as Slack, G Suite, and Confluence)

• Experience with some or all of the following software/tools, or close equivalents:

• Highly-available MySQL clusters

• Elasticsearch clusters

• Apache Kafka clusters

• HashiCorp tools such as Consul and Vault

• Container technologies such as Docker, LXC, or Podman

• Container orchestrators such as Kubernetes or HashiCorp Nomad

• CI/CD tools such as Jenkins, Travis CI, or CircleCI; Jenkins Pipelines preferred

• Configuration management tools such as Ansible, SaltStack, or Puppet

• Load balancing technologies such as Nginx, Apache2, or HAProxy

• Infrastructure as Code tools such as Terraform or CloudFormation

• Public cloud platforms such as AWS, Azure, or GCP

• Application performance monitoring tools such as Datadog or Dynatrace

Minimum Requirements:

Education: Minimum 4-year technical degree or equivalent work experience

Experience: 5 years working in a Site Reliability/Systems Engineering position

We’re an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.

    • Location: Minneapolis
    • Date posted:
    • Salary: $100,000 - $130,000