Head of DevOps
Rohirrim
Why Join Rohirrim?
Rohirrim is at the forefront of innovation in the Gen AI space. Joining our team means being part of a dynamic environment where your leadership and expertise have a real, meaningful impact on our customers, our products, and our company.
Are you passionate about pushing the boundaries of technology in the Gen AI space? Rohirrim is seeking a Head of DevOps to mentor engineers, lead and foster a high-performing DevOps team, provide technical direction, and drive the secure distribution of cutting-edge applications. If you thrive in a fast-paced environment and enjoy leading by example while staying hands-on, we want to hear from you!
Role Overview
As the Head of DevOps, reporting to the CEO & Founder, you will own the velocity of our product deployments to customers while increasing performance, security, and resilience across the stack. You will mentor and guide the DevOps team, organize tasks, and ensure technical excellence, all while remaining an active contributor. You will be responsible for designing, implementing, and maintaining robust CI/CD pipelines, cloud infrastructure, and automation frameworks to support our AI-driven applications. The role requires a blend of technical expertise, leadership, customer engagement, and strategic thinking to ensure smooth deployment, scalability, and security of our infrastructure and our customers’ infrastructure, while fostering a collaborative team culture. Candidates based in the DC area are strongly preferred, and relocation may be required. Frequent attendance at the corporate HQ in McLean, Virginia is required.
Key Responsibilities:
- Leadership and Team Management:
- Lead, mentor, and manage a team of 5+ DevOps engineers, fostering a culture of collaboration, innovation, and continuous improvement.
- Conduct regular 1:1s, performance reviews, and professional development planning to support team growth.
- Define team goals, workflows, and best practices to align with company objectives.
- Infrastructure & Automation:
- Design, implement, and manage scalable, secure, and highly available cloud infrastructure (Azure) to support AI workloads.
- Develop and maintain CI/CD pipelines to enable rapid, reliable, and automated deployments of AI models and applications.
- Automate infrastructure provisioning, configuration management, and monitoring using tools like Terraform, Helm, and Kubernetes.
- AI/ML Operations:
- Collaborate with data scientists and AI engineers to streamline MLOps processes, including model deployment, versioning, and monitoring.
- Ensure efficient management of large-scale datasets and compute resources for ingestion, training, and inference.
- Security & Compliance:
- Maintain and continuously improve security best practices (e.g., IAM, encryption, network security, CVEs) to protect sensitive data and ensure compliance with industry standards (e.g., SOC 2, GDPR, or HIPAA, as applicable).
- Conduct regular security audits and vulnerability assessments.
- Quality Assurance:
- Actively participate in quality assurance during release management.
- Monitoring & Performance:
- Establish and maintain monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack) to ensure system reliability and performance.
- Proactively identify and resolve performance bottlenecks in infrastructure and applications.
- Establish executive dashboards for observability and performance, with weekly reports to the executive team and quarterly reports to the Board of Directors.
- Cross-Functional Collaboration:
- Partner with engineering, product, and customer success teams to align DevOps strategies with business and technical requirements.
- Implement and maintain DevOps best practices and drive adoption of modern methodologies across the organization.
- Innovation & Best Practices:
- Contribute to the strategic roadmap for infrastructure scalability and operational excellence.
- Stay abreast of emerging DevOps and AI technologies to recommend and implement innovative solutions.
- Automate and streamline processes for scale when and where possible.
Minimum Qualifications:
- 7+ years of experience in DevOps, Site Reliability Engineering (SRE), or related fields, with at least 2 years in a leadership or management role.
- Proven track record of managing and mentoring technical teams (3+ team members).
- Extensive experience with the Azure cloud platform (AWS and GCP experience is a plus) and containerization technologies (Docker, Kubernetes).
- Hands-on experience with CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions, Azure DevOps) and infrastructure-as-code (e.g., Terraform, CloudFormation).
- Proficiency in scripting and programming (e.g., Python, Bash) for automation and tooling.
- Strong knowledge of Linux/Unix systems administration and networking.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Familiarity with MLOps practices and tools (e.g., Kubeflow, MLflow) is a plus.
- Exceptional leadership and communication skills to inspire and align teams.
- Ability to thrive in a fast-paced startup environment with ambiguity and rapid iteration.
- Strong problem-solving skills and a proactive approach to identifying and addressing challenges.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Preferred Qualifications:
- Relevant certifications (e.g., Microsoft Certified: DevOps Engineer Expert, Certified Kubernetes Administrator).
- Experience working in an AI-native or data-intensive startup environment.
- Knowledge of hybrid-cloud, private-cloud, and multi-cloud architectures.
- Passion for mentorship and leadership with a track record of growing and developing engineers.
- Exceptional debugging and analytical capabilities.
- Strong communication skills, with the ability to convey technical concepts effectively.
- A bias for action: taking initiative to solve problems without perfect direction.