Lead Site Reliability Engineer
Zest AI
Founded in 2009, Zest AI has been pioneering innovative AI technology with a mission to expand credit access to create opportunities for more Americans to pursue their financial goals. Zest AI is transforming the $17 trillion US consumer credit market by delivering AI technology that helps lenders identify creditworthy borrowers overlooked by traditional methods, while leveling the playing field so financial institutions of all sizes can harness AI to provide better lending experiences.
With over 50 issued and pending patents across automation, accuracy, performance, and model explainability, Zest AI is a leader in financial technology, providing financial institutions with AI tools that create a more resilient US financial system and a stronger US economy. With over 600 active AI models, financial institutions rely on Zest AI's comprehensive suite of solutions spanning marketing, underwriting, fraud detection, lending intelligence, and more to make smarter lending decisions while improving profitability. This US-based technology-as-a-service company is headquartered in Los Angeles, California.
About the Role
We’re looking for a seasoned Lead Site Reliability Engineer (SRE) to help shape the systems and practices that make our platform resilient, observable, and scalable. You'll lead by driving initiatives, building internal tools, and mentoring others to foster a culture of reliability and excellent operational standards.
The ideal candidate is a strategic, service-oriented engineer who excels at identifying high-impact opportunities, delivering scalable solutions, and bridging technical and business teams through clear communication and collaboration.
Responsibilities
- Build Tooling & Infrastructure: Design and implement internal developer tools, observability dashboards, and AI-powered alerting systems that support operational excellence and empower teams through self-service.
- Drive Strategic Initiatives: Lead the adoption of modern DevOps practices across engineering teams, such as standardizing environments, improving service readiness, and increasing release reliability.
- Automate Reliability & Observability: Develop intelligent systems for automated alerting, diagnostics, and incident response using AI. Enhance observability through centralized dashboards and proactive monitoring.
- Mentor & Influence: Coach engineers and leaders on best practices in DevOps and SRE. Champion systems thinking, operational maturity, and a culture of continuous improvement.
- Establish Standards & Automation: Define and implement engineering standards with a focus on deterministic automation, usability, accessibility, and long-term resilience.
Required Qualifications
- 5+ years of experience as a Site Reliability Engineer, with a focus on reliability or infrastructure.
- Hands-on experience implementing and integrating observability platforms (Datadog strongly preferred).
- Proven track record in incident response, remediation, and building incident management protocols.
- Strong troubleshooting skills under pressure in complex, distributed environments.
Preferred Qualifications
- Proficiency in Python and/or TypeScript.
- Experience deploying and operating infrastructure in cloud environments (AWS preferred).
- Skilled in Infrastructure-as-Code practices and tools (Terraform preferred, CDK/TypeScript is a bonus).
- Familiarity with container orchestration and operations (Kubernetes and AWS ECS Fargate preferred).
- Strong background in Linux systems and shell scripting.
- Experience designing highly available and scalable architectures.
Perks and Benefits
- The opportunity to join a mission-focused company
- People – the best part of Zest
- Robust medical, dental, and vision insurance plans
- Annual bonus plan participation
- 401(k) with generous match
- Employee Awards and Recognition
- 11 company holidays
- Winter break (office closed between Christmas and New Year's Day)
- Unlimited vacation time
- Employee Resource Groups
- Generous family leave policy (12-week maternity leave / 6-week paternity leave)
- Phone, internet, wellness, and professional development allowances
- Employee gatherings, including Town Hall meetings
Additionally, Zestys based in and around our Burbank HQ enjoy:
- Beautiful, modern, dog-friendly office with lounge areas, video games, and gigantic jigsaw puzzles
- Daily catered lunches from LA’s best restaurants and a fully stocked kitchen
- Complimentary manicures, pedicures, and mindfulness sessions
- Company happy hours, social events, outings, and much more!
About Zest AI
Creating a diverse and inclusive culture where all are welcomed, valued, and empowered to achieve our full potential is important to who we are and where we’re headed in the future. We know that unique backgrounds, experiences, and perspectives help us think bigger, spark innovation, and succeed together. Zest is committed to diversity, equity, and inclusion and encourages professionals from underrepresented groups in technology and financial services to apply.
Our core values are Communication, Collaboration, Bias for Action, Client-centricity, and Heart. Learn more at Zest.ai, follow us on LinkedIn (linkedin.com/company/zest-ai/) or Twitter @Zest_AI, or check out our Insights blog (https://www.zest.ai/cms/insights