Senior Site Reliability Engineer

Optimizely

Software Engineering

Hanoi, Vietnam

Posted on Apr 28, 2026

At Optimizely, we're on a mission to help people unlock their digital potential. We do that by reinventing how marketing and product teams work to create and optimize digital experiences across all channels. With Optimizely One, our industry-first operating system for marketers, we offer teams flexibility and choice to build their stack their way with our fully SaaS, fully decoupled, and highly composable solution.

We are proud to help more than 10,000 businesses, including H&M, PayPal, Zoom, and Toyota, enrich their customer lifetime value, increase revenue and grow their brands. Our innovation and excellence have earned us numerous recognitions as a leader by industry analysts such as Gartner, Forrester, and IDC, reinforcing our role as a trailblazer in MarTech.

At our core, we believe work is about more than just numbers -- it's about the people. Our culture is dynamic and constantly evolving, shaped by every employee, their actions and their stories. With over 1,500 Optimizers spread across 12 global locations, our diverse team embodies the "One Optimizely" spirit, emphasizing collaboration and continuous improvement, while fostering a culture where every voice is heard and valued.

Join us and become part of a company that's empowering people to unlock their digital potential!

To get a sneak peek into our culture, find us on Instagram: @optimizely

Introduction

As a Senior Site Reliability Engineer at Optimizely, you will play a critical role in ensuring the reliability, performance, and scalability of our digital platforms. You will collaborate with cross-functional teams to design, implement, and maintain robust systems and processes that enhance the overall user experience.

Job Responsibilities

  • System Design and Implementation: Design and implement reliable and scalable systems to support our digital platforms. Collaborate with software engineers to integrate reliability into the architecture.
  • Monitoring and Performance Optimization: Develop and maintain monitoring solutions to ensure system performance and availability. Identify and resolve performance bottlenecks.
  • Incident Management: Lead incident response efforts, including troubleshooting, root cause analysis, and implementing corrective actions to prevent future incidents.
  • Automation and Tooling: Develop and maintain automation tools and scripts to improve system efficiency and reduce manual intervention. Implement infrastructure as code practices.
  • Collaboration and Communication: Work closely with cross-functional teams to align on reliability goals and best practices. Communicate effectively with stakeholders to provide updates on system status and improvements.
  • Continuous Improvement: Stay updated with the latest industry trends and technologies related to site reliability engineering. Proactively identify opportunities for process and system improvements.

Knowledge and Experience

  • Proven experience as a Senior Site Reliability Engineer or in a similar role in a fast-paced environment.
  • Strong understanding of cloud computing, networking, and system architecture, preferably on AWS; GCP experience is a plus.
  • Proficiency in scripting and automation tools (e.g., Python, Bash, Terraform, Chef).
  • Experience with observability tools (e.g., Datadog, Prometheus, Grafana, ELK Stack).
  • Kubernetes Expertise: Demonstrated experience in designing, deploying, and managing applications in Kubernetes environments. Proficiency in configuring and optimizing Kubernetes clusters for scalability, reliability, and performance. Hands-on experience with Kubernetes tools and technologies such as Helm, Kustomize, and kubectl.
  • Istio Proficiency (preferred): Familiarity with Istio service mesh architecture and its components is a plus. Experience in implementing Istio for traffic management, security, and observability within microservices architectures is desirable but not required. Candidates with a willingness to learn and explore Istio are encouraged to apply.
  • Experience (preferred) with message brokers, preferably Kafka.
  • Understanding (preferred) of coordination services such as ZooKeeper.
  • Proficiency in version control software, particularly Git, is required.
  • Excellent problem-solving skills and attention to detail.
  • Strong communication and collaboration skills.
  • Proficiency in English is required.

Optimizely is committed to a diverse and inclusive workplace. Optimizely is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
