Platform Engineering Manager
Nexus Cognitive
Software Engineering
Atlanta, GA, USA
Posted on Oct 24, 2025
Description
The Platform Engineering Manager will lead the engineering teams responsible for building, automating, and operating the NexusOne data platform within enterprise client environments.
This individual is a player-coach, equally capable of hands-on delivery and strategic leadership, who drives platform automation, scalability, and reliability across global deployments.
They will oversee a cross-functional team of Platform, DevOps, and DataOps engineers, ensuring all NexusOne deployments meet performance, compliance, and uptime goals.
Core Responsibilities
- Lead and mentor Platform Engineers, DevOps, and DataOps teams responsible for standing up and managing NexusOne environments.
- Define and implement infrastructure-as-code (IaC) patterns and CI/CD pipelines for consistent, repeatable deployments.
- Partner with the Platform Architect to translate solution designs into executable engineering plans.
- Drive environment readiness (Pre-Prod, UAT, Production) in alignment with enterprise change management processes.
- Establish and monitor operational SLAs, uptime metrics, and performance KPIs.
- Ensure a strong security posture, compliance with IAM policies, and integration with enterprise authentication frameworks.
- Oversee patching, upgrades, and lifecycle management for all open-source components (Spark, Trino, Airflow, NiFi, Ranger, etc.).
- Lead incident response, root-cause analysis, and postmortems to ensure platform stability and continuous improvement.
- Support internal and client-facing Architecture Review Boards and technical documentation efforts.
Requirements
Key Skills & Tools
- Infrastructure & Automation: Jenkins, GitLab CI/CD, Terraform, Helm, Kubernetes (K8s), OpenShift (OCP), Vault, Ansible
- Platform Components: Spark, Trino, Airflow, NiFi, Ranger, Iceberg, DataHub
- Security & Networking: IAM, SSO, RBAC, Kerberos, firewall configurations, and secure network zoning
- Monitoring & Observability: Prometheus, Grafana, Datadog, Splunk
- Languages & OS: Linux, Bash, Python, YAML, Go (plus familiarity with Java/Scala)
- Process & Delivery: Agile/Scrum, DevOps, ITIL change management
Ideal Background
- 8–12 years of experience managing platform or infrastructure teams in data platform, DevOps, or SRE organizations.
- Strong background as a hands-on engineer in earlier roles; comfortable leading by example.
- Experience at scale — e.g., Cloudera, Databricks, AWS EMR, or large hybrid-cloud enterprise environments.
- Proven record of building and mentoring global teams (USA/India model preferred).
- Deep understanding of enterprise-grade deployment, observability, and operational excellence practices.
Success Criteria
- Successful on-time deployment of NexusOne environments with high platform stability and zero major incidents.
- Demonstrated improvement in deployment velocity and reduction in environment provisioning time.
- Clear, measurable SLA adherence across uptime, performance, and support metrics.
- Documented automation and CI/CD frameworks reusable across future clients.
- Positive team engagement and retention; consistent delivery of high-quality, secure platform builds.