
Senior Data Architect
Entrada AI
Hexjobs Insights
Senior Data Architect role focused on leading architecture and implementation of orchestration systems using Temporal and Databricks. Required: 8+ years in data engineering with strong experience in orchestration frameworks and AWS.
Location: Remote. Must overlap with US Central and EU working hours.
Employment type: Full-time. No part-time availability. No split focus.
Start: ASAP (client timeline: ~16 weeks for Phase 2 MVP, likely follow-on phases). Long-term contract with Entrada.

This is a high-rigor environment. You will work with very senior client engineers and principal architects who expect you to reason at depth about Spark/Databricks internals, orchestration semantics, failure modes, and production SDLC.

What you will own (Phase 2 deliverables)
You will lead the architecture and hands-on implementation of a Temporal-based orchestration wrapper that triggers, monitors, and classifies Databricks job runs, including:

1) Temporal infrastructure & deployment
- Help deliver a production-grade Temporal deployment aligned to the client's Hub + Spoke architecture (in coordination with Cloud Engineering)
- Demonstrate deployments/execution in the staging workspace
- AWS is the target cloud; identify Azure gaps (don't ignore cross-cloud realities)

2) Multi-environment SDLC
- Support multiple environments (dev/staging/production)
- Integrate with the client's existing internal deployment tooling and namespacing patterns
- Ensure clean promotion paths with appropriate guardrails

3) Production pilot: migrate the authentication pipeline
- Migrate the authentication token generation + secret-writing pipeline from its current orchestration into Temporal as a high-value, low-risk production pilot

4) Implement the 'Sequence Pipeline' pattern in Temporal (a minimal illustrative sketch follows the scope notes below)
- Replicate the current 'Sequence Job' pattern using Temporal workflows
- Implement 'pick up running child job' to prevent redundant compute costs
- Implement step-level recovery: if Task 5 of 10 fails, keep results from 1–4 and allow resume from 5 (no 'restart everything')
- Add audit logging / observability for execution history + outcomes
- Deliver an operational runbook for triage and ongoing operations in Temporal

5) Security & permissions model
- Implement a robust permissions pattern so Temporal can trigger and monitor 'child' jobs across Databricks workspaces
- Maintain strict logical separation: Temporal is the 'control plane'; Databricks remains the data/compute plane

6) Reference implementation
- Build a 'dummy' reference job sequence as a blueprint for the client's engineers to extend in Phase 3

What is intentionally out of scope (so you can focus)
Phase 2 explicitly defers deeper data-domain workstreams (DLQ enhancements, domain-specific pilots, hybrid compute guardrails, cost attribution) to Phase 3.
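To make the 'Sequence Pipeline' expectation concrete, here is a minimal sketch of how such a sequential pipeline might look in Temporal's Python SDK. The workflow, activity, and task names are hypothetical placeholders, not the client's actual pattern; the point is that completed activities are recorded in workflow history, so a failure at step 5 retries only step 5 rather than re-running steps 1–4.

```python
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@activity.defn
async def run_databricks_task(task_name: str) -> str:
    # Hypothetical activity: in a real wrapper this would trigger (or "pick up"
    # an already-running) Databricks job run and poll it to completion.
    return f"{task_name}: SUCCESS"


@workflow.defn
class SequencePipeline:
    @workflow.run
    async def run(self, tasks: list[str]) -> list[str]:
        results: list[str] = []
        for task_name in tasks:
            # Each step runs as an activity with its own retry policy. If a step
            # fails, only that step retries; outcomes of earlier steps are already
            # persisted in the workflow's event history and are not re-executed,
            # even if the worker restarts and the workflow replays.
            outcome = await workflow.execute_activity(
                run_databricks_task,
                task_name,
                start_to_close_timeout=timedelta(hours=2),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )
            results.append(outcome)
        return results
```

Running this would additionally need a Temporal worker and client, omitted here for brevity; the design point is that step-level recovery falls out of Temporal's durable history rather than custom checkpointing.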
You are not expected to become the business-domain owner of the client's graph logic; your job is to build a reliable orchestration layer that respects it.

This is not a 'PowerPoint architect' role
You will:
- Write production code
- Own failure modes and recovery semantics
- Ship to dev/test/prod with a real SDLC
- Produce runbooks that on-call engineers can actually use
If you prefer advisory-only architecture or you need someone else to 'operationalize' your designs, this will not be a fit.

Required qualifications (non-negotiable)

Hands-on architecture + delivery
- 8+ years in data engineering / platform engineering, including 3+ years as a technical lead/architect shipping production systems
- Proven ownership of a system from design → implementation → production rollout → operational handoff

Databricks + Spark depth
- Deep expertise with Databricks (Jobs/Workflows, cluster configs, execution semantics, failure patterns)
- Deep Spark fundamentals: shuffles, partitioning, skew, caching, job planning, and debugging via logs/event timelines
- (The client's engineers operate at this level.)

Durable orchestration / workflow systems
- Strong experience with orchestration frameworks beyond UI-based DAG builders: Temporal (preferred), Cadence, AWS Step Functions, Argo Workflows, Airflow at scale with custom state/recovery semantics, etc.
- You must understand: idempotency, deterministic execution, retries vs replays, compensation patterns, state persistence, and workflow versioning

Python + API integration
- Strong production Python (packaging, testing, typing discipline, structured logging)
- Experience integrating with REST APIs / SDKs (Databricks Jobs API patterns, auth, rate limits, retries); a hedged integration sketch follows the hiring process below

Cloud + security
- AWS fluency: IAM, networking boundaries, secrets management, KMS, deployment patterns
- Comfortable partnering with Cloud Engineering but able to lead technically (you can't outsource all infra thinking)

Operating model
- Able to be 100% dedicated to this workstream during critical phases (no '50% attention' model)
- Comfortable working across time zones (US Central + Europe overlap)

Preferred qualifications (strongly preferred)
- Temporal in production (or Cadence) with real incident learnings
- Experience implementing 'meta-orchestrators' that coordinate other orchestrators/systems
- OpenTelemetry / structured observability patterns (logs + metrics + traces)
- Experience with large 'DAG of DAGs' pipelines, long runtimes, expensive failure restarts
- Databricks certifications (or willingness to obtain/renew quickly as part of partner commitments)

How we hire
- Introductory call (20 min): a short conversation with our recruiter to discuss your background and expectations.
- Deep technical interview (1–1.5 h): Spark/Databricks + orchestration semantics, plus a system design exercise (walk through a durable orchestration wrapper with step-level resume).
- Client interview (45 min – 1 h): required in this case.
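As an illustration of the kind of REST integration referenced under 'Python + API integration', here is a hedged sketch that triggers a Databricks job run and polls it, backing off on rate limits. The endpoint paths follow the public Jobs API 2.1; the function name, host/token handling, poll cadence, and error handling are simplifying assumptions, not the client's actual integration pattern.

```python
import time
import requests


def trigger_and_wait(host: str, token: str, job_id: int, poll_seconds: int = 30) -> str:
    """Start a Databricks job run and poll until it reaches a terminal state."""
    headers = {"Authorization": f"Bearer {token}"}

    # Trigger the run (Jobs API 2.1: POST /api/2.1/jobs/run-now).
    resp = requests.post(f"{host}/api/2.1/jobs/run-now",
                         headers=headers, json={"job_id": job_id}, timeout=30)
    resp.raise_for_status()
    run_id = resp.json()["run_id"]

    # Poll run state (GET /api/2.1/jobs/runs/get), backing off on HTTP 429.
    while True:
        r = requests.get(f"{host}/api/2.1/jobs/runs/get",
                         headers=headers, params={"run_id": run_id}, timeout=30)
        if r.status_code == 429:  # rate limited: wait and retry
            time.sleep(poll_seconds)
            continue
        r.raise_for_status()
        state = r.json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "INTERNAL_ERROR", "SKIPPED"):
            return state.get("result_state", "UNKNOWN")  # e.g. SUCCESS / FAILED
        time.sleep(poll_seconds)
```

In a Temporal-based wrapper, logic like this would typically live inside activities, with Temporal's retry policies layered on top of the API-level backoff.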
| Published | about 1 month ago |
| Expires | in about 2 months |
| Contract type | B2B, permanent employment |