- Design and build robust, scalable organizational data infrastructure and architecture.
- Identify and implement process improvements (e.g., infrastructure redesign, automation of data workflows, performance optimizations).
- Select appropriate tools, services, and technologies to build resilient pipelines for data ingestion, transformation, and distribution (see the sketch after this list).
- Develop and manage ELT/ETL pipelines and related applications.
- Collaborate with global teams to deliver fault-tolerant, high-quality data engineering solutions.
- Perform monthly code quality audits and peer reviews to ensure consistency, readability, and maintainability across the engineering codebase.
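The pipeline work described above typically covers ingestion, transformation, and distribution within a single job. Below is a minimal sketch, assuming a Spark environment with Delta Lake available (e.g., a Databricks cluster); the paths, table names, and column names are hypothetical placeholders, not part of the role description.

```python
# Minimal ingest -> transform -> distribute sketch; assumes Spark with Delta
# Lake configured (e.g., Databricks). Paths and columns are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

# Ingest: read raw files landed by an upstream process.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/landing/orders/")
)

# Transform: deduplicate, cast types, and stamp the load time.
cleaned = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_amount", F.col("order_amount").cast("double"))
       .withColumn("ingested_at", F.current_timestamp())
)

# Distribute: persist as a Delta table for downstream consumers.
(
    cleaned.write
    .format("delta")
    .mode("append")
    .saveAsTable("bronze.orders")
)
```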
- Proven experience building and managing ETL/ELT pipelines.
- Advanced proficiency with Azure, AWS, and Databricks, with a focus on data services.
- Deep knowledge of Python, the Spark ecosystem (PySpark, Spark SQL), and relational databases.
- Experience building REST APIs, Python SDKs, libraries, and Spark-based data services (see the sketch after this list).
- Hands-on expertise with modern frameworks and tools such as FastAPI, Pydantic, Polars, Pandas, Delta Lake, Docker, and Kubernetes.
- Understanding of Lakehouse architecture, Medallion architecture, and data governance.
- Experience with pipeline orchestration tools (e.g., Airflow, Azure Data Factory).
- Strong communication skills and the ability to work cross-functionally with international teams.
- Skilled in data profiling, cataloging, and mapping for technical data flows.
- Understanding of API product management principles, including lifecycle strategy, documentation standards, and versioning.
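For the REST API and data-service items, here is a minimal sketch using FastAPI and Pydantic; the Order model, routes, and in-memory store are hypothetical stand-ins for a real governed data backend such as a Delta table.

```python
# Minimal FastAPI + Pydantic data-service sketch; the model, routes, and
# in-memory store are hypothetical examples, not a prescribed design.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="orders-data-service", version="1.0.0")


class Order(BaseModel):
    order_id: str
    customer_id: str
    amount: float


# Hypothetical in-memory store; a production service would query a governed
# data layer (e.g., a Delta table) instead.
_ORDERS: dict[str, Order] = {}


@app.post("/orders", response_model=Order, status_code=201)
def create_order(order: Order) -> Order:
    # Upsert the order record into the store.
    _ORDERS[order.order_id] = order
    return order


@app.get("/orders/{order_id}", response_model=Order)
def get_order(order_id: str) -> Order:
    # Return the order, or a 404 if it does not exist.
    if order_id not in _ORDERS:
        raise HTTPException(status_code=404, detail="order not found")
    return _ORDERS[order_id]
```

Declaring request and response models with Pydantic gives validation plus a self-documenting OpenAPI schema, which ties into the documentation and versioning expectations noted above.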
- Deep understanding of cloud architecture (compute, storage, networking, security, cost optimization).
- Experience tuning complex SQL/Spark queries and pipelines for performance (see the sketch after this list).
- Hands-on experience building Lakehouse solutions using Azure Databricks, ADLS, PySpark, etc.
- Familiarity with OOP, asynchronous programming, and batch processing paradigms.
- Experience with CI/CD, Git, and DevOps best practices.
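For the query-tuning item, the sketch below shows two common Spark techniques: broadcasting a small dimension table to avoid a shuffle join, and filtering on a partition column so Spark can prune partitions. The table names and partition column are hypothetical.

```python
# Minimal Spark tuning sketch: broadcast join + partition pruning.
# Table names and the partition column are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning_example").getOrCreate()

facts = spark.table("silver.orders")      # large table, partitioned by order_date
dims = spark.table("silver.customers")    # small dimension table

tuned = (
    facts
    # Partition pruning: filter on the partition column before the join.
    .filter(F.col("order_date") >= "2024-01-01")
    # Broadcast join hint: ship the small table to every executor.
    .join(F.broadcast(dims), on="customer_id", how="left")
    .groupBy("customer_id")
    .agg(F.sum("order_amount").alias("total_amount"))
)

# Inspect the physical plan to confirm the broadcast join and partition filters.
tuned.explain(mode="formatted")
```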