Negotiable Salary
Qode
Ohio, USA
Job Summary

We are looking for an experienced AWS Data Engineer with strong expertise in Python and PySpark to design, build, and maintain large-scale data pipelines and cloud-based data platforms. The ideal candidate will have hands-on experience with AWS services, distributed data processing, and building scalable solutions for analytics and machine learning use cases.

Key Responsibilities

· Design, develop, and optimize data pipelines using Python, PySpark, and SQL (a representative sketch follows after the qualifications below).
· Build and manage ETL/ELT workflows for structured and unstructured data.
· Leverage AWS services (S3, Glue, EMR, Redshift, Lambda, Athena, Kinesis, Step Functions, RDS) for data engineering solutions.
· Implement data lake/data warehouse architectures and ensure data quality, consistency, and security.
· Work with large-scale distributed systems for real-time and batch data processing.
· Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality, reliable data solutions.
· Develop and enforce best practices for data governance, monitoring, and performance optimization.
· Deploy and manage CI/CD pipelines for data workflows using AWS tools (CodePipeline, CodeBuild) or GitHub Actions.

Required Skills & Qualifications

· Strong programming skills in Python and hands-on experience with PySpark.
· Proficiency in SQL for complex queries, transformations, and performance tuning.
· Solid experience with the AWS cloud ecosystem (S3, Glue, EMR, Redshift, Athena, Lambda, etc.).
· Experience working with data lakes, data warehouses, and distributed systems.
· Knowledge of ETL frameworks, workflow orchestration (Airflow, Step Functions, or similar), and automation (see the orchestration sketch below).
· Familiarity with Docker, Kubernetes, or containerized deployments.
· Strong understanding of data modeling, partitioning, and optimization techniques.
· Excellent problem-solving, debugging, and communication skills.
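To give candidates a concrete flavor of the pipeline work described above, here is a minimal PySpark sketch of a batch ETL job: it reads raw JSON events from S3, applies basic cleanup, and writes date-partitioned Parquet back to S3. The bucket names, paths, and column names are hypothetical placeholders, not details of Qode's actual stack.

```python
# Minimal batch ETL sketch in PySpark. Bucket names, paths, and columns
# are hypothetical stand-ins for whatever the real pipeline uses.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-etl")
    .getOrCreate()
)

# Read raw JSON events from a landing zone in S3 (hypothetical path).
raw = spark.read.json("s3://example-raw-bucket/events/")

# Basic cleanup: drop malformed rows, normalize the timestamp, and derive
# a partition column so downstream queries can prune by date.
cleaned = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write partitioned Parquet to a curated zone; date partitioning is one
# common optimization technique the qualifications list alludes to.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/")
)

spark.stop()
```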
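Likewise, the workflow-orchestration bullet might translate into something like the following minimal Airflow DAG (assuming Airflow 2.4+); the DAG id, schedule, and task callables are invented for illustration only.

```python
# Minimal Airflow DAG sketch: a daily extract -> transform -> load run.
# The DAG id, schedule, and callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull raw data from source systems into S3.
    print("extract raw data")


def transform():
    # Placeholder: run the PySpark job, e.g. via EMR or Glue.
    print("transform with PySpark")


def load():
    # Placeholder: publish curated tables to Redshift/Athena.
    print("load curated tables")


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Enforce the ETL ordering: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```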