
Synechron

Cloud Data Engineer – Python, Spark/Scala, AWS & Data Warehousing

In-Office or Remote
2 Locations
Senior level

Job Summary

Synechron is seeking a highly skilled Data Engineer to lead the design, development, and optimization of enterprise data pipelines and analytics solutions. The role involves working on big data platforms, leveraging cloud-native AWS services, and integrating AI-driven applications to support business intelligence and operational excellence. You will develop scalable, secure, high-performance data architectures, working closely with stakeholders to meet data governance and compliance requirements as well as strategic objectives.

Software Requirements

Required:

  • 4+ years of hands-on experience with Python, PySpark, and Scala for building scalable data processing jobs

  • Expertise in big data platforms such as Spark, Hadoop, Hive, and related processing frameworks

  • Proven experience in end-to-end ETL pipeline development including ingestion, transformation, and data validation

  • Strong familiarity with cloud data ecosystems on AWS, including EMR, S3, Glue, CloudFormation, CDK, and Data Pipeline

  • Experience with relational databases: PostgreSQL, Oracle, SQL Server

  • Working knowledge of metadata management, data lineage, and data governance tools

Preferred:

  • Experience working with NoSQL databases such as DynamoDB or Cassandra

  • Exposure to AI/ML applications, especially with Generative AI and frameworks like LangChain, Llama, or Hugging Face

  • Familiarity with model management and prompt engineering

  • Knowledge of containerization (Docker) and Kubernetes for scalable deployment

Overall Responsibilities

  • Design, develop, and support data pipelines and architectures to enable enterprise analytics, reporting, and data governance

  • Build scalable batch and streaming data workflows supporting business-critical functions

  • Optimize performance, latency, and throughput of data processing jobs through profiling and tuning

  • Implement security, privacy, and regulatory standards in data pipelines and repositories

  • Collaborate with data scientists, BI teams, and applications teams to ensure data quality and availability

  • Automate data ingestion, transformation, and deployment processes for operational efficiency

  • Ensure high system reliability and availability, supporting infrastructure and platform monitoring

  • Lead or contribute to cloud migration and data modernization initiatives

  • Stay updated on emerging data technologies, automation, and AI/ML advancements to incorporate into existing platforms

Technical Skills (By Category)

Programming Languages (Essential):

  • Python, Scala, and PySpark for big data processing

Preferred:

  • Shell scripting (e.g., Bash) for automation

Frameworks & Libraries:

  • Spark and the Hadoop ecosystem, including Hive (Kafka preferred)

  • Data validation, lineage, and governance tools

  • AI/ML frameworks such as LangChain and Hugging Face (preferred)

Databases & Data Storage:

  • Relational: PostgreSQL, Oracle, SQL Server

  • NoSQL: DynamoDB, Cassandra (preferred)

Cloud Technologies:

  • AWS: EMR, S3, Glue, CloudFormation, CDK, Lambda, Data Pipeline, CloudWatch

Data Governance & Security:

  • Metadata management, data lineage, security best practices, and compliance standards (e.g., PCI DSS, GDPR)

Experience Requirements

  • 4+ years designing and implementing enterprise data pipelines in cloud environments

  • Proven experience working with big data frameworks and AWS cloud solutions

  • Strong understanding of data governance, security standards, and regulatory compliance in enterprise contexts

  • Experience integrating AI/ML solutions or supporting AI workflows (preferred)

  • Past involvement in data migration, modernization, or platform automation projects

Day-to-Day Activities

  • Architect, develop, and optimize complex data pipelines and architectures supporting enterprise analytics

  • Implement data ingestion, transformation, and validation workflows across diverse data sources

  • Troubleshoot and resolve pipeline performance and security issues

  • Automate infrastructure provisioning, deployment, and management in cloud environments

  • Collaborate with data scientists, BI teams, and application developers to align data solutions with business goals

  • Monitor system health, perform root cause analysis, and implement efficiency improvements

  • Document data architecture, lineage, and governance procedures

  • Stay current on new tools, frameworks, and AI/ML innovations relevant for data engineering

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field

  • 4+ years of experience in cloud data engineering, big data processing, and ETL development

  • Proven track record of successfully supporting large-scale, secure, and compliant data solutions

  • Relevant certifications, such as AWS Data Analytics or other big data certifications, are advantageous

Professional Competencies

  • Strong analytical and troubleshooting capabilities for data processing and pipeline issues

  • Excellent collaboration and stakeholder management skills

  • Leadership qualities for guiding junior team members and influencing best practices

  • Ability to adapt to evolving industry standards, tools, and compliance regulations

  • Results-driven with a focus on data quality, security, and operational reliability

Synechron's Diversity & Inclusion Statement

Diversity and inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, ‘Same Difference’, is committed to fostering an inclusive culture that promotes equality, diversity, and an environment respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Top Skills

AWS
Big Data
Hive
Hugging Face
LangChain
Llama
OBIEE
Oracle SQL
PL/SQL
PySpark
Python
SAS
Scala
Spark
Tableau
Teradata
Unix Shell Scripting


