
Synechron

Cloud Data Engineer – Python, Spark, Scala, AWS & AI Integration

In-Office or Remote
2 Locations
Mid level
Design, develop, and maintain data pipelines using cloud-native tools and integrate AI/ML capabilities to support data-driven decision-making.

Job Summary

Synechron is seeking a highly skilled Data Engineer to design, develop, and maintain data pipelines and analytical solutions within enterprise data platforms. The role calls for hands-on expertise in cloud-native data processing, big data frameworks, and AI/ML integration to support business intelligence and data-driven decision-making. As a strategic contributor, you will lead initiatives to deliver scalable, reliable, and secure data solutions that comply with industry regulations and advance operational efficiency and analytics.

Software Requirements

Required:

  • Hands-on experience with Python, PySpark, and Scala for building data pipelines and processing large-scale datasets (4+ years); a minimal batch sketch follows this list

  • Proficiency in big data platforms: Spark, Hadoop, or similar (batch and streaming processing)

  • Experience with cloud-based data solutions on AWS, including EMR, S3, Glue, CloudFormation, and CDK

  • Deep understanding of SQL and of relational databases such as PostgreSQL and SQL Server, along with NoSQL stores such as DynamoDB

  • Familiarity with ETL frameworks and data management best practices

  • Knowledge of data lineage, data quality, and metadata management
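
As a concrete illustration of the Python/PySpark requirement above, here is a minimal batch pipeline sketch: read raw CSV from S3, apply light cleansing, and write partitioned Parquet. The bucket names, paths, and column names are hypothetical, and the snippet assumes an environment (such as EMR) where s3:// paths resolve.

```python
# Hypothetical buckets, paths, and columns; assumes an environment
# (e.g., EMR) where s3:// paths resolve.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-batch").getOrCreate()

# Read raw CSV landed in S3.
raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/orders/")

# Light cleansing plus a derived partition column.
clean = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("order_date", F.to_date("created_at"))
)

# Write partitioned Parquet for downstream analytics.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)
```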

Preferred:

  • Experience integrating AI/ML models and GenAI frameworks such as LangChain and Hugging Face

  • Exposure to NoSQL databases and advanced data storage solutions

  • Knowledge of containerization and orchestration: Docker, Kubernetes

Overall Responsibilities

  • Design, develop, and support scalable data pipelines and data processing architectures using PySpark, Scala, and cloud-native tools

  • Build and optimize batch and streaming data workflows supporting enterprise analytics and reporting (a streaming sketch follows this list)

  • Translate business requirements into robust, high-performance data solutions ensuring data integrity and security

  • Tune, debug, and optimize data processing jobs to meet strict SLAs

  • Implement data quality, compliance, and governance standards across enterprise data assets

  • Collaborate with data analysts, data scientists, and business teams to develop data models and insights

  • Lead efforts to automate data workflows, orchestrate processing pipelines, and support cloud migrations

  • Provide production support and perform root cause analysis for data pipeline issues

  • Stay updated on emerging data technologies, AI/ML integrations, and industry best practices

  • Document processes, data lineage, and architecture to support compliance and operational transparency
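
To make the batch-and-streaming responsibility concrete, here is a minimal Structured Streaming sketch that moves Kafka events into S3 as Parquet. It assumes the spark-sql-kafka connector package is available on the cluster; the broker address, topic, and paths are hypothetical.

```python
# Hypothetical broker, topic, and paths; requires the spark-sql-kafka package.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Subscribe to a Kafka topic and keep the message payload as a string.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

# Append micro-batches to S3 as Parquet; the checkpoint makes the job restartable.
query = (
    events.writeStream.format("parquet")
          .option("path", "s3://example-curated-bucket/events/")
          .option("checkpointLocation", "s3://example-curated-bucket/_checkpoints/events/")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```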

Technical Skills (By Category)

Programming Languages (Essential):

  • Python, Scala, PySpark (4+ years)

  • SQL for data querying, validation, and optimization

Preferred:

  • Java or additional scripting languages for automation

Frameworks & Libraries:

  • Spark, Hadoop, Hive, and related big data tools

  • AI/ML frameworks such as LangChain, Hugging Face, or similar (preferred; an integration sketch follows this list)

  • Data validation and lineage tools
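
As a sketch of the preferred AI/ML integration, the snippet below enriches records with a Hugging Face transformers pipeline; the default sentiment model and the comment field are illustrative assumptions. Inside a Spark job, this logic would typically be wrapped in a pandas UDF so the model loads once per executor.

```python
# Illustrative enrichment step; the input field name is hypothetical.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default model on first use

def enrich(records):
    """Attach a model-derived sentiment label to each record's free-text field."""
    texts = [r["comment"] for r in records]
    for record, result in zip(records, classifier(texts)):
        record["sentiment"] = result["label"]
    return records

print(enrich([{"comment": "Delivery was fast and support was helpful."}]))
```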

Databases & Storage:

  • Relational: PostgreSQL, SQL Server, Oracle

  • NoSQL: DynamoDB, Cassandra

Cloud Technologies:

  • AWS: EMR, S3, Glue, Lambda, CloudFormation, and CDK; Redshift desired (a CDK sketch follows this list)
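
A minimal AWS CDK (v2, Python) sketch of the infrastructure-as-code side of this stack: one stack that provisions a versioned S3 bucket and a Glue ETL job through the low-level CfnJob construct. The role ARN, script location, and resource names are hypothetical.

```python
# Hypothetical names, role ARN, and script location.
from aws_cdk import App, Stack, aws_s3 as s3, aws_glue as glue
from constructs import Construct

class DataPlatformStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Curated-zone bucket; versioning aids recoverability.
        s3.Bucket(self, "CuratedBucket", versioned=True)

        # Glue ETL job defined with the L1 (CloudFormation-level) construct.
        glue.CfnJob(
            self, "OrdersEtl",
            name="orders-etl",
            role="arn:aws:iam::123456789012:role/glue-etl-role",
            command=glue.CfnJob.CommandProperty(
                name="glueetl",
                script_location="s3://example-scripts/orders_etl.py",
            ),
        )

app = App()
DataPlatformStack(app, "DataPlatform")
app.synth()
```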

Data Management & Governance:

  • Metadata management, data lineage, and data-quality frameworks (a minimal checks sketch follows this list)

  • Enterprise data governance standards and compliance requirements
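
A minimal sketch of rule-based data-quality checks run after a load, expressed in plain Spark SQL; the table and column names are hypothetical. Dedicated frameworks (for example Deequ or Great Expectations) cover the same ground with richer rule definitions and reporting.

```python
# Hypothetical dataset and rules; fails fast when a rule is violated.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
orders = spark.read.parquet("s3://example-curated-bucket/orders/")
orders.createOrReplaceTempView("orders")

# Count violations for each rule in a single pass.
violations = spark.sql("""
    SELECT
      SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) AS null_ids,
      SUM(CASE WHEN amount < 0 THEN 1 ELSE 0 END)       AS negative_amounts
    FROM orders
""").first()

assert violations.null_ids == 0, "order_id must never be null"
assert violations.negative_amounts == 0, "amount must be non-negative"
```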

Experience Requirements

  • 4+ years of hands-on experience designing, developing, and supporting enterprise data pipelines

  • Proven expertise working with big data frameworks such as Spark, Hadoop, Hive, and Kafka

  • Practical experience with cloud-native data solutions on AWS or similar platforms

  • Exposure to applying AI or GenAI models within data pipelines is highly valued

  • Experience working in regulated industries like banking, finance, or healthcare is advantageous

Day-to-Day Activities

  • Develop, test, and optimize data pipelines handling large-scale datasets

  • Coordinate with data scientists, analytics teams, and product owners to refine data models

  • Troubleshoot and resolve performance bottlenecks or data quality issues

  • Automate data workflows and orchestrate processes using cloud-native tools and frameworks (an orchestration sketch follows this list)

  • Support cloud infrastructure provisioning and migration strategies

  • Monitor data pipeline health, perform root cause analysis, and implement improvements

  • Document data processes, lineage, and governance policies for compliance and operational transparency

  • Stay updated on emerging trends in AI/ML, data engineering, and cloud-native solutions
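
As one concrete form of the orchestration work above, here is a minimal sketch that triggers a Glue job from an automation script with boto3 and polls it to a terminal state. The job name is hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# Hypothetical job name; assumes configured AWS credentials.
import time
import boto3

glue = boto3.client("glue")

run = glue.start_job_run(JobName="orders-etl")
run_id = run["JobRunId"]

# Poll until the run reaches a terminal state.
state = "RUNNING"
while state not in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
    time.sleep(30)
    state = glue.get_job_run(JobName="orders-etl", RunId=run_id)["JobRun"]["JobRunState"]

print(f"orders-etl finished with state {state}")
```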

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field

  • 4+ years of experience with cloud-based data engineering and big data platforms

  • Proven track record delivering scalable, secure, and regulatory-compliant data pipelines

  • Certifications such as AWS Certified Data Analytics or equivalent are preferred

Professional Competencies

  • Strong analytical and troubleshooting skills for complex data environments

  • Excellent collaboration and stakeholder management skills

  • Leadership qualities for guiding junior team members and technical decision-making

  • Adaptability to new tools, protocols, and industry changes

  • Results-oriented with a focus on data quality, security, and operational efficiency

  • A continuous learning mindset for emerging technologies in data science and cloud data engineering

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity and inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative, ‘Same Difference’, is committed to fostering an inclusive culture that promotes equality, diversity, and an environment respectful of all. We strongly believe that, as a global company, a diverse workforce helps us build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disability statuses to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.


Top Skills

AWS
AWS CDK
CloudFormation
Docker
DynamoDB
EMR
Glue
Hadoop
Hugging Face
Kubernetes
LangChain
PostgreSQL
PySpark
Python
S3
Scala
Spark
SQL
SQL Server


