
Walmart Global Tech

DATA ENGINEER III

Posted 3 Days Ago
Hybrid
Chennai, Tamil Nadu
Senior level
The Data Engineer III will design, develop, and implement data pipelines and integration solutions using Spark, Scala, and Python on Google Cloud Platform. Responsibilities include optimizing workflows, ensuring data quality, managing data processing systems, and collaborating with teams to facilitate data access for analysis and reporting.

Position Summary...

Demonstrates up-to-date expertise and applies it to the development, execution, and improvement of action plans by providing expert advice and guidance to others in the application of information and best practices; supporting and aligning efforts to meet customer and business needs; and building commitment for perspectives and rationales.

Provides and supports the implementation of business solutions by building relationships and partnerships with key stakeholders; identifying business needs; determining and carrying out necessary processes and practices; monitoring progress and results; recognizing and capitalizing on improvement opportunities; and adapting to competing demands, organizational changes, and new responsibilities.

Models compliance with company policies and procedures and supports the company's mission, values, and standards of ethics and integrity by incorporating these into the development and implementation of business plans; using the Open Door Policy; and demonstrating and assisting others with how to apply these in executing business processes and practices.
Scope of Work:
• As a Data Engineer, you will play a critical role in designing, developing, and implementing data pipelines and data integration solutions using Spark, Scala, Python, Airflow, and Google Cloud Platform (GCP).
• You will be responsible for building scalable and efficient data processing systems, optimizing data workflows, and ensuring data quality and integrity.
• Monitor and troubleshoot data pipelines to ensure data availability and reliability.
• Conduct performance tuning and optimization of data processing systems for improved efficiency and scalability.
• Work with stakeholders, including the Executive, Product, Data, and Design teams, to resolve data-related technical issues and support their data infrastructure needs.
• Work closely with data scientists and analysts to provide them with the data sets and tools they need for analysis and reporting.
• Create data tools that help analytics team members build and optimize our product into an innovative industry leader.
• Stay up to date with the latest industry trends and technologies in data engineering and apply them to enhance the data infrastructure.
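For a concrete sense of the batch side of this work, below is a minimal sketch of the kind of pipeline described above: a Spark/Scala job that reads raw events from GCS, cleanses them, and writes to BigQuery through the spark-bigquery connector. The project, bucket, dataset, and column names are illustrative placeholders, not details from this posting, and the connector is assumed to be on the cluster classpath.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyEventsPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-events-pipeline")
      .getOrCreate()

    // Read raw JSON events landed in a GCS bucket (path is a placeholder).
    val raw = spark.read.json("gs://example-raw-zone/events/dt=2024-01-01/")

    // Basic cleansing: drop rows missing the key, deduplicate on event id,
    // and derive a date column for downstream partitioning.
    val cleaned = raw
      .filter(col("event_id").isNotNull)
      .dropDuplicates("event_id")
      .withColumn("event_date", to_date(col("event_ts")))

    // Write to BigQuery; the indirect write path stages data in a
    // temporary GCS bucket before loading it into the target table.
    cleaned.write
      .format("bigquery")
      .option("table", "example-project.analytics.events")
      .option("temporaryGcsBucket", "example-tmp-bucket")
      .mode("append")
      .save()

    spark.stop()
  }
}

On GCP, a job like this would typically run on a Dataproc cluster and be scheduled by an Airflow DAG, matching the stack named in the bullets above.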

What you'll do...

Tech. Problem Formulation
Requires knowledge of: analytics, big data analytics, and automation techniques and methods; business understanding; precedence and use cases; business requirements and insights. Identifies possible options to address business problems within one's discipline through relevant analytical methodologies, and demonstrates understanding of use cases and desired outcomes.

Understanding Business Context
Requires knowledge of: industry and environmental factors; common business vernacular; business practices across two or more domains (such as product, finance, marketing, sales, technology, business systems, and human resources) with in-depth knowledge of related practices; directly relevant business metrics and business areas. Supports the development of business cases and recommendations; drives delivery of project activities and tasks assigned by others; supports process updates and changes; and supports, under guidance, the resolution of business issues.

Data Source Identification
Requires knowledge of: the functional business domain and scenarios; categories of data and where they are held; business data requirements; database technologies and distributed datastores (e.g., SQL, NoSQL); data quality; and existing business systems and processes, including the key drivers and measures of success. Supports the understanding of the priority order of requirements and service-level agreements, helps identify the most suitable source for data that is fit for purpose, and performs initial data quality checks on extracted data.

Data Transformation and Integration
Requires knowledge of: internal and external data sources, including how they are collected, where and how they are stored, and their interrelationships, both within and external to the organization; techniques such as ETL batch processing, streaming ingestion, scrapers, APIs, and crawlers; data warehousing services for structured and semi-structured data, and MPP databases such as Snowflake, Microsoft Azure, Presto, or Google BigQuery; pre-processing techniques such as transformation, integration, normalization, and feature extraction; techniques such as decision trees, advanced regression (e.g., LASSO methods), and random forests; and cloud and big data environments such as EDO2 systems. Extracts data from identified databases, creates data pipelines, and transforms data into a structure relevant to the problem by selecting appropriate techniques. Develops knowledge of current data science and analytics trends.

Data Modeling
Requires knowledge of: cloud data strategy, data warehouses, data lakes, and enterprise big data platforms; data modeling techniques and tools (for example, dimensional design and scalability, entity-relationship diagrams, Erwin); query languages (SQL/NoSQL); data flows through the different systems; tools supporting automated data loads; and AI-enabled metadata management tools and techniques. Analyzes complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models. Develops logical and physical data models, including data warehouse and data mart designs; defines relational tables, primary and foreign keys, and stored procedures to create a data model structure; evaluates existing data models and physical databases for variances and discrepancies; develops efficient data flows; and analyzes data-related system integration challenges and proposes appropriate solutions.

Code Development and Testing
Requires knowledge of: coding languages such as SQL, Java, C++, and Python; testing methods such as static analysis, dynamic analysis, software composition analysis, and manual penetration testing; and business and domain understanding. Writes code to develop the required solution and application features by determining the appropriate programming language and leveraging business, technical, and data requirements. Creates test cases to review and validate the proposed solution design, creates proofs of concept, tests the code using the appropriate testing approach, and deploys software to production servers. Contributes code documentation, maintains playbooks, and provides timely progress updates.

Data Governance
Requires knowledge of: data value chains (identification, ingestion, processing, storage, analysis, and utilization); data processes and practices; data modeling, storage, integration, and warehousing; data quality frameworks and metrics; regulatory and ethical requirements around data privacy, security, storage, retention, and documentation; business implications of data usage; data strategy; and enterprise regulatory and ethical policies and strategies. Supports the documentation of data governance processes and the implementation of data governance practices.

Data Strategy
Requires knowledge of: the business value and relevance of data and data-enabled insights and decisions; the data ecosystem, including data management, data quality standards, data governance, accessibility, storage, and scalability; and the methods and applications that unlock the monetary value of data assets. Understands, articulates, and applies principles of the defined strategy to routine business problems that involve a single function.
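The "initial data quality checks on extracted data" responsibility above lends itself to a short illustration. This sketch, again in Spark/Scala, counts nulls per column and checks key uniqueness; the source path and key column are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object InitialQualityChecks {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dq-checks").getOrCreate()

    // Hypothetical extracted table; any DataFrame source works the same way.
    val orders = spark.read.parquet("gs://example-raw-zone/orders/")

    // Null count per column: a quick fitness-for-purpose signal.
    val nullCounts = orders.select(
      orders.columns.map(c => sum(col(c).isNull.cast("long")).alias(c)): _*
    )
    nullCounts.show()

    // Duplicate primary keys would violate the data model's uniqueness
    // assumptions, so surface them before the data moves downstream.
    val duplicates = orders.groupBy("order_id").count().filter(col("count") > 1)
    println(s"rows with duplicate order_id: ${duplicates.count()}")

    spark.stop()
  }
}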

SkillSets:

  • Proven working experience as a Data Engineer, with a minimum of 5 years in the field.
  • Strong programming skills in Scala and experience with Spark for data processing and analytics.
  • Familiarity with Google Cloud Platform (GCP) services such as BigQuery, GCS, and Dataproc.
  • Experience developing near-real-time ingestion pipelines using Kafka and Spark Structured Streaming (a sketch follows this list).
  • Experience with data modeling, data warehousing, and ETL processes.
  • Understanding of data warehousing concepts and best practices.
  • Strong knowledge of SQL and NoSQL systems.
  • Proficiency in version control systems, particularly Git.
  • Proficiency in working with large-scale data sets and distributed computing frameworks.
  • Familiarity with CI/CD pipelines and tools such as Jenkins or GitLab CI.
  • Familiarity with schedulers such as Airflow.
  • Strong problem-solving and analytical skills.
  • Familiarity with BI and visualization tools such as Tableau or Looker.
  • A background in generative artificial intelligence (GenAI) is desirable but not essential.
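As referenced in the Kafka bullet above, here is a minimal sketch of a near-real-time ingestion pipeline using Spark Structured Streaming. The broker address, topic, payload schema, and output paths are placeholders, and the Kafka connector for Spark SQL is assumed to be available:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object OrdersStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-stream").getOrCreate()

    // Illustrative payload schema for JSON messages on the topic.
    val schema = new StructType()
      .add("order_id", StringType)
      .add("amount", DoubleType)
      .add("event_ts", TimestampType)

    // Consume the topic and parse each message value into columns.
    val orders = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "orders")
      .load()
      .selectExpr("CAST(value AS STRING) AS json")
      .select(from_json(col("json"), schema).alias("data"))
      .select("data.*")

    // Sink each micro-batch to storage; the checkpoint location is what
    // makes the query restartable, which underpins pipeline reliability.
    val query = orders.writeStream
      .format("parquet")
      .option("path", "gs://example-curated-zone/orders/")
      .option("checkpointLocation", "gs://example-checkpoints/orders/")
      .start()

    query.awaitTermination()
  }
}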

Minimum Qualifications...

Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.

Minimum Qualifications:
Option 1: Bachelor's degree in Computer Science and 2 years' experience in software engineering or a related field.
Option 2: 4 years' experience in software engineering or a related field.
Option 3: Master's degree in Computer Science.

Preferred Qualifications...

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.

Primary Location... RMZ Millenia Business Park, No. 143, Campus 1B (1st-6th Floor), Dr. MGR Road (North Veeranam Salai), Perungudi, Chennai, India

Top Skills

Python
Scala

What you need to know about the Chennai Tech Scene

To locals, it's no secret that South India is leading the charge in big data infrastructure. While the environmental impact of data centers has long been a concern, emerging hubs like Chennai are favored by companies seeking ready access to renewable energy resources, which provide more sustainable and cost-effective solutions. As a result, Chennai, along with neighboring Bengaluru and Hyderabad, is poised for significant growth, with a projected 65 percent increase in data center capacity over the next decade.
