Blend is a premier AI services provider, committed to co-creating meaningful impact for its clients through the power of data science, AI, technology, and people. With a mission to fuel bold visions, Blend tackles significant challenges by seamlessly aligning human expertise with artificial intelligence. The company is dedicated to unlocking value and fostering innovation for its clients by harnessing world-class talent and data-driven strategy. We believe the combination of people and AI can create more fulfilling work and better outcomes for our people and clients alike. For more information, visit www.blend360.com.
Job Description

You will be a key member of our Data Engineering team, focused on designing, developing, and maintaining robust data solutions in on-premise environments. You will work closely with internal teams and client stakeholders to build and optimize data pipelines and analytical tools using Python, PySpark, SQL, and Hadoop ecosystem technologies. This role requires deep hands-on experience with big data technologies in traditional data center environments (non-cloud).
What you’ll be doing
- Design, build, and maintain on-premise data pipelines to ingest, process, and transform large volumes of data from multiple sources into data warehouses and data lakes
- Develop and optimize PySpark and SQL jobs for high-performance batch and real-time data processing
- Ensure the scalability, reliability, and performance of data infrastructure in an on-premise setup
- Collaborate with data scientists, analysts, and business teams to translate their data requirements into technical solutions
- Troubleshoot and resolve issues in data pipelines and data processing workflows
- Monitor, tune, and improve Hadoop clusters and data jobs for cost and resource efficiency
- Stay current with on-premise big data technology trends and suggest enhancements to improve data engineering capabilities
Qualifications
- Bachelor’s degree in Computer Science, Software Engineering, or a related field
- 5+ years of experience in data engineering or a related domain
- Strong programming skills in Python (with experience in PySpark)
- Expertise in SQL with a solid understanding of data warehousing concepts
- Hands-on experience with Hadoop ecosystem components (e.g., HDFS, Hive, Oozie, Sqoop)
- Proven ability to design and manage data solutions in on-premise environments (no cloud dependency)
- Strong problem-solving skills with an ability to work independently and collaboratively
- Excellent communication skills and ability to engage with technical and non-technical stakeholders