Clarifai

AI Data Lead

Job Posted 18 Hours Ago Reposted 18 Hours Ago
Remote
Hiring Remotely in India
Mid level
The AI Data Lead will manage the process of creating datasets for AI models, focusing on video and image data initially and expanding to text data. Responsibilities include strategy development, pipeline management, and quality assurance for data labeling.
The summary above was generated by AI
About Clarifai

Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, and audio recognition. We help organizations transform unstructured image, video, text, and audio data into structured data significantly faster and more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with employees based remotely throughout the United States and in Tallinn, Estonia.

We have raised $100M in funding to date, with $60M coming from our most recent Series C, and are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm and Osage.

Clarifai is proud to be an equal opportunity workplace dedicated to pursuing, hiring, and retaining a diverse workforce.

Impact

We believe that world-class AI is built on a foundation of world-class data. The AI Data Lead will own the critical, end-to-end process of creating and curating the high-quality datasets that fuel our models. You will be a power user of Clarifai's suite of automated data labeling products, providing direct feedback to our product and engineering teams to drive continuous improvement.

Initially, this role will concentrate on building our next-generation vision datasets, with a heavy emphasis on full-motion video. Over time, the scope will strategically expand to include the development of our large-scale language datasets for advanced NLP models.

Opportunity
  1. Dataset Strategy & Pipeline Development:
  • Collaborate with ML and product teams to define data requirements, starting with complex video and image use cases and expanding into text and language.
  • Design and execute a comprehensive strategy for data acquisition and augmentation.
  • Build, scale, and maintain robust data pipelines to ingest, process, and version large-scale multimedia datasets.
  2. Third-Party Labeling & Internal Tool Management (Primary Focus):
  • Leverage Clarifai's automated and AI-assisted labeling tools to efficiently pre-label data and manage human-in-the-loop workflows.
  • Serve as the primary lead for external data labeling vendors, who will often verify or enrich AI-generated labels, ensuring projects are on time and within budget.
  • Author crystal-clear labeling instructions for complex tasks, from object tracking in video to, eventually, named entity recognition in text.
  • Implement and manage a rigorous quality assurance (QA) framework for both AI- and human-generated labels.
  3. Product Feedback & Improvement Loop:
  • Act as a key internal customer for Clarifai's data labeling products.
  • Provide structured, expert feedback to our product and engineering teams to identify bugs, suggest feature enhancements, and guide the product roadmap.
  • Continuously evaluate and pioneer new strategies for combining automated labeling with human verification to maximize quality and efficiency.
  4. Leadership & Collaboration:
  • Lead and mentor a focused set of data labeling partners.
  • Foster a culture of data excellence, ownership, and continuous improvement.
  • Communicate project status, challenges, and outcomes effectively to all stakeholders, and keep track of budgets.
Requirements
  • 3+ years in data engineering, with a proven history of building and managing complex data pipelines.
  • Direct, hands-on experience managing third-party data labeling services or in-house annotation teams.
  • Experience working with large-scale vision datasets (image or video).
  • Deep understanding of data labeling processes and quality metrics.
  • Strong proficiency in Python and SQL.
  • Experience with cloud data services (AWS, GCP, or Azure).
  • Exceptional project management, communication, and vendor management skills.
  • A meticulous eye for detail and an unwavering commitment to data quality.
Great to Have
  • Specific experience with the complexities of full-motion video datasets and annotation (e.g., temporal consistency, event tagging).
  • Experience in an environment where you regularly used internal tools and provided feedback for their improvement ("dogfooding").
  • Experience with large-scale language or text datasets.
  • Previous experience in a technical leadership or mentorship role.
  • Experience using a variety of data annotation platforms and tools.

Top Skills

AWS
Azure
GCP
Python
SQL
