Job Description
Role: Data Engineer
Responsibilities
- Co-develop and manage the unstructured data pipeline lifecycle from development to production.
- Enable NLP, entity extraction, permissibility tagging, and redaction modules for unstructured content.
- Advise on and manage automation/orchestration workflows.
- Assist in onboarding new data operations engineers.
- Establish reusable data engineering patterns for consuming data from document stores.
- Perform data modeling and architecture design across Lakehouse zones.
- Advise on engineering standards, best practices, and frameworks.
- Enable and maintain CI/CD DataOps capabilities.
Experience & Skills
- Over 10 years of IT experience.
- 4+ recent years working with the Azure stack.
- 3+ recent years creating pipelines with Azure Data Factory (ADF) involving multiple activities.
- 3+ recent years developing PySpark notebooks in Azure Databricks clusters.
- 2 years of experience with Agile SDLC methodologies and CI/CD pipelines using GitHub Actions.
- Experience working with offshore and onsite teams for coordinated delivery in an Agile SDLC.
- 3+ years of experience with Scrum, stand-ups, JIRA board management, estimation, and project-level staff augmentation.
- 3+ years of involvement in production deployment and post-deployment activities.
- Exposure to Retrieval-Augmented Generation (RAG), PyTorch, and Python API integration with vector databases.
- Familiarity with vectorization using OpenAI embeddings and storing data in vector databases.
Industry & Department
- Industry Type: IT Services & Consulting
- Department: Engineering - Software & QA
Employment & Education
- Employment Type: Full Time, Permanent
- Education: Any graduate or postgraduate degree.
Additional Technologies & Concepts
- Data and AI, NLP, Generative AI, Azure