Fusemachines

Mid Level Data Engineer


Pay: Competitive
Location: Remote
Employment type: Contract

  • Job Description

      Req#: C6w2xcwgNl

      About Fusemachines

      Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 400 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

      About the role:

      This is a remote contract position of 3 to 6 months, responsible for designing, building, testing, and maintaining the infrastructure and code required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).

      We are looking for a skilled Data Engineer with a strong background in Python, SQL, Pyspark, Databricks, Apache Kafka, Snowflake and Azure cloud-based large scale data applications with a passion for data quality, performance and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, implementation and testing of Data products in the Healthcare Insurance Industry. This role involves hands-on coding and collaboration with multi-disciplined teams to achieve project objectives, delivering a solution for Pricing Information Published under the Transparency in Coverage Final Rule.

      Qualification & Experience

      • Must have a full-time Bachelor's degree in Computer Science or a similar field.
      • At least 3 years of experience as a data engineer, with strong expertise in Python, Databricks, Snowflake, Apache Kafka, and Azure or other hyperscalers.
      • 3+ years of experience with Azure DevOps, GitHub.
      • Proven experience delivering large scale projects and products for Data and Analytics, as a data engineer.
      • The following certifications:
        • Microsoft Certified: Azure Fundamentals
        • Microsoft Certified: Azure Data Engineer Associate
        • Databricks Certified Associate Developer for Apache Spark
        • Databricks Certified Data Engineer Associate
        • SnowPro Core Certification
        • SnowPro Advanced Data Engineer
        • Microsoft Certified: Azure Solutions Architect Expert: nice to have

      Required skills/Competencies

      • Strong programming skills in one or more languages such as Python (must have) and Scala, and proficiency in writing efficient, optimized code for data integration, storage, processing, and manipulation.
      • Strong understanding and experience with SQL and writing advanced SQL queries.
      • Thorough understanding of big data principles, techniques, and best practices.
      • Solid Databricks development experience with significant Python, PySpark, Spark SQL, Pandas, NumPy in Azure environment.
      • Strong experience with scalable, distributed data processing technologies such as Spark/PySpark (must have; experience with Azure Databricks is a plus), dbt, and Kafka, in order to handle large volumes of data.
      • Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks, using open-source solutions and developing custom integration solutions as needed.
      • Skilled in Data Integration from different sources such as APIs, databases, flat files, Apache Parquet, event streaming, with technologies such as Databricks, Azure Data Factory.
      • Expertise in data cleansing, transformation, and validation.
      • Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB).
      • Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.
      • Strong experience in designing and implementing data warehouse, data lake, and data lakehouse solutions in Azure, Databricks, and Snowflake.
      • Good experience with Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT).
      • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
      • Strong knowledge of SDLC tools and technologies (Azure DevOps and GitHub), including project management software (Jira, Azure Boards, or similar), source code management (GitHub, Azure Repos, or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins, or similar), and binary repository managers (Azure Artifacts or similar).
      • Strong understanding of DevOps principles, including continuous integration, continuous delivery (CI/CD), infrastructure as code (IaC – Terraform), configuration management, automated testing, performance tuning and cost management and optimization.
      • Deep knowledge in cloud computing specifically in Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Functions, Azure Synapse Analytics, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.
      • Experience in Orchestration using technologies like Databricks workflows and Apache Airflow.
      • Strong knowledge of data structures and algorithms and good software engineering practices.
      • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
      • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
      • Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
      • Experience with BI solutions such as Power BI is a plus.
      • Good Problem-Solving skills: being able to troubleshoot data processing pipelines and identify performance bottlenecks and other issues.
      • Effective communication skills to collaborate with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.
      • Ability to document processes, procedures, and deployment configurations.
      • Understanding of security practices, including network security groups, Azure Active Directory, Azure Key Vault, encryption, and compliance standards.
      • Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge and working experience on various cloud security vulnerabilities and ways to mitigate them.
      • Self-motivated with the ability to work well in a team.
      • Possesses strong leadership skills, with a willingness to lead, generate ideas, and be assertive.
      • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.
      • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
      • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.
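      To make the "advanced SQL" expectation above concrete, here is a minimal, self-contained sketch of the kind of query involved: deduplicating versioned records with a window function, keeping only the latest row per key. It uses SQLite as a stand-in for Snowflake or Spark SQL, and all table and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for a warehouse table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT, updated_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("c1", "2024-01-01", 100.0),
     ("c1", "2024-02-01", 120.0),   # later version of the same claim
     ("c2", "2024-01-15", 80.0)],
)

# Keep only the most recent row per claim_id using the ROW_NUMBER() window function.
latest = conn.execute("""
    SELECT claim_id, amount FROM (
        SELECT claim_id, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY claim_id ORDER BY updated_at DESC
               ) AS rn
        FROM claims
    ) WHERE rn = 1
    ORDER BY claim_id
""").fetchall()

print(latest)  # [('c1', 120.0), ('c2', 80.0)]
```

      The same `ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)` pattern carries over directly to Spark SQL and Snowflake.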

      Responsibilities

      • Design, develop, test, and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration, and infrastructure, ensuring the scalability, reliability, and performance of data systems.
      • Contribute to detailed design, architectural discussions, and customer requirements sessions.
      • Actively participate in the design, development, and testing of big data products.
      • Construct and fine-tune Apache Spark jobs and clusters within Databricks on the Azure platform.
      • Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive) within Snowflake.
      • Design and develop clear, maintainable code with automated testing using Pytest, unittest, integration tests, performance tests, regression tests, etc.
      • Collaborate with cross-functional teams, including Product, Engineering, Data Scientists, and Analysts, to understand data requirements and develop data solutions, including reusable components that meet product deliverables.
      • Evaluate and implement new technologies and tools to improve data integration, processing, storage, and analysis.
      • Evaluate, design, implement and maintain data governance solutions: cataloging, lineage, data quality and data governance frameworks that are suitable for a modern analytics solution, considering industry-standard best practices and patterns.
      • Ensure data quality and accuracy.
      • Design, implement, and maintain data security and privacy measures.
      • Mentor and guide junior data engineers.
      • Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, being able to work independently as well as collaboratively.
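      As a sketch of the automated-testing responsibility above, here is a small, pure transformation with Pytest-style unit tests. The function, field names, and validation rules are hypothetical, loosely modeled on the pricing records the role works with; it is an illustration of the testing style, not the project's actual code.

```python
def normalize_rate(record: dict) -> dict:
    """Cleanse one pricing record: trim the code, coerce the rate to float,
    and reject rows that fail basic validation (hypothetical rules)."""
    code = record.get("billing_code", "").strip()
    rate = record.get("negotiated_rate")
    if not code or rate is None or float(rate) < 0:
        raise ValueError(f"invalid record: {record!r}")
    return {"billing_code": code, "negotiated_rate": float(rate)}


# Pytest-style tests: run with `pytest this_file.py`.
def test_normalize_rate_trims_and_coerces():
    out = normalize_rate({"billing_code": " 99213 ", "negotiated_rate": "125.50"})
    assert out == {"billing_code": "99213", "negotiated_rate": 125.50}


def test_normalize_rate_rejects_negative():
    import pytest
    with pytest.raises(ValueError):
        normalize_rate({"billing_code": "99213", "negotiated_rate": -1})
```

      Keeping transformations as small pure functions like this is what makes unit, integration, and regression testing of pipelines practical.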

      Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.

  • About the company

      Fusemachines, with a mission to Democratize AI, is making AI accessible to everyone with enterprise AI services and high-quality AI education.