Fusemachines

Data Engineer


Pay: Competitive
Location: Remote
Employment type: Contract

This job is now closed

  • Job Description

      Req#: RSDhAWdTxn

      About Fusemachines

      Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, United States, Canada, and the Dominican Republic) and more than 450 full-time employees, Fusemachines seeks to bring its global expertise in Data, Analytics, and AI to transform companies around the world.

      About the role:

      We are seeking a highly skilled and experienced AWS Data Engineer to join our team on a remote, contract basis, working in Eastern/Central US hours. In this role, you will be responsible for designing, building, and maintaining the infrastructure and code required for efficient, optimized data integration, storage, and processing, for a legal-industry company focused on automating processes with AI.

      Qualification & Experience

      • Must have a full-time Bachelor's degree in Computer Science or a similar field.
      • At least 3 years of experience as a data engineer with strong expertise in Python, AWS and APIs.
      • 3+ years of experience with Azure DevOps, GitHub and Jenkins.
      • Proven experience delivering large-scale projects and products for Data and Analytics as a data engineer.
      • The following certifications:
        • AWS Certified Cloud Practitioner
        • AWS Certified Data Engineer - Associate
        • Databricks Certified Associate Developer for Apache Spark: nice to have
        • Databricks Certified Data Engineer Associate: nice to have

      Required skills/Competencies

      • Strong programming skills in one or more languages such as Python (must have) and Scala, with proficiency in writing efficient, optimized code for data integration, storage, processing, and manipulation.
      • Strong understanding and experience with SQL and writing advanced SQL queries.
      • Thorough understanding of big data principles, techniques, and best practices.
      • Strong experience with scalable and distributed Data Processing Technologies such as Spark/PySpark (must have, experience with Databricks is a plus), DBT and Kafka, to be able to handle large volumes of data.
      • Strong experience in designing and implementing efficient ELT/ETL processes, with the ability to develop custom integration solutions as needed.
      • Skilled in data integration from different sources such as APIs, databases, flat files (e.g., Apache Parquet), and event streaming.
      • Expertise in data cleansing, transformation, and validation.
      • Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB).
      • Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture and support data solutions.
      • Strong experience in designing and implementing data warehousing, data lake, and data lakehouse solutions in AWS.
      • Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
      • Strong knowledge of SDLC tools and technologies, including project management software (Jira, Azure Boards, or similar), source code management (GitHub, Azure Repos, or similar), CI/CD systems (GitHub Actions, AWS CodeBuild, AWS CodePipeline, Azure Pipelines, Jenkins, or similar), and binary repository managers (AWS CodeArtifact, Azure Artifacts, or similar).
      • Strong understanding of DevOps principles, including continuous integration, continuous delivery (CI/CD), infrastructure as code (IaC – Terraform), configuration management, automated testing, performance tuning and cost management and optimization.
      • Deep knowledge of cloud computing, specifically AWS services related to data and analytics, such as Glue, SageMaker, Redshift, Lambda, Kinesis, S3, Lake Formation, EC2, ECS/ECR, EKS, IAM, CloudWatch, Cosmos DB, etc.
      • Experience in Orchestration using technologies like Databricks workflows and Apache Airflow.
      • Strong knowledge of data structures and algorithms and good software engineering practices.
      • Experience building scalable, production-ready REST APIs.
      • Knowledge of API design, testing and documentation.
      • Experience with Python web frameworks like Flask and FastAPI.
      • Nice to have: Experience with EKS and Helm Charts.
      • Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
      • Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
      • Good understanding of Data Quality and Governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
      • Good problem-solving skills: able to troubleshoot data processing pipelines and identify performance bottlenecks and other issues.
      • Effective communication skills to collaborate with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.
      • Ability to document processes, procedures, and deployment configurations.
      • Understanding of security practices, including network security groups, encryption, and compliance standards.
      • Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge and working experience on various cloud security vulnerabilities and ways to mitigate them.
      • Self-motivated with the ability to work well in a team.
      • A willingness to stay updated with the latest services, Data Engineering trends, and best practices in the field.
      • Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
      • Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.
      • Ability to quickly ramp up and contribute to an existing codebase.

      Responsibilities:

      • Follow established designs and constructed data architectures; develop and maintain data pipelines, ensuring data flows smoothly from source to destination.
      • Handle ELT/ETL processes, including data extraction, loading and transformation.
      • Contribute to data quality assurance efforts, such as implementing data validation checks and tests to ensure accuracy, completeness and consistency of data.
      • Test software solutions and meet product quality standards prior to release to QA.
      • Ensure the reliability, scalability, and efficiency of data systems at all times. Identify and resolve performance bottlenecks in pipelines arising from data, queries, and processing workflows to ensure efficient and timely data delivery.
      • Design, test, and document APIs.
      • Work with DevOps teams to optimize our resources, such as storage.
      • Provide guidance and mentorship to junior data engineers, fostering their professional growth.
      • Assist in the configuration and management of data warehousing and data lake solutions.
      • Collaborate closely with cross-functional teams including Product, Engineering, Data Scientists, and Analysts to thoroughly understand data requirements and provide data engineering support.
      • Take ownership of the storage layer and database management tasks, including schema design, indexing, and performance tuning.
      • Evaluate and implement cutting-edge technologies and continue learning and expanding skills in data engineering and cloud platforms.
      • Develop, design, and execute data governance strategies encompassing cataloging, lineage tracking, quality control, and governance frameworks that align with current analytics demands and industry best practices, working closely with the Data Architect.
      • Ensure technology solutions support the needs of the customer and/or organization.
      • Define and document data engineering processes and data flows.
      Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.
  • About the company

      Fusemachines, with a mission to Democratize AI, is making AI accessible to everyone with enterprise AI services and high-quality AI education.