Data Engineer – Python & Airflow
Cambridge, United Kingdom
Job Description
As a Data Engineer on our collaborative and talented team, you will work on our Python stack, which includes our data lake and processing pipelines.
This is a hybrid working opportunity based at our Cambridge office (CB1 2JH).
Role responsibilities
How will you make an impact?
- Design, develop, and maintain scalable and efficient data pipelines for ingesting, processing, and transforming large volumes of data.
- Optimize and tune existing data infrastructure for performance, reliability, and scalability.
- Find, extract, and validate valuable data from publicly available and proprietary sources to extend our internal data sets.
- Implement data governance and security measures to ensure the integrity and confidentiality of sensitive information.
- Develop and maintain a long-term vision for key components of our data infrastructure, ensuring scalability, reliability, and performance.
- Design and implement new features, demonstrating a deep understanding of data engineering principles and industry best practices.
- Work with cloud platforms to deploy and manage data solutions.
- Collaborate with cross-functional teams to understand business needs and provide data engineering support for various projects.
- Monitor and troubleshoot data pipeline issues, ensuring timely resolution and minimal disruption to data workflows.
- Stay current with industry trends and emerging technologies in data engineering and analytics, providing input during planning discussions to help shape the direction of projects and initiatives.
Do you have what we are looking for?
- Proven experience as a Data Engineer, with a strong portfolio of data-related projects.
- Proficiency in modern Python programming, ideally Python 3.9 or later.
- Familiarity with version control systems, such as Git.
- Solid understanding of data modelling, database design, and ETL processes.
- Experience with the Airflow pipeline framework, or similar frameworks such as Kedro, Luigi, or Argo.
- Proficiency in SQL and experience with both relational and NoSQL databases.
- Familiarity with data warehousing solutions and technologies.
- Familiarity with cloud data processing and compute services, e.g. EC2 and S3 in AWS, or their equivalents in Azure or GCP.
- Experience with data processing libraries like Pandas, NumPy or Dask.
- Experience with big data technologies such as Apache Spark, Hadoop, or similar frameworks is appreciated but not essential.
Do you have experience in any of our bonus areas?
- Familiarity with back-end technologies such as Alembic, FastAPI, or Flask.
- Experience leading and/or coordinating the implementation of data engineering features.
- Partnering with other technical teams to prepare technical requirements and designs.
- Mentoring junior colleagues to promote technical upskilling.
- A product-led engineering mindset.