Metadata-Driven Approach for Clinical Data Lakes

Life Sciences, Clinical Trials
Thursday, March 21, 2019

Clinical data lakes store, cleanse and unify data that can come from different source systems. Some sources of data include:

  • Clinical Trial Management Systems (CTMS)
  • Electronic Data Capture (EDC)
  • Third-party data sources:
    • Files from Application Programming Interface (API)-based connections
    • Trial Master File (TMF) systems
    • Electronic Medical Records (EMR) or Electronic Health Records (EHR)
    • Lab data
    • Wearable devices

Because the data comes from such varied sources, clinical data lakes can become complex, even though the same datasets can serve multiple business use cases. More source systems feeding the data lake means greater data redundancy, which makes maintenance and support difficult. Moreover, as more use cases emerge for the data stored in clinical data lakes, complexities and redundancies also emerge across data models, data transformation jobs and data fetching (read operations).

Metadata-driven design and architecture can greatly reduce or solve these challenges.

This approach lets a system treat each data element based on what is available in the system and what a given use case needs. It then uses a repository of metadata, along with dynamic Artificial Intelligence/Machine Learning (AI/ML)-driven data pipelines, to ingest, standardize, transform and read datasets without creating redundancy or adding support and maintenance overhead.
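As a rough illustration of the idea, the sketch below keeps the field mappings for each source system in a small in-memory metadata repository and runs every record through one generic standardization function. The source names, field names, `FieldRule` and `standardize` are all hypothetical, and the AI/ML-driven parts of the pipeline are left out; it is a minimal sketch under those assumptions, not the architecture presented in the webinar.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldRule:
    """One metadata entry: where a source field lands and how it is normalized."""
    target: str                    # field name in the unified model (hypothetical)
    convert: Callable[[Any], Any]  # normalization applied on ingest


# Hypothetical metadata repository: one entry per source system.
# Onboarding a new source means adding an entry here, not writing a new pipeline.
METADATA_REPOSITORY: dict[str, dict[str, FieldRule]] = {
    "edc": {
        "SUBJID": FieldRule("subject_id", str),
        "WT_KG": FieldRule("weight_kg", float),
    },
    "wearable": {
        "patient": FieldRule("subject_id", str),
        # Pounds to kilograms, rounded to two decimals.
        "weight_lb": FieldRule("weight_kg", lambda lb: round(float(lb) * 0.45359237, 2)),
    },
}


def standardize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Apply the rules registered for `source`; fields without a rule are dropped."""
    rules = METADATA_REPOSITORY[source]
    return {
        rule.target: rule.convert(record[name])
        for name, rule in rules.items()
        if name in record
    }


if __name__ == "__main__":
    print(standardize("edc", {"SUBJID": 101, "WT_KG": "70.5"}))
    print(standardize("wearable", {"patient": "101", "weight_lb": 155.4}))
```

Because the transformation logic reads its rules from the repository, both sources end up in the same `subject_id` / `weight_kg` shape without a source-specific job for each one.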

The key components of a clinical data lake that uses a metadata-driven approach are:

  • Metadata Repository
  • Metadata Identification and Parsing Service
  • AI/ML Models as a Service for inference
  • Workflow Automation Service

Key Amazon Web Services (AWS) solutions can be leveraged to build a robust metadata-driven clinical data lake. These include:

  • Amazon Simple Storage Service (Amazon S3)
  • AWS Lambda
  • Amazon Elastic Compute Cloud (Amazon EC2)
  • Amazon Elastic Container Service (Amazon ECS)
  • AWS Elastic Beanstalk
  • Amazon DynamoDB and Redis
  • Amazon Redshift

Join this free webinar to learn how a metadata-driven approach helps data analysts and bioinformaticians focus on data analysis without worrying about data management activities. Learn how this approach helps scale clinical data lakes by onboarding new source systems and enabling more use cases without rebuilding data pipelines or redesigning data models, with the added benefit of easier support and maintenance and richer audit trails for governance.


Dr. Aaron Friedman, Partner Network Global Healthcare and Life Sciences Technical Lead, Amazon Web Services (AWS)

Dr. Aaron Friedman is the Amazon Web Services (AWS) Partner Network Global Healthcare and Life Sciences technical lead. He works with independent software vendors and systems integrators to build healthcare solutions on AWS and bring the best possible experience to their customers. His passion is working at the intersection of science, big data and software. Prior to working at AWS, he was the first technical employee at Human Longevity, Inc., where he built omic-guided health solutions. Aaron holds a PhD in Biomedical Sciences from the University of California, San Diego and graduated summa cum laude from Washington University in St. Louis with a Bachelor of Science in Biomedical Engineering.


Krunal Patel, Vice President, Engineering, Saama

Krunal Patel is Vice President of Engineering at Saama. He is the company's resident tech wizard, having mastered over 40 technologies. Krunal has proven expertise in designing, building and delivering complex enterprise-grade software products, mainly in the life sciences domain. His management skills have helped him lead technical and business teams, coordinate consultants and direct high-profile programs to success. Versatile in tech, projects, design and delivery, Krunal has helped businesses win by designing cutting-edge technology solutions and participating in sales cycles.


Who Should Attend?

Senior professionals from large and mid-market pharma/biotech companies and contract research organizations (CROs) involved in:

  • Clinical Operations
  • Clinical Research & Development
  • Data Analytics, Engineering & Stewardship
  • Medical Affairs / Chief Medical Office
  • Regulatory Affairs
  • Strategic Planning

The session will benefit companies working on Phase I-IV clinical studies.

What You Will Learn

In this webinar, attendees will learn about:

  • Adding intelligence to clinical data pipelines

Xtalks Partner

Saama Technologies

Saama Technologies is the advanced data and analytics company delivering actionable business insights for life sciences and the Global 2000. Saama is singularly focused on driving fast, flexible, impactful business outcomes for its clients through advanced data and analytics. Saama’s unique “hybrid” approach integrates focused solutions and expertise across the life sciences domain, business consulting, machine learning, automated data management, cloud and big data technologies. Saama’s approach integrates manual and disconnected data initiatives into a well-aligned roadmap facilitating the client’s journey from strategy through solution implementation.
