What Does "Good" Look Like In RWE Research?

Establishing the Fundamentals for Data Integrity and Governance

This recommendation sets out a standardised research framework that incorporates current best practices for documenting the assessment of real-world data quality and its fitness for purpose in regulatory and HTA decision-making. The recommendations are organised into three key elements (data extraction, data curation and data characterisation), each with a prioritisation level (essential, important or optional) and a rationale.

This page lists all subrecommendations linked to the overarching recommendation 2, "Establishing the Fundamentals for Data Integrity and Governance".

Subrecommendation 2.1: Data Characterisation

Systematic and standardised approaches to evaluating and documenting the properties, quality, and limitations of data sources will enable assessment of their representativeness for the target population, period, and outcomes of interest, while identifying potential biases and constraints.

Rationale

Data sources for RWE can be heterogeneous and may not always be fit for purpose. Adequate data characterisation enables evaluation of the dataset's representativeness for the target population, the time period and the outcomes of interest. Systematic and structured reporting of the dataset's strengths, limitations and potential biases enhances its transparency and reliability.

Details

The study development should include, as a minimum: 

Essential
  1. Data Source Context
    1. Document complete data source context, including:
      1. Purpose of original data collection.
      2. Incentives or regulations that may affect data recording.
      3. Changes in data collection over time.
  2. Fitness-for-Purpose Assessment
    1. Provide structured evaluation of:
      1. Alignment between variables and endpoints.
      2. Target population representativeness.
      3. Endpoint validity in real-world context.
      4. Power/sample size adequacy for the research question.
  3. Missing Data Analysis
    1. Conduct and document thorough missing data analysis, including:
      1. Potential impacts of missing data and limitations on study conclusions.
      2. Sensitivity analyses using different missing data approaches.
      3. Transparent reporting of missingness by key variable and of whether it invalidates the research objectives.
Important 
  1. Representative Population Analysis
    1. Document population characteristics, including:
      1. Demographic comparison to target population.
      2. Selection factors influencing data collection.
      3. Coverage of relevant subpopulations.
      4. Generalisability assessment.
  2. Variable Validity Assessment
    1. For key study variables, document:
      1. Comparison to gold standard measures (if applicable).
      2. Validation studies (if available).
      3. Known measurement errors or biases.
      4. Construct validity assessment (e.g. observed vs expected distributions).
Optional
  1. Data Lineage Visualisation
    1. Provide visual representation of:
      1. Data flows from source to analysis.
      2. Transformation and linkage processes.
      3. Quality check points.
      4. Decision points in data processing.
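
As an illustration of the "reporting of missingness by key variable" item under Missing Data Analysis, the following is a minimal Python sketch, not a prescribed method. The function name, the variable names and the example records are hypothetical, and the choice of which values count as missing (here `None` and the empty string) is an assumption that would need to be adapted to each data source.

```python
from typing import Any

# Values treated as "missing" are an assumption; real sources often encode
# missingness differently (e.g. "UNK", -9, or special codes).
MISSING_VALUES = (None, "")


def missingness_by_variable(
    records: list[dict[str, Any]], key_variables: list[str]
) -> dict[str, float]:
    """Return the percentage of records with a missing value, per key variable."""
    if not records:
        raise ValueError("no records to assess")
    report = {}
    for var in key_variables:
        # A variable absent from a record is counted as missing as well.
        n_missing = sum(1 for rec in records if rec.get(var) in MISSING_VALUES)
        report[var] = round(100 * n_missing / len(records), 1)
    return report


# Hypothetical illustrative records, not real patient data.
records = [
    {"age": 64, "sex": "F", "hba1c": 7.1},
    {"age": 58, "sex": "", "hba1c": None},
    {"age": None, "sex": "M", "hba1c": 6.4},
    {"age": 71, "sex": "M", "hba1c": None},
]

report = missingness_by_variable(records, ["age", "sex", "hba1c"])
for variable, pct in report.items():
    print(f"{variable}: {pct}% missing")
```

A table of this kind only covers the descriptive step; whether the observed missingness compromises the research objectives, and how results change under different missing data approaches, still has to be assessed and documented separately.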

Subrecommendation 2.2: Data Extraction

Transparent and standardised approaches to retrieving relevant information and preparing data for analysis will demonstrate alignment with study objectives, reduce risks of bias, and ensure consistency and reliability.

Rationale

In RWE studies, transparent and well-documented extraction is essential to show how the data align with study objectives, ensure reproducibility, and minimise risks of bias. Inconsistent documentation of data extraction practices can undermine the credibility of findings and limit their use in decision-making.

Details

The study development should include, as a minimum: 

Essential
  1. Study Objective Alignment (mainly for researchers and study sponsors), including:
    1. Document how the extraction plan aligns with specific study objectives and research questions.
    2. Demonstrate how extracted variables directly address the research question and study endpoints.
    3. Identify and justify any indirect variables in relation to the research question.
  2. Dataset Documentation
    1. Record complete dataset provenance, including:
      1. Dataset name and version.
      2. Data provider and extraction date.
      3. Timeframe covered (start/end dates).
  3. Data Transformation Documentation
    1. Document all data transformations, including:
      1. Coding algorithms for derived variables.
      2. Natural language processing or AI methodologies used.
      3. Version control for extraction code/queries.
Important
  1. Implement and document Quality Controls in the study documents, including:
    1. Validation procedures for extraction tools.
    2. Procedures for handling missing or inconsistent data.
    3. Audit trails of extraction processes.
Optional
  1. Alternative Extraction Strategies, including:
    1. Document consideration of alternative extraction approaches.
    2. Justify selection of final extraction methodology.
    3. Report sensitivity analyses using alternative extraction methods.
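
The Dataset Documentation and Data Transformation Documentation items above could be captured in a machine-readable provenance record. The following Python sketch is one possible shape for such a record, under stated assumptions: the class name, field names, example values and the idea of fingerprinting the extraction query with SHA-256 are all illustrative choices, not requirements of the recommendation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date


def fingerprint(query_text: str) -> str:
    """SHA-256 of the exact extraction code/query, to tie results to a version."""
    return hashlib.sha256(query_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ExtractionProvenance:
    dataset_name: str
    dataset_version: str
    data_provider: str
    extraction_date: date
    coverage_start: date  # start of the timeframe covered
    coverage_end: date    # end of the timeframe covered
    query_sha256: str

    def __post_init__(self) -> None:
        # Basic sanity check on the documented timeframe.
        if self.coverage_start > self.coverage_end:
            raise ValueError("coverage_start must not be after coverage_end")


# Hypothetical query and values, for illustration only.
query = "SELECT patient_id, dx_code, dx_date FROM diagnoses"

provenance = ExtractionProvenance(
    dataset_name="example_registry",
    dataset_version="2024.1",
    data_provider="Example Data Provider",
    extraction_date=date(2024, 3, 1),
    coverage_start=date(2015, 1, 1),
    coverage_end=date(2023, 12, 31),
    query_sha256=fingerprint(query),
)

# Dates are not JSON-serialisable by default, so they are written as ISO strings.
record_json = json.dumps(asdict(provenance), default=str, indent=2)
print(record_json)
```

Storing the query fingerprint next to the dataset version means that any later change to the extraction code produces a different record, which supports the audit trail listed under Quality Controls.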

Subrecommendation 2.3: Data Curation

Cleaning, standardising, and organising data to meet quality standards for analysis will strengthen the validity and reproducibility of results.

Rationale

Data curation steps are essential for maintaining data integrity and ensuring that study results are reproducible and reliable. Without systematic curation practices, issues such as missing values, coding inconsistencies, or undocumented transformations can compromise the validity of findings.

Details

The study development should include, as a minimum: 

Essential
  1. Comprehensive Metadata Management
    1. Document complete metadata, including:
      1. Standard terminologies used.
      2. Mapping to reference terminologies where applicable.
      3. Versioning of terminologies.
      4. Relationship between structured and unstructured data elements.
  2. Data Cleaning Protocols
    1. Document systematic cleaning procedures, including:
      1. Outlier detection and handling methods.
      2. Missing data assessment and imputation strategies.
      3. Duplicate identification and resolution.
      4. Inconsistency detection and correction.
    2. Provide rationale for all data modification decisions.
  3. Multi-source Data Harmonisation
    1. For studies combining multiple data sources:
      1. Document variable harmonisation procedures.
      2. Map and validate common data elements across sources.
      3. Assess and document completeness of data mapping.
      4. Address temporal alignment issues between sources.
Important
  1. Data Quality Metric Reporting
    1. Implement and report standardised quality metrics:
      1. Completeness (% of missing values by variable).
      2. Consistency (internal validation checks passed).
      3. Plausibility (% of values within expected ranges).
      4. Timeliness (lag between data collection and availability).
  2. Issue Resolution Documentation
    1. Maintain structured documentation of:
      1. Data quality issues identified.
      2. Methods used to address issues.
      3. Impact assessment of data quality problems.
      4. Unresolved limitations remaining after curation.
Optional

These elements represent best practices that further enhance data curation transparency and trust. They may be useful for complex studies, novel data sources, or submissions where curation methods may significantly impact results.

  1. Curation Code Transparency
    1. Share with decision-makers the curation code or detailed workflows.
    2. Document software and versions used.
    3. Provide change logs for iterative curation processes.
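
Three of the standardised quality metrics listed under Data Quality Metric Reporting (completeness, plausibility and timeliness) can be computed with very little code. The Python sketch below is illustrative only: the function names, the example values and the plausible range of 50 to 300 mmHg for systolic blood pressure are assumptions, and consistency checks are omitted because the validation rules are study-specific.

```python
from datetime import date
from statistics import median
from typing import Optional, Sequence


def missing_pct(values: Sequence[Optional[float]]) -> float:
    """Completeness, reported as the % of missing values for one variable."""
    if not values:
        raise ValueError("no values to assess")
    return round(100 * sum(v is None for v in values) / len(values), 1)


def plausible_pct(values: Sequence[Optional[float]], low: float, high: float) -> float:
    """Plausibility: % of observed (non-missing) values within [low, high]."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to assess")
    in_range = sum(low <= v <= high for v in observed)
    return round(100 * in_range / len(observed), 1)


def median_lag_days(pairs: Sequence[tuple[date, date]]) -> float:
    """Timeliness: median lag in days between data collection and availability."""
    return median((available - collected).days for collected, available in pairs)


# Hypothetical systolic blood pressure values (mmHg); None marks a missing value.
sbp = [120, 135, None, 310, 118]

# Hypothetical (collection date, availability date) pairs.
dates = [
    (date(2024, 1, 1), date(2024, 1, 15)),
    (date(2024, 2, 1), date(2024, 2, 11)),
    (date(2024, 3, 1), date(2024, 3, 31)),
]

print("missing %:", missing_pct(sbp))
print("plausible %:", plausible_pct(sbp, low=50, high=300))
print("median lag (days):", median_lag_days(dates))
```

Note that plausibility is computed over observed values only, so it should be read alongside the completeness figure rather than on its own.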

This page belongs to a series of pages about the IDERHA report "Recommendations on policies to support the acceptance of heterogeneous health data research in regulatory and HTA decision-making", published in November 2025. The full report is available as a PDF, or you can visit the page with an executive summary.