Clinical Data Archiving

Introduction

Clinical data archiving includes planning, implementing and maintaining a repository of documents and records that contain clinical data, together with any interpretive information, from a clinical trial.

  • The clinical data archive should include a centralized table of contents for all studies.
  • Clinical trial data must be retained for at least two years after the last regulatory submission or after a decision to discontinue development of the product.

Scope

This section provides an outline to help clinical data managers develop an archiving strategy. It discusses the regulatory requirements surrounding clinical data archives, describes the components of an archive and gives information about data formats that can be used to support that archive. This document focuses on the components of the study archive that are the responsibility of data management. It does not discuss the archiving of study documents such as the study protocol and other regulatory documents, as these documents are seldom, if ever, a data management responsibility.

A summary of the types of information that should be included in a clinical data archive

Clinical data: All data collected in the trial. This includes both CRF data and data that is collected externally (e.g., labs, ECGs or electronic patient diaries).

External data: For data that is collected externally and loaded into a CDMS, the archive should include all of the load files.

Coding dictionaries: If data has been autoencoded using a company dictionary or synonym table, a copy of the dictionary should be included.

Lab ranges: Laboratory reference ranges. If more than one version of reference ranges were used in the course of the trial, each version should be retained in the archive.

Audit trail: Entire contents of the study audit trail.

Queries: Copies of all queries, query correspondence and query resolutions. Paper queries may be scanned and indexed.

CRF images in PDF format: For paper-based trials, CRF images are typically obtained by scanning the forms and converting them to PDF format. For electronic clinical trials, the EDC/M application may create PDF images of the electronic forms.

Minimum Standards

  • The clinical data archive should include a centralized table of contents for all studies.
  • The accessibility of the clinical data archive should be tested following every major upgrade of the active clinical data management system.
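
The two minimum standards above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the archive is a directory tree whose first level is one folder per study; the function names `build_table_of_contents`, `check_accessibility` and `toc_as_csv` are hypothetical and do not come from any archiving product. The sketch builds a centralized table of contents and then re-reads every listed file, which is one simple way an accessibility test might be run after a system upgrade.

```python
import csv
import io
from pathlib import Path


def build_table_of_contents(archive_root):
    """Return one row per archived file: (study, relative path, size in bytes)."""
    root = Path(archive_root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root)
            # Assumption: the first directory level under the root is the study ID.
            study = rel.parts[0] if len(rel.parts) > 1 else ""
            rows.append((study, rel.as_posix(), path.stat().st_size))
    return rows


def check_accessibility(archive_root, toc_rows):
    """Return the listed paths that can no longer be read in full."""
    root = Path(archive_root)
    unreadable = []
    for _study, rel_path, size in toc_rows:
        try:
            data = (root / rel_path).read_bytes()
        except OSError:
            # Missing or unreadable file.
            unreadable.append(rel_path)
            continue
        if len(data) != size:
            # File is readable but no longer has the recorded size.
            unreadable.append(rel_path)
    return unreadable


def toc_as_csv(toc_rows):
    """Serialize the table of contents as CSV text, an open format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["study", "path", "size_bytes"])
    writer.writerows(toc_rows)
    return buffer.getvalue()
```

A check like this only shows that the files can still be opened. Confirming that the content can still be interpreted by current tools would need format-specific tests in addition.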

Best Practices

  • All clinical data, metadata, administrative data and reference data should be maintained in an industry standard, open systems format, such as CDISC ODM.
  • An electronic repository links all study components including the clinical data, CRF (Case Report Form) images in PDF form, program files, validation records and regulatory documentation.
  • The audit trail should be stored in open format files in a secure file system location.
  • Copies of all user and system documentation for any applications used to collect or manage clinical data are retained in the corporate library or archive facility.
  • Reports describing the study metadata, including data structures, edit check descriptions and lab-loading specifications, are printed and stored in a central document library.
  • The study validation binder should be included in the document library.
  • System security reports, including user listings, access rights and the dates of authorization, should be printed and filed or scanned.
  • The edit check archive should include all program code for edit checks, functions and sub-procedures together with a copy of the version control information.
  • Paper CRFs should be scanned and indexed. If an Electronic Data Capture (EDC) system is used, entry screens should be archived as PDF.

Background

Most clinical data is collected as part of an effort to submit a licensing application to the FDA – either to CDER or CBER. The ICH GCP requirements stipulate that data collected in a clinical trial must be maintained for a period of two years either following the last regulatory submission or following a decision to discontinue development of a compound/biologic or medical device. To meet this requirement, as well as to ensure that the sponsor is able to answer questions relating to the clinical trial data that may emerge many years after the trial is conducted, it is important to archive clinical data.

Historically, the most common mechanism for long-term clinical data storage has been to extract the final data from the clinical data management system into SAS datasets. The extracted SAS datasets are still an important component of the clinical data archive. However, with the increasing importance of electronic regulatory submissions in recent years, requirements for clinical data archives are changing. Clinical records that are part of an electronic submission must now comply with 21 CFR Part 11, which was originally published in 1997. Part 11 sets specific requirements for the authentication and auditing of electronic records. In addition, the FDA’s Guidance for Industry: Computer Systems Used in Clinical Trials defines requirements for data archiving. This guidance was published in 1999 as a guide to the interpretation of Part 11 and other related policies. To fully meet the requirements of these regulations and guidelines, a more comprehensive archiving strategy is needed.

Regulations and Guidance

The 21 CFR Part 11 ruling includes no specific requirements for data retention or data archiving capabilities. However, the FDA has made it clear that the intent of the ruling is to supplement the predicate rules and ICH GCP requirements in cases where electronic records are, directly or indirectly, part of an electronic submission.

Guidance documents with specific mention of archive and record retention requirements include:

  • Guidance for Industry: Computer Systems Used in Clinical Trials (CSUCT) published by the FDA in 1999. This document describes fairly stringent requirements surrounding the need to preserve the systems environment in which electronic records are captured and managed.
  • ICH Good Clinical Practice (Section 4, Investigator, and Section 5, Sponsor) provides information about record retention requirements.
  • Draft Guidance for Industry: 21 CFR Part 11 Electronic Records; Electronic Signatures Maintenance of Records. This Draft Guidance, published in July 2002, addresses some of the concerns raised by industry representatives about the stringency of the CSUCT guidance with respect to archiving and describes an alternate strategy involving migration of systems.

Regulatory guidance is being actively developed in the area of electronic records handling. Before finalizing your clinical data archive design, consult the Regulatory Affairs specialists within your organization to ensure that your design approach is consistent with the organization’s regulatory policies. A well-designed clinical data archive can facilitate compliance with the long-term data access requirements of the regulations, whether for paper-based or for electronic clinical trials.

Archive Contents

To reconstruct a clinical trial successfully, an auditor must be able to view not only the clinical data, but also the manner in which the data was obtained and managed.

Many electronic data records are obtained using data entry screens from an Electronic Data Capture (EDC) or Clinical Data Management (CDM) system. In order to recreate the manner in which data are collected, it is necessary to be able to demonstrate the way that the data entry screens looked and behaved during the entry process. Fortunately, most data collection systems are capable of providing data entry screen printouts both with and without the clinical data. For systems that provide on-line edit checking during the entry process, metadata about the edit checks – including the field where the check is applied, the program code of the actual check, the dates when the check was active – should be part of the archive as well.
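
The edit check metadata described above (the field, the program code and the dates when the check was active) could be archived as simple structured records. The following Python sketch is one possible shape for such a record. The class `EditCheckRecord`, the helper `checks_active_on` and all field names are illustrative assumptions, not a format defined by any EDC or CDM system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EditCheckRecord:
    """Archived metadata for one online edit check (hypothetical layout)."""
    check_id: str
    field: str            # field where the check is applied
    program_code: str     # text of the check as it ran in the entry system
    active_from: date     # first day the check was active
    active_to: Optional[date] = None  # None means active until study close

    def was_active_on(self, day: date) -> bool:
        """True if the check was in force on the given day (inclusive bounds)."""
        if day < self.active_from:
            return False
        return self.active_to is None or day <= self.active_to


def checks_active_on(records, field, day):
    """Return the checks that applied to a field on a given entry date."""
    return [r for r in records if r.field == field and r.was_active_on(day)]
```

With records like these, an auditor asking which checks a value entered on a particular day was subjected to can be answered from the archive alone, without access to the original entry system.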

In many trials a large volume of data may come from external systems such as lab test results from a central lab or ECG data from an ECG core lab. External data of this type is typically batch loaded into an EDC or a CDM system. In order to re-create this data collection process, load program specifications, logs from the operation of the loading programs and all of the interim load files should be retained.

Once data has been entered into an in-house electronic record, it may be edited to correct transcription errors or transformed as part of a statistical calculation. Enough information must be retained in the archive to trace the data and any modifications to the data. Enough information must also be retained to demonstrate that all modifications to the data have been made in accordance with all applicable guidelines and regulations. This will include the system audit trail, discrepancy logs, queries, query replies and query resolutions.
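
As a rough illustration of what tracing a data point through its modifications involves, the Python sketch below replays archived audit trail entries for one field and flags gaps in the chain of changes. The `AuditEntry` layout and the function names are assumptions made for this example; real audit trail exports vary by system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEntry:
    """One archived audit trail row (hypothetical layout)."""
    timestamp: datetime
    subject: str
    field: str
    old_value: str
    new_value: str
    user: str
    reason: str


def field_history(entries, subject, field):
    """Return the changes to one subject's field in chronological order."""
    relevant = [e for e in entries if e.subject == subject and e.field == field]
    return sorted(relevant, key=lambda e: e.timestamp)


def find_chain_breaks(history):
    """Return entries whose old_value differs from the previous new_value.

    A break suggests that a modification is missing from the archived trail.
    """
    breaks = []
    for previous, current in zip(history, history[1:]):
        if current.old_value != previous.new_value:
            breaks.append(current)
    return breaks
```

An unbroken chain from the initial entry to the final value, with a user and reason on every step, is the kind of evidence the paragraph above asks the archive to preserve.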

For data that is managed externally, but which is loaded into an in-house system for reconciliation, review or other purposes, it is generally sufficient to limit the archive to the actual data and any information pertaining to how the data is managed internally. The vendor can archive any records that reflect how the data is managed in the vendor’s own system. The trial sponsor is ultimately responsible for ensuring that any vendor who provides trial data works in accordance with regulatory requirements. Therefore, the sponsor should ensure that any signed contract with a vendor includes a section on archiving. The information in this section should comply with both sponsor and regulatory requirements.

A summary of the types of information that should be included in a clinical data archive is provided in the table below:

| Archive Component | Requirement |
| --- | --- |
| Clinical data | All data collected in the trial. This includes both CRF data and data that is collected externally (e.g., labs, ECGs or electronic patient diaries). |
| External data | For data that is collected externally and loaded into a CDMS, the archive should include all of the load files. |
| Structural metadata | Information about the structure of the clinical data. Typically this will be information about the tables, variable and item names, forms, visits and any other objects. It also includes codelists. |
| Coding dictionaries | If data has been autoencoded using a company dictionary or synonym table, a copy of the dictionary should be included. |
| Lab ranges | Laboratory reference ranges. If more than one version of reference ranges was used in the course of the trial, each version should be retained in the archive. |
| Audit trail | Entire contents of the study audit trail. It is essential that the study audit trail be included in the archive in a tamper-proof format. |
| Listings of edit checks, derived data | Edit check definitions. These may be provided either as program listing files or as a report from the study definition application. |
| Discrepancy management logs | Listings of records that failed edit checks, together with information on how the discrepancies were managed during the course of the study. |
| Queries | Copies of all queries, query correspondence and query resolutions. Paper queries may be scanned and indexed. |
| Program code | Program code from data quality checking programs, data derivations and statistical analyses performed with the clinical data. Program documentation should be stored. Ideally, the program documentation should be kept online and indexed or hyperlinked. |
| CRF images in PDF format | For paper-based trials, CRF images are typically obtained by scanning the forms and converting them to PDF format. For electronic clinical trials, PDF images of the electronic forms may be created by the EDC/M application. |
| Data management plan | PDF or paper version of the MS Word and PowerPoint documents containing the study data management plan. |
| Study validation documentation | Contents are described in the GCDMP chapter on systems validation. This document may be in paper or electronic form. |
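
The table asks for the audit trail to be archived in a tamper-proof format but does not say how. One common supporting technique, shown here only as a sketch, is to record a cryptographic checksum for every archived file so that later changes become detectable. Strictly speaking this gives tamper evidence rather than tamper proofing, and it assumes the manifest itself is stored somewhere the archive's editors cannot alter. The function names are hypothetical; `hashlib` and `pathlib` are part of the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_root):
    """Map each file's path (relative to the archive root) to its checksum."""
    root = Path(archive_root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(archive_root, manifest):
    """Compare the archive with a stored manifest.

    Returns three sorted lists: files whose content changed, files that are
    missing, and files that were not in the manifest.
    """
    current = build_manifest(archive_root)
    changed = sorted(p for p in manifest if p in current and current[p] != manifest[p])
    missing = sorted(p for p in manifest if p not in current)
    unexpected = sorted(p for p in current if p not in manifest)
    return changed, missing, unexpected
```

Running `verify_manifest` periodically, and after any migration of the archive, gives a simple record that the audit trail and other components are unchanged since they were archived.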

Technical Requirements

Designing a Clinical Data Archive for long-term accessibility presents a challenge in the face of proprietary applications, tools and platforms. As technology evolves, vendors provide new versions of their systems; however, they are not always economically motivated to ensure backward compatibility. The older a file, the more likely the file format will not be readable using the current version of a system. For this reason, the ideal clinical data archive should be based on standards and open systems.

The open formats that are typically used for clinical study archives are described in the table below. No single format is ideal in all circumstances. Because a study archive will usually include many different types of information, it will most likely include multiple formats. The format chosen for each type of information should be based on the likely future use of the information. For example, if clinical data will need to be re-analyzed, it should be archived in a format that facilitates loading into a database or analysis tool.

| Format | Description | Pros and Cons |
| --- | --- | --- |
| Comma Separated Values (CSV) | Plain ASCII text with commas used as field delimiters. CSV files can be edited with text editors, word processors and spreadsheet programs such as Microsoft Excel. | Pros: Conceptually straightforward. Can be readily imported into almost any database. Cons: Requires separate handling of metadata, administrative data and audit trails. |
| XML | Extensible Markup Language. Vendor-independent, text-based technology for transfer of structured information between dissimilar systems. Used as the basis for the CDISC Operational Data Model (ODM). | Pros: As CDISC ODM, an open standard created specifically for clinical trial data. Can include structural metadata, administrative data and clinical data within a single file. Cons: Still unfamiliar to many data managers and IT staff. |
| SAS Version 5 transport files | Openly documented format published by SAS Institute. Commonly used for submitting clinical data to the FDA. Can be read by the SAS Viewer that is distributed free of charge on the SAS web site. | Pros: Familiar to clinical data managers and regulators. Works well with SAS data analysis tools. Cons: Proprietary format. Variable naming restrictions. Requires separate handling of metadata, administrative data and audit trails. |
| Adobe PDF | Openly published format provided by Adobe Systems. Widely used standard for transmission of text documents. Default format for transmission of information to the FDA. Can be read by the Acrobat Reader that is available free of charge from the Adobe web site. | |

Long-term data access requirements suggest that the choice of data format is limited to ASCII based formats or formats based on an open standard such as SAS Transport files. The choice may be further influenced by the format used in the original data management or data collection system.
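
To make the CSV trade-off concrete, the Python sketch below archives a small dataset as CSV text with its structural metadata in a separate JSON document, then reloads both into an in-memory SQLite database for re-analysis. The JSON sidecar layout, the column definitions and the function names are assumptions chosen for illustration and are not part of any standard. Only the Python standard library is used.

```python
import csv
import io
import json
import sqlite3


def export_csv_with_metadata(rows, columns):
    """Return (csv_text, metadata_json) for a list of row dictionaries.

    columns is a list of dicts with 'name', 'type' and 'label' keys.
    The metadata travels separately because CSV cannot carry it.
    """
    names = [c["name"] for c in columns]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(names)
    for row in rows:
        writer.writerow([row[name] for name in names])
    metadata = json.dumps({"columns": columns}, indent=2)
    return buffer.getvalue(), metadata


def load_into_sqlite(csv_text, metadata_json, table="clinical_data"):
    """Rebuild a typed table from the archived CSV and its metadata."""
    columns = json.loads(metadata_json)["columns"]
    connection = sqlite3.connect(":memory:")
    column_sql = ", ".join(f'"{c["name"]}" {c["type"]}' for c in columns)
    connection.execute(f'CREATE TABLE "{table}" ({column_sql})')
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    placeholders = ", ".join("?" for _ in columns)
    connection.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    return connection
```

Without the sidecar, the reloaded table would have column names but no types or labels, which is the "separate handling of metadata" cost noted for CSV in the table above. A format such as CDISC ODM avoids this by carrying the metadata in the same file.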

Archives for Clinical Sites

The CFR predicate rules and the ICH Good Clinical Practice (GCP) guidelines specify that a copy of the clinical data must be retained at the investigator site throughout the records retention period. For paper-based studies, this can be achieved by keeping a copy of the paper records at the site. For EDC studies that are conducted using an application service provider (ASP) model, it is important to have a strategy in place for ensuring that these guidelines are met. Many EDC vendors will provide PDF files of all of the electronic Case Report Forms (eCRFs) collected from the site during the trial.

Recommended Standard Operating Procedures

  • Study Archiving Procedures