Pfizer Docs Part 2: What's a CTD?
We did some organizing but it still just looks like a big pile of crap. What is supposed to be here, and how is it structured?
Welcome to part 2 of Reading 24,000 Pages of Documents Until I Scream. If you’re starting with this post and want to read the previous post because you like doing things in order then click here:
If you want a copy of the spreadsheet I created listing all the files, click here:
The pages being released by the FDA all come from the Biologics License Application (BLA) submitted by Pfizer for approval of their vaccine Comirnaty (which they have been distributing under Emergency Use Authorization as BNT162b2). The next step in our analysis is to ask ourselves what is supposed to be included in the BLA. If we don’t understand what’s supposed to be there, how will we know if anything is missing?
Since the FDA numbered all the pages, let’s first see if there are any gaps in their page numbering. These page numbers were generated by the FDA so we’re referring to them as FOIA page numbers to avoid any confusion with the actual document page numbers from Pfizer.
If you read the last post you already know that CBER means the Center for Biologics Evaluation and Research (a part of the FDA) and 2021-5683 is the FOIA request number to which they are responding. After that, they started numbering pages at 0000001. So we know in advance they plan on releasing 9,999,999 pages or fewer, since that’s all the digits they gave themselves (that’s partially a joke, but it ties in with something later).
The page numbers indicate 24,281 pages but there are some gaps in the pages the FDA actually sent, and this dump includes 22,987 numbered pages. There are some extra files they didn’t number, which is probably an error since those files belong in Module 5 of the CTD (that will make sense in a moment).
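If you want to do the gap check yourself, here is a minimal sketch in Python. It assumes you have already pulled the first and last FOIA page number out of each file (for example from the spreadsheet). The function name find_gaps and the example ranges are made up for illustration and are not the real FOIA numbers; only the two totals at the bottom come from this post.

```python
def find_gaps(ranges, first=1, last=None):
    """Return the (start, end) page ranges that no file covers.

    ranges: iterable of (first_page, last_page) tuples, one per file.
    """
    gaps = []
    expected = first  # next page number we expect to see
    for start, end in sorted(ranges):
        if start > expected:
            gaps.append((expected, start - 1))
        # max() keeps us safe if two files overlap
        expected = max(expected, end + 1)
    if last is not None and expected <= last:
        gaps.append((expected, last))
    return gaps


# Hypothetical page ranges, NOT the real FOIA numbers
example_ranges = [(1, 40), (41, 95), (120, 300), (250, 310)]
gaps = find_gaps(example_ranges, last=320)
missing = sum(end - start + 1 for start, end in gaps)
print(gaps, missing)

# The two totals quoted in this post
highest_foia_page = 24_281
numbered_pages_received = 22_987
pages_not_received = highest_foia_page - numbered_pages_received
print(pages_not_received)
```

With the totals quoted above, the difference works out to 1,294 page numbers that have been assigned but not yet released.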
The FDA hasn’t done us any favors assigning the FOIA numbers because they didn’t put the files in order beforehand. That means we can use the FOIA numbers to track what the FDA sent us, but we can’t use them to tell us if we’re seeing all the Pfizer documents from the Biologics License Application (BLA).
If we want to understand what we’re seeing, and how much of it has been released, we need to learn a tiny bit about how these BLAs are structured.
The Common Technical Document (CTD) format
The BLA is structured per something called the CTD format, and the FDA has published guidance on it (linked at the end of this post). This format was developed to try and harmonize these types of documents among the United States, the European Union, and Japan.
There are 5 Modules in the CTD format:
Module 1 - Administrative and Prescribing Information
Module 2 - Common Technical Document Summaries
Module 3 - Quality
Module 4 - Nonclinical Study Reports
Module 5 - Clinical Study Reports
Modules
The Pfizer BLA documents released so far are from Modules 1, 2, 4, and 5. Most of the files in this dump are from Module 5, which is the clinical study module.
Some of the file names tell us which Module the file belongs in; others don’t. For example, Case Report Forms are all part of Module 5, but four of the files have names like CRFs-for-site-1055.pdf (a strange name since this file only contains one CRF, not multiple CRFs). But we can organize these ourselves since we now know what belongs in each module.
[Update: After reviewing these in more detail, I realized that the file contains the record for one study participant, and each participant record consists of many Case Report Forms. The rest of this post has been edited to clarify this point.]
Of the first 150 files released, 125 are from Module 5, which contains the actual clinical study and postmarketing data. The spreadsheet over on GitHub will be updated to organize files based on Modules rather than just in the order of FOIA numbers, which clearly makes more sense.
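Here is a rough sketch of how that sorting could be done in Python. The file names are ones mentioned in this post, but the three patterns and the guess_module helper are my own guesses at the naming habits, not an official convention, so anything that comes back as None still needs a manual look.

```python
import re
from collections import defaultdict

# Assumed naming patterns (guesses based on the file names seen so far)
PATTERNS = [
    re.compile(r"_M([1-5])_"),         # e.g. 125742_S1_M5_...
    re.compile(r"Section-([1-5])\."),  # e.g. ...-Section-2.5-...
    re.compile(r"^([1-5])\.\d"),       # e.g. 5.3.6-...
]


def guess_module(filename):
    """Return the CTD module number (1-5) or None if we can't tell."""
    for pattern in PATTERNS:
        match = pattern.search(filename)
        if match:
            return int(match.group(1))
    # Case Report Forms are all part of Module 5
    if filename.startswith("CRFs-for-site-"):
        return 5
    return None


filenames = [
    "125742_S1_M1_cover.pdf",
    "STN-125742_0_0-Section-2.5-Clinical-Overview.pdf",
    "5.3.6-postmarketing-experience.pdf",
    "CRFs-for-site-1055.pdf",
    "Supplemental-Index-12-22-21.pdf",
]

by_module = defaultdict(list)
for name in filenames:
    by_module[guess_module(name)].append(name)

for module, names in by_module.items():
    print(module, names)
```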
Pfizer has provided two versions of the Table of Contents for the BLA, following the CTD format. The first of these is part of the Request for Priority Review Designation (125742_S1_M1_cover.pdf) and this request letter is part of Module 1. They also included another, more detailed version (Supplemental-Index-12-22-21.pdf), which appears to be some screen captures from whatever software they were using for project tracking.
A note on the Case Report Forms
The participant records are huge - they are a big fraction of the page count so we’re going to briefly discuss them before looking at more interesting things.
Each subject has a participant record that contains many Case Report Forms (CRFs), and these records can be hundreds of pages long. The FDA sent nine files containing participant records in this batch, and those nine files contain records for 52 subjects.
Notice the number of pages - these files contain 13,037 pages for those 52 subjects, or an average of 250 pages per subject (the shortest is 123 pages, the longest is 692).
On a side note: the entire trial had about 44,000 subjects, and at 250 pages each this would be 11,000,000 pages of participant records (more than the range of FOIA page numbers). As of today, 13,037 pages of our 22,987 total pages are just these files, which is over 50% of the page count.
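If you want to check that arithmetic, here it is as a few lines of Python. All the numbers are the ones quoted above; the 44,000 is the rounded trial size and the 250 is the rounded average, so treat the projection as a back-of-the-envelope estimate.

```python
crf_pages = 13_037             # pages of participant records in this batch
subjects_in_batch = 52
total_numbered_pages = 22_987  # all numbered pages released so far
trial_subjects = 44_000        # "about 44,000", rounded
max_foia_page = 9_999_999      # largest number that fits in 7 digits

avg_pages = crf_pages / subjects_in_batch
projected_pages = trial_subjects * 250  # using the rounded average
crf_share = crf_pages / total_numbered_pages

print(f"average pages per subject: {avg_pages:.1f}")
print(f"projected participant record pages: {projected_pages:,}")
print(f"exceeds 7-digit FOIA range: {projected_pages > max_foia_page}")
print(f"share of released pages: {crf_share:.1%}")
```

The unrounded average is about 250.7 pages per subject, and the participant records make up about 56.7% of the numbered pages so far.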
A note on site and subject numbers
While we’re looking at CRFs, it’s a good time to learn about subject numbers. Each subject (participant) has an eight-digit number, of which the first four digits are the site number, followed by a four-digit number for the subject.
Following a very common convention designed to prevent confusion, the site numbers start with 1001. This way every site has a 4-digit number, and they’re numbered 1001 - 1270. The same is true for the subjects (starting with 1001) so the first subject at the first site is subject number 10011001.
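For the programmers, splitting a subject number into its two halves looks something like this in Python. The function name split_subject_number is just something I made up for illustration.

```python
def split_subject_number(subject_number: str):
    """Split an eight-digit subject number into (site, subject_at_site)."""
    if (len(subject_number) != 8
            or not subject_number.isascii()
            or not subject_number.isdigit()):
        raise ValueError(f"expected eight digits, got {subject_number!r}")
    site = int(subject_number[:4])             # first four digits
    subject_at_site = int(subject_number[4:])  # last four digits
    return site, subject_at_site


# First subject at the first site
print(split_subject_number("10011001"))
```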
In the table above, file 125742_S1_M5_CRF_c4591001-1085-10851018.pdf contains a single record for subject 10851018. Since we’ve been learning what all the numbers mean we can now break this whole sequence down:
125742 is the BLA number
S1 I haven’t yet figured out
M5 means Module 5
CRF means Case Report Form
c4591001 is the number of the clinical trial
1085 is the site number
10851018 is the subject number
The “S1” is on all the files with Module numbers so it might be something simple like “Submission 1” to help them keep track of files if the submission were rejected and resent.
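The breakdown above can be turned into a small parser. This is only a sketch in Python: the regular expression is my guess at the pattern based on the CRF file names in this batch, and the group name "s_code" is a placeholder since we still don’t know what S1 means.

```python
import re

# Assumed pattern for CRF file names; other file types will not match
CRF_NAME = re.compile(
    r"^(?P<bla>\d+)_"        # BLA number, e.g. 125742
    r"(?P<s_code>S\d+)_"     # meaning unknown, maybe "Submission 1"
    r"M(?P<module>\d)_"      # CTD module number
    r"CRF_"                  # Case Report Form
    r"(?P<study>c\d+)-"      # clinical trial number
    r"(?P<site>\d{4})-"      # site number
    r"(?P<subject>\d{8})"    # subject number
    r"\.pdf$"
)


def parse_crf_name(filename):
    """Return a dict of the name's parts, or None if it doesn't match."""
    match = CRF_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    # Sanity check: the subject number should begin with the site number
    parts["site_matches_subject"] = parts["subject"].startswith(parts["site"])
    return parts


print(parse_crf_name("125742_S1_M5_CRF_c4591001-1085-10851018.pdf"))
print(parse_crf_name("CRFs-for-site-1055.pdf"))
```

The second call returns None because that file doesn’t follow the pattern, which is a handy way to flag the oddly named files for manual sorting.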
Module 5 contains a list of all sites used in the clinical trial (5.2-listing-of-clinical-sites-and-cvs-pages-1-41.pdf) and they are numbered up to 270 but some numbers are skipped and there are only 196 addresses listed (assuming I counted correctly). It looks like some testing centers have more than one address, but are treated as one site because both addresses are run by the same investigator. There are also some duplicates, such as here on page 10:
So we’ll need to conduct a manual review if we want to know the exact number of sites. Most of the sites are in the U.S. but there are also sites in Argentina, Brazil, Germany, South Africa, and Turkey. Pfizer subcontracted companies (like Ventavia Research Group) to run these sites, and each subcontractor runs multiple sites.
Note that in some cases Ventavia has subcontracted to someone else, so they are managing more sites than one would think from this file. For example, site 1096 is Dr. Van Tran Family Practice. But in the file CRFs-for-site-1096.pdf we can see these subjects are being managed by Ventavia. It’s nothing nefarious - they need to work with a lot of clinicians to sign up 44,000 people.
A quick note on how software is documented
Some of the files are text printouts of programs used in data analysis. For the non-programmers out there, programs don’t have page numbers - the lines are all just listed sequentially. With books we talk about page numbers; with software we talk about line numbers.
The FDA assigned some “page numbers” to these files, and those are the first part of the file name - which of course makes for very long file names. They also missed a few files so we’re starting with a bit of a mess here.
These are short programs in a language used for statistical analysis (SAS) and we might need some help to interpret them. They all appear to be used for analysis of the clinical data, and to be part of Module 5.
Now for some useful info if you’re into a little light reading
Module 2: Clinical Overview
Module 2 contains summaries of the various parts of the submission. For example, STN-125742_0_0-Section-2.5-Clinical-Overview.pdf is Pfizer’s summary of their own clinical trial and the results. Most of this document is technical but it’s highly informative if you have a medical background.
The trial was conducted in multiple phases. If you’re curious what Phase 1 was, that phase was designed to determine the dosage level:
Module 2: Summary of Clinical Safety
The file STN-125742_0_0-Section-2.7.4-summary-clin-safety.pdf contains a summary of the adverse events from the clinical study, including deaths. For example, on page 218 they tell us:
2.7.4.2.4.3.1. Deaths (Phase 3, Study C4591001)
There were 15 deaths in the BNT162b2 group and 14 deaths in the placebo group from Dose 1 to the unblinding date during the blinded placebo-controlled follow-up period (Table 16). None of these deaths were assessed by the investigator as related to study intervention.
So the all-cause mortality was essentially the same in both the vaccine group and the control (placebo) group - remember that the next time someone lectures you about these vaccines saving lives.
Module 5: Post Marketing - more interesting stuff
Another interesting document is from Module 5: 5.3.6-postmarketing-experience.pdf. From the document itself:
This document provides an integrated analysis of the cumulative post-authorization safety data, including U.S. and foreign post-authorization adverse event reports received through 28 February 2021.
One of the things redacted is the number of doses of BNT162b2 they have shipped, so it’s not easy to estimate what percentage of people have a given adverse event (the “(b) (4)” is a marker showing where something was redacted).
It is estimated that approximately (b) (4) doses of BNT162b2 were shipped worldwide from the receipt of the first temporary authorisation for emergency supply on 01 December 2020 through 28 February 2021.
Although we don’t know the total number of people vaccinated, there were quite a few case reports. At some point we should probably compare these numbers to VAERS data for other types of vaccines.
Cumulatively, through 28 February 2021, there was a total of 42,086 case reports (25,379 medically confirmed and 16,707 non-medically confirmed) containing 158,893 events. Most cases (34,762) were received from United States (13,739), United Kingdom (13,404) Italy (2,578), Germany (1913), France (1506), Portugal (866) and Spain (756); the remaining 7,324 were distributed among 56 other countries.
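As a quick sanity check, the numbers in that paragraph can be added up in Python to see if they are internally consistent. Every figure below is copied from the quote; the variable names are mine.

```python
total_cases = 42_086
medically_confirmed = 25_379
non_medically_confirmed = 16_707

top_countries = {
    "United States": 13_739,
    "United Kingdom": 13_404,
    "Italy": 2_578,
    "Germany": 1_913,
    "France": 1_506,
    "Portugal": 866,
    "Spain": 756,
}
stated_top_total = 34_762
remaining_cases = 7_324  # spread among 56 other countries

top_sum = sum(top_countries.values())
print(medically_confirmed + non_medically_confirmed == total_cases)
print(top_sum == stated_top_total)
print(top_sum + remaining_cases == total_cases)
```

All three checks come out True, so the totals in the quoted paragraph agree with each other.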
Pfizer also included a nice little chart for us (page 8):
Now we can better understand some of the documents
We’ve learned enough that some of the documents will start to make sense. For example, here’s a page from 125742_S1_M5_5351_c4591001-fa-interim-excluded-patients.pdf about subjects in Phase 1:
These are Phase 1 subjects who were excluded from the final data. From the clinical summary we discussed earlier we know there were two vaccine candidates and multiple dosages, and here we can see what each subject was getting (one of the two candidates or the placebo) and the dosage.
For the first subject on this page, we now know that the numbers in the subject column are the study number (C4591001) followed by the site number (1002) then the subject number (10021053).
Now, as we dig deeper into the documents we’ll understand at least a little bit of what we’re reading. This is good, because things will get tough as we dig into the more technical details of this BLA.
FDA Guidance on CTDs:
https://www.fda.gov/media/71666/download
Archived copy:
https://web.archive.org/web/20210305025804/https://www.fda.gov/media/71666/download