May 20, 2020
From my first undergraduate course in biostatistics, I knew I wanted to pursue a career as a biostatistician. Eager to combine my passions for math and medicine, I enrolled in a graduate program in biostatistics at Columbia University. One of my first courses was an introduction to randomized clinical trials.
Learning about the design and statistical aspects of clinical trials fascinated me. After all, this particular study design is considered the “gold standard” in research: if a trial is properly designed, randomization can be expected to balance the baseline covariates between study groups, eliminating the need to adjust for such confounders in the multivariable modeling phase or to explore other adjustment techniques.
We even designed a mock phase II trial during the course. Looking back, I realize I should have included a section on training in proper data collection and randomization techniques. However, no additional lectures or classroom time could have prepared me for the design and analysis complexities that I would encounter in the real world as a biostatistician.
Since starting my career, I have worked on the planning and analysis of a variety of observational studies as well as some clinical trials. One trial in particular stands out to me, for it challenged several of the points that I had learned in my graduate course.
When the data were first sent to me, I was notified that I would need to manually exclude several patients because they were later identified as violating the randomization parameters. Some patients were randomized to the different study groups but violated the study protocol, and others were screened but never randomized.
From a statistician’s point of view, I could not believe that I had to discard valuable patient data from the primary analysis. As a patient myself, I would be disheartened to find out that months or years of collecting my data had produced information that could not be used in the way it was intended.
As I learned more about the trial, it became clear that several people had been working on the data collection and that much of the data were collected on paper forms. Many papers were misplaced, and the data were collected in a scattered, disorganized manner. In order to pull the necessary data into a format for me to analyze, the investigator spent several weeks manually creating a database with all the necessary information while also performing their regular clinical duties.
Even then, when the data were sent to me, I spent many hours cleaning and transforming the data into a format that I could work with. Although this study was documented as a randomized clinical trial, the investigator and I still had to consider adjusting for certain baseline characteristics in the multivariable models, as we were doubtful that balance had been achieved between the study groups.
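One common way to check whether a baseline characteristic is balanced between two study groups is the standardized mean difference (SMD). The sketch below is only an illustration, not the method used in this trial: the ages are made-up values, the function name is hypothetical, and the 0.1 cutoff is a frequently cited rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation.

    The pooled SD here is sqrt((var_a + var_b) / 2), using sample variances.
    """
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when neither group has any variability")
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical baseline ages (in years) for two study groups; not real trial data.
treatment_age = [54, 61, 47, 58, 66, 52, 59, 63]
control_age = [49, 45, 53, 50, 44, 56, 48, 51]

smd = standardized_mean_difference(treatment_age, control_age)

# Assumed rule of thumb: an absolute SMD above 0.1 suggests possible imbalance.
IMBALANCE_THRESHOLD = 0.1
is_imbalanced = abs(smd) > IMBALANCE_THRESHOLD

print(f"SMD for age: {smd:.2f}; flagged as imbalanced: {is_imbalanced}")
```

A characteristic flagged this way would be a candidate for adjustment in the multivariable models, although the final choice of covariates should also rest on clinical judgment.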
To complicate matters further, we were faced with a considerable amount of missing data, largely due to paper forms being misplaced or values not being entered as desired. While missing data are practically inevitable, the reason perplexed me. If the data had been recorded in a format other than paper, perhaps digitally via a computer or smartphone, it is likely that far fewer values would have been missing than we observed.
This is not to say that the missing data could have been eliminated completely (as some variables are inherently prone to missingness or lack of subject response), but digitizing the recording of these data could have greatly helped. Essentially, less of the data would have been “lost in translation” from paper to computer. Moreover, when data are automatically taken or recorded in real time, there are fewer opportunities for human data entry errors (as is the case with reading and entering information from paper forms).
The accessibility of the study protocol is another factor that could have greatly improved the success of this trial. Given that so many different people were at one point or another working on this trial, I can imagine that not everyone was completely familiar with the protocol. If the protocol had been housed in a central, digital location, would the randomization parameters still have been violated? While I cannot answer this question with 100% confidence (there is always some form of doubt in statistics), I do believe that many more patients would have been randomized correctly.
In case you were wondering, the trial did produce some interesting results, and after several hours of careful consideration, analysis, and discussion, our abstract was accepted at a well-regarded conference. Overall, I can say my experience working on this trial was extremely eye-opening and thought-provoking. While I am grateful for the learning experience of working on a challenging study, I do hope that clinical trials continue to become increasingly digitized in the years to come. The time, effort, and money saved would be well worth it, as they would allow more focus to be placed on saving and improving the lives of millions of people worldwide.
You can connect with Victoria on LinkedIn.
Victoria Cooley is a research biostatistician in the Division of Biostatistics and Epidemiology in the Department of Population Health Sciences at Weill Cornell Medicine in New York City. She holds an M.S. in biostatistics from Columbia University and a B.S. in health science from Springfield College. Before joining Weill Cornell Medicine, Victoria worked on the analysis of clinical, molecular genetics, and biochemical data from the North American Mitochondrial Disease Consortium Registry at Columbia University Medical Center.
Victoria engages in short-term and long-term biostatistical consultations for investigators. A large part of her work centers on supporting researchers through the Clinical and Translational Science Center, where she provides data analysis and prepares reports, methods sections, and analysis plans for abstracts, manuscripts, and grants. She also works closely with the Departments of Pediatrics and Urology and the Joint Clinical Trials Office, and she assists in teaching biostatistical methodology to medical residents, fellows, and other research staff.