(by John Johnson)
The talks that everyone is talking about are of course very cool, and we can learn a lot from them. However, I came to this Joint Statistical Meetings in search of something a little different. I attended many fewer talks than I have in the past (when I would diligently attend something every session, except maybe Thursday morning, when I would check out and go). What I found were a lot of devils in the details.
On Saturday I attended a continuing education course on the analysis of register data. Register data is administrative data such as what a government would collect. For example, birth and death data are register data in the US and almost every other country with a functioning government. This data is a challenge to work with for the following reasons:
- It is collected on the whole population, as a census, but is longitudinal in nature
- It is very difficult to curate, and is collected and curated through administrative processes rather than sampling
- It is difficult to quality control, and that control is best done through merging with other data
- Its analysis value increases in merging with other data
- The only source of error is transcription
While I don’t work with register data, I can appreciate the hardships that come from working with administrative data, or data that is collected as an artifact of a transaction. The challenges in merging come from the subtleties in defining the variables, and in making sure that variable definitions are consistent across data sources. It got me wondering whether many of the challenges and inefficiencies we have in working with this data come from our sample-based approach to handling it.
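To make the merging point concrete, here is a minimal sketch of quality control by cross-checking two registers. Everything in it is hypothetical: the person identifiers, the field names (`birth_date`, `death_date`), and the records themselves are invented for illustration and do not come from the course. It assumes both registers share a reliable person identifier, which is often the hardest part in practice.

```python
from datetime import date

# Hypothetical extracts from two registers, keyed by a person identifier.
births = {
    "P001": {"birth_date": date(1980, 3, 14)},
    "P002": {"birth_date": date(1975, 11, 2)},
    "P003": {"birth_date": date(1990, 7, 30)},
}
deaths = {
    "P002": {"birth_date": date(1975, 2, 11), "death_date": date(2010, 5, 1)},
    "P003": {"birth_date": date(1990, 7, 30), "death_date": date(1989, 1, 1)},
    "P004": {"birth_date": date(1960, 1, 1), "death_date": date(2011, 6, 9)},
}


def cross_check(births, deaths):
    """Return (person_id, problem) pairs found by merging the two registers."""
    issues = []
    for pid in sorted(deaths):
        death_rec = deaths[pid]
        birth_rec = births.get(pid)
        if birth_rec is None:
            # The death register knows someone the birth register does not.
            issues.append((pid, "no matching birth record"))
            continue
        if birth_rec["birth_date"] != death_rec["birth_date"]:
            # Same variable, different value: a transcription or definition problem.
            issues.append((pid, "birth date differs between registers"))
        if death_rec["death_date"] < birth_rec["birth_date"]:
            # Logically impossible ordering of events.
            issues.append((pid, "death precedes birth"))
    return issues


issues = cross_check(births, deaths)
for pid, problem in issues:
    print(pid, problem)
```

None of these problems is visible in either register alone; they only show up once the records are lined up against each other, which is why merging doubles as quality control.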
Speaking of data, a late Sunday session on CDISC data standards was well received, and in fact we ran over by more than half an hour with consent from the audience. This talk was sponsored by the statistical programming section, but there was something in there for statisticians as well, especially regarding the planning of analysis of clinical trial data. Statisticians would do well to learn these standards to some degree, because they will become more of a centerpiece of statistical analysis of clinical trials.
More generally, I am curious how many statistics departments have a class on data cleaning and handling, and, if so, whether it is required or an elective within a required track. I was almost completely unprepared for this aspect when I came into the industry, having only managed messy data a little bit during a consulting seminar. In planning data collection, it is important for the statistician to look ahead and think about how the data will have to be organized for the desired method, and that requires some data handling experience.
On Monday I attended part of the session on reproducible research, and concluded that at least in the pharma/bio industry we have no clue what reproducible research is. We have an excellent notion that research needs to be repeatable, and that documentation needs to accompany analysis to tell someone else how to interpret the findings. However, we don’t really integrate the documentation with the analysis as closely as is expected in a true reproducible research setting. Maybe CDISC data standards (as discussed above) will eliminate that need, at least from the point of view of an FDA reviewer. However, they won’t within companies, or in studies that are not done with CDISC-compliant data.
Monday night, I partied with the stat computing and graphics crowd, and had a mostly delightful time. Maybe they can run their raffle and business more efficiently next year. Hint hint.
On Tuesday I supported a colleague in a poster presentation describing challenges in a registry of chronic pain management, and gained a new appreciation for the poster format. Much of the discussion was thoughtful and insightful, and we were able to explain the challenges. It was at least validating that the attendees who stopped by agreed with our challenges and gave some suggestions along the lines we were thinking, and the depth of discussion was stimulating. Off the success of that, I made a point to stop by the posters and found some really good material. I would encourage more posters, and I found that most of the benefit I get from JSM is from small group discussions (and occasionally from the larger talks as well).
It was somewhere in here that Andrew forwarded me an email with a disturbing statistic about the number of investigators who cannot describe a clinical trial or the data, nor can the consulting statistician explain the trial. I think this is a topic we will return to in this blog, and I think I will submit this idea as a biopharm-sponsored invited session next year. I know that the consulting section has sponsored quality sessions on leadership in the past, and I saw a very good session on leadership at ENAR this year. I think it is time to bring it to a wider audience.
Tuesday night and Wednesday were mostly focused on catching up with old and new friends and going to posters. I’m fairly tired by Wednesday on the week of JSM, and even more so given that I got in on Friday this time, so I debated whether I would get anything out of sitting in talks. I found a couple of fascinating posters on using tumor burden to assess cancer drugs and whether safety monitoring of drug trials has an impact on Type II error rate (it does, and it’s nasty). On the basis of this, I hope to see more well-done posters submitted at next year’s meeting. I love the discussion they generate.
I ended up in a fascinating discussion about evidence needed for FDA drug approval, whether subjective Bayes has any role, and the myth and illusion of objectivity. Some of this discussion relates back to “the difference between statistically significant and not significant is not statistically significant,” but I think there are some deeper philosophical problems with the drug evidence evaluation that keep getting swept under the rug, such as the fact that we assume that drug efficacy and safety are static parameters that do not change over time. (There are obvious exceptions to this treatment, such as antibiotics.) This is a true can of worms, and I’ll let them crawl a bit. And yes, practical considerations come into play, such as the fact that the software choice is between writing something yourself that is hard to write and to verify as correct, or spending thousands of dollars on software.
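For readers who have not run into the quoted line about significance, a small worked example may help. The numbers below are purely illustrative (they are not from any trial discussed at the meeting), and the sketch assumes two independent estimates with known standard errors and a normal approximation.

```python
import math


def z_and_p(estimate, se):
    """Return the z statistic and two-sided normal p-value for an estimate."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Illustrative, made-up effect estimates from two independent studies.
a_est, a_se = 25.0, 10.0
b_est, b_se = 10.0, 10.0

z_a, p_a = z_and_p(a_est, a_se)
z_b, p_b = z_and_p(b_est, b_se)

# Compare the two studies directly: for independent estimates, the
# standard error of the difference is the root sum of squares.
diff = a_est - b_est
diff_se = math.sqrt(a_se**2 + b_se**2)
z_diff, p_diff = z_and_p(diff, diff_se)

print(f"Study A:    z = {z_a:.2f}, p = {p_a:.3f}")
print(f"Study B:    z = {z_b:.2f}, p = {p_b:.3f}")
print(f"Difference: z = {z_diff:.2f}, p = {p_diff:.3f}")
```

Study A clears the usual 0.05 threshold and Study B does not, yet the direct comparison of the two falls well short of it. Declaring one a success and the other a failure reads more into the data than the difference between them supports.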
Tomorrow is the last day of the conference, and I’ll try to catch a talk or two before I leave. I hope to see you next year, and before!