Dec 8, 2011

by Karen Born and Andreas Laupacis

Interpreting randomized trial evidence around mammography

The Canadian Task Force on Preventive Health Care recently released recommendations about screening for breast cancer.

These recommendations have been criticized by some because they emphasize the results of randomized trials.

This article explores the advantages and limitations of randomized trial evidence regarding screening mammography.

The recent recommendations by the Canadian Task Force on Preventive Health Care have been criticized on a number of grounds.

One criticism relates to how they interpreted the scientific evidence regarding screening mammography, specifically basing their assessment of the benefits of screening almost entirely on randomized trials and ignoring the advances that have been made in mammography technology in the last two decades.

In this article, we explore what underlies these criticisms.

What is a randomized controlled trial?

A randomized controlled trial is a research design in which patients or communities are randomly allocated (like the flip of a coin) to receive one of two treatments or tests. The patients are then followed over time to see if there are differences in outcomes between the two groups. In the case of the randomized trials of mammography, patients or regions were randomly allocated to a mammography screening program or no screening program. The women were then followed for an average of 11 years, and the number of breast cancers diagnosed, deaths from breast cancer and breast biopsies (among other outcomes) were compared between the two groups.

In health care, randomized trials are considered the “gold standard” if one wants to understand the benefits of a treatment or screening program. This is because randomization makes it likely that any benefit detected at the end of follow-up has occurred because of the treatment and not because the participants in one group are healthier than those in the other group.

The importance of randomization is illustrated by the history of hormone replacement therapy, such as Premarin, as a treatment to prevent heart disease in post-menopausal women. Many non-randomized studies found that post-menopausal women who took hormone replacement therapy were less likely to develop heart disease than those who didn’t take hormone replacement therapy. The results were considered so convincing that hormone replacement therapy was strongly recommended by many influential medical associations as a way to prevent heart disease in women.

However, two large randomized trials subsequently showed that not only did hormone replacement therapy not decrease the risk of developing heart disease; the treatment actually increased the risk. It is likely that the apparent benefit from hormone replacement treatment seen in the non-randomized studies was because women who take preventive medicines are generally healthier than those who do not. The non-randomized studies made it seem that the drug was responsible for decreasing the risk of heart disease, when in fact lifestyle and other unmeasured factors were likely responsible.
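The way a healthier group of volunteers can create an apparent benefit may be easier to see with a small calculation. The sketch below is not based on data from the hormone replacement studies. Every number in it (the two risk levels, the group sizes, and the share of each group taking the drug) is a hypothetical assumption, and the drug is deliberately given no effect at all, so any difference between takers and non-takers comes purely from who chooses to take it.

```python
# Hypothetical illustration only: none of these numbers come from real studies.
# The drug has NO effect here; heart disease risk depends only on the group.
RISK = {"healthier": 0.05, "less_healthy": 0.15}
POPULATION = {"healthier": 5000, "less_healthy": 5000}


def event_rates(share_treated):
    """Return (rate among drug takers, rate among non-takers).

    share_treated maps each group to the fraction of that group taking the drug.
    """
    treated_n = treated_events = untreated_n = untreated_events = 0.0
    for group, size in POPULATION.items():
        n_treated = size * share_treated[group]
        n_untreated = size - n_treated
        treated_n += n_treated
        untreated_n += n_untreated
        treated_events += n_treated * RISK[group]
        untreated_events += n_untreated * RISK[group]
    return treated_events / treated_n, untreated_events / untreated_n


# Non-randomized setting (assumed): healthier women are more likely to take the drug.
observational = event_rates({"healthier": 0.8, "less_healthy": 0.2})

# Randomized setting: a coin flip assigns half of each group to the drug.
randomized = event_rates({"healthier": 0.5, "less_healthy": 0.5})

print(f"Non-randomized: takers {observational[0]:.1%}, non-takers {observational[1]:.1%}")
print(f"Randomized:     takers {randomized[0]:.1%}, non-takers {randomized[1]:.1%}")
```

Under these assumed numbers the non-randomized comparison shows roughly half the heart disease rate among takers (about 7% versus 13%) even though the drug does nothing, while the randomized comparison shows the same rate in both groups.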

Results of randomized trials of mammographic screening

Women who volunteer for mammographic screening are likely healthier than those who do not, thus creating similar problems with interpreting the results of non-randomized studies as with hormone replacement therapy. For this reason, most groups who develop screening guidelines use the results of randomized trials as the main basis for their guidelines.

The Task Force reviewed 8 published randomized trials of mammographic screening in women younger than 49 years of age. They found that, on average, regular screening with mammography led to a 15% relative risk reduction in death from breast cancer. Because these results come from large randomized trials, it is highly likely that the 15% relative risk reduction really is due to mammography and not due to differences in the risk of developing breast cancer between the women who were screened and the women who were not.

However, for a variety of reasons some critics argue that the randomized trials underestimated the benefits of mammography.

Criticisms of randomized trials of mammographic screening

One criticism is that some women randomized to mammography screening decided not to be screened. Conversely, some women randomized to no screening had a mammogram. In the epidemiology literature, this phenomenon is called “contamination”, and it means that these studies likely under-estimated the maximum potential benefit of screening. Because no screening program will ever convince all eligible women to undergo mammography, and because some women will be screened even in the absence of a formal screening program, the results of these randomized trials likely reflect what will happen in the “real world” when screening programs are introduced.
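The dilution described above can be sketched numerically. The function below is a simplified illustration, not the method used by the Task Force or by any of the trials. It assumes that the women who accept or decline screening have the same underlying risk of dying from breast cancer, which may not hold in practice, and the uptake and contamination figures are hypothetical.

```python
def itt_relative_risk_reduction(true_reduction, uptake_in_screening_arm, screened_in_control_arm):
    """Relative risk reduction seen when arms are compared as randomized.

    true_reduction: assumed reduction in breast cancer deaths for a woman
        who is actually screened, compared with one who is not.
    uptake_in_screening_arm: fraction of the screening arm actually screened.
    screened_in_control_arm: fraction of the control arm screened anyway.
    Assumes (hypothetically) that compliance is unrelated to baseline risk.
    """
    # Risk in each arm, expressed relative to a woman who is never screened.
    screening_arm_risk = 1 - uptake_in_screening_arm * true_reduction
    control_arm_risk = 1 - screened_in_control_arm * true_reduction
    return 1 - screening_arm_risk / control_arm_risk


# Hypothetical inputs: a 20% true reduction, 90% uptake, 10% contamination.
diluted = itt_relative_risk_reduction(0.20, 0.90, 0.10)

# With perfect compliance and no contamination the full effect is recovered.
undiluted = itt_relative_risk_reduction(0.20, 1.0, 0.0)

print(f"Observed in the trial comparison: {diluted:.1%}")
print(f"With perfect compliance:          {undiluted:.1%}")
```

With these assumed inputs the trial comparison shows a somewhat smaller reduction than the one a fully compliant woman would experience, and the gap widens as uptake falls or contamination rises.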

The trial results are perhaps most useful for policy makers who are deciding whether or not to pay for a mammographic screening program. However, for a woman contemplating screening mammography who will be compliant with screening recommendations, the results of these trials likely underestimate the benefits of screening for her by a small amount.

A second criticism is that the randomized trials studied old mammographic technologies. Since newer technologies are better at detecting cancers in younger women and women with dense breasts than older mammograms, some argue that the randomized trials under-estimate the benefits of mammography as practiced in 2011.

The oldest randomized trial of mammography enrolled women in 1963 and the most recent study enrolled women between 1991 and 1997. Women were followed for an average of 11 years from the time of enrolment. The most recent trial found that mammography led to a 17% relative reduction in the risk of dying from breast cancer in women between 40 and 49 years of age. This is almost the same as the 15% average found when combining all of the studies, and suggests that the newest technology available in the mid 1990s didn’t have a much greater impact on deaths from breast cancer than the older mammographic technologies.

Could 21st century mammographic screening technologies prevent more deaths than the older technologies? The Task Force’s answer to this is that we “require further randomized trials” to sort this out. Is this realistic? Any future randomized trial would need to compare the new technology with an older one. Most women are unlikely to agree to be randomized to such a study. Also, the study would need to include a very large number of women, and even if the study was feasible, the results would not be available for 15-20 years.
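To give a rough sense of why such a trial would need so many women, the sketch below applies the standard textbook sample-size approximation for comparing two proportions. The inputs are hypothetical: a 0.3% risk of breast cancer death during follow-up with the older technology and a 10% relative reduction with the newer one are assumptions chosen for illustration, not figures from the Task Force or from any trial.

```python
from math import ceil
from statistics import NormalDist


def women_per_arm(risk_old, relative_reduction, alpha=0.05, power=0.80):
    """Approximate number of women needed in each arm of a two-arm trial.

    Uses the usual normal approximation for comparing two proportions,
    with a two-sided significance level alpha and the requested power.
    """
    risk_new = risk_old * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = risk_old * (1 - risk_old) + risk_new * (1 - risk_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / (risk_old - risk_new) ** 2)


# Hypothetical scenario: 0.3% risk of breast cancer death over follow-up with
# the older technology, and a 10% relative reduction with the newer one.
n_small_difference = women_per_arm(0.003, 0.10)

# For contrast, a much larger (also hypothetical) 30% relative reduction.
n_large_difference = women_per_arm(0.003, 0.30)

print(f"Women per arm for a 10% relative reduction: {n_small_difference:,}")
print(f"Women per arm for a 30% relative reduction: {n_large_difference:,}")
```

Under these assumptions, detecting a modest difference between two technologies would take several hundred thousand women in each arm, because deaths are rare and the expected difference between the arms is small.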

Another approach would be to model the impact of current mammographic technology on deaths from breast cancer, using information about how much better the current technology is, and data from the older randomized trials. Although such a modelling exercise would invariably involve making several assumptions, it would lay out how much of an impact on deaths from breast cancer the newer technologies might realistically be expected to have. Given that the most recent study comparing newer mammography with older techniques found very little difference between the two, except in young women with dense breasts, it seems unlikely that the newer techniques would markedly increase the number of breast cancer deaths prevented compared to the published randomized trials.

What about information from non-randomized trials?

One potential approach to the limitations of published randomized trials is to use information from non-randomized studies that have looked at breast cancer deaths in regions with no screening, those using older screening technologies and those using newer screening technologies. However, because the populations being studied were not randomized to the type of screening they received, it is possible (some would say likely) that some of the differences in deaths from breast cancer found in such studies are due to differences in the characteristics of the women and/or differences in the management of breast cancer.

It turns out that advocates on both sides of the mammography debate have turned to non-randomized studies to make their arguments, but it appears as if they selectively mention the studies that support their point of view and ignore those that do not.

For example, Peter Goetzsche, a researcher from Denmark, has published extensively on this issue, arguing that mammography does more harm than good. In a recent Canadian Medical Association Journal editorial, he referred to non-randomized studies from Sweden that found that death rates from breast cancer had decreased at the same rate in regions with, and without, screening programs.

On the other hand, Martin Yaffe, a researcher from Toronto who argues that the randomized trials under-estimate the benefits of mammography, refers to a non-randomized study of mammography from British Columbia which suggests dramatic benefits from mammographic screening.

Many advocates for one position or the other refer only to non-randomized studies that support their position. If the results of non-randomized studies are to be considered alongside the results of randomized trials, it is important that individuals not selectively choose the studies that agree with their position and ignore the ones that do not.

A systematic review of non-randomized studies, which includes the results of all relevant studies, could provide useful input to the debate. However, while there has been a call from some in the scientific community for such a review to be conducted, it has not yet occurred.

Authors

Karen Born

Contributor

Karen is a PhD candidate at the University of Toronto and is currently on maternity leave from her role as a researcher/writer with healthydebate.ca.

Andreas Laupacis

Editor-in-chief Emeritus

Andreas founded Healthy Debate in 2011. He is currently the editor-in-chief of the Canadian Medical Association Journal (CMAJ).

4 Comments

Anthony Miller says:

December 28, 2011 at 10:15 am

There is more agreement between Martin Yaffe and myself than may appear.
First, we agree that most of the old breast screening trials conducted before the advent of adjuvant therapy can now be discounted. Second, we agree that we both worked diligently to ensure that the standards of mammography in the Canadian National Breast Screening Study (CNBSS) were as good as could be achieved. Third, we agree that our objective should be to reduce advanced breast cancer, and death from the disease. My original expectation was that mammography would facilitate this. The CNBSS showed that mammography could add little or nothing to screening by clinical breast examination and the promotion of breast awareness by the teaching and reinforcement of breast self-examination. That this is so is because of the nature of screening, which selectively results in the detection of relatively slow growing and easily treated cancers, and often small non-progressive lesions, as is probably more the case for digital mammography. Let us not forget that if the benefit in screened women is a reduction of breast cancer mortality of 24%, this still means that 76% of the deaths from breast cancer destined to occur will still do so. But in fact the 24% from the UK age trial is a biased estimate of the effectiveness of screening; the valid estimate is a non-significant 17% reduction in breast cancer mortality.
However, Yaffe persists in suggesting that the CNBSS was the only study to find that screening also brought forward the diagnosis of advanced, as well as early cancer. Yet Cox (1997) pointed out it was a general phenomenon of all reported breast screening trials. The UK age trial investigators have failed to publish the data to enable us to determine whether it was so in that trial. But we do know it is an almost general phenomenon, having been reported recently for both prostate (Andriole et al, 2009) and ovarian cancer (Buys et al, 2011) screening.
Yaffe points out the wide variation in the estimates made by Berry et al (2005). What he (and Berry, at least so far) have failed to acknowledge is that the reduction in breast cancer mortality was over-estimated because all the models were based upon the assumption that breast cancer mortality would have increased in the United States in the absence of screening and improved treatment. This assumption was based upon the demonstrated increasing incidence of breast cancer at that time. However, we now know this was because of hormone replacement therapy and once this was reduced, incidence fell (Ravdin et al, 2007). So if the effect of improved therapy is subtracted from the declining mortality, the room left for an effect of screening is very small.
Yaffe misrepresents the process that actually occurred in the screening centres in the CNBSS. Only when the nurses had completed their examination and indicated whether or not the woman was to attend the review clinic was the randomization performed by the centre coordinator. At the review clinic, if the surgeon wished to have a mammogram, this could be done. The nurses knew this, attended the review clinics and indicated their concerns to the surgeon; they had no incentive to try to subvert the randomization, and no power to do so. Further, we have demonstrated several times that randomization was balanced; no other breast screening trial (including the UK age trial) collected the data to enable this to be done.
Yaffe implies I should talk to oncologists. I do so frequently. He may not know that my second published paper reported my attempts as a young physician to treat by chemotherapy women with advanced breast cancer. It was that experience that eventually led to my designing the CNBSS, laboring for many years to procure the funds to enable it to proceed, and ensure our quality control procedures were as good as were possible, including the excellent work performed by my good friend the late Douglas MacFarlane. The CNBSS remains the only breast screening trial to have had a reference radiologist and a reference physicist, and we (not the management committee, which was concerned with finance) encouraged visits and interchanges with experts. I wish the other trialists had been so open.

References

Andriole GL, Grubb RL, Buys SS et al. Mortality results from a randomized prostate-cancer screening trial. New Eng J Med 2009; 360: 1310-19.

Berry DA, Cronin KA, Plevritis SK et al. Effect of Screening and Adjuvant Therapy on Mortality from Breast Cancer. N Engl J Med 2005;353:1784-92.

Buys SS, Partridge E, Black A et al. Effect of Screening on Ovarian Cancer Mortality. The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA 2011; 305:2295-303.

Cox B. Variation in the effectiveness of breast screening by year of follow-up. Monographs Natl Cancer Inst 1997; 22:69-72.

Ravdin PM, Cronin KA, Howlader N et al. The Decrease in Breast-Cancer Incidence in 2003 in the United States. New Eng J Med 2007; 356: 1670-4.

Anthony B. Miller says:

December 19, 2011 at 2:53 pm

Martin Yaffe makes a number of assertions that must be refuted.

First, he states that nearly all the breast screening trials performed in the past should be discarded as the quality of mammography has improved so much. In fact, as the quality of mammography improved, there has been no evidence of increasing effectiveness of screening; precisely the opposite. The improvement in quality of mammography has had little impact on its sensitivity, though it has had a major impact on improving its specificity (i.e. in reducing false positives). What has happened in this period has been the greatly improved impact of therapy.

In that respect, he states that I provide no evidence that modern screening will have relatively less impact because of improved therapy. The evidence that this is so has come from models (1,2) while we have recognized that as screening only works if therapy is effective for the detected lesion, it stands to reason that as therapy becomes more effective, more and more able to cure every patient at whatever stage they present, then the contribution of screening will inevitably become nil.

Then he re-raises the faulty critique of Boyd et al (3) to suggest the CNBSS should be discarded, a suggestion rejected by other evaluators (4). However, mud sticks, so it is worth re-stating the facts.

Boyd et al (3) claimed that the CNBSS was marred by faulty randomization, yet independent examination of our data refuted that allegation (5). They place much value on an imbalance of subjects with detected palpable cancer at the prevalence screen. However, you cannot use the final diagnosis of those with detectable abnormalities to establish this point, you have to use the number of women who were detected with suspicious palpable abnormalities by the examiners, and subsequently referred for diagnosis, which were equivalent in the two arms. It is not surprising that mammography brings forward the diagnosis of those with fairly advanced disease as well as those with earlier disease, this has been noted with screening for breast cancer in other breast screening trials, as well as in screening for other cancers. It was the availability of mammography in the mammography arm which resulted in the excess, not flawed randomisation. It is not unreasonable to note that Norman Boyd worked with me on the pilot studies that preceded the initiation of the trial, in which we piloted, and agreed to adopt, the very randomization procedures he criticized. In particular, we knew we could not cope with a telephone system of randomization with women being enrolled at multiple centres across the country with time zones varying by 4 hours, neither the technique of distributed randomization by computer, nor the internet, being then available. It is always easier to be “holier than thou” in retrospect.

Boyd et al (3) also claimed that the quality of the mammography was poor, yet our mammography standards were in part established by one of Boyd’s co-authors who was the CNBSS reference physicist (Martin Yaffe himself), and another of his co-authors (Jong) applied these standards successfully in reading the largest number of mammograms of any radiologist in the CNBSS. The CNBSS was first or co-first in all the indices of quality that Fletcher et al (6) applied to the breast screening trials.

The third major critique of Boyd et al (3) was too low power (at 7 years) to accept a null result, concluding “Until such time as further follow-up and analysis reduce the uncertainty surrounding the magnitude of the effects on mortality seen in the NBSS, the results of these trials should not be used to change the prevailing scientific view of the potential benefits of screening with mammography.” We did not disagree with that conclusion in 1993, but by 2002, with further follow-up to an average of 13 years, the situation had markedly changed, as the negative findings persisted with far narrower confidence intervals (7,8).

Therefore, it is time to relegate Boyd et al (3) to the ashes, and take the facts as they now are. With each passing year, the hoped for benefits of mammography screening seem more and more like a passing dream.

Anthony B. Miller
Director and Principal Investigator, Canadian National Breast Screening Study

References
1. Blanks RG, Moss SM, McGahan CE, Quinn MJ, Babb PJ. Effect of NHS breast screening programme on mortality from breast cancer in England and Wales, 1990-8: comparison of observed with predicted mortality. BMJ 2000; 321:665-9.
2. Berry DA, Cronin KA, Plevritis SK et al. Effect of Screening and Adjuvant Therapy on Mortality from Breast Cancer. N Engl J Med 2005;353:1784-92.
3. Boyd NF, Jong RA, Yaffe MJ, Tritchler D, Lockwood G, Zylak CJ. A critical appraisal of the Canadian National Breast Cancer Screening Study. Radiology 1993;189:661-3.
4. Olsen O, Gotzsche PC. Screening for breast cancer with mammography (Cochrane Review). In: The Cochrane Library, Issue 4, 2001. Oxford
5. Bailar JC, MacMahon B. Randomization in the Canadian National Breast Screening Study. Report of a review team appointed by the National Cancer Institute of Canada. Can Med Ass J 1997; 156: 213-5.
6. Fletcher SW, Black W, Harris R, Rimer BK, Shapiro S. Report on the International Workshop on Screening for Breast Cancer. J Natl Cancer Inst 1993;85:1644–56.
7. Miller AB, To T, Baines CJ, Wall C. Canadian National Breast Screening Study-2: 13-year results of a randomized trial in women aged 50-59 years. J Natl Cancer Inst 2000; 92:1490-9.
8. Miller AB, To T, Baines CJ, Wall C. The Canadian National Breast Screening Study-1: Breast cancer mortality after 11 to 16 years of follow-up. A randomized screening trial of mammography in women age 40 to 49 years. Ann Intern Med 2002; 137: 305-12.

Martin Yaffe says:

December 18, 2011 at 3:44 pm

First, I think it’s important to remind ourselves that while the debate about the studies and numbers related to mammography screening is intellectually intriguing, we are dealing with recommendations that will influence the lives (and possible deaths) of real women as well as suffering that they and those around them will experience if they are diagnosed with advanced breast cancer.
I’d like to correct something that Andreas and Karen wrote in the article. The Task Force recommendations have been criticized, not because they emphasize the results of randomized trials, but because trials that were performed with mammography from 50, 40 and 30 years ago were included in the pooled analysis. Just as some trials were eliminated from consideration in the Systematic Review because the epidemiologic methods used did not meet current standards of acceptability, so should the oldest 7 of the 8 trials upon which the Task Force recommendations were based be eliminated. The quality of the mammography performed during the 60s, 70s and 80s was simply nowhere near what is currently the standard and so that technology is irrelevant to answering the question. Dr. Miller is correct that therapies have also advanced since the time of the early trials. He suggests that modern screening will have relatively less impact because of improved therapy (but provides no evidence that this has in fact been the case). My sense is that he sees earlier detection through screening and therapy as being in competition with one another. Most breast oncologists would disagree I think. Improved therapies amplify the benefits of earlier detection and earlier detection allows these therapies to have a greater probability of being successful.
A perfectly conducted RCT is indeed the gold standard for the evaluation of a medical procedure. RCTs have now demonstrated that the intervention of screening combined with timely contemporary therapy can reduce breast cancer mortality. The mixing of long obsolete technology (and treatments) only dilutes the measured effect.
We must recognize that RCTs suffer from limitations. A major concern of the so-called “intent to treat” structure is “contamination” discussed in the article. Participants are randomly assigned to the intervention and control arms of the trial and the results are analyzed according to that randomization. But not all participants in the intervention arm actually receive the intervention and similarly the intervention is not absent for all those in the control group. For example, some women assigned to the mammography screening group refuse screening mammography or miss their examination or attend at a date much later than their invitation. Others in the control group entered the trial explicitly for the purpose of receiving the intervention, and then seek it outside the trial. Analyzing the results according to randomization will underestimate the actual potential of mammography screening. For example, in the UK Age Trial (1), the estimate of mortality reduction associated with the randomization to the screened group for women 40-49 was 17% (not statistically significant), while the mortality reduction associated with those who actually received mammography screening was 24% (just missing significance).
RCTs are acceptable when we really don’t know whether the technology or intervention, such as screening mammography or a particular drug, is capable of reducing mortality. But here is where I disagree with Dr. Laupacis. I do not believe that how people behave in an RCT at all reflects real world behaviour outside the trial. We must not confound the investigation of the potential of the technology being evaluated to reduce mortality (given that people receive the intervention) with the question of whether people will accept the intervention.
Once we know from RCTs that earlier detection actually does contribute to mortality reduction, we can focus on how to obtain reasonable compliance among those eligible. In conducting an RCT it would not be ethical to suggest that one arm was superior, but in the delivery of a screening program it would be expected that the tools of health promotion would be used to familiarize the public with the rationale for the program and educate eligible prospective participants as to the potential benefits (and limitations) of a screening intervention. Given this information, one would expect that participation in the screening intervention would be higher than for those randomly assigned to be screened in an RCT of an unproved intervention.

Actual breast screening programs that admit women in their 40s are in place in parts of Canada and one of these, in British Columbia (2), has been well studied epidemiologically. These studies, referred to as observational, are frowned upon by some because they are not randomized. And indeed, like RCTs they are susceptible to certain types of bias. One of these is due to self-selection, the fact that it is often people who are more health conscious who do participate in programs like screening. But it is possible to correct for many of the possible sources of bias, and given the basic validation of screening mammography through RCTs, studies like that of Coldman et al. can demonstrate what can be accomplished in a real world program in which quality standards, an invitation and follow-up mechanism and modern approaches both to detection and treatment are used. In BC a 26% mortality reduction was seen in women in their 40s who were screened in that program.
I also disagree with the statement: “Given that the most recent study comparing newer mammography with older techniques found very little difference between the two, except in young women with dense breasts, it seems unlikely that the newer techniques would markedly increase the number of breast cancer deaths prevented compared to the published randomized trials.” The question of screening women in their 40s refers specifically to “young women”, many of whose breasts are relatively dense and where improved accuracy of mammography may help in the detection of small invasive cancers.

The article and the Task Force recommendations also ignored two other very important factors. First, when cancer is detected at a less advanced stage, it is possible in some cases to spare patients the harsher aspects of treatment such as mastectomy and chemotherapy. Second, in its report the US Systematic Evidence Review for its Task Force showed that the benefits of screening women in their 40s are proportionally greater because when a breast cancer death is averted there are more years of life saved than in the case of older women. This important information was not presented in the recommendations by either the US or Canadian Task Forces, who insisted on defining the benefits of screening only in terms of lives saved rather than the more informative years of life (or quality-adjusted years) saved.
I agree with Dr. Miller that all of the trials with the exception of the Age Trial should be discarded because of when they were performed. But his trial, the CNBSS, should also be discarded, for the same reason and also because there were many operational problems associated with the conduct of that trial which cast doubt on the validity of its findings (3).
Finally, I applaud the authors’ suggestion of conducting a systematic review of the modern non-randomized studies; however, I urge them to include on the review team not only people who understand the epidemiological considerations, but also those who can assess the quality of the imaging-related variables.

References
(1) Moss SM, Cuckle H, Evans A, Johns L, Waller M, Bobrow L; Trial Management Group. Effect of mammographic screening from age 40 years on breast cancer mortality at 10 years’ follow-up: a randomised controlled trial. Lancet. 2006 Dec 9;368(9552):2053-60.
(2) Coldman A, Phillips N, Warren L, Kan L. Breast cancer mortality after screening mammography in British Columbia women. Int J Cancer 2007; 120(5): 1076-1080.
(3) Boyd NF, Jong RA, Yaffe MJ, Tritchler D, Lockwood G, Zylak CJ. A critical appraisal of the Canadian National Breast Cancer Screening Study. Radiology. 1993 Dec; 189(3):661-3.

Anthony B Miller says:

December 10, 2011 at 10:23 am

Although I agree with much of what Andreas and Karen write, there is one issue that they (and indeed the Canadian Task Force) neglect to discuss. That is that the early breast screening trials were conducted in an era when adjuvant chemotherapy and hormone therapy were not available. Their availability and use in Canada when the National Breast Screening trial was conducted must have diminished any benefit derivable from mammography screening (if you can cure all cases no matter what stage they are detected, there will be no benefit from screening). Advances in imaging find smaller lesions, at earlier stages of the natural history, many of which would never progress to invasive, let alone potentially fatal cancer. So the only relevant trials now are the CNBSS and the UK age trial, and if you combine them, the estimated benefit is far less than 15% in women age 40-49, and the adverse effects correspondingly much more dominant.