Ontario’s recent budget lauded the expansion of its Health Links program, which funds various community initiatives that aim to prevent health complications among “high-cost users,” or the 5% of patients who account for two-thirds of health care costs.
But the truth is we don’t know how successful – or not – Health Links are. Formal evaluations looking at their effect on patients’ health have been initiated for only a handful of the 82 Health Links programs, and the data are not yet available. (Health Links was launched in late 2012, with 19 programs.)
It’s a common complaint in health research and policy that new health care programs aren’t set up to be evaluated, and that when studies are done, the evaluation methods are often weak. “When we pilot new models in health care, the measurement effort is not usually intense and is not taken as seriously as it needs to be,” says Merrick Zwarenstein, director of the Centre for Studies in Family Medicine at the Schulich School of Medicine & Dentistry.
The issue isn’t limited to Ontario. According to the report by the Advisory Panel on Healthcare Innovation, chaired by David Naylor, there’s a systemic failure “to spread or scale up the best ideas” from new health care delivery projects, in part because poor evaluation makes it difficult to pinpoint what the best ideas are.
That’s a problem, because “a good fraction of the things we try are not going to work,” says Zwarenstein. And when we identify failures and understand why they happened, that’s when we can learn the most about what to do differently going forward, he argues.
The challenges of evaluating new health care models
Politicians are under pressure to implement new programs in a timely fashion, explains Chris Eagle, former CEO of Alberta Health Services and current health care consultant. “The Minister of Health is in place for one to two years on average, and they have to show that they’ve put their stamp on the system,” he says. Evaluations are expensive, and it takes years for a new program to affect health outcomes, explains Walter Wodchis, associate professor at the Institute of Health Policy, Management and Evaluation at the University of Toronto. So formal evaluations are not always seen as useful to a government that needs to make quick decisions.
The lack of evaluation of new health care delivery models is “partly [researchers’] own fault,” says Zwarenstein. “The way we’ve done these evaluations has tended to be too time consuming; [reporting of results] doesn’t match the speed in which health decisions need to be made.”
It can also take too long for researchers to get funding approval from independent bodies like the Canadian Institutes of Health Research. “The policy makers don’t necessarily want to wait” for evaluation funding to come through, which can take years, says Sharon Straus, director of the Knowledge Translation Program at St. Michael’s Hospital.
In many cases, whether because of funding timelines or because governments only order evaluations after a program is in place, evaluations of health care programs happen years after implementation. Retrospective studies aren’t ideal, however, because the information researchers need to understand a program’s success or failure typically isn’t available after the fact. Data like how many laboratory tests were done for a patient, or how much physicians were paid to care for a patient, is collected automatically. But unless surveys have been performed as part of a research project, “you won’t know things like, can a patient go to a doctor and ask two questions at a time or are they limited to one question? Will the doctor renew their prescriptions by phone?” says Braden Manns, interim scientific director of the Alberta Health Services Kidney Strategic Clinical Network.
The other problem with after-the-fact evaluations is that it’s difficult to understand why one site performs better than another, or what improvements can be made along the way, unless you follow a program from the beginning. “We need to go beyond the ‘what’ questions and answer the more useful ‘how’ and ‘why’ questions,” says Francois Champagne, a professor of Health Administration at the University of Montreal. If an evaluation is simply measuring whether or not a new program has an effect on hospital readmissions, for example, “you cannot do much to improve your intervention, except cancel it [if no effect is found],” says Champagne. If a program seems to be working better in a rural area, it’s important to understand why. Is it related to the patient population or the dynamics between the providers, for instance?
Ideally, evaluations should include both quantitative research that looks at results and qualitative research that seeks to explain the reasons behind the results. For example, a randomized study of virtual wards in Ontario found that they didn’t reduce readmissions to hospital or improve survival. The methods of the rigorous study were lauded by Straus and the finding was important as it helped the Ministry decide not to expand the program. But Irfan Dhalla, a general internist who led the study, says he wishes “we had also done a good qualitative study during the trial to better understand what barriers the virtual ward patients and their healthcare providers were facing. Such a study might have helped us be sure why we weren’t as successful with the virtual ward as we had hoped to be.”
Health program evaluations would be much stronger if they were set up while a program is being designed, with implementation randomized and compared against similar non-intervention sites. This technique reduces the risk that the apparent success of a new innovation is actually due to differences in the characteristics of the test sites, rather than to the innovation itself. Because governments understandably want a new program to be successful, they tend not to randomize implementation, but choose the sites that are most likely to do well. They also often don’t build in comparison sites for evaluation, says Zwarenstein.
For instance, in Ontario’s Bundled Care model, health teams are paid to provide a “bundle” of services to patients as they move from hospital care to community or home care, and payment is tied to patient outcomes. After St. Joseph’s Healthcare Hamilton successfully launched the model, the government selected six other sites to initiate bundled care programs, based on their strong plans for coordinating care and demonstration of their ability to follow through with those plans.
Zwarenstein understands why programs are tested in the most ideal settings. For instance, because the bundled care project is “quite novel” and requires an “entire infrastructure and trusting partnership” between hospital and community care managers, the Ministry decided to find sites that already had strong relationships, rather than randomize the roll-out, Wodchis explains. Zwarenstein points out, however, that too often a program is only evaluated in ideal sites and then rolled out to “ordinary sites” without further evaluation to see whether the program still works in those settings, and what challenges they face.
Ontario’s Deputy Health Minister Bob Bell says he hopes that the evaluation of bundled care will happen in a “step-wise” fashion, whereby with each new phase of roll-out to new sites, the model will be evaluated and adapted in the new areas, where the context and challenges will be different. “This allows us to continue to evaluate on a system basis whether there are unintended consequences, whether we are seeing what we expected to see, and to either speed up or slow down the implementation,” he says.
Moving forward: making evaluations useful
Doing a strong evaluation of a health care program can be difficult. It needs to be initiated in the early stages, and look at not just whether a program is working based on patient outcomes, but why programs are successful and how they can be improved. “You need people with both qualitative and quantitative expertise and to integrate the information,” says Straus.
Typically, provincial governments contract out independent evaluations or researchers seek independent funding to study the effect of a new program. But the separation of implementation and evaluation has led to problems, including out-of-step timelines and priorities.
Straus argues the evaluation and implementation of new programs should be better integrated. If researchers have a better appreciation for the goals of a program, they’ll be more likely to design and adapt studies to be realistic, rather than idealistic. And researchers would have more input into improving programs based on evaluation results.
For example, while Wodchis and his team are independently evaluating the bundled care model, they’ve set up their study not only to measure the effect of the programs, but also to look at things like the steps the organizations are taking as they set up the program, and how well informed staff are about how the program will work. “This allows for early ‘pivots’,” says Wodchis, if a potential problem, like low staff engagement, becomes apparent. The team is also focused on reporting early results rather than simply waiting for the complete or ‘ideal’ data to be available. So while data on patients’ admissions to all hospitals in the province is only released annually, the clinics in the bundled care programs are reporting data on patients’ readmissions to their own facilities on a quarterly basis, so they can measure that against their own expectations and see early on where they may need to improve.
Still, even rigorous evaluations can be discontinued due to political decisions. “Sometimes, everything gets put on hold because there’s an election going on,” explains Straus.
One way to avoid evaluations being overly determined by the politics of the day is to set up an independent body. Naylor’s report called for a $1 billion national innovation fund that would integrate implementation and evaluation in order to scale up and improve delivery models. Such a fund is not without precedent. In 2010, the US government launched the Center for Medicare and Medicaid Innovation, an “Innovation Center” with a billion-dollar-a-year budget to evaluate programs to see if they save the system money or improve health outcomes, and to scale up those programs that are proven to be successful. “The evaluation team is right there from the start of the model,” explains Hoangmai Pham, a director at the Innovation Center. The Innovation Center has funded dozens of projects; one so far has met the cost-saving criteria to be scaled up, while the others are continuing to be adapted and evaluated.
While the fund may seem costly, it represents only half a per cent of Canada’s total health care budget – proportionately less than what other industries spend on research and innovation, Wodchis points out. “We need to do the research to refine and improve the programs so we can move from the Model T to the Tesla,” says Wodchis.
Comments
Stuff isn’t measured because stuff might not succeed as well as promised/predicted/hoped, and no one who proposes, or operates, or backs a project wants that.
As a matter of science, failure is practically expected.
As a matter of invention, failure is an ‘investment return’ (Edison’s view was that he never failed; rather he discovered 10,000 methods that wouldn’t work).
As a political matter, failure is toxic.
This is the truth. Do you think when the Ontario Government rolls out a program they want to consider the possibility that it didn’t work several years later? No, they want a program they can point to in the next election cycle and say, “Look at the great things we did for you.”
The sad thing is that new programs are usually set up in places where things are going well, where there is a ceiling effect: it is difficult to improve further. More needs to be directed to the places where things are not going well, so much larger gain might be obtained, and the patients need the help more.
Why? Because evaluation is a thing, which takes money, which takes time, which takes effort. What we see, however, is basic scientists flooding the funders with an aggressive, take-all propaganda machine that essentially serves their personal and group idiosyncrasies, with no regard to serving the sick and ill. In Canada, how many universities offer a degree in the science of health care delivery?
Completely agree.
For all the rhetoric that we follow evidence based medicine (and I am biased towards science based, but that is another discussion), we fail miserably to determine the outcomes of our programs.
What was the business case for the program?
How were the results to be measured?
What are the funding consequences if you don’t follow through or the results are poor?
Case in point: Ontario’s Shared Service Organizations (SSOs).
After the SSOs have been formally in place for 7+ years, the Ministry now wants to measure how well they are collaborating. The surveys take significant man-hours to complete, ask for data SSOs don’t have, and pull resources away from operations, eventually resulting in poor data that will take months if not years to interpret, if at all.
Check out the work being done by the Canadian Foundation For Healthcare Improvement (CFHI) on Pan-Canadian spread of effective models of care and their evaluation http://www.cfhi-fcass.ca/Home.aspx
While I agree with most of the points made in this article, I would argue that we are often trying to measure too much. Our focus on trying to measure “everything – including the kitchen sink” is a big reason for the delays, and has also probably created a bit of an evaluation cottage industry. Perhaps the challenge we (policy makers, stakeholders, etc.) need to take on is identifying the 3 to 5 most important parameters that need assessment and focusing efforts there. This would align well with the iterative approaches identified in the article.
You are spot on. As the article says, this is partly researchers’ fault, as the evaluations are too time consuming; they are too often trying to be academic studies that cover everything and the kitchen sink to control for all variables, collect all kinds of information, and ultimately promote the careers of the researchers through publication in some journal. No one seems interested in rough and ready/tentative research. It seems like we always have to build a Bentley when perhaps a cheap Ford Focus would do.
It’s not so much the economics as the political economy of large-scale evaluation initiatives. Sooner or later, people start to shake their fists at how much is being spent on evaluation as opposed to direct patient care. Unlike the internal machinery of a Ministry, an arms-length agency has no insulation from the political winds of the day. Moreover, there’s no guarantee that decision makers will act on the results of an evaluation.
One raised eyebrow from the press or the Auditor General and that’s that. Sad but true.