Demystifying Scientific Research: Explaining a Complex Study on Personally Tailored Activities for People with Dementia in Long-term Care

Dr. Anthony Close
18 min read · Apr 16, 2023

Understanding scientific research isn’t easy.

It often contains sub-components of multiple areas of expertise such as mathematics, physics, chemistry, biostatistics, genetics, and more — all stitched together to convey a message.

I am going to use a systematic review paper to explain what things mean in the simplest way possible. I am in no way a statistical expert; however, this is my way of learning new information, refreshing the old, and solidifying it in memory through teaching.

If there are errors, let me know as it helps me and my readers.

The systematic review paper we are examining for this is a March 2023 study:

Personally tailored activities for improving psychosocial outcomes for people with dementia in long‐term care

Ralph Möhler, Stella Calo, Anna Renom, Helena Renom, Gabriele Meyer

Version published: 13 March 2023

I have taken the main results (including some that I personally found more confusing) and broken them down into sections for easy reference later on.

What Is A Systematic Review?

A systematic review paper is a type of research paper that summarizes and evaluates all the existing research on a specific topic in a very careful and thorough way. This helps to provide a complete and reliable overview of what is already known about the topic. It’s like putting together all the puzzle pieces to see the big picture.

In this review update, a total of 11 studies with 1071 participants were included.

Let’s break down the different types of studies (other research projects) that were reviewed:

The 11 studies included in the review update can be categorized into two types of studies: randomized controlled trials (RCTs) and non-randomized clinical trials.

Randomized controlled trials (RCTs) can be compared to a fair coin toss. Just like how a coin toss is a fair way to decide who goes first in a game, RCTs are a fair way to decide who gets the treatment being studied and who does not. In an RCT, the participants are randomly assigned to either the group that gets the treatment or the group that does not get the treatment, just like how a coin toss is random. This helps to minimize bias and ensure that the results are due to the treatment itself and not other factors.
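The coin toss idea can be sketched in a few lines of Python. This is only an illustration of simple randomization, not the procedure used by any of the included studies; the function name `randomize` and the participant IDs are made up for the example.

```python
import random

def randomize(participants, seed=None):
    """Assign each participant to 'treatment' or 'control' by a fair coin toss."""
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    return {p: rng.choice(["treatment", "control"]) for p in participants}

# Hypothetical participant IDs, for illustration only
allocation = randomize(["P01", "P02", "P03", "P04", "P05", "P06"], seed=42)
for person, group in allocation.items():
    print(person, "->", group)
```

Nobody, including the researcher, gets to pick who lands in which group, and that is the whole point. Note that a plain coin toss can leave the two groups unequal in size, which is one reason real trials often use more structured schemes.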

On the other hand, non-randomized clinical trials can be compared to choosing teams for a game based on popularity or skill. Just like how a team captain may pick their friends or the best players first, the researchers may choose participants based on other factors, such as availability or preference. This can lead to bias and confounding factors, which means the results may not be as reliable as those from an RCT.

Randomization is considered the gold standard in clinical testing. So given that, why would a researcher choose to do a non-randomized clinical trial?

There are several reasons why researchers may choose to use a non-randomized design instead of a randomized design:

Ethical reasons: In some cases, it may not be ethical to randomly assign participants to a control group that does not receive the intervention being studied. For example, if a new drug is being tested that is expected to greatly benefit participants, it may not be ethical to randomly assign some participants to a group that does not receive the drug.

Feasibility: In some cases, it may not be feasible to conduct a randomized trial due to logistical constraints, such as limited resources or time.

Research question: In some cases, a non-randomized design may be better suited to answer a specific research question. For example, if the goal of a study is to examine the natural course of a disease or condition, a non-randomized design may be more appropriate.

While randomized controlled trials are generally considered the gold standard in clinical research, non-randomized trials can still provide valuable information and are sometimes necessary due to ethical or feasibility considerations. However, it is important to keep in mind the potential limitations of non-randomized trials and to interpret their results with caution.

In addition to these types of studies, the 11 studies were also divided based on the type of control group they had. Five studies included a control group receiving usual care, five studies had an active control group (activities that were not personally tailored), and one study included both types of control group.

To make it simple, there is typically a treatment group and a control group.

The treatment group gets the treatment, drug, intervention, etc.

The control group doesn’t.

The Main Results From The Paper — Sections Made For Later Reference

Section One

“We identified three new studies and therefore included 11 studies with 1071 participants in this review update. The mean age of participants was 78 to 88 years and most had moderate or severe dementia. Ten studies were RCTs (three studies randomized clusters to the study groups, six studies randomized individual participants, and one study randomized matched pairs of participants) and one study was a non‐randomized clinical trial. Five studies included a control group receiving usual care, five studies an active control group (activities which were not personally tailored) and one study included both types of a control group. The duration of follow‐up ranged from 10 days to nine months.

Section Two

In nine studies personally tailored activities were delivered directly to the participants. In one study nursing staff, and another study family members, were trained to deliver the activities. The selection of activities was based on different theoretical models, but the activities delivered did not vary substantially.

Section Three

We judged the risk of selection bias to be high in five studies, the risk of performance bias to be high in five studies, and the risk of detection bias to be high in four studies.

We found low‐certainty evidence that personally tailored activities may slightly reduce agitation (standardized mean difference −0.26, 95% CI −0.53 to 0.01; I² = 50%; 7 studies, 485 participants). We also found low‐certainty evidence from one study that was not included in the meta‐analysis, indicating that personally tailored activities may make little or no difference to general restlessness, aggression, uncooperative behavior, and very negative and negative verbal behavior (180 participants).

Section Four

Two studies investigated the quality of life by proxy rating. We found low‐certainty evidence that personally tailored activities may result in little to no difference in the quality of life in comparison with usual care or an active control group (MD ‐0.83, 95% CI ‐3.97 to 2.30; I² = 51%; 2 studies, 177 participants). Self‐rated quality of life was only available for a small number of participants from one study, and there was little or no difference between personally tailored activities and usual care on this outcome (MD 0.26, 95% CI −3.04 to 3.56; 42 participants; low‐certainty evidence). Two studies assessed adverse effects, but no adverse effects were observed.

Section Five

We are very uncertain about the effects of personally tailored activities on mood and positive affect. For negative affect, we found moderate‐certainty evidence that there is probably little to no effect of personally tailored activities compared to usual care or activities which are not personalized (standardized mean difference ‐0.02, 95% CI −0.19 to 0.14; 6 studies, 632 participants).

We were not able to undertake meta‐analyses for engagement and sleep‐related outcomes, and we are very uncertain whether personally tailored activities have any effect on these outcomes.

Two studies that investigated the duration of the effects of personally tailored activities indicated that the intervention effects they found persisted only during the period of delivery of the activities.”

What’s That Mean? Pretend I’m Not A Genius.

If you were to summarize the above (and I mean heavily simplify), it may read more like this:

“We looked at a bunch of studies about how to help older people with memory problems and found some interesting things.

Most of the people in the studies were between 78 and 88 years old and had trouble remembering things. Some of the studies had people doing activities that were chosen just for them, while others had people doing activities that weren’t personalized.

We found that the personalized activities might help a little bit with calming down, but we’re not sure. We also found that personalized activities might not make a big difference in how people feel overall. We don’t think there are any bad side effects from doing the activities.

We’re not sure if personalized activities help people feel happier or sleep better, but we know that any good effects only last as long as the activities are being done.”

Going Deeper. For Those Who Want To Learn The Lingo

So for those just wanting the quick and dirty breakdown, you can stop here. For those who would like to get their heads around it, please continue.

So let’s dissect this study excerpt. Starting with the first section above.

Section One: It Was A Group Effort With Multiple Studies

The paragraph in Section One describes four study designs and three kinds of control group.

Perhaps the most pressing question in this paragraph is — what’s all this randomized, control group stuff?

Let me try to explain these study setups in analogy format:

Randomized clusters: Imagine a big group of students in a school. To conduct a randomized cluster study, we would randomly select some classrooms (clusters) and assign all the students in those classrooms to either the treatment or control group. This type of study is useful when it is difficult or impractical to randomize individual students.

Randomized individual participants: In this type of study, individual participants are randomly assigned to either the treatment or control group. It’s like picking names out of a hat, where each participant has an equal chance of being in either group.

Randomized matched pairs: This type of study is like playing a game of cards, where each participant is paired with someone else based on certain characteristics (e.g., age, gender, health status) and then one person in each pair is assigned to the treatment group and the other to the control group. This type of study is useful when we want to make sure that the treatment and control groups are balanced in terms of important characteristics.

Non-randomized clinical trial: In this type of study, participants are not randomly assigned to the treatment or control group. Instead, the researchers choose which participants will receive the treatment based on certain criteria (e.g., the severity of illness or willingness to participate). This type of study is less reliable than randomized studies because there may be factors that influence who gets the treatment and who doesn’t.

The control group receives usual care: In this type of study, the control group receives the usual care that they would get if they were not participating in the study, while the treatment group receives the intervention. It’s like comparing apples to apples: both groups start from the same everyday care, so any difference between them can be put down to the added treatment.

Active control group: In this type of study, the control group receives an alternative intervention or activity instead of the treatment. In this review, that meant activities which were not personally tailored. It’s like comparing apples to oranges: we want to see if the treatment group does better than a group that also gets some extra activity, just of a different kind.

Both types of control group: In this type of study, some participants are assigned to a control group receiving usual care, and others are assigned to an active control group. This allows the researchers to compare the effects of the treatment against both types of control group.

Section 2: Why Was That Set Up Beneficial?

In nine studies, personally tailored activities were delivered directly to the participants. In the remaining two, nursing staff (one study) or family members (another study) were trained to deliver the activities.

This is important because who delivers the activities can affect the outcomes of the studies: an activity delivered directly as part of the study may not play out the same way as one led by trained nursing staff or a family member.

Moreover, the selection of activities was based on different theoretical models, but the activities delivered did not vary substantially. This suggests that although the activities may have been different in some ways, they were still very similar in terms of the benefits they were supposed to provide to the participants.

Section 3: Using An Analogy & Defining The Terms For The Real Nerds

Imagine you have a recipe for a cake, but you’re not sure if it’s going to turn out great or not. You decide to ask a bunch of people who have made the same cake before how it turned out for them. Some people say it turned out well, but others say it didn’t turn out so great. You also notice that some people had some problems when they were making the cake, like they didn’t have all the right ingredients, or they didn’t know how to use the oven properly.

Based on all of this information, you think that the cake might turn out okay, but you’re not sure. You think that maybe if you add some extra ingredients or bake it for a little bit longer, it might turn out even better. But you’re still not sure if it’s going to be the best cake ever.

And what does this cake represent in Section 3?

In this study, the cake can represent the effect of personally tailored activities on agitation in people with dementia.

The size of the cake can represent the effect size, which was found to be small (a small cake), with a confidence interval that ranged from a moderate reduction in agitation to essentially no effect (a range of cake sizes).

The study also found that there were some concerns about the quality of the studies included (represented by the quality of the ingredients used to make the cake) and that there was moderate heterogeneity among the studies (represented by variations in the ingredients used in the different cakes).

The study concluded that while the cake (effect of personally tailored activities) may have some positive impact on agitation, there is still a lot of uncertainty about how effective it is, and more research is needed to be more confident about its effects.

Section 4: Turning Proxy Ratings Into Toys

Imagine you and your friend both got a toy.

Your friend got to pick out their toy and you got a toy that someone else picked out for you.

Two grown-ups watched and asked how much you and your friend liked your toys.

They found out that it didn’t matter if you picked out your toy or if someone else picked it out for you because you both still liked your toys the same amount. They also checked if any bad things happened because of the toys, but everything was okay.

Section 5: Summaries, The Not So Magic Potion

Imagine you have a magic potion that you think will make you feel happier.

You decide to give this potion to some of your friends to see if it works. You give it to 10 friends and ask them how they feel. After you get their answers, you are not very sure if the potion made them happier or not, because their answers were not very clear.

But, you did notice that when your friends took the potion, they didn’t feel worse or sadder than they did before. So, it probably didn’t make them feel worse, but you’re not sure if it made them feel happier.

You also tried to see if the potion helped your friends sleep better or get more interested in activities, but you didn’t have enough information to figure that out. And, you noticed that when your friends stopped taking the potion, they didn’t continue to feel happier, so the potion only works when they are drinking it.

But, isn’t that called the placebo effect?

The observed effect of personally tailored activities on some outcomes may be due to a placebo effect. However, having a control group reduces the chance that this explains the results, and according to the information provided, the studies included control groups.

Five studies included a control group receiving usual care, five studies an active control group (activities which were not personally tailored) and one study included both types of a control group.

The control group allows researchers to compare the effect of the intervention (in this case, personally tailored activities) to a group that did not receive the intervention or received an alternative intervention. This helps to control for any potential placebo effect or other factors that may affect the outcomes being measured.

However, in studies without a proper control group or further investigation, it is difficult to determine the extent to which the observed effects are due to a placebo effect or a true treatment effect.

Additionally, the placebo effect itself can be a powerful and useful tool in medical treatment, as it demonstrates the power of the mind and the potential for psychological interventions to improve health outcomes.

Going Even Deeper. (Nerds Only Beyond)

To do this, I want to take each paragraph above and highlight the mental sticking points we are often too afraid to admit we don’t know. After all, statistics wasn’t the most fun for many of us.

What is a standardized mean difference (SMD)?

So, when people do research, they often want to know if something works or not. To figure this out, they compare two groups of people. One group gets a treatment (like medicine or therapy) and the other group doesn’t get the treatment.

When they’re done comparing the two groups, they usually use some math to figure out if the treatment made a real difference or not. One way they do this is by using something called a “standardized mean difference” or SMD for short.

The SMD is a number that tells us how much the two groups differ from each other. The sign only tells us the direction of the difference, and whether that direction is good news depends on what is being measured. For agitation, a lower score is better, so a negative SMD means the group that got the treatment was less agitated than the control group.

The SMD has no fixed minimum or maximum. As a common rule of thumb, a value around 0.2 (in either direction) is considered a small effect, around 0.5 a medium effect, and 0.8 or more a large effect.

SMD values can be positive or negative, indicating the direction of the effect, and the larger the value, the greater the effect size.
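To make the idea concrete, here is a minimal sketch of one common form of SMD (Cohen's d: the difference in means divided by the pooled standard deviation). The scores are invented for illustration and are not data from the review, and the review itself may have used a slightly different variant of the SMD.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_mean_difference(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

# Hypothetical agitation scores (higher = more agitated), not data from the review
treatment_scores = [20, 22, 19, 24, 21, 18]
control_scores = [23, 25, 21, 26, 22, 24]

smd = standardized_mean_difference(treatment_scores, control_scores)
print(round(smd, 2))
```

With these made-up numbers the result is negative, because the treatment group has the lower average agitation score. Dividing by the standard deviation is what makes the number "standardized", so studies that used different agitation scales can be compared.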

What does this mean in simple terms: “standardized mean difference −0.26, 95% CI −0.53 to 0.01; I² = 50%; 7 studies, 485 participants” (Section 3)?

This means that after looking at 7 different studies with a total of 485 participants, the researchers found that the personally tailored activities may be slightly helpful in reducing agitation.

The standardized mean difference was -0.26, which indicates a small effect size. The 95% confidence interval ranged from -0.53 to 0.01. Loosely, this is the range of effect sizes that are compatible with the data: if the analysis were repeated many times, about 95% of the intervals built this way would contain the true effect size. Because the interval includes 0, no effect at all cannot be ruled out.

The I² value of 50% indicates moderate heterogeneity among the studies, meaning that they differed somewhat in their results.

Why “I” Matter

The I² value is a number that helps us understand how much the results of the different studies in the research are similar to each other. It tells us how much variation there is between the results of the studies.

If the I² value is high, it means that there is a lot of variation between the results of the studies and they may not be very similar. If the I² value is low, it means that the results of the studies are more similar to each other.

Think of it like this: imagine you have a bunch of toy cars that are all supposed to be the same, but some of them are different colors, some have different wheels, and some are a little bit bigger or smaller. If there’s a high I² value, it’s like the toy cars are all very different from each other. But if there’s a low I² value, it’s like the toy cars are more similar to each other and they all look more alike.
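For those who like to see the arithmetic, I² is usually calculated from a statistic called Cochran's Q, which adds up how far each study sits from the pooled result. The sketch below shows that calculation under simple assumptions (a fixed-effect pooled estimate); the effect sizes and standard errors are hypothetical and are not taken from the review.

```python
def cochran_q(effects, standard_errors):
    """Weighted sum of squared deviations from the pooled (fixed-effect) estimate."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

def i_squared(q, n_studies):
    """I² as a percentage: the share of variation beyond what chance would produce."""
    df = n_studies - 1  # degrees of freedom
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Hypothetical effect sizes and standard errors for four studies (not from the review)
effects = [-0.50, -0.10, -0.40, 0.05]
standard_errors = [0.15, 0.15, 0.15, 0.15]

q = cochran_q(effects, standard_errors)
print(round(i_squared(q, len(effects)), 1))
```

As a point of reference, `i_squared(12, 7)` returns 50.0, so a Q of about 12 across seven studies is the kind of value that would give an I² of 50%. The Q value itself is not quoted in the excerpt above, so treat that as an illustration only.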

How is “I” relevant to this study?

The I² value is relevant to this study because it tells us how much the results of the different studies included in the analysis vary from each other.

When the I² value is high, it means that there is a lot of variation between the results of the studies, which can make it harder to draw a clear conclusion. In the results quoted above, the reported I² values were 50% (agitation) and 51% (proxy-rated quality of life), which means that there was some variation between the results of the studies, but it was not too high.

The researchers were still able to combine the results of the studies in a meaningful way to try to determine the overall effect of personally tailored activities on the outcomes they were interested in.

Ok, that’s great, if I am a stats wizard. But, explain this like I am a child…

When researchers did a bunch of studies on something, they found a small difference between the group that got special activities and the group that didn’t. There’s a pretty good chance that the real difference is somewhere between a little bit better and no better at all.

Also, the studies were kind of different from each other, but not too much.

Does that mean it didn’t work?

Not necessarily. The standardized mean difference of -0.26 means that there was a small effect size. However, because the 95% confidence interval ranged from -0.53 to 0.01, it is uncertain whether the intervention was truly effective or not. The I² value of 50% indicates that there was some variation among the studies included in the review.
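One way to see why the result is "uncertain" is to work backwards from the quoted numbers. The sketch below assumes the confidence interval was built with the usual normal approximation (estimate plus or minus 1.96 standard errors), which may not match the review's exact method, so the outputs are rough.

```python
from math import erf, sqrt

smd, lower, upper = -0.26, -0.53, 0.01  # values quoted from the review

# A 95% CI is roughly the estimate plus or minus 1.96 standard errors
standard_error = (upper - lower) / (2 * 1.96)
z = smd / standard_error

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

crosses_zero = lower < 0 < upper
print(f"SE ~ {standard_error:.3f}, z ~ {z:.2f}, p ~ {p_value:.3f}")
print("Interval includes zero:", crosses_zero)
```

Under that assumption the p-value comes out just above the conventional 0.05 cut-off, which matches the interval only barely including zero. In other words, the data lean towards a benefit but do not rule out no effect.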

How high can the confidence range be and how low?

The range of a confidence interval can theoretically go from negative infinity to positive infinity. However, in practice, the range is usually set based on the data and statistical methods used in the analysis.

The upper and lower bounds of the range are determined by the confidence level chosen for the study (often 95% or 99%), the variability of the data, and the number of participants.

The wider the range, the less certain we are about the true value of the parameter being estimated. The narrower the range, the more confident we are about the true value of the parameter being estimated.
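A quick sketch shows how the range narrows as more people are measured. It uses the normal approximation for the confidence interval of a mean, and the mean, standard deviation, and sample sizes are hypothetical.

```python
from math import sqrt

def ci_95(mean_value, sd, n):
    """Approximate 95% CI for a mean using the normal approximation."""
    half_width = 1.96 * sd / sqrt(n)
    return (mean_value - half_width, mean_value + half_width)

# Hypothetical mean score of 50 with a standard deviation of 10
for n in (10, 40, 160):
    low, high = ci_95(50, 10, n)
    print(f"n={n}: {low:.1f} to {high:.1f}")
```

Each time the number of participants is multiplied by four, the interval becomes half as wide. This is why a result based on 42 participants tends to have a much wider interval than one based on several hundred.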

Think of the confidence interval like a game of darts

A confidence interval is like a target, with the true effect size being the bullseye. The 95% confidence interval is like a ring around the bullseye, indicating the range in which we are pretty sure the true effect size falls.

The width of the ring is determined by the amount of uncertainty or variation in the data. Just like a larger target allows for a wider range of shots to be considered successful, a wider confidence interval means there is a wider range of possible effect sizes that are considered plausible.

So in section 4, what’s this mean: MD 0.26, 95% CI −3.04 to 3.56; 42 participants; low‐certainty evidence?

The MD value stands for “mean difference”. It is a way to measure how much two groups differ from each other on a particular outcome. The range of the MD value depends on the specific outcome being measured, so it can vary from study to study.

Because remember, in a systematic review, we’re looking at a bunch of different studies and weighing them up. Some studies are positive, some aren’t. The systematic review weighs it all up to come to the best conclusion.

For example, in this review, the MD value for the self-rated quality of life was 0.26. This means that, on average, the group receiving the personally tailored activities had a slightly higher quality of life score than the group receiving usual care, but the difference was not very big.

The range of the MD value, in this case, was from -3.04 to 3.56, which means that there is a lot of uncertainty around the true difference between the two groups. This is indicated by the wide range of values in the confidence interval.

Meaning…

The difference was a value of 0.26, but we are not very sure if this difference is true or just happened by chance. This is because there is low-certainty evidence, which means we need more studies or more information to be more certain about this result.

So just to be clear, is this a percentage?

No, the standardized mean difference (SMD) is not a percentage.

It is a measure of the difference between two groups, usually a treatment group and a control group, in terms of the standard deviation of the outcome measure.

The SMD is expressed in standard deviation units, and its value can range from negative infinity to positive infinity. An SMD of 0 indicates no difference between the two groups. A negative SMD means the treatment group scored lower than the control group, and a positive SMD means it scored higher. Whether lower or higher is better depends on the outcome: for agitation, lower is better.

Let’s say you have two buckets filled with water, and you want to know how much more water one bucket has than the other. You measure each bucket and subtract one amount from the other. That gap is like the MD. If one bucket holds 0.26 liters more than the other, then the MD would be 0.26. A bigger gap means a larger MD, and a smaller gap means a smaller MD.

That’s A Ton Of Info, Remind Me Again What The Differences In The Groups Are.

To refresh your memory at this stage, the control group received either usual care, meaning no additional or personalized activities beyond what they would normally receive in their care setting, or, in the studies with an active control group, activities that were not personally tailored.

VS.

The treatment group refers to the participants who received personally tailored activities, which were interventions designed to improve their well-being and reduce the symptoms of agitation or other negative behaviors associated with dementia.

So given all this information, what’s it mean?

All of that breaks down to this:

Overall, more research may be needed to determine if the intervention is truly effective or not.

In terms of the effect size, the study found low-certainty evidence that personally tailored activities may slightly reduce agitation and moderate-certainty evidence that there is probably little to no effect on negative affect.

Furthermore, the two studies that looked at how long the effects lasted indicated that they persisted only while the activities were being delivered.

In addition, there was very uncertain evidence regarding the effects on other outcomes such as mood, engagement, and sleep-related outcomes. So while some effects were found, the overall picture is somewhat mixed and more research is needed to draw more definitive conclusions.

Ref:

Möhler R, Calo S, Renom A, Renom H, Meyer G. Personally tailored activities for improving psychosocial outcomes for people with dementia in long‐term care. Cochrane Database of Systematic Reviews 2023, Issue 3. Art. No.: CD009812. DOI: 10.1002/14651858.CD009812.pub3. Accessed 16 April 2023.


Dr. Anthony Close

Founder and CEO of Lab Me Analytics (www.labme.ai). Creating meaningful narratives around blood test results using AI.