How to e-mental health: a guideline for researchers and practitioners using digital technology in the context of mental health

This is an expert consensus of the work of international researchers in the field of e-mental health aiming to promote the methodological quality, evidence and longer-term implementation of technical innovations in the healthcare system. For this purpose, the original author group from Germany investigated and contacted leading experts worldwide in the field of e-mental health based on Google Scholar profiles, groundbreaking publications and achievements, and personal recommendations. Thirty-six e-mental health experts were invited to contribute with their knowledge, provide an overview of the current state-of-the-art and give practical suggestions, resulting in 25 authors and a think tank contributing actively (Fig. 1). The authors’ expertise covers multiple disciplines (psychologists, psychiatrists, computer scientists and industry) with different working areas (clinical studies, (tele-)psychotherapy, mental health state assessment, development and conducting of digital interventions in the field of mental health, app development and artificial intelligence (AI)) in children, adolescents and adults around the globe. We sought diversity in terms of research seniority, culture and gender.

Fig. 1: Flow diagram illustrating the paper creation process.

The following researchers are denoted by initials: P.C., Per Carlbring; N.E., Narges Esfandiari; J.L., Johanna Löchner; A.N., Alexandra Newbold; T.R., Tobias Renner; C.S., Caroline Seiferth; L.V., Lea Vogel.

To reach consensus on relevant recommendations and guidelines for clinicians and researchers, an adapted, structured Delphi procedure in nine steps, based on iterative feedback and co-reviewing by the authors, was implemented14, guided by L.V., C. Seiferth and J.L. First, a list of the three most important objectives was discussed among the authors and agreed on: (1) development, (2) study specifics and (3) evaluation of e-mental health assessments and interventions. A total of 15 topics were brainstormed within the objectives (Terminology; Where to start; Content; Participatory research; Target group; Suicidality; Data protection and data security; AI in assessment and intervention; Sensing and wearables; Dropout rates and compliance; Efficacy evaluation; Ecological momentary assessment (EMA); Transfer into (clinical) practice; AEFs). Thereafter, the topic ‘Dropout rates and compliance’ was removed as a separate chapter, and the sections ‘Where to start’ and ‘Terminology’ as well as ‘Target group’ and ‘Participatory research’ were combined. Second, authors were grouped into teams according to their expertise and preference (two to four for each section). All author teams were asked to include the latest literature and findings, clear recommendations for their topic, potentially helpful links for further literature recommendations, and a list of ‘dos and don’ts’ for each topic (Supplementary Information). After all authors delivered their first drafts, L.V., C. Seiferth and J.L. reviewed the content, checked for redundancies and synthesized all parts into one piece in an online document, accessible and editable for all authors. Subsequently, all participating authors were asked to review the whole paper and comment on each section regarding (1) discrepancies, (2) agreement with their own experience, (3) literature recommendations and (4) other comments. First authors of each section finally discussed and/or integrated these comments with the support of C. Seiferth, L.V. and J.L., who then developed a second clean version that was handed over to more senior researchers in the field (P.C., N.E., A.N. and T.R.) for a global check and proof of coherence. Minor issues (typos and references) were resolved by the first and last authors, and specific comments were fed back to the author teams and either discussed, integrated or dismissed. To achieve a final version and common ‘ground truth’, the remaining issues were consolidated, and all authors reviewed the revised paper and gave their consent.

E-mental health development

Where to start?

The implementation of any e-mental health project (assessment and/or intervention) is preceded by the fundamental decision about whether a digital approach is appropriate to address the specific research issue. Researchers need to identify the characteristics of the problem that allow for a digital operationalization (that is, multi-faceted, context sensitive and time sensitive) and the specificities of applying technology (that is, problem definition). Once it is established that a digital solution is the most appropriate approach, researchers should be clear about (I) the objectives, theory and hypotheses that they aim to investigate. This clarity may guide several decisions that need to be taken throughout the process (Fig. 2). Furthermore, (II) the specification of the main target group (that is, demographics, mental health disorder and cultural background) and the target group involvement (that is, participatory research) is essential for subsequent decisions, such as (III) the extent (for example, self-guided, partly guided and blended counselling) and nature (for example, on-demand, asynchronous, chat and video-based) of the delivered approach.

Fig. 2: E-mental health study conceptualization process.

The figure illustrates nine major stages that need to be considered when developing a study in the context of e-mental health.

Digital technologies can be used to facilitate communication between practitioners and/or patients and can vary in their intensity of communication. Thus, the level of interaction between users and providers needs to be defined (for example, guided by a research team for technical support, or therapists). Furthermore, content transfer may range from passively reading a text to clicking and engaging more actively with the digital solution or with a coach/therapist. This greatly depends on (IV) the chosen type of platform that is used to deliver the e-mental health service (for example, online and offline, browser or app). In this context, sensors can also be used (for example, touch, motion, pulse and gaze) to provide direct feedback about physical and emotional responses. This decision also depends on budget and collaboration with (external, potentially commissioned) technology companies, self-made toolkit supplies for e-health studies, in-house IT support and/or cooperation within a project with a technical partner.

For high-quality assessment and interventions, (V) best-practice and evidence-based components should form the foundation of digital solutions. Furthermore, the definition of (VI) the technical development process, including different disciplines, experiences, work cultures and (project) aims, should not be underestimated. Ideally, an agile, iterative process in a multidisciplinary team is set up to develop and transfer psychological content into an attractive digital solution. Especially for the implementation of gamification features, interactive content and delivery logic, an interdisciplinary shoulder-to-shoulder working culture is most promising. Together with the technical experts, (VII) decisions about data flow, data storage, access and transparency need to be taken and the subsequent procedure clearly defined. Study participants should be well informed about such details to strengthen their trust in academic e-mental health research (as a quality criterion, diverging from more commercially driven supplies). Following these steps, the (VIII) risk management strategies and dropout prevention may be defined. As a final step, the research team may determine (IX) which study design best suits the testing of the objectives and hypotheses. Naturally, these steps interact, are dynamic and need to be reconsidered during the whole process. In addition, other specific frameworks and guidelines may support researchers and clinicians in their project planning decisions15,16,17,18,19,20.

Intervention content development

The process of content development for a multicomponent e-mental health intervention is two-staged. First, researchers need to select psychological and psychotherapeutic strategies based on existing evidence or best-practice approaches for the selected target group and intervention aim. Second, the components of the intervention need to be transferred to the digital solution. This technical translation poses a range of pitfalls and therefore requires a highly iterative and dynamic research approach that should take place within a multidisciplinary team (for example, mental health professionals, software engineers and design experts6,21).

A pragmatic approach comprises converting existing resources, such as applying psychological content from text-based manuals, exercises or questionnaires in agreement with the original authors. However, information displayed in digital solutions follows a rather different temporal and architectural structure and the user engages with the app with a different ‘user mindset’ because app use occurs at varying times, with varying intensity, in varying contexts. To consider these peculiarities of the digital environment, a substantial amount of time, financial costs, perspectives and tests must be dedicated to the process of transforming specific components of traditional health interventions. More concretely, this means that each piece of content must be condensed to the core aims and elements that are to be conveyed through the digital solution. It is important that the structure (for example, division into modules, sessions/lessons and exercises), delivery logic (for example, temporal availability of content) and complexity of content is always set against the background of the targeted group, and outcomes of the intervention. Once the crucial elements and user needs are determined, a user experience story may be developed.

Engaging elements may enhance a positive and reinforcing environment (for example, text, audiovisual, prompts, quizzes, self-report questions and gamification features). Although forced guidance through an assessment or intervention may be needed to address the research objectives, flexibility and the personalization of features (that is, just-in-time adaptive interventions22) are likely to be beneficial to increase the attractiveness of an app-based or smart-phone intervention23. In general, the content should match the ‘look and feel’ of the digital format (for example, length of a video and amount of text displayed6,24). It is also necessary to consider which resources are realistically available to the development team and if it is possible to develop new, customized multimedia elements. Finally, content development should be specifically focused on the target group.

User-centered design and participatory approaches

The implementation of user engagement and participatory research within the development of e-health interventions is currently recognized as a way to increase the ease of use as well as the likelihood of fitting the users’ needs. It is therefore recommended as a means to limit common problems such as low uptake, high complexity and poor fit to the user’s needs. Participatory research actively involves end users, healthcare professionals and other stakeholders in all stages of the development and research process (including the formulation of the research question and goal, planning research design, selection of research methods and outcomes, interpretation of results, and dissemination) by taking into account their views, needs, expectations and cultural background25,26. For a participatory approach, it is mandatory that end users also participate in the decision-making processes27.

In the field of e-health intervention development, user-centered design (UCD) has been established in recent years28,29. UCD represents a systematic, iterative process with three phases during development30. First, an initial investigation of the users’ needs should be conducted (for example, differentiating children, adolescents, adults and elderly users). The purpose of the first phase is to identify the needs of the target group, and to identify features and characteristics of the intervention that would be acceptable and preferred. For example, strategies such as personalization, gamification and including a social component have been identified as important for the users’ engagement31,32. Focus groups or interviews with future users or individuals in their environment (for example, therapists) and/or open-ended written survey questions are suitable methods for user needs assessments. Qualitative research methods (for example, thematic analyses) are suitable for establishing UCD guidelines33. Second, a prototype with key features of the intervention should be created, which can be used in usability tests30. During the third step, usability tests, researchers observe potential users interacting with the prototype in a controlled environment, while they are simultaneously thinking aloud34. Researchers take notes about the participant’s behaviors, comments and issues, to uncover and adapt functional and design flaws29,30. This phase is a balancing act between drawing evidence-informed strategies and content from the literature and combining them with ways of delivering this information in an acceptable and engaging way. Continuing to engage with the target group at this stage ensures that, when the product is finalized, the target group has been involved and has provided continual feedback and guidance throughout the process, maximizing the likelihood that the final product will meet the needs of the users. 
It must be noted that UCD represents a preliminary stage of participative research as participation takes on a strictly consultative role and the project’s decisions are still in the control of the researchers35. To achieve meaningful participation, it is necessary to involve end users as early as possible in the research process and in all decision-making processes.

Focusing on the target group’s specific needs is particularly important when it comes to digital interventions, with notable variability in the aspects of technology that will appeal to different groups of users36. Clinical observation shows lower adherence when participants express a wide range of needs but the digital treatment addresses a single disorder37. When identifying the target group, specifics that should be considered include age, gender, cultural (racial and ethnic) background, delivery context and delivery format. This information can guide the best ways to engage with the specific target groups at the outset of the project. Additional questions concern whether the intervention is recommended by mental health professionals. In UCD, all potential specificities must be explored together with the target group38.

Once the specifics of the target group have been identified, the next step is to conduct an appropriate stakeholder engagement process with all parties involved in the delivery, dissemination and implementation of the intervention, as well as the end users. In addition, the examination of the usual consumer behavior by the specific target group may be helpful, for example, what kinds of health apps are used, how often and what features are more or less appealing.

Study specifics

Managing suicidality

E-mental health research is often conducted with participants recruited via the internet without any face-to-face contact throughout the entire research process. As a result, both researchers and institutional review boards express great uncertainty about how to manage participants who are experiencing severe mental health crises such as suicidal thoughts or behavior (STB)39,40. In common practice (not only in e-mental-health research), individuals with a history of suicidal behavior or who affirm suicide-related questionnaire items (for example, item 9 of the Patient Health Questionnaire-9 (PHQ-9)) are often excluded from trials at baseline40. This practice, however, results in almost no increase in safety for participants, because it overlooks that suicidality often is a highly fluctuating symptom41 and study participants may conceal their suicidal ideation to be admitted to the study42. Moreover, while there is an established association between suicidal ideation and previous suicidal attempts with subsequent suicidal behavior, their practical predictive utility in differentiating individuals who are likely to exhibit suicidal behavior from those who are not is limited43. Indeed, most people who die by suicide do not score in commonly used suicide risk assessments44. Thus, excluding participants who score on suicidality items primarily reduces the external validity of study results45, which poses potential risks to users when these interventions are implemented in real-world care.

Given the impossibility of eliminating the risk of suicidality in e-mental-health research, we propose implementing the following measures to increase participant safety during the intervention, as has been practiced in prior randomized controlled trials (RCTs) of digital interventions specifically designed for individuals with STB46. The assessment of STB should be expanded, including the use of specifically validated questionnaires47. At any point where participants may potentially report suicidality (for example, in the intervention or questionnaires), it must be ensured that this is noticed by the study team. The study protocol should explicitly outline how to react to reports of STB. It is crucial to ensure that all designated team members receive appropriate training and supervision to effectively implement the specified procedures. This reaction can, but does not necessarily need to, include a telephone or other contact by the study team. However, in case of a disclosed immediate and definite plan for suicide, the country-specific emergency services should be informed. Participants should be clearly informed about these procedures as well as about the time frame within which their entries will be seen by a member of the study team. We recommend documenting this in the informed consent. When STB is reported, detailed and visible information on support and contact services (for example, national emergency numbers and 24-hour help lines) should be provided automatically, including low-threshold click-to-call links. The use of other forms of treatment should not lead to an exclusion from the trial. Instead, individual crisis plans should be developed together with the participant. For studies with particularly vulnerable study samples, a collaboration with local emergency centres should be arranged in advance. In intervention trials for mental health disorders, optional modules that specifically target STB should be available46,48. 
In general, help options should be equally available to all participants, irrespective of their group allocation and the type of intervention35.
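The requirement that any report of suicidality is noticed by the study team and automatically answered with support information can be illustrated with a minimal sketch. It assumes that responses arrive as a mapping from PHQ-9 item number to score (0 to 3) and that any score above 0 on item 9 counts as affirming the item. The names `screen_phq9`, `StbAlert` and `SUPPORT_RESOURCES` are hypothetical, and the resource entries are placeholders rather than real contact details. An actual trial would follow the procedures laid down in its own study protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PHQ9_SUICIDALITY_ITEM = 9  # item named in the text; PHQ-9 items are scored 0-3

# Placeholder entries (assumption): a real study would list country-specific
# services together with working click-to-call links.
SUPPORT_RESOURCES: List[str] = [
    "National emergency number: <to be filled in per country>",
    "24-hour help line: <to be filled in per country>",
]


@dataclass
class StbAlert:
    """Record that tells the study team a participant affirmed item 9."""
    participant_id: str
    item_score: int
    notify_study_team: bool
    resources: List[str]


def screen_phq9(participant_id: str,
                item_scores: Dict[int, int]) -> Optional[StbAlert]:
    """Return an alert if item 9 is affirmed (score above 0), else None.

    A missing item 9 is treated as 0 here for simplicity; a real system
    should handle missing answers explicitly.
    """
    score = item_scores.get(PHQ9_SUICIDALITY_ITEM, 0)
    if score <= 0:
        return None
    return StbAlert(
        participant_id=participant_id,
        item_score=score,
        notify_study_team=True,
        resources=list(SUPPORT_RESOURCES),
    )
```

In such a design, the returned alert would be routed to trained team members within the time frame promised in the informed consent, while the resource list is shown to the participant immediately.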

Data protection and data security

Substantial deficiencies in data protection and data security may inhibit e-mental health assessment and intervention studies49,50. The focus of data security is to prevent unwanted data loss and the unauthorized manipulation of data. The protection of personal data (for example, patient contact details) is of utmost importance in e-mental health applications.

In the development of an e-mental health offering, it must be anticipated that users may unintentionally reveal their access data, lose their devices or use the devices for other (harmful) actions (for example, children visiting adult websites). To counter these problems, tools can be installed on devices that lock access to other content. Pre-configured and password-protected study smart-phones should be used. Two-factor authentication helps to prevent mass registrations by fake users that can lead to poor data quality. In any case, users should be thoroughly informed about typical problems and dangers. This also applies to harmful software that the user has installed unintentionally (for example, keyloggers and spyware that spy on sensitive data).

Further challenges include incorrect programming, which can enable unauthorized access to sensitive data. Therefore, a quality-assured software development process is essential51. If a manufactured app is used, the data should be stored in the healthcare institution’s storage facilities rather than in the manufacturer’s cloud. An external data hosting service provider should be certified. No data should be stored permanently on the device of the user, and a virus and Trojan scanner should be installed. Immediate data transfer instead of data storage on the device as well as automated data backups could also ensure data quality. To prevent an attack where data traffic is intercepted, manipulated or deleted, an end-to-end encryption (via Transport Layer Security/Secure Sockets Layer (TLS/SSL)) should be used to transfer data. There should be brute-force attack protection built into the platform and all information in the database should be encrypted using a high-end algorithm with separate keys for each study.
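As an illustration of the brute-force attack protection mentioned above, the following minimal sketch locks an account after a number of failed logins within a sliding time window. The class name `LoginThrottle` and the default limits (five attempts within 300 seconds) are assumptions chosen for the example, not values recommended by the source. A production platform would typically combine such a mechanism with persistent storage and further safeguards.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class LoginThrottle:
    """Lock an account after too many failed logins in a sliding window."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 300.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        # Timestamps of recent failed attempts, per account.
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _now(self, now: Optional[float]) -> float:
        # An explicit timestamp can be passed in, which makes testing easy.
        return time.monotonic() if now is None else now

    def record_failure(self, account: str, now: Optional[float] = None) -> None:
        self._failures[account].append(self._now(now))

    def record_success(self, account: str) -> None:
        # A successful login clears the failure history.
        self._failures.pop(account, None)

    def is_locked(self, account: str, now: Optional[float] = None) -> bool:
        current = self._now(now)
        attempts = self._failures[account]
        # Drop failures that are older than the sliding window.
        while attempts and current - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) >= self.max_attempts
```

The login handler would call `is_locked` before checking credentials and reject the request while the account is locked.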

The most effective measure is the pseudonymization of sensitive data, which makes it worthless for unauthorized persons without any additional information52. The process of pseudonymization and internal de-pseudonymization of the data must take place in a separate system52 and be considered even before the selection or development of an e-mental health system. Data protection and transparency are especially relevant for the use of AI methods.
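One possible way to pseudonymize identifiers is a keyed hash, sketched below under stated assumptions. The source does not prescribe a technique, so HMAC-SHA256 is used here purely as an example, and the names `generate_study_key`, `pseudonymize` and `PseudonymRegistry` are hypothetical. The secret key and the registry that maps pseudonyms back to identifiers are meant to live in a separate system from the study data, in line with the separation described above.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


def generate_study_key() -> bytes:
    """Create a random secret key; one key per study is assumed here."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, study_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier with HMAC-SHA256.

    Without the key, the pseudonym cannot be recomputed from the identifier.
    """
    digest = hmac.new(study_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


class PseudonymRegistry:
    """Lookup table for internal de-pseudonymization.

    Assumption: this object is kept in a separate, access-restricted system
    and never stored alongside the study data.
    """

    def __init__(self, study_key: bytes):
        self._study_key = study_key
        self._lookup: Dict[str, str] = {}

    def register(self, identifier: str) -> str:
        pseudonym = pseudonymize(identifier, self._study_key)
        self._lookup[pseudonym] = identifier
        return pseudonym

    def resolve(self, pseudonym: str) -> Optional[str]:
        return self._lookup.get(pseudonym)
```

The study database would then store only the pseudonym returned by `register`, while `resolve` remains available to authorized staff, for example when a participant must be contacted in a crisis.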

AI in assessment and intervention

AI holds great promise for e-mental health, largely owing to the advances in affective computing. The latter includes the analysis, synthesis and reaction to the affect and emotions of humans using the former. The past decade has seen major progress thanks to the rise of deep learning as an enabler in (generative) AI53.

Likewise, great progress has been made in the recognition of emotion (for example, in categories or dimensions such as arousal and valence), depression (for example, in ‘dimensions’ such as depression assessment questionnaires like the Beck Depression Inventory-II (BDI-II) or PHQ-9), or other mental health disorders54,55. The means of assessment are mostly audio (for example, speech), video (for example, facial expression, body posture and gait), text (written or spoken language) and physiology (for example, via heart rate and skin conductance). A series of research competitions (for example, Audio/Visual Emotion Challenge (AVEC))56 has been benchmarking the progress of the community, including the above tasks based on these modalities. Additionally, several reports exist on successful emotion and depression analysis from phone usage data, touch and other information sources57. At the same time, toolkits that work independently of the application domain and target task are rarely available as ‘out of the box’ solutions. Usually, training these to match the target domain and target task is required. Also, robustness of real-world applications ‘in-the-wild’ has increased notably over the past decade58. However, not all free-for-research and beyond solutions include state-of-the-art de-noising, target person verification or features such as subject adaptation. In addition, while such solutions often work largely independently of the subject, most of these tools are geared towards a specific culture or language, or another context, due to the data they were trained upon. For practical solutions, this usually requires re-training of such tools or ‘engines’ on the target data.

Emotion can also be synthesized with increasing realism by AI and, more recently, deep learning approaches, often reaching human-level or close-to-human-level quality for speech and image or even language rendering. This has led to effective virtual agents such as the ‘sensitive artificial listeners’ that may be implemented in clinical practice for assessment and interventions. Again, platforms are available open source and free for research, but usually require some adaptation to the target task. Most notably, the AVEC challenge series recently hosted the first ever ‘AI only’ depression challenge, where interviews were conducted by an AI, and the recognition of depression severity was also conducted by AI, reaching results competitive with human assessment considering the subjective nature of the task.

The recent past brought further breakthroughs in AI, and particularly deep learning, through the advent of transformer architectures and diffusion approaches, enabling a next generation of abilities in the recognition and generation of affect. This era is also characterized by ‘foundation models’: these models, pre-trained on extensive data, are marked (1) by convergence, that is, rather than training ‘your own model’ from scratch, the trend is to use these models and fine-tune them to one’s needs, which has led to considerable improvements in a field where data are continuously (too) scarce; and (2) by emergence. The latter is fascinating because, while these models may not have been trained on tasks in affective computing or tasks relevant to e-mental health, they may show emergent skills in these, stemming from the sheer quantities of data they were trained upon. The well-known ChatGPT (based on a generative pre-trained transformer (GPT)) was recently shown to predict suicidal risk at competitive performance levels ‘out of the box’, that is, without fine-tuning or training on the task, when compared with traditional and deep approaches fully trained on the task59. Similarly, Dall-E 2 (also based on GPT) can paint emotional faces from verbal descriptions, arguably also emergent behavior from the perspective of affective computing. In short, we seemingly enter an era in which e-mental health relevant skills can emerge in AI of the present and the future, whereby these ‘foundation model’ AIs are often trained on internet-scale data such as all of Wikipedia, several years of speech, or millions of images. Such models could render even explicit training of tasks increasingly obsolete. In combination with the increasing power of generative AI (GenAI), interventions could be produced in a rich manner including questioning and chatty communication, potentially including the audiovisual rendering of artificial therapists, which are highly personalized and socio-emotionally empowered. 
Current foundation models such as GPT-4 or Metaverse as virtual space may be only a sneak preview of the oncoming power and abilities, which may help overcome the uncanny valley of such artificial therapists and help AI get to know patients better than any human depending on their data access. Accordingly, they might also soon be able to influence us in strange ways.

Potential dangers may relate to AI-driven chatbots or generative AI such as ChatGPT, which can be charismatic and appear emotionally involved due to expressing emotions (with emojis or empathic language)60. Such interaction may create the impression of a friend or a human, which, if the user assumes it to be true, would be highly unethical. Because individuals with mental illness are a vulnerable group, often longing for appreciation and security in social contacts, great emphasis must be put on ethical guidelines. Bot-based interaction must be recognizable as non-human to minimize the possibility of manipulation and harm or even dependence on the interaction with such AI, potentially at the cost of human relationships61. Furthermore, such AI may find its own ways of behavior, which may be even more persuasive and change the human-to-human behavior of those interacting with it in the long run. Asking participants to use sensing and wearable data collection tools can often provide supplementary data to support AI research methods.

Sensing and wearables

Historically, diagnosing mental health conditions has relied on thoroughly validated self-report questionnaires. While questionnaire-based assessments are an indispensable source of information in this context, they are based purely on introspection, can lack vital information that is systematically neglected by the patient (for example, due to self-other knowledge asymmetry62), are temporally constrained (that is, one-time, infrequent assessments), limited in granularity (that is, in terms of a selection of questions in the anamnesis) and suffer from floor and ceiling effects (that is, lack sensitivity to change at their scale’s extremes). Above all, it is becoming increasingly apparent that patients are unlikely to be able to self-report the fine-grained and complex patterns of behaviors in various situations of daily life that characterize their physical and psychological traits, states and changes in these.

The ongoing evolution in mobile sensing and mobile computing and communication technologies ameliorates this situation. More and more sophisticated and accurate sensors in consumer electronics (for example, smart-phones or wearables) allow for the unobtrusive and automated collection of high-frequency, objective, longitudinal data on human behaviors, states and environmental conditions63,64. Figure 3 provides an overview of the variety of data that off-the-shelf consumer electronics sensors can provide.

Fig. 3: Overview of mobile sensors embedded in consumer electronics and variables they provide117.

EEG, electroencephalography; EDA/GSR, electrodermal response/galvanic skin response; EG, electroencephalography; EMG, electromyography; GPS, global positioning system; SpO2, peripheral capillary oxygen saturation.

Mobile sensing data are increasingly being used throughout the health and behavioral sciences to understand behavioral aspects of mental health through digital biomarkers65,66,67, to detect health conditions and deterioration68,69, and to improve conditions through behavioral interventions70,71.

While mobile sensing is becoming increasingly established as a method in mental health research, its standardization is challenging due to rapid and frequent changes in hardware, operating systems, and ethical and legal frameworks, among others12,72. Participants should be aware of often liberal data storage and access policies of companies. While this circumstance has acted as a roadblock in the past, the main mobile operating systems have started to develop standardized so-called application programming interfaces for researchers to access and use in empirical studies (for example, Android Health Connect (https://android-developers.googleblog.com/2022/05/introducing-health-connect.html), Apple SensorKit (https://developer.apple.com/documentation/sensorkit) or HealthKit (https://developer.apple.com/documentation/healthkit)).

However, the most innovative methods can be useless if they miss the mark. While offering specific new opportunities, e-mental health interventions need to be evaluated properly.

Evaluation

Efficacy evaluation, RCTs and other methods

There is no shortage of available e-mental health interventions, most of which are not well evaluated73. However, despite the young age of the field, high-quality evidence is needed from the start, as unreliable results can persist in the classic canon of literature74 and lead to low-quality developments, or even harm patients. This section offers recommendations and ideas for how to produce this high-quality evidence. While there are some unique ways to evaluate e-mental health interventions, which will be addressed below, a good starting point for an evaluation study is the set of principles that apply to classic interventions: besides observational or case studies, RCTs are the gold standard for evaluating (mental) health interventions. The first meta-analyses of e-mental health intervention RCTs show promising effects, even when compared with face-to-face treatments, but also that primary studies have been focusing on a small range of diagnoses and age groups75. A high variance in types of control groups and interventions further reduces the amount of knowledge that can be gained from meta-analyses. Therefore, the field would benefit from further RCTs addressing these issues.

When setting up and selecting variables for an evaluation study of an e-mental health intervention, past studies of classical interventions can serve as an example. Researchers and practitioners still need to investigate any potential adverse treatment effects76 and the importance of mediators and moderators of treatment effects that apply in face-to-face settings (that is, symptom severity, self-efficacy, motivation, age or amount of therapist involvement). Special attention should be paid to therapist effects, which robustly explain a relevant amount of variance in classical treatment outcomes77. For the evaluation of e-mental health interventions, the type of application (stand-alone, prescribed after seeing a professional, or continued blended care) can influence which therapist effects are present. There might be none if no therapist is involved, they might be similar to those in classical mental health care, or they might be even stronger, for example, when a professional holds negative biases towards digital solutions. Studies should aim for an extensive and diverse pool of therapists, also because estimates of the therapist-level random slope suffer from more bias when there are very few therapists in a study78.

Going beyond these traditional evaluation standards, evaluations of e-mental health interventions offer exciting new possibilities. The underlying technological infrastructure has the potential to extend the classical outcome-oriented designs and measures as it becomes more feasible to measure various process variables. These can focus on psychological content, such as therapeutic relationships (for example, rupture–repair79), sudden gains/losses80 or personalized items and networks81,82. Time series data on an individual level will allow new hypotheses to be tested. Also, by using shifting time windows, one can produce a meta-time series of, for example, dynamic variance or critical fluctuations and use their change as an outcome variable80,83,84. Another possibility is to evaluate individuals’ network parameters (for example, networks of symptoms) and their change over time or recurrence plot quantification85,86. In short, the type and amount of data from e-mental health studies can change the classical approach from aggregating first (across participants) and analyzing second to analyzing first (on the individual level) and aggregating second. Therefore, e-mental health studies have huge potential to expand the concept of traditional RCTs. Going beyond RCTs, further methodological approaches (for example, A/B testing and trials of principles) can be used to test small differences within an intervention or to test the efficacy of a general principle of an electronic solution (for example, self-monitoring). These approaches of agile science might contribute to reducing the time lag between technical development and evaluation results21, which is especially important when working with fast-changing technologies. As a specific option for evaluation, EMA will be discussed in the next section.
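The shifting-time-window idea mentioned above can be illustrated with a minimal sketch. It assumes one participant's equally spaced ratings and uses the sample variance within each window as the "dynamic variance"; the function name, the window length and the example ratings are hypothetical and not taken from the cited studies.

```python
from statistics import variance


def rolling_variance(series, window):
    """Return the sample variance of each consecutive window of observations."""
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    return [
        variance(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily mood ratings (0-10) of a single participant.
mood = [5, 5, 6, 5, 5, 2, 8, 3, 9, 4]

# Meta-time series: one variance value per window position.
dynamic_variance = rolling_variance(mood, window=5)
```

The resulting meta-time series has one value per window position, and its change over the course of an intervention could then serve as an outcome variable in the sense described above.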

EMA

EMA (synonyms: ambulatory assessment, experience sampling method and real-time data capture) encompasses a range of methods that involve repeated assessments of individuals’ dynamical experiences and behaviors in their natural habitat, thereby increasing both ecological validity and generalizability, while minimizing recall biases87,88. This method can be used in various stages of the therapeutic process (for example, diagnostic process, tracking the course of symptoms during treatment and transfer of therapeutic effects thereafter).

EMA offers the possibility to combine subjective assessments with further methods (for example, psychophysiological and physical activity assessments)87. EMA also allows for integrating continuous mobile sensing (that is, digital phenotyping89) to predict critical phases90 and to improve the timing of EMA enquiries87. By providing a detailed picture of mental state and functioning, EMA promises to be more sensitive to capturing change and, thereby, improving the assessment of the therapeutic effects of interventions91. One of the most promising avenues of EMA is the opportunity to extend treatment beyond the clinical setting into real life using e-mental health applications92.

When setting up a study, the following aspects are very important: there are various sampling designs (that is, time-based, event-based, combined sampling schemes). Choose the one that fits your research question. Carefully balance the length of the questionnaire presented at each assessment, the number of assessments per day, and the assessment epoch to ensure high compliance rates93. Also, allow participants to delay or actively decline alarms. Choose an adequate time frame for the questions. Whereas questions referring to the present moment minimize retrospective bias, those with a specific time interval enhance representativity. When deciding on the order, group items with the same time frames, and ask transitory constructs (for example, emotions) first and questions that are not likely influenced by preceding questions (for example, context) at the end. If you must develop new items, use two, or better three, per construct, to be able to determine the items’ reliability94. A crucial point is that the sampling strategy must fit the temporal dynamics of the underlying process; otherwise, results can be misleading95.
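As an illustration of a time-based sampling scheme, the following sketch draws one random prompt per equally sized block of the assessment day (a stratified random schedule). The function name, the 09:00 to 21:00 window, the six prompts per day and the fixed seed are assumptions chosen for the example only; they are not recommendations from the cited literature.

```python
import random
from datetime import datetime


def stratified_prompt_times(day_start, day_end, n_prompts, rng):
    """Split the day into n equal blocks and draw one random time per block."""
    if n_prompts < 1 or day_end <= day_start:
        raise ValueError("need at least one prompt and a positive time window")
    block = (day_end - day_start) / n_prompts
    return [
        day_start + block * i + block * rng.random()
        for i in range(n_prompts)
    ]


# Hypothetical assessment day: six prompts between 09:00 and 21:00.
rng = random.Random(42)  # fixed seed only to make the example reproducible
start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 21, 0)
prompts = stratified_prompt_times(start, end, 6, rng)
```

Drawing one prompt per block keeps the assessments unpredictable for participants while still covering the whole day; whether this spacing matches the temporal dynamics of the process under study has to be decided per research question.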

Carefully determine the length of the EMA period that is needed to answer the research questions, and balance it against participant burden, which is key to ensuring participants’ compliance93,96. Meta-analytic results revealed higher compliance rates in studies offering monetary incentives compared with other or no incentives93. Moreover, linking the incentives to a certain degree of compliance might reduce dropouts during the assessment period96.
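A compliance-contingent incentive of the kind mentioned above could be computed as in the following sketch. The 80% threshold and the base and bonus amounts are purely hypothetical placeholders; appropriate values depend on the study and are not specified in the cited work.

```python
def compliance_rate(answered, scheduled):
    """Share of scheduled EMA prompts that the participant answered."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    return answered / scheduled


def incentive(answered, scheduled, base=10.0, bonus=20.0, threshold=0.8):
    """Pay a base amount plus a bonus once compliance reaches the threshold.

    All amounts and the threshold are hypothetical example values.
    """
    rate = compliance_rate(answered, scheduled)
    return base + (bonus if rate >= threshold else 0.0)


# Hypothetical two-week study with five prompts per day (70 prompts).
high = incentive(answered=56, scheduled=70)  # compliance of 80%
low = incentive(answered=40, scheduled=70)   # compliance of about 57%
```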

Transfer into (clinical) practice

To make e-health interventions feasible for real-world settings, the following criteria should be considered. (1) Research should integrate follow-up measurements to assess long-term usage, because meta-analyses on the long-term benefits of mental health apps are lacking, as the handling of follow-up measurements and dropouts is inconsistent75,97. Indeed, reviews showed that too few studies used (long-term) follow-up measurements and many showed high dropout rates of 47% (refs. 98,99). (2) Researchers, developers and practitioners should consider relevant factors to improve adherence to digital health interventions in real-world contexts100. When looking into real-life settings, ref. 99 found that, among over 10,000 digital mental health apps, only 11 peer-reviewed publications analyzed uptake and usage data in such settings. The completion rate was between 44% and 99% in RCTs but dropped to 1–28% when looking at real-world usage. Furthermore, new (machine learning) approaches showed that a distinction into user subtypes and, therefore, personalization of interventions could lessen the effects of interventions101. Thus, researchers and developers should consider relevant factors to improve adherence to digital health interventions in real-world contexts. (3) Integrate mood monitoring, feedback and human/automated support to lower dropout rates102. For example, dropout rates decreased by 46% when therapeutic support was provided, and even minimal care with only administrative support resulted in a meaningful decline in dropout rates103. Further, it has been shown that when specific EMA data are fed back to clients regularly, the amount of missing EMA data is low (<10%) and decreases over time84.
Digital health is a global challenge, but the implementation of digital health interventions is based on complex national and local economic and political processes. (4) Hence, when conceptualizing and evaluating the implementation process of e-mental health interventions, researchers and practitioners should always consider the integration of all relevant stakeholders that will be involved in the final roll-out of the digital interventions, such as users with lived experience and beneficiaries, companies, health insurance or other political institutions and decision-makers. We argue that for each digital intervention a unique approach for its roll-out should be considered and developed along with its scientific evaluation. Target groups, clinical scope (prevention or intervention), business models, funding strategies, long-term technical maintenance, requirements for quality management, regulatory frameworks, data safety, market access and reimbursement schemes are only some examples to be considered. The exploitation of evidence-based interventions may further benefit substantially from the flexibility, variety of resources and agile methods of industrial partners. Even where the process is successful, ongoing quality control in clinical practice is essential yet very challenging in the dynamic field of tech industries. Furthermore, potential side effects tend to be underestimated, leading to a broad supply of unapproved interventions.

AEFs

As the number and diversity of e-mental health solutions increases, so does the need to evaluate which are most effective and safe. While regulatory bodies are beginning to approach the regulation of primarily mobile and web-based apps but also other sorts of digitally delivered interventions, most efforts remain nascent104,105. This means clinicians and patients must rely on tools such as AEFs to help them make more informed decisions. While the number of AEFs is also increasing, they differ in their approaches, with some providing scales versus ratings or subjective versus objective metrics, and others information versus databases. Each approach has a unique value depending on the use case and clinical needs.

Perhaps the largest category of app evaluation is scales or frameworks that provide guidance and information on how to consider an app. For example, the American Psychiatric Association’s app evaluator framework106 provides a four-step process with corresponding questions about privacy, efficacy, engagement and clinical utility. While this framework does not provide scores or ratings, there are other frameworks such as the Mobile Application Rating Scale107 that do. Often these rating systems require training before they can be properly used. A study108 reviewed popular frameworks through the lens of diversity, equity and inclusion and found that only 58% included related metrics that offer a target for future efforts and evaluation criteria, whether subjective or objective.

A related consideration in app evaluation is the use of subjective versus objective metrics. For example, questions about aesthetics or usability are inherently subjective, and answers will vary between users. Examples of objective metrics may include the presence of videos or music in an app. Each approach has merits. Subjective evaluations, often in the form of user or expert reviews, can provide rich contextual information about an app. However, it can be challenging to keep these reviews updated and current in the rapid-paced world of apps109. Objective metrics may not offer such context but often provide easier-to-update approaches that may have higher inter-rater reliability by their very nature. One example of such an approach is the Mobile App Index and Navigation Database (mindapps.org), which rates apps across 105 largely objective criteria110.
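Inter-rater reliability of such criteria can be quantified, for example, with Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The sketch below is a generic illustration and not the procedure used by any of the frameworks named above; the ratings are invented (1 = feature present, 0 = feature absent).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight app criteria by two independent raters.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 0, 1, 0]
kappa = cohens_kappa(rater_1, rater_2)
```

In this invented example the raters agree on six of eight criteria, yet kappa is considerably lower than the raw agreement of 75% because part of that agreement is expected by chance.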

A further consideration is how users can engage with any AEF, whether it offers subjective or objective metrics, frameworks or ratings. Some approaches, such as Psyberguide, Mobile App Index and Navigation Database, and the UK National Health Service Apps Library, maintain websites that users can search, while others provide only the rating scale or related educational material. The impact of either approach remains unstudied although recent research suggests that digital literacy and health app awareness are important related factors for app use111. Some newer approaches such as the adapted Mobile Application Rating Scale have been proposed, with the authors suggesting the need for concomitant support from a coach or digital navigator112.