In the near future, we may gain the ability to extract and continuously monitor data from the human brain.
While this possibility is intriguing, it also presents several challenges. As this is a relatively unexplored field, questions remain about how such data can be effectively and appropriately applied.
Literature Research
User Research
Hi-fi Prototype
Qualitative UX Research
Interface Design Suggestions
The core concept of this project is “Mental Workload” (MWL). It is defined as the balance between the workload a task demands and the effort a person actually invests while performing it. In general, different balances result in three distinct levels of performance and feeling.
In the past, mental workload has been measured to help reduce risks in high-stress jobs, such as piloting an airplane. By having participants fill out surveys like the NASA-TLX (Task Load Index) or SWAT (Subjective Workload Assessment Technique) while they complete tasks, researchers can analyze how different task processes or environments affect mental workload, which helps make high-risk tasks safer.
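As a concrete illustration of how such a survey score is typically computed, the sketch below calculates a weighted NASA-TLX score; the ratings and weights are invented for illustration only, not data from this project.

```python
# Illustrative NASA-TLX scoring with made-up numbers (not data from this study).
# Each of the six subscales is rated 0-100; pairwise comparisons assign each a
# weight (the weights sum to 15), and the overall score is the weighted mean.

ratings = {  # hypothetical ratings from one participant
    "mental_demand": 70, "physical_demand": 20, "temporal_demand": 60,
    "performance": 40, "effort": 65, "frustration": 55,
}
weights = {  # hypothetical weights from the 15 pairwise comparisons
    "mental_demand": 5, "physical_demand": 0, "temporal_demand": 3,
    "performance": 2, "effort": 4, "frustration": 1,
}

assert sum(weights.values()) == 15
overall = sum(ratings[k] * weights[k] for k in ratings) / 15
print(f"Weighted NASA-TLX workload: {overall:.1f} / 100")
```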
Today, advances in technology enable us to track brain performance directly. By wearing a device that interacts with the nervous system (so-called neurotechnology), we can monitor brainwaves, which reflect different aspects of brain performance.
There are five types of brainwaves: β, θ, α, δ, and γ. These brainwaves coexist in our brains but carry different meanings. For instance, α waves are typically associated with a relaxed, wakeful state, while β waves dominate during focused, active thinking.
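To make “monitoring brainwaves” a bit more concrete, the sketch below shows a generic way to estimate power in each frequency band from one channel of raw EEG. It is a textbook signal-processing example, not the pipeline of any specific product, and the exact band boundaries vary across the literature.

```python
# Generic band-power estimation from a single EEG channel (illustrative only).
import numpy as np
from scipy.signal import welch

fs = 256                              # assumed sampling rate in Hz
eeg = np.random.randn(fs * 10)        # placeholder for 10 s of raw EEG data

freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)   # power spectral density

bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}    # commonly used ranges (Hz)

for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    power = np.trapz(psd[mask], freqs[mask])     # integrate PSD over the band
    print(f"{name:>5}: {power:.3f}")
```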
Imagine wearing a device all day to track your cognitive performance: what would you expect to see?
It may sound futuristic, but this technology is already here. In the near future, tracking your brainwaves with wearable devices may be as common as checking your blood pressure. Several products on the market—such as Muse S, NEUROSITY, and EMOTIV—already use EEG (electroencephalography) signals to provide various MWL-related metrics, such as focus level, cognitive workload, and cognitive engagement.
However, as a novel technology and a new approach to measuring cognitive data, there are many research questions that need to be explored before bringing this technology to market.
This research aimed to explore how people perceive this new technology and whether the current design—particularly its reliance on derived metrics—aligns with their actual needs.
This study does not aim to create a new MWL metric; rather, it explores how design can support users in reflecting on and resonating with MWL data. For practicality, an existing MWL measurement product was selected as the reference for this research to ground the design of MWL data.
EMOTIV MN8 is a headphone-style brain-machine interface that uses dual-channel EEG technology to monitor brain activity.
It provides two types of metrics: performance metrics (Attention, Cognitive Load, and Cognitive Stress) and Cognitive Efficiency (Intense, Optimal, Moderate and Disengaged).
Attention measures how focused a person is during a task.
Cognitive Load reflects how much mental effort is required to complete the task.
Optimal – Balanced state of stress, load, and attention; ideal for productivity.
Moderate – Fluctuating focus; easily distracted.
Disengaged – Low interest and difficulty staying focused.
These core metrics are visualized using four colored light spheres on mobile, and bar charts on both mobile and desktop interfaces.
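For the prototype work that follows, it helped to think of each reading as a small record combining the continuous performance metrics with the categorical efficiency label. The sketch below is my own assumed data model, not EMOTIV's actual schema or API, and the 0-1 scale is an assumption.

```python
# Assumed data model for one MWL reading in the prototype (not EMOTIV's schema).
from dataclasses import dataclass
from enum import Enum

class CognitiveEfficiency(Enum):      # derived, categorical metric
    INTENSE = "Intense"
    OPTIMAL = "Optimal"
    MODERATE = "Moderate"
    DISENGAGED = "Disengaged"

@dataclass
class MwlSample:
    timestamp: float                  # seconds since session start
    attention: float                  # performance metrics, assumed 0-1 scale
    cognitive_load: float
    cognitive_stress: float
    efficiency: CognitiveEfficiency   # derived label shown to the user

sample = MwlSample(300.0, 0.7, 0.6, 0.4, CognitiveEfficiency.OPTIMAL)
print(sample)
```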
To discover user opinions/needs on tracking mental workload
To narrow down user needs into several themes
To validate research insights through quick design prototypes
Six participants were recruited in total through snowball sampling. The research did not require any specific background, as anyone could be a user of a neurotechnology device. To account for diversity in lifestyle, values, and economic capacity, the research reached out to three students and three non-students.
As this interview primarily explored users’ attitudes and opinions on MWL, I conducted a thematic analysis, which is widely used to identify people’s views, opinions, knowledge, experiences, or values from a set of qualitative data (Daly et al., 1997).
During the analysis, in-vivo codes were extracted from the interview transcripts and grouped into initial categories. These initial categories were then clustered to form initial themes, which were finally refined into the final themes.
Ten participants were recruited in total through snowball sampling on the TU Delft campus; as a result, all participants were students. As with the interview recruitment, no specific background was required, since anyone could be a user of a neurotechnology device. Five males and five females took part in the test, with ages ranging between 20 and 30 years old.
During the test, each participant wore a device that monitored their cognitive data while completing a revised TSST test, which took approximately 13 minutes in total. After the tasks, an interview and a card-sorting exercise were conducted for around 30 minutes.
TSST test
The tasks were designed to simulate a stressful scenario. Following the Trier Social Stress Test (TSST) (Kirschbaum et al., 1993), participants went through a 3-minute preparation (for a speech), a 5-minute speech, and 5 minutes of mental arithmetic to induce stress. The primary purpose of these tasks was to induce changes in stress while participants used the cognitive data, rather than to study their physiological responses to stress. To make the scenario more lifelike, the second task was adjusted to recording a motivation video for a job application. Compared with the original TSST experimental design, this study also gave participants more control over the process.
Semi-structured interview
After the tasks, one-on-one, on-site semi-structured interviews were conducted. The discussion covered their experience of wearing the device for the first time, the timing of using the data, and their understanding of the current design.
Card sorting
After the interviews, a card-sorting method was used to understand how users perceived the existing functions within the app and which functions they considered priorities. Participants were given a predetermined set of categories (must-have, nice-to-have, and unnecessary) and asked to sort the different features into these categories based on their experience.
To explore user opinions on self-tracking mental workload broadly, I built a semi-structured interview script based on Li et al. (2010)’s stage-based model of personal informatics, which comprises five stages: preparation, collection, integration, reflection, and action. During this exploratory interview, I focused on the preparation, integration, and reflection stages.
Participants often equated MWL with stress and only mentioned peak work or emotional moments when talking about it. This narrow view limited their expectations and made it harder to engage with MWL tracking in a meaningful way. Most also didn’t know how to describe what an “ideal” or “low” mental workload should feel like.
Interestingly, most people see themselves as the experts on their own brains, which makes it difficult for cognitive data products to spark interest. When discussing motivations for tracking mental workload (MWL), participants mainly wanted to validate their brain responses and, if possible, identify sources of stress. However, many expressed skepticism, doubting that the data would provide new insights or drive meaningful actions.
The term accuracy consistently came up across participants’ feedback. Among all concerns, the most frequently questioned aspect was how a single mental state is defined using cognitive data. Unlike physiological data, which often has objective benchmarks, cognitive data lacks a clear standard for interpretation. This ambiguity made participants skeptical about the reliability of such measurements.
2/6 questioned the accuracy of measuring mental workload.
4/6 questioned whether cognitive data reflects real-life experiences.
4/6 doubted the accuracy of defining a mental state through MWL data.
5/6 were skeptical about distinguishing between two mental states using the MWL data.
Most participants found MWL data more useful after a task than during it. They expected it could confirm cognitive states, uncover issues, or offer suggestions — but few could clearly articulate what those suggestions or issues might be. For most, MWL data functioned more as a tool for recording and confirming experiences than for prompting real-time action.
For most users, the idea of tracking cognitive data feels distant. They’re unsure what the data really tells them or why it matters. Since it feels more subjective than things like heart rate or steps, they also tend to trust it less. All of this makes them less motivated to track it.
While users often equate MWL—especially high MWL—with stress, theories suggest otherwise. According to Lazarus et al. (1984) and Gaillard (1993), stress is better understood as a reaction to insufficient cognitive resources, not simply high workload.
Many users question the “accuracy” of cognitive metrics. According to BSI (1994), accuracy includes trueness (how close data is to a reference) and precision (how consistent the results are).
In cognitive metrics, it’s difficult to fully control for both trueness and precision, especially when the true cognitive state is not objectively measurable. However, we can still improve users’ trust by showing what kind of reference values or baselines the data is derived from—emphasizing transparency of trueness.
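To illustrate the distinction, the sketch below computes a trueness error against a hypothetical reference value and a precision estimate from repeated readings of the same state; in practice such a reference rarely exists for cognitive states, which is exactly the difficulty.

```python
# Trueness vs. precision on invented numbers (illustration of the BSI distinction).
import statistics

reference = 0.60                             # hypothetical "true" workload level
readings = [0.55, 0.62, 0.58, 0.64, 0.57]    # repeated measurements of one state

trueness_error = abs(statistics.mean(readings) - reference)  # closeness to reference
precision = statistics.stdev(readings)                       # consistency of results

print(f"trueness error: {trueness_error:.3f}, precision (sd): {precision:.3f}")
```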
My design explores several experimental ideas across four main pages: Status Curve, Notifications, Adjust, and Saved. Each page tests different concepts inspired by interview insights and the literature, and each served as a discussion tool during user testing.
This page is designed to show detailed MWL data. In my design, I focused on how to present metrics in a way that encourages user understanding and reflection.
I referred to Bentvelzen et al. (2023) and their concept of abstraction levels. Research shows that the medium level is most effective for prompting user reflection.
For example, in sleep data:
Raw metric: heart rate
Guided metric: number of awakenings
Derived metric: sleep quality
The theory divides data abstraction into three levels:
Low: includes all three (raw, guided, derived)
Medium: guided + derived
High: only derived
Then, I applied this to mental workload metrics, combining guided and derived metrics to support user reflection:
Raw metric: EEG signals
Guided metric: performance metrics (Cognitive Load, Cognitive Stress)
Derived metric: cognitive efficiency metrics (Intense, Optimal, Moderate, Disengaged)
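As a small sketch of how this mapping could drive the interface, the function below (my own illustration, using the metric names above) selects which metrics to surface for each abstraction level; the medium level is what the Status Curve page targets.

```python
# Selecting metrics per abstraction level (my own illustration of the mapping above).
MWL_METRICS = {
    "raw":     ["EEG signals"],
    "guided":  ["Cognitive Load", "Cognitive Stress"],
    "derived": ["Cognitive Efficiency (Intense / Optimal / Moderate / Disengaged)"],
}

LEVELS = {
    "low":    ["raw", "guided", "derived"],
    "medium": ["guided", "derived"],   # the level the design targets for reflection
    "high":   ["derived"],
}

def metrics_for(level: str) -> list[str]:
    return [m for kind in LEVELS[level] for m in MWL_METRICS[kind]]

print(metrics_for("medium"))
```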
To show the balance of cognitive resources, I used two continuous curves for comparison. Curves, rather than numbers, help users reflect more intuitively on changes over time. By presenting them alongside the derived metric, users can better understand their cognitive state and why a certain cognitive efficiency metric was given.
Based on early user research, I offered explanations and suggestions tailored to different metric states—especially to help users know when it’s time to switch activities to avoid fatigue.
This feature is designed to strengthen the connection between data and real-life events, allowing users to interpret metrics within their daily context rather than as isolated numbers.
In early research, I found that users didn’t want to see much cognitive information in the moment, as it could be distracting. However, when designing suggestions and prompts, I realized that if users don’t notice them right away, they may lose their value. So I designed a lock screen widget to show minimal, essential information—just enough to alert users when something needs their attention.
Based on the literature, I learned that MWL data isn’t about judging whether your cognitive state is good or bad, but about recognizing whether the user has entered an ideal state—or stayed in the same state for too long. So I designed the notifications to reflect these two points.
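A minimal sketch of those two trigger rules is shown below; the 90-minute threshold is a placeholder assumption, not a validated value.

```python
# Two notification rules: entering the ideal state, or staying in one state too long.
def should_notify(current_state: str, previous_state: str,
                  minutes_in_state: float, max_minutes: float = 90) -> str | None:
    if current_state == "Optimal" and previous_state != "Optimal":
        return "You've entered an optimal state."
    if minutes_in_state >= max_minutes:
        return f"You've been in the {current_state} state for {int(minutes_in_state)} minutes."
    return None                       # otherwise, stay quiet

print(should_notify("Optimal", "Moderate", 5))    # ideal state reached
print(should_notify("Intense", "Intense", 95))    # same state for too long
```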
The Adjust page presents MWL data in a different, more dynamic and engaging way, and incorporates simple interactions to help users connect with the data and gradually shift their cognitive state.
Compared to presenting cognitive derived metrics as plain text—which often lacks emotional resonance—I explored ways to make such data feel more meaningful to users. Inspired by the concept of calm technology (Weiser & Brown, 1996), ambient information design uses subtle, non-intrusive visuals to reduce cognitive load and support smoother tech adoption (Ishii & Ullmer, 1997; Chatzitsakyris et al., 2004).
First, I aim to help users better perceive the relationship between guided metric changes and background derived metrics.
Dynamic visuals and natural metaphors—such as flames for overload and drifting stars for underuse—are used to intuitively represent mental workload.
Animated number transitions further strengthen the connection between quantitative data and background states, making the data feel more alive and interpretable.
Second, I aim to encourage users to linger, reflect on their current state, and gradually shift toward a more balanced one.
Interactive prompts, such as screen tapping, are designed to attract attention and invite light interactions.
The Saved page displays all stored data along with the criteria used for each derived metric.
To address concerns from early interviews about the reliability of cognitive metrics, I looked beyond data resonance and focused on increasing transparency. Since many participants believed that cognitive states are subjective, I designed a feature that lets users save their current state and record their own judgment. Over time, this subjective input helps the system gradually adjust and personalize the interpretation of metrics, making them more reflective of individual experiences.
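A rough sketch of how such self-labelled saves might nudge a personal baseline over time is given below; the update rule and learning rate are illustrative assumptions, not a validated calibration method.

```python
# Adjusting a personal baseline from the user's own saved judgments (illustrative).
def update_baseline(baseline: float, saved_reading: float,
                    user_agrees: bool, rate: float = 0.1) -> float:
    """Move the personal baseline toward readings the user confirms as typical."""
    if user_agrees:
        baseline += rate * (saved_reading - baseline)   # simple exponential update
    return baseline

baseline = 0.50
baseline = update_baseline(baseline, saved_reading=0.62, user_agrees=True)
print(f"adjusted personal baseline: {baseline:.2f}")
```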
Since this product idea is still quite new, I conducted a user test with a co-creative nature. I used the prototype I created from earlier research and literature as the main tool in one-on-one sessions with 10 participants. They interacted with it and shared their thoughts throughout the process.
During the test, participants wore an EMOTIV headset and completed a revised Trier Social Stress Test (TSST) of about 13 minutes, which includes things like speaking in front of a camera and solving mental math problems—designed to create a mildly stressful situation. While doing this, they were free to use the prototype whenever they wanted—to check their cognitive state or explore any feature.
(Warm-up) First experience of wearing the device
Timing of using the prototype
Functions and priorities
Interview and card sorting results revealed that people care more about the meaning behind Cognitive Load and Cognitive Stress, and what insights they can gain when the two are combined, rather than focusing on Cognitive Efficiency Metrics. While users may not immediately understand these terms, once explained, they expressed strong interest and engaged in various discussions about potential applications.
My design included suggestions for state changes and reminders to reach an “ideal state.” However, users rejected these features, stating that they are not realistic during work and questioned who defines this so-called “ideal.” They emphasized that such advice felt unnecessary and even intrusive.
Though a few users appreciated the attention-grabbing nature of dynamic displays, most felt they lacked a direct connection to the metrics, making them even less trustworthy. One participant even mentioned that just seeing the numbers change was already enough to resonate with the data.
Although there was skepticism around data interpretation and trust, user tests showed that users had little interest in seeing how metrics were calculated or judged. They preferred not to influence the system’s objectivity with their own reasoning.
While early interviews led me to believe users were expecting actionable advice, it turns out that when it comes to the brain, people don’t want to be told what to do. Instead, they want more context—different ways to understand their cognitive state, not instructions.
"To be honest, I didn't really get what "Stress" meant at first, but I found it super interesting when you explained it just now. I actually liked it a lot. It just need more explanations"
Participant #1
"I think I need this chart, but mainly because I want to see its explanation. If you just only show me the chart, it feels like I'm writing a paper and have to figure out the insights by myself."
Participant #10
Interestingly, four participants independently expressed that they would find it more valuable to log and compare their performance data. For instance, they were interested in identifying the times of day when they are most productive or tracking their progress on repeated tasks over time.
“What I'm thinking is kind of like... I want to compare the same thing, so I'd only store data related to that specific event.” (E9)
Participant #9
Participants expressed a clear preference for a shorter time window when reviewing their current performance. While they appreciated seeing data segmented in hourly units, many found a five-hour span overwhelming. One participant commented that for checking real-time performance, they only wanted to see the past one to two hours, with hourly breakdowns.
"I feel this page... has too much information. I understand that the green line represents my current status, and it shows the pattern from many hour ago. But why...? For real-time checking, I’d prefer the most straightforward results—not something that combines multiple hours of data.”
Participant #3
"I assume I'll be using it most often for work. Personally, I tend to break down my work time into hours because my focus lasts for about an hour at most.
Participant #2
During discussions about metric calibration, participants questioned whether cognitive performance should be measured against a public average (like BMI) or against individual baselines. This may explain why derived metrics were less compelling—they represent a generalized standard that may not feel personally relevant.
"I believe this should be about comparing myself to myself... like, comparing how stressed I feel today to how much less stressed I felt yesterday, instead of comparing my stress level to someone else's."
Participant #10
One participant likened this to menstrual tracking apps, suggesting that adding personal notes—such as sleep quality or diet—might enrich the data and help reveal more meaningful patterns for self-reflection.
“I feel like if this app were to give me suggestions, it could be similar to period-tracking apps. They don’t just monitor your cycle—they also let you log your mood and physical reactions each day. Likewise, this app could go beyond simply telling me whether I’m stressed. With some simple input from me, it could give the team more data to analyze my state. A conversational interface could work too. Even a basic Q&A page might help... As you browse, you might think, ‘Oh yeah, that could be something I need to pay attention to.’ It doesn’t need to be super accurate; it just needs to help surface potential issues.”
Participant #5
Ultimately, I believe this discussion should return to the fundamental differences between physiological and cognitive data. For physiological data, derived metrics can be an effective way to capture patterns and trends in the body. However, when applied to cognitive performance, such metrics often raise questions about what standards they are based on, or make it difficult for people to distinguish between different metrics.
After all, the creation of any “metric” involves the setting of a “standard.” When the data itself is highly subjective or open to interpretation, I remain cautious about whether it is appropriate to convert it into a derived metric.
Still, in conclusion, based on the various suggestions brought up by participants, I believe cognitive data is more suitable for personal exploration of performance patterns, rather than being summarized through adjective-based metrics or compared against general population standards.
That said, I want to emphasize that derived metrics still have their value. For one, the literature has already raised extensive discussions around the privacy concerns of cognitive data. Some have even suggested that in the future, employers might use cognitive data as part of their hiring criteria. I believe this points to an important issue: once such technology becomes available, we must consider how the data might be used in interpersonal exchanges and social contexts.
Here’s the problem—we probably wouldn’t want others to access overly detailed or intimate aspects of our personal data. In this regard, derived metrics might actually help reduce privacy invasions compared to raw numbers or data curves.
Similarly, when my study participants discussed their results with each other after the sessions, they tended to refer to summary metrics—saying things like “Mine came out as intense, what about yours?”—rather than going into the specifics of their data curves. I think this shows that derived metrics may be more suitable for human-to-human communication, as they preserve a level of abstraction and personal privacy.
In terms of design, it might be useful to distinguish between two usage scenarios: real-time performance checking and post-task performance comparison. The former is centered around the timeline, while the latter is more focused on events.
When observing and discussing how participants interacted with the prototype, I noticed that although most people checked their data after completing the task, some still chose to view it during the task. The latter group's goal was to confirm whether they were performing well in that very moment.
This led me to a new insight: these two scenarios likely require completely different approaches to display data. Users checking their state in the moment care about their performance over the past 2–3 hours. In contrast, those returning to the data hours or even days after the task may no longer be concerned with the timeline itself. Instead, they might focus on what they were doing during certain periods, or identify which segments of data correspond to similar types of activity.