Just how much better is a Perceptual User Interface (PUI)?

Arjun
10 min read · Dec 4, 2021


This past weekend, I finished conducting a head-to-head quantitative comparison between a traditional graphical interface (GUI) and a perceptual interface (PUI) in which the content and purpose of both interfaces stay the same but the input modalities change; the results can be found in this research paper.

This article aims to document my process of arriving at the chosen topic of study and how I ended up validating my hypothesis. If you’re here just for the data, you can skip ahead to the section titled “So, what did I find out?”

Well, where did the thought come from?

During an academic project in my third year at IIAD, Mr. Shyam Attreya (a faculty member at IIAD) discussed how metropolitan cities in India such as New Delhi witness long queues at places like hospitals and airports. This is primarily because certain processes, for example the check-in counter at an airport, are handled by a limited number of personnel serving a crowd much larger than they can manage.

I remember sitting in an OPD at Max Hospital Panchsheel as people swarmed the reception desk to get information about the doctor they were supposed to see and other such common questions. I observed that each person was gripped by a similar sense of panic and asked similar questions, in the same pattern. The answers given by the people at the desk were also similar. This led me to wonder whether repetitive communication patterns could be automated and made more efficient.

A common example of such a scenario, something I first saw in an episode of The Good Doctor, was the initial screening done for people at healthcare centres during the Covid-19 pandemic.

Photo source: India Today.

As I looked around the internet, I came across a Sample Employee COVID-19 Health Screening Questionnaire by OSHA. These questions were made for employers to screen their employees for Covid-19 symptoms.

Each question had a binary answer possibility, i.e. a yes or a no. This felt like a simple, achievable task that I could automate to see how much faster the process could become with the use of technology.

Designing & prototyping the interface

During this phase, I pondered the modalities that could be used to interact with a machine displaying a questionnaire. Computer vision, a commonly used modality, was extremely desirable.

However, I soon ran into a problem. Computer vision interfaces generally depend on the computer being able to understand human gestures. This is usually achieved by training the computer (an example of which can be seen in this machine learning introduction by OxfordSparks), which I did not know how to do. I had only recently started developing, and that too in a creative coding environment (Processing), and was still trying to figure out the basics.

I decided that instead of chasing a model that might work, it would be a more enriching experience to first try to understand how computer vision works and, to that end, try out all the elementary models that are easier to execute. During this time, I looked at resources by Daniel Shiffman, Aesthetic Programming by Winnie Soon & Geoff Cox, as well as the depths of the internet. By adopting a learning-by-doing methodology (reinforced by Mr. Shaaz Ahmed), I ended up creating a bunch of smaller experiments that could aid my larger goal.

One of my experiments using Daniel Shiffman’s blob tracking algorithm. This was converted into an object-driven e-book reader which you can find on @arjunsarchive.

As I started trying to arrive at a solution for my screening interface, I came across a wonderful library for Processing known as BoofCV, developed by Peter Abeles. I concluded that for an interface with binary answer possibilities, I don’t really need a program that can understand gestures. All I need is for the head to act as a replacement for the mouse, with the answer possibilities placed on the left and right sides of the screen. Upon further tinkering, I managed to create a working prototype.

However, this prototype had a big technical glitch that kept me from testing the interface with users. The program worked on “if” statements, with a “stage” counter that was incremented on each input. Input was determined by the x-position of the head-tracked rectangle: there was a threshold for yes and a threshold for no which, when crossed, registered the corresponding answer. However, once a person crossed a threshold (say, to answer yes), the stage would get incremented all the way to the finish, which means the answer to every question would be the same. This is why you see alternating answers in the previous video.

This one took time to resolve even though it seemed like a simple problem. Nonetheless, it was solved by including a forced neutral state after each question. The algorithm structure now looked like this:

Each question forced an x-position recalibration through the inclusion of a neutral state, which is when the next question popped up.
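Here is a minimal Processing sketch of that corrected logic, purely as an illustration: the variable names, thresholds and question count are my own assumptions rather than the actual source, and mouseX stands in for the x-position reported by the head tracker.

```java
int stage = 0;                        // index of the current question
boolean[] answers = new boolean[11];  // hypothetical question count
boolean waitingForNeutral = false;    // true right after an answer is recorded
float yesThreshold, noThreshold, neutralLow, neutralHigh;

void setup() {
  size(640, 360);
  yesThreshold = width * 0.25f;       // left side of the screen = "yes"
  noThreshold  = width * 0.75f;       // right side of the screen = "no"
  neutralLow   = width * 0.40f;       // narrow band in the centre of the screen
  neutralHigh  = width * 0.60f;
}

void draw() {
  // Drawing of the question text and feedback is omitted; this is only
  // the input-handling logic.
  float headX = mouseX;               // stand-in for the tracked head's x-position

  if (stage >= answers.length) return; // questionnaire finished

  if (waitingForNeutral) {
    // The fix: after every answer, ignore all input until the head returns
    // to the centre band. Only then does the next question appear.
    if (headX > neutralLow && headX < neutralHigh) {
      waitingForNeutral = false;
    }
  } else if (headX < yesThreshold) {
    answers[stage] = true;            // crossed the "yes" threshold
    stage++;
    waitingForNeutral = true;         // force the x-position recalibration
  } else if (headX > noThreshold) {
    answers[stage] = false;           // crossed the "no" threshold
    stage++;
    waitingForNeutral = true;
  }
}
```

The key change is that registering an answer and showing the next question are now two separate events, separated by a mandatory return to the neutral band, so a single sustained head movement can no longer race through the remaining questions.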

Once I resolved the algorithm, I also refined the design a little to help with basic readability and similar factors. However, no additional embellishments that might artificially enhance usability were added to the interface. The final interface looked like this:

The final interface.
A breakdown of elements on the interface.

Forming my hypothesis

During this time, I went through a lot of literature on Perceptual Interfaces (Turk & Kölsch, 2004) as well as literature around the making and testing of these interfaces, such as this wonderful project by Lenman, Bretzner and Thuresson at the KTH Royal Institute of Technology.

During my literature review, I failed to find any quantitative data on how much better a perceptual interface might be when compared with a traditional interface in a specific social context. Everyone talked about experiential factors based on the generally accepted hypothesis that it would indeed be better.

At the IIAD Library, I came across The Humane Interface by Jef Raskin. Apart from being a wonderful book altogether, one particular chapter on quantitative analyses of interfaces caught my attention. In that chapter, Raskin explains a simple model called the GOMS Keystroke-Level Model, which is based on the idea that the total time it takes to complete a task on a computer system is the sum of the times taken by the elementary gestures that the task comprises.

This was great! I was now in possession of a model that could be used to break down interaction with any type of interface into an equation based on a mnemonic system (shown below) and compare different types of interfaces performing the same task.

Source: The Humane Interface, Jef Raskin.
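To give a sense of how the model is applied, here is an illustrative breakdown using the standard published operator times rather than figures from my test: answering a single yes/no question with a pointer might be written as M P K (mentally prepare, point at the answer, click), giving roughly 1.35 s + 1.1 s + 0.2 s ≈ 2.65 s, before any scrolling to reach the next question is counted. A head-tracked interface replaces the pointing and clicking operators with something else entirely, which is exactly why the model needs adapting.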

I made certain adjustments to this model, which you can read about in my paper, to suit the nature of interaction with a perceptual interface. As this hypothesis developed, I created two interfaces that were to be tested one after the other, containing the same set of questions but with different input modalities.

Testing and collecting data

As I arrived at the stage where I could test my interfaces, schools and colleges in Delhi were closed due to high levels of pollution. Therefore, access to participants became the most important screening factor. I managed to gather 5 (Why 5?) students of IIAD who contributed to this study.

All tests were conducted one-on-one in a moderately controlled environment. The test was divided into two phases: in the first, participants were asked to enter their answers using the head-tracked interface, and in the second, they used a Google Forms version of the same questionnaire.

An interaction comparison of the same action (Entering input + scrolling/entering to neutral state for the next question)

So, what did I find out?

It was interesting to analyse the data after this test. I broke down actions for Question 1 and Question 3 for all participants. Here’s what I found:

• Input time after comprehension of the question decreased by 0.5 s in favour of the head-tracked interface. Participants shaved off a little more than half a second when entering their answer to a single question, meaning that in scenarios where time to complete a task is an essential factor (such as screening a long queue), perceptual interfaces are more efficient; a back-of-envelope extrapolation follows this list.
• On average, a user performed 1.625 fewer steps on the head-tracked interface. The removed steps included scrolling and other trackpad-based gestures between entering an answer and reading the next question, which a user had to perform on the Google Forms version. Whether this leads to less user effort in a PUI is still unclear, as I did not have the means to measure the energy expended on all the actions during an interaction.
• Through follow-up questions, it was found that 100% of the participants found the head-tracked interface easier to use. However, as rightly pointed out by Atreyo Roy and Alina Khatri (two of the participants), using the trackpad felt more natural because they were accustomed to using trackpads on a daily basis. This presents an interesting hypothesis for the future: how much of an impact does digital literacy have on the relative effectiveness of a perceptual versus a traditional interface?
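As a purely illustrative extrapolation of the first finding (the question count and queue length here are assumptions, not part of the measured data): saving 0.5 s per question on a hypothetical 10-question screening form is about 5 s per person, which adds up to a little over 8 minutes across a 100-person queue.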

These three findings give an empirical sense of how much better a perceptual user interface might be when compared to a traditional one in a specific setting. I must mention, though, that the error rate in this experiment was 0%, which is definitely not realistic for a computer-vision-based interface in complex social environments.

However, what I can say with confidence is that when we remove the barrier of technological problems, a perceptual interface can lead to more efficient computing systems in certain settings. This is something that we, as humane interface designers, can strive for.

Reflections

• My major reflection through this project is better articulated by Neri Oxman in her essay, Age of Entanglement. As interface and experience designers, we often tend to separate the domains of engineering and design, forgetting that their confluence can be so much more powerful. Working in isolation to push technology on one side and design on the other is, in my opinion, a limiter of potential power.
• Through the activity of writing a 4,000-word paper, I found great confidence in the taught ability to assimilate large amounts of information, analyse and critically dissect patterns, and then articulate them through a specific medium or mediums. The Double Diamond model, first introduced in our foundation year, has found its way into everything I do, and it was the same for this project. I watched peers get lost in the sea of information, and when they asked for help, my suggestions often revolved around their using some form of the Double Diamond model.
• The most important reflection, for other students, is this one. If you’re someone cultivating ideas with a passion that is not quite reciprocated in your immediate environment, I know how it feels. During parts of my project, I felt a lack of directed guidance, as no one in my college shared a similar passion for HCI, whereas typical areas of graphic design and “UI/UX” were discussed all the time. However, every single person I interacted with, even if they were not remotely interested in my field of study, contributed to my project. You might not get exactly what you’re looking for, but there’s something everybody can offer from their own experiences, knowledge and passion. Knowledge exists around you in forms that you sometimes hesitate to accept; embracing it can make you feel not so lost. And hey, you have books and the wonderful internet bringing you to a level playing field with everyone else in the world. So, don’t hesitate. Ideate and make fearlessly, but with passion.

I’d like to end this article by acknowledging the people without whom this paper & project would have never concluded. I’m greatly indebted to my mentors, Ms. Prachi Mittal and Mr. Suman Bhandary. My peers: Nikhil Shankar, Pratishtha Purwar, Muskan Gupta, Alina Khatri, Atreyo Roy, Kriti Agarwal, Navya Baranwal, Harshvardhan Srivastava and the librarians: Mr. Natesh Subhedar and Mr. Paramjit Singh have been instrumental in shaping the contents of my argument and the testing of my interfaces.

I’d also like to acknowledge the sessions with Mr. Shyam Attreya during his final year at IIAD without which I probably wouldn’t have had the starting thought in the first place.

The last bit of gratitude goes out to the efforts of Daniel Shiffman, the Processing team and Peter Abeles for their respective contributions to open-source software and the propagation of related knowledge without which all of this would have remained an untestable hypothesis.

This marks the end of my journey of writing a research paper, my first one ever.
