Optimal Interface Part 2: Output

I’ve been fortunate in recent years to have tried the vast majority of consumer user interfaces and also the software running on each platform that’s widely regarded as best in class for each interface. I’ve written previously about going Back To The Mac and podcasted about using a Microsoft Surface Pro and even tried going Phoneless with just an Apple Watch.

One aspect of my job has been user interface design, conceptualisation and controls. In this series of posts I’d like to explore inputs, outputs and devices in turn, looking at what has worked well and why I think that is, as well as what the next inflection points might be.

Part 2: Output

Output from a device to a person must be in a form the person can receive and interpret and hence has to be via one of our senses:

Sight
Smell
Taste
Sound
Touch
Neural

Sight

Visual information works for most of the population; for those that are visually impaired, sound and touch are the next most common output mechanisms. Visual information can come in shapes, lines, colours, or written language and can be conveyed by a single representation on a flat surface, or can mimic human stereoscopic vision with an independent representation for each eye, appearing as a 3-dimensional image in our mind.

The distance between the eye and the target is ultimately the deciding factor for the optimal interface. Optimal interfaces are suggested below by use case:

Watching movies, TV shows, Entertainment

Large television screen, viewing whilst seated. The user is relaxed while seated with minimal physical fatigue, and the screen must be sufficiently large that, at the selected seating distance, eye strain is not a concern. Average dwelling size and the most common room sizes dictate an optimal screen size of between 40" and 55". As costs reduce, larger screens may become possible, however they become sub-optimal due to increased pixel count requirements. Once screens are wall-sized (maximum height 2m), the drive for additional pixel count becomes pointless beyond the human ability to discern pixel boundaries at a standard seating distance. Screens that require too much head movement for the user to observe every detail from their comfortable seating position are sub-optimal. The lack of adoption of IMAX / OmniMax theatres, which support a much larger format (approximately 1,500 globally), compared to standard cinemas (multiple hundreds of thousands) is in part due to this issue. (Beyond that, licensing and format issues also contribute.)
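As a rough illustration of the point about discerning pixel boundaries, the sketch below estimates the viewing distance beyond which a single pixel can no longer be resolved. It assumes a visual acuity of about 1 arcminute, a commonly quoted figure for normal vision that does not come from this post, and the function name is purely illustrative.

```python
import math

# Assumption: the eye resolves detail down to roughly 1 arcminute.
ASSUMED_ACUITY_ARCMIN = 1.0

def pixel_blend_distance_m(width_px, height_px, diagonal_in,
                           acuity_arcmin=ASSUMED_ACUITY_ARCMIN):
    """Distance (metres) at which one pixel subtends the acuity angle.

    Viewed from further away than this, neighbouring pixels blend
    together, so extra pixels add no visible detail.
    """
    # Physical size of one pixel, in inches.
    pitch_in = diagonal_in / math.hypot(width_px, height_px)
    # Acuity angle converted from arcminutes to radians.
    angle_rad = math.radians(acuity_arcmin / 60.0)
    distance_in = pitch_in / math.tan(angle_rad)
    return distance_in * 0.0254  # inches to metres

uhd_55 = pixel_blend_distance_m(3840, 2160, 55)
hd_55 = pixel_blend_distance_m(1920, 1080, 55)
print(f'55" UHD: pixels blend beyond about {uhd_55:.1f} m')
print(f'55" HD:  pixels blend beyond about {hd_55:.1f} m')
```

Under that assumption a 55" UHD screen runs out of visible detail at a little over a metre, well inside a typical seating distance, which is consistent with the argument that more pixels eventually stop helping.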

Handheld devices are sub-optimal for maximum comfort, however being handheld allows for smaller screens, though these are only useful for short-term viewing periods. Portability of devices results in more damage and shorter lifespans. Head-mounted devices (VR headsets) are sub-optimal as they are the least comfortable, due to neck strain and poor air circulation around the eyes and face. Whilst some of these concerns can be improved with lighter devices, stronger materials and better design, current technological limitations will delay improvements for some time to come.

Information Dense Tasks

Small to moderate screen size, viewing whilst seated or standing, is best for this use case. The user is within arm's reach of the screen, which calls for high resolution, brightness and sharpness for detailed language and text display, reducing eye strain. A seated position allows minimal long-term fatigue (though standing desks are increasing in popularity to address RSI concerns for some people) and the closer viewing position allows more economical screens with higher pixel densities, which in turn further reduces eye strain. More modern screens are now UHD (4K) and at 28" diagonally have 157ppi, whereas the 24" HD (1080p) screens that were popular in the 2000s and early-to-mid 2010s had only 92ppi.
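The pixel density figures above follow directly from the resolution and the diagonal size. A minimal sketch of that arithmetic, assuming the standard 3840x2160 and 1920x1080 resolutions for UHD and HD (the function name is illustrative):

```python
import math

def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels over diagonal in inches."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in

uhd_28 = pixels_per_inch(3840, 2160, 28)  # 28" UHD (4K)
hd_24 = pixels_per_inch(1920, 1080, 24)   # 24" HD (1080p)

print(f'28" UHD: {uhd_28:.0f} ppi')  # 28" UHD: 157 ppi
print(f'24" HD:  {hd_24:.0f} ppi')   # 24" HD:  92 ppi
```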

Handheld devices are restricted in screen size by portability and hand fatigue and hence can be optimal when all required information can be displayed efficiently on a smaller screen, however long-term use drives fatigue and is sub-optimal.

Immersive Gaming

The definition of what is or isn’t immersive can be debated, however Virtual Reality is the best example of being fully immersed for most of our senses. Ultimately though, gamers still prefer the comfort of console gaming for long periods on a large television whilst comfortably seated, and others prefer the higher resolutions and frame rates afforded by dedicated gaming desktop machines. The ultimate solution is Virtual Reality, however the technology will remain sub-optimal until it becomes lighter with improved air circulation and is then able to be worn for longer periods.

Glanceable Information

Small screens where the visual target is known and direct with minimal other information to distract the user - smartphone screens or watches. The more information on a small screen, the less glanceable it becomes. Larger screens may have subsets of glanceable information, however due to the visual seeking time they are less optimal for this application. For glanceable information to be truly glanceable it must be instantly presented at the moment of the glance, and the information must be clear, concise and readily locatable. In this manner a standard wristwatch that displays the time on its watchface is the ultimate expression of glanceable information. Modern smartwatches that do not have always-on displays introduce a delay as the screen lights up when the wrist is turned, which is sub-optimal. In addition some watchfaces (e.g. the Infograph watchface on the Apple Watch Series 4 and 5) can become too information dense in such a small area and become less glanceable as a result.

Even the Apple Watch Series 5 implementation that includes an “Always On” display does not actively display any second-by-second information, including notification indications, when in part-asleep mode, choosing to update the minute hand only as each minute passes. Whilst this is glanceable for the time in minute increments, it is still not ideal compared to a mechanical watch for telling the time to the second or even sub-second, reliably at a glance.

Smell and Taste

Admittedly some have argued the inter-relatedness of these two senses, but considering them together is reasonable from the perspective that scent generation and taste generation are both technologies in their infancy and therefore there are few examples of commonly used applications that can utilise these senses.

Future inflection points might be scent generation to provide a sample of how a bunch of flowers ordered online will smell, or similarly taste generation to show how a meal might taste before ordering it online. No prior knowledge of the flowers or food would be required.

Sound

Output sound from device to a person can be via tones, music or speech. The method of delivery of this audible information can come via several technologies: Speakers (broadcast), or personal/individual-only devices such as Over-ear Headphones, in-ear Headphones or Bone Conduction Headphones. Whilst each can have different features such as noise-cancelling technology, sealed, non-sealed, open back and so on the key features of each will be mentioned where relevant but not all will be discussed.

Speakers

The oldest and most common method for conveying sound, these provide the most flexibility in terms of accurate frequency response, and when multiple speakers are positioned around a room they can also provide the most realistic reproduction of recorded real-world sounds. They are the most comfortable to listen to as they do not require anything to be placed on the head or in the ear, so are ideal for longer listening tasks with minimal fatigue. They would be used exclusively if not for the fact that they can only be used when all of the people that can hear them are interested in listening to what they are playing. Hence for multi-person situations they have fallen out of favour, with preference given to personal devices such as headphones.

Over-ear Headphones

The oldest style of headphone encapsulates the entire outer ear with a padded cup to contain and seal the sound within that cavity against each ear. This allows for full stereo separation and large speaker elements providing the best audible spectrum of sound reproduction whilst fitting over the widest range of ears possible, however they are bulky, can be heavy and have poor air circulation over the ears leading to discomfort when worn for longer periods or even short periods in hot or humid conditions. Optimally suited for temperature controlled environments where accurate sound reproduction is a priority and where isolation from the outside world is desirable. Unsurprisingly these are the optimal interface for podcasting and radio and many prefer these in noisy open-office environments.

In-ear Headphones

These are sometimes called bud headphones, ear-buds, or in-ear monitors (IEMs) and have become the most widely used of all headphones due to their low cost, portability and disposability. However, other than IEMs they do not fully seal the ear canal, which causes sound to leak out such that passers-by will hear some audio. Sealed ear-buds exert some outward pressure on the ear canal, so whilst they can far better isolate outside sounds they are also less comfortable for longer wearing durations. The variable nature of individual ear canals can make in-ear headphones problematic, being either too loose or too tight for some individuals. Some models include different tips in foam or silicone, and some offer moulding services to perfectly match the wearer's own ear canal. In addition the significantly smaller speaker possible in in-ear designs restricts accurate sound reproduction, particularly at low frequencies. More recently fully wireless ear-buds have gained popularity due to their small size, light weight and portability, albeit with a limited lifespan and at a significant cost.

Bone Conduction Headphones

A newer technology consisting of two pads that press gently against the temple above and in front of the ears and vibrate at complex frequencies, accounting for changes in sound due to bone and fluid density impacting resonance in the skull beyond 10kHz. At low volume levels the sound is imperceptible to passers-by, however as volume increases they are not silent to non-wearers. As they do not impede the ear canal nor the ears themselves in any way, the wearer can clearly hear the world around them, and these are widely considered the optimal interface for listening to audio as a background sound without disturbing others nearby whilst still maintaining full awareness of surroundings. Examples include busy city streets and some work environments where interruption by others is a job requirement.

Touch

Touch output from devices to a person is most common via haptics, vibrations, or in more advanced interfaces intended for the visually impaired via a braille device such as a Refreshable Braille Display. Whilst refreshable braille displays have limited use, for those that are visually impaired and can read braille they are extremely useful. Those that can read braille can become extremely fast and can average between 125 and 200 words per minute. Whilst this is still slower than sighted reading rates, which average just above 200, it’s close enough to demonstrate that braille is an effective replacement for the visually read word, however it is not faster to consume.

The first pagers popularised by Motorola in the 1980s used a rotating vibration mechanism to indicate when a paged message was received, alerting the wearer to the message, after which they would call back the provided number (if given) via the nearest landline phone. As mobile phones gained popularity they also added this and introduced different stepped vibrations to indicate different events, like an SMS message vs an incoming phone call or an alarm. Haptics are driven by linear actuators, can be more precisely controlled, and are increasingly found in smartwatches.

The main advantage of haptics vs rotational vibration messaging is the reduction in noise generated, though the complexity of the haptic sequence can also allow for more subtlety in messaging types. The amount of information that can be conveyed is extremely small and therefore only useful for notification messaging of events.

Neural

Technically not a sense but rather the brain that collects and makes sense of the senses, however direct neural technology is progressing each year. Current technology allows a user to train a cursor to move by thinking (as mentioned in Part 1: Input). Though it is slow and inaccurate, it’s inevitable that neural interfaces will someday allow all sensory information to be provided directly out to controllers or computers rather than physically via our body.

The future inflection point will be when that interface performs the equivalent function of our body via gestures or touches without the need for us to move. At that point we may well truly be in the simulation because we won’t be able to tell the difference. It will be both interesting and concerning when that happens.

Devices

For final conclusions about a subset of devices many people use, relative to optimal interfaces for inputs and outputs, refer to Part 3: Devices.

TechDistortion
