Herein you’ll find articles on a wide variety of topics, mostly about consumer technology, along with items of personal interest to me.
If you’d like to read my professional engineering articles and whitepapers, they can be found at Control System Space.
This article is posted in conjunction with Episode 93 of Pragmatic.
In recent years I’ve been fortunate to try the vast majority of consumer user interfaces, along with the software on each platform that’s widely regarded as best in class for that interface. I’ve written previously about going Back To The Mac, spoken about using a Microsoft Surface Pro and even tried going Phoneless with just an Apple Watch.
User interface design, conceptualisation and controls have been part of my job, and in this series of posts I’d like to explore inputs, outputs and devices in turn: what has worked well, why I think that is, and where the next inflection points might be.
Part 1: Input
Input from a person to a device must take a form the person can actually send, and hence has to come via a mechanism we can perform: sound, touch, movement or thought (brain activity).
We shall exclude smell, since projecting a scent to convey meaningful information is not a trick most people can perform, and likewise taste.
The first popular device controlled by sound was the Clapper: “Clap on, clap off” to turn lights on and off. Spoken word has proven significantly more difficult, with many influencing factors: local accents, dialects, languages, speaking speeds, slurring, variable speech volume and, most difficult of all, context. The earliest effective consumer products arrived in the early 1990s from Dragon Dictate, which used an algorithmic approach that required training to improve the speed and accuracy of recognition. Algorithmic techniques eventually plateaued, and accuracy only started to improve again with machine learning, using neural networks trained on common language.
Context is harder still: in human conversation we infer much from previous sentences spanning minutes or even hours. For speech input to track context, it needs consistently high recognition accuracy and the ability to associate contexts over long periods of time. Speech recognition must also be consistently reliable and faster than other input methods, or people will not use it. Sound commands are also poorly suited to situations where discretion is advised, and to noisy environments where isolating a single speaker is difficult even for a human listener, let alone for software.
Despite improvements, Apple’s Siri ‘feature’ remains inaccurate and generally slow to respond. Amazon Alexa, Google Assistant and Microsoft Cortana offer varying degrees of accuracy, with heavier use of machine learning in the cloud providing the best results to date at the expense of personal privacy. As computational power increases and both response time and accuracy improve, sound will become the preferred input method for drafting long-form text (once it keeps up with the average human speaking rate of about 150 words per minute), since for anyone without typing training it is faster and more convenient than a physical keyboard. Once these improvements arrive it will also be the preferred method for short commands, such as turning home automation devices on or off, in situations where no physical device is immediately to hand.
Touch involves anything a person can physically push, tap, slide across or turn, and encompasses everything from dials and mechanical sliders to keyboards and touch screens. Individual buttons are best for dedicated inputs, where each button represents a single command or a closely related set of commands; the most common example of a button grid is a keyboard.
Broadly, touch can be grouped into direct and indirect input. Examples of direct input include light pens and resistive and capacitive touch screens. Light pens had to be held by the user, were tethered, slow and not very accurate. Resistive touchscreens still needed a stylus to be accurate; some people could use the edge of a fingernail, but the pad of a finger wasn’t very accurate, and only a single touch point could be detected at a time. Capacitive touch offered better finger accuracy and could detect multiple simultaneous touches, enabling pinch and other multi-finger gestures. Although no stylus was needed, one was still recommended for high accuracy.
Indirect inputs include keyboards and cursor positioning devices such as mice, trackpads, trackballs and pointing sticks. Keyboards mimicked typewriters and have remained essentially unchanged from the first computer terminals through to personal computers; apart from user preferences for particular key-switch mechanisms, little has changed in decades.
Cursor pointing devices allow precise positioning, including the ability to “nudge” a cursor by a small amount, which is not possible on a touch interface without zooming.
Hence for precision pointing, indirect methods remain more accurate than a stylus because of this ability to nudge. However, precision pointing is not a strict requirement for most users in most applications, so for most tasks non-precision pointing benefits from the simplicity of direct touch, which is faster and requires no training, making it the most accessible method.
For bulk text input, physical keyboards remain the fastest method, although training is needed to reach that speed. Keyboards will remain the preferred method for bulk text entry until speech recognition improves, noting that the fastest English typing speed recorded on a computer, 212 wpm in 2005, was achieved using the Dvorak Simplified Keyboard layout. The average typing speed is about 41 words per minute, so speech recognition that is any faster than this, with a high degree of accuracy, will become the preferred dictation method in most use cases.
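As a rough illustration of what those rates mean in practice, the sketch below converts them into entry times. It assumes a hypothetical 1,000-word draft entered at a constant rate and ignores any time spent correcting typing or recognition errors; the function name is made up for the example.

```python
# Rates quoted in the text, in words per minute (wpm).
RATES_WPM = {
    "average typist": 41,
    "average speaking rate": 150,
    "typing record (Dvorak)": 212,
}


def minutes_to_enter(words: int, wpm: float) -> float:
    """Minutes needed to enter `words` at a constant `wpm` (no corrections)."""
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return words / wpm


if __name__ == "__main__":
    draft_words = 1000  # hypothetical draft length
    for label, wpm in RATES_WPM.items():
        print(f"{label:>25}: {minutes_to_enter(draft_words, wpm):5.1f} min")
```

This prints roughly 24.4, 6.7 and 4.7 minutes respectively. At the average speaking rate, dictation would be about 3.7 times faster than average typing (150 ÷ 41) before any correction time, which is why recognition accuracy decides whether that advantage survives.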
Movement input tracks gestures made with different parts of the body, without the body being physically connected to the machine. A recent example is the PlayStation Move, where the user held a controller that wasn’t tethered to the machine but whose movement was tracked directly. Virtual Reality systems similarly use handheld controllers with gyroscopes and accelerometers to track the movement of hands and arms.
The most popular device for tracking natural, free-standing movement so far has been the Microsoft Kinect, released for both the PC and the Xbox. Its tracking had trouble separating people from the background and was thrown off by anyone walking past, in front of or behind the people it was tracking. Room size and other obstructions were also a challenge for many users: to use movement tracking reliably, couches, chairs and tables often had to be moved or removed to create a workable space.
This form of movement tracking suits individuals or small groups in enclosed spaces with no passing foot traffic, though even with the Xbox One’s Kinect 2 the time taken to acquire a precise position was still too slow, and the Kinect 2 was discontinued in 2017. The newest development kit for the next generation of Kinect is the Azure Kinect, announced in February 2019.
Current technology is still extremely inaccurate, easily confused and immature, with a limited set of standalone use cases. Even extremely accurate free-standing position tracking is unlikely to be useful as a mass input device on its own; however, in conjunction with speech recognition it could provide vital contextual information to improve the accuracy of command interpretation. It also has applications in noisy environments, for example where a single person sits in front of a device such as a television and wants to change channels with a gesture rather than a physical remote control.
Brain-Computer Interfaces (BCIs) allow interaction through measuring brain activity, usually with electroencephalography (EEG). EEG uses electrodes placed on the scalp and is cheaper and less intrusive than functional MRI (fMRI), which tracks blood flow through different parts of the brain; fMRI is more accurate, but far from straightforward.
In the mid-1990s the first neuroprosthetic devices for humans became available, but they took a great deal of concentration and the results were extremely difficult to repeat reliably. By concentrating intensely on a set thought it was possible to nudge a cursor on the screen in a certain direction, but this wasn’t very useful. In June 2004 Matthew Nagle became the first person to receive a Cyberkinetics BrainGate implant, which helped him overcome some of the effects of tetraplegia by translating signals recorded from his motor cortex into control of a computer cursor. In 2016 Elon Musk invested US$27M in a company called Neuralink, which is developing a “neural lace” to interface the brain with a computer system.
Interfacing directly with the brain remains extremely dangerous, but it must be explored if BCIs are ever to become useful, because the amount of data we can reliably extract from sensors on the scalp is very limited by noise and signal loss through the skull. We will therefore need implants that connect directly with neurones before we can get data in and out at a rate high enough to overtake our conventional senses.
Guessing how far off that inflection point is remains extremely difficult. That said, when it comes it will come very quickly: some people will decide to have chips implanted, allowing them to outperform others at certain tasks. Even once the technology becomes safer and affordable, there will always be ‘unenhanced’ people who choose not to have implants, and mass adoption may still take a long time depending on the rewards versus the risks.
Despite many claims, no one really knows exactly how fast a human can think. Guesstimates put the rate at which our brains form words somewhere between 1,000 and 3,000 words per minute, but this range is very broad. For writing, though, the word-thinking rate is only part of the story: when writing conventionally you also read back, review, revise and rewrite, as these are key parts of the creative process; without them, what you end up with is most likely gibberish or simply not worth publishing.
Beyond that, there’s an assumption that our thoughts can be descrambled coherently at all. More likely some training will be necessary: in the same way we currently have to rephrase our words for a machine to interpret a command, re-ordering our thinking might be required, initially at least, to get a usable result. On top of this, multilingual people may think in a specific language or mix languages in their thinking, and a neural interface that could even begin to interpret that is a very long way off, most likely beyond our lifetimes.
More in Part 2
Next we’ll look at outputs.
It’s been a long series of experiments, beginning in the mid-2000s when I moved from Windows Vista to Mac OS X Tiger, then to the iPad running iOS in 2011, back to Windows 10 on a Surface Pro 4, back to an iPad Pro in 2016, trying an Apple Watch LTE as my sole daily device, and finally back to a MacBook Pro with Touch Bar running Mojave.
Either I’m completely unprincipled in my use of technology or, as I’d prefer to think, I’m one of the few stupid and crazy enough to try every mainstream technological option before reaching a conclusion. Whilst I admit that Everything is Cyclic, it is also a quest for refinement. Beyond that, as technology continues to evolve, whatever balance can be found today is guaranteed not to last forever.
If you want the TL;DR then skip to the Conclusion and be done with it. For the brave, read on…
Critical Mass for Paperless
Ideally, computers would replace paper and ink for communicating ideas in person in small groups, and overhead projectors and whiteboards for larger groups, but they haven’t. The question is simply: which is easier?
We can all pick up a pencil and write, as we are taught at school. Typing, despite being an essential skill in the modern world, is different: many people cannot touch type, and with keyboards on small glass screens now coming in all sorts of non-standard sizes, even the typing skills taught in the 80s and 90s no longer bring everyone to a common level. (I now beat most 15 to 25 year olds in typing speed tests, as they learned on smartphones rather than standardised physical keyboards.)
The iPad Pro with the Apple Pencil represented the best digital equivalent of an analogue pen or pencil, and so for nearly two and a half years now I have not needed to carry an ink-based pen with me. At all. As an engineer I’m not (generally) interested in sketching, and whilst I can do it I’m not particularly good at it, so I use the Apple Pencil to take notes. Unlike notes in ink on paper, though, I can easily search all of my notes thanks to handwriting recognition.
The use of iPads for this purpose has increased significantly in our office (no, not entirely because of me, though I was the first I’m aware of to do it), and it has increased because it is so much better than ink on paper. Photocopier and scanner usage has dropped significantly, and it’s only a matter of time before we move away from them altogether. As with the fax machine, soon there will be one photocopier per floor, then one per building, and within a decade none at all.
The paperless office may finally arrive; a few decades behind schedule, but better late than never.
Fighting the Form Factor
A term I’ve come across in programming is “Fighting the Framework”, which illustrates that frameworks and APIs are written with an intent: their data structures, methods and objects are all cohesively designed around a specific model, view and/or controller, inter-object messaging and so on. If you choose to go around these structures to create your own customised behaviours, it takes significantly more work and is often far more error-prone, as you are going against the intended use and nature of the framework.
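A small Python illustration of the idea, specific to CPython and with hypothetical class names: subclassing the built-in `dict` and overriding `__setitem__` to lower-case keys looks reasonable, but `dict.update()` writes entries internally without calling the override. `collections.UserDict`, which exists precisely for this kind of customisation, routes every write through `__setitem__`.

```python
from collections import UserDict


class LowerKeysDict(dict):
    """Fights the framework: only direct item assignment uses this override."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerKeysUserDict(UserDict):
    """Works with the framework: UserDict sends all writes via __setitem__."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


fought = LowerKeysDict()
fought["Apple"] = 1            # goes through the override
fought.update({"Banana": 2})   # CPython's dict.update bypasses the override

worked = LowerKeysUserDict()
worked["Apple"] = 1
worked.update({"Banana": 2})   # MutableMapping.update assigns via self[key]

print(sorted(fought))  # ['Banana', 'apple']
print(sorted(worked))  # ['apple', 'banana']
```

The customised behaviour only holds along the paths the framework was designed to route through it; going around that design leaves a subtle inconsistency that is easy to miss.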
I’d like to propose that some people who love technology are obsessed with taking devices of a specific form factor, making them “bend” to their will and using them in ways that fundamentally conflict with their design intent. Whether or not you believe pushing the boundaries is good practice, there are limits to what is possible, what is practical and what can realistically be expected when you fight the form factor.
One example is the commentary that the iPad, or any tablet, is still “just a tablet”, meaning it is predominantly intended as a consumption device. Of course that’s a reductive argument, since content comes in many forms: at a very basic level written, audible and visual, with blends of these in newspapers, comic books, novels, TV shows and movies. The same argument is also made in reverse: according to the currently popular trope, it’s “too hard” to create content on a tablet, and therefore it is, and can only be, a consumption device.
The fundamental structure of the iPad (more specifically iOS), with its single viewport and the need to cater for the lowest-common-denominator input device, a human finger, makes it difficult to directly copy ideas and concepts from desktop operating systems, which have 20 years or more of trial, error and refinement behind them. Over time more innovation will appear in this space, as it already has with Ferrite for audio (e.g. podcasts) and LumaFusion for video; although these won’t satisfy everyone, only a few years ago there were no equivalent applications on iOS at all.
In the end, though, there is no easy way for the iOS form factor (both the physical device and the operating system) to accommodate certain important, proven aspects of specific classes of application design and use case. For these unfortunate classes, fighting the form factor will yield only frustration, compromise and inefficiency.
You can’t beat pixels (or points). The ability to display information on multiple screens side by side (or at least in close proximity, if not perfectly aligned), and importantly to visually compare it and copy and paste seamlessly between them, has existed on desktop computers, and been taken for granted, for decades.
On larger-screened iOS devices this has been added to an extent with Slide Over and Split View; however, copy and paste between the applications isn’t widely supported and comes with several caveats, and most importantly there simply aren’t enough pixels for many side-by-side review tasks. The larger the documents or files you need side by side, the worse it is on an iPad.
iPads have supported application-specific monitor output that isn’t just a mirror of the iPad screen, but support for this is rare and tied to each application. There’s no generic way to plug in a second, independent monitor and use it for any general purpose. Then again, there’s no desktop-style windowing system, so without a mouse pointer or a touch interface on the connected screen, how would the user interact with it?
Some have proposed that multiple iPads could in future be ‘ganged’ together, but apart from being cost-prohibitive, this is unlikely for the same reason that ganging iMacs together is no longer supported (Target Display Mode ended in 2014). Beyond this, no existing iPad (even one with USB-C) can drive more than one additional monitor. By contrast, most current laptops and desktops support two additional displays, at a combined cost significantly lower than a solution of multiple ganged iPad Pros.
Scrolling and navigating around large documents is slow and difficult on an iPad: there are few shortcuts, many applications lack search functionality, loading large files can take a long time, and there’s a lot of fast-flick-swiping to get around a document. None of this is a problem on a desktop operating system, with search built into practically every application, plus Page Up/Down, scrollbars, trackpads and mouse wheels, all of which are less obtrusive and much faster overall than flicking for 30 seconds to move through a significant number of pages.
The capacitive touch screen introduced with the iPhone, and subsequently the iPad, made multi-touch with our highly inaccurate built-in pointing devices (our fingers) a reality for the masses. As an input method, though, it is not particularly precise, and for precision a stylus is required. The Apple Pencil serves that purpose for those who need it; however, pixel-perfect precision is still faster and easier with an indirect positioning mechanism such as a cursor.
My efforts to make Windows work (reliably) the way I needed weren’t successful, and the iPad Pro met a great many of my computing needs (and still does for written tasks and podcast editing). Ultimately, however, I was trying to make the system do what I needed when it fundamentally wasn’t designed to do that. I was fighting the form factor, and losing too much of the time.
Many see working exclusively on the iPad Pro as a challenge, using complex workarounds and scripts for tasks that would be built in or straightforward on a Mac. Those people get a great deal of satisfaction from getting these things to work, but if we are truly honest about the time and effort spent making those edge cases function, and the additional, unnecessary friction involved, in most cases they would be better off using a more appropriate device.
For all of the reasons above I came back to the Mac, purchasing a 2018 13” MacBook Pro, and I have not regretted that choice. I am fortunate that my company has provided a corporate iPad Pro 2, which I also use every day for written tasks. I no longer feel I am fighting the form factor of my machines, and my days using technology are far less stressful and far more productive, which in the end is what it should be about.