Contextual Hierarchies

29 July, 2013 04:46PM · 4 minute read

Artificial Intelligence has fascinated me for a very long time, and how it applies in the real world with current technology bears some discussion. For me the ultimate goal is fully interactive conversation with a computer: to ask it questions across a wide variety of contexts and have it respond correctly. Some may question the need for such an advanced interactive interface at all; however, simpler commands are steps along the same road.

The biggest barrier to interpreting intent is understanding context. Without the technology to tap directly into the cerebral cortex, the next best solution is the one people rely on day to day: context. In existing relationships we have a historical context about the person we talk to: what they like and don’t like, which guides how we respond. For computers to achieve similar results they must have the first level of contextual knowledge: knowledge about the person they are talking to. For this they must know with certainty who they are interacting with. On a computer or smartphone, just because a person typed in the right password (if there was one at all) doesn’t mean they are the registered user currently logged in to the system, to say nothing of systems with generic logins. For the purposes of this discussion, assume this problem has been resolved and we definitively know the identity of the user.

I see an individual’s context in multiple layers. Clearly the first is the facts about them: what they look like, who their friends or family are, what sports they follow and so on. For a computer to understand the context of a sports question, it must go beyond simply understanding what was said. On that note, technology has progressed significantly; however, converting spoken words into written text is still not 100% accurate. Again, for the purposes of this discussion, assume this has been resolved and all that is spoken is understood correctly by the computer.

Computer: “What’s your favourite sport?”

H: “Cricket.”

At this point the computer can add this information to a ‘personal context database’, derive the person’s current location from geolocation (again assuming this is possible), and then respond as another knowledgeable human might by saying, “The Poms are giving the Aussies a hiding in the Ashes test, aren’t they?” To assemble this response it requires knowledge that:

The local computer can only store information about the person locally. It can use location to reduce the seek time for current events and subject-specific information, such as that from Wikipedia or a similar online resource, which would require a low-latency internet connection (the faster the better).

The layers in the contextual hierarchy needed to strike up a casual conversation about sport with a computer are: Personal, Subject Specific and Current Events. Each database must be cross-referenced against location and must be kept current at all times to be truly useful. Current Events will change constantly, whereas Subject Specific matter will change much less quickly. The problem with publicly collected data is that ‘facts’ vary from country to country and depend on the person(s) who entered them in the first place. The simplest example is a pie. If you’re in Australia a pie usually means a hot meat pie, whereas in certain parts of the US it means a pizza, and in other parts of America it means a cold pie that usually doesn’t contain meat. Hence facts need to be duplicated in many cases to account for regional differences and cannot simply be ‘learned’ by an automated system.
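To make the three layers a little more concrete, here is a minimal Python sketch of how they might be held and cross-referenced against location. Everything in it is an assumption made for illustration: the `ContextStore` class, its field and method names, and the region keys (`"AU"`, `"US-region-A"`, `"US-region-B"`) are hypothetical placeholders, not a description of how any real assistant works. The regional meanings of "pie" are stored as duplicated entries keyed by region, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical holder for the three layers: Personal, Subject Specific, Current Events."""
    personal: dict = field(default_factory=dict)        # facts about the user
    subject: dict = field(default_factory=dict)         # (term, region) -> regional meaning
    current_events: dict = field(default_factory=dict)  # (subject, region) -> talking point

    def meaning(self, term, region):
        # The same term is duplicated once per region rather than stored once globally.
        return self.subject.get((term.lower(), region))

    def small_talk(self, region):
        # Combine the personal layer (favourite sport) with current events for this location.
        sport = self.personal.get("favourite_sport")
        if sport is None:
            return None
        return self.current_events.get((sport.lower(), region))


def build_demo_store():
    """Fill a store with the examples from the text; region keys are placeholders."""
    store = ContextStore()
    store.personal["favourite_sport"] = "Cricket"
    store.subject[("pie", "AU")] = "hot meat pie"
    store.subject[("pie", "US-region-A")] = "pizza"
    store.subject[("pie", "US-region-B")] = "cold pie, usually without meat"
    store.current_events[("cricket", "AU")] = (
        "The Poms are giving the Aussies a hiding in the Ashes test, aren't they?"
    )
    return store


if __name__ == "__main__":
    demo = build_demo_store()
    print(demo.meaning("Pie", "AU"))
    print(demo.small_talk("AU"))
```

Even in a toy like this, the hard part is visible: someone has to decide which regions exist and enter each regional meaning by hand, and the Current Events entries go stale almost immediately.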

The point of this is that there is no easy answer. It’s not just about how much data you hold; it’s about how well it’s organised. Google Now and Siri are Google’s and Apple’s attempts to bring AI to the mainstream, with each system giving mixed results. There is no doubt in my mind that for basic tasks like setting a timer they work fine, as context is extremely limited and curation isn’t necessary, but for anything approaching conversational contextual understanding, we are a very, very long way off.