Augmented Reality - A Developer's Perspective
I've been amazed at the various reactions to my last post. Since then I've been collecting my thoughts about Augmented Reality (AR). What follows is a first attempt to distil them into a coherent commentary on AR and its potential.
What is Augmented Reality?
Augmented Reality is a means of superimposing digital assets upon the "real" world as seen through an appropriately configured device.
For example, in my last post I described how I put the locations of "real world" geocaches as digital representations into the Wikitude AR viewer provided by Mobilizy.
Alternatively, an AR enabled device might recognise the face of a new acquaintance, retrieve their contact details and display them or import them into your address book. Something similar to this is demonstrated in the following video (from the Swedish company TAT):
In both cases, digital assets within an augmented reality have two essential qualities:
- A location in the real world – identified by longitude/latitude/altitude (as with the geocaches) or by some other means (such as facial recognition).
- A context to give them meaning – provided by the digital asset's representation in the augmented world.
How does it work?
The basic recipe for Wikitude is simply…
- An AR-enabled device (like my Android-based mobile phone) has GPS capabilities, a compass and accelerometers that enable it to work out where I am, where I'm pointing and how I'm moving / holding the device in the real world.
- Given this information it is possible to work out which digital assets are close by, whether I'm facing in their direction and whether the device is being held in such a way that it is looking at them.
- Finally, by capturing the output of the device's camera it is possible to add such digital assets to the image displayed on the screen of the device.
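The second step of this recipe can be sketched in a few lines of Python. This is only an illustration of the geometry, not Wikitude's actual implementation: the function names, the 60 degree field of view and the 6000 meter range are all my own assumptions, and the sketch ignores altitude and the tilt of the device (which the accelerometers would supply).

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def is_in_view(user_lat, user_lon, heading_deg, poi_lat, poi_lon,
               fov_deg=60.0, max_range_m=6000.0):
    """True if the asset is within range and inside the horizontal field of view.

    heading_deg is the compass direction the device is pointing.
    fov_deg and max_range_m are hypothetical defaults, not Wikitude values.
    """
    if distance_m(user_lat, user_lon, poi_lat, poi_lon) > max_range_m:
        return False
    bearing = bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
    # Signed angle between heading and bearing, normalised to [-180, 180).
    offset = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2
```

An asset that passes this test would then be drawn over the camera image at a horizontal position proportional to its angular offset from the heading.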
...but there is more…
Digital assets need to be understood as representing something. This can be achieved in several ways:
- The context of the application. For example, it's obvious that you're looking at geocaches in the AR view of GeoBeagle.
- Visual clues. To continue with the geocaching example, one might represent different types of cache with different icons in the AR view. One might even represent distance with alpha compositing – as an asset gets further away it becomes more transparent until, finally, it disappears. Other dimensions that could be represented include the relative speed of an asset via blue/red shift and an asset's "importance" via its on-screen size. I expect conventions to emerge as AR technology matures.
- Layers/feeds/channels that filter assets. Imagine you're looking at a scene that is cluttered with many assets but you only want to see telephone kiosks (how ironic). One might filter out the "shops", "attractions", "hotels" and "transport" layers, leaving only the "utilities" layer that shows things like public toilets and telephone kiosks. Companies should be able to make money by providing subscription-based layers.
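Two of these ideas, fading by distance and filtering by layer, are simple enough to sketch. The following is a minimal illustration under my own assumptions: assets are plain dictionaries with a "layer" key, and the fade distances are arbitrary values rather than anything Wikitude defines.

```python
def alpha_for_distance(distance_m, fade_start_m=500.0, fade_end_m=6000.0):
    """Opacity in [0.0, 1.0]: fully opaque up close, fading linearly to invisible.

    fade_start_m and fade_end_m are hypothetical tuning values.
    """
    if distance_m <= fade_start_m:
        return 1.0
    if distance_m >= fade_end_m:
        return 0.0
    return 1.0 - (distance_m - fade_start_m) / (fade_end_m - fade_start_m)


def visible_assets(assets, enabled_layers):
    """Keep only the assets whose layer is currently switched on."""
    return [asset for asset in assets if asset["layer"] in enabled_layers]


# Example scene: only the "utilities" layer is enabled.
scene = [
    {"name": "Telephone kiosk", "layer": "utilities"},
    {"name": "Corner shop", "layer": "shops"},
    {"name": "Railway station", "layer": "transport"},
]
utilities_only = visible_assets(scene, {"utilities"})
```

A linear fade is the simplest choice; a real viewer might prefer a curve that keeps nearby assets opaque for longer.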
The current state (for developers)
I've only had experience of using Wikitude so I'll limit my comments to that platform.
Wikitude is beta software, but if you want to see how an existing version performs, download the Wikitude World Browser application from the Android Market.
Wikitude is also in "closed" beta – meaning you'll have to register in order to get the documents and associated code / libraries. It is my understanding that eventually one will need a developer key to use the API.
Wikitude already seems to be very stable – I've not had it crash (yet), but I'm sure that as more people start to use it there will be more opportunity for breakage.
The API is very simple. As I explained in my previous post, this is both good and bad. To paraphrase Albert Einstein: "As simple as possible, but no simpler". Wikitude is currently too simple (but in a good way). I'd like to be able to:
- Define the menus and event handlers associated with each PoI (Point of Interest – a digital asset within the AR world) so that I can customise what happens when a user clicks a specific PoI.
- Define areas rather than just points and choose how such areas are "filled" – colour, texture etc…
- Have an elegant solution for missing altitude information. For example, geocaches only have a longitude and latitude but no altitude. Wikitude assumes 0 meters altitude if none is provided so an early version of the geocaching application had nearby geocaches appearing underground (as I'm at an altitude of 157 meters). My solution was to make the world flat by giving everything within 6000 meters the same altitude as the current user – although this isn't at all ideal.
- Be able to display pathways based on data from OpenStreetMap.org (for example). It'd be good to superimpose public rights of way, footpaths and other navigation information. Something like an AR SatNav.
- Add 3D models to the AR. Imagine visiting an ancient monument and being able to see an artist's impression of it in situ. What a great educational resource that would be, and architects would find it useful on site during the pre-build phase.
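The "flat world" workaround for missing altitude mentioned in the list above amounts to something like the following. It is a sketch of the idea rather than the code from my geocaching application: the dictionary keys and the function name are made up for illustration, and each PoI is assumed to carry a precomputed distance from the user.

```python
def flatten_altitudes(pois, user_altitude_m, radius_m=6000.0):
    """Return copies of the PoIs, with nearby ones moved to the user's altitude.

    Each PoI is a dict with (hypothetical) keys "distance_m" and "altitude_m".
    PoIs further away than radius_m are copied unchanged.
    """
    flattened = []
    for poi in pois:
        copy = dict(poi)
        if copy["distance_m"] <= radius_m:
            copy["altitude_m"] = user_altitude_m
        flattened.append(copy)
    return flattened


# A geocache with no altitude data defaults to 0 meters, i.e. underground
# for a user standing at 157 meters.
caches = [
    {"name": "Nearby cache", "distance_m": 250.0, "altitude_m": 0.0},
    {"name": "Distant cache", "distance_m": 9000.0, "altitude_m": 0.0},
]
adjusted = flatten_altitudes(caches, user_altitude_m=157.0)
```

The obvious weakness is that any PoI with a genuine altitude also gets flattened; overriding only the PoIs whose altitude is missing would be a gentler variant.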
Nevertheless, Mobilizy have the right attitude because "simple" is a good place to start. I can only assume they have various features up their sleeves that they'll add once those features are finished and properly tested.
I don't want to give the wrong impression because one can already do quite a lot:
- Add points according to longitude / latitude / altitude.
- Change the label associated with the PoI and the description that is displayed when the item is tapped.
- Change the icon displayed in the AR view.
Potential
So what happens now..? I can imagine all sorts of uses for this technology and I'll work my ideas out into a blog post in the not-too-distant future.
However, I'd caution against making every location-based application viewable via AR. Often the top-down Google Maps view is all that is needed. For example, why show houses for sale in AR when houses for sale (in the UK at least) always have an estate agent's "For Sale" sign placed prominently outside the property? One should only use a technology because it is the best fit for a problem, not because it is the latest and greatest.