
Vista and AIMLBot

James Ashley has emailed me to let me know that he has used my AIMLBot library in his Sophia project featured on the Code Project website.

What sets this project apart is James's use of the new speech recognition and synthesis libraries that come with Windows Vista. His excellent article demonstrates a framework for interacting with AIMLBot chatter-bots and classic Z-machine text-based adventures. This literally means that you can talk to the bot and the bot will talk back to you (or you can talk and listen your way through classic interactive fiction like Zork – a truly noble achievement!).

As James explains:

"The Sophia project is simply an attempt to bring speech recognition and synthesis to the text-gaming experience. With Microsoft's speech recognition technology and the API provided through the .NET 3.0 Framework's System.Speech namespace (formerly SpeechFX), not only is the performance fairly good, but implementing it has become relatively easy."

Unfortunately for me, I'm holding off buying Vista until Apple releases their Leopard operating system. I'll then purchase a new laptop (probably a MacBook Pro) and install both operating systems (and Linux too) to give me as comprehensive a development platform as possible (especially useful for testing web-based applications and various Mono projects).

Until then I'll have to wait, fingers twitching, anxious to get my hands dirty playing with the code demonstrated in James's project.

The Guide for the Perplexed

This month's issue of Linux Format magazine features some work of mine: The Guide for the Perplexed – a small e-book for those wishing to get started with Linux.

They describe it as:

"... an excellent introduction to Linux [...] a great guide. It's 70 pages of easy-to-read text, along with some excellent screenshots."

In 2002 I moved to Shropshire in order to start an MSc in Computing at a local university. One of the first things I did was seek out my local LUG (Linux User Group) to meet like-minded technophiles.

Soon after joining I wrote the Guide for the Perplexed for new Linux users and beginner members of the group. Unfortunately, it is very out-of-date and should be revised to include the latest-and-greatest developments.

Perhaps when I get more time… (the guide's OpenOffice.org source file can be found here)

TalentTool and Technology

Since releasing the open-source version of Program# I have been planning and developing the commercial version, writing a web-based expert system and working on the TalentTool project.

This post is a summary of my vision of how TalentTool works, how various emerging web-based technologies are incorporated into the application to support this vision and the progress made so far.

The Vision (How it Works)

First, TalentTool.com is designed to be simple and helpful so that managing candidates is easy. This is manifested in several ways:

  • The design of the user interface is clean and uncluttered (see the mock-ups described below).
  • Important core processes (the famous standard and reverse funnels) have been reduced to their most essential tasks and implemented as such. However, the system is flexible enough to allow added complexity should this be required.
  • Navigation through the different processes and between the various pages of data is intuitive:
    • Context sensitive menus always display the most useful links to pages in the system that are related to the page you are currently viewing.
    • It should be possible to get from any page to any other page in the system within three mouse clicks.
    • Links are always descriptive and take you to where you expect.
  • A page's URL makes sense and is easy to remember (no more addresses such as:
    http://talenttool.com/dp/0262610744/ref=pd_sim_b_3/203-6257453-2344762).

Second, in addition to being a traditional candidate management system for HR departments, TalentTool is designed to inhabit an online world that commentators such as Michael Arrington describe as a job board bubble. He explains how various sites are offering "job listing" applications that plug into niche websites / blogs for targeted job advertising (for example, 37signals have a bespoke job board advertising web-development jobs). This is evidence of the Long Tail and a renewed interest in simple web-based applications within the recruitment sector. Now, TalentTool isn't a job board (although it could be used as one) – rather it fulfils the role of candidate / talent management system for all the applications and inquiries generated by these niche job boards. Furthermore, my to-do list includes a means of feeding job-listings managed by TalentTool to the appropriate blog-based job boards (as explained below, ATOM / RSS and Microformats will play a big part in this).

Finally, through the use of REST and Microformats I intend TalentTool to provide an API as a limited form of service-oriented architecture. By this I mean external third parties should be able to create mashups from the publicly available information provided by the site (such as job listings or candidates who have designated that their resumes be publicly searchable). Put simply, just as TalentTool might want to push information out to third-party blog-based job boards using ATOM and RSS, such sites might also want to pull information from TalentTool with REST and Microformats in such a way that they can include it in their own sites.

The Technology

I've mentioned various new (and not-so-new) web-based technologies in the section above. They were chosen not because of a blind leap onto the bandwagon of the latest "shiny" new technology, but because each one adds to the value of TalentTool and helps to differentiate it from the entrenched "corporate" products currently in use.

So, what are these technologies and what is their impact on TalentTool?

REST

Roy Fielding (the originator of REST) explains:

"Representational State Transfer is intended to evoke an image of how a well-designed Web application behaves: a network of web pages (a virtual state-machine), where the user progresses through an application by selecting links (state transitions), resulting in the next page (representing the next state of the application) being transferred to the user and rendered for their use."

In plain English this means: web-sites should act so that things (each referenced by a URL) are changed by actions (usually the HTTP standard's GET, POST, PUT and DELETE methods).

An excellent summary of this concept (in layman's terms) is the article How I explained REST to my wife.

Alternatively, consider this real-life example. I store a collection of my favourite web-pages (also called bookmarks) at a site called del.icio.us. You can see my bookmarks by going to http://del.icio.us/ntoll. Notice that the URL is that of del.icio.us followed by my user-name – this is me as represented by del.icio.us. By clicking on the link you are (HTTP) "GET"ting my list of bookmarks. Furthermore, if you want to see which bookmarks I have filed under the subject "REST" then visit http://del.icio.us/ntoll/REST. Of course, once I've authenticated myself to del.icio.us I have the authority to change my bookmark list by (HTTP) "POST"ing new web-sites to http://del.icio.us/ntoll (for example).

To summarise, RESTful sites use simple and logical URLs to refer to items in the application (they are, in effect, a sort of electronic noun) and provide a limited vocabulary of verbs (HTTP methods in brackets) that allow you to view (GET), add (POST), update (PUT) and delete (DELETE) items.
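
To make this concrete, here is a small C# sketch (assuming nothing beyond the standard System.Net classes) that "GET"s the del.icio.us resource from the example above:

    using System;
    using System.IO;
    using System.Net;

    class RestGetSketch
    {
        static void Main()
        {
            // GET the resource identified by this URL - my del.icio.us
            // bookmarks filed under "REST".
            WebRequest request = WebRequest.Create("http://del.icio.us/ntoll/REST");
            request.Method = "GET";

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                // The response body is a representation of the resource's
                // current state: an XHTML page listing the bookmarks.
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }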

As a result TalentTool will have URLs like this:

  • talenttool.com/dashboard – the user's homepage.
  • talenttool.com/jobs – the current user's jobs or (if un-authenticated) a list of the most recent jobs.
  • talenttool.com/candidates/smith – a list of all candidates in the system with the surname Smith.
  • talenttool.com/candidates/skills/REST – a list of all candidates with skills and knowledge of REST based web-applications.
  • talenttool.com/jobs/2354/candidates – a list of all candidates for the job with reference number 2354.
  • talenttool.com/candidates/45765 – the candidate with the reference number 45765.
  • talenttool.com/jobs/2354/events/meetings – all the meetings for job number 2354.

This is both intuitive and simple – allowing people to work more easily and quickly.

Microformats

As the website explains:

"...[M]icroformats are a set of simple, open data formats built upon existing and widely adopted standards."

Others have described them as "simple conventions for embedding semantics in HTML to enable decentralized development" and "codification[s] of convention".

For the non-technical this means that web-pages can use Microformats to describe the information they contain in both human- and machine-readable form (currently only we humans understand the information in web-pages). This is often described as the semantic web (or, in current buzzword terms, Web 3.0).

The W3C (Tim Berners-Lee et al, who write the specifications for how the web works) invented several complicated ways of embedding semantics within web-pages. Unfortunately, these technologies have not been enthusiastically embraced (if at all) by web-developers. Microformats – although less powerful than the W3C's technology – have the advantage of being based on current best practice, and of being simple and easy to learn. As a result industry heavyweights are advocating their use and many companies and organisations are starting to implement them.

Why is this so important for TalentTool? Well, there is already a microformat for your CV/resume and work is under way for a microformat for job-listings. This means that when browsing a page on TalentTool your software will be able to automatically import contact details into your address book, add meetings and other events to your diary and make use of all sorts of other useful machine-readable data found on the page.
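
To illustrate, here is a sketch of how TalentTool might render a candidate's contact details as an hCard. The vcard, fn, org and email class names come from the hCard specification; the helper method and candidate data are entirely hypothetical:

    using System;

    class HCardSketch
    {
        // Render a candidate's contact details as hCard markup. The
        // vcard, fn, org and email class names come from the hCard
        // specification; everything else here is made up.
        static string RenderContactDetails(string name, string employer, string email)
        {
            return "<div class=\"vcard\">" +
                   "<span class=\"fn\">" + name + "</span>, " +
                   "<span class=\"org\">" + employer + "</span>, " +
                   "<a class=\"email\" href=\"mailto:" + email + "\">" + email + "</a>" +
                   "</div>";
        }

        static void Main()
        {
            Console.WriteLine(RenderContactDetails(
                "John Smith", "Acme Ltd", "john.smith@example.com"));
        }
    }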

Interestingly, when combined with a REST-based architecture, TalentTool provides a means for third-party software to access content. For example, should an application GET the URL talenttool.com/jobs/London it will retrieve a list of all the currently open jobs the system is managing in London. The application can then use the job-listing microformat to consume information about various aspects of the roles, such as their job titles, benefits packages and required skill sets.

ATOM / RSS

Both ATOM and RSS are examples of web feeds that publish up-to-date summaries of the contents of a web-site. In the case of TalentTool, job-listings are likely to be syndicated by web-sites or third-party feed-reader programs that subscribe to the various feeds on offer.

Such feeds contain entries that can be anything from news items (a headline, date, and summary, for example) and blog entries to podcasts and video diaries. Entries link back to the source of the content and often include additional meta-data such as the semantic markup provided by Microformats.

As mentioned before, this is a means of pushing job-listings out to third parties (thus adding to the value of the site and differentiating it from regular talent management applications).
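
As an illustration of how straightforward such a feed is to produce, the following sketch writes a minimal RSS 2.0 feed containing a single (made-up) job listing using .NET 2.0's XmlWriter:

    using System;
    using System.Xml;

    class JobFeedSketch
    {
        static void Main()
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;

            using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
            {
                // A minimal RSS 2.0 feed containing one job listing.
                writer.WriteStartElement("rss");
                writer.WriteAttributeString("version", "2.0");
                writer.WriteStartElement("channel");
                writer.WriteElementString("title", "TalentTool job listings");
                writer.WriteElementString("link", "http://talenttool.com/jobs");
                writer.WriteElementString("description", "Recently posted jobs.");

                writer.WriteStartElement("item");
                writer.WriteElementString("title", "Web developer (London)");
                writer.WriteElementString("link", "http://talenttool.com/jobs/2354");
                writer.WriteElementString("description", "A made-up job listing.");
                writer.WriteEndElement(); // item

                writer.WriteEndElement(); // channel
                writer.WriteEndElement(); // rss
            }
        }
    }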

Ultimately, these technologies allow TalentTool to be placed within a decentralized yet very dynamic system for advertising jobs and hiring candidates. Truly an example of the whole system being greater than the sum of its parts.

The Story So Far…

I have completed:

  • A comprehensive functional specification.
  • A database schema and the corresponding object-relational classes (although these need checking and updating in light of my decision to use Microformats).
  • HTML based mock-ups to act as templates for what will be the dynamically created user interface.

Nota Bene: I am not a designer – the look and feel of the site will change when I get someone with graphic-design flair to update the cascading style sheets (all the pages are XHTML (strict) and the visual presentation is handled entirely by style sheets). The mock-ups also link to each other correctly (although the "create job" and "create candidate" pages have not been finished yet). The pages are static examples and contain mock data marked up using Microformats. Should you have the Tails or Operator extensions for Firefox you will be able to view and export the encoded data into third-party applications.

This leaves the following tasks to be completed in the short-term:

  • Usability testing – using the mock-ups with Human Resources professionals to check that the application works in the way that they expect. This is currently under way.
  • Finish the test plan – based upon the functional specifications and the feedback obtained from the usability testing.
  • Alpha build – using Ruby on Rails for speedy prototyping and development. Ruby on Rails was chosen because, as of version 1.2, it makes implementing a REST-based architecture easy.

Thus, TalentTool encapsulates the story of my life. Too much to do and so little time to do it in. :-)

Philosophy and Natural Language Processing

Engaging in philosophical analysis is an essential (and difficult) activity for shedding light on those aspects of a problem that do not obviously fall within the realm of software engineering. This is especially true when trying to understand concepts such as "meaning", "understanding" and "thinking".

This article provides a flavour of philosophical analysis by engaging with two chat-bot related problems.

Can Machines Think?

Most people assume that my motivation for developing chat-bot technology is to create a "thinking" AI such as can be found in many a sci-fi movie. I'm afraid nothing could be further from the truth. I am simply perpetrating a trick and attempting to pull the wool over users' eyes. Unfortunately, the trick demands great skill to pull off and (as yet) no one has ever performed it successfully.

The trick is called the Turing Test and is named after Alan Turing, the British mathematician who first devised how it should work:

A human judge engages in a natural language conversation via a computer terminal with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, then the machine is said to pass the test and the trick is a success.

It is proposed as a means of deciding the question, ‘Can machines think?' Unfortunately, passing the test only proves one thing: that humans can be fooled by computers. That there is an absence of "thought" is often illustrated by the famous Chinese Room argument proposed by the American philosopher John Searle.

Put succinctly, Searle claims that minds cannot be identified with computer programs because computer programs are defined syntactically in terms of the manipulation of formal symbols whereas the mental world has meaningful semantic content. As Searle explains:

"Imagine that I, a non-Chinese speaker, am locked in a room with a lot of Chinese symbols in boxes. I am given an instruction book in English for matching Chinese symbols with other Chinese symbols and for giving back bunches of Chinese symbols in response to bunches of Chinese symbols put into the room through a small window. Unknown to me, the symbols put in through the window are called questions. The symbols I give out are called answers to the questions. The boxes of symbols I have are called a database, and the instruction book in English is called a program. The people who give me instructions and designed the instruction book in English are called programmers, and I am called the computer. We imagine that I get so good at shuffling the symbols, and the programmers get so good at writing the program, that eventually my ‘answers' to the ‘questions' are indistinguishable from those of a native Chinese speaker. [...] I don't understand a word of Chinese and – this is the point of the parable – if I don't understand Chinese on the basis of implementing the program for understanding Chinese, then neither does any digital computer solely on that basis because no digital computer has anything that I do not have." [From Searle's autobiographical entry in "A Companion to the Philosophy of Mind"]

One need only replace the processing of Chinese characters with the text processing of ALICE or any other chatter-bot currently in existence to accurately describe how these systems work: they are no more than formal symbol manipulators with no meaningful or coherent understanding of the world in which they are placed, nor any understanding of the content or meaning of the symbols and data that they manipulate.

It is certainly hard to argue with Searle's conclusion unless it can be shown beyond reasonable doubt that:

  1. The physical structure of the brain is sufficiently understood to explain how mental processes arise, and
  2. The means by which this structure is modelled artificially does not reduce to formal symbol manipulation, OR
  3. The way the physical structure of the brain gives rise to mental processes is, in fact, shown to be reducible to formal symbol manipulation (i.e. Searle was wrong).

However, the power of this argument diminishes if one introduces what I shall call "scope". In computer science "scope" denotes the context and visibility of an identifier (the name of some sort of asset that the program is using). An identifier is created within an enclosing block of code (that may contain inner blocks). The identifier is visible (usable) within the enclosing block (and any inner blocks). Outside the enclosing block the identifier is said to be out of scope; meaning that it is not available to the rest of the program (i.e. it is invisible). Interestingly, "scope" also signifies an instrument for observing something with a particular focus (microscope, telescope etc). When I use "scope" I mean something like the two definitions mentioned above: focusing on an object or concept within a specific context with clear boundaries.
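
A trivial C# illustration of the computer-science sense:

    class ScopeSketch
    {
        static void Main()
        {
            int outer = 1;                // visible throughout Main

            {
                int inner = 2;            // visible only within this inner block
                System.Console.WriteLine(outer + inner); // both in scope
            }

            // System.Console.WriteLine(inner); // compile error: "inner" is
            // out of scope here - invisible to the rest of the program.
        }
    }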

Scope can be applied to Searle's Chinese Room in the following way: the context is the room (which certainly has very clear boundaries) and the focus is specifically on the person in the room.

Now, problems arise because, although the person in the room does not understand Chinese, Searle does not address the view that the "system", taken as a whole (the person, rules, symbols and so on), appears to understand Chinese.

With this view the scope has changed in the following way: the context is the world containing Chinese speakers and the focus is the hatch relaying meaningful Chinese characters in and out of the room. In fact, the original person in the room is now "out of scope" (to borrow liberally from computer science). This means that it doesn't matter what is inside the room to a Chinese speaker on the outside. The room might contain an instance of Searle's experiment or a very shy Chinese speaker. A person conversing "with" the room won't be able to tell and probably won't care.

To turn this example upside down one might argue that individual neurons are no more than electro-chemical relays with no meaningful or coherent understanding of the world in which they are placed, nor any understanding of the content or meaning of the signals they are transmitting. Yet the "system", taken as a whole (brain and nervous system), seems to be capable of understanding and producing meaningful conversation. The physical world "out there" remains exactly the same, but the scope (how we choose to look at the physical world) changes the way we describe it.

In both examples above, a change in "scope" allows meaningful conversation to arise. How such conversation comes about might be different, but one cannot deny that there is conversation.

Meaning

Generating (at least) the appearance of a meaningful conversation is central to fooling a human user conversing with a chat-bot.

How does one tackle the problem of meaning? What is it? How does it arise? How might one devise strategies to bring about the appearance of meaningful conversation?

Engaging with this area certainly opens a huge philosophical can of worms (that I hope to explore in later articles). Nevertheless, for the sake of providing an example, I will examine one potential strategy for tackling these issues suggested by Ludwig Wittgenstein and succinctly introduced in a quote from the opening remark of his Philosophical Investigations:

"Now think of the following use of language: I send someone shopping. I give him a slip marked ‘five red apples'. He takes the slip to the shopkeeper, who opens the drawer marked ‘apples', then he looks up the word ‘red' in a table and finds a colour sample opposite it; then he says the series of cardinal numbers—I assume that he knows them by heart—up to the word ‘five' and for each number he takes an apple of the same colour as the sample out of the drawer.—It is in this and similar ways that one operates with words—"But how does he know where and how he is to look up the word ‘red' and what he is to do with the word ‘five'?"—Well, I assume that he acts as I have described. Explanations come to an end somewhere.—But what is the meaning of the word ‘five'?—No such thing was in question here, only how the word ‘five' is used."

Wittgenstein is claiming that to discover the meaning of a word one simply examines how it is used in ordinary conversation. Put another way, words are not defined by reference to the external objects or things which they designate nor by the internal thoughts, ideas, or mental representations that are associated with them, but rather by how they are used in effective, ordinary communication as part of our everyday life.

This way of thinking about meaning generates some interesting implications for current chat-bot technology:

AIML-based bots such as Program# are severely limited in potential as they don't include a mechanism for the bot to "learn" new examples of meaningful conversation from respondents' use of language in previous conversations. The bot will only ever generate meaningful responses for the limited set of situations that the bot's administrator has accounted for (although Zipf's Law and the original Alice bot show that this isn't as limited as it might first appear).
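
For example, a single (illustrative) AIML category simply pairs an input pattern with a canned response – nothing is ever learnt:

    <category>
      <pattern>WHAT IS YOUR NAME</pattern>
      <template>My name is a well kept secret.</template>
    </category>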

Any bots that implement learning algorithms (such as MegaHAL or Jabberwacky) are limited because the source for learning new meaningful conversation is limited to a one-dimensional stream of characters (i.e. typed sentences). The bot is simply processing new patterns of characters rather than experiencing a man counting five red apples and seeing how certain words have meaning in such a situation (to continue Wittgenstein's example).

Conclusions

Machines cannot think – they can merely give the appearance of meaningful conversation.

On reflection, wondering if machines can think is like asking "do snowflakes dream?" It is a nonsensical question because thought is not an attribute of machines. Only humans "think" when one considers humans within a particular scope.

The problem is that "can machines think?" is a grammatically correct sentence and our culture is full of examples of imagined "thinking" machines (HAL 9000 and friends). As a result, the question seems both legitimate and important.

However, I can imagine talking mice, a frightened teddy bear and a Gruffalo with a wart on the end of his nose. I can discuss these imagined things with grammatically correct sentences that make perfect sense, yet I don't take them seriously because I understand that they are simply figments of my own or others' imagination. Philosophical analysis confirms that "thinking machines" are likewise figments of my own and others' imagination. Nevertheless, it is certainly possible for a machine to produce the impression of meaningful conversation – although we'll probably have to invent a new term to describe what it is doing when it generates such conversation (not thinking).

Finally, with regard to understanding "meaning": perhaps the solution is simply to examine and pin-point how the word "meaning" is used in everyday conversation (i.e. use Wittgenstein's strategy to discover the meaning of "meaning" itself).

Program# 2.0

My original .NET chat-bot was written over three years ago and was based upon the AIML specifications. It was also my first project in C# and became a vehicle for me to learn about the .NET platform.

Although many people found it useful (it has been downloaded many thousands of times from this website) it was slow, not completely reliable and lacking in features.

Now that I have extensive experience and knowledge of .NET, I have re-written this project from scratch to implement several modifications and improvements. These are:

  • Better cross-platform compatibility. Support for .NET 1.1, 2.0 and XNA as well as the open-source Mono project (tested under version 1.1). Testing on Windows Vista with version 3.0 of the .NET platform is pending.
  • A completely new modular architecture to make it easier for developers to extend and add functionality.
  • A simpler and more logical API.
  • Standards compliant AIML support with the option for custom tags.
  • Very small size (currently only 52k).
  • Very fast (over 30,000 categories processed in under a second).
  • Inclusion of a comprehensive test suite including over 200 unit tests (based upon NUnit).
  • A means of saving the bot's "brain" as a binary file (Graphmaster.dat).
  • Some simple code snippets and examples for developers to get started (simple Windows and console-based applications as well as a sample custom tags library – see the sketch after this list).
  • Appropriately commented code.
  • Comprehensive documentation.
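
By way of a taster, basic usage of the new API boils down to something like the following simplified sketch (the bundled examples and documentation give the full, definitive versions):

    using System;
    using AIMLbot;

    class ConsoleBotSketch
    {
        static void Main()
        {
            // Create a bot and load its configuration and AIML files.
            Bot myBot = new Bot();
            myBot.loadSettings();
            myBot.loadAIMLFromFiles();

            // A User object identifies who is chatting with the bot.
            User myUser = new User("consoleUser", myBot);

            // Pass input to the bot and print its reply.
            Request request = new Request("Hello", myUser, myBot);
            Result result = myBot.Chat(request);
            Console.WriteLine(result.Output);

            // Optionally save the bot's "brain" for fast re-loading.
            myBot.saveToBinaryFile("Graphmaster.dat");
        }
    }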

The project now has a home on Sourceforge (the open source development site) that provides bug tracking, documentation, mailing lists, public forums, source control (subversion), file space and project management capabilities (among other things).

Downloads, documentation, help and advice on implementing and using the library in your own projects can be found at the Sourceforge project page:

http://aimlbot.sourceforge.net/