A few times in the last year or so an IBM salesperson has come around to the company I work at, spruiking products branded with the IBM 'Watson' label. To be frank, I came out of these meetings without a clear idea of what, exactly, the Watson suite of products was, or how it could be used in our business. The problem was that there were too many products sitting under the Watson brand, and the sales rep kept switching between different things.
So, I've just spent a couple of days in Melbourne, getting a better handle on what is available, what it can be used for, what it costs, etc. And here is a quick summary:
What is Watson?
I still can't definitively answer this. It seems to just be a brand/label for a range of products/APIs, focussed on analytics of some sort (speech analysis, text analysis, image analysis, sentiment analysis, etc). There are a few things on offer from IBM, and I think it helps to separate those:
- Services: IBM has a range of analytical services (language processing, image recognition, and so on), all exposed through APIs, and you pay per API request to use these. So you can submit (say) an image to the Watson image classification service and it tells you the probability that the image contains a cat (say). More on this in a moment.
- Application hosting: you can build an application that uses the IBM services, and IBM can also host that application for you. I'm not going to say more about this, as it is of less interest to me right now.
I have to say that the communication and documentation around all this from IBM is not good, and to start with it's all a bit bewildering as you work out how tightly coupled the app hosting is to the services (do you want an app? a virtual server? a container? a service? do you want to 'bind' your hosted app to your service?). There are far too many things and no coherent presentation of what they are and how they work together. For example, it's still not clear to me when you would bind a service to an app and when you wouldn't; it seems that you can use a service from an un-bound application, so maybe it's just a performance thing? But anyway, I digress... from now on I'm just going to talk about the services, not the apps.
What Services are available? What do they do?
Below, I list the services that are available under the Watson brand, as at time of writing. There is an API that exposes each of the services, and the model seems to be that you are charged per API transaction, for each service. So for instance, there is an image classification service that you can train with some data, and then submit images to that service to get classification results. Or you can submit free text to the tone analysis service and it will return information about the 'tone' of the language (aggressive, friendly, etc). Each such submitted sample would cost you an API call, which costs between $0.05 and $0.20 (the price per call drops as your volume of calls goes up). Pricing varies by service. You can get up-to-date info on pricing if you dig down from here.
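To get a feel for what per-call pricing adds up to, here is a rough back-of-envelope sketch. It only uses the two ends of the price range quoted above; the function name and its parameters are my own invention, and the real per-service price tiers are not modelled, so treat the result as a bracket rather than a quote.

```python
from typing import Tuple


def cost_range(n_calls: int, low_cents: int = 5, high_cents: int = 20) -> Tuple[float, float]:
    """Return (cheapest, dearest) total cost in dollars for n_calls API requests.

    Assumption: every call is billed at a single flat price somewhere between
    low_cents and high_cents. Actual volume tiers vary by service and are
    not modelled here.
    """
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    # Work in whole cents and divide once at the end to avoid float drift.
    return n_calls * low_cents / 100, n_calls * high_cents / 100


if __name__ == "__main__":
    low, high = cost_range(10_000)
    print(f"10,000 calls: ${low:,.2f} to ${high:,.2f}")
```

So 10,000 submitted samples would land somewhere between $500 and $2,000, depending on where in the price range that service and volume put you.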
Here are the services on offer, each of which you can call via a simple API (more on this in a second). The description next to each is the IBM-provided one. I'll say a bit more about a few of them underneath.
Concept Expansion: Maps euphemisms or colloquial terms to more commonly understood phrases
Concept Insights: Explore the concepts behind your input, identifying associations beyond traditional text matching.
Dialog: Enable your application to use natural language to converse with users
Document Conversion: Converts a HTML, PDF, or Microsoft Word™ document into a normalized HTML, plain text, or a set of JSON-formatted Answer units.
Language Translation: Translate text from one language to another for specific domains.
Natural Language Classifier: Natural Language Classifier performs natural language classification on question texts. A user would be able to train their data and then predict the appropriate class for an input question.
Personality Insights: The Watson Personality Insights derives insights from transactional and social media data to identify psychological traits
Relationship Extraction: Intelligently finds relationships between sentence components (nouns, verbs, subjects, objects, etc.)
Retrieve and Rank: Add machine learning enhanced search capabilities to your application
Speech To Text: Low-latency, streaming transcription
Text to Speech: Synthesizes natural-sounding speech from text.
Tone Analyzer: It helps people detect, understand and revise the language tones of emotions, social propensities and writing styles from their writings.
Tradeoff Analytics: Helps make better choices under multiple conflicting goals. Combines smart visualization and recommendations for tradeoff exploration
Visual Recognition: Analyzes the visual content of images and videos to understand their content without requiring a textual description
Cognitive Commerce, Graph, and Insights: These are third-party provided.
Some comments
Exposing analytics as a service could be great, but I don't think Watson quite gets you to where you want to be. Maybe it will soon, but not right now. For example, suppose you want to classify images into one of (say) 5 categories. Right now, you cannot do this with Watson except in a clumsy way -- the API appears to only support binary classification for images, so multi-class classification is not directly possible. Sure, you can build multiple binary classifiers, but this is a silly kludge to have to work through. There are some other strange restrictions too: you are limited to 15,000 training samples when you train the natural language classifier. Why? Training a deep network to do multi-class classification would certainly benefit from having more than 15,000 training samples, so why put this limit in there? Also, it's not clear to me what sort of training is going on under the hood. I know that putting things behind an API is meant to mask all that, but training neural networks can be quite fiddly and task-specific... and I am really quite doubtful that the default network architecture, learning rate, batch size, etc are anywhere near optimal. It would be nice if IBM did hyper-parameter tuning for you behind the scenes too, but I'm pretty damn sure this is not happening -- you just get a trained classifier that may or may not have good settings for your particular problem.
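For what it's worth, the "multiple binary classifiers" workaround looks roughly like the sketch below (usually called one-vs-rest). This is not Watson code: each scorer is a hypothetical stand-in for whatever call you would make to one trained binary classifier, assumed to return a score where higher means "more likely this category". All the names here are made up for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for one binary classifier: takes raw image bytes and
# returns a score for "this image belongs to my category".
BinaryScorer = Callable[[bytes], float]


def classify_one_vs_rest(
    image: bytes, scorers: Dict[str, BinaryScorer]
) -> Tuple[str, Dict[str, float]]:
    """Pick the category whose binary classifier gives the highest score."""
    if not scorers:
        raise ValueError("need at least one binary classifier")
    # One call per category: every classifier scores the same image.
    scores = {label: scorer(image) for label, scorer in scorers.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


if __name__ == "__main__":
    # Dummy scorers with fixed outputs, just to show the plumbing.
    dummy = {
        "cat": lambda img: 0.15,
        "dog": lambda img: 0.80,
        "bird": lambda img: 0.05,
    }
    label, all_scores = classify_one_vs_rest(b"fake image bytes", dummy)
    print(label, all_scores)
```

Note what this implies: every image needs one call per category, and scores from separately trained classifiers are not guaranteed to be comparable with each other, which is part of why it feels like a kludge.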
In general, the Watson API offerings seem to occupy a space that will appeal to smaller organizations without a lot of analytics capability, and may wow some managers who want 'insights' but don't care whether those insights are accurate, and don't have a clear idea about what, exactly, they want to do with them. The prime example of this is the personality analysis API -- you can feed in free text (from politicians' speeches, or a twitter handle, or wherever) and get back a 'personality model' of the person who wrote that text. The personality model is a bit like a horoscope though: almost anything it returns sounds half-plausible. So I can see managers saying 'great, this tells us which customers are open, or angry, or supportive, and this will let us target marketing to them!', but are the scored personality traits returned by the Watson API actually useful in targeted marketing? Who knows, but I'm certainly sceptical, and I'm scared that the richness of the outputs will be mistaken by managers for something that is both accurate and useful, when it's far from clear that it is either.