AI research has come of age, and we no longer just read about it in newspapers. Investment and effort are pouring into every branch of AI. The 2018 report published by aiindex.org states that almost 60,000 AI research papers were published worldwide in 2017 alone, nine times as many as in 1996!
However, to an applied computer scientist the only relevant research is research that translates into consumable technology. Even though many AI algorithms are open and there is a ton of data available to train them, it is generally hard to put that data to use and tune the algorithms for a given use case. Doing so usually requires considerable expertise and effort, and it can drive project costs up. Thankfully, another paradigm in computer science is already helping to democratize AI: cloud computing.
There are simply too many cloud service providers, so we will only consider the public cloud providers whose services and APIs are immediately available for consumption and can therefore be evaluated. These are none other than the usual suspects: Google, Microsoft, Amazon and IBM. The other heavyweights, Apple and Facebook, invest heavily in AI and use many algorithms in their own products, but their research is not available as services for third-party development.
There are two broad sets of cloud AI products and services. The first is AI platform or infrastructure services, where cloud providers offer compute services and environments optimized for custom model training and machine learning. Google Cloud AutoML and Microsoft Azure Machine Learning are examples. These platform tools give control over the model training process and scale up the computing power needed to train on large amounts of data. But this is more or less the same do-it-yourself approach that project managers would rather avoid if asked. It is not readily consumable per se.
The other set is cloud AI APIs. Here one needs no training data and performs no training or hyperparameter tuning. Cloud AI APIs bring together enormous amounts of data and the right algorithms, tuned and ready for consumption, so that users can focus on their application. In this article we review which ready-to-use AI algorithms are available for our own application development.
A Forbes article lists many things AI can already do for us. Reading your mind is one of them, but it is not yet mature enough for public use. Consumable AI API services normally fall into the following categories: sight or vision, voice or speech, and natural language. Sight AI helps to annotate or classify images, tell cats from dogs, extract key frames from a video or perform automated video analysis.
Voice or speech AI is another important field. How much research has already gone into automated speech recognition is evident from digital assistants such as Apple's Siri, Microsoft's Cortana and Amazon's Alexa. Cloud providers now offer this research in the form of speech-to-text (STT) and text-to-speech (TTS) APIs. Other emerging use cases for voice AI are speaker recognition (identifying who is speaking) and speaker diarisation (determining who spoke when).
Natural language processing (NLP) has also been an area of interest for a long time. The most common NLP algorithms revolve around machine translation and natural language understanding (NLU) use cases such as keyword/entity extraction and sentiment analysis. All the cloud API services boast state-of-the-art algorithms and continuous improvement through new data. The services are listed below by category and provider.
Amazon Rekognition is the image and video analysis offering from Amazon Web Services (AWS). Its features include object and person detection and tracking, text and activity detection, and facial recognition and analysis, on both images and videos. One of its novel features is unsafe content detection.
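To give a feel for how little code consuming such an API takes, here is a minimal sketch around Rekognition's unsafe content detection, the `DetectModerationLabels` operation. It assumes the boto3 SDK and AWS credentials for the real call, which is why boto3 is not imported: `build_moderation_request` only builds the keyword arguments one would pass to `boto3.client("rekognition").detect_moderation_labels(**kwargs)`, and `unsafe_categories` filters a response dict. The bucket name, object key, thresholds and sample labels are hypothetical.

```python
from __future__ import annotations

from typing import Any


def build_moderation_request(bucket: str, key: str, min_confidence: float = 60.0) -> dict[str, Any]:
    """Keyword arguments for rekognition_client.detect_moderation_labels(**kwargs)."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MinConfidence": min_confidence,  # labels below this confidence are not returned
    }


def unsafe_categories(response: dict[str, Any], threshold: float = 80.0) -> list[str]:
    """Return the sorted top-level categories of labels at or above `threshold`."""
    categories = set()
    for label in response.get("ModerationLabels", []):
        if label["Confidence"] >= threshold:
            # Second-level labels name their top-level category in ParentName;
            # top-level labels have an empty ParentName, so fall back to Name.
            categories.add(label.get("ParentName") or label["Name"])
    return sorted(categories)


if __name__ == "__main__":
    # Hypothetical response, shaped like a DetectModerationLabels result.
    sample = {
        "ModerationLabels": [
            {"Confidence": 92.1, "Name": "Violence", "ParentName": ""},
            {"Confidence": 92.1, "Name": "Weapons", "ParentName": "Violence"},
            {"Confidence": 55.0, "Name": "Suggestive", "ParentName": ""},
        ]
    }
    print(build_moderation_request("my-bucket", "uploads/photo.jpg"))
    print(unsafe_categories(sample))  # ['Violence']
```

Rolling second-level labels up to their parent keeps the decision at the level of broad categories, which is usually what a moderation policy is written against.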
Google provides its vision services through two separate APIs: Vision AI and Video AI. Vision AI offers image classification/annotation as well as similarity detection. Its other features, such as optical character recognition (OCR) and object detection, are standard. Google also offers the option of training custom models through AutoML. It is probably worth mentioning that most Google Cloud AI APIs have an AutoML counterpart, and the same can likely be expected of future ones. Video AI's noteworthy features include video classification, scene change detection and automated transcription.
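Vision AI's image labelling is exposed as a plain REST endpoint, `https://vision.googleapis.com/v1/images:annotate`, which takes base64-encoded image bytes plus a list of requested features. The sketch below uses only the Python standard library and assumes API-key authentication (OAuth with a service account is the other common route); the key, image and sample response are hypothetical, and `annotate` is never called in the demo because it needs network access.

```python
from __future__ import annotations

import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """JSON body for a single LABEL_DETECTION request on inline image bytes."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def top_labels(response: dict, min_score: float = 0.5) -> list[tuple[str, float]]:
    """(description, score) pairs from the first response, best first."""
    first = (response.get("responses") or [{}])[0]
    labels = [
        (a["description"], a["score"])
        for a in first.get("labelAnnotations", [])
        if a["score"] >= min_score
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def annotate(image_bytes: bytes, api_key: str) -> dict:
    """POST the request; requires network access and a valid API key (assumed)."""
    body = json.dumps(build_label_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical response for a photo of a cat.
    sample = {"responses": [{"labelAnnotations": [
        {"description": "Cat", "score": 0.97},
        {"description": "Dog", "score": 0.2},
        {"description": "Whiskers", "score": 0.9},
    ]}]}
    print(top_labels(sample))  # [('Cat', 0.97), ('Whiskers', 0.9)]
```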
Microsoft has split its vision offering into several API sets for specific use cases, with separate analysis APIs for images and video. The Computer Vision API provides the standard image analysis features, except for face recognition and OCR, which have been bundled into separate APIs. Microsoft also has two innovative services: ink recognition (digital handwriting) and form recognition, which can help extract tabular and other structured data from images.
Amazon Transcribe and Amazon Polly are the STT and TTS APIs from AWS. Google STT claims to understand 120 languages, and Google TTS can generate speech in 32 different voices. Apart from STT and TTS, Microsoft's Speech portfolio also provides real-time speech translation and speaker recognition APIs. Along with the standard functionality, IBM Watson STT offers many nuanced capabilities, such as acoustic and language model customization.
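Speech APIs follow the same request/response pattern. As a sketch, the body below targets Google's synchronous `speech:recognize` REST method (v1): the audio goes in base64-encoded, and `languageCode` is where one of the supported languages is chosen. It assumes 16 kHz, 16-bit linear PCM (`LINEAR16`) audio; the language code, audio bytes and sample response are hypothetical, and no network call is made.

```python
from __future__ import annotations

import base64

# Synchronous recognition endpoint (POST, JSON body); authentication not shown.
STT_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio: bytes, language_code: str = "en-US",
                            sample_rate_hz: int = 16000) -> dict:
    """JSON body for speech:recognize with inline LINEAR16 audio."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,  # e.g. "de-DE" for German
        },
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }


def transcript(response: dict) -> str:
    """Join the top-ranked alternative of each result into one string."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical two-segment response.
    sample = {"results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
        {"alternatives": [{"transcript": " how are you", "confidence": 0.88}]},
    ]}
    print(transcript(sample))  # hello world how are you
```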
All four cloud providers offer dialog capabilities in one form or another, which can help build custom bots.
Natural Language API
NLP and NLU functionality is broad and has many sub-capabilities. Translation, keyword and entity extraction, and document analysis are some of the most important ones. All the providers offer translation as standard. Keyword/entity extraction algorithms try to extract syntactic and semantic relations within a text; the features differ between providers, but the baseline is the same. Sentiment analysis is sometimes bundled with keyword/entity extraction and sometimes offered as an independent feature, depending on whether sentiment is computed for individual keywords or for the entire document. There is nothing specific to mention for the Amazon or Google APIs here. Microsoft has a new feature, Immersive Reader, which helps applications embed read-aloud and reading comprehension facilities.
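The difference between document-level and finer-grained sentiment shows up directly in a response. Google's Natural Language API, for instance, returns both an overall `documentSentiment` and a per-sentence `sentiment` from its `documents:analyzeSentiment` method, with scores ranging from -1.0 (negative) to 1.0 (positive). The sketch builds that request body and flags a document whose sentences pull in opposite directions; the 0.5 cutoff in `is_mixed` and the sample response are hypothetical choices for illustration.

```python
from __future__ import annotations

# REST method returning document- and sentence-level sentiment (POST, JSON body).
NL_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text: str) -> dict:
    """JSON body for documents:analyzeSentiment on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",  # controls how sentence offsets are computed
    }


def summarize_sentiment(response: dict) -> dict:
    """Extract the document score and (sentence, score) pairs from a response."""
    sentences = [
        (s["text"]["content"], s["sentiment"]["score"])
        for s in response.get("sentences", [])
    ]
    document_score = response.get("documentSentiment", {}).get("score", 0.0)
    return {"document_score": document_score, "sentences": sentences}


def is_mixed(response: dict, cutoff: float = 0.5) -> bool:
    """True if some sentences are clearly positive and others clearly negative.

    A near-zero document score can hide this; `cutoff` is a hypothetical choice.
    """
    scores = [score for _, score in summarize_sentiment(response)["sentences"]]
    return any(s >= cutoff for s in scores) and any(s <= -cutoff for s in scores)


if __name__ == "__main__":
    # Hypothetical response for "The camera is superb. The battery is awful."
    sample = {
        "documentSentiment": {"magnitude": 1.6, "score": 0.0},
        "sentences": [
            {"text": {"content": "The camera is superb.", "beginOffset": 0},
             "sentiment": {"magnitude": 0.8, "score": 0.8}},
            {"text": {"content": "The battery is awful.", "beginOffset": 22},
             "sentiment": {"magnitude": 0.8, "score": -0.8}},
        ],
    }
    print(summarize_sentiment(sample)["document_score"])  # 0.0
    print(is_mixed(sample))  # True
```

Here the document as a whole scores as neutral, while the sentence-level scores reveal one strongly positive and one strongly negative statement.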
IBM Watson has a strong portfolio with Natural Language Classifier and Natural Language Understanding. IBM also has two special offerings, Tone Analyzer and Personality Insights, which seem unique among the providers. It is interesting to note that IBM chose written text, rather than voice, as the medium for both of these algorithms.
Equipped with these choices, application developers certainly have enough power to reap the fruits of AI.