MITCSY: Our Endeavor in Cloud Robotics

October 10, 2017

CloudThat has always been a major proponent of emerging technologies such as the Cloud, Big Data and AI. Although technologies and concepts such as Big Data and Machine Learning are considered new, IoT is a concept that is newer still.

Numerous Cloud service providers such as AWS and Azure have taken notice of this emerging field, as the plethora of new services aimed specifically at IoT-based solutions shows. Beyond the major tech players, IoT conventions and events of every size give enthusiasts a platform to showcase what they have to offer in the field. One such event was the IoT Show 2017, where CloudThat showcased its telepresence robot prototype: MITCSY.

Figure 1: Basic architecture of MITCSY


While the MITCSY prototype was designed to be a telepresence robot, it offered additional functionality that allowed it to serve as a general office assistant and administrator.

Although this article focuses on the software side of MITCSY’s architecture, it also touches on the hardware. By harnessing Microsoft’s Azure Cloud platform, we were able to build a robot endowed with cognitive and operational intelligence.

Figure 2: An early version of MITCSY



The main purpose of MITCSY was to serve as a low-cost, viable alternative to existing telepresence robots for the Indian market. To achieve this goal, MITCSY was primarily built atop open source technologies. The hardware components consisted of the following:

  1. Arduino Mega
  2. Arduino Nano
  3. Microphone
  4. Raspberry Pi
  5. Raspberry Pi Camera Module
  6. Servo Motors
  7. LCD Display
  8. Bluetooth Module
  9. NRF24L01+ wireless transceiver
Figure 3: Despite its small form factor, the Raspberry Pi held its own against everything we threw at it



  • Keyword Spotting Engine

    Like Google Now and Siri, our robot used voice-activated commands as its primary interface, which required a KWS (Keyword Spotting) engine. Initial builds of the robot used the popular PocketSphinx speech recognition engine. However, PocketSphinx was designed primarily for speech recognition rather than keyword spotting, so we eventually dropped it. As an alternative, we adopted Kitt.AI’s Snowboy Hotword Detection engine. Snowboy offered extensive customization, support for multiple regional languages, and the ability to crowdsource the training of hotword models. Its small footprint also made deployment on the Raspberry Pi quick and simple.

    Figure 4: Snowboy proved to be an excellent and reliable keyword spotting engine
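Snowboy ships a Python wrapper (`snowboydecoder`) that runs the microphone loop for you. The core idea is simple: feed short audio frames to the detector and react when it reports a hotword. The sketch below is a minimal, hardware-free version of that loop, not MITCSY’s actual code. The detector is abstracted as a callable, and we assume it follows a return convention modelled on Snowboy’s `RunDetection` (positive value = index of the detected hotword, -1 = error). The names `listen_for_hotword`, `DETECTOR_ERROR` and `cooldown_frames` are hypothetical.

```python
from typing import Callable, Iterable

# Assumed return convention, modelled on Snowboy's RunDetection():
#   > 0 -> index of the detected hotword model
#     0 -> no hotword in this frame
#    -1 -> error reading or processing audio
#   (other negative values, e.g. silence, are simply ignored here)
DETECTOR_ERROR = -1


def listen_for_hotword(
    frames: Iterable[bytes],
    run_detection: Callable[[bytes], int],
    on_hotword: Callable[[int], None],
    cooldown_frames: int = 10,  # hypothetical value
) -> int:
    """Feed audio frames to a detector and call on_hotword once per detection.

    Every frame is passed to the detector so its internal buffer stays current,
    but for `cooldown_frames` frames after a trigger further detections are
    ignored, so one spoken hotword does not fire several times.
    Returns the number of times on_hotword was called.
    """
    triggers = 0
    cooldown = 0
    for frame in frames:
        result = run_detection(frame)
        if result == DETECTOR_ERROR:
            raise RuntimeError("hotword detector failed on an audio frame")
        if cooldown > 0:
            cooldown -= 1
            continue
        if result > 0:
            on_hotword(result)
            triggers += 1
            cooldown = cooldown_frames
    return triggers
```

On the robot, `frames` would be chunks read from the microphone and `run_detection` a thin wrapper around the Snowboy detector. The callback would then start recording the spoken command.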


  • Speech to Text & Text to Speech

    Once the keyword was detected, the audio immediately following it was recorded as the command. This recording was sent to Microsoft’s Bing Speech API for speech-to-text conversion. The Bing Speech API supports a wide range of languages and provides a straightforward set of APIs for production use. It offers two functions:

    1. Speech to Text (STT)
    2. Text to Speech (TTS)

    Because one API provides both STT and TTS, a single service covers two critical functions.
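The capture step described above (record the audio that follows the keyword, then send it for transcription) can be sketched in plain Python. This is an illustrative sketch, not MITCSY’s code. It assumes 16 kHz, 16-bit mono PCM frames and uses a simple energy threshold to decide when the speaker has stopped. It then wraps the result in a WAV container with the standard-library `wave` module. The function names and all thresholds are hypothetical and would need tuning for a real microphone.

```python
import io
import struct
import wave
from typing import Iterable, List

SAMPLE_RATE = 16000  # assumed capture format: 16 kHz, 16-bit, mono PCM
SAMPLE_WIDTH = 2     # bytes per sample


def frame_energy(frame: bytes) -> float:
    """Mean absolute amplitude of a frame of little-endian 16-bit samples."""
    count = len(frame) // SAMPLE_WIDTH
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * SAMPLE_WIDTH])
    return sum(abs(s) for s in samples) / count


def capture_command(
    frames: Iterable[bytes],
    silence_threshold: float = 500.0,  # hypothetical; tune per microphone
    max_silent_frames: int = 20,       # trailing silence that ends the command
    max_frames: int = 300,             # hard cap on command length
) -> List[bytes]:
    """Collect the frames that follow the keyword until the speaker stops.

    Silence is only counted once speech has started, so a short pause
    before the user begins talking does not end the recording early.
    """
    captured: List[bytes] = []
    speech_started = False
    silent_run = 0
    for frame in frames:
        captured.append(frame)
        if frame_energy(frame) >= silence_threshold:
            speech_started = True
            silent_run = 0
        elif speech_started:
            silent_run += 1
        if silent_run >= max_silent_frames or len(captured) >= max_frames:
            break
    return captured


def to_wav_bytes(frames: List[bytes]) -> bytes:
    """Wrap raw PCM frames in an in-memory WAV container for upload."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(frames))
    return buffer.getvalue()
```

The resulting bytes would then be POSTed to the speech-to-text endpoint. The exact URL, headers and accepted audio formats depend on the service and its version, so check its documentation rather than relying on the format assumed here.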

  • Natural Language Processing

    Once a command had been transcribed, the robot had to act on it. To do so, Natural Language Processing (NLP) was performed on the command, and the appropriate action was taken based on the result. Microsoft’s LUIS (Language Understanding Intelligent Service) offers NLP as a service. However, LUIS does not include natural language generation capabilities, so downstream processing and response generation had to be handled by other services and frameworks.

    Figure 5: Microsoft’s LUIS service offers an intent-entity model

    LUIS works on an intent-entity model: the intent captures what the user wants to do, and entities capture the details needed to do it. For the service to perform as required, intents and entities must be defined and the model trained. The advantage of using LUIS over building a system from scratch is that it is readily available on the Azure Cloud and provides a non-programming interface for creating and publishing models.

    Depending on the intent identified by LUIS, the response was handled either locally on the Raspberry Pi or in the Cloud (Function App). Less resource-intensive commands, such as fetching the weather forecast, were handled locally to minimize latency. Resource-intensive or time-consuming tasks, such as small-scale Cloud deployments or database calls, were handled on the back-end.
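The intent-entity flow and the local/cloud split described above can be sketched as two small functions. This assumes the JSON shape returned by the LUIS v2.0 endpoint (`topScoringIntent` with `intent` and `score`, plus an `entities` list with `entity` and `type`). The intent names, the score threshold and the routing sets are hypothetical. They mirror the weather versus deployment examples in the text rather than MITCSY’s actual LUIS app.

```python
import json
from typing import Dict, Tuple

# Hypothetical intent names; a real LUIS app defines its own.
LOCAL_INTENTS = {"GetWeather", "GetTime"}                        # cheap: handle on the Pi
CLOUD_INTENTS = {"DeployVM", "CreateStorageAccount", "QueryDatabase"}  # heavy: Function App


def parse_luis_response(body: str, min_score: float = 0.5) -> Tuple[str, Dict[str, str]]:
    """Extract the top intent and a type -> text entity map from a LUIS v2-style reply.

    Low-confidence predictions fall back to "None", the name of LUIS's
    built-in fallback intent. If several entities share a type, the last
    one wins (a simplification for this sketch).
    """
    data = json.loads(body)
    top = data.get("topScoringIntent") or {}
    intent = top.get("intent", "None")
    if top.get("score", 0.0) < min_score:
        intent = "None"
    entities = {e["type"]: e["entity"] for e in data.get("entities", [])}
    return intent, entities


def route(intent: str) -> str:
    """Decide where a command is handled: 'local', 'cloud' or 'unknown'."""
    if intent in LOCAL_INTENTS:
        return "local"
    if intent in CLOUD_INTENTS:
        return "cloud"
    return "unknown"
```

Keeping the routing decision in a plain lookup table means that moving an intent between the Pi and the back-end is a one-line change.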

  • Face API

    The prototype was capable of deploying small-scale Cloud resources such as VMs and Storage accounts on Azure. To prevent unauthorized deployments, a two-stage authentication process was implemented. The first stage used facial recognition to verify that the user issuing the command was an authorized user.

    Figure 6: Face API offers both facial recognition and identification

    For facial recognition, we employed Microsoft’s Face API service. Although Face API involves several intricate steps to use, those steps help ensure that the system handling the biometric data is robust and secure.
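Face API’s identification flow uses two calls. Detect returns a `faceId` for each face in an image. Identify then matches those IDs against a trained person group and returns candidate `personId`s with confidence scores. The sketch below covers only the final decision of the first authentication stage: given an Identify response, is an authorized user present? It assumes the documented Identify response shape (`faceId`, `candidates`, `personId`, `confidence`). The person IDs, the authorized set and the 0.7 threshold are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def authorized_person(
    identify_result: List[Dict[str, Any]],
    authorized_ids: Iterable[str],
    min_confidence: float = 0.7,  # hypothetical threshold
) -> Optional[str]:
    """Return the personId of the most confident authorized match, else None.

    `identify_result` is a list with one entry per detected face, each holding
    a list of candidates ({"personId": ..., "confidence": ...}).
    """
    allowed = set(authorized_ids)
    best_id: Optional[str] = None
    best_conf = min_confidence
    for face in identify_result:
        for cand in face.get("candidates", []):
            pid = cand.get("personId")
            conf = cand.get("confidence", 0.0)
            # Only authorized people at or above the threshold can win.
            if pid in allowed and conf >= best_conf:
                best_id, best_conf = pid, conf
    return best_id
```

A stricter policy might reject frames that contain more than one face, so that a bystander cannot ride on an authorized user’s presence.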

  • Function App

    Azure’s Function App served as the robot’s back-end. Function App is akin to AWS Lambda: a serverless programming environment that supports a variety of languages, such as Node.js, Python and C#. Its advantage is that the hardware handling the processing is taken care of (PaaS), so developers need to focus only on the logic. However, Function App is still a work in progress and lacks several capabilities offered by a traditional server-based system. The Function App was primarily used to service requests that were too resource-intensive for the Raspberry Pi or that required additional resources.

    Figure 7: Function App served as the robot’s serverless backend
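On the Raspberry Pi side, handing a request to the Function App is just an HTTPS call. The sketch below builds that call with the Python standard library. It assumes an HTTP-triggered function on the default `/api/<function-name>` route, protected by a function key sent in the `x-functions-key` header. The app name, function name and payload fields are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_backend_request(
    app_name: str,
    function_name: str,
    payload: Dict[str, Any],
    function_key: str,
) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to an HTTP-triggered Azure Function."""
    # Default route for HTTP-triggered functions: https://<app>.azurewebsites.net/api/<function>
    url = "https://%s.azurewebsites.net/api/%s" % (app_name, function_name)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Function-level key; it can alternatively be passed as a ?code= query parameter.
            "x-functions-key": function_key,
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=...)` and decoding the JSON reply. Keeping request construction separate from sending makes it easy to test without network access.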

    Apart from these major services, numerous other auxiliary services were employed.

    Figure 8: Twilio was one of the numerous auxiliary services employed for the prototype


Overall, developing MITCSY was a challenging and enriching experience. Despite the short development time (2 months) and many hurdles along the way, the project enabled us to grow as individuals and as IoT enthusiasts. With version 2.0 on the way, we hope to up the ante once more and offer MITCSY as a full-fledged product.

Figure 9: The team at the IoT Show 2017


Although the services covered here may seem vast and varied, our IoT courses cover everything needed to create a full-fledged IoT / Robotics solution such as the one described here. Our courses are designed so that one need not have a programming background to fully appreciate IoT and its implications. Do take a look at our IoT offerings to learn more.

  1. Fundamentals of IoT – Level 1
  2. Working with Electronics in IoT – Level 2
  3. Cloud Robotics and Advanced IoT Architecture – Level 3
  4. Designing and Implementing IoT Solutions – Level 4

Are you excited about MITCSY 2.0? Do you have any suggestions about the architecture that we used? Let us know in the comments!
