
I work with Emerging Tech – Build and Deploy an Amazon Lex Chatbot and Convert it to an Alexa Skill

Hey everybody, welcome to this talk on how to build and deploy an Amazon Lex chatbot and convert it to an Alexa skill. My name is Sohan Maheshwar,
and I'm a Developer Advocate for AWS working in the Benelux region. Let's dive straight in. I'm so excited to be here today to talk
to you about conversation interfaces, and that's what the focus
of today's talk will be. We will talk a bit about Amazon Lex
and the technology behind it, how it works, and what you need
to do to build a Lex chatbot. We will talk a little bit about a few concepts
related to chatbots like intents, utterances, and slots.

We will talk a little bit
about Amazon Alexa and how it is
the next major disruption in computing, and then we'll end with a small demo. So let's dive straight in. Now, I'm sure all of you have
in some capacity or the other worked with personal computing, or else you probably
wouldn't be watching this talk today. And all of the personal computing
that we have interacted with so far has looked like what you're looking at
on the screen right now. It's been either a mobile phone
or a laptop or a desktop computer, or you could even have interacted
with a thermostat or a remote control, or a car entertainment system. Now, if you really notice, the interfaces on all of these have been quite similar. To give you a small history lesson on interfaces: it all started in the '70s, when you had these small black-and-white monitors like the one that you see on the screen, and just a keyboard, and you entered text-based commands to do something.

Things evolved a little bit
a little later on in the mid-'80s when for the first time
you had graphical user interfaces. You could actually
see a file on the desktop, and you could move
this piece of hardware around that would translate onto the screen, and I'm talking about a computer mouse. And things evolved
a lot more in the 2000s when for the first time you had
touchscreens in your pocket, and that was mind-blowing back then. You know, you had this amazing device
that would be in your pocket that could give access to the wealth
of information in the world, and you could use a touchscreen,
so you could zoom by doing that pinch-to-zoom, and all of these are interactions we have on a day-to-day basis. But, guess what, these interactions and these interfaces are ones that we have actually learned over a period of time.

It's not like any of these interactions really came naturally to us; we learnt them so that we could talk to, and communicate with, computing. But thanks
to all the advancements in tech, especially in the field of machine
learning and speech recognition and natural language understanding, we are at a stage now
where we can actually communicate with computing via a conversation, and we call these
conversational interfaces.

Now, conversational interfaces
are set to sort of change how we think and operate and behave, because, for the first time, as humans, we can actually communicate by chatting or talking, which was obviously
not possible earlier. And this topic is very dear to me, because I worked in the Alexa team
for over two years, and now I'm working in the AWS team. So I've worked
very closely in that team, and I sort of know the cutting-edge technology that really goes into this. One thing I always say is
that this paradigm really excites me, because I think it lowers
the barrier to access technology. Right now I'm sure
we've all gone through this phase where we've helped
either a parent or a grandparent or an elderly person with tech, and usually the conversation goes
something like this where you say, hey, click this, do this, swipe here,
and this is what would happen.

You're sort of teaching them
a series of steps to do something, but with conversational interfaces, all they need is to know the language. And they'll be able to access
technology that would have probably been
a little difficult earlier. When it comes to conversational
access and tech, I think the most important tenet is for it to be natural. We as humans
have to be able to speak naturally, and it's the job of the computer
to really understand what's going on, and it's the job of you
as a developer or a technologist to design the chatbot or to design the Alexa experience
in such a way that the skill or the chatbot's
able to understand what the human is really saying.

Now, a line I often use
is that I think so far we as humans have been forced
to think like computers. We shouldn't have to think in terms of drop-down menus and radio buttons, but, you know, that's
how we've interacted with our tech. But with conversational interfaces, I think computers are being forced
to think like humans, which I think is very powerful. You also want your conversational
access to be on demand, especially in cases
like customer support or informational chatbots. You want it to be on demand. You don't want it to be like, say,
email where you send an email, you wait for two to four days
for a response back.

Conversation allows that instant
sort of transfer of information that you really want. You also of course
want it to be accessible, so you don't want it to just be on, say, a platform or a website where you need
like five levels of access to get to. You need it to be
where the people really are, and I think all of this comes down to the fact that it has to be efficient. It has to be efficient
from a tech point of view and from a design
point of view as well.

So Amazon Lex is a service that Amazon has built for building conversation interfaces
using voice and text, and we will talk
a little bit about how Lex works, its benefits, its features, and we'll talk a little bit about
how you can actually build something using Amazon Lex. Amazon Lex basically
offers the complete solution when it comes to building
a conversational interface, so you don't need to have a machine learning background or be deep into speech recognition
or natural language processing. All you need to know
is to know a little code so that you can build a really nice
conversational experience, so Lex takes care of things
like speech to text and speech recognition and so on
and also the dialog management. You can actually deploy it
to different places, and we'll talk
a bit about that in a while.

Lex is completely scalable and connects to a whole bunch
of AWS technologies as well, so you don't have to worry about, okay, catering to one user
versus a million users. It has a lot of security services
that it works with, so you don't have to worry
about things like personalisation
or even authentication, and of course you get great
analytics about how people are actually chatting or how people
are talking to your interface or your chatbot,
so you can really iterate on making it a lot better. So let's go through some of the features of Lex, and I think one of the best is the fact that you can build
this chatbot just once, right, the logic of how the front end
and the back end works, and, again,
we'll talk about that later, and you can deploy it
to multiple platforms.

So you can deploy it to mobile, you can deploy it to web, also popular messaging platforms
like Slack and Facebook and Kik and Twilio SMS as well. And the demo's
going to show you just that. Lex is of course
designed for builders, it's efficient, it's intuitive, in fact, very recently,
the Flemish government actually built a chatbot using Lex to answer questions from the citizens
about the COVID-19 situation. So, as you can see, it's designed for builders: you can use it to build really scalable chatbots very quickly. Also, Lex is enterprise-ready, so, if you have a bunch of SaaS tools
that you want connected, maybe you want
a chatbot that's internal that just connects to sort of metrics
within your organisation, or maybe it's something to help new joiners in your organisation get connected automatically.

You'll have to connect it of course,
but it has options to connect it to other SaaS systems as well. And of course there is
continuous learning, so, as more people use the chatbot and this experience that you've built, the better it gets. So over time you can make
a really, really powerful chatbot that really understands
your customers really well. All right, so that was about why you need
to think about using Lex. So let's talk a little bit about the design workflow and how Lex actually works. So on one side you
have your customers, your customers could be on mobile,
they could be on the web, maybe it's via IoT, or they could be on any popular
messaging platforms such as Twilio and Kik and Slack. Now, when they use a Lex chatbot,
a few things happen. One is as a developer
you can actually choose whether you want to authenticate
them before they start the bot.

Now, sometimes you
may want to authenticate them, especially if there is
an account associated with what information
they're trying to get. But a lot of informational chatbots
don't need authentication, so this is something
that you can absolutely choose to do. Cognito is an Amazon service that actually takes care of a lot
of this authentication and identity, so you can use Cognito
to authenticate your users, you can even use
CloudWatch to get metrics on how people
are actually using your chatbot.

Now, here comes the interesting part. So, when someone talks to your chatbot, Lex basically uses two technologies, both of which are based on the same
deep learning that powers Amazon Alexa. The first one is if your user
uses speech to talk, then there is something
called automatic speech recognition, ASR, which converts speech to text. Now, it doesn't just convert speech to text; it does so with a lot of context, because when it comes to speech, two words can sound almost the same. You really need
contextual speech recognition to make your speech recognition
that much more accurate. Once you've converted
that speech to text, or if someone talks to your chatbot only via text, how does the bot actually understand what you've said? The bedrock of all conversational interfaces, chatbots and Alexa and what have you, is something called natural language understanding.

Now, this is a pretty popular field
in computer science right now, because it really sort of drives the whole conversational
experience home. So what natural language understanding does, and this is really simplifying it, is convert unstructured conversational data into a structure that computers can understand. You know, human conversation isn't structured; there is grammar, but it's unstructured. It does that by converting a sentence that I say, an utterance, as we call it, into a bunch of things called intents and slots, and we'll get to those in a bit.

So what happens is Lex uses this NLU,
natural language understanding, to convert what the user said into a structure which is
what the intent is, what the slots are, and that is sent to a place to fulfil, you know, whatever service
you're providing. Typically that is a Lambda function; this is essentially your back end. In your back end you can choose what you want to do: you can hit an API, you can hard-code some data, you can query a database. All you have to do is send
some structured data back, and Lex takes care
of showing the output back to the user on the platform that they have chosen
to interact with you. So that was the workflow
behind how a chatbot actually works. Now, let's take a closer look
into a couple of terms that I actually mentioned earlier, which was what is an intent,
what is an utterance, and what are slots. For the purpose of this presentation and the demo, I built a simple flower-ordering bot.
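
Just to make that workflow concrete before we get into the demo, here's a minimal sketch of what a fulfilment Lambda for a bot like this could look like, in Python. The intent and slot names (OrderFlowers, FlowerType, and so on) are illustrative assumptions, and the hard-coded message stands in for a real API call or database query.

    # A minimal Lex (V1) fulfilment Lambda. A sketch, not production code;
    # intent and slot names (OrderFlowers, FlowerType, ...) are illustrative.

    def lambda_handler(event, context):
        # Lex hands you the matched intent and elicited slots as structured data.
        intent = event["currentIntent"]
        slots = intent["slots"]

        flower = slots.get("FlowerType")
        date = slots.get("PickupDate")
        time = slots.get("PickupTime")

        # This is where you'd hit an API or query a database;
        # here we just hard-code the confirmation text.
        message = f"Thanks! Your {flower} will be ready for pickup at {time} on {date}."

        # Return structured data; Lex renders it on whichever platform
        # the user happens to be on.
        return {
            "sessionAttributes": event.get("sessionAttributes") or {},
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": {"contentType": "PlainText", "content": message},
            },
        }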

I'm based out of Amsterdam,
and it's tulip season here, so the tulips are blooming
all over the city, so I built a simple bot
that simulates a conversation where a user or a customer
wants to order flowers. So here's how the conversation
between the customer and the bot goes. So the customer finds the bot and says something
like I would like to buy flowers. Now, let me break this down. What the customer says is
essentially what an utterance is. Now, a customer—the thing
with conversation interfaces is that a customer can say the same thing
in many different ways. A customer could say something
like I would like to buy flowers, I want to buy some flowers.

Hey, can you help me buy some flowers? Hey, can you help me
purchase some flowers? So there's so many different ways
of saying the same thing. To give you another example, something as simple
as asking for the weather. A customer can say anything
from what's the weather, tell me the weather, to, is it hot outside or do I need a coat today? Just different ways
of saying the same thing. Now, in any conversational interface, all these different utterances are matched
to something called an intent. An intent basically represents the action the bot performs in response to an utterance. Why do you really need this intent? Well, like I said,
there are so many different ways of saying the same thing, so every utterance has to be mapped to something called an intent.

All right, so a customer said
I would like to buy flowers. The bot responds with, hey, what type of flowers
would you like to order? The customer says tulips, and the bot responds and continues
the conversation with, hey, what day do you want the tulips to be picked up? What time? Can you confirm?
So on and so forth. You'll see here that the chatbot is actually asking
for certain pieces of information. I've underlined tulips there. A tulip is the type of flower. A user could say
something like rose or lily. And similarly the bot is also asking
for the day and the time at which they want the flowers
to be delivered. These pieces of information
are called slots, and think of slots as variables
within an utterance, any piece of data
that can change from user to user within an utterance.
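
To picture what the NLU hands your code, here's roughly the structure it would produce for "I would like tulips that I can pick up tomorrow in Amsterdam at 9:00 A.M." The slot names and the resolved values here are made up for illustration:

    # Roughly what the NLU extracts from:
    # "I'd like tulips that I can pick up tomorrow in Amsterdam at 9:00 A.M."
    parsed = {
        "intent": "OrderFlowers",        # the action the user wants
        "slots": {                       # the variables within the utterance
            "FlowerType": "tulips",
            "City": "Amsterdam",
            "PickupDate": "2020-06-20",  # "tomorrow" resolved to a date
            "PickupTime": "09:00",
        },
    }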

For example, going back to the weather
example that I spoke about, I could ask
for the weather in Amsterdam, but someone could ask
for the weather in Paris or in Delhi or in New York, so, in that case, the city there,
Paris, Delhi, Amsterdam, etcetera, becomes this slot, and the slot value
becomes what I just mentioned. So that is what a slot really is. You'll also see that this is how the conversation
would typically end where the bot says something like, hey, your tulips will be ready
for pickup at 9:00 A.M. Is that correct?
And the user says, yes, and the bot says, thank you,
your order is placed, and there's a nice
little emoji there as well. So this is what we call the fulfilment where after all of this conversation, your back end is sort of fulfilling
the user's request. And again your back end could
typically be a Lambda function which talks to, you know, the services that you've already built in your startup or your enterprise. All right, so the core of any conversational experience is basically how the conversation, the dialogue, is handled.

So we have something we call slot elicitation, which is basically how Lex
asks for pieces of information. Now, for something
like the flower delivery bot to work, it needs a few pieces of information: the type of flower, the city, the date, and the time. These four pieces of information it absolutely needs. Now, you would hope that all your users and all your customers
would talk like this where your bot says something
like, hey, what type of flower do you want, and the customer says
I would like tulips that I can pick up tomorrow
in Amsterdam at 9:00 A.M., so in one shot they're giving you
all four pieces of information, which is great.

But unfortunately or fortunately
this is not how we talk or how we communicate as humans. Imagine if you were to call up
your local flower seller and they say, hey,
what flowers do you want or, hey, how can I help you? You wouldn't give them all these
pieces of information in one shot. Typically there is a back and forth between the user and, you know, the person providing that service, so we've given you the option to model a similar sort of paradigm with the bot as well, so in this case you can see
the bot is asking questions like, hey, what type of flowers
would you like? What day do you want it?
What time of day and so on? So essentially what's happening
is what we call slot elicitation where the bot is basically
asking the user for the pieces of information it needs to complete that request, and it's doing so by using
something we call a prompt, which is basically a spoken or typed phrase that invokes this intent, or that gets that piece of information.
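
In a dialog code hook, eliciting a slot looks something like this. This is a sketch using the Lex V1 response format, with the demo's intent and slot names assumed:

    # Ask the user for one missing piece of information via an
    # ElicitSlot dialog action (Lex V1 response format).
    def elicit_slot(session_attributes, intent_name, slots, slot_to_elicit, prompt):
        return {
            "sessionAttributes": session_attributes,
            "dialogAction": {
                "type": "ElicitSlot",
                "intentName": intent_name,
                "slots": slots,
                "slotToElicit": slot_to_elicit,
                "message": {"contentType": "PlainText", "content": prompt},
            },
        }

    # e.g. elicit_slot({}, "OrderFlowers", slots, "FlowerType",
    #                  "What type of flowers would you like to order?")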

Like I said, the core of providing
a great conversational experience is managing this dialogue. Now, what you saw earlier is
what we call a multi-turn dialogue, where there are multiple turns between the user and the bot. The bot says something,
the user responds. The bot says something,
the user responds. The other way of actually talking is what we call
a single-turn conversation, where the bot says something, the user says something, and the conversation is done. Most conversations that we've seen
are actually multi-turn conversations, and these lead to better user
experiences and happier customers, so Lex takes care
of this multi-turn conversation and what we call dialogue management. So, as you can see,
there are a few types of slots, there's a type of flower, there's a pickup date
and a pickup time, and each of these slots
has corresponding prompts as well. So, if a user doesn't mention
the type of flower, the Lex bot will remind them saying, hey, what type
of flower would you like? Or if they don't mention
the date or the time, the corresponding prompt will be, hey, what time would you like
the flower to be picked up? The great thing about this is it's not completely linear either.

So suppose a user says something like I would like the tulips to be picked up tomorrow at 9:00 A.M.; the user has answered two questions there. So the next question from the bot is not going to be, hey, what time should we pick up the tulips, because the user has already answered that question. So there is enough intelligence there
for that dialogue to be managed and for each of those slot values
to be picked up.
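
One way to picture that intelligence: a dialog code hook only ever asks for the first slot that is still empty, and hands control back to Lex once everything is filled. A sketch, again assuming the demo's names:

    REQUIRED_SLOTS = ["FlowerType", "PickupDate", "PickupTime"]
    PROMPTS = {
        "FlowerType": "What type of flowers would you like to order?",
        "PickupDate": "What day do you want the flowers to be picked up?",
        "PickupTime": "At what time should they be picked up?",
    }

    def dialog_hook(event, context):
        intent = event["currentIntent"]
        slots = intent["slots"]
        attrs = event.get("sessionAttributes") or {}

        # Elicit only the first slot the user hasn't filled yet;
        # "tomorrow at 9:00 A.M." fills two slots in a single turn.
        for name in REQUIRED_SLOTS:
            if not slots.get(name):
                return {
                    "sessionAttributes": attrs,
                    "dialogAction": {
                        "type": "ElicitSlot",
                        "intentName": intent["name"],
                        "slots": slots,
                        "slotToElicit": name,
                        "message": {"contentType": "PlainText",
                                    "content": PROMPTS[name]},
                    },
                }

        # Everything is filled: let Lex carry on with confirmation and fulfilment.
        return {"sessionAttributes": attrs,
                "dialogAction": {"type": "Delegate", "slots": slots}}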

All right, so we're going to be talking
a little bit about how you can customise
conversations. Now, conversations are not easy
if you really think about it. As humans, it comes naturally to us, but for a computer to mimic
a conversation, it's not as easy. And the reason, I think, is that what we as humans do really well in conversation is hold context. We hold context amazingly well. I can meet an old friend of mine
and refer to some good times we had like maybe 10 years in the past, and that reference is immediately
picked up by the person, because both of us
have that shared context. So we have tried to give that sort of contextualisation to Lex and to bots being built on Lex as well. And having that sort
of contextualisation and personalisation is the key to building
a good conversational experience. So, for instance, if a user says something
like I would like to buy flowers, and they have been doing that over the past week, the bot's next response could be, hey, would you prefer
to buy tulips again? So you're giving
an added contextualisation to that particular user.

There's a good chance
if a user has bought tulips on four consecutive days and comes back on the fifth day, they're probably
going to buy tulips again, which is why we've given you
that option to actually do that. Similarly, you can also validate the user's input, if you want. So maybe you don't have tulips available on the day that the user asked for. You can say, hey, sorry,
I don't have availability, would a later day
actually work for you? Like I said, conversation is not easy,
and you need context, and you need to sometimes
store that context to have a nice conversational
experience with the bot, so for that you've been given
the option of storing something called
a session attribute.

Now, a session attribute
is some piece of data that helps with storing context. This could be context of a session, a session is defined from when a user
starts using the bot to stopping. So maybe the user
logs in and says, hey, I need help with my order, this is my reference number, and that reference number
is stored as a session attribute, so until the bot
actually helps the user, that reference number is stored. Sometimes session attributes
can be permanent as well. Maybe this is a bot which requires
some sort of login and authentication, and the user, you know, logs in, so then you maybe know the user's name. So you can store the user's name in a session attribute,
so that the next time the user logs in, you can welcome them,
hey, like a welcome back, which sort of gives you that nice
user experience at the end of the day. As you can see,
Lex sort of maintains this context by storing data
throughout a conversation. This data could be anything from a slot value to a confirmation, and all these are actually
stored in session attributes.
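
In the Lambda, session attributes are just a key-value map that rides along with every request and response; anything you echo back persists across turns. A sketch, with attribute names made up for illustration:

    def lambda_handler(event, context):
        # Session attributes arrive with every Lex event...
        attrs = event.get("sessionAttributes") or {}

        # ...and anything you put back persists for the rest of the session.
        flower = event["currentIntent"]["slots"].get("FlowerType")
        if flower:
            attrs["lastFlowerType"] = flower

        # e.g. a name stored after an authenticated login
        greeting = f"Welcome back, {attrs['userName']}! " if "userName" in attrs else ""

        return {
            "sessionAttributes": attrs,  # echo them back to keep the context
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": {"contentType": "PlainText",
                            "content": greeting + "Your order is placed."},
            },
        }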

We give you, the developer, the flexibility in figuring out
what to do with the session attributes. Now, conversation
is not always linear. All right, if conversations
were always linear, they'd be boring, but they'd maybe be easier to build for when you're building a chatbot, so typically a conversation
with a chatbot could go one way. So, if you're building or rather
if you were building a bot that helped with buying flowers, maybe the user's almost done
with the entire process. At the end, the bot says something like would you like a small
or a large bouquet? Now, at this point,
the user doesn't know how many flowers there are in a small bouquet, like how small is small
or how large is large, so the user says something like how many flowers
in a large bouquet? This is a different intent
at this point in time.

When a user says something like this, it would match to a whole new intent. It wouldn't be part of the order-flowers intent that we spoke about earlier. So, in this case,
you'll have to switch context and store the current context
in a session attribute and answer the user's question. So maybe you say, hey,
30 flowers in a large bouquet. Then if the user says
something like, oh great, place my order or confirm, if you've stored the context
in the session attributes, you don't have to ask
the previous questions again, because that context
is really maintained, and, again,
Lex gives you that option to do so. So this way you can switch
seamlessly between intents and come back and not have
a bad experience for your user. If you didn't store
the session attributes there and the user asked a question like how many flowers
in a large bouquet, your bot would answer saying, hey, 30 flowers in a large bouquet, and then when they came back, you'd have to go through the entire process again, which is not a good look.

So you need to store those session attributes when you're switching context. You can also chain different
intents together sometimes, so, for instance, say a user
has gone through the entire flow to buy flowers and the bot says something
like, hey, anything else today, and the user says, hey, you know what,
I want to update my address, this is for someone else.
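
A sketch of what that context switch can look like in the back end: keep the in-flight order slots parked in a session attribute on every turn, so the side question can be answered without losing them. The intent names and the bouquet count are made up for this example.

    import json

    # Called on every turn of the OrderFlowers dialog: park the
    # in-flight slots so a context switch can't lose them.
    def save_order_context(event, attrs):
        attrs["pendingOrder"] = json.dumps(event["currentIntent"]["slots"])

    # Handler for the side question ("how many flowers in a large bouquet?").
    def handle_bouquet_size_question(event):
        attrs = event.get("sessionAttributes") or {}  # pendingOrder rides along
        return {
            "sessionAttributes": attrs,  # untouched, so the order can resume
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": {"contentType": "PlainText",
                            "content": "There are 30 flowers in a large bouquet."},
            },
        }

    # When the user then says "great, place my order", read
    # json.loads(attrs["pendingOrder"]) back instead of re-asking everything.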

You can actually chain the update-address intent with the order-flowers intent and have a seamless conversation so that the user doesn't feel confused. This also leads
to a good user experience and gives you flexibility
as a developer as well. All right, so, when it comes specifically to text bots on Lex, you can take advantage of the medium. It's always good to really take
advantage of the medium and to use rich message formatting. So, for instance,
there will be a lot of times when you actually want to show
visual feedback to your users, for instance, this is an example
from a car rental bot where a user is looking at three cars before they choose
which one they want to rent.

Maybe a user is buying t-shirts
via your amazing chatbot, and they want to look at like the three
different colours that you offer before they actually place the order. So you can take advantage
of the rich messaging formats on different platforms to give
a better experience for your user. Now, each platform has its own way
of sort of figuring out formatting. So Slack versus Facebook
versus Kik will look different. Similarly, you can really customise
the experience on your mobile app and on your web app as well, but make sure you use the medium well to provide like a good experience
for your user. When it comes to the fulfilment
of what your chatbot's doing, there are a couple of ways
you can do it. Most people will choose to use a Lambda function, so, again, that is a call that's made to your Lambda function, and you can choose what you want to do with it. Maybe you make an API call, maybe you hit a database, maybe you hard-code some data. It's really up to you, and your Lambda will return whatever text your user is going to see.
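
For the rich formatting mentioned above, Lex (V1) lets your Lambda attach a response card, which each platform, Slack, Messenger, or your web UI, renders in its own native style. A sketch of the tulips choice from the demo; the image URL is a placeholder:

    # Elicit the flower type with buttons via a Lex V1 response card.
    dialog_action = {
        "type": "ElicitSlot",
        "intentName": "OrderFlowers",
        "slots": {"FlowerType": None, "PickupDate": None, "PickupTime": None},
        "slotToElicit": "FlowerType",
        "message": {"contentType": "PlainText",
                    "content": "What type of flowers would you like to order?"},
        "responseCard": {
            "version": 1,
            "contentType": "application/vnd.amazonaws.card.generic",
            "genericAttachments": [{
                "title": "Which flowers would you like?",
                "imageUrl": "https://example.com/flowers.jpg",  # placeholder
                "buttons": [
                    {"text": "Tulips", "value": "tulips"},
                    {"text": "Roses", "value": "roses"},
                    {"text": "Lilies", "value": "lilies"},
                ],
            }],
        },
    }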

A lot of times you can return that to the client as well, so the output can be returned to the client for its own processing too, and a lot of times the dialogue part is sort of taken care of by Lex. So until the slots are elicited, all the prompts are thrown in, and then the fulfilment call is made to your Lambda function. Like I said, Lex takes care of the entire life cycle of the chatbot: you can actually save your bot, and it preserves the current state on the server; you can build your bot, which sort of builds the binary; you can create test, dev, and prod versions as well; and you can test it out right in the Lex window.

And, once you think it's ready, you can publish
it out to different platforms. You can publish it out to the messaging
platforms as well as mobile and web. I think one of the greatest things
about just using chatbots is the fact that you can implement
continuous learning on the chatbot. This helps a lot, especially in the cases
of things like customer service bots. Just to give you an example, most customer service channels, and I mean not just bots but even a call centre, get 70 to 80% of their queries from the same bank of questions. Typically customers have
the same bunch of queries. With a chatbot, there could be one
that's outside of those usual queries, and, at that point, you can choose to maybe switch to a human at the back end, where you're really
augmenting this experience with the human.

The human answers the question, and that question and answer are fed back into your system. The next time someone asks that question, the bot answers it, and the process becomes a lot more efficient over time. You can also use CloudWatch
to monitor your metrics, so you can get really good metrics
for how people are using your bot, when they're using it, what are the intents
that are being hit the most, what are the responses
that you're getting the most and so on. And another thing you can also do is sort of manually look
at the utterances that were missed. And this brings us to an important point about conversational interfaces: testing becomes so much more important. For all the artificial intelligence in testing and test automation, when it comes to conversational interfaces you really want to do a lot of beta testing and manual testing of your bot. The reason is you might have thought
of maybe ten different ways someone could say a certain thing, but there could be a legitimate
eleventh way of saying something.
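
If you'd rather script that review than click through the console, the Lex model-building API exposes missed utterances via GetUtterancesView. A sketch with boto3; the bot name and version are placeholders:

    import boto3

    lex_models = boto3.client("lex-models")

    # Fetch utterances Lex heard but couldn't map to any intent.
    view = lex_models.get_utterances_view(
        botName="OrderFlowers",     # placeholder bot name
        botVersions=["$LATEST"],
        statusType="Missed",        # as opposed to "Detected"
    )

    for per_version in view["utterances"]:
        for u in per_version["utterances"]:
            print(u["utteranceString"], "- heard", u["count"], "times")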

So you really want to look at your missed utterances, to make your bot that much more understanding and that much more efficient. Like we mentioned earlier, Lex is multi-platform, so you can build
that one bot just the one time, and you can deploy it to a mobile app. You can deploy it to Android,
iOS, all of that. You can deploy it to a host
of messaging platforms, mainly Slack, Kik, Facebook Messenger, and SMS as well. You can deploy it on the web using all the commonly used SDKs, like React and JavaScript and Python and so on, and you can also integrate with AWS IoT. As for use cases, and I'm sure
you already have amazing ideas of how you can implement Lex
in your organisation or your startup: right now we're seeing a lot of popularity in contact-centre and informational bots. These are extremely popular, and they help make the system so much more efficient. Especially when it comes to asking for information, when you have a lot of information on your website, sometimes a bot is just that much easier.
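
And if you're wiring the bot into your own web or mobile app rather than a built-in channel, one turn of the conversation is a single runtime call. A minimal sketch with boto3; the bot name, alias, and user ID are placeholders:

    import boto3

    lex_runtime = boto3.client("lex-runtime")

    # Send one turn of text to a published bot and read the reply.
    response = lex_runtime.post_text(
        botName="OrderFlowers",   # placeholder
        botAlias="prod",          # the alias you published
        userId="user-123",        # any stable per-user identifier
        inputText="I would like to buy flowers",
    )

    print(response["message"])      # the bot's reply text
    print(response["dialogState"])  # e.g. ElicitSlot, Fulfilled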

You can also use it
to build applications; I know of people who have built great bots, even for things like their DevOps, so it's contributing to enterprise productivity, and, like we mentioned earlier,
you can also integrate with AWS IoT, so you can build IoT bots as well. All right, like I mentioned earlier, I worked for more than two years
in the Alexa team, and it was an exciting time for me. I want to talk a bit
about Amazon Alexa, how it ties into this
and then show you the demo. So Alexa, in case you don't know, and I'm guessing a lot of you watching
this right now have a device at home, but for those of you who don't know, Alexa is the cloud-based service that powers devices
such as the Amazon Echo, which you see on the screen.

Now, these devices that you see
are very, very powerful in the sense that they each
have a microphone array, so you can get far field recognition, I can speak to a device
like almost 20 feet away. And these devices really help with everything
from day to day to entertainment to weather to news to music, and there's so much more. And the real vision for Alexa is this whole Alexa everywhere
sort of vision where people are interacting
using their voice, because it's such a natural interface, not just at home but also on the go, at their workplace, in their car, and so on.

Now, when it comes to Alexa, there is something called
a skill on Alexa. Think of it like this: just as the mobile ecosystem has apps, Alexa has skills. So anybody can build a voice-based experience, called an Alexa skill, and upload it to the skill store. Right now there are
100,000+ skills worldwide, and that number is just growing, and, as you can see, most of the big brands in the world have published amazing Alexa skills for users to use. The skill store is very popular,
and anybody can build a skill for free and upload it to the skill store, I will also show you a demo
of how you can do that. So once you build a Lex bot, it's actually fairly simple to take
the front end of that Lex bot, so I'm talking about the intents,
the utterances, and the slots and the prompts, and there is an option to export them. Just hit the export button, and make sure you choose Alexa Skills Kit. This is basically the framework for building a skill, so make sure you choose Alexa Skills Kit as the platform before exporting your bot.
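
The same export is available programmatically if you want to script it, via GetExport in the Lex model-building API. A sketch with boto3; the bot name and version are placeholders:

    import boto3

    lex_models = boto3.client("lex-models")

    # Export the bot's front end (intents, utterances, slots, prompts)
    # in Alexa Skills Kit format; the response carries a download URL
    # for the ZIP once it's ready.
    export = lex_models.get_export(
        name="OrderFlowers",      # placeholder bot name
        version="1",              # a published version
        resourceType="BOT",
        exportType="ALEXA_SKILLS_KIT",
    )

    print(export["exportStatus"])  # IN_PROGRESS, READY, or FAILED
    print(export.get("url"))       # pre-signed download link when READY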

Once you have done that,
you can go to developer.amazon.com and create a new skill. It would ask you
for a few pieces of information like the name of your skill
and which language model, because it's available in multiple languages like German and French and Italian, etcetera. So it would ask you that. And then, as you can see, there is a drag-and-drop JSON option, so you just drag and drop the file that you've just downloaded, and it sort of builds the front end
for the Alexa skill for you from the bot that you
have created in Lex earlier. Okay, so let's do a quick demo
on how Lex actually works, how you actually build
a front end for your skill, how you deploy to a web UI and then to a messaging platform and then export that to a JSON file, which you can import to an Alexa skill. So I've just built
a simple chatbot here, and, as you can see, this is a simple bot
that has just the one intent, which is web UI order flowers.

Now, every intent has a bunch
of utterances associated with it. These are utterances that you,
as a developer, have to earn. Essentially you sort of put yourself
in your customer's shoes and think, okay, what are
the different ways they could in this case ask to buy flowers? And, as you can see,
there's a fairly comprehensive list of different ways to buy flowers.

There's everything
from 'may I get flowers', to 'I want to order flowers',
'I want to buy flowers', 'I want to put in an order', just different ways
of saying the same thing. All of these utterances
really map to this particular intent that you see here. Now, we also had slot values
in this particular chat bot, and there were
three specific slot values, the type of flower, the pickup date,
and the pickup time.

Now, as you can see,
the flower type is the first slot, and the slot type is something
that I have defined on my own. It's what I call a custom slot type, and I've called it
the web UI flower type. What you can do is
there are two types of slots, there's a custom slot type when it's a data set that is custom
to your particular chatbot. In this case,
I can limit the values of that slot to maybe the types
of flowers that I sell, and you can also add validations. So, for example,
a customer asks for hydrangea and the seller doesn't sell that. I can throw in another saying,
hey, I don't sell that, but these are the types
of flowers that I actually sell. Every slot is associated
with a prompt as well, you can see the prompt which says what type of flowers
would you like to order.
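
For reference, defining a custom slot type like that in code looks something like this, a sketch with boto3; the name and values mirror the demo:

    import boto3

    lex_models = boto3.client("lex-models")

    # A custom slot type restricted to the flowers this shop sells.
    lex_models.put_slot_type(
        name="WebUIFlowerTypes",   # placeholder name
        description="Types of flowers we sell",
        enumerationValues=[
            {"value": "tulips"},
            {"value": "roses"},
            {"value": "lilies"},
        ],
        # Resolve only to values in the list, so your code hook can
        # reject things you don't sell ("sorry, no hydrangeas"):
        valueSelectionStrategy="TOP_RESOLUTION",
    )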

So if your customer doesn't mention the type of flower that they want, the bot actually throws
that particular prompt and says, hey, what type of flower
would you like to order? Now, the other two slots are
pick up date and pick up time, and you'll notice that both of them
start with Amazon Dot. These are essentially
built in slot types, which are slot types provided by Amazon for you to build your bots easier. Now, these slot types have well-tested. They're very comprehensive, and they exist for lots
of common data sets that you might use
while building a bot, so things like date, time,
currency, city names, place names, and so on. The two built in slot types as well have their corresponding prompts, so you have what day do you want
the flower type to be picked up, and you'll notice
that we have referenced another slot within the prompt, which is something
that you absolutely can do.

You'll also see here that there is
something called a confirmation prompt, which is something that you see at the end of the process; once the bot has got all the pieces of information, the bot will actually confirm, saying,
hey, is this what you wanted? Do you confirm?
This is a good practice to have when you're actually asking
for a lot of pieces of information from your user. If it's just a one-shot interaction, then it's probably not a good idea to have that. All right, so now
we're going to actually publish the bot. I can choose the alias,
and I can actually publish this bot.

I can choose
where I want to publish it to as well. I can choose to publish it
to either web or mobile, or I can go to all the different
channels that Lex has connections to, which is namely Kik,
Facebook, Slack, and SMS. Now publishing it to these channels
is very simple. You have to create an app or a bot on those platforms and then just enter some details
like a client ID verification token and a couple of other details
which just links the two together. So in this case I have created
an app on Slack, and I have published
this bot to Slack as well. So first let's see how the bot works on the web UI that I've built here for the bot, and as you can see there is
a chat window here, so I'm just going to chat with it
and say I want to buy flowers.

That is the rich messaging type
that we spoke about, so there's a photo
of some flowers there, and I'm going to click
on the tulips button. So what day do you want
the tulips to be picked up? The great thing
about this conversational experience is I don't have to enter a date in, like, a DD/MM/YY format. I can specify a date, or I can even say two days
from now or tomorrow. For now though
I'm just going to say June 20th, it asks me for what time.
I say 10:00 A.M., and there's a confirmation prompt,
so I say, yeah, sure, let's do it. Thanks for your order.
And it was literally that easy. Of course this goes to your Lambda
as a structured JSON, and your Lambda does
the part where it fulfils this order. It maybe talks to your API and so on. Now, I've done the same
on the Slack channel as well. I've added the app order flowers, which is of course a bot.

And I can have
the same conversation here. So I'm going to say buy flowers, and, as you can see, it is the buttons are native
to how it would look on Slack. If I gave you this demo
on Facebook Messenger, it would look native to how it would look
on Facebook Messenger, and Lex actually
takes care of all of this, which is another plus point
about why you should use Lex. So I'm just going to click on tulips. It says what date, I say June 20th, okay, I say 10:00 A.M., looks like there's going to be
a lot of flower orders on June 20th, and I say yes. And this is done. So this was an example
of the web UI and on Slack. Now, I'm going back to my bot, and, yeah, this is where all the bots
that you created would be listed, and, if I just choose that
and click on actions, there is an option
to actually export this bot.

So I'm going to click on that,
choose the version. When it comes to the platform, make sure you check Alexa Skills Kit. This is the framework you need to use to build Alexa skills. So you can actually export
the front end of your bot, which is the utterances,
the intents, the prompts, etcetera, as a JSON file, which you
can import into your Alexa skill. So I'm just going to export this. And this gives you a downloadable
which is a ZIP file. I can just download that. You can then go to developer.amazon.com
and create a skill. You have to enter a skill name and what we call
a skill invocation name, which is the phrase a user says
to start talking to your skill, so for a customer to start talking
to us, they have to say open order flowers.

In the JSON editor pane, I can just drag
and drop that same JSON file, which basically gives you
a JSON representation of all that we built earlier,
so, as you can see, all the sample utterances are here, slot types and the prompts
are here as well, so it's essentially just
a simple JSON file that I've imported to Alexa, which makes it so easy
to start building.

I'm just going to click
on save and build model, which actually builds
and trains the model as you go. Now, with Alexa, I recommend building
a subnet backend in the sense you build
a different Lambda function. Alexa gives you the option
to host your own Lambda function within your skill itself, so the core option you see here
is precisely for that. It's basically a Lambda function
that is just for the skill. And you have the ability to test
your skill in the browser as well, so, even if you don't have a device, you can test your skill in the browser, which is what we're going to do
as soon as our skill is built.

It's built. So I'm just going to test the skill now. Yeah, there we go. I can use my voice or text,
but I'm just going to use my voice. Open order flowers and buy flowers. What type of flowers
would you like to order? Tulips. What day do you want
the tulips to be picked up? June 20th. Pick up the tulips at what time
on the 20th of June, 2020? 10:00 A.M. Okay, your tulips will be ready
for pickup by 10:00 on the 20th of June, 2020. Does this sound okay? Yes. Thanks for using Order Flowers. And that was it. So, as you can see, we used the same bot
that we built on Lex, and we exported that bot,
and we imported it to Alexa to start building the skill. Of course I have built
my own back end for the skill, but it's a great way to sort
of convert that Lex bot that you've been building
into an Alexa skill. So you saw in the demo
how we built this one chatbot on Lex, we deployed it to a web page,
we deployed it to Slack, and then we took the same JSON file from the Lex bot and imported it into Alexa to build an Alexa skill out of it.

Of course for this demo, I built two different back ends
for the Alexa skill and the Lex bot. Now, theoretically you can use
the same back end, but it becomes a little easier if you actually have two separate back ends for the two, because the main difference between
building a Lex bot and an Alexa skill is that with a Lex bot, you'll actually
get access to full transcriptions of what the user said. But with an Alexa skill,
all you get is the intents, the slots, and so on. You actually don't get the entire
transcription of what the user said, so there is that subtle difference, which probably makes it more practical for you to have two different Lambda functions for your Lex chatbot and your Alexa skill.
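
For reference, here's a skeleton of that separate Alexa back end using the ASK SDK for Python. Treat it as a sketch, with handler and slot names mirroring the demo; notice that, unlike Lex, the handler receives intents and slots but no transcription of the user's words.

    from ask_sdk_core.skill_builder import SkillBuilder
    from ask_sdk_core.utils import is_request_type, is_intent_name

    sb = SkillBuilder()

    @sb.request_handler(can_handle_func=is_request_type("LaunchRequest"))
    def launch_handler(handler_input):
        # "Alexa, open order flowers"
        speech = "Welcome to Order Flowers. What would you like?"
        return handler_input.response_builder.speak(speech).ask(speech).response

    @sb.request_handler(can_handle_func=is_intent_name("OrderFlowers"))
    def order_flowers_handler(handler_input):
        # You get intents and slots here, not what the user literally said.
        slots = handler_input.request_envelope.request.intent.slots or {}
        flower = slots["FlowerType"].value if "FlowerType" in slots else "flowers"
        speech = f"Thanks, your {flower} order is placed."
        return handler_input.response_builder.speak(speech).response

    lambda_handler = sb.lambda_handler()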

The one thing to really keep in mind when building a chatbot and then building the same
conversational experience on Alexa is that there is a certain
difference in designing for text versus designing for voice, and this is purely a design
conversation we're having right now. So just to, you know, show you
and illustrate those differences: when you're designing a chatbot, you're designing for reading and writing.

You're designing for people to be reading something, whereas for Alexa you're designing for people to be listening and speaking out loud, and that makes a huge difference. Just to give you an example, take the differences between how the Harry Potter books read and how the films sound. You'll see big differences in the dialogue that is spoken, because one you're actually reading, and the other you're actually listening to, so make sure you keep that in mind
while designing for the two. When it comes to text in a chatbot, you can personalise your brand
by the use of rich messages and emojis. You saw that we used
like a nice flower emoji when the order was complete, and we also used rich messaging, which of course with voice you do in different ways. You can use something called speechcons, which are basically
things like sound effects or words that are common
to a certain area that you hear a lot, like hurrah or congratulations
and things like that.

Alexa gives you the option
of using these phrases to make it sound
that much more interesting. You also get access
to a huge sound library on Alexa that you can actually use
as you saw in the demo that we created. Like I said earlier, designing
for reading and writing means that people have
the ability to skim read. I think we've all read textbooks, especially when there's a lot of text,
where we're able to skim read and just pick out the piece
of information that we want. So, when you're building
a Lex experience, it's okay to be
a little more informational so that people can pick out the things that they really asked for. When it comes to voice,
it's not quite the same. We don't have the option to sort of skip through what Alexa is saying. So try to be as brief as possible, and this is imperative. In fact, in the Alexa team we used to say the rule of thumb was the one-breath rule. If it takes longer than one breath to say the entire sentence out loud, it means it's too long.
So keep it as brief as possible.

And lastly, and this is very subtle,
but when presenting choices, with something like Lex in text basically you can present
multiple options just fine, so you can say something like, hey, would you like fries
or salad? And people would reply
with either of the two. In voice, make sure
those choices are definite, because if you say something like, hey, would you like fries or salad, especially informally, the response we often hear is simply yes, because people don't know if that is a yes-or-no question in itself or if they have to choose between the two. So make sure you have
very definitive choices like, hey, which one would you like,
fries or salad? These are the subtle differences
between designing for text versus designing for voice. I think keeping in mind
some of these differences actually leads
to really strong experiences that your customers will come back to, and it is all about building
such engaging experiences, so keep that in mind
while you're actually building it out. And that was it for my talk, so first of all thank you so much
for attending this conference. It is a Virtual Summit, of course, but do check out the discovery zone.

We have some machine
learning competency partners, like Accenture, Deloitte,
and Snowflake, so go have a chat with them, see maybe if you can work with them
and get to learn something as well. I had a great time doing this.
Thank you so much. I would love to hear
the sort of conversational experiences you are building. So hit me up on Twitter.
That's my Twitter handle right there. And I'd love to hear from you. Thanks again, and bye.
