Hello World, it's Siraj, and in this video, we're gonna make our own chatbot via TensorFlow. 2016 has been the year of the Chatbot. Messenger, WeChat, Skype, and a bunch of other popular messaging platforms now host chatbots that developers have built for them and brands are increasingly using chatbots to engage their customers because the data doesn't lie. 90% of apps that get downloaded are only used once. With a chatbot, there's no need to download anything. It lives inside of the chat app that you open up a dozen times a day. And competing for space on your phone's home screen is really hard, but for space on your next most used screen, your chat app, that's more doable. You can now chat with CNN to get the news or chat with a bot to get flowers delivered to your boo or even chat with a matchmaking bot.
-What the— -But even though the chatbot space is getting really hot, it's nowhere near saturated. Just think of an app that you like and build a chatbot for it. Chatbots ARE the new apps. Before deep learning hit the scene a few years ago, all chatbots had hardcoded responses. A programmer would try to predict everything you might say and build a huge list of responses for every question they could think of. All of them were pretty terrible. Deep learning changed everything, and it's still changing things as new discoveries are made. Instead of telling the computer what to do, you can say "This is what I want as an outcome. Make it happen." Some chatbots that have used deep learning do so by breaking the problem into components and applying a separate model to each of them. I could create a deep learning-based system to interpret the language, another one to track the state of a conversation, and then another one to generate a response.
Each of these systems would be trained separately to do its own task, and the chatbot would collectively use the results from each. This is unnecessarily complex to build. [ALARM] A better type is called "end-to-end." These are chatbots that use ONE system that's trained on ONE dataset. They make no assumptions about the use case or the structure of the dialog. You just train it on the relevant data and say "I want you to be able to have a conversation with me about this data." End-to-end systems are what we should all be striving for.
Intuitively, they make sense, and they're starting to outperform all other systems. So let's talk about how to do this with deep learning. The simplest type of neural net is feedforward. That means that, as it trains, data just flows one way, from the input node all the way to the output node. It only accepts data of a fixed size, like an image or a number. Give it a labeled dataset, like whether a given temperature is hot or cold, and it'll be able to predict if a new temperature is hot or cold.
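The hot/cold idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the video: a single neuron, which is the smallest possible feedforward net, trained on a made-up, scaled temperature dataset. All names and values here are assumptions for demonstration.

```python
# A toy feedforward example: one neuron trained to label
# temperatures as hot (1) or cold (0). Illustrative values only.
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def train(samples, labels, epochs=2000, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(w * x + b)   # data flows one way: input -> output
            err = pred - y
            w -= lr * err * x           # nudge the weight toward the label
            b -= lr * err
    return w, b

temps = [-1.0, -0.5, 0.0, 0.5, 1.0]     # temperatures, scaled to a small range
labels = [0, 0, 0, 1, 1]                # 0 = cold, 1 = hot
w, b = train(temps, labels)

def predict(t):
    return 1 if sigmoid(w * t + b) > 0.5 else 0
```

Notice the fixed-size input: every example is exactly one number, which is precisely the limitation the next section addresses.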
But a conversation isn't a fixed size. It's a sequence of words. We need a network that can accept sequences as input: a recurrent neural net. In a recurrent net, the network's state is fed back into it at every step, forming a recurring loop, so it can process a sequence one element at a time. So we're going to build a chatbot in TensorFlow using recurrent neural nets. Our steps will be to download our dataset, create a model, train it on that dataset, and test it out by chatting with it. The first thing we want to do is decide what dataset we want to use.
If we were creating a chatbot for a specific use case like customer service, we'd want to use a dataset of conversation logs from a real human representative, but for this demo, we just want to make a fun conversational bot, so we'll use a movie dialog dataset compiled by Cornell University. It contains conversations between characters from over 600 Hollywood movies. Hopefully "Transcendence" is not included in that list. We'll download our dataset and put it in our data directory. Next we'll want to split our data into two different sets for training. We'll call one set "encoder data" and the other set "decoder data." The encoder data will be the text from one side of the conversation.
The decoder data will be the responses. Then we'll want to tokenize our data and give each token an integer ID. "Tokenizing" means taking each sentence, like "ayyy lmao," and chopping it into pieces so that it's easier for a model to train on, and giving each token an associated ID will make data retrieval faster. Once our data is properly formatted, we can create our model. We can define our own function for this that takes our tokenized encoder and decoder data as its parameters. Our function is going to return TensorFlow's built-in sequence-to-sequence model with what's called the embedding attention mechanism. Let's break down what the F this means. A sequence-to-sequence model consists of two recurrent neural networks.
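The preprocessing described above can be sketched as follows. This is a hand-rolled illustration, not the actual Cornell dataset loader: the conversation pairs, the `tokenize` helper, and the vocabulary-building code are all my own stand-ins.

```python
# Sketch of the preprocessing steps: split conversation pairs into
# encoder/decoder sides, tokenize, and assign each token an integer ID.
# The sample data and helper names are illustrative, not from the video.

conversations = [
    ("how are you", "pretty good thanks"),
    ("ayyy lmao", "lol hi"),
]

# One side of each exchange feeds the encoder; the reply feeds the decoder.
encoder_data = [question for question, _ in conversations]
decoder_data = [reply for _, reply in conversations]

def tokenize(sentence):
    # Chop a sentence into pieces ("tokens"); here we just split on spaces.
    return sentence.lower().split()

# Build a vocabulary: each unique token gets an integer ID.
vocab = {}
for sentence in encoder_data + decoder_data:
    for token in tokenize(sentence):
        vocab.setdefault(token, len(vocab))

def to_ids(sentence):
    # The model never sees raw text, only these integer sequences.
    return [vocab[token] for token in tokenize(sentence)]
```

A real pipeline would also add special tokens (padding, unknown words, end-of-sequence), but the ID-mapping idea is the same.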
One recurrent net is the encoder. Its job is to create an internal representation of the sentence it's given, which we can call a "context vector." This is a vector of numbers that represents that sentence. The other recurrent net is the decoder. Its job is, given a context vector, to output the associated words. The type of recurrent net we'll be using is called a "long short-term memory network." This type of network can remember words from far back in the sequence, and, because we're dealing with long sequences, our attention mechanism helps the decoder selectively look at the parts of the sequence that are most relevant, for more accuracy. So our model will be able to create context vectors for existing questions and responses, and it'll know to associate a certain type of question with a certain type of response.
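To make the "context vector" idea concrete, here is a toy vanilla RNN encoder in plain Python. It is NOT the LSTM-with-attention model TensorFlow provides; the fixed weights, the fake "embedding," and the state size are all made up purely to show the recurring feedback loop folding a sequence into one vector.

```python
# Toy encoder: a minimal vanilla RNN that folds a sequence of token IDs
# into a single "context vector". Weights are arbitrary, for illustration.
from math import tanh

def encode(token_ids, size=4):
    # The hidden state starts at zero and is fed back in at every step --
    # this feedback loop is what makes the network "recurrent".
    h = [0.0] * size
    for token_id in token_ids:
        x = (token_id % 7) / 7.0     # crude stand-in for a word embedding
        h = [tanh(0.5 * h[i] + 0.3 * x + 0.1 * i) for i in range(size)]
    return h                          # the context vector for the sequence

context = encode([3, 1, 4, 1, 5])     # any-length input -> fixed-size vector
```

The decoder would then run the same kind of loop in reverse, unrolling a context vector back into words one token at a time.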
So, once we create our model, we can train it by first creating a TensorFlow session, which will encapsulate our computation graph. Then we'll initialize our training loop and call our session's run function, passing in our sequence-to-sequence model's training operation, which runs the computation graph. Now we can save our model periodically during training.
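The shape of that training loop, with periodic checkpointing, looks roughly like this. This is only a structural sketch: the dummy `train_step` stands in for `sess.run` on the real seq2seq training op, and `pickle` stands in for TensorFlow's saver, so the loop itself is runnable here without TensorFlow.

```python
# Sketch of a training loop with periodic checkpointing. The "model" and
# the pickle-based saving are stand-ins for sess.run and tf.train.Saver.
import os
import pickle
import tempfile

state = {"step": 0, "loss": 10.0}

def train_step(state):
    # Placeholder for sess.run(train_op): pretend the loss shrinks each step.
    state["step"] += 1
    state["loss"] *= 0.9

ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
for step in range(1, 101):
    train_step(state)
    if step % 25 == 0:                 # save periodically, like saver.save
        with open(ckpt_path, "wb") as f:
            pickle.dump(state, f)

with open(ckpt_path, "rb") as f:       # later: the saver.restore equivalent
    restored = pickle.load(f)
```

In the real TF 1.x program, the checkpoint file holds the trained weights, so restoring it lets you chat with the bot without retraining.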
We'll use TensorFlow's tf.train.Saver function for this. It saves our model as a checkpoint file, which we can later load, once we're done training, with the saver's restore function. When we run our program, it'll take a few hours to fully train. We can periodically test what kind of responses we get from our bot in the terminal if we like, and, as you can see, responses are pretty meaningless at first, but, as our model improves through training, it eventually becomes more coherent. So, to break it down: deep learning allows us to make chatbots that are way more humanlike than any kind of handcrafted chatbot we've made before. End-to-end systems allow us to use a single model to get our desired outcome, and we can use sequence-to-sequence models, built from two recurrent neural nets, to create conversational chatbots. The winner of the coding challenge from the last video is Georgi Petkoff. He implemented three different methods to estimate a solution to the travelling salesman problem and benchmarked the results in an IPython Notebook. Badass of the week! And the runner-up is Mick Van Hulst.
He used both the nearest neighbor and simulated annealing algorithms to estimate a solution. The coding challenge for this video is to use TFLearn to write a script that generates sentences in the style of "Lord of the Rings." [GANDALF LAUGHS] It'll take, at most, 50 lines of code, and details are in the README. Post your GitHub link in the comments and I'll announce the winner in my video one week from today. Please hit that subscribe button. For now, I've got to hack Snapchat Spectacles, so thanks for watching.