Machine Learning Magic for Your JavaScript Application (Google I/O’19)

My name is Sandeep Gupta, I'm a product manager at Google. And I'm Yannick Assogba, a software engineer on the TensorFlow.js team. We're here to talk to you today about machine learning and JavaScript.

The video that you just saw was our very first AI-inspired Google Doodle, and it brought machine learning to life in a fun and creative way for millions of users. What users were able to do is use a machine learning model, running directly in the browser, that could synthesize a Bach-style harmony. What made this possible was a library called TensorFlow.js.

TensorFlow.js is an open-source library for machine learning in JavaScript. It's part of the TensorFlow family of products, and it's built specifically to make it easier for JavaScript developers to build and use machine learning models within their JavaScript applications. You use this library in one of three ways. First, you can take one of the pre-trained models that we provide, models we have packaged for you, and run them directly within your JavaScript applications; you can also take pretty much any TensorFlow model, run it through a converter, and use it with TensorFlow.js. Second, you can take a previously trained model and retrain it with your own data to customize it to the problem that's of interest to you, using a technique called transfer learning. And lastly, it's a full-featured JavaScript library that lets you write and author models directly in JavaScript, so you can create a completely new model from scratch. Today we'll talk a lot about the first and the third of these; for retraining examples, there are a bunch on our website and in the codelabs, and we encourage you to take a look after the talk.

The other part is that JavaScript is a very versatile language, and
it works on a variety of platforms, so you can use TensorFlow.js on all of them. We see a ton of use cases in the browser, where it has a lot of advantages: the browser is super interactive, you have easy access to sensors such as the webcam and microphone which you can bring into your machine learning models, and we use WebGL-based acceleration, so if you have a GPU in your system you can take advantage of it and get really good performance. TensorFlow.js will also run server-side using Node.js. It runs on a variety of mobile platforms, on iOS and Android, through mobile web platforms, and it can run in desktop applications using Electron; we'll see more examples of this later in the talk.

We launched TensorFlow.js one year back, last March, and earlier this year at the TensorFlow Dev Summit we released version 1.0. We have been amazed to see really good adoption and usage by the community, and some really good popularity numbers. We are really excited to see more than a hundred external contributors who are contributing to the library and making it better. So for those of you in the audience, or those of you listening, thank you very much from all of the TensorFlow.js team.

Let's dive a little deeper into the library and see how it is used. I'm going to start by looking at some pre-trained models. We have packaged a collection of pre-trained models, ready to use out of the box, that solve some of the most common types of ML problems you might encounter. Some work with images, for tasks such as image classification, detecting objects, segmenting objects and finding their boundaries, and recognizing human gesture and human pose from image or video data. We have audio models that work with speech commands to recognize spoken words. And we have a couple of text models for analyzing, understanding, and
classifying text. All of these models are packaged with very easy-to-use wrapped APIs for easy consumption in JavaScript applications; you can either npm install them or use them directly from our hosted scripts with nothing to install. Let's take a look at two examples.

The first model I want to show you is an image model called BodyPix. This model takes image data and determines whether there is a person in the image; if there is, it segments out the boundary of that person, labeling each pixel as belonging to the person or not. It can also do body-part segmentation, further dividing the pixels that belong to a person into one of 24 body parts.

Let's look at what the code to use a model like this looks like. You start by loading the library and the model with script tags pointing at our hosted scripts. You choose an image file, either loaded from disk or from a webcam element. Once you have an image, you create an instance of the BodyPix model and call its person-segmentation method on the image you've chosen. This runs asynchronously, so we wait for the result using the await keyword. The segmentation result that comes back is an object with the width and height of the image and a binary array of zeros and ones in which the pixels where a person was found are labeled; you can see that in the image on the right. You could instead call the body-part segmentation method, in which case you'd get the body-part classification. The model is also packaged with a set of utility functions for rendering, and here you see the drawPixelatedMask function, which produces the image on the right. So that's how you can use one of these image-based models directly in your web application.
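The segmentation object described above (a width, a height, and a binary per-pixel array) is easy to consume even without the packaged rendering utilities. Here is a minimal, hypothetical sketch in plain JavaScript, not the actual BodyPix API, that uses such a mask to black out the non-person pixels of an RGBA buffer:

```javascript
// Sketch: apply a BodyPix-style binary segmentation mask to RGBA pixel data.
// `segmentation` mimics the shape of the result the talk describes:
// { width, height, data } where data[i] is 1 if pixel i belongs to a person.
function maskBackground(pixels, segmentation) {
  const out = new Uint8ClampedArray(pixels); // copy of the RGBA buffer
  for (let i = 0; i < segmentation.data.length; i++) {
    if (segmentation.data[i] === 0) {
      // Not a person: zero out RGB, keep alpha untouched.
      out[i * 4] = 0;
      out[i * 4 + 1] = 0;
      out[i * 4 + 2] = 0;
    }
  }
  return out;
}

// Tiny 2x1 image: left pixel is a person, right pixel is background.
const segmentation = { width: 2, height: 1, data: [1, 0] };
const pixels = new Uint8ClampedArray([10, 20, 30, 255, 40, 50, 60, 255]);
const masked = maskBackground(pixels, segmentation);
```

The packaged utility functions do this kind of per-pixel work for you; the sketch just shows what the binary array means.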
The second model I want to show you is the Speech Commands model. This is an audio model that listens to microphone data and tries to recognize spoken words; you can use it to build voice controls into interfaces, or for translation and other kinds of applications that need to recognize words. Let me quickly switch to the demo laptop.

We have a small Glitch application written using the Speech Commands model, with a version of the pre-trained model that's trained on a vocabulary of just four simple words: up, down, left, and right. When I click start and speak these words, the application displays a matching emoji. Let's try it out: left, up, left, down, down... okay, right, left, up. There we go. We can go back to the screen. This actually points to something you do run into with machine learning models: there are a lot of other factors you have to account for, things like background noise, and whether the training data adequately represents the kind of data encountered in real life.

Let's again take a look at the code. As before, you use script tags to load our library and the model. We create an instance of the Speech Commands model, initializing it with a version trained for the specific vocabulary of interest, in this case the four-word directional vocabulary. We've packaged the model with a couple of other vocabularies you can use, and you can also extend it to your own vocabulary using transfer learning; we have a codelab that shows how to do that. Once you have initialized the model, you call its listen method, which starts listening to microphone data. When it recognizes words, it returns a set of probabilities, a matching score for each of the spoken words in its set of label classes.
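The scores that come back are just an array of probabilities, one per label, so mapping them to a word is a plain argmax. A dependency-free sketch (the labels and scores here are made up for illustration; this is not the Speech Commands API itself):

```javascript
// Sketch: turn the per-label score array a speech-commands-style model
// returns into a recognized word.
const labels = ['up', 'down', 'left', 'right'];

function recognize(scores, labels) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return labels[best];
}

const word = recognize([0.05, 0.1, 0.8, 0.05], labels); // 'left'
```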
Once you've figured out what the spoken word is, you can use that, for example, to display the matching emoji as in that demo. Okay, I'm going to turn it over to Yannick, who will show you how to do training with this library.

Thanks, Sandeep. Hello. So Sandeep showed you one of the simplest ways to get started with machine learning with TensorFlow.js: take one of our pre-trained models and incorporate it into your app, and if you noticed, you didn't even have to think about tensors. But there are situations where there won't be a model that works out of the box for your use case, and that's where training comes in. TensorFlow.js has a full API supporting training custom models right in JavaScript.

Last year here at I/O we showed a demo of training a game controller using webcam input, in that example using your face to control a Pac-Man game. This year we're going to look a bit more closely at the training process, focusing on training in Node.js and what it looks like to bring your own data. The advantages of training in Node.js include generally increased access to memory and storage, increased performance in certain situations, and, importantly, being able to browse the internet while you wait for your model to train. When you're training a model in the browser, you have to keep the tab focused or many browsers will throttle performance on that tab, so it's quite handy to be able to do something else.

All right, let's train a custom text classifier. There are really two main things I'd like you to take away from this exercise. The first is how to work with text in TensorFlow.js in general; the other is the general principle of using an existing building block to bootstrap your machine learning project. This is referred to as transfer learning, it's really helpful when you're getting started with machine learning, and we'll see more of it in the example. But to step back a bit: what can you do with a
text classifier? There are classical examples such as sentiment analysis or spam detection, but you can also do things like log scrubbing, where you look through your logs for personal or private information that you don't want to keep and obfuscate or remove it before you store them. You can also analyze product reviews or do document clustering. Today, though, we're going to build a component for a chatbot, and in particular we're going to look at classifying user intents. For example, given the sentence "will it rain in the next 30 minutes", we want the model to detect that this is a GetWeather intent, and something like "play the latest Bach album" should be a PlayMusic intent.

Any machine learning project needs data to learn from, and the data we're going to use today comes from the Snips.ai NLU benchmark, an open-source dataset that's available on GitHub. For our first task we're basically going to start with a spreadsheet: it has the query sentences on one side and the intent on the other. One thing we need to do, however, is convert this text into numbers so we can feed it into our neural network, because neural networks don't really understand text natively. That's where the Universal Sentence Encoder comes in. It's a deep neural network created by Google that I like to think of as NLP in a box: it takes sentences and turns them into lists of numbers that encode the meaning and syntax of those sentences.

Let's look at an example: "what is the weather in Cambridge Massachusetts". The Universal Sentence Encoder will take that sentence and turn it into an array of 512 numbers, and it will always be 512 numbers regardless of the length of the sentence, which is actually quite nice because it gives us a regular structure to work with. This is what the code to create those numbers looks like. Similar to what Sandeep showed earlier, we load our
pre-trained model, the Universal Sentence Encoder, wait for its weights to finish loading, and then call model.embed with the sentences we want to pass in. This process of turning sentences into numbers is often referred to as embedding in machine learning terminology, and you'll hear that term a lot. We await the result, and that's our set of 512 numbers.

Next are the intents; we also have to convert these into numbers. Since these are categories, and we have a small number of them, we can use a scheme called one-hot encoding to turn a label into a small array that has a 1 in the position corresponding to that label. In this example we have our GetWeather intent, and notice the first element of the array is a 1 and the rest are zeros. In the corner you'll see the other two intents we're using in this demo, PlayMusic and AddToPlaylist; there's always exactly one 1 in the array, marking which category it represents. Here's the code to do that, and it's basically an index lookup into a list: we call tf.oneHot with the index of the label we want to encode and the total number of categories we have, here GetWeather and 3, and it returns the compact array that is our numerical representation of the label.

So now we have our inputs, often referred to in machine learning as our Xs, and our targets, which are often referred to as Ys. Our goal is to take the 512 numbers representing the input and predict the smaller array, like [1, 0, 0], that represents our particular intent, and we're going to train a model to do that. So let's code up a model.

This is the entire code for the model. It's not much code, but it does a lot of work, and as you spend time with it, it becomes more and more familiar. At the top we see embeddingDims, 512, and numClasses, which represent the size of the model's input and output respectively.
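Before going further into the model, the one-hot step just described can be sketched without TensorFlow.js at all; this toy helper mirrors what a call like tf.oneHot(0, 3) produces for a single GetWeather label:

```javascript
// Sketch: one-hot encoding in plain JavaScript, mirroring what
// tf.oneHot(index, numClasses) produces for a single label.
function oneHot(index, numClasses) {
  const vec = new Array(numClasses).fill(0);
  vec[index] = 1;
  return vec;
}

// GetWeather is category 0 of our three intents.
const intents = ['GetWeather', 'PlayMusic', 'AddToPlaylist'];
const target = oneHot(intents.indexOf('GetWeather'), intents.length); // [1, 0, 0]
```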
So: 512 numbers coming in, 3 numbers going out. The highlighted part is the entire model definition, and I won't dwell on it too long, but it's a common building block you'll see in neural networks, known as a dense network. We start with a sequential model, and this network has just one layer, a single dense layer, whose whole job is to convert 512 numbers into 3 numbers; the deep learning part will have been taken care of by the Universal Sentence Encoder. Finally we compile the model to get it ready for training. Here we pick an optimizer, the algorithm that will drive the weight updates during the training process, as well as a loss function to tell us how well we're doing, and we pick one that's commonly used for classification tasks.

On to the training loop. model.fit is the function that actually runs the whole training process, and here we're calling it with a few parameters: xs are our input sentences, ys are those targets from our training set, and then two extras. epochs is a fancy word that refers to the number of times you go through the dataset before you're done training, and you can set it to whatever you want. validationSplit is a fraction between 0 and 1 that indicates a portion of the data we'll set aside and not train with, but use to see how well we're doing at making predictions.

Before we look at the demo itself, let's take a quick look at what the code to deploy that trained model in the browser looks like. We'd first load the model and the metadata, but for now just focus on the three lines in the middle. The basic process is: we take the input query from the user, use the Universal Sentence Encoder to embed it into that numerical representation, then call our model with model.predict to get our final prediction.
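To make that single dense layer less mysterious: its whole job, mapping the input numbers to one score per class, is a weighted sum plus a bias for each output unit, followed by a softmax that turns the scores into probabilities. A toy-sized, dependency-free sketch (4 inputs instead of 512, with made-up weights):

```javascript
// Sketch: what a single dense softmax layer computes, at toy size.
// The real model maps 512 inputs to 3 outputs; here it's 4 inputs to 3.
function dense(x, weights, bias) {
  // weights[j] is the weight vector for output unit j.
  return weights.map((row, j) =>
    row.reduce((sum, w, i) => sum + w * x[i], bias[j]));
}

function softmax(logits) {
  const max = Math.max(...logits);          // subtract max for stability
  const exps = logits.map(v => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

const x = [0.5, -1.2, 0.3, 0.9];            // a made-up embedding
const weights = [
  [0.2, 0.1, -0.3, 0.5],
  [-0.4, 0.3, 0.2, 0.1],
  [0.1, -0.2, 0.4, -0.1],
];
const bias = [0.0, 0.1, -0.1];
const probs = softmax(dense(x, weights, bias));
```

Training with model.fit is just the process of nudging those weights and biases until the probabilities match the one-hot targets.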
We call .array() to get the values out, and we use them to drive our UI. In this case the UI looks something like this; it's about 150 lines of JavaScript and about 100 lines of CSS, so not terribly large. Let's get our hands into it. If you could switch to the demo laptop, please.

We're going to take a quick tour of the code; it's all available on GitHub. Here is the spreadsheet we started with, our queries and our intents. I've done a bunch of the pre-processing beforehand so we don't have to wait for it to complete. The first step was converting it to tensors, and those are just the long lists of 512 numbers. JSON isn't the most efficient format to store this, but it's quite readable, so you can actually look at what a tensor is: just a long list of numbers. We also have some metadata for our model: these are the three classes we're going to train on, and we have about 6,000 sentences to learn from. The model itself looks pretty much like what we saw in the slides, no surprises there. And finally there's the script that runs the training; it takes a bunch of options, but its main job is to load the data and train the model. It does that with model.fit, as we saw earlier; we wait for that to complete, and then we save the model to disk. That saves a JSON file and a binary file: the JSON file contains the structure of the network, and the binary file contains the weights in an efficient format. Both are loadable in the browser, and that's our basic process.

So let's see what that looks like. Here I'm going to run the training script for the intent classifier; it loads our script, runs on the CPU, and we're off to the races. Each line it prints is an epoch, a trip through the entire
dataset of about 6,000 sentences, and you'll notice it goes pretty quickly; it typically finishes in about 20 seconds. And boom, 19.90 seconds, and we've trained a model. We can now load it in the browser and use it to make predictions. You can look at the file that's produced; it's a really small model file. So I'm going to start the demo app, which is here in our app folder, and it's just a client-side JavaScript app; this is really where all the machine learning happens, where we make our predictions. This copies the model we just trained over into the folder for the client setup and launches it in dev mode. Let me make this a bit bigger, and now we can try making predictions.

We can ask it something like "what is the Cambridge weather", and it responds with a nice cloud icon: it has detected a weather query. We can say "play the latest" and it's correctly classified as PlayMusic. Or even things like "put the sick beats on my running list", not that I run terribly often, but we get the right response back of AddToPlaylist. But what happens when you give it something surprising, like "get me a pizza"? Well, it throws its hands up in a shrug, and that's actually quite useful. You generally don't want your model to take an action when it's not the right thing it has been trained to do. The way we've set this up is with a threshold on the classification confidence: the model should be pretty confident in one of these classes before it takes the action. That's a very useful pattern to keep in mind when you're building a machine-learning-driven app; it's sometimes good to say "I don't know" or "not this". Sweet, that's our classifier. Let's head back to the slides for a bit.

All right, so we built our custom classifier. In many instances that might be all you need: it may be the final step of your pipeline, or once you have the specific intent you can apply some handwritten rules to extract information and do the rest of your processing.
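That "shrug" behavior is worth a concrete look. A hedged sketch of the thresholding logic just described (the 0.8 cutoff and label names are illustrative, not the values used in the demo):

```javascript
// Sketch: pick the most likely intent, but fall back to 'unknown'
// when no class clears the confidence threshold.
const intents = ['GetWeather', 'PlayMusic', 'AddToPlaylist'];

function classify(probs, labels, threshold = 0.8) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return probs[best] >= threshold ? labels[best] : 'unknown';
}

const confident = classify([0.95, 0.03, 0.02], intents); // 'GetWeather'
const shrug = classify([0.4, 0.35, 0.25], intents);      // 'unknown' ("get me a pizza")
```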
However, we can train models to do more than whole-sentence classification. Given our original query, "what is the weather in Cambridge Massachusetts", we may want to know which part of the sentence is location related, and that's what we're going to look at next. We can reformulate our problem a bit: given a sentence, we want to tag each word with a tag like TOK for a generic token or LOC for a location. We could have other tag types, but for now we're going to focus on these two and on the weather queries.

Like before, we need to convert our text into numbers, and like before we're going to use the Universal Sentence Encoder to do that; we're just going to give it one word at a time, so now each word becomes an array of 512 numbers. In addition, we're going to add special tokens at the end of our sentences, the __PAD tokens. These serve two purposes: they mark the end of the sentence, but more importantly we add enough of them that all of our sentences are effectively the same length. That's useful because it gives us a nice rectangular matrix to use during training, which is way more efficient. So this is roughly what our input will look like: enough pad tokens to make everything a given length.

What about our targets? Now we want to predict something for each word, and we'll use one-hot encoding again. Conveniently we have three categories, like before: the TOK category, LOC for a location, and the special PAD category that tells us when we've reached the end of the sentence. And we see the one-hot encoding scheme, just like before. Once we've done that for our inputs and our outputs, we have them represented as sequences: each sentence is a sequence of those arrays of numbers, and we need an appropriate model to handle them. You could use a dense network like before, though you'd be advised to add some capacity to it.
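The padding step described above can be sketched in a few lines of plain JavaScript (the __PAD spelling and the maxLen value here are illustrative):

```javascript
// Sketch: pad token sequences to one fixed length so they form a
// rectangular batch. '__PAD' marks both the end of the sentence and
// the filler positions.
const PAD = '__PAD';

function padSequences(sentences, maxLen) {
  return sentences.map(tokens => {
    const padded = tokens.slice(0, maxLen); // truncate overly long sentences
    while (padded.length < maxLen) padded.push(PAD);
    return padded;
  });
}

const batch = padSequences(
  [['what', 'is', 'the', 'weather'], ['play', 'bach']], 6);
// batch[1] -> ['play', 'bach', '__PAD', '__PAD', '__PAD', '__PAD']
```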
But today we're going to look at a special kind of network known as a recurrent neural network, and in particular a special kind of layer known as an LSTM that is geared toward handling sequences. So here is our new model function. The first thing you'll notice is that the start and end are pretty similar to what we saw before: embeddingDims is still 512, the number of classes is still 3, we still start with a sequential model, and at the end we compile it with the same optimizer and the same loss function. So really the meat of it is in the middle. Instead of starting with a dense layer, we use an LSTM layer, a special kind of layer that's designed to learn across sequences; here, think of each sentence as a sequence. We configure it, set a maximum sequence length, and then we do one more special thing: we take the LSTM layer and wrap it in a bidirectional layer to give us a bidirectional LSTM. This is useful because it allows the model to learn context in both directions; you can think of it as reading the sentence left to right and then right to left and trying to learn from both. Finally we end with a dense layer. It's very common in classification problems to end with a dense layer that has your number of output classes, the numClasses you see in the slide, but because of the LSTM work we did earlier, we have to wrap it in a timeDistributed layer that unrolls some of the sequence structure. And that's our entire model definition.

Again, let's get our hands into it and see what that looks like. I'm going to go back to the demo machine, close that, and head to the code. Again, I've pre-prepared the data so we can see what it looks like.
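The bidirectional wrapper is easier to picture with a toy example. The sketch below is not an LSTM; the recurrent cell is a stand-in one-number recurrence, but the wiring, one pass left to right, one pass right to left, and the two states combined at each position, is exactly the bidirectional idea described above:

```javascript
// Sketch of the bidirectional idea: run a recurrent step over the
// sequence in both directions, then pair up the states per position,
// so every position sees context from both sides.
function runDirection(seq, step) {
  const states = [];
  let h = 0; // initial state
  for (const x of seq) {
    h = step(h, x);
    states.push(h);
  }
  return states;
}

function bidirectional(seq, step) {
  const fwd = runDirection(seq, step);
  const bwd = runDirection([...seq].reverse(), step).reverse();
  return fwd.map((h, i) => [h, bwd[i]]); // [forward, backward] per token
}

// Toy stand-in cell: an exponentially decayed running sum.
const step = (h, x) => 0.5 * h + x;
const out = bidirectional([1, 2, 3], step);
// out[0] -> [1, 2.75]: forward has seen only the first token,
// backward has already seen the whole rest of the sequence.
```

A real bidirectional LSTM does the same wiring with a learned gated cell and vector-valued states instead of this toy recurrence.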
Here's our input data: the sentences broken up into words, with the tag for each word, and somewhere in there are some LOC ones. That's our input. Another thing I've done is pre-embed all of the words and write them to a file, so we can just look them up instead of calling the Universal Sentence Encoder each time. With that pre-processing done, we can look at our model definition; this is the tagger model, and if you look at it on GitHub you'll see it's a little more involved than what's in the slide. That's because the example we put up lets you train three different kinds of models: a one-directional LSTM, the bidirectional LSTM we're talking about today, or a dense network, just so you can compare and see how they behave. Other than that it's pretty much the same. Our training script is very similar; the data is a bit bigger, so there's a little more data management, so we call fitDataset this time, and then we save to disk just like before. That's the outline of our process, and we can run it.

So I'm just going to yarn train-tagger, and you'll see what that looks like. All of this training, by the way, is just using the CPU, so you don't necessarily need a GPU to do any training, though it does speed things up. We've started training, and probably the first thing you've noticed is that it's a lot slower than before: the data is much bigger this time, each word is 512 numbers, and the model is more complex, so it will take more time to train. On average it takes somewhere between 10 and 20 minutes depending on which options you pick. To keep the presentation moving we're not going to wait for the whole thing; I did train a model last night, so we'll use that and look at a demo app that's designed to show you the process. I'm just going to start that with yarn; we copy our model over just like before and start up the front-end application. This is a demo app designed to give
you a sense of the pipeline the input goes through. We can now enter a query like "what is the weather in Cambridge MA". The first line is our input sentence after tokenization; the grid-like thing is a representation of those 512 numbers from the Universal Sentence Encoder; and below that is the top category that comes out. You can see that Cambridge MA is nicely classified as location related.

One nice thing about these models is that you can try somewhat more complex queries, like "what is the weather in White River Junction Vermont", a place I have actually been to, and it gets it correct: we have this longer location-related sequence and it has correctly tagged the tokens as belonging to it. You'll notice the confidence on "Vermont" is a little lower. If we use the more traditional capitalization, "what is the weather in White River Junction VT", you'll notice that the classification score for VT, the abbreviation for Vermont, goes way up. And because we've used a bidirectional LSTM, you'll also notice that the confidence scores of the words before it go up too, because it's reading the context in both directions; that can be super handy.

Another important thing to realize is that it's not just memorizing place names. If we try typing in just "White River Junction", it does not detect that as a location, nor even "White River Junction VT", because it has learned to find these in the context of the weather-related queries in the training data. It hasn't simply memorized a bunch of location names, so it's important to keep in mind that the model is really based on what you gave it to train on. All right, we can switch back to the slides. Sweet, that worked.

So yay, we've trained an intent classifier and a model that can extract information from the identified intent. Sweet. So is it time to ship this into production? I would caution against that: you really should first take care to test that
your model is robust to different situations and that it will match your users' expectations. Machine learning models are probabilistic and behave differently based on often subtle differences in the training data used, so it's super important to have a good test set, including some tricky cases, and to validate with your users that the model matches their expectations. Google has a number of great resources online on this topic, and I also recommend checking out the "Designing Human-Centered AI Products" talk by some of our colleagues from Google PAIR later this afternoon at I/O. So now you've seen a bit of what the workflow looks like to train a model. You can check out the full code on GitHub; I've included a short link to it here, and it's part of our larger repository of examples. Next I'm going to hand it back to Sandeep to talk about different ways that TensorFlow.js is being used.

Thank you, Yannick. So we saw how the library can be used for using and training machine learning models. I want to take a few minutes to quickly show you a variety of applications people are building with TensorFlow.js. We saw earlier that it runs on a bunch of different platforms, and these examples span many of them.

Creatability is a project being developed by the Creative Lab team at Google. It consists of a set of experiments exploring how to use AI and ML to make creative tools more accessible. These run machine learning models in the browser, and in this particular case you see a person controlling a keyboard with head gestures and head motion. They have some really cool examples, and I encourage you to check out the experiments in the sandbox, where many of these are showing, or on the website.

Uber uses machine learning in a very significant way, for a wide variety of problems at very large scale, and Manifold is a browser-based application that Uber uses to visualize and debug their machine learning models and
data pipelines. This application runs in the browser, and they're using TensorFlow.js for a lot of the numerical computation they need there, for example distance calculations, visualization, and clustering of data. They found that, because of the WebGL acceleration, they could speed these computations up more than 100x compared to plain JavaScript.

Airbnb has an interesting use case from their trust team. When a user is uploading a profile picture to the Airbnb website, people sometimes accidentally use something like a driver's license or passport photo, which can contain sensitive personal information. So Airbnb runs a machine learning model directly on the client, in the browser or on device, and if you choose a picture that may contain such sensitive information, it alerts you before you upload it and prevents you from doing so.

On the server side, Clinic.js Doctor is a really nice example of an application that NearForm has built. It's a Node.js-based application used for profiling Node processes, and they're using TensorFlow.js to look for anomalies, like spikes in CPU usage or memory consumption, in those Node applications. It's a really nice example of a server-side application of TensorFlow.js.

On the desktop, TensorFlow.js can be used with Electron, and Magenta Studio is a set of plugins that packages a collection of machine learning models for music generation using TensorFlow.js, for some very fun and creative music applications. I think Magenta has a talk later today that you might want to check out, and they also have some demos in the sandbox, including the Ableton Live plugin, that you can see.

TensorFlow.js also runs on mobile platforms, on the mobile web on both iOS and Android, and recently we have been working on adding support for the WeChat
application. WeChat is a very popular social media and messaging application, and it has a mini-program environment that lets developers build small JavaScript applications called mini programs and easily deploy and share them. Let's take a quick look at a prototype of what this could look like. In this video you'll see a Pac-Man game shared as a WeChat mini program that lets the user control the game using head motion from the phone's camera. You see the WeChat application being launched; one of my friends has shared this mini program via a link, I click on it, and the game launches, a little JavaScript program running within the WeChat environment. After a very quick calibration step, looking straight at the phone, I'm ready to play, the Pac-Man game loads, and in a moment you'll see my head motion driving the little Pac-Man character. It's a really fun way of interacting with the device, and the nice thing is that you can do a variety of things, using webcams, text, or speech, and have a very convenient way of sharing these applications without having to install anything.

So we've seen a bunch of examples, and I want to quickly show that the community has been building some really interesting applications and use cases beyond all the examples I've shown so far. For those of you in the audience who have been using TensorFlow.js, a big thank you. We have a collection of these examples on our gallery page, extending TensorFlow.js and using it for things like reinforcement learning, as in that self-driving-car example, or generative models, and a variety of other interesting applications. We also have a bunch of developers building add-on libraries as extensions on top of TensorFlow.js, extending it in very useful ways, with libraries that let you track hand gestures or
facial movement and facial gestures, or do things like hyperparameter tuning of machine learning models. There are a bunch of these resources available on our website as well.

In closing, I want to point out a couple of resources to help you get started. There is a new textbook coming out called Deep Learning with JavaScript, written by our colleagues on the TensorFlow.js team. It's an excellent resource for learning about deep learning and machine learning, and all the examples are written in JavaScript using TensorFlow.js. For our audience here and people listening, we have a discount code that you might find useful. TensorFlow also recently launched a new Coursera course with deeplearning.ai to introduce TensorFlow and machine learning, and as part of this course there will be a module on TensorFlow.js launching in a couple of weeks.

A few more useful links on our website: our models repo and our gallery. We also have an office hours session right here at Google I/O tomorrow; come by, ask your questions, and meet the TensorFlow.js team. You can check out many more demos at our demo station in the AI/ML sandbox, and there are a few hands-on codelabs you can try interactively in the codelabs area. Thank you so much for coming out here today, and have a great rest of I/O.
