Deep Learning with BERT on Azure ML for Text Classification
This is the second part of a two-part blog series, where we explore how to develop the machine learning model that powers our solution.
In the first part, we presented an end-to-end, AI-powered solution architecture to automate support ticket classification and discussed key details, highlighting the usage of serverless and PaaS services in Microsoft Azure. This approach allows a rapid implementation, so we can focus our effort on solving the business problem.
The AI piece in our solution is a machine learning model that automatically categorizes text information extracted from support tickets, matching a given category with the correct recipient. That AI piece is present in this and similar solutions across many industries and business scenarios, and we can frame it as a text classification problem.
In this second part, we dive deep into the details of developing that AI piece as a machine learning model on Azure ML using advanced deep learning techniques for NLP (Natural Language Processing). We explore the usage of pre-trained state-of-the-art deep learning models and how we can leverage such models to solve our specific NLP task. For our implementation, we choose the BERT model, due to its popularity, performance, and availability of open-source implementations.
What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a language representation model developed by Google Research. Alongside other models such as ELMo and OpenAI GPT, BERT is a successful example from the most recent generation of deep learning-based models for NLP, which are pre-trained in an unsupervised way using a very large text corpus. The learned language representation is powerful enough that it can be used in several different downstream tasks with minimal architecture modifications.
How to use BERT for text classification
We can use a pre-trained BERT model and then leverage transfer learning as a technique to solve specific NLP tasks in specific domains, such as text classification of support tickets in a specific business domain.
Transfer learning is key here because training BERT from scratch is very hard. The original BERT model was pre-trained with a combined text corpus containing about 3.3 billion words. The pre-training takes about 4 days to complete on 16 TPU chips, whereas most fine-tuning procedures from pre-trained models take about one to a few hours to run on a single GPU.
This process can be implemented with the following tasks (a minimal code sketch follows the list):
- Choose a pre-trained BERT model according to the language needs for our task.
- Modify the pre-trained model architecture to fit our specific task.
- Prepare the training data according to our specific task.
- Fine-tune the modified pre-trained model by further training it using our own dataset.
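As a rough illustration of how these steps fit together, here is a minimal sketch, assuming the Hugging Face transformers package with PyTorch; the example tickets and the four-category label set are hypothetical:

```python
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# 1. Choose a pre-trained model suited to the language of our tickets.
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)

# 2. Modify the architecture for our task: the *ForSequenceClassification
#    variant adds a classification head on top of the pre-trained encoder.
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=4)

# 3. Prepare the training data: tokenize ticket texts into tensors.
texts = ["My VM will not start", "Question about my invoice"]  # hypothetical tickets
labels = torch.tensor([0, 1])                                  # hypothetical categories
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# 4. Fine-tune: one step of a standard supervised training loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```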
Choose a pre-trained BERT model according to the language needs for our task
BERT is a very popular model and the code was open sourced by Google. Therefore, there are several pre-trained models and extension packages readily available. Here we use the popular Hugging Face Transformers package, which provides pre-trained BERT models of various sizes and for several languages.
For our task we choose the DistilBERT model, which is pre-trained on the same data used to pre-train BERT (a concatenation of the Toronto Book Corpus and full English Wikipedia) using a technique known as knowledge distillation, with the supervision of the bert-base-uncased version of BERT. The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 66M parameters. It can be trained 60% faster than the original uncased base BERT, which has 12 layers and approximately 110M parameters, while preserving 97% of the model performance.
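As a quick sanity check of those numbers, the following sketch (assuming the Hugging Face transformers package) loads the model and counts its parameters:

```python
from transformers import DistilBertModel

model = DistilBertModel.from_pretrained("distilbert-base-uncased")
num_params = sum(p.numel() for p in model.parameters())
print(f"DistilBERT parameters: {num_params / 1e6:.0f}M")  # roughly 66M
```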
Modify the pre-trained model architecture to fit our specific task
BERT was designed to be pre-trained in an unsupervised way to perform two tasks: masked language modeling and next sentence prediction. In masked language modeling, some percentage of the input tokens are masked at random and the model is trained to predict those masked tokens at the output. For the next sentence prediction task, the model is trained on a binary classification task by choosing pairs of sentences A and B for each pre-training example, so that 50% of the time B is the actual next sentence that follows A (labeled as IsNext) and 50% of the time it is a random sentence from the corpus (labeled as NotNext).
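To make the masked language modeling objective concrete, here is a small illustration using the fill-mask pipeline, assuming the Hugging Face transformers package; the example sentence is ours:

```python
from transformers import pipeline

# We supply the [MASK] token and the pre-trained model predicts the most
# likely fillers, which is exactly the masked language modeling objective.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The support ticket was routed to the right [MASK]."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```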
With a single architecture that accommodates the pre-training tasks described above, BERT can then be fine-tuned for a variety of downstream NLP tasks involving single sentences or pairs of sentences, such as text classification, NER (Named Entity Recognition), question answering, and others.
In our specific task, we need to modify the base BERT model to perform text classification. This can be done by feeding the first output token of the last transformer layer (corresponding to the special [CLS] token) into a classifier of our choice. That first token at the output layer is an aggregate representation of the entire sequence that is fed as input to the model.
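Here is a minimal sketch of that mechanism, assuming PyTorch and the Hugging Face transformers package; TicketClassifier and the four-category label set are hypothetical names of ours:

```python
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizer

class TicketClassifier(nn.Module):  # hypothetical name
    def __init__(self, num_labels):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.head = nn.Linear(self.encoder.config.dim, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # first output token of the last layer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
batch = tokenizer(["Cannot reset my password"], return_tensors="pt")
logits = TicketClassifier(num_labels=4)(**batch)
```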
The package we use in our implementation already has several modified BERT models to perform different tasks, including one for text classification, so we don't need to plug in a custom classifier.
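For completeness, here is a sketch of that ready-made variant at inference time, again assuming the Hugging Face transformers package; in practice the weights would come from our fine-tuned checkpoint, and the example ticket is hypothetical:

```python
import torch
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
# Loaded from the base weights here, so the classification head is newly
# initialized; after fine-tuning we would load our own checkpoint instead.
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=4)

model.eval()
batch = tokenizer(["My invoice amount looks wrong"], return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
predicted_category = int(logits.argmax(dim=-1))  # index into our label set
```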