Artificial intelligence and machine learning (AI/ML) are key technologies that help organizations develop new ways to increase sales, reduce costs, streamline business processes, and better understand their customers. AWS helps customers accelerate their AI/ML adoption by offering on-demand access to powerful compute, high-speed networking, and scalable, high-performance storage for any machine learning project. This lowers the barrier to entry for organizations looking to adopt the cloud to scale their machine learning applications.
Developers and data scientists are pushing the limits of technology and increasingly adopting deep learning, a type of machine learning based on neural network algorithms. Deep learning models are growing larger and more sophisticated, driving up the cost of the underlying infrastructure needed to train and deploy them.
To enable customers to accelerate their AI/ML transformation, AWS is building high-performance, low-cost machine learning chips. AWS Inferentia is the first machine learning chip built from the ground up by AWS for lower-cost machine learning inference in the cloud. In fact, Amazon EC2 Inf1 instances powered by Inferentia deliver up to 2.3x higher performance and up to 70% lower cost for machine learning inference than current-generation GPU-based EC2 instances. AWS Trainium is AWS's second machine learning chip, purpose-built for training deep learning models, and will be available in late 2021.
Customers across all industries have deployed their machine learning applications in production on Inferentia and have seen significant performance improvements and cost savings. For example, Airbnb's customer support platform enables smart, scalable, and exceptional service experiences for its community of millions of hosts and guests around the world. It used Inferentia-based EC2 Inf1 instances to deploy the natural language processing (NLP) models that support its chatbots, and saw a two-fold performance improvement out of the box compared to GPU-based instances.
With these silicon innovations, AWS enables customers to easily train and run their deep learning models in production with high performance and throughput at significantly lower costs.
Machine learning challenges accelerate the shift to cloud-based infrastructure
Machine learning is an iterative process: teams must quickly build, train, and deploy applications, then train, retrain, and experiment frequently to increase the prediction accuracy of their models. Once trained models are embedded in business applications, organizations must also scale those applications to serve new users around the world. They must be able to handle many simultaneous requests with near real-time latency to ensure a superior user experience.
Emerging use cases such as object detection, natural language processing (NLP), image classification, conversational AI, and time series analysis are built on deep learning technology. Deep learning models are growing exponentially in size and complexity, going from millions of parameters to billions in just a couple of years.
Training and deploying these complex and sophisticated models translates into significant infrastructure costs. Costs can quickly grow prohibitively large as organizations scale their applications to deliver near real-time experiences to their users and customers.
This is where cloud-based machine learning infrastructure services can help. The cloud provides on-demand access to compute, high-performance networking, and large-scale data storage, seamlessly combined with ML operations and higher-level AI services, enabling organizations to get started immediately and scale their AI/ML initiatives.
How AWS is helping customers accelerate their AI/ML transformation
AWS Inferentia and AWS Trainium aim to democratize machine learning and make it accessible to developers regardless of organization size and experience. Inferentia's design is optimized for high throughput and low latency, making it ideal for deploying ML inference at scale.
Each AWS Inferentia chip contains four NeuronCores, each implementing a high-performance systolic-array matrix multiplication engine that greatly accelerates typical deep learning operations such as convolutions and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut external memory accesses, reducing latency and increasing throughput.
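At the heart of that engine is the multiply-accumulate operation repeated across a grid. The toy Python function below is a purely illustrative software analogue of that dataflow; a real systolic array performs these multiply-accumulate steps in parallel across the hardware grid rather than in nested loops.

```python
# Toy software analogue of the multiply-accumulate dataflow that a
# systolic matrix-multiplication engine performs in hardware.
# Illustrative only: real NeuronCores execute these steps in parallel.
def matmul(a, b):
    n, k = len(a), len(b)
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for t in range(k):
                acc += a[i][t] * b[t][j]  # one multiply-accumulate step
            c[i][j] = acc
    return c

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

Convolutions and transformer attention both reduce to large batches of exactly these matrix products, which is why accelerating this one primitive speeds up such a wide range of deep learning workloads.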
AWS Neuron, the software development kit for Inferentia, natively supports major ML frameworks such as TensorFlow and PyTorch. Developers can continue to use the same frameworks and lifecycle tools they know and love. For many trained models, compiling and deploying them on Inferentia requires changing just a single line of code, with no additional changes to the application code.
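As a minimal sketch of what that single-line change looks like in PyTorch: the snippet below traces a small model with stock TorchScript, and the comment shows the Neuron swap. The `torch.neuron.trace` call is provided by the AWS Neuron SDK's torch-neuron package and only works on an appropriately configured environment, so it is shown as a comment here; the model and shapes are made up for illustration.

```python
# Sketch of the "one line of code" workflow described above, assuming
# PyTorch is installed. The actual Neuron compile step requires the AWS
# Neuron SDK (torch-neuron) and is shown only as a comment.
import torch
import torch.nn as nn

# A stand-in trained model (hypothetical, for illustration only).
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()
example = torch.zeros(1, 8)

# Standard TorchScript compilation:
traced = torch.jit.trace(model, example)

# With the Neuron SDK installed, the single-line change to target
# Inferentia would be (per the Neuron SDK's PyTorch integration):
#   import torch.neuron
#   traced = torch.neuron.trace(model, example_inputs=[example])

out = traced(example)
print(tuple(out.shape))  # (1, 2)
```

The rest of the application code (loading inputs, calling the model, handling outputs) stays the same, which is what makes the migration to Inf1 instances low-effort.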
The result is a high-performance inference deployment that scales easily while keeping costs under control.
Sprinklr, a software-as-a-service company, offers a unified, AI-powered customer experience management platform that enables companies to collect real-time customer feedback across multiple channels and translate it into actionable insights. This results in proactive problem resolution, better product development, improved content marketing, and better customer service. Sprinklr used Inferentia to deploy its NLP models and some of its computer vision models, and saw significant performance improvements.
Several Amazon services also deploy their machine learning models on Inferentia.
Amazon Prime Video uses computer vision machine learning models to analyze the video quality of live events to ensure an optimal viewer experience for Prime Video members. It deployed its image classification ML models on EC2 Inf1 instances and saw a 4x performance improvement and up to 40% cost savings compared to GPU-based instances.
Another example is Amazon Alexa's AI and ML, powered by Amazon Web Services, which is available on more than 100 million devices today. Alexa's promise to customers is that it is always getting smarter, more conversational, more proactive, and even more delightful. Delivering on that promise requires continual improvements in the response times and costs of its machine learning infrastructure. By deploying Alexa's text-to-speech machine learning models on Inf1 instances, the team was able to reduce inference latency by 25% and cost per inference by 30%, improving the service experience for the tens of millions of customers who use Alexa every month.
Unleashing New Machine Learning Capabilities in the Cloud
As companies rush to future-proof their business by delivering the best digital products and services, no organization can afford to fall behind in deploying sophisticated machine learning models to help innovate on their customer experiences. In recent years, there has been a huge increase in the applicability of machine learning across a variety of use cases, from personalization and churn prediction to fraud detection and supply chain forecasting.
Fortunately, machine learning infrastructure in the cloud is unlocking new capabilities that were not possible before, making it far more accessible to non-expert practitioners. That's why AWS customers are already using Amazon EC2 Inf1 instances powered by Inferentia to provide the intelligence behind their recommendation engines and chatbots and to extract actionable insights from customer feedback.
With AWS cloud-based machine learning infrastructure options suited to a range of skill levels, it is clear that any organization can accelerate innovation and cover the entire machine learning lifecycle at scale. As machine learning continues to go mainstream, organizations can now fundamentally transform the customer experience, and the way they do business, with cost-effective, high-performance, cloud-based machine learning infrastructure.
Learn more about how the AWS machine learning platform can help your business innovate here.
This content was produced by AWS. It was not written by the editorial staff of MIT Technology Review.