Hewlett Packard Enterprise (HPE) has launched two new artificial intelligence (AI) solutions aimed at boosting enterprise adoption of AI. One introduces a decentralized machine learning system that lets remote or edge installations share updates to their models; the other is geared toward helping companies develop and train machine learning models at scale.
The first, HPE Swarm Learning, accelerates insights at the edge, from disease diagnosis to credit card fraud detection, by sharing and unifying AI model learnings without sacrificing data privacy.
Created by HPE’s R&D arm, Hewlett Packard Labs, HPE Swarm Learning is the first privacy-preserving decentralized machine learning solution for edge or distributed sites. Through the HPE Swarm API, the solution provides containers that can be easily integrated into AI models. Users can then share AI model learnings within their company and with industry peers to improve training, without having to divulge the underlying data.
Today, most AI model training takes place in a single location on centralized, consolidated datasets. Because huge volumes of data must be moved back to one source, this approach can be inefficient and costly. It can also run up against data privacy and ownership regulations that restrict data sharing and movement, which in turn leads to inaccurate and biased models. By training models and acting on insights at the edge instead, companies can make faster decisions at the point of impact, improving both experiences and outcomes.
HPE positions Swarm Learning as the only solution that lets enterprises use distributed data at its source to build machine learning models that learn fairly while preserving data governance and privacy. It uses blockchain technology to securely enroll members, dynamically elect a leader, and merge model parameters, providing resilience and security to the swarm network and ensuring that only the learnings acquired at the edge are shared, never the data itself. Put simply, HPE Swarm Learning establishes a peer-to-peer network between nodes so that model parameters can be exchanged safely.
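HPE has not published the exact merge algorithm, but the general pattern, where each node trains on its own private data and a dynamically elected leader merges only the resulting model parameters, can be sketched as follows. All names and the simple averaging scheme here are illustrative assumptions, not HPE's API:

```python
import random

# Illustrative sketch of swarm-style decentralized learning: each node trains
# a toy 1-D linear model y = w*x on its private data, and only the model
# parameters (never the raw data) are shared and merged.

def local_training_step(params, local_data, lr=0.02):
    """One gradient-descent step on a node's private data."""
    w = params["w"]
    grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
    return {"w": w - lr * grad}

def merge_parameters(param_sets):
    """Average parameters from all peers (a FedAvg-style merge)."""
    return {"w": sum(p["w"] for p in param_sets) / len(param_sets)}

random.seed(0)
# Three edge nodes, each holding private samples of y = 2*x plus noise.
nodes = [[(x, 2 * x + random.gauss(0, 0.1)) for x in range(1, 6)]
         for _ in range(3)]
params = {"w": 0.0}

for _ in range(50):  # swarm rounds
    local_params = [local_training_step(params, data) for data in nodes]
    leader = random.randrange(len(local_params))  # leader election (simplified)
    params = merge_parameters(local_params)       # only parameters cross the network

print(round(params["w"], 2))  # converges near the true weight 2.0
```

The real system adds blockchain-backed enrollment and integrity checks around this loop; the sketch only shows why no raw data ever needs to leave a node.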
HPE Swarm Learning ships as a containerized Swarm Learning Library that runs in Docker or in virtual machines and is hardware agnostic. HPE also noted that TigerGraph is already using HPE Swarm Learning alongside its data analytics platform to spot anomalous behavior in credit card transactions.
Hewlett Packard Enterprise also unveiled the HPE Machine Learning Development System, an end-to-end solution that combines a machine learning software platform, compute, accelerators, and networking to build and train more accurate AI models faster and at greater scale. The new system builds on HPE’s acquisition of Determined AI, combining its comprehensive ML platform, now formally rebranded as the HPE Machine Learning Development Environment, with HPE’s AI and HPC solutions. Alongside the HPE Machine Learning Development Environment training platform, the software and services stack includes container management (Docker), cluster management (HPE Cluster Manager), and Red Hat Enterprise Linux.
According to the company, customers can cut the traditional time-to-value for developing and training machine learning models from weeks to days. Standing up infrastructure to support model development and training at scale has traditionally been a lengthy, multistep process involving the acquisition, installation, and administration of a highly parallel software ecosystem and infrastructure.
The HPE Machine Learning Development System helps businesses avoid the high cost and complexity of implementing AI infrastructure: HPE describes it as the only solution that combines software, specialized compute such as accelerators, networking, and services, allowing businesses to quickly build and train optimized ML models at scale and reach value faster. With distributed training, automated hyperparameter optimization, and neural architecture search, all fundamental to modern ML workflows, it can scale AI model training with minimal code rewrites or infrastructure changes while helping to increase model accuracy.
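The Determined-based platform automates this kind of search at cluster scale; setting its specific API aside, the core loop of automated hyperparameter optimization can be sketched generically as random search over a defined space. The search space and the toy objective below are illustrative assumptions:

```python
import math
import random

# Generic sketch of automated hyperparameter search: sample configurations
# from a search space, evaluate each, and keep the best. In a real system
# each evaluation is a (possibly distributed) training run.

SEARCH_SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),  # log-uniform
    "batch_size": lambda: random.choice([32, 64, 128, 256]),
}

def validation_error(config):
    """Stand-in for a full training run; lower is better.
    Pretends the optimum sits near learning_rate=1e-2, batch_size=64."""
    return ((math.log10(config["learning_rate"]) + 2) ** 2
            + 0.1 * abs(math.log2(config["batch_size"]) - 6))

random.seed(42)
trials = [{name: sample() for name, sample in SEARCH_SPACE.items()}
          for _ in range(20)]
best = min(trials, key=validation_error)
print(best)
```

Production schedulers improve on this by stopping unpromising trials early and running many trials in parallel across GPUs, which is where a system like this earns its keep.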
The core architecture is built on HPE Apollo 6500 Gen10 server nodes with eight Nvidia A100 80GB GPUs each and Nvidia Quantum InfiniBand networking. Apollo nodes offer up to 4TB of RAM and 30TB of NVMe local scratch storage, with HPE Parallel File System Storage as an option. System management is handled by additional ProLiant DL325 servers acting as service nodes, connected to the enterprise network through an Aruba CX 6300M switch.
The platform pairs optimized and accelerated compute with high-speed interconnects, all critical performance drivers for scaling models across a range of workloads, from a modest 32-GPU configuration up to 256 GPUs. In a 32-GPU configuration, the HPE Machine Learning Development System delivers roughly 90% scaling efficiency for workloads such as natural language processing (NLP) and computer vision. Internal testing also shows the system with 32 GPUs to be up to 5.7 times faster on an NLP task than a competing offering with 32 comparable GPUs but a suboptimal interconnect.
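Scaling efficiency here means the measured speedup divided by the ideal linear speedup for that GPU count. HPE does not publish the raw throughput figures behind its 90% claim, so the numbers below are invented purely to illustrate the calculation:

```python
def scaling_efficiency(throughput_1gpu, throughput_ngpu, n_gpus):
    """Measured speedup over one GPU, divided by the ideal linear speedup."""
    return (throughput_ngpu / throughput_1gpu) / n_gpus

# Illustrative numbers only: one GPU at 100 samples/s, 32 GPUs at 2,880 samples/s.
# Speedup is 28.8x against an ideal of 32x.
print(scaling_efficiency(100, 2880, 32))  # → 0.9, i.e. 90% efficiency
```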
The HPE Machine Learning Development System is a fully integrated solution that includes pre-configured and installed AI infrastructure for rapid model development and training. HPE Pointnext Services will provide on-site installation and software configuration as part of the service, allowing users to instantly build and train machine learning models to generate faster, more accurate insights from their data.
HPE also revealed that Aleph Alpha, a German AI company, is using the HPE Machine Learning Development System to train its multimodal AI, which combines NLP and computer vision. By pairing image and text processing in five languages with near human-like context understanding, the models push the boundaries of modern AI across language- and image-based use cases: AI assistants that draft complex texts, higher-level summaries, searches for highly specific information across hundreds of documents, and specialized knowledge applied in a conversational context.
According to Jonas Andrulis, founder and CEO of Aleph Alpha, the company quickly set up the HPE Machine Learning Development System, orchestrating and monitoring hundreds of GPUs, and began training its models in hours rather than weeks. Aleph Alpha was pleasantly surprised by the system’s efficiency and performance of more than 150 teraflops.
“While running these massive workloads, combined with our ongoing research, being able to rely on an integrated solution for deployment and monitoring makes all the difference,” says Andrulis.
Both solutions are available now, and Swarm Learning can also be paired with the Machine Learning Development System.