Run:ai, an artificial intelligence (AI) compute orchestration company that raised US$75 million in March, is collaborating with NVIDIA to make life simpler for data scientists. To help businesses streamline AI deployment, Run:ai recently released advanced model-serving functionality. It unveiled updates to its Atlas platform, including two-step model deployment, which makes deploying machine learning models simpler and faster.
Over the past several years, Run:ai has built a solid reputation for helping users get the most out of their GPU resources for model training, both on-premises and in the cloud. However, developing models and putting them into production are two very different tasks, and production is where many AI initiatives still fall short. Major obstacles to running AI in production include configuring a model, integrating it with data and containers, and allocating only as much compute as it actually needs. Deploying a model typically involves manually editing and loading YAML configuration files, which is time-consuming.
It should therefore come as no surprise that Run:ai, which sees itself as an end-to-end platform, is now going beyond training to help customers run their inference workloads as efficiently as possible, whether in a private cloud, a public cloud, or at the edge. With Run:ai's new two-step deployment method, companies can easily switch between models, use GPUs cost-effectively, and ensure that models perform well in real-world settings.
In its official statement, Run:ai notes that running inference workloads in production requires fewer resources than training, which consumes a significant amount of GPU compute and memory. Companies sometimes run inference on CPUs rather than GPUs, but this can increase latency. Many AI use cases, such as recognizing a stop sign, face recognition on a phone, or voice dictation, require a real-time response for the end user, and CPU-based inference may not be reliable enough for these applications.
Using GPUs for inference can reduce latency and improve accuracy, but it can also be expensive and inefficient if the GPUs are underutilized. Run:ai's model-centric approach automatically adapts to the needs of different workloads. With Run:ai, a single lightweight application no longer needs a whole GPU, which saves money while keeping latency low.
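The article does not say how Run:ai divides GPUs among inference workloads. To make the cost argument concrete, the following Python sketch assumes each workload declares a hypothetical share of one GPU (in percent) and packs the workloads with the classic first-fit-decreasing heuristic. Neither the function nor the numbers come from Run:ai, and its actual scheduler may work quite differently.

```python
from typing import List


def pack_first_fit_decreasing(demands_pct: List[int], capacity_pct: int = 100) -> List[List[int]]:
    """Hypothetical illustration: place each workload's GPU share (in percent)
    on the first GPU with enough room left, handling the largest shares first."""
    gpus: List[List[int]] = []  # shares assigned to each GPU
    free: List[int] = []        # remaining capacity of each GPU, in percent
    for d in sorted(demands_pct, reverse=True):
        if d <= 0 or d > capacity_pct:
            raise ValueError(f"demand {d}% must be in (0, {capacity_pct}]")
        for i, room in enumerate(free):
            if d <= room:
                gpus[i].append(d)
                free[i] -= d
                break
        else:
            # No existing GPU has room: allocate a new one.
            gpus.append([d])
            free.append(capacity_pct - d)
    return gpus


# Seven lightweight inference workloads (made-up shares of one GPU).
demands = [25, 25, 50, 10, 30, 20, 40]
packed = pack_first_fit_decreasing(demands)
print(len(demands), "GPUs without sharing vs", len(packed), "with fractional sharing")
print(packed)
```

In this made-up example, seven workloads that would each occupy a full GPU on their own fit onto two GPUs, which is the kind of saving fractional allocation is meant to deliver.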
Run:ai Atlas also adds new inference-focused metrics and dashboards that report on the performance and overall health of the AI models currently in production. Where feasible, it can even scale deployments down to zero resources automatically, freeing up valuable resources for other workloads and cutting costs.
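How Atlas decides when to scale a deployment to zero is not described. A minimal sketch of one common pattern might look like the following, assuming a hypothetical policy that sizes replicas from the number of in-flight requests and releases all of them after an idle timeout. The class, field names, and default values are illustrative only and are not Run:ai's API.

```python
import math
from dataclasses import dataclass


@dataclass
class ScaleToZeroPolicy:
    # All fields are hypothetical tuning knobs, not Run:ai settings.
    target_requests_per_replica: int = 10
    max_replicas: int = 4
    idle_timeout_s: float = 300.0  # idle time with no traffic before releasing everything

    def desired_replicas(self, in_flight_requests: int, seconds_since_last_request: float) -> int:
        # No traffic for longer than the timeout: release all resources.
        if in_flight_requests == 0 and seconds_since_last_request >= self.idle_timeout_s:
            return 0
        # Otherwise size to the load, keeping at least one warm replica
        # and never exceeding the configured maximum.
        needed = math.ceil(in_flight_requests / self.target_requests_per_replica)
        return min(self.max_replicas, max(1, needed))


policy = ScaleToZeroPolicy()
print(policy.desired_replicas(0, 600))  # idle past the timeout
print(policy.desired_replicas(25, 1))   # moderate load
print(policy.desired_replicas(100, 0))  # heavy load, capped
```

Keeping one replica warm until the timeout expires trades a little idle capacity for avoiding a cold start on every short pause in traffic.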
Thanks to close cooperation between the two companies, the platform now also integrates with NVIDIA's Triton Inference Server software. This lets businesses deploy several models, or multiple instances of the same model, and run them concurrently within a single container. NVIDIA Triton Inference Server is part of the NVIDIA AI Enterprise software suite, which is fully supported and designed with AI deployment in mind. These features are mainly aimed at helping enterprises deploy and use AI models for inference workloads on NVIDIA-accelerated computing so that they can deliver accurate, real-time responses.