
Hugging Face Launches Optimum to Scale Transformer Models


Hugging Face recently launched a new open-source library called Optimum, which aims to democratize the production performance of machine learning models. The toolkit also maximizes efficiency when training and running models on specific target hardware.

Many data-driven companies like Tesla, Google, and Facebook run millions of Transformer model predictions every day, whether to drive in Autopilot mode, complete sentences in Gmail, or translate your posts. The Transformer has brought a dramatic improvement in the accuracy of machine learning models: it has overcome many challenges in NLP and is steadily expanding into other modalities such as speech and vision.

Despite such advancements, many machine learning engineers struggle to get these models running fast and at scale in production. With Optimum, Hugging Face not only improves the performance of Transformer-based models but also makes it easier to target efficient AI hardware. The Optimum library helps engineers leverage the full capability of available hardware features and state-of-the-art AI hardware accelerators.

Transformer-based models can be tricky and expensive to serve because they require a lot of computational power. To get optimized performance at training and deployment time, model acceleration techniques need to be compatible with the targeted hardware. Since each hardware platform offers specific software tooling, it is essential to take advantage of advanced model acceleration methods such as sparsity and quantization. But quantizing a model requires a lot of work, as shown below:

  1. Editing the model: some operations must be replaced by their quantized counterparts, new ops must be inserted, and weights and activations must be adapted.
  2. Optimizing the quantization: once edited, the model exposes many parameters over which to search for the best quantization settings, which raises questions such as (see the sketch after this list):
    1. Which kind of observers should be used to calibrate the value ranges?
    2. Which quantization scheme should be used?
    3. Which quantization data types (int8, uint8, int16) are supported by the target device?
  3. Trading off: when tuning the model, the speed gains from quantization must be balanced against an acceptable accuracy loss.
  4. Exporting: the quantized model must be exported for the target device.
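The observer, scheme, and data-type choices above map directly onto the configuration objects exposed by common quantization toolkits. As a minimal illustration (using PyTorch's built-in quantization utilities rather than the Optimum API itself, and with arbitrarily chosen settings), the sketch below shows how a calibration observer, a quantization scheme, and target data types might be selected; the exact knobs differ per toolkit and hardware backend.

```python
import torch
from torch.quantization import QConfig, HistogramObserver, PerChannelMinMaxObserver

# Illustrative post-training static quantization config in stock PyTorch:
# - activations: histogram-based calibration observer, asymmetric uint8
# - weights: per-channel min/max observer, symmetric int8
qconfig = QConfig(
    activation=HistogramObserver.with_args(
        dtype=torch.quint8, qscheme=torch.per_tensor_affine
    ),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)
print(qconfig)
```

Other observers (for example a simple min/max observer) trade calibration accuracy for speed, and the supported dtypes and schemes ultimately depend on the target device.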

Citing Intel's Low Precision Optimization Tool (LPOT), Hugging Face showed one approach to solving the quantization problem. LPOT is an open-source Python library that helps users deploy low-precision inference solutions, supporting post-training quantization, quantization-aware training, and dynamic quantization. To specify the quantization approach, objective, and performance criteria, the user provides a configuration YAML file along with the tuning parameters. Hugging Face's examples show how you can quantize, prune, and train Transformers for Intel Xeon CPUs with Optimum:

  1. Quantize:
  2. Prune:
  3. Train:
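As a rough, framework-level illustration of what the quantization step accomplishes (a minimal sketch using PyTorch's dynamic quantization and a public Hugging Face checkpoint, not the Optimum/LPOT workflow itself), consider:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustration only: post-training dynamic quantization of a Transformer's
# linear layers to int8 with stock PyTorch (not the Optimum/LPOT API).
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for CPU inference.
inputs = tokenizer(
    "Optimum aims to make models faster in production.", return_tensors="pt"
)
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.softmax(dim=-1))
```

Optimum's value is in automating this kind of transformation, plus pruning and hardware-aware tuning, against the toolchains of specific accelerator vendors.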

The Hugging Face team said that Optimum will focus on achieving optimal production performance on dedicated hardware, where software and hardware acceleration methods deliver maximum efficiency. Hugging Face added that it will work with its hardware partners, including Intel, Qualcomm, and Graphcore, to enable, scale, and maintain such acceleration.


AI improves breast cancer detection and reduces false positives


Doctors often use mammograms, MRIs, ultrasound, or biopsy to find or diagnose breast cancer, but these methods have a high rate of false-positive findings. Researchers from NYU and NYU Abu Dhabi (NYUAD) have developed a novel AI system that improves breast cancer detection in ultrasound images, achieving radiologist-level accuracy.

The study, titled "Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams," was published in Nature Communications. It was led by Farah Shamout, Ph.D., assistant professor and emerging scholar of computer engineering at NYUAD, and colleagues.

Breast ultrasound has high false-positive rates. The AI system developed by the researchers achieves radiologist-level accuracy in classifying ultrasound images and identifying breast cancer, while reducing the frequency of false-positive findings. It also localizes lesions in a weakly supervised manner.

Read more: AI-aided Surveillance Cameras under the ‘Safe Kerala’ Project

The NYU Breast Ultrasound Dataset, consisting of 5,442,907 images from 288,767 breast exams, was used to develop and evaluate the model. The images include screening and diagnostic exams collected from 143,203 patients examined between 2012 and 2019 at NYU Langone Health in New York.

The AI detects breast cancer by assigning a probability of malignancy and highlighting the parts of the ultrasound image associated with its predictions. When the researchers conducted a reader study to compare its diagnostic accuracy with that of board-certified breast radiologists, the system achieved higher accuracy than the ten radiologists on average.
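The paper describes these outputs at a high level; one common way to produce this kind of "highlighting" is a class-activation heatmap over the input image. The sketch below is a generic Grad-CAM-style illustration in PyTorch (our own assumption for demonstration, with a stand-in ResNet-18 backbone, not the NYU group's actual architecture): a classifier outputs a malignancy probability, and gradients of that output with respect to a convolutional feature map localize the regions driving the prediction.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Hypothetical stand-in classifier; the published system is not ResNet-18.
model = resnet18(num_classes=1)
model.eval()

features = {}
def save_features(_, __, output):
    output.retain_grad()          # keep gradients of the last conv feature map
    features["maps"] = output
model.layer4.register_forward_hook(save_features)

image = torch.randn(1, 3, 224, 224)          # stand-in for an ultrasound image
prob = torch.sigmoid(model(image))           # predicted probability of malignancy

# Grad-CAM: weight each feature map by the gradient of the prediction w.r.t. it.
model.zero_grad()
prob.backward()
fmaps = features["maps"]                     # shape (1, C, H, W)
grads = fmaps.grad
weights = grads.mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * fmaps).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # heatmap in [0, 1]
print(float(prob), cam.shape)
```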

However, a hybrid model that aggregated the predictions of the AI system and radiologists achieved the best results in accurately detecting cancer in patients.


Toyota’s Woven Planet acquires Renovo Motors


Toyota subsidiary Woven Planet Holdings has acquired Renovo Motors, a firm that develops autonomous vehicle software. Company officials have not disclosed the valuation of the acquisition deal.

This is Woven Planet’s third acquisition in under twelve months. Woven Planet wants to use Renovo’s expertise in automobility software development to further enhance its open vehicle development platform named Arene. 

Renovo’s elite team of expert engineers will now work with Woven Planet to build a complete software-defined vehicle infrastructure stack. Renovo’s office will still remain in California, but it will be integrated with Woven Planet’s operations. 

Read More: Nanowear Receives FDA Clearance to Implement Artificial Intelligence-based Diagnostics

Senior Vice President of Software Platform at Woven Planet Holdings, Nikos Michalakis, said, “A key part in delivering our ‘Mobility to Love, Safety to Live’ vision is to enable the most programmable vehicles on the planet – opening vehicle programming to everyone by simplifying vehicle software development and increasing deployment frequency without compromising safety and security.” 

He further added that the technological and cultural fit could not have been any better, and that he is extremely excited to work with Renovo to develop solutions to power a new era of automobiles. He firmly believes that this acquisition will have a major impact in transforming worldwide mobility.

San Francisco-based Renovo, which develops Automated Mobility on Demand systems, was founded by Jason Stinson and Christopher Heiser in 2010. The firm specializes in developing operating systems and safety software for autonomous vehicles. Renovo had raised over $14.5 million in funding from investors like True Ventures, Verizon, Intact Ventures, and Synapse Partners.

“We are united around a singular goal to connect the most ubiquitous software and automotive technology in the industry,” said Co-founder and CEO of Renovo, Christopher Heiser. He also mentioned that this acquisition will allow them to do what they had always wanted to do, but at a global scale.


Researchers use AI to predict high-risk zoonotic diseases


An estimated 1.67 million viruses circulate in animal populations, and scientists are constantly monitoring which of them pose a zoonotic threat to humans. With millions of viruses in circulation, the task is incredibly challenging. To increase the chances of identifying the next virus to jump from animals to humans, scientists are enlisting the help of sophisticated machine learning algorithms. Researchers have now used AI to predict high-risk zoonotic diseases so that vaccines can target the most likely candidates.

Nardus Mollentze, Simon A. Babayan, and Daniel G. Streicker from the University of Glasgow have published a new proof-of-concept study in PLOS Biology. It suggests AI can predict the likelihood and risk of an animal-infecting virus infecting humans before it triggers a global pandemic.

The researchers compiled a database of 861 zoonotic virus species from 36 families to train the machine learning model, which they then applied to identify genomic patterns associated with a high risk of jumping to humans. They tested the model's efficacy by using it to analyze the dangers posed by a group of virus species that were not part of the training dataset.
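As a loose illustration of this approach (not the authors' actual feature set or model), the sketch below derives simple k-mer frequency features from viral genome sequences and fits a gradient-boosted classifier to score human-infection risk; the toy genomes and labels are placeholders.

```python
from itertools import product
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

KMERS = ["".join(p) for p in product("ACGT", repeat=3)]   # 64 trinucleotides

def kmer_features(genome: str) -> np.ndarray:
    """Normalized 3-mer frequencies of a genome sequence (sliding window)."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(genome) - 2):
        kmer = genome[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    vec = np.array([counts[k] for k in KMERS], dtype=float)
    return vec / max(vec.sum(), 1.0)

# Toy stand-in data: the real study used 861 labelled virus genomes.
genomes = ["ATGCGTACGTTAGC" * 50, "GGGCCCAAATTTGC" * 50, "ATATATCGCGCGTA" * 50]
labels = [1, 0, 0]                     # 1 = known to infect humans

X = np.stack([kmer_features(g) for g in genomes])
clf = GradientBoostingClassifier().fit(X, labels)

# Rank an unseen virus by its predicted probability of human infectivity.
print(clf.predict_proba(kmer_features("ATGCGTCGTATTGC" * 50)[None, :])[0, 1])
```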

Read more: Nanowear Receives FDA Clearance to Implement Artificial Intelligence-based Diagnostics

Applied to a second set of 645 animal-associated viruses that were not part of the training data, the model narrowed the field to 272 high-risk and 41 very high-risk candidates. The team found that some zoonotic virus genomes have identifiable genetic features that enable them to jump to humans.

The model showed that genomic patterns are more predictive of a virus's potential for human infection than its taxonomic relationships. It even identified SARS-CoV-2, the virus that triggered the global pandemic, as a "relatively high-risk coronavirus" without prior knowledge of other SARS-related coronaviruses.


Nanowear Receives FDA Clearance to Implement Artificial Intelligence-based Diagnostics


Nanosensor technology company Nanowear has received Food and Drug Administration (FDA) 510(k) clearance to implement its new artificial intelligence-based diagnostics in its closed-loop ‘Hospital at Home’ network.

Nanowear’s artificial intelligence technology, SimpleSense, is a noninvasive wearable vest that tracks and monitors patient data including heart rate, respiration rate, physical activity, blood pressure, and more.

The gathered data is uploaded to SimpleSense servers, which analyze it in real time using artificial intelligence algorithms to generate diagnostic reports. This technology will help physicians and doctors make better-informed decisions regarding patient treatment.

Read More: Building Flexible Autonomy: Boston Dynamics releases Spot 3.0 Update

Director of Research at Google, Peter Norvig, said, “When time-synchronously tracking a complex system like a patient’s heart, lungs and upper vascular system, the data captured needs to match the system’s level of complexity. Nanowear’s 85+ biomarkers of high-fidelity data within a closed loop system enables unique AI and deep learning algorithms to ensure that need is met.” 

He further added that artificial intelligence technologies have a tremendous ability to aid doctors, other healthcare professionals, and payers in tracking complex biological systems and individual risk patterns in the human body.

Nanowear’s SimpleSense uses clinical-grade biomarkers with a high signal-to-noise ratio (SNR), which helps the platform generate reports more quickly than other smart wearable devices.

“The largest bottleneck for any AI deployment is data preparation, which can take 70% – 80% of the time in any given application,” said the CEO and Co-founder of Nanowear, Venk Varadan.

United States-based nanosensor manufacturing firm Nanowear was founded by Venk Varadan and Vivek K Varadan in the year 2014. The company specializes in developing cloth-based smart nanochip solutions for the healthcare industry. Nanowear has raised a total funding of $1.5 million to date from investors like MedTech Innovator, MAS Holdings, and Social Capital over two funding rounds.


AI-aided Surveillance Cameras under the ‘Safe Kerala’ Project


Chief Minister Pinarayi Vijayan launched the ‘Safe Kerala’ project in February 2019 to ensure safe traffic on the state’s roads. Various schemes have been rolled out under the ‘Safe Kerala’ initiative, including e-challans for payment of fines by traffic law offenders, e-vehicles for law enforcement duties, and AI-aided Surveillance Cameras.

Today, P. Rajeeve, Minister for Industries, Law, and Coir in the Government of Kerala, will launch the first batch of 100 AI-aided surveillance cameras. Keltron manufactured these cameras for the ‘Safe Kerala’ project, and the launch will take place at the Keltron communication complex in Manvila on Wednesday at 2.30 pm.

Antony Raju, the Minister for Transport in the Government of Kerala, will receive the cameras on behalf of the Motor Vehicle Department (MVD) during the occasion. According to Keltron authorities, the AI-aided surveillance cameras were manufactured in association with a technology partner and feature automatic number plate recognition (ANPR). The cameras were manufactured, assembled, and tested at Keltron’s unit in Manvila.

Read more: Astera Labs raises $50 million in Series C Funding Round

Keltron will manufacture and supply 726 traffic enforcement gadgets to the MVD for the ‘Safe Kerala’ project, including four speed-violation detection cameras, 18 red-light violation detection cameras, four mobile speed enforcement systems, and 700 AI cameras. The entire production of traffic enforcement gadgets is being carried out at a budget of Rs 235 crore.

Keltron will also be responsible for managing the gadgets for the first five years; of the total Rs 235 crore budget, Rs 70 crore is set aside for annual maintenance. The Safe Kerala project is a traffic enforcement initiative shaped by the state government and the MVD to reduce road accidents and related deaths by increasing enforcement efficiency with the help of the latest technology. Besides efficient enforcement, the project also aims to provide training to drivers to impart safe driving skills and thus make the roads safer.


Building Flexible Autonomy: Boston Dynamics releases Spot 3.0 Update

Image Source: Boston Dynamics

The Spot robot, formerly known as SpotMini, is a four-legged robot created by Boston Dynamics, an American robotics company founded in 1992 as a spin-off from the Massachusetts Institute of Technology and currently owned by the Hyundai Motor Group. Boston Dynamics recently unveiled Spot Release 3.0 for the quadruped, reflecting more than a year of software advancements over Release 2.0.

While the military financed Boston Dynamics’ initial research, the company has worked hard to disassociate itself from that legacy as it begins to sell its advanced robot creations. Spot has already been seen performing significantly more innocuous duties in recent video releases from the company, such as dancing, gardening, and skipping.

The latest version focuses on allowing Spot to complete tasks without the need for human participation, pushing the limits of automation. The highlights of Release 3.0 include autonomous dynamic replanning, cloud integration, some creative camera tricks, and a new ability to handle push-bar doors.

Thanks to the latest update's customizable autonomy and repeatable data capture, Spot is now a data collection solution that companies can use to make inspection rounds safer and more efficient. Spot's automated inspections have been simplified to allow for more efficient data collection and processing. The robot can be programmed to perform a variety of jobs: gathering photographs, thermal images, point clouds, and other essential data; processing that data into valuable signals at the edge using computer vision models; and creating bespoke uploads to communicate those signals to existing systems for analysis and review.

Spot Release 3.0 additionally improves Spot's Autowalk feature by allowing for better planning. Autowalk, a mechanism that allows the robot to record and repeat paths, is one of Spot's built-in functions. Using the remote controller interface, an operator guides the robot along a path; the robot remembers the path and can repeat it when instructed. In industrial facilities, mines, factories, and building sites, Autowalk can be employed for inspection missions.

The recent update enhances Autowalk, reducing the need for human intervention and supervision. Robot operators can now edit Autowalk missions, adding tasks like collecting photographs, reading indicators, or running third-party code. Spot's planning abilities have also been improved: it can now find the optimal path to complete specific actions and respond to changes along its inspection routes, such as new obstructions. It can also be programmed to carry out scheduled checks without human intervention during off-peak hours.

Boston Dynamics has also improved Spot's data collection and processing capabilities. This includes the ability to take images from the same angle during Autowalk cycles and have them processed in the cloud using AI analysis (e.g., TensorFlow) and reviewed for potential critical signals, which can then be sent to existing inspection systems. Images can be captured from the same angle every time using scene-based camera alignment for the Spot CAM+ pan-tilt-zoom (PTZ) camera.
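As a rough sketch of the kind of cloud-side analysis described above (our own illustration, not Boston Dynamics' pipeline), the snippet below runs a batch of inspection images captured on an Autowalk mission through a hypothetical TensorFlow/Keras classifier and flags frames whose anomaly score crosses a threshold:

```python
import numpy as np
import tensorflow as tf

# Hypothetical, untrained "normal vs. anomalous" classifier over inspection photos.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

def flag_anomalies(images: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Return indices of inspection frames whose anomaly score exceeds the threshold."""
    scores = model.predict(images, verbose=0).ravel()
    return np.flatnonzero(scores > threshold)

# Stand-in batch of frames captured from the same scene-aligned PTZ viewpoint.
batch = np.random.randint(0, 256, size=(8, 224, 224, 3)).astype("float32")
print(flag_anomalies(batch))
```

In practice the flagged frames would feed into an existing inspection or work-order system for human review.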

Read More: AI Robot CIMON-2 to be deployed at International Space Station

Next, Spot has another interesting feature in version 3.0 that allows it to avoid colliding with its human coworkers: it can now make a freely customizable warning sound. It can also dynamically re-plan its routes if an unforeseen impediment appears in the middle of a tour.

Spot can even be assigned tasks tied to specific assets, like going to a machine every day and inspecting it with a thermal image sensor. Using scene recognition and image analysis, Spot can determine whether a particular object has gone missing and report that information.

Improved compatibility with Microsoft, Amazon, and IBM cloud services is another prominent feature of the latest update. The sensing capabilities of Spot can be used as a substitute for manual data logging, IoT instrumentation, and the installation of smart sensors on existing infrastructure. This functionality allows data collected during Spot’s Autowalk to be automatically integrated into a larger data-based workflow of enterprises.

While Spot is a well-known robot, new competitors have emerged in recent years, attempting to steal Spot's thunder. After an unfortunate paintball incident at a US art installation, an event Boston Dynamics condemned, the latest Spot 3.0 update may enable the canine robot to achieve new feats and rebuild trust.


Astera Labs raises $50 million in Series C Funding Round


Semiconductor company Astera Labs has raised $50 million in its Series C funding round, led by Fidelity Management and Research. Other investors, including Valor Equity Partners, Atreides Management, Avigdor Willenz Group, GlobalLink Capital, VentureTech Alliance, and Sutter Hill Ventures, also participated in the round.

Astera Labs wants to use the fresh funding to expand its product range and increase market share. The company also plans to run extensive recruitment drives in the United States and Asia to hire new talent to boost its operations and product development.

The new funding round follows the successful launch of Astera Labs's Compute Express Link (CXL) 2.0 and PCI Express 5.0 products, which enable users to optimize workloads in the cloud.

Read More: Sydney Trains are using Artificial Intelligence to detect Trespassers

The CEO of Astera Labs, Jitendra Mohan, said, “With this investment and increased collaboration with our manufacturing partners, we will rapidly scale our worldwide operations to satisfy incredible customer demand and launch multiple new product lines to solve the industry’s most pressing connectivity challenges.” 

He further added that he is extremely excited to join hands with the investors to maintain their leading position in the intelligent cloud connectivity domain. 

San Francisco-based startup Astera Labs was founded by Jitendra Mohan, Casey Morrison, and Gajendra Akkasalamakki in the year 2017. The company specializes in developing purpose-built cloud connectivity solutions for data-centric systems. 

Astera Labs has developed a range of products, including integrated circuits and services that enable CXL/PCIe connectivity. The startup has raised a total of $56.4 million over three funding rounds to date.

“We are leading the industry with design wins at the five most significant CPU/GPU/AI processor platforms in the world and the majority of Cloud customers,” said the Co-founder and Chief Business Officer of Astera Labs, Gajendra Akkasalamakki.

According to the founding investor of Astera Labs, Avigdor Willenz, the company might also plan for its initial public offering (IPO) in the near future.


Hanwha Techwin launches Artificial Intelligence-powered X Series Cameras


Surveillance solutions firm Hanwha Techwin has launched its all-new artificial intelligence-powered camera series, X-core AI and X-pulse AI. Hanwha has integrated the new cameras with a high-end artificial intelligence system that improves object detection capabilities in order to reduce false alarms.

The cameras are equipped with world-leading business intelligence technology and use AI to considerably improve image quality. Hanwha has used a unique H.265 compression technology that enables users to reduce data consumption by up to 80%, depending on environmental conditions.

The X series camera features license-free artificial intelligence analytics tools that optimize operational efficiency in real-time and recorded video searches. Senior Vice President of Products, Solutions, and Integrations at Hanwha Techwin, Ray Cooke, said, “Bringing AI to the Wisenet X series line represents a major leap forward in features and capabilities for our customers and resellers.” 

Read More: California makes zero-emission Autonomous Vehicles mandatory by 2030

He further added that with the launch of X series cameras, the technology would be available to a broader customer base as they continue to be industry leaders in the surveillance domain. Hanwha has integrated WiseIR technology in X series cameras that help them to adjust the output of IR LEDs depending upon cameras’ zoom magnification.

Hanwha said, “WiseIR technology adjusts the output of the IR LEDs according to the cameras’ zoom magnification.” The cameras have a modular design with magnetic mounts for camera modules, making them easier for users to install. They also have a video capturing ability of up to 120fps, which results in smooth video outputs. 

New Jersey-based surveillance, aeronautics, optoelectronics, automation, and weapon technology enterprise Hanwha Techwin, formerly known as Samsung Techwin, was founded in 1977. Since its establishment, the company has brought many cutting-edge innovations to the surveillance technology sector.


Google Introduces new world model Pathdreamer for Indoor Navigation

Source: Scientific American

When navigating a new environment, humans generally rely on visual, spatial, and semantic cues to reach their destination quickly. If you are invited to a friend's new house, for example, you can make sensible predictions about which items are likely to be in which rooms, or rely on visual cues to locate the living room.

Performing similar navigation around a given space can be quite challenging for robotic agents.

The most common approach is model-free reinforcement learning, in which the agent learns implicitly what these cues are and how to use them for navigation in an end-to-end manner. The problem is that navigation cues learned this way are costly to acquire, difficult to examine, and impossible to reuse in another agent without starting again.

A world model, which encapsulates rich and relevant information about an agent's surroundings and allows it to make explicit predictions about actionable events in its environment, is an intriguing alternative for robotic navigation and planning. Such models have sparked significant interest in robotics, simulation, and reinforcement learning, with outstanding results including the first known solution to a simulated 2D car-racing problem and human-level performance on Atari games. However, compared with the complexity and diversity of real-world environments, gaming worlds remain relatively simple.

Pathdreamer, a novel world model recently announced by Google AI, generates high-resolution 360° visual observations of parts of a building unseen by the agent, using only limited seed observations and a proposed navigation route. From a single point of view, Pathdreamer predicts what an agent would see if it moved to a new viewpoint, or even to a previously unseen area such as around a corner. This approach can also aid autonomous agents in navigating the real world by encoding information about human environments.

By increasing the amount of training data available, such world models can also be used to train agents inside the model itself.

Pathdreamer synthesizes high-resolution 360° observations up to 6-7 meters away from the original location, including around corners. Source: Google
To generate a new observation, Pathdreamer ‘moves’ through the point cloud to the new location and uses the re-projected point cloud image for guidance. Source: Google

Pathdreamer takes a sequence of previous observations as input and predicts observations for a trajectory of future locations, which the agent can supply either up front or iteratively as it interacts with the returned observations. Both the inputs and the predictions consist of RGB, semantic segmentation, and depth images. Internally, Pathdreamer uses a 3D point cloud to represent surfaces in the world; each point is labelled with its RGB colour value and its semantic segmentation class, such as wall, chair, or table.

To predict visual observations at a new location, the point cloud is first re-projected into 2D at that location to provide ‘guidance’ images. Pathdreamer then uses these guidance images to produce realistic, high-resolution RGB, semantic segmentation, and depth outputs. As the model ‘moves’, new observations (actual or predicted) are accumulated in the point cloud. One advantage of using a point cloud as memory is temporal consistency: revisited locations are rendered consistently with earlier observations.
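As a concrete illustration of the re-projection step (a minimal NumPy sketch under our own assumptions, not Google's implementation), each labelled 3D point can be splatted into a 2D equirectangular guidance image centered on the new viewpoint:

```python
import numpy as np

def reproject_to_equirect(points, labels, viewpoint, height=512, width=1024):
    """Splat labelled 3D points into an equirectangular semantic guidance image.

    points:    (N, 3) xyz coordinates of the accumulated point cloud
    labels:    (N,) integer semantic class per point (e.g. wall, chair, table)
    viewpoint: (3,) xyz of the new camera location
    """
    rel = points - viewpoint                        # coordinates relative to the new viewpoint
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    r = np.linalg.norm(rel, axis=1) + 1e-8
    lon = np.arctan2(x, z)                          # azimuth in [-pi, pi]
    lat = np.arcsin(np.clip(y / r, -1.0, 1.0))      # elevation in [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * width).astype(int) % width
    v = np.clip(((0.5 - lat / np.pi) * height).astype(int), 0, height - 1)

    guidance = np.zeros((height, width), dtype=np.int64)   # 0 = unknown / unobserved
    depth = np.full((height, width), np.inf)
    for ui, vi, ri, li in zip(u, v, r, labels):
        if ri < depth[vi, ui]:                      # keep the nearest point per pixel
            depth[vi, ui] = ri
            guidance[vi, ui] = li
    return guidance, depth

# Toy usage: three labelled points seen from a viewpoint one meter forward.
pts = np.array([[0.0, 0.0, 3.0], [1.0, 0.5, 2.0], [-2.0, -0.3, 4.0]])
cls = np.array([1, 7, 12])
seg, depth = reproject_to_equirect(pts, cls, viewpoint=np.array([0.0, 0.0, 1.0]))
print(seg.shape, np.unique(seg))
```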

Converting the guidance images into convincing, realistic outputs involves two distinct stages, both powered by convolutional neural networks. The structure generator (a stochastic encoder-decoder) produces segmentation and depth images in the first stage, and the image generator (an image-to-image translation GAN) converts them into RGB outputs in the second. The first stage constructs a plausible high-level semantic representation of the scene, which the second stage renders into a realistic colour image.
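Schematically (a simplified PyTorch sketch under our own assumptions, with made-up channel counts and tiny networks, not the published architecture), the two stages can be wired as a noise-conditioned encoder-decoder that predicts semantics and depth from the guidance images, followed by an image-to-image generator that renders RGB:

```python
import torch
import torch.nn as nn

NUM_CLASSES, NOISE_DIM = 41, 32   # assumed values for illustration

class StructureGenerator(nn.Module):
    """Stage 1: guidance semantics + depth + noise -> predicted semantics + depth."""
    def __init__(self):
        super().__init__()
        in_ch = NUM_CLASSES + 1 + NOISE_DIM
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, NUM_CLASSES + 1, 3, padding=1),  # seg logits + 1 depth channel
        )

    def forward(self, guidance_seg, guidance_depth, noise):
        noise_map = noise[:, :, None, None].expand(-1, -1, *guidance_depth.shape[-2:])
        out = self.net(torch.cat([guidance_seg, guidance_depth, noise_map], dim=1))
        return out[:, :NUM_CLASSES], out[:, NUM_CLASSES:]  # (seg logits, depth)

class ImageGenerator(nn.Module):
    """Stage 2: predicted semantics + depth -> RGB (a GAN generator in the real system)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_CLASSES + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, seg_logits, depth):
        return self.net(torch.cat([seg_logits.softmax(dim=1), depth], dim=1))

# Sampling different noise vectors yields different plausible scenes for one location.
seg_g, depth_g = torch.zeros(1, NUM_CLASSES, 64, 128), torch.zeros(1, 1, 64, 128)
stage1, stage2 = StructureGenerator(), ImageGenerator()
for _ in range(2):
    z = torch.randn(1, NOISE_DIM)
    seg, depth = stage1(seg_g, depth_g, z)
    print(stage2(seg, depth).shape)   # torch.Size([1, 3, 64, 128])
```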

The Google AI team used Matterport3D RGB panoramas at a resolution of 1024×512 pixels as training targets, and the Habitat simulator to produce ground-truth depth and semantic training inputs, assembling them into equirectangular panoramas for the image generator. Because of the limited number of panoramas available, the team augmented the data by randomly cropping and horizontally rolling the RGB panoramas. They also used Habitat to produce the depth and semantic images used to train the structure generator; here they augmented the data by perturbing the viewpoint coordinates with a random Gaussian noise vector, since this stage does not require aligned RGB images for training.

Pathdreamer operates in two stages: the first stage, the structure generator, creates segmentation and depth images, and the second stage, the image generator, renders these into RGB outputs. The structure generator is conditioned on a noise variable to enable the model to synthesize diverse scenes in areas of high uncertainty. Source: Google

In regions of high uncertainty, such as a location presumed to be around a corner or in an unexplored room, many plausible scenes are possible. Drawing on ideas from stochastic video generation, Pathdreamer's structure generator is conditioned on a noise variable, which represents the stochastic information about the next location that is not captured in the guidance images. By sampling multiple noise vectors, Pathdreamer can generate diverse scenes, allowing an agent to explore several possible outcomes for a given route. This diversity is reflected not only in the first-stage outputs (semantic segmentation and depth images) but also in the generated RGB images.

Read More: Google TensorFlow Similarity: What’s New about this Python Library?

Finally, the Google AI team tested whether Pathdreamer's predictions could help with a downstream visual navigation task. Using the R2R dataset, they focused on Vision-and-Language Navigation (VLN), in which reaching the navigation goal requires properly grounding natural language instructions in visual observations, making it a demanding test of prediction quality.

In the Room-to-Room (R2R) dataset, the agent plans ahead by simulating several alternative navigable paths through a given environment and ranking each against the navigation instructions to find the optimal path.

The researchers examined the following three situations as part of the experiment:

  1. The Ground-Truth setting, in which the agent plans by interacting with the real environment, i.e. by actually moving.
  2. The Baseline setting, in which the agent plans ahead but does not move. Instead, it interacts with a navigation graph, which records the building's navigable paths but provides no visual observations.
  3. The Pathdreamer setting, in which the agent interacts with the navigation graph and receives corresponding visual observations generated by Pathdreamer, allowing it to plan ahead without moving (see the sketch after this list).
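As a sketch of the planning loop in the Pathdreamer setting (purely illustrative, with hypothetical `rollout` and `score` helpers standing in for the world model and the instruction-grounding model), the agent can roll out predicted observations for each candidate path and rank them against the instruction:

```python
from typing import Callable, List, Sequence

def plan_with_world_model(
    candidate_paths: Sequence[List[int]],     # navigable paths from the navigation graph
    rollout: Callable[[List[int]], list],     # hypothetical: predicted observations along a path
    score: Callable[[str, list], float],      # hypothetical: instruction-observation compatibility
    instruction: str,
    lookahead: int = 3,                       # plan three steps ahead, as in the experiment
) -> List[int]:
    """Return the candidate path whose imagined observations best match the instruction."""
    best_path, best_score = None, float("-inf")
    for path in candidate_paths:
        predicted_obs = rollout(path[:lookahead])   # imagined RGB/segmentation/depth
        s = score(instruction, predicted_obs)
        if s > best_score:
            best_path, best_score = path, s
    return best_path

# Toy usage with dummy stand-ins for the world model and the scoring model.
paths = [[0, 1, 2, 3], [0, 4, 5, 6]]
print(plan_with_world_model(paths, rollout=lambda p: p, score=lambda i, o: len(o),
                            instruction="go left"))
```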

According to the Google AI researchers, when the agent plans three steps ahead in the Pathdreamer setting, it achieves a navigation success rate of 50.4 percent, much higher than the Baseline setting's 40.6 percent success rate without Pathdreamer. This indicates that Pathdreamer captures visual, spatial, and semantic information about real-world indoor environments in a meaningful and usable way. In the Ground-Truth setting, the agent's success rate is 59 percent.

Source: Google

The team, however, points out that in this case, the agent needs to invest substantial time and resources to physically explore a huge number of paths, which would be prohibitively expensive in a real-world scenario.

At present, the team envisions Pathdreamer being used for further embodied navigation tasks, such as Object-Nav, continuous VLN, and street-level navigation.

Read more here.
