
Uber AI Says Virtual Agents That Are Polite Can Increase Task Completion


Uber AI researchers have published an overview of a deep learning framework that addresses customer engagement with ‘polite and positive’ assistants. Task-oriented conversational agents such as Alexa, Siri, and Google Assistant fulfill tasks like booking cabs by conversing with users and retrieving information (current location, destination, and type of cab) from them.

As users, we engage more readily with such virtual agents when they generate appropriate interpersonal responses and build an emotional connection with us. Uber’s AI team therefore looked for ways to make assistants use appropriate social language (language interpreted differently in different social contexts). Analyzing conversations between drivers and human customer-service representatives, they examined how the representatives’ use of social language relates to drivers’ responsiveness and the completion of their first trip.

Interestingly, the researchers defined politeness as a strategy for avoiding awkwardness or embarrassment when the social distance between two parties is large. They trained an SVM classifier on a politeness-labeled corpus, using domain-independent lexical and syntactic features of politeness.

Similarly, positivity, defined as “the quality or state of being positive,” was evaluated with VADER, a rule-based sentiment analyzer. They found that these social language norms, politeness and positivity, when used by human agents, are associated with greater user responsiveness and higher task completion.
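
For readers who want a feel for how such social-language features can be scored, here is a minimal sketch. It assumes the vaderSentiment package for positivity and a generic scikit-learn text classifier standing in for the politeness SVM; the paper’s actual politeness features (domain-independent lexical and syntactic cues) are richer than the bag-of-words used here, and the tiny training set below is purely illustrative.

```python
# Sketch: scoring positivity with VADER and politeness with a stand-in SVM.
# Assumes `pip install vaderSentiment scikit-learn`.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Positivity: VADER returns neg/neu/pos/compound scores for an utterance.
analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores("Thanks so much, happy to help!"))

# Toy politeness classifier: bag-of-words features instead of the paper's
# lexical/syntactic politeness features; labels are illustrative only.
texts = ["Could you please share your location?", "Send location now."]
labels = [1, 0]  # 1 = polite, 0 = not polite
politeness_clf = make_pipeline(CountVectorizer(), LinearSVC())
politeness_clf.fit(texts, labels)
print(politeness_clf.predict(["Would you mind confirming the drop-off point?"]))
```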

The paper proposes, for the first time, an end-to-end deep learning framework for task-oriented dialogue generation that jointly understands input utterances and generates responses infused with social language, aimed at completing the task at hand. By taking into account the conversation context and the tasks completed so far, the model accommodates the way politeness and positivity change meaning over the course of a conversation, and generates language with both the desired content and the desired social language norms.

A seq2seq model, built upon an architecture inspired by Huber et al., was modified by introducing a layer with a social language understanding component. The politeness and positivity features are extracted from responses using the pre-trained classifiers — SVM and VADER.

The model was evaluated for content preservation and social language level using both human judgment and automatic linguistic measures, and was found to generate responses that let agents address users’ issues in a more socially appropriate way.


IISc Invites Applications For Its New Deep Learning Specialisation


Indian Institute of Science (IISc), currently ranked the top university in India, has announced a master’s-level deep learning specialisation in partnership with TalentSprint. Faculty members from IISc and TalentSprint will jointly teach and mentor participants.

The Centre for Continuing Education wing of IISc offers this 10-month executive program, structured for machine learning enthusiasts and industry practitioners alike. The aim is to train the workforce in deep learning and in developing applications in domains like text, video, speech, image, and more.

The program consists of live online faculty-led interactive sessions, curated capstone projects and hackathons, mentorship, case studies, and a campus visit. The syllabus currently comprises:

  1. Bridge Course (Programming and Mathematical Preliminaries): 12 hours
  2. Mathematical Foundations and Data Visualization: 44 hours
  3. Paradigms of Machine Learning: 16 hours
  4. Deep Learning and its Applications: 80 hours
  5. Deploying AI Systems: 8 hours

“Deep learning is increasingly being used to extract valuable insights from enormous amounts of data, build innovative products and improve customer experience, thereby enhancing revenue opportunities. This has led to a massive growth in the need for professionals with expertise in Deep Learning. This program will fulfill that need. Our team of research faculty will teach and mentor participants and help them build expertise in both the fundamentals and applications of Deep Learning,” said Prof Chiranjib Bhattacharya, chair of the department of computer science and automation, and dean of the advanced deep learning program at IISc.

The program also offers hands-on projects like brain tumor detection, fraud detection, expression identification, and more. By the end of the program, participants will possess a portfolio that demonstrates their mastery and will have connected with experts at the forefront of deep learning practice. For entrepreneurs, the program has provisions to boost startup ideas with professional mentorship.

Enrolment has begun for the first batch (of 50 or more participants), and classes will start in March 2021. The program expects some coding proficiency, and sets a bachelor’s degree plus one year of work experience as the minimum eligibility requirement.


Facebook Releases Code Of Its State-Of-The-Art Voice Separation Model


Facebook researchers have open-sourced the code of their work “Voice Separation with an Unknown Number of Multiple Speakers.” Suppose there is only one microphone and multiple people are talking simultaneously. Can the voices be separated? For a human, it is easy. But how does a machine do it?

The single-microphone, multiple-speaker voice-separation paper answers the question. It extends state-of-the-art voice separation to five speakers, where previous work was largely limited to two. In the past, this task was mostly addressed with Independent Component Analysis; with the recent advances in deep learning, it is now possible to separate mixed audio containing multiple unseen speakers.

The main contributions, as listed by the authors, are:

  1. a novel audio separation model that employs a specific RNN architecture,
  2. a set of losses for effective training of voice separation networks,
  3. a method for effective model selection in the context of voice separation with an unknown number of speakers, and
  4. results that show a sizable improvement over the current state of the art in an active and competitive domain.

Also Read: Computer Vision Has A New DeIT By Facebook

Previous methods were typically trained to predict a mask for each voice, but this paper introduces a novel mask-free approach. Voice separation inherently involves two subtasks: first, improving signal quality while screening out noise, and second, identifying the speaker so that each output channel stays consistent over the voice sequence.

The authors used an utterance-level permutation-invariant training (uPIT) loss for the first subtask and, for the second, a mean squared error (L2 distance) between the network embeddings of each predicted audio channel and its corresponding source.
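
As a rough illustration of utterance-level permutation-invariant training, the sketch below computes a loss over every permutation of the predicted channels and keeps the best one per utterance. MSE is used as the per-channel criterion purely for simplicity; the paper pairs the permutation search with its own separation and embedding losses.

```python
import itertools
import torch

def upit_loss(preds, targets):
    """Utterance-level permutation-invariant loss (sketch).
    preds, targets: tensors of shape (batch, num_speakers, num_samples)."""
    batch, num_speakers, _ = preds.shape
    best = None
    for perm in itertools.permutations(range(num_speakers)):
        # Per-utterance MSE under this channel-to-source assignment.
        mse = ((preds[:, list(perm)] - targets) ** 2).mean(dim=(1, 2))
        best = mse if best is None else torch.minimum(best, mse)
    return best.mean()

# Example: 2 utterances, 3 speakers, 16,000 samples each.
loss = upit_loss(torch.randn(2, 3, 16000), torch.randn(2, 3, 16000))
```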

To avoid biases arising from the distribution of data and to promote solutions in which the separation models are not detached from the selection process, model selection was based on an activity detection algorithm.

Starting from the model trained for the largest number of speakers, C, a speech detector is applied to each output channel. If silence (no activity) is detected in any channel, the procedure moves to the model with C − 1 output channels and repeats until all output channels contain speech.
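
A minimal sketch of that selection loop is given below; `models` is assumed to map each candidate speaker count to a trained separation network, and `is_silent` is a hypothetical voice-activity detector, since the paper’s exact activity-detection algorithm is not spelled out here.

```python
def select_separation(mixture, models, is_silent):
    """Sketch of the speaker-count selection procedure described above.
    models: dict mapping number of output channels C to a separation model;
    is_silent: hypothetical detector returning True for a silent channel."""
    for num_speakers in sorted(models, reverse=True):
        channels = models[num_speakers](mixture)
        if not any(is_silent(ch) for ch in channels):
            return channels  # every channel contains speech
    return channels  # fall back to the smallest model's output
```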


Intel’s RealSense ID Now Offers Facial Recognition With Higher Accuracy


Intel is now offering its RealSense technology to customers for facial recognition under the banner of RealSense ID. Complemented by LiDAR and infrared sensors, the RealSense 3D cameras have become a game-changer in the industry, particularly in the wake of Amazon’s Rekognition fiasco. Customers will have access to faster facial recognition without accuracy problems tied to skin tone, lighting conditions, changes in facial appearance, or the user’s height.

Intel claims that RealSense ID has an unprecedented true-acceptance rate (recognizing you as you) of 99.7%, with a one-in-1,000,000 false-acceptance rate. The spoofing rate (accepting a recorded photo of you as you) stands at less than 1%. The reported timing per recognition is 1.5 s to sense a presence plus 0.8 s for facial authentication, so there is little need to queue for verification.

To address privacy concerns, captured images are stored on the device and data is encrypted at all levels with AES-256. The on-device neural network that underpins the facial recognition converts each face into an ID, and all further communication uses that designated ID without revealing any visual information.

Also Read: Intel India Launches An AI Research Center, INAI

Intel is currently offering two builds, the Intel RealSense ID Solution F455 and F450. The former is a ready-to-deploy unit, while the latter provides a custom solution for specialised use-cases. The company expects the technology to be used for verification at security checkpoints, ATMs, smart locks, kiosks, and points of sale. The published figures sound reassuring, but at the scale of use-cases such as airports and ATM networks, which run into millions of verifications, accuracy still becomes a challenge: even a one-in-1,000,000 false-acceptance rate can raise security concerns.
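
A quick back-of-the-envelope calculation makes the point; the daily verification volume below is purely an illustrative assumption, not an Intel figure.

```python
# Expected daily false acceptances at a claimed 1-in-1,000,000 rate.
false_acceptance_rate = 1 / 1_000_000
daily_verifications = 5_000_000   # hypothetical volume for a large deployment
print(false_acceptance_rate * daily_verifications)  # -> 5.0 false accepts per day
```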

The codebase behind the technology was open-sourced long ago, which helps rule out corporate or government backdoors. However, neural systems are prone to adversarial inputs, so the adversarial security of these networks leaves additional room for failure.

Integration of Intel’s RealSense with Windows Hello remains an issue, so the devices cannot yet be used with laptops or desktops for authentication. Even so, Intel is clearly trying hard to revive a RealSense line that had been lying largely dormant until now.


OpenAI’s DALL·E Can Create Images From Text


OpenAI has created a multimodal generative neural model, DALL·E, that can create images from text prompts given as input. The network is a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions using a dataset of text-image pairs.

Building on their experience with GPT-3 and ImageGPT, the researchers show that manipulating visual concepts through language is now within reach: language can instruct a large neural network both to perform various text generation tasks and to generate high-fidelity images.

The samples released for each caption are the top 32 of 512 images after reranking with CLIP. This procedure can be seen as a kind of language-guided search and dramatically impacts sample quality.
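
Conceptually, the reranking step looks something like the sketch below, where `clip_score` is a hypothetical helper wrapping a CLIP model’s image-text similarity; OpenAI’s actual pipeline may differ in detail.

```python
def rerank_with_clip(prompt, candidate_images, clip_score, top_k=32):
    """Language-guided search (sketch): keep the top_k generated images whose
    CLIP similarity to the text prompt is highest."""
    ranked = sorted(candidate_images,
                    key=lambda image: clip_score(prompt, image),
                    reverse=True)
    return ranked[:top_k]
```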

The demos released by the researchers showcase images of imaginary objects with attributes both modified and preserved. The model grasps the three-dimensional structure of items, rendering their internal and external structure, and can fill in contextual details on its own. Beyond that, it also exhibits zero-shot visual reasoning, geographic knowledge, and temporal knowledge.

The architecture is a simple decoder-only transformer. In essence, it is a language model that receives both the text and the image as a single stream of data containing up to 1280 tokens, and it is trained with maximum likelihood to generate all of the tokens.

An attention mask at each of its 64 self-attention layers allows every image token to attend to all text tokens. DALL·E uses the standard causal mask for the text tokens, and sparse attention for the image tokens with a row, column, or convolutional attention pattern, depending on the layer. The researchers will soon publish a paper detailing other elements of DALL·E.

There are always some trade-offs involved. The researchers point out a few caveats:

  1. The phrasing of the prompt determines the success rate
  2. DALL·E confuses the associations between objects and their colors when more objects are present
  3. DALL·E can draw multiple copies of an object but cannot reliably count past three

You can try creating images from text here.


CLIP From OpenAI Recognizes Images From Their Captions


OpenAI’s Contrastive Language–Image Pre-training (CLIP) learns image representations from the natural language that accompanies images. The intuition is to learn to recognize a wide variety of visual concepts in images and associate them with their names. As a result, no task-specific fine-tuning is required for many downstream tasks, while the model remains competitive with the current state of the art.

The researchers from OpenAI did not use curated, labeled training data for CLIP. Instead, they collected 400 million images and their captions from the internet, data that is highly varied and highly noisy. This is a complete departure from the prevalent practice of training computer vision models on standard labeled datasets so that they specialize in a single task.

The researchers used a simplified version of the ConVIRT architecture. To make CLIP efficient, they adopted a contrastive objective for connecting text with images: given an image, the training objective is to predict which of 32,768 randomly sampled text snippets is the one actually paired with it. After pre-training, natural language is used to reference learned visual concepts, enabling the model’s zero-shot transfer to downstream tasks.
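
The contrastive objective can be sketched as follows: within a batch, each image has to pick out its own caption among all the others, and vice versa. This is a minimal PyTorch rendering of the idea; the exact temperature, batch size, and encoder details follow the paper.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss (sketch): image_emb and text_emb are
    (N, D) batches of embeddings where row i of each is a matched pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)       # image -> correct caption
    loss_t2i = F.cross_entropy(logits.t(), targets)   # caption -> correct image
    return (loss_i2t + loss_t2i) / 2
```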

Also Read: OpenAI Releases Robogym, A Framework To Train Robots In Simulated Environments

The pre-training translates into competitive performance on over 30 existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. This result suggests that zero-shot evaluation of task-agnostic models is much more representative of a model’s capability.
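
Zero-shot transfer then amounts to building a classifier out of nothing but class names phrased as natural language, roughly as in the sketch below. Here `encode_image` and `encode_text` are hypothetical wrappers around CLIP’s image and text encoders, and the prompt template is one common choice rather than a fixed requirement.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image, class_names, encode_image, encode_text):
    """Sketch of CLIP-style zero-shot classification from class names alone."""
    prompts = [f"a photo of a {name}" for name in class_names]
    text_embs = F.normalize(torch.stack([encode_text(p) for p in prompts]), dim=-1)
    image_emb = F.normalize(encode_image(image), dim=-1)
    similarities = text_embs @ image_emb          # one score per class
    return class_names[similarities.argmax().item()]
```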

The most crucial part is the robustness of the network against adversarial data. Since the model is not directly optimized for the benchmarks, it learns much richer representations, which make it more robust.

Versatile as the model is, the researchers report the following limitations:

  1. It struggles with more abstract or systematic tasks, such as counting the number of objects in an image, and with more complex tasks, such as predicting spatial distances
  2. Zero-shot CLIP struggles with very fine-grained classification, such as telling apart car models, variants of aircraft, or flower species
  3. CLIP generalizes poorly to images unlike those in its pre-training dataset
  4. CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial-and-error “prompt engineering” to perform well

To learn more about CLIP, have a look at the paper and the released code.


Computer Vision Has A New DeIT By Facebook


Facebook AI has introduced DeiT (Data-efficient image Transformers), a Transformer-based approach to training computer vision models. Over the years, the Transformer has led to several breakthroughs in NLP, but its use for image processing is a more recent development. The idea is to move away from popular image-processing architectures such as convolutional neural networks, as the new approach delivers exceptional results while reducing the need for data and computation.

On 3 December 2020, Google also presented a Transformer-based image-processing technique, the Vision Transformer (ViT), which achieved state-of-the-art image-classification results with superior accuracy on the ImageNet dataset. It relied on an external dataset of some 300 million training images that has not been publicly released. Yet within a mere 20 days, researchers from Facebook published DeiT, which was trained on a single 8-GPU node in two to three days (53 hours of pre-training and, optionally, 20 hours of fine-tuning) with no external data.

Researchers from FAIR built upon Google’s Vision Transformer (ViT) architecture, using patch embeddings as input, but introduced a new transformer-specific knowledge distillation procedure based on a distillation token, which brought the training-data requirement down significantly compared with ViT.
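
The distillation idea can be sketched as follows: alongside the class token, the student transformer carries a distillation token whose output is trained to match a teacher’s (hard) predictions. This is a simplified rendering of DeiT-style hard-label distillation; the paper also reports a soft, KL-based variant and other weightings.

```python
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, labels, teacher_logits):
    """DeiT-style hard distillation (sketch): the class-token head learns from
    ground-truth labels, the distillation-token head from the teacher's argmax."""
    teacher_labels = teacher_logits.argmax(dim=-1)
    return 0.5 * F.cross_entropy(cls_logits, labels) + \
           0.5 * F.cross_entropy(dist_logits, teacher_labels)
```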

Also Read: Top Image Processing Libraries In Python

It seems that Google Brain and FAIR researchers are trying to one-up each other. The details are below.

DeiT achieved results competitive with the state of the art on ImageNet. When the pre-trained model was fine-tuned for fine-grained classification on several popular public benchmarks such as CIFAR-10, CIFAR-100, Flowers, Stanford Cars, and iNaturalist-18/19, it secured second place in classification accuracy on iNaturalist-18/19, with competitive scores on the rest.

Currently, the FAIR team has released three model variants (DeiT-tiny, DeiT-small, and DeiT-base) with varying numbers of parameters.


The reported tricks behind this feat are knowledge distillation, multi-head self-attention (MSA) layers with 3, 6, or 12 heads, and standard image augmentation techniques such as AutoAugment and RandAugment. The team also used the AdamW optimizer and regularization techniques such as Mixup and CutMix to improve performance.
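
Of the regularizers listed above, Mixup is easy to illustrate: pairs of images and their labels are blended with a Beta-distributed weight. The sketch below assumes one-hot label tensors and is a generic rendering of the technique rather than DeiT’s exact training code.

```python
import torch

def mixup(images, one_hot_labels, alpha=0.8):
    """Mixup (sketch): convexly combine each example with a randomly chosen
    partner from the same batch, using a Beta(alpha, alpha) mixing weight."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * one_hot_labels + (1 - lam) * one_hot_labels[perm]
    return mixed_images, mixed_labels
```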

Check out the paper here and the released code here.


OpenMined, In Collaboration With PyTorch, Introduces A Free Course From The Private AI Series


OpenMined has released a course to train the next generation of machine learning enthusiasts and practitioners to process sensitive data without breaching privacy. OpenMined is well known as a community focused on developing tools and frameworks for AI that works with data that cannot be pooled centrally for privacy reasons. The course is part of its collaboration with PyTorch to offer four free courses (The Private AI Series) on machine learning with privacy-preserving techniques.

Currently, four courses are planned: Our Privacy Opportunity, Foundations of Private Computation, Federated Learning Across Enterprises, and Federated Learning on Mobile.

At present, OpenMined has released the first course, “Our Privacy Opportunity.” The course is offered free of cost and comes with a completion certificate. The best part is that you will work on real-world projects while being mentored by world-class researchers, including Andrew Trask (PhD researcher at the University of Oxford), Cynthia Dwork (Harvard, an author of differential privacy), Ilya Mironov (FAIR, author of Rényi differential privacy), and more.

The course covers current privacy infrastructure and its limitations, and builds the foundation for the upcoming courses on federated learning. It walks you through the privacy-transparency tradeoff and teaches the principles of privacy. The first course requires an investment of only a little over seven hours, and by the end of it you will be able to draft privacy product specifications on your own.

The course is structured for beginners and hence assumes no prerequisites. It begins by defining information flows, then highlights how current information systems fail in terms of privacy and transparency. After exposing the gaps in current information-flow designs, the course builds up the idea of structured transparency and its impact.

Register for the first course of The Private AI Series here.


PURE EV Develops Next-Gen AI System To Automatically Resolve Defects In Lithium-Ion Batteries Of Electric Vehicles


IIT Hyderabad-incubated startup PURE EV has developed artificial intelligence-driven hardware that automates the identification and repair of defects in the lithium-ion batteries of electric vehicles. The technology does away with the need for customers to visit service centers to address battery defects.

PURE EV researchers have designed artificial neural network (ANN)-based algorithms for the system, called ‘BaTRics Faraday,’ which identify defects in the battery’s various cell series and auto-heal them to the best of the cells’ electrochemical potential. The process is fully automated by the hardware, and no manual intervention is required up to complete capacity restoration.

Extensive field testing of ‘BaTRics Faraday’ has already been completed. The system can be used with all five two-wheeler models launched by PURE EV (Epluto 7g, Etrance Neo, Etrance, Egnite & Etron+) and will be rolled out in the first quarter of 2021.

Elaborating on the system, Dr. Nishanth Dongari, Founder, PURE EV, and Associate Professor, Department of Mechanical and Aerospace Engineering, IIT Hyderabad, said, “Lithium batteries are the most critical component of electric vehicles. They contain multiple lithium cells welded together in series and parallel arrangements to meet the desired voltage and ampere-hour (Ah) capacity. If a defect arises in any of the cell series, it leads to significant downtime for the EV owners. Additionally, the usage behavior pattern and Indian environmental conditions put more load on the batteries. It is, indeed, a very difficult task for a battery OEM to get the defects rectified through diagnosis and replacement of the defective series of cells.”

Further, Dr. Nishanth Dongari said, “Hence the need of the hour is to develop an external intelligent hardware device which resolves battery defects through an external healing process. PURE EV has developed AI-driven hardware which carries out the diagnosis and resolution of defects through a device connected externally to the battery. This saves precious man-hours otherwise spent in replacing series of cells and significantly reduces the battery ‘Turn Around Time’ (TAT).”

Lithium battery repair is currently a challenging task in the market, as supply chains are not well established and reputed OEMs (original equipment manufacturers) are not yet operational in this emerging segment. Electric vehicle customers are reliant on EV OEMs, who in turn are reliant on battery OEMs.

In such a scenario, any innovation that assures prospective customers and stakeholders of a lower turnaround time will significantly boost their confidence in this emerging technology. For electric vehicles to become mainstream, it is imperative that innovative solutions like ‘BaTRics Faraday’ come to market and ensure that the battery does not turn into an idle asset for EV owners.

Highlighting the need for this technology, Mr. Rohit Vadera, Chief Executive Officer, PURE EV, said, “This intelligent device enhances the capability of PURE EV to turn around battery defects within a shorter time period. With the significant takeoff happening for EVs, PURE EV is building the necessary infrastructure and technical capabilities to become a pioneer in battery after-sales service for its esteemed customers.”

Mr. Rohit Vadera added, “PURE EV will be able to provide service in much lesser TAT and with the establishment of company-owned workshops pan India we intend to emerge as a reputed benchmark in the battery after-sales service standards.”

Currently, PURE EV has this device operational at its factory. In the future, the company plans to establish company-owned, high-end ‘Battery Diagnostics and Repair’ workshops at major demand centers across India. Such workshops will ensure a significantly lower TAT for battery repair across geographies, and the company deems this one of the critical steps in its vision of becoming an established pan-India name in the EV space.

PURE EV has an in-house battery manufacturing facility and a research setup on the IIT Hyderabad campus, where the company’s dedicated R&D team works on core areas of battery thermal management for the development of long-range, high-performance lithium batteries.


Google Cloud Is Offering Free Training On AI, Big Data, & More


Google Cloud is offering free training on in-demand skills such as AI, analytics, Kubernetes, and more through its Qwiklabs platform. On 9 December, Google Cloud extended its no-cost training offer, allowing users to register by 31 December 2020 to avail of it.

Google Cloud has been helping people navigate the difficult times brought on by the pandemic by making it easy to start learning these technologies. Since the lockdown, cloud providers have witnessed a steep rise in the adoption of cloud computing.

Going by recent trends, in 2021 organizations will require professionals who are not only proficient in the latest technologies, such as machine learning and analytics, but are also familiar with leveraging them on the cloud to build products. For developers, cloud computing has moved from a nice-to-have to a must-have skill as remote working has become the new normal.

To enable learners to master the skills of the future, Google Cloud is offering free access to Qwiklabs for 30 days. You have to copy the offer code and then click on enroll to begin. Note, however, that the offer requires completing a 30-minute tour of Qwiklabs and Google Cloud to unlock the 30-day training; if you only sign up on Qwiklabs and do not finish the tour, you will not get access to the free training.

Every lesson comes with a lab that gives you free access to the Google Cloud Platform. Although sessions are ended by timers to avoid endless computing, you can start a session again to continue learning and complete the courses. You will also earn badges on completing the courses to showcase your knowledge.

Learn from a wide range of courses, including Big Data, Machine Learning, Infrastructure & DevOps, and Website & App Development, and upskill to stay relevant in 2021 and beyond.

Register for free training from Google Cloud here.
