
IIT Kanpur Offers Free 8-Week Computational Science Course, Enrollment Ends 15th Feb

IIT computational science course
Source - IIT Kanpur Gallery

IIT Kanpur has opened enrollment for an eight-week online course on computational science on the SWAYAM platform. Dr. Ashoke De, a Professor in the Aerospace Engineering department and an Alexander von Humboldt Fellow with over 50 publications to his name, will teach the course.

The course is ideal for undergraduate and postgraduate students of Aerospace, Mechanical, Chemical, and Civil Engineering; however, basic knowledge of mathematics and programming is a prerequisite.

After taking this course, learners will be able to leverage computation to solve problems common to both pure and applied sciences, and to develop new methodologies and tools for carrying out numerical simulations, a significant part of the scientific computing paradigm. The course offers a basic overview of all these aspects that is easy to digest for beginners and faculty alike.

Also Read: AWS Will Host Free Virtual Classes On ML, Blockchain, Big Data, And More

When it comes to modeling natural phenomena, knowledge of Linear Algebra, ODEs, and PDEs is a must. From the course layout, it is evident that the focus is on linear algebra for the first two weeks and Ordinary Differential Equations (ODEs) for the next two, with Partial Differential Equations (PDEs) addressed in the fifth week.

In addition, the course sharpens skills like mathematical modeling and numerical analysis: the sixth-week module covers numerical analysis, and the remaining weeks focus on implementing eigenvalue problems and ODE solutions. The course also provides necessary insights into efficient algorithms, computer architecture, software design, implementation, validation, and results visualization.
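To give a flavor of the numerical methods such a course covers (this sketch is an illustration of the general technique, not material from the course itself), here is a minimal forward-Euler integrator for an ordinary differential equation:

```python
import math

def euler(f, y0, t0, t1, n):
    """Integrate dy/dt = f(t, y) from t0 to t1 with n forward-Euler steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)  # step along the local slope
        t += h
    return y

# dy/dt = -y with y(0) = 1 has the exact solution y(t) = exp(-t).
approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
print(abs(approx - math.exp(-1.0)) < 1e-3)  # True: close to exp(-1)
```

Shrinking the step size `h` reduces the error, which is exactly the accuracy/cost trade-off numerical analysis studies.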


This computational science course is offered for free, so you can finish the whole course without paying a penny to IIT Kanpur. Optional certification requires only a 1000-rupee fee for the exam, which will be conducted on 24th April. You can join the course here before the enrollment deadline of 15th February.


Dealing With Racially-Biased Hate-Speech Detection Models

biased hate speech models

Hate-speech detection models are a glaring example of biased models, as shown by researchers from the Allen Institute for Artificial Intelligence in their linguistic study. A recent post highlighted the effects of statistical bias in machine translations; this post looks at how dataset bias affects models. The researchers studied the behavior of hate-speech detectors using lexical markers (swear words, slurs, identity mentions) and dialectal markers (specifically African-American English). They also proposed an automated dialect-aware data correction method, which uses synthetic labels to reduce dialectal associations with toxicity scores.

The dataset creation process always captures biases inherent to humans. This dataset bias consists of spurious correlations between surface patterns and annotated toxicity labels, which give rise to two types of bias: lexical and dialectal. Lexical bias associates toxicity with identity mentions and with certain words considered profane, while dialectal bias correlates toxicity with the dialects of minorities. All these biases proliferate freely during the training phase of hate-speech models.

Researchers have proposed numerous debiasing techniques in the past, some applied by internet giants like Google, Facebook, and Twitter in their systems. In this study, the researchers found that these techniques are not good enough: the so-called "debiased" models still disproportionately flag text in particular dialects as toxic. The researchers noted, "mitigating dialectal bias through current debiasing methods does not mitigate a model's propensity to label tweets by black authors as more toxic than by white authors."

The Allen researchers proposed a proof-of-concept solution to ward off the problem. The idea is to translate the flagged hate speech into the majority's dialect, which the classifier deems non-toxic. This accounts for the speech's dialectal context, giving the model common ground to predict toxicity scores reasonably and making it less prone to dialectal and racial biases.
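The "translate, then re-score" idea can be sketched in a few lines. Everything below is a toy stand-in: the tiny dialect lexicon and the keyword "classifier" are invented for illustration and bear no relation to the researchers' actual models.

```python
# Toy sketch of dialect-aware re-scoring. AAE_TO_GAE and PROFANITY are
# made-up stand-ins, not the researchers' lexicons or classifier.
AAE_TO_GAE = {"finna": "about to", "ain't": "is not"}
PROFANITY = {"damn"}

def normalize_dialect(text):
    """Map dialect-specific tokens to a general-American-English form."""
    return " ".join(AAE_TO_GAE.get(tok, tok) for tok in text.lower().split())

def toxicity_score(text):
    """Naive lexical scorer: fraction of tokens on the profanity list."""
    toks = text.lower().split()
    return sum(t in PROFANITY for t in toks) / len(toks)

def dialect_aware_score(text):
    """Score the dialect-normalized text instead of the raw text."""
    return toxicity_score(normalize_dialect(text))

print(dialect_aware_score("i'm finna go home"))  # 0.0
```

A purely lexical scorer would never be deployed, but the pipeline shape (normalize dialect first, then score) is the point of the proof of concept.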


AutoML Made Easy With Symbolic Programming Using PyGlove

Symbolic programing AutoML Pyglove

Google AI researchers have released PyGlove, a library providing a symbolic implementation of Automated Machine Learning (AutoML) that lets developers experiment with the search spaces, search algorithms, and search flows of an AutoML system in only a few lines of code. Developers can now make Python classes and functions mutable through brief annotations, making it much easier to write AutoML programs.

Previously, developers had data and outputs; they fed them into a machine learning algorithm, which automated the learning of the rules mapping input to output. Researchers later automated the selection and hyper-parameter tuning of those machine learning algorithms as well. Neural networks, one sub-class of machine learning algorithms, are highly sensitive to architecture and hyper-parameters.

The possible combinations of architecture and hyper-parameter choices become enormous as researchers aim to build larger and larger neural models, and they can spend months hand-crafting neural network architectures and selecting the right hyper-parameters. AutoML automates these aspects by formulating the problem as a search problem.

Also Read: What Is Liquid Machine Learning?

A search space is defined to represent all possible choices, and a search algorithm is used to find the best ones. Neural Architecture Search (NAS) algorithms like ENAS and DARTS come under the purview of AutoML. But current implementations of NAS algorithms do not offer modularity between components like the search space and the search algorithm, so researchers have struggled to modify the search space, search algorithm, or search flow independently.

The Google researchers built AutoML on symbolic programming, a paradigm in which programs can mutate themselves by manipulating their own components, which keeps those components decoupled. This decoupling makes it easy for practitioners to change the search space and search algorithm (with or without weight sharing), add search capabilities to existing code, and implement complex search flows.
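The decoupling can be illustrated in plain Python (this is a conceptual sketch of a search space separated from a search algorithm, not PyGlove's actual API):

```python
import random

# A search space is just data: each hyper-parameter declares its choices.
def one_of(choices):
    return {"kind": "one_of", "choices": choices}

search_space = {
    "layers": one_of([2, 4, 8]),
    "activation": one_of(["relu", "tanh"]),
}

def random_search(space, evaluate, trials=100, seed=0):
    """A search algorithm that knows nothing about the space's meaning."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = {k: rng.choice(v["choices"]) for k, v in space.items()}
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Dummy objective: prefer deeper networks with relu.
best = random_search(
    search_space,
    lambda c: c["layers"] + (1 if c["activation"] == "relu" else 0),
)
print(best)
```

Because the space and the algorithm only meet through the `evaluate` callback, either can be swapped without touching the other, which is the modularity the article describes.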

On ImageNet and NAS-Bench-101 benchmarks, they showed that PyGlove can convert a static program into a search space, iterate quickly on search spaces and search algorithms, and craft complex search flows to achieve better results. PyGlove also allows easy plug-and-play of AutoML techniques in existing ML pipelines while also benefiting open-ended research.


Microsoft Introduces Viva To Help People Work From Home

Microsoft VIVA employee platform

Microsoft has unveiled a new employee experience platform, Viva, that will act as an integrated platform to manage employee well-being, learning, engagement, and knowledge discovery in the workflow. With close integration with Teams and Office 365 technologies, Microsoft wants to be the market leader in employee engagement.

Microsoft is betting on remote work being the culture of the future and is targeting organizations and employees of all kinds. During the pandemic, almost all companies have struggled with a patchwork of platforms for onboarding and training employees. The new platform promises to smooth the journey for employees and companies alike.

Also Read: AWS Will Host Free Virtual Classes On ML, Blockchain, Big Data, And More

Currently, the platform has four modules, Viva Connections, Viva Insights, Viva Learning, and Viva Topics, each representing a different aspect of the employee workflow inside or outside the company. Viva Connections provides a personalized gateway for every employee to access internal communications and company resources. It also helps employees participate in communities like employee resource groups, all from a single customizable app in Microsoft Teams. Viva Insights helps executives identify where teams struggle, especially in balancing productivity and well-being.

Viva Learning gathers the learning resources available to the company, like courses and guided projects from edX, Coursera, and many more, into one platform, and helps employees manage all their training and micro-courses along with their accomplishments. Viva Topics enables AI-driven knowledge discovery from various third-party sources, across documents in Microsoft 365 and conversations in Teams.

Microsoft has partnered with Accenture, Avanade, PwC, and EY to help other companies adopt the homegrown employee experience environment, providing consulting and advisory services. Microsoft CEO Satya Nadella put the benefits of Viva into a much-needed vision statement for the new initiative. He said, "We have participated in the largest at-scale remote work experiment the world has seen, and it has had a dramatic impact on the employee experience. Every organization will require a unified employee experience from onboarding and collaboration to continuous learning and growth. Viva brings together everything an employee needs to be successful, from day one, in a single, integrated experience directly in Teams."


AWS Will Host Free Virtual Classes On ML, Blockchain, Big Data, And More

AWS virtual classes

AWS will offer free virtual classes for learners who want to gain in-demand skills in machine learning, blockchain, big data, containers, and more. The AWS virtual classes are an ideal way to get started with the latest technologies on AWS.

The webinar-based online classes run about 90 minutes and are mostly aimed at beginners and professionals who want to explore new technologies.

Learners from across the world can register for sessions at their convenience, as the lessons are delivered across time zones. AWS keeps hosting training through virtual classes to ensure the world has a workforce that can work with cutting-edge technologies. Last month, AWS conducted the free AWS AI Conclave, delivering 20+ breakthrough sessions from industry experts.

In February 2021, AWS included topics like Blockchain, Containers, Kubernetes, Machine Learning, and Big Data, among others. With these webinars, aspirants can discover new interests and get to know the fundamentals of these technologies.

As organizations now expect fundamental knowledge of product development from data scientists, an understanding of containers can set candidates apart. In addition to learning new technologies, familiarity with the AWS platform can also help beginners streamline their workflow when they start working at organizations.

Every month, AWS delivers webinars on the latest technologies, and in the coming months, AWS will also focus on Cloud Security, Data Analytics, and more.

To know more about the upcoming AWS Virtual Classes click here.


Measuring Weirdness In AI-Based Language-Translations

machine translation linguistic analysis

AI-based language translations were long an object of ridicule whenever they coughed up something funny. Consequently, AI researchers focused on translation accuracy and fluency to set aside the embarrassment of faulty translations. The situation gradually improved, especially with better and larger language models that surpassed humans on various benchmarks.

But these language models still amplify the statistical biases found in their training data, and the biases affect not only the translations but also their linguistic richness. Researchers from the University of Maryland and Tilburg University have studied this effect quantitatively through grammatical and linguistic analysis of machine translations.

A translated work differs from the original due to intentional factors like explicitation and normalization, and unintentional ones like the unconscious effects of the source language on the target language produced. Linguists study these elements, the translator's unique additions, under the label Translationese; the analogous elements introduced by a machine translator are studied as Machine Translationese.

Also Read: Language Models Exhibits Larger Social Bias Than Human-Written Texts

In the study, the researchers performed linguistic analysis of sequential neural models like LSTMs and Transformers, as well as phrase-based statistical translation models, to highlight the above factors. These models were tasked with translating between English, French, and Spanish. The researchers found that the statistical distribution of terms in the training data dictates the loss of morphological variety in machine translations.

The translation systems do not distinguish between synonymous and grammatical variants, which directly reduces the number of grammatically correct but diverse options. In layman's terms, the diversity of words and sentence structures was drastically lower in the translations because of consistency and simplification.

The authors also investigated the social-linguistic impacts of this loss, because machine translations affect language usage among the masses. No solution has been proposed to the problem yet. The authors believe that metrics such as language acquisition measures for lexical sophistication, and Shannon entropy and Simpson diversity for morphological diversity, could guide further investigation.
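The two diversity metrics named above are easy to compute over token counts. This minimal sketch (the example sentences are invented) shows how a more repetitive "translation" scores lower on both:

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token frequency distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def simpson_diversity(tokens):
    """1 - Simpson index: probability two random tokens differ."""
    counts = Counter(tokens)
    n = len(tokens)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

original = "the cat sat on the mat while the dog ran".split()
translated = "the cat sat on the mat the the dog the".split()  # more repetitive

print(shannon_entropy(original) > shannon_entropy(translated))      # True
print(simpson_diversity(original) > simpson_diversity(translated))  # True
```

Both metrics drop as the token distribution becomes more skewed, which is the "loss of variety" effect the study measures.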


Google Introduces Interpretable Ranking Via Generalized Additive Models

google interpretable ranking GAM

We are building ever more complex AI models to get our predictions right. In the end, we have very accurate predictions but no interpretation of the models' internal workings. These AI models have been introduced, in a controlled manner, into sensitive areas like determining bail or parole, assessing loan eligibility, targeting advertisements, and guiding medical treatment decisions.

But the lack of interpretability has made model maintenance difficult and allowed social bias to prevail in predictions, so the models' participation in high-stakes decision processes remains limited. Google researchers are trying to change this accuracy-versus-interpretability trade-off. They have introduced interpretable ranking based on Generalized Additive Models (Neural RankGAMs), which explain their decisions and outperform previous ranking methods.

The research ecosystem around explainability is still in its infancy. Most research has focused on post-hoc analysis: analyzing a black-box model's decisions after prediction. Even these post-hoc analyses are not perfect; they offer limited interpretations for out-of-dataset instances and in some cases fail to explain model behavior. The other way to solve the interpretability problem is to build intrinsically interpretable models with a transparent and self-explanatory structure, in which every feature's effect on the prediction is visible and understandable, ensuring the decisions' explainability.

Also Read: Data Labeling And The Hidden Costs In Machine Learning

Generalized Additive Models (GAMs) seem to fit the bill. They are interpretable models that have been tried and tested on both regression and classification tasks. A GAM outputs the sum of multiple sub-models' predictions, where each sub-model takes only one feature as input, so each sub-model reflects the contribution of its feature to the final prediction. The Google researchers are the first to use them for ranking tasks, where the goal is to rank a list of items given some objective. They instantiate the ranking GAMs with neural networks and propose two architectures: context-free ranking and context-present ranking.
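The additive structure is what makes a GAM interpretable, and it fits in a few lines. The per-feature sub-model shapes below are made up for illustration; in Neural RankGAMs each would be a small neural network:

```python
import math

# Each sub-model sees exactly one feature; the score is their sum.
sub_models = {
    "page_rank": lambda x: 2.0 * x,          # toy contribution from PageRank
    "query_match": lambda x: math.log1p(x),  # diminishing returns on matches
    "load_time": lambda x: -0.5 * x,         # slower pages rank lower
}

def gam_score(features):
    """Ranking score = sum of independent single-feature sub-models."""
    return sum(f(features[name]) for name, f in sub_models.items())

def explain(features):
    """Per-feature contributions: the interpretability payoff of a GAM."""
    return {name: f(features[name]) for name, f in sub_models.items()}

doc = {"page_rank": 0.8, "query_match": 3.0, "load_time": 1.2}
contributions = explain(doc)
print(contributions)
print(abs(gam_score(doc) - sum(contributions.values())) < 1e-12)  # True
```

Because the score decomposes exactly into per-feature terms, one can read off how much each feature pushed a document up or down the ranking.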

Each sub-model was individually distilled to produce smaller models with higher inference speed, a lower memory footprint, and a more straightforward structure. The central intuition is to train a smaller, simpler model by minimizing the loss between its output and that of a larger, complex model.

Neural RankGAMs outperformed various other ranking models by a considerable margin on the YAHOO and Chrome Web Service benchmarks, and the researchers showed that the performance boost lies in the models' capacity to learn item-level features and list-level contexts.


Language Models Exhibit Larger Social Bias Than Human-Written Texts

Language models social bias

Current language models can produce convincing open-ended sentences from a short prompt, but they are riddled with controversies, from questionable correlations, such as tying Islam to terrorism, to propagating social bias. There was no benchmark for studying these harms, nor measures of the different social biases exhibited by language models.

A recent paper from Amazon Alexa and UC Santa Barbara researchers, published at the prestigious Association for Computational Linguistics (ACL), proposed BOLD (Bias in Open-Ended Language Generation Dataset), a standard benchmark for studies of bias and fairness in Natural Language Generation (NLG). The researchers are also the first to develop new automated metrics for toxicity, psycholinguistic norms, and text gender polarity.

The intuitive idea is to present the language models with carefully selected human-written natural prompts that elicit the biases reinforced in them. The BOLD dataset therefore contains 23,679 English prompts spread across five domains (profession, gender, race, religion, and political ideology) spanning 43 sub-groups. The prompts are taken from the naturally diverse writings of various authors on Wikipedia.

The researchers also automated the measurement of various biases and prejudices. Disrespectful, abusive, unpleasant, and harmful sentences generated from the prompts are considered toxic; a BERT model was trained separately on the Jigsaw toxic-comment dataset to predict the toxicity score of generated sentences.

Also Read: The Facebook MUPPET Show

For sentiment scores, they used the Valence Aware Dictionary and Sentiment Reasoner (VADER); scores greater than 0.5 and less than -0.5 convey positive and negative sentiment, respectively. A trained multitask feed-forward neural network predicts psycholinguistic norms at the word level, measuring each word's affective meaning along various dimensions.

Regard was defined as a human-annotated measure of bias capturing polarity towards a demographic rather than overall language polarity; a numeric Regard score was computed with ewsheng's bias classifier, trained on a biased dataset curated via GPT-2. To ascertain the gender polarity of a generated text, they used hard-debiased word2vec embeddings, re-weighting gender-polar words so they are not overshadowed by the many gender-neutral terms in the text.
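Embedding-based gender polarity is usually computed by projecting a word vector onto a "gender direction" such as he − she. The 3-d vectors below are invented stand-ins, not real word2vec embeddings, so this sketches the projection idea only:

```python
# Toy gender-polarity sketch: project word vectors onto the he - she axis.
# The embeddings are made up for illustration.
def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return dot(a, a) ** 0.5

EMB = {
    "he":    [1.0, 0.2, 0.0],
    "she":   [-1.0, 0.2, 0.0],
    "uncle": [0.8, 0.5, 0.1],
    "aunt":  [-0.7, 0.5, 0.1],
}

gender_dir = sub(EMB["he"], EMB["she"])  # axis pointing toward "male"

def gender_polarity(word):
    """Cosine similarity between a word vector and the gender axis."""
    v = EMB[word]
    return dot(v, gender_dir) / (norm(v) * norm(gender_dir))

print(gender_polarity("uncle") > 0)  # True: male-polar
print(gender_polarity("aunt") < 0)   # True: female-polar
```

Averaging such per-word scores over a generated text (with the re-weighting mentioned above) yields a text-level gender-polarity signal.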

Experiments on three popular language models (GPT-2, BERT, and CTRL) found that most professions, such as writing, science, art, and engineering, are skewed towards the male gender; only nursing is skewed towards the female gender. Negative sentiments were found to correlate more with males and positive ones with females, and darker races were associated with lower regard than their fair-skinned counterparts.

Christianity correlated with the lowest toxicity, while Islam and atheism were painted as highly toxic. The researchers concluded that most language models exhibit larger social bias than human-written Wikipedia text across all domains. They also note that the benchmark is not perfect: it covers limited disciplines and specific sub-groups, and considers only binary genders and a limited set of races.


Microsoft’s Gooseberry Treat For Quantum Computing

Microsoft Gooseberry Quantum Chip

In collaboration with the University of Sydney, Microsoft has built a cryogenic quantum controller chip, Gooseberry, for controlling thousands of qubits. They placed the whole control structure near the qubits themselves in a near absolute-zero environment, a first in the field. The work was featured in the prestigious journal Nature Electronics.

Quantum computing is in its infancy right now, comparable to the early days of classical computers. It promises a considerable deal of computing power and an entirely novel set of algorithms for some of the most troubling problems in computing, spanning cryptography, chemistry, weather forecasting, and many more fields. Its basic computing unit, the qubit, can encode much more information via superposition of 0 and 1, but has a terrible reputation for reacting to any perturbation. Since information is still encoded in and read from qubits via electrical signals, manipulating them is a matter of delicacy, calling for a controlling chip to reduce the error margins in information handling.

Also Read: IBM And Daimler Simulates Materials With Fewer Qubits

It is common practice in the quantum industry to place the controlling structures away from the qubits, to safeguard the information stored in them from electronic noise. The Microsoft researchers instead designed their chip interface to allow the control chip to sit with the qubits themselves: rather than a rack of room-temperature electronics generating electrical pulses for qubits housed in 0.3-kelvin refrigerators, the Gooseberry chip is placed in the refrigerator with the qubits. This arrangement results in a tightly regulated and stable environment.

The Microsoft researchers have also built a cryogenic compute core that operates at much warmer temperatures and performs the classical calculations essential for determining the instructions fed to the Gooseberry chip, which then delivers the electrical signals to the qubits directly. With room to generate more heat and perform more computation, the core enables general computing like any other CPU.


The Facebook MUPPET Show

Facebook Muppet pre-fine tuning

Facebook researchers have scaled up a relatively new technique, pre-finetuning (PFT), in their paper MUPPET, applying multi-task learning over 50 tasks on a vast scale of 4.8 million instances. They showed that PFT increases both the performance and sample efficiency of fine-tuned models like BERT, RoBERTa, and more, and even set new records on the RTE and HellaSWAG benchmarks.

The usual workflow in large-scale language modeling is pre-training via self-supervision over massive unlabeled datasets, then fine-tuning to suit the task at hand with relatively little labeled data. This arrangement works fine as long as the datasets and tasks are related; but for low-resource languages or individual tasks with very little labeled data, the training scheme leaves language models starved.

Also Read: Data Labeling And The Hidden Costs In Machine Learning

In 2019, a group of researchers introduced a pre-finetuning (PFT) stage, in a paper named 'Tri-Train,' that sits between pre-training and fine-tuning to overcome this problem. They constructed a small corpus by selecting sentences from the unlabeled pre-training data relevant to the labeled training data, then fine-tuned the pre-trained model on just two tasks: predicting the next word in sentences from the small corpus, and predicting the start and end words of those sentences.

Facebook's MUPPET (Massive Multi-task Representations with Pre-Finetuning) extends that work to new levels. The researchers used 50 diverse tasks spanning classification, summarization, question answering, and commonsense reasoning. Their investigation showed that naive multi-task learning schemes fail to learn useful representations and are unstable, but also that scale plays a significant role in multi-task learning.

Pre-finetuning with too few tasks degrades representation quality below the pre-trained baseline; beyond a critical point, usually above 15 tasks, performance improves linearly with the number of tasks.

The researchers used loss scaling and task-heterogeneous batches so that learning remains balanced across competing tasks, significantly improving training stability and overall performance. To train on several tasks, the model has task-specific heads, each optimizing a task-specific loss. They scaled each data point's loss so that, if the class distribution were uniformly distributed along with the model's predictions, all of the task-specific losses would have equivalent values.
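One way to realize that scaling (this is an interpretation of the description above, not MUPPET's actual code) follows from the fact that cross-entropy under a uniform prediction equals log(n_classes), so dividing each task's loss by log(n_classes) equalizes tasks with different label counts:

```python
import math

# Sketch of uniform-equalizing loss scaling: cross-entropy of a uniform
# prediction over n classes is log(n), so dividing by log(n) makes the
# "uniform baseline" loss equal to 1.0 for every task.
def scaled_loss(raw_ce_loss, n_classes):
    return raw_ce_loss / math.log(n_classes)

# Uniform predictions on a 2-class and a 100-class task...
uniform_ce_2 = math.log(2)      # CE of uniform prediction, 2 classes
uniform_ce_100 = math.log(100)  # CE of uniform prediction, 100 classes

# ...yield identical scaled losses, so neither task dominates training.
print(scaled_loss(uniform_ce_2, 2) == scaled_loss(uniform_ce_100, 100))  # True
```

Without the scaling, the 100-class task's raw loss would be several times larger and would dominate the shared gradient.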

Similarly, the researchers proposed task-heterogeneous batches to optimize several potentially competing objectives and create a global representation across the training tasks. During gradient descent, moving along the gradient of a single task may not be the optimal direction for learning a single unified representation across tasks. To overcome this, the model optimizes over batches consisting of several tasks: each worker samples a random batch from the set of tasks and computes a gradient, which is accumulated for the final update.
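The sampling-and-accumulation step can be sketched as follows. The scalar "gradients" and the task/batch names here are toy stand-ins for illustration, not MUPPET's implementation:

```python
import random

# Toy sketch of a task-heterogeneous update: gradients from batches drawn
# across several tasks are accumulated into one shared update.
def heterogeneous_step(task_batches, grad_fn, rng):
    tasks = list(task_batches)
    accumulated = 0.0
    for _ in range(len(tasks)):       # one sampled batch per "worker"
        task = rng.choice(tasks)      # tasks are mixed, not round-robin
        batch = task_batches[task]
        accumulated += grad_fn(task, batch)
    return accumulated / len(tasks)   # averaged into a single update

rng = random.Random(0)
batches = {"qa": [1, 2], "summarization": [3], "classification": [4, 5]}
# Stand-in "gradient": just the batch size, to keep the sketch runnable.
update = heterogeneous_step(batches, lambda t, b: float(len(b)), rng)
print(1.0 <= update <= 2.0)  # True: average of per-task batch sizes
```

The point of mixing tasks inside one accumulated update is that no single task's gradient direction dictates the step taken on the shared representation.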

The model also learns better representations than standard RoBERTa by leveraging representations from pre-finetuned models with 34-40 tasks. The scale factor is evident: the more tasks, the greater the data efficiency.
