Saturday, November 22, 2025

Google Introduces 540-Billion-Parameter PaLM Model to Push the Limits of Large Language Models


Large language models are the latest focus of artificial intelligence research, and the field has seen some remarkable advances in recent months. Last year, Google unveiled Pathways, a new AI architecture that works more like the human brain and learns faster than previous approaches. Earlier AI models were typically trained for a single modality, such as vision or hearing, but not both; Pathways lets Google interpret text, images, and audio within a single model. The Google Research team recently put Pathways to the test, using it to train the Pathways Language Model (PaLM), a 540-billion-parameter, dense, decoder-only autoregressive transformer, on 780 billion tokens of high-quality text. In "PaLM: Scaling Language Modeling with Pathways", the team reports that PaLM surpasses state-of-the-art few-shot performance on many language understanding and generation tasks.

Language models predict the next item, or token, in a text sequence based on the preceding tokens. When such a model is applied iteratively, with the predicted output fed back in as input, it is called autoregressive. Many researchers have built large autoregressive language models on the Transformer deep-learning architecture. The Transformer made it easier for models to capture context when parsing text. This was a game-changer: previous language models such as recurrent neural networks (RNNs) analyzed text sequentially, so training on a vast corpus had to proceed word by word and phrase by phrase, which took a long time, and any kind of long-term context was computationally too costly to maintain. The Transformer instead uses key, query, and value parameters to determine which portion of the text is most relevant in a given context. Transformer-based models such as BERT rely on this mechanism, known as attention, which lets the model learn which inputs deserve more weight than others in a given instance.
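To make the key/query/value idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. This illustrates the general mechanism only, not PaLM's actual implementation; the dimensions and names are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: weight each value row by how well
    its key matches the query, with scaling to keep softmax stable."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq, seq) relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # context-weighted blend of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 tokens, 8-dimensional queries
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```

When all queries are identical (or zero), the softmax weights become uniform and each output row collapses to the mean of the value rows, which is one way to see that attention is a learned, input-dependent averaging.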

PaLM is based on a conventional transformer architecture, although it employs only a decoder and adds modifications such as SwiGLU activation, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, no biases, and a large SentencePiece vocabulary.

SwiGLU activations are used for the multilayer perceptron (MLP) intermediate activations, yielding considerable quality improvements over typical ReLU, GeLU, or Swish activations, and a "parallel" formulation in each transformer block, rather than the standard serialized formulation, delivers roughly 15 percent faster large-scale training. Multi-query attention keeps costs down at autoregressive decoding time, and RoPE embeddings, used instead of absolute or relative position embeddings, allow better performance on longer sequence lengths. To improve training stability for large models, the system also shares the input and output embedding matrices and uses no biases in the dense kernels or layer norms. Finally, to accommodate the large number of languages in the training corpus without over-tokenization, the team adopts a SentencePiece vocabulary with 256k tokens.
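As a rough illustration of the SwiGLU activation mentioned above, here is a sketch under the standard definition SwiGLU(x) = Swish(xW) ⊙ (xV); this is not PaLM's code, and the weight names and sizes are hypothetical.

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    """SwiGLU: a Swish-activated projection x @ W elementwise-gates a
    second linear projection x @ V of the same input."""
    return swish(x @ W) * (x @ V)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16))   # 2 tokens, model dimension 16
W = rng.standard_normal((16, 64))  # gate projection
V = rng.standard_normal((16, 64))  # value projection
h = swiglu(x, W, V)                # MLP intermediate activations
print(h.shape)  # (2, 64)

# The "parallel" block formulation mentioned above computes
#   y = x + attention(norm(x)) + mlp(norm(x))
# instead of the serial  x' = x + attention(norm(x)); y = x' + mlp(norm(x')),
# letting the two sub-layers run at the same time.
```

The gating structure is why SwiGLU needs two weight matrices where a plain ReLU MLP layer needs one.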

Any large language model rests on the idea of using a massive amount of human-created data to train machine-learning algorithms that replicate how people communicate. OpenAI's GPT-3, for example, has 175 billion parameters and was trained on 570 gigabytes of text. DeepMind's Gopher, a 280-billion-parameter autoregressive transformer-based dense language model, was trained on 10.5 terabytes of MassiveText, which draws on sources such as MassiveWeb (a compilation of web pages), C4 (Common Crawl text), Wikipedia, GitHub, books, and news articles. PaLM was trained on a range of English and multilingual datasets, including high-quality web documents, books, Wikipedia articles, conversations, and GitHub code. The researchers also developed a "lossless" vocabulary that preserves all whitespace (critical for code), splits out-of-vocabulary Unicode characters into bytes, and divides numbers into separate tokens, one per digit.
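The digit-splitting and whitespace-preserving behavior can be illustrated with a toy pre-tokenizer. This is a sketch of the idea only; PaLM's real SentencePiece vocabulary is learned and works differently.

```python
import re

def split_digits(text):
    """Illustrative pre-tokenization step (not PaLM's actual tokenizer):
    every digit becomes its own token and whitespace runs are kept as
    explicit tokens, so joining the tokens reconstructs the input exactly."""
    # re.split keeps capturing-group matches in the result list;
    # \d captures single digits, \s+ captures whitespace runs.
    return [t for t in re.split(r'(\d|\s+)', text) if t]

tokens = split_digits("pi is 3.14")
print(tokens)  # ['pi', ' ', 'is', ' ', '3', '.', '1', '4']
```

Because no characters are dropped, the tokenization is "lossless" in the sense the article describes: the original string can always be recovered by concatenating the tokens.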

Despite code making up only 5 percent of the pre-training dataset, PaLM performs well on both coding and natural language tasks in a single model. Its few-shot performance is remarkable: it is on par with the fine-tuned Codex 12B despite seeing 50 times less Python code during training. This observation supports earlier findings that larger models can be more sample-efficient than smaller ones because they transfer learning more effectively across programming languages and natural language data.

PaLM's performance can be pushed further by fine-tuning it on a Python-only code dataset, producing PaLM-Coder. On DeepFix, a code-repair task whose objective is to fix initially broken C programs until they compile successfully, PaLM-Coder 540B achieves a compile rate of 82.1 percent, beating the previous state of the art of 71.7 percent. It can also decompose multi-step problems into parts and answer a range of elementary-school-level arithmetic problems. Beyond these feats, PaLM was designed in part to demonstrate Google's capacity to harness thousands of AI processors for a single model.

Read More: Understanding The Need to Include Signed Language in NLP training Dataset

PaLM beat other language models on 28 of 29 English benchmarks, including TriviaQA, LAMBADA, RACE, and SuperGLUE, improving few-shot performance on language understanding and generation. These benchmarks span question answering (open-domain, closed-book), cloze and sentence-completion tasks, Winograd-style tasks, in-context reading comprehension, common-sense reasoning, SuperGLUE tasks, and natural language inference. PaLM also displayed remarkable natural language understanding and generation on several BIG-bench tasks: for example, it can distinguish cause and effect, understand conceptual combinations in context, and even guess a movie from an emoji. Even though just 22 percent of the training corpus is non-English, PaLM performs well on multilingual NLP benchmarks, including translation, as well as on English NLP tasks.

PaLM also demonstrates breakthrough capability on reasoning problems that require multi-step arithmetic or common-sense reasoning, by combining model scale with chain-of-thought prompting. Using 8-shot prompting, PaLM solves 58 percent of the problems in GSM8K, a benchmark of thousands of challenging grade-school-level math questions, beating the previous top score of 55 percent, which was achieved by fine-tuning the GPT-3 175B model on a training set of 7,500 problems and combining it with an external calculator and verifier. PaLM can even provide clear explanations for scenarios that require a complicated mix of multi-step logical reasoning, world knowledge, and deep language comprehension; for example, it can give high-quality explanations for novel jokes not found on the internet.
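Chain-of-thought prompting simply formats the few-shot exemplars so that each one shows its worked reasoning before the final answer. A minimal sketch of the prompt construction follows; the exemplar below is hypothetical and in the spirit of GSM8K, not one of PaLM's actual 8 shots.

```python
# Hypothetical chain-of-thought exemplars: each pairs a question with
# step-by-step reasoning that ends in an explicit answer.
COT_EXEMPLARS = [
    ("Roger has 5 balls. He buys 2 cans of 3 balls each. How many now?",
     "Roger starts with 5. 2 cans of 3 is 6. 5 + 6 = 11. The answer is 11."),
]

def build_cot_prompt(question, exemplars=COT_EXEMPLARS):
    """Few-shot chain-of-thought prompt: worked exemplars come first,
    then the new question with an open 'A:' for the model to continue."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_cot_prompt("A farm has 3 pens of 4 pigs. How many pigs?"))
```

Because the exemplars end in reasoning chains, an autoregressive model prompted this way tends to emit its own chain of steps before the answer, which is what drives the GSM8K gains described above.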

PaLM is the first large-scale use of the Pathways system, scaling training to 6,144 chips, the largest TPU-based configuration used for training to date. Training is scaled across two Cloud TPU v4 Pods using data parallelism at the Pod level, while conventional data and model parallelism is used within each Pod. Most earlier large language models were either trained on a single TPU v3 Pod (e.g., GLaM, LaMDA), used pipeline parallelism to scale to 2,240 A100 GPUs across GPU clusters (Megatron-Turing NLG), or used multiple TPU v3 Pods with a maximum scale of 4,096 TPU v3 chips (Gopher).

PaLM achieves the highest training efficiency yet for language models at this scale, with 57.8 percent hardware FLOPs utilization. This stems from a combination of the parallelism strategy and a reformulation of the Transformer block that allows the attention and feedforward layers to be computed simultaneously, enabling speedups from TPU compiler optimizations.


BrainChip partners with SiFive to deploy AI at Edge


BrainChip, a developer of advanced AI software and hardware, has announced a partnership with semiconductor technology and software automation company SiFive to deploy artificial intelligence technology at the edge. 

The companies say they have combined their technologies to offer semiconductor designers artificial intelligence and machine learning computing at the edge. 

BrainChip's Akida is a new advanced neural networking processor architecture that, with high performance, ultra-low power consumption, and on-chip learning, takes AI to the edge in ways previous technologies could not. 

Read More: Microsoft Offers Detection Guidance on Spring4Shell Vulnerability

At the same time, SiFive Intelligence solutions merge software and hardware to accelerate AI/ML applications with its highly configurable multi-core, multi-cluster capable design. 

According to the companies, the two highly advanced technologies will result in a highly efficient edge AI computing solution. For AI and ML workloads, SiFive Intelligence-based processors provide industry-leading performance and efficiency. 

Vice President of Products at SiFive, Chris Jones, said, “Employing Akida, BrainChip’s specialized, differentiated AI engine, with high-performance RISC-V processors such as the SiFive Intelligence Series is a natural choice for companies looking to seamlessly integrate an optimized processor to dedicated ML accelerators that are a must for the demanding requirements of edge AI computing.” 

He further added that the partnership with BrainChip is a valuable addition to their ecosystem portfolio. Akida, BrainChip’s first neuromorphic processor, replicates the human brain by analyzing only relevant sensor inputs at the point of capture and processing data with exceptional efficiency and precision while consuming minimum energy. 

CMO of BrainChip, Jerome Nadel, said, “We are pleased to partner with SiFive and have the opportunity to have our Akida technology integrated with their market-leading product offerings, creating an efficient combination for edge compute.” 

He also mentioned that as the company expands its network of portfolio partners, it wants to ensure that these partnerships are based on complementary technologies, enabling capabilities, and a wide range of contexts so that it may reach the maximum number of potential customers. 


TCS’ Conversational AI Platform recognized by Celent


Celent has recognized TCS Conversa, a conversational AI platform from Tata Consultancy Services (TCS), as a Technology Standout among Retail Banking Intelligent Virtual Assistant (IVA) platforms. 

Celent compared ten such IVA platforms and named TCS Conversa the best, on the strength of its advanced technology and numerous unmatched functionalities. 

TCS Conversa is a secure, enterprise-ready, and domain-rich conversational platform that enables businesses to easily implement an intelligent conversational assistant for new and current customer interfaces through chat and voice. 

Read More: GM and Honda to develop Affordable EVs

An additional advantage of the platform is that it comes with support for both on-site and cloud models. 

Senior Analyst at Celent, Bob Meara, said, “A feature-rich, easily deployable platform that provides diverse support capabilities, interactive channel adapters, and on-premise hosting, we find Conversa to be a leading solution for retail banks.” 

He also mentioned that Celent considered various factors, including the platform’s functionality, regional availability, technology and integration capability, and customer feedback before making this decision. 

According to the report, clients gave TCS a positive overall rating, praising the conversational design elements for their usefulness and the ease of system maintenance in terms of technology. 

TCS says that its conversational AI platform is a strong contender for its TCS BaNCS clients, into which the company had already integrated IVA capabilities as a step toward AI democratization. Celent's report highlights TCS Conversa's natural language reasoning capability, no-code dialog design, and out-of-the-box workflows. 

Business Group Head of Banking, Financial Services, and Insurance, K Krithivasan, said, “Conversational AI is the future of customer experience, and financial services firms want to unlock its full potential. They want powerful, next-generation bots that can process complex queries with a humanized approach. TCS Conversa, a feature-rich advanced AI platform, helps BFSI enterprises transform operations.” 

He further added that this award recognizes their vision, market-leading AI capabilities, and widespread use of sophisticated products like Conversa. 


GM and Honda to develop Affordable EVs


Global automobile manufacturing giants General Motors (GM) and Honda announced that they plan to develop new affordable electric vehicles (EVs) to enter a new market. 

The partnership entails producing an EV for the North American, South American, and Chinese markets at a lower cost than Chevrolet's planned Equinox EV. 

According to the companies, their jointly developed EVs will be built on a new global architecture that will use next-generation Ultium battery technology. 

Read More: Tredence opens AI delivery and R&D centers in India, to Hire 500 employees

GM and Honda aim to start mass manufacturing EVs, particularly compact crossover vehicles, in the world’s largest market with annual volumes of more than 13 million vehicles by 2027. On a worldwide scale, GM and Honda will share their best technology, design, and manufacturing strategies to offer affordable EVs. 

GM Chair and CEO Mary Barra said, “This is a key step to deliver on our commitment to achieve carbon neutrality in our global products and operations by 2040 and eliminate tailpipe emissions from light-duty vehicles in the US by 2035.” 

She further added that the collaboration would allow them to get more people throughout the world into electric vehicles faster than either company could do on its own. 

The companies will also discuss potential future collaboration on EV battery technology to lower the cost of electrification, improve performance, and ensure the sustainability of future vehicles. Back in 2013, the two companies partnered to develop a next-generation fuel cell system and hydrogen storage technologies. 

GM and Honda also collaborated in 2018 to support GM’s EV battery module development efforts. Moreover, Honda mentioned that it is aiming to reach carbon neutrality on a global basis by 2050. 

Senior Managing Executive Officer at Honda, Shinji Aoyama, said, “The progress we have made with GM since we announced the EV battery development collaboration in 2018, followed by co-development of electric vehicles including the Honda Prologue, has demonstrated the win-win relationship that can create new value for our customers.”


Microsoft Offers Detection Guidance on Spring4Shell Vulnerability


Technology giant Microsoft recently released a blog to guide users to detect Spring4Shell vulnerabilities across its cloud services. 

According to the company, it is currently detecting a ‘limited volume of exploit attempts’ across its cloud services that are aimed at the critical Spring4Shell remote code execution (RCE) vulnerability. Spring4Shell is a zero-day vulnerability (CVE-2022-22965) that security experts have classified as Critical. 

The attack is also known to be a proof of concept that only affects non-standard Spring Framework configurations, such as when Web Application Archive (WAR) packaging is used instead of Java Archive (JAR) packaging. 

Read More: Ai-Da becomes World’s First Robot to Paint like an Artist

Microsoft’s guide contains all the steps and methods that can be used to identify and rectify the issue. 

“Microsoft regularly monitors attacks against our cloud infrastructure and services to defend them better. Since the Spring Core vulnerability was announced, we have been tracking a low volume of exploit attempts across our cloud services for Spring Cloud and Spring Core vulnerabilities,” mentioned Microsoft in the blog post.

Below are the traits of systems most vulnerable to the attack: 

  • Running JDK 9.0 or later.
  • Spring Framework versions 5.3.0 to 5.3.17, 5.2.0 to 5.2.19, and earlier versions
  • Apache Tomcat as the Servlet container:
    • Packaged as a traditional Java web archive (WAR) and deployed in a standalone Tomcat instance; typical Spring Boot deployments using an embedded Servlet container or reactive web server are not impacted.
    • Tomcat has spring-webmvc or spring-webflux dependencies.

People can use the “$ curl host:port/path?class.module.classLoader.URLs%5B0%5D=0” command to determine the vulnerability of their systems. 

Though this command can be used as a predictive tool to check vulnerability, any system that falls within the scope of the impacted systems listed above should still be considered susceptible.


TNAU partners with NEC Laboratories to detect crop diseases using AI


Tamil Nadu Agricultural University (TNAU) announces that it has partnered with NEC Laboratories India to develop solutions that detect crop diseases using artificial intelligence and machine learning. 

TNAU and NEC Laboratories recently signed a memorandum of understanding to establish this collaboration. 

According to a release, under the terms of the MoU, NEC seeks to help identify disease and deficiency categories and provide remedies through agricultural experts. 

Read More: Meta is hiring a Postdoctoral Researcher for its Facebook AI Research

NEC plans to develop an artificial intelligence-powered smartphone application, while TNAU would provide expert advice and assist in data collection to diagnose diseases. The signing of the MoU was witnessed by NEC Laboratories India senior vice-president and head Keiji Yamada, TNAU Registrar and acting vice-chancellor A S Krishnamoorthy, and senior officials from the Center for Plant Protection Studies. 

Keiji Yamada said, “We are proud to be partnering with TNAU in new ways of applying AI and analytics to resolve crucial issues in agriculture in real-time.” He further added that this collaboration is vital and relevant since India is also one of the world’s major food producers, in addition to being a predominantly agricultural society. 

The combined expertise of both entities in their respective fields will be crucial in developing artificial intelligence solutions for the challenge of crop disease detection. 

The United Nations estimates that farmers across the globe lose nearly 40% of their produce to plant diseases and insects. India's agriculture industry is burdened with crop losses of 30-60%: diseases (15-25%), soil nutrient deficiency (36.5%), and insects and pests (which reduce crop output by 10-40%) are the primary causes. 

Therefore, the technology jointly developed by TNAU and NEC Laboratories will considerably help farmers minimize their losses by detecting diseases at a very early stage and helping them take timely countermeasures. 


Tredence opens AI delivery and R&D centers in India, to Hire 500 employees


Tredence, a provider of actionable and quantifiable analytics solutions, announces that it has opened new AI delivery and R&D centers in India to strengthen its presence in the country. 

The company said it had launched its new centers in cities including Bengaluru, Gurugram, and Chennai. The AI delivery and R&D centers will become fully operational by May 2022 and will have an initial capacity of 1400 seats. 

Over the last year, Tredence has doubled its workforce in the country and has promised to hire nearly 500 more by the end of 2022. Through the new delivery centers, Tredence will provide advanced analytics, data engineering, and data science solutions to retail, CPG, TMT, industrial manufacturing, and healthcare clients around the world. 

Read More: IBM announces z16 for Transaction Processing at Scale

CEO and Co-founder of Tredence, Shub Bhowmick, said, “Expanding our reach into new talent markets is a critical component of our growth plan. We plan to do so by constantly looking for top-tier talent in new regions and partnering with premier institutes like IIT Madras to focus on training and talent development.” 

He further added that opening the company’s new offices is a step towards Tredence’s goal of assisting employees specializing in legacy technologies to modernize by providing them with data science and data engineering opportunities. Tredence was named a Great Place to Work-Certified company in India earlier this year, confirming its inclusiveness and openness. 

United States-based analytics solutions provider Tredence was founded by Shashank Dubey, Shub Bhowmick, and Sumit Mehra in 2013. The company specializes in offering actionable and quantifiable analytics solutions to marketing, sales, and operational issues with a broad industry focus, strong advanced analytical skill sets, and deep domain expertise. 

“Expansion and addition of new India delivery centers signifies an exciting chapter for Tredence as we continue to build groundbreaking data science solutions for global industries. Setting up new delivery and talent centers in India reflects the momentum we are experiencing in the market and aligns with our growth imperatives,” said Chief Operating Officer of Tredence Harish Gudi. 

He also mentioned that cities like Bengaluru, Chennai, and Gurugram are excellent places to start for tech companies trying to improve AI innovation and distributed agile delivery models.


Ai-Da becomes World’s First Robot to Paint like an Artist


Ai-Da, a novel robot developed by English gallerist Aidan Meller and Cornish robotic business Engineered Arts, is the world’s first artificial intelligence-powered robot that can paint just like an artist. 

The developers of this robot claim it to be “the first ultra-realistic humanoid artist.” Ai-Da was first introduced back in 2019 at Oxford University, after which it has traveled across the world to showcase its capabilities. 

In 2021, Ai-Da also displayed her artwork at the Design Museum in London. The robot has a robotic arm with which it creates striking paintings, a realistic-looking face, and eyes that scan its surroundings and blink like a human's. 

Read More: Gupshup acquires Conversational AI platform Active.ai

“We haven’t spent eye-watering amounts of time and money to make a very clever painter. This project is an ethical project,” said the developer of the robot, Aidan Meller. 

AI algorithms lead Ai-Da to probe, pick, make decisions, and, eventually, create a painting, with her camera fixated on her subject. The developers say that Ai-Da can not only paint but can also sculpt, sketch, and even write poems. 

As artificial intelligence is an evolving technology with breakthroughs each year, Ai-Da also gets updated regularly, which helps in expanding and further refining its capabilities. 

While the AI-powered robot cannot yet create entirely unique pieces of art, her five-hour process from start to finish ensures that no two paintings are the same. Ai-Da's new painting style was introduced ahead of her solo exhibition at the Venice Biennale in 2022, which opens to the public on April 22. 

“Ai-Da Robot, as technology, is the perfect artist today to discuss the current obsession with technology and its unfolding legacy,” added Meller. 


Meta is hiring a Postdoctoral Researcher for its Facebook AI Research


Meta is currently hiring a Postdoctoral Researcher for its Facebook AI Research (FAIR), a research organization focused on making fundamental progress in AI. 

According to Meta, its Facebook AI Research (FAIR) conducts cutting-edge research to advance the state-of-the-art and deepen the company’s understanding of existing AI approaches. Meta is hiring for one-year fixed-term Postdoctoral roles in multiple locations across the globe. 

The selected candidate will collaborate with the supervisor and other researchers at FAIR to establish and implement a research agenda and publish world-class research. 

Read More: Testing of AI Traffic Scanning begins in Kolkata

The position involves research in multiple fields such as AI, ML, computer vision, natural language processing, Speech & Audio, Conversational AI, Theory, Reinforcement Learning, Robotics, and many others. 

Meta says that the selected candidate will gain not only significant experience at a top-tier research institute but also the chance to publish scholarly articles regularly and establish a global reputation. 

Key job responsibilities include presenting high-quality papers and research at conferences, writing and debugging research code, designing and implementing experiments to test research ideas, and several other related tasks. 

Ph.D. degree holders, or individuals who have completed a postdoctoral assignment in Computer Science or a similar field, with a background in AI/ML and experience in C/C++, Python, or C# can readily apply for this position at Meta. 

However, candidates with first-authored publications at peer-reviewed conferences will be preferred. Interested applicants can submit their applications through the official website of Meta. The company will conduct interviews in three windows starting from March and will continue till October 2022. 


Gupshup acquires Conversational AI platform Active.ai


Conversational messaging platform company Gupshup has acquired Active.ai, a leading conversational AI provider for banks and fintech companies. 

The acquisition will help Gupshup further strengthen its expertise in customer experience for its clients, especially those involved in the BFSI industry. However, no information has been revealed by either of the companies regarding the valuation of this acquisition deal. 

In addition to this acquisition, Gupshup is also in talks with possible investors about raising $100 million to $200 million in pre-IPO fundraising. Last year in July, Gupshup raised $240 million in its additional funding round, which witnessed participation from investors such as Tiger Global, Fidelity Management, Think Investments, Malabar Investments, Harbor Spring Capitals, and White Oak. 

Read More: Elon Musk buys 9.2% stake in Twitter, becomes Largest single Shareholder

During the funding round, Gupshup officials said that the raised capital would be used to expand the company’s market reach and to develop its business messaging platform further. 

Recently, Gupshup acquired AI-powered cloud communication startup Knowlarity Communications. Gupshup planned to employ Knowlarity’s voice-based AI solutions for call centers and customer care to expand the abilities of its chatbot and AI-powered messaging service. 

Co-founder and CEO of Gupshup, Beerud Sheth, said, “Active.Ai’s robust CBaaS platform adds more vertical depth to our product stack, giving BFSI customers the tools to create intelligent, frictionless micro conversations with consumers using voice, video and messaging channels.” 

He further added that they are excited to welcome the Active.Ai team to the Gupshup family and look forward to leading the next wave of conversational engagement and commerce technologies. 

Singapore-based artificial intelligence company Active.ai was founded by Parikshit Paspulati, Ravi Shankar, and Shankar Narayanan in 2016. The company is known for building an advanced, patented conversational AI platform for financial institutions, insurance firms, and capital markets that can be quickly deployed. 

Axis Bank, Kotak Mahindra Bank, Tata Capital, IndusInd Bank, and HDFC Securities Ltd in India, NTUC Income, NIUM, and Tonik Bank in Southeast Asia, and Abu Dhabi Commercial Bank PJSC (ADCB), Qatar Islamic Bank, and Burgan Bank in the Middle East are among Active.ai’s most notable customers. 

“In the conversational economy, business to consumer engagement that combines advanced natural language processing with deep enterprise connectivity is essential. Active.Ai’s conversational engagement platform powers leading financial enterprises across 43 countries,” said Co-founder and CEO of Active.ai, Ravi Shankar. 
