
Facebook Releases Code Of Its State-Of-The-Art Voice Separation Model


Facebook researchers have open-sourced the code for their work, “Voice Separation with an Unknown Number of Multiple Speakers.” Suppose there is only one microphone and several people are talking simultaneously. Can you separate the voices? For a human, it is easy; but how do you get a machine to do it?

The single-microphone, multiple-speaker voice-separation paper answers that question. It extends state-of-the-art voice separation to five speakers, where previous work was limited to two. In the past, this task was mostly addressed with Independent Component Analysis; with the recent advent of deep learning, it is now possible to separate mixed audio containing multiple unseen speakers.

The main contributions, as listed by the authors, are: 

  1. A novel audio separation model that employs a specific RNN architecture, 
  2. a set of losses for the effective training of voice separation networks, 
  3. an effective model-selection procedure for voice separation with an unknown number of speakers, and
  4. results that show a sizable improvement over the current state-of-the-art in an active and competitive domain.

Also Read: Computer Vision Has A New DeIT By Facebook

Previous methods were trained using masks for each voice; this paper introduces a novel mask-free approach. Voice separation naturally splits into two subtasks: first, improving signal quality while screening out noise, and second, identifying the speaker so that each output channel stays consistent over the sequence.

The authors used an utterance-level permutation invariant training (uPIT) loss for the first subtask and, for the second, the L2 distance between the network embeddings of each predicted audio channel and its corresponding source.
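
Since the network's output channels carry no fixed speaker order, a uPIT-style loss is computed under the best matching permutation of outputs and targets. Below is a minimal sketch of that idea in PyTorch; the tensor shapes, the MSE reconstruction term, and the function name are illustrative assumptions rather than the paper's released code.

```python
# A minimal sketch of a permutation-invariant separation loss (uPIT-style),
# assuming tensors of shape (batch, speakers, samples). Illustrative only.
import itertools
import torch

def pit_loss(estimates, targets):
    """Return the loss under the best speaker permutation for each batch item."""
    batch, n_spk, _ = targets.shape
    losses = []
    for perm in itertools.permutations(range(n_spk)):
        permuted = estimates[:, list(perm), :]
        # Per-channel reconstruction error; the paper uses an SI-SNR-style
        # objective, plain MSE is used here only to keep the sketch short.
        losses.append(((permuted - targets) ** 2).mean(dim=(1, 2)))
    losses = torch.stack(losses, dim=1)   # (batch, n_permutations)
    best, _ = losses.min(dim=1)           # pick the best permutation per item
    return best.mean()
```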

To avoid biases arising from the distribution of data and to promote solutions in which the separation models are not detached from the selection process, model selection was based on an activity detection algorithm.

Starting from the model trained on the dataset with the largest number of speakers, C, a speech (activity) detector is applied to each output channel. If silence (no activity) is detected in any channel, the procedure moves to the model with C − 1 output channels and repeats until all output channels contain speech.
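
The described selection loop is simple enough to sketch directly. In the snippet below, `models` is a hypothetical dictionary of separation models keyed by their number of output channels and `has_speech` is a stand-in voice-activity detector; both names are assumptions for illustration.

```python
# A minimal sketch of the model-selection loop described above.
def select_and_separate(mixture, models, max_speakers, has_speech):
    for c in range(max_speakers, 1, -1):         # C, C-1, ..., 2
        channels = models[c](mixture)            # separate into c channels
        if all(has_speech(ch) for ch in channels):
            return channels                      # every channel is active
    return models[2](mixture)                    # fall back to the smallest model
```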


Intel’s RealSense ID Now Offers Facial Recognition With Higher Accuracy


Intel is now offering its RealSense technology to customers for facial recognition under the name RealSense ID. Complemented by LiDAR and infrared sensors, the RealSense 3D cameras have become a new game-changer in the industry, helped along by Amazon’s Rekognition fiasco. Customers get faster facial recognition without the familiar glitches around skin tone, lighting conditions, facial changes, or height.

Intel claims that RealSense ID has a true-acceptance rate (recognizing you as you) of 99.7%, with a false-acceptance rate of about one in 1,000,000. The spoofing rate (accepting a photo or recording of you as you) stands at less than 1%. The reported timing per recognition is 1.5 s to sense a presence plus 0.8 s for facial authentication, so you do not have to stand in line for verification.

To address privacy concerns, captured images are stored on the device, and data is encrypted at every level using AES-256. The device converts each face into an ID, and all further communication uses that designated ID without revealing any visual information; this conversion is done by the neural network at the core of its facial recognition.

Also Read: Intel India Launches An AI Research Center, INAI

Intel is currently offering two builds — the Intel RealSense ID Solution F455 and F450. The former is a ready-to-deploy module, while the latter is meant for custom integration in specialized use cases. The company expects the technology to be used at security checkpoints, ATMs, smart locks, kiosks, and points of sale for verification. The specifications sound reassuring, but at the scale of airports or ATM networks that handle millions of verifications, the accuracy level becomes a real concern: a one-in-1,000,000 false-acceptance rate still translates into regular false accepts.
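
To make the scale concern concrete, here is a back-of-the-envelope check; only the one-in-a-million false-acceptance figure comes from Intel's claim, and the daily verification volume is an invented assumption.

```python
# Hypothetical scale check: expected false accepts per day at the claimed rate.
far = 1 / 1_000_000                  # claimed false-acceptance rate
verifications_per_day = 5_000_000    # assumed daily volume for a large deployment
expected_false_accepts = far * verifications_per_day
print(expected_false_accepts)        # 5.0 wrongly accepted faces per day under these assumptions
```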

The codebase behind the technology was open-sourced long ago, which helps ensure there are no corporate or government backdoors. However, neural systems are prone to adversarial inputs, so the adversarial robustness of these networks leaves additional room for failure.

Integration of Intel’s RealSense with Windows Hello remains an open issue, so the modules cannot yet be used for authentication on laptops or desktops. Still, Intel is clearly trying hard to salvage a RealSense line that had been lying dormant until now.


OpenAI’s DALL·E Can Create Images From Text


OpenAI has created a multimodal generative model — DALL·E — that creates images from text prompts. The network is a 12-billion-parameter version of GPT-3 trained on a dataset of text–image pairs to generate images from text descriptions.

They drew on their experience of building GPT-3 and ImageGPT to show that manipulating visual concepts through language is now within reach: language can instruct a large neural network both to perform various text-generation tasks and to generate high-fidelity images.

The samples released for each caption are the top 32 of 512 images after reranking with CLIP. This procedure can be seen as a kind of language-guided search and dramatically impacts sample quality.

The demos released by the researchers showcase images of imaginary objects with attributes either modified or preserved. The model can render items in three dimensions, depict their internal and external structure, and infer contextual details on its own. It goes further still, showing zero-shot visual reasoning as well as geographic and temporal knowledge.

The architecture is a simple decoder-only Transformer. In essence, it is a language model that receives both the text and the image as a single stream of up to 1280 tokens and is trained with maximum likelihood to generate all of the tokens, one after another.

An attention mask at each of its 64 self-attention layers allows every image token to attend to all text tokens. DALL·E uses the standard causal mask for the text tokens and sparse attention for the image tokens, with a row, column, or convolutional attention pattern depending on the layer. The researchers will soon publish a paper detailing other elements of DALL·E.
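
The "single stream of tokens" idea can be sketched with a plain decoder-only Transformer. The snippet below is a minimal, hedged illustration: the vocabulary sizes, module names, and the use of a plain causal mask (rather than DALL·E's row/column/convolutional sparse-attention patterns) are assumptions, not the actual implementation.

```python
# A minimal sketch of text and image tokens flowing through one causal
# decoder: 256 text tokens followed by a 32x32 grid of image tokens
# (256 + 1024 = 1280 positions). Illustrative only.
import torch
import torch.nn as nn

TEXT_LEN, IMAGE_LEN = 256, 1024
SEQ_LEN = TEXT_LEN + IMAGE_LEN       # 1280 tokens total

class TextToImageLM(nn.Module):
    def __init__(self, text_vocab=16384, image_vocab=8192, d_model=512):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.image_emb = nn.Embedding(image_vocab, d_model)
        self.pos_emb = nn.Embedding(SEQ_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_image_logits = nn.Linear(d_model, image_vocab)

    def forward(self, text_tokens, image_tokens):
        # Concatenate text and image tokens into one 1280-token stream.
        x = torch.cat([self.text_emb(text_tokens),
                       self.image_emb(image_tokens)], dim=1)
        x = x + self.pos_emb(torch.arange(x.size(1), device=x.device))
        # Causal mask: each position attends only to earlier positions, so
        # every image token can see all of the text tokens that precede it.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.decoder(x, mask=mask)
        # Predict the next image token at each image position (shifted by one).
        return self.to_image_logits(h[:, TEXT_LEN - 1 : -1, :])
```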

There are always trade-offs involved. Here are some of the limitations the researchers point out:

  1. The phrasing of the caption determines the success rate,
  2. DALL·E confuses the associations between objects and their colors when more items are present, and
  3. DALL·E can draw multiple copies of an object but cannot reliably count past three.

You can try creating images from text here.


CLIP From OpenAI Recognizes Images From Their Captions


OpenAI’s Contrastive Language–Image Pre-training (CLIP) learns image representations from the natural language that accompanies images. The intuition is to learn to recognize a wide variety of visual concepts in images and associate them with their names, so that no task-specific fine-tuning is required for many downstream tasks when benchmarking against the current state-of-the-art.

The researchers from OpenAI did not use curated, labeled training data for CLIP. Instead, they collected 400 million image–caption pairs from the internet that are highly varied and highly noisy. This is a complete departure from the prevalent practice of training computer vision models on standard labeled datasets that specialize them for a single task.

The researchers used a simplified version of the ConVIRT architecture. To make CLIP efficient, they adopted a contrastive objective for connecting text with images: given an image, the training task is to predict which caption, out of 32,768 randomly sampled ones, is the correct one. After pre-training, natural language is used to reference the learned visual concepts, enabling zero-shot transfer to downstream tasks.
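
The contrastive objective can be sketched as a symmetric cross-entropy over the similarity matrix of a batch of matched image–caption pairs. The snippet below assumes precomputed embeddings; the encoder details, temperature value, and names are illustrative rather than OpenAI's released code.

```python
# A minimal sketch of a CLIP-style contrastive objective over a batch of
# matched image/text embedding pairs. Illustrative only.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each image should pick out its own caption, and vice versa.
    loss_i = F.cross_entropy(logits, targets)          # image -> correct text
    loss_t = F.cross_entropy(logits.t(), targets)      # text  -> correct image
    return (loss_i + loss_t) / 2
```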

Also Read: OpenAI Releases Robogym, A Framework To Train Robots In Simulated Environments

The pre-training translated into competitive performance on more than 30 existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. This result suggests that zero-shot evaluation of task-agnostic models is much more representative of a model’s capability.
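
Zero-shot evaluation with a CLIP-like model boils down to embedding candidate class names as captions and picking the one closest to the image embedding. The sketch below uses hypothetical `encode_image` and `encode_text` stand-ins rather than any specific released API.

```python
# A minimal sketch of zero-shot classification with a CLIP-like model.
# `encode_image` and `encode_text` are assumed stand-ins for the model's encoders.
import torch
import torch.nn.functional as F

def zero_shot_classify(image, class_names, encode_image, encode_text):
    prompts = [f"a photo of a {name}" for name in class_names]   # prompt template
    text_emb = F.normalize(encode_text(prompts), dim=-1)         # (classes, dim)
    image_emb = F.normalize(encode_image(image), dim=-1)         # (1, dim)
    scores = image_emb @ text_emb.t()                            # cosine similarities
    return class_names[scores.argmax().item()]
```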

The most crucial part is the robustness of the network: since the model is not directly optimized for the benchmarks, it learns much richer representations, which the researchers report makes it considerably more robust.

Even though the model is versatile, the researchers report the following limitations:

  1. It struggles with more abstract or systematic tasks, such as counting the number of objects in an image, and with more complex tasks, such as predicting spatial distances between objects
  2. Zero-shot CLIP struggles with very fine-grained classification, such as telling the difference between car models, variants of aircraft, or flower species
  3. CLIP poorly generalizes to images absent in its pre-training dataset
  4. CLIP’s zero-shot classifiers can be sensitive to wording or phrasing and sometimes require trial and error “prompt engineering” to perform well.

To learn more about CLIP, have a look at the paper and the released code.


Computer Vision Has A New DeIT By Facebook


Facebook AI has introduced DeiT (Data-efficient Image Transformer), a Transformer-based approach to training computer vision models. Over the years, Transformers have led to several breakthroughs in NLP, but their use for image processing is a more recent development. The idea is to move away from popular image-processing architectures such as convolutional neural networks, since the new approach delivers excellent results while reducing the need for data and computation.

On 3 December 2020, Google also released a Transformer-based image-processing technique — Vision Transformer (ViT) — achieving state-of-the-art accuracy on ImageNet image classification. Google used an external dataset of about 300 million training images, which has not been released. But within a mere 20 days, researchers from Facebook published DeiT, which was trained on a single 8-GPU node in two to three days (53 hours of pre-training and, optionally, 20 hours of fine-tuning) with no external data.

Researchers from FAIR built upon Google’s Vision Transformer (ViT) architecture, using patch embeddings as input, but introduced a new Transformer-specific knowledge-distillation procedure based on a distillation token that significantly reduces the training-data requirement compared with ViT.
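
The distillation-token idea can be sketched as follows: a second learnable token is appended next to the usual class token, and its output head is trained to match a convnet teacher while the class head is trained on the ground-truth labels. Everything in the snippet (shapes, names, and the hard-label distillation variant) is an illustrative assumption rather than the released DeiT code.

```python
# A minimal sketch of a distillation token on top of a ViT-style encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledViTHead(nn.Module):
    def __init__(self, encoder, d_model, num_classes):
        super().__init__()
        self.encoder = encoder                              # any ViT-style encoder
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.cls_head = nn.Linear(d_model, num_classes)
        self.dist_head = nn.Linear(d_model, num_classes)

    def forward(self, patch_emb):                           # (batch, patches, d_model)
        b = patch_emb.size(0)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.dist_token.expand(b, -1, -1),
                            patch_emb], dim=1)
        h = self.encoder(tokens)
        return self.cls_head(h[:, 0]), self.dist_head(h[:, 1])

def deit_style_loss(cls_logits, dist_logits, labels, teacher_logits):
    # Class token learns from the labels; distillation token learns from the
    # teacher's hard predictions (one of the variants explored in the paper).
    teacher_labels = teacher_logits.argmax(dim=-1)
    return 0.5 * F.cross_entropy(cls_logits, labels) + \
           0.5 * F.cross_entropy(dist_logits, teacher_labels)
```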

Also Read: Top Image Processing Libraries In Python

It seems Google Brain and FAIR researchers are trying to one-up each other. The details are below.

DeiT achieved results competitive with the state-of-the-art on ImageNet. When the pre-trained model was fine-tuned for fine-grained classification on several popular public benchmarks such as CIFAR-10, CIFAR-100, Flowers, Stanford Cars, and iNaturalist-18/19, it secured second position in classification accuracy on iNaturalist-18/19, with competitive scores on the rest. 

Currently, the FAIR team has released three models with varying numbers of parameters.


The reported tricks used to achieve this feat are knowledge distillation, multi-head self-attention (MSA) layers with 3, 6, or 12 heads, and standard image-augmentation techniques such as AutoAugment and RandAugment. They also used the AdamW optimizer and regularization techniques like Mixup and CutMix to improve performance. 

Check out the paper here and the released code here.


OpenMined, In Collaboration With PyTorch, Introduces A Free Course From “The Private AI Series”


OpenMined has released a course to train the next generation of machine learning enthusiasts and practitioners to process sensitive data without breaching privacy. OpenMined is well known as a community focused on developing tools and frameworks for AI that can work with data that cannot be pooled centrally because of privacy concerns. This course is part of its collaboration with PyTorch to offer four free courses (The Private AI Series) on machine learning with privacy-preserving techniques. 

Currently, there are four courses planned to be offered — Our Privacy Opportunity, Foundations of Private Computation, Federated Learning Across Enterprises, and Federated Learning on Mobile.

At present, OpenMined has released the first course, “Our Privacy Opportunity.” The course is offered free of cost and comes with a completion certificate. The best part is that you will work on real-world projects while being mentored by world-class researchers, including Andrew Trask (PhD researcher at the University of Oxford), Cynthia Dwork (author of differential privacy, Harvard), Ilya Mironov (author of Rényi differential privacy, FAIR), and more.

The course covers current privacy infrastructures and their limitations, and builds the foundations for the upcoming courses on federated learning. It walks you through the privacy–transparency trade-off and teaches the principles of privacy. The first course requires an investment of only a little over seven hours, and by its end you will be able to draw up privacy product specifications on your own. 

The course is structured for beginners and assumes no prerequisites. It begins by defining information flows, then highlights how current information structures fail in terms of privacy and transparency. After exposing the gaps in current information-flow designs, the course builds up the idea of structured transparency and its impact.

Register for the first course of The Private AI Series here.


PURE EV Develops Next-Gen AI System To Automatically Resolve Defects In Lithium-Ion Batteries Of Electric Vehicles


IIT Hyderabad-incubated startup PURE EV has developed artificial intelligence-driven hardware that automates the identification and repair of defects in the lithium-ion batteries of electric vehicles. This next-generation technology does away with the need for customers to visit service centers to address battery defects.

PURE EV researchers have designed artificial neural network (ANN)-based algorithms for the system, called ‘BaTRics Faraday,’ which identify defects in the battery’s various cell series and auto-heal them to the best of the cells’ electrochemical potential. The process is fully automated by the hardware, and no manual intervention is required until complete capacity restoration.

Extensive field testing of ‘BaTRics Faraday’ has already been completed. The system can be used with all five two-wheeler models launched by PURE EV (Epluto 7g, Etrance Neo, Etrance, Egnite & Etron+) and will be rolled out in the first quarter of 2021.

Elaborating on this system, Dr. Nishanth Dongari, Founder, PURE EV, and Associate Professor, Department of Mechanical and Aerospace Engineering, IIT Hyderabad, said, “Lithium batteries are the most critical component of electric vehicles. They contain multiple lithium cells welded together in series and parallel arrangements to meet the desired voltage and ampere-hour (Ah) capacity. In case of any defects coming to batteries in any of the cell series, it leads to significant downtime to the EV owners. Additionally, the usage behavior pattern and Indian environmental conditions put more load on the batteries. It is, indeed, a very difficult task for battery OEM to get the defects rectified through diagnosis and replacement of defective series of cells.”

Further, Dr. Nishanth Dongari said, “Hence the need of the hour is to develop an external intelligent hardware device which resolves battery defects through an external healing process. PURE EV has developed AI-driven hardware which carries out the diagnosis and resolution of defects in the battery through an externally-connected device to the battery. This saves precious man-hours otherwise spent in replacement of series of cells and significantly reduces the battery ‘Turn Around Time’ (TAT).”

Lithium battery repair is currently a challenging task in the market, as supply chains are not well established and reputed OEMs (original equipment manufacturers) are not yet operational in this emerging segment. Electric vehicle customers are reliant on EV OEMs who, in turn, are reliant on battery OEMs.

In such a scenario, any innovation that assures prospective customers and stakeholders of a lower turnaround time will significantly boost their confidence in this emerging technology. For electric vehicles to become mainstream, it is imperative that innovative solutions like ‘BaTRics Faraday’ come to market and ensure that the battery does not turn into an idle asset for EV owners.

Highlighting the need for this technology, Mr. Rohit Vadera, Chief Executive Officer, PURE EV, said, “This intelligent device enhances the capability of PURE EV to turn around battery defects within a shorter time period. With the significant takeoff happening for EVs, PURE EV is building the necessary infrastructure and technical capabilities to become a pioneer in battery after-sales service for its esteemed customers.”

Mr. Rohit Vadera added, “PURE EV will be able to provide service in much lesser TAT and with the establishment of company-owned workshops pan India we intend to emerge as a reputed benchmark in the battery after-sales service standards.”

Currently, PURE EV has made the device operational out of its factory. In future, it plans to establish company-owned, high-end ‘Battery Diagnostics and Repair’ workshops at major demand centers across India. Such workshops will ensure a significantly lower TAT for battery repair across geographies, and the company sees this as a critical step in its vision of becoming an established pan-India name in the EV space.

PURE EV has an in-house battery manufacturing facility and a research setup on the IIT Hyderabad campus, where the company’s dedicated R&D team works on core areas of battery thermal management for the development of long-range, high-performance lithium batteries.


Google Cloud Is Offering Free Training On AI, Big Data, & More


Google Cloud is offering free training on in-demand skills like AI, analytics, Kubernetes, and more through its Qwiklabs platform. On 9 December, Google Cloud extended its no-cost training offer, allowing users to register until 31 December 2020 to avail of it.

Google Cloud has been helping people navigate the difficult pandemic period by making it easy to start learning these technologies. Since the lockdown, cloud providers have witnessed a steep rise in the adoption of cloud computing.

Going by recent trends, in 2021 organizations will require professionals who are not only proficient in the latest technologies like machine learning and analytics but are also familiar with leveraging them on the cloud to build products. For developers, cloud computing has moved from a nice-to-have to a must-have skill as remote working becomes the new normal.

To enable learners to master the skills of the future, Google Cloud is offering free access to Qwiklabs for 30 days. You will have to copy the offer code and then click on enroll to begin. Note the instruction that mandates completing a 30-minute tour of Qwiklabs and Google Cloud to unlock the 30-day training; if you only sign up on Qwiklabs and do not finish the tour, you will not get access to the free training.

Every lesson comes with a lab that gives you free access to the Google Cloud Platform. Although sessions have timers to prevent endless compute use, you can start a session again to continue learning and complete the courses. You will also earn badges on completing courses to showcase your knowledge.

Learn from a wide range of courses covering big data, machine learning, infrastructure & DevOps, website & app development, and more, and upskill to stay relevant in 2021 and beyond.

Register for free training from Google Cloud here.


Graphcore Raises $222 Million In Series E At A $2.77 Billion Valuation


Graphcore, a UK-based AI chip maker, has raised $222 million in Series E funding led by the Ontario Teachers’ Pension Plan Board, Fidelity International, and Schroders. Existing investors such as Baillie Gifford and Draper Esprit deepened their ties with Graphcore by participating in the round. According to Graphcore, the investment will allow the company to further enhance its AI chips and software and expand globally.

Founded in 2016, Graphcore is a pioneer in developing Intelligence Processing Units (IPUs), which it claims outperform graphics processing units (GPUs). IPUs are optimized for processing AI workloads in the cloud.

Early adopters of Graphcore’s IPUs include Microsoft, Dell, and Cirrascale. Since November 2019, for instance, Microsoft has offered selected users access to Graphcore’s IPUs for high-speed processing of AI applications.

Graphcore has been evolving its processors and in July 2020 announced its second-generation chip, the GC200. These chips power its IPU-Machine M2000, which packs four 7-nanometer GC200 chips. Each GC200 has 59.4 billion transistors on a single 823 sq mm die, pushing the processing boundaries for neural network projects.

To help developers build AI applications on IPUs, Graphcore has open-sourced its PopLibs libraries to simplify the development process. Graphcore’s IPUs currently support TensorFlow and PyTorch, letting developers run their existing neural network code on the hardware.


An Ultimate Guide To Data Science Career Path In 2021


The data science career path keeps evolving with organizations’ changing demands. For years, aspirants with minimal knowledge could land a data role because there was a dearth of talent. Today, however, millions of people are learning data science, so there are plenty of applicants for every opening. Unlike in earlier years, you might not get an offer in 2021 if you fail to differentiate yourself from other applicants. To make sure you both learn and land a data science job, you need to devise an effective learning path for 2021.

Here is a 21-step data science career guide for 2021:

  1. Develop A Problem-Solving Aptitude: More often than not, aspirants try to learn data science because of the hype and, as a result, never develop the rigor needed to solve business problems. You need the curiosity to spot challenges in day-to-day life and a passion for solving them, whether they are shortcomings in the way products and services are delivered or issues that are ignored altogether. With data everywhere, you should find ways to leverage data science techniques to mitigate pain points for businesses in the digital age.
  2. Learn A Structured Thinking Framework: Structured thinking is the art of applying a framework to an unstructured problem, simplifying the process by understanding its intricacies at the macro level. Beginners often try to force machine learning techniques onto problems from the outset because they lack the ability to think structurally about a situation. Instead, mind-map how a problem can be solved from beginning to end. You will not have a perfect framework, but an overall approach will streamline the entire process.
  3. Understand the Basics Of Data Science: When you have a problem-solving aptitude and structured thinking ability, you need to acquire skills to solve problems. For this, you should read several blogs and talk to data science practitioners to understand the scope of machine learning, data, and more. Some problems cannot be solved with machine learning techniques. Consequently, you will know where you can apply data science practices and where you cannot.
  4. Explore Different Domains: This is one of the crucial stages in your data science career path. You cannot master a lot of domains in one go as every sector has its share of challenges that may require a completely different approach. It is recommended to figure out how machine learning is used in other domains; this will give you a heads-up regarding standard practices in specific sectors like BFSI, retail, and more. Besides, if you are passionate about a particular domain, you can even effectively strategize your data science career path from the very beginning.
  5. Learn A Programming Language: Unfortunately, most aspirants start by learning a programming language, but non-technical aspects like the right attitude, critical thinking, structured thinking, and storytelling are equally important. Make sure you go through the above four steps before picking up a language. You can start with either Python or R, but do not get caught up in debates like Python vs. R for data science. Then get started with an IDE, or use Jupyter Notebook with Anaconda to manage isolated environments.
  6. Master Statistics: Since statistics and mathematics are the cores of machine learning, begin with descriptive statistics, and gradually move ahead to master inferential statistics. Most of the time, beginners obtain an overall idea of inferential statistics and rely on libraries to carry out statistical analysis. Although this can help complete the task at hand, a weak foundation can limit your ability to think and explore data.
  7. Grasp Mathematics: Mastering mathematics is vital for a data scientist, as it lets you devise your own methodologies instead of depending on existing libraries. To begin with, you should know about logarithms, exponentials, linear algebra, and more. As you progress, learn calculus and other optimization techniques.
  8. Attend Meetups/Conferences: Engaging with like-minded people can keep you motivated during your learning curve and improve your storytelling skills. Meetups offer a completely different learning experience than the regular online videos, as you can get real-time suggestions or help for your specific challenges from others. Besides, you can also get inspiration by following top data scientists trying to solve strenuous problems with data science.
  9. Master Python Libraries: After getting familiar with Python’s fundamentals, learn the most common libraries like Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, and more. You will need these libraries for almost every project you work on, and mastering them will save you a few Google searches and speed up your tasks. 
  10. Learn Exploratory Data Analysis (EDA): Exploratory data analysis is the first step in data analytics, where data is assessed to discover patterns, spot outliers, evaluate the spread, and more. A proper exploratory data analysis can help structure the entire process of the project. It will also play a crucial role in assessing your Python and SQL skills; you can choose from Titanic, Netflix movie recommendations, and house price prediction datasets to practice exploratory data analysis.
  11. Data Visualization & Storytelling: Proficiency in data visualization helps you understand the data and allows practitioners to tell compelling stories. Since a visualization summarizes an entire dataset at a glance, learning to plot well can provide an edge over others.
  12. Supervised Machine Learning: As you advance in your data science career path, supervised machine learning is where your machine learning journey begins. Start with simple methods like classification and regression. You will also come across various terminologies like overfitting, underfitting, bias-variance tradeoff, and more. Other standard techniques include linear regression, logistic regression, ridge regression, lasso regression, decision tree, KNN, and Naive Bayes.
  13. Advanced Supervised Algorithms: After basic supervised machine learning, you can focus on advanced algorithms like random forest, XGBoost, CatBoost, GBM, SVM, and others. In many use cases, these techniques help further optimize your models for superior results.
  14. Unsupervised Algorithms: Unlike supervised learning, there are no target values for the inputs you provide. Unsupervised learning includes clustering and association techniques that unearth patterns which, in most cases, humans cannot. Some of the popular algorithms are K-Means, hierarchical clustering, DBSCAN, PCA, and LDA.
  15. Advanced Hyperparameter Tuning Methods And Model Performance: While the above algorithms can get you fairly good results, effective hyperparameter tuning can be the game-changer for your machine learning models. Learn techniques like grid search, random search, and Bayesian optimization, and understand the different model-performance metrics for classification and regression (a minimal end-to-end sketch covering steps 10–15 appears after this list).
  16. Recommendation Engines & Time-Series Forecasting: Personalization has become the differentiating factor for many organizations trying to capture the market, so expertise in recommendation engines is crucial to pick up. Time-series forecasting is another commonly used technique for modelling the occurrence of events and predicting outcomes. You should know SVD and work on recommendation-engine projects.
  17. Participate In Competitions: The best way to remember most of what you learned is by practicing in Hackathons and Kaggle competitions. Besides, you can start teaching others by writing blogs and creating Youtube videos. Creating content and participating in contests puts your focus on acquiring in-depth knowledge about several machine learning topics. At this stage, you can also apply for internships to learn while working on real-world projects at data-driven companies.
  18. Neural Networks: Neural networks are a vast topic whose depth depends on the use case. To begin with, you can learn techniques like artificial neural networks and master frameworks like TensorFlow or PyTorch.
  19. Basics Of NLP: As per various studies, 70 to 80 percent of data in organizations are unstructured. This makes natural language processing a crucial technique to bring value from unstructured data. Essential methods involved in this are tokenization, stemming, and lemmatization.
  20. Basics Of Computer Vision: Computer vision has gained traction due to its numerous use cases, but bias in computer vision systems is limiting adoption of the technology. This, in turn, opens up the opportunity to blaze a trail and develop reliable computer vision-based products. Some of the crucial techniques to learn are CNNs and transfer learning.
  21. Apply For Jobs: Eventually, you can apply for data science jobs to work with experts and advance your data science career path in organizations. Jobs can sometimes be correlated to your visibility in the industry. Therefore, you should increase your visibility by publishing blogs, networking in conferences, and being active on LinkedIn.
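
As referenced in step 15, here is a minimal end-to-end sketch of steps 10 through 15: quick exploratory checks, a supervised baseline, and hyperparameter tuning with grid search in scikit-learn. The synthetic dataset, model choice, and parameter grid are placeholders, not recommendations.

```python
# A minimal sketch of the workflow in steps 10-15: basic EDA, a supervised
# baseline, and grid-search hyperparameter tuning on a synthetic dataset.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
df = pd.DataFrame(X).assign(target=y)
print(df.describe())                      # basic EDA: spread, ranges, outliers
print(df["target"].value_counts())        # class balance

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```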