
Top Data Science Skills To Get Entry-Level Jobs

Data Science Skills

Aspirants often think data science skills like programming and fundamental analysis techniques are enough to land data science jobs. Although these skills are essential, they do not guarantee job offers, as every other applicant is equipped with similar skills. You cannot become a data scientist by just learning a programming language and a few popular machine learning libraries. The key to getting a data science job is to differentiate yourself from the rest. Consequently, you need more than just programming languages to increase your chances in a highly competitive market.

We list down data science skills that will set you apart from other freshers applying for entry-level data science jobs.

Programming Language

There is little point in debating Python vs R, as data science is not only about the tools. You can use any tool as long as you can get the desired output. Learn a programming language of your choice and start analyzing data by getting familiar with popular libraries. “A data scientist is a problem solver, not only a Python programmer,” says Chiranjiv Roy, chief of data science & AI products of Access2Justice Technologies.
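
As a concrete starting point, a first “analysis” can be as small as grouping and averaging a dataset. A minimal sketch in Python, using only the standard library (the sample data is made up for illustration):

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a real CSV file.
RAW = """city,price
Delhi,120
Delhi,140
Mumbai,200
Mumbai,220
"""

def average_price_by_city(raw_csv: str) -> dict:
    """Group rows by city and compute the mean price per city."""
    groups = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        groups.setdefault(row["city"], []).append(float(row["price"]))
    return {city: mean(prices) for city, prices in groups.items()}

if __name__ == "__main__":
    print(average_price_by_city(RAW))
```

In practice you would reach for a library like pandas, but being able to do this by hand is exactly the kind of fundamentals interviewers probe.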

Data Analysis and Machine Learning

Learn data analysis and machine learning processes from any free or paid course to strengthen your basics. You cannot learn everything in machine learning at once, so complete one full pass to understand the concepts. Eventually, you can come back and go in-depth into any subdomain of machine learning, such as computer vision or NLP. Trying to learn everything in data science can make you a jack of all trades and master of none, which will not help you during interviews.
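
To make that “one full pass” concrete, the basic machine learning workflow is: split the data, fit the simplest possible model, and measure accuracy. A sketch using a 1-nearest-neighbor classifier on made-up toy data, standard library only:

```python
import math
import random

def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    features, label = min(train, key=lambda ex: math.dist(ex[0], point))
    return label

def accuracy(train, test):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(1 for x, y in test if nearest_neighbor(train, x) == y)
    return hits / len(test)

# Hypothetical 2-D toy data: two well-separated Gaussian clusters.
random.seed(0)
data = [((random.gauss(0, 0.5), random.gauss(0, 0.5)), "a") for _ in range(20)]
data += [((random.gauss(5, 0.5), random.gauss(5, 0.5)), "b") for _ in range(20)]
random.shuffle(data)

train, test = data[:30], data[30:]  # a simple hold-out split
print(accuracy(train, test))
```

Real projects would use a library such as scikit-learn, but every step here (split, fit, evaluate) maps one-to-one onto what those libraries do for you.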

Inferential Statistics and Mathematics

Statistics and mathematics are the foundation of data science. Even if you do not have programming skills, you can still analyze data with no-code platforms. However, beginners mostly focus on programming languages, ignoring the foundation of data science: statistics and mathematics. This does not mean that you should ignore programming languages. But, at the same time, you should have in-depth knowledge of statistics and mathematics.

In a rush to move fast in data science, aspirants often skim over statistics and mathematics, mostly because online courses try to teach too many things at once. They rely on importing statistical modules from libraries to carry out analyses. Such practices fail them during interviews because they struggle to explain the statistical procedures behind the code they implement.
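
For example, the two-sample t-statistic that libraries report in a single call is a short formula you should be able to reproduce by hand. A standard-library sketch with hypothetical samples (this computes Welch's t-statistic, which does not assume equal variances):

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's two-sample t-statistic: the difference of the sample
    means divided by the standard error of that difference."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Made-up measurements from two groups.
sample_a = [5.1, 4.9, 5.0, 5.2, 4.8]
sample_b = [4.5, 4.4, 4.6, 4.3, 4.7]
print(round(t_statistic(sample_a, sample_b), 3))
```

Being able to explain each term of that formula is precisely what separates candidates who understand the procedure from those who only know the library call.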

Moving too fast in data science is not the way forward. It takes years to become a data scientist. You should ensure you have obtained a strong foundation before you move on to other concepts in data science.

Data Intuition

Learning numerous data science techniques will not make you a data scientist if you do not know how to apply them. You have to contextualize while working with data to pick the best approach to solve problems. Data intuition is often overlooked by beginners because they believe it matters only when handling large data. Although it is an essential skill while managing colossal amounts of information, the ability to make the most of limited data is equally important. Given a dataset, small or big, you should be able to quickly think of the approaches best suited to deliver business value.

Storytelling

Storytelling is one of the most important data science skills. In organizations, you will work with decision-makers who are mostly unaware of data science terms. If you try to communicate with them using data science jargon, they might not grasp your approaches. If you fail to explain your analysis to business leaders in straightforward, simple terms, your machine learning models might never go into production. Such instances hurt not only the organization but also your reputation as a data scientist.

Storytelling is not only for internal communication. In organizations, you will have to talk to clients who might not be aware of data science terms. To ensure they understand your models and results, you need to communicate in a way that is easy to understand by all.

Proficiency In Working With Unstructured Data

While learning from data science courses, learners often get tabular, structured data for their projects. In organizations, however, you are not always handed prepared data; you need to gather unstructured data from various sources. As per a report, 80 to 90 percent of data in organizations is unstructured. Over the years, organizations focused on tabular data, but businesses have now realized the value of unstructured data. Being proficient in handling unstructured data can therefore differentiate you from other applicants and help you land a data science job.
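
As a tiny illustration of what handling unstructured data means in practice, here is a sketch that turns raw text into token counts, the simplest step from unstructured data toward features a model can use (the snippet of text is invented):

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Turn raw, unstructured text into token counts - the most basic
    'bag of words' representation used as input to simple models."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical snippet standing in for scraped, unstructured text.
doc = "Data is messy. Real data is rarely tabular, and real work starts with messy data."
features = bag_of_words(doc)
print(features.most_common(3))
```

Real pipelines add tokenization rules, stop-word removal, and TF-IDF weighting on top, but they all start with this kind of raw-text-to-counts step.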

Software Development Understanding 

Usually, beginners fall for deep learning techniques to create appealing projects or use cases. In reality, deep learning is not crucial for most organizations: making a product out of deep learning models requires huge computation power, which is not always feasible, even if it might give the best results.

Most business problems can be solved with simple supervised machine learning models. Avoid fancy deep learning techniques unless they are necessary; this also makes it easier for software developers to productize your machine learning models. Therefore, never bring in deep learning models to demonstrate your data science skills in interviews when simple machine learning models can, more or less, deliver the same results. “Remember, a Jupyter Notebook model running on the cloud will not deliver the tangible benefits expected to solve the problem; hence application development is the key for a data scientist, as everyone loves to see an application to play with,” said Chiranjiv.
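
To illustrate the simple-model-first advice, here is a sketch of plain logistic regression trained by gradient descent on made-up, roughly linearly separable data; no deep learning framework required:

```python
import math
import random

def train_logreg(data, lr=0.5, epochs=200):
    """Plain logistic regression trained by gradient descent -
    often a strong enough baseline before reaching for deep learning."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = w[0] * x[0] + w[1] * x[1] + b
            z = max(-30.0, min(30.0, z))  # clamp the logit to avoid exp overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to the logit
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Made-up toy data: two Gaussian blobs, one per class.
random.seed(1)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(30)]
data += [((random.gauss(4, 1), random.gauss(4, 1)), 1) for _ in range(30)]

w, b = train_logreg(data)
acc = sum(predict(w, b, x) == y for x, y in data) / len(data)
print(acc)
```

Twenty-odd lines of arithmetic separate the two classes almost perfectly here; a model like this is also trivial to serve in production, which is exactly the point of the quote above.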


Top Data Science Podcasts You Should Follow

Data Science Podcast

Data science podcasts are one of the best sources of information from some of the best minds in the industry. Machine learning practitioners who have made it big in the field talk about the latest developments, best methodologies, and the future of the data science space. Listening to researchers and developers who have furthered machine intelligence technology clears up confusion in the market and brings fresh perspectives to listeners. Whether you are an aspirant or a practitioner, you should listen to these artificial intelligence podcasts to stay abreast of the latest trends in the industry. Today, there are numerous machine learning podcasts, making it difficult to follow them all. But you can always be selective, listening according to your area of interest.

We list down 10 data science podcasts that you can subscribe to and stay informed.

Note: This list is in no particular order.

Lex Fridman Podcast

Lex Fridman Podcast is considered by many to be the best artificial intelligence podcast. Over the years, top researchers as well as practitioners have been part of this podcast. Unlike others, Lex Fridman hosts lengthy conversations that can run as long as four hours, though episodes are usually around two hours. Earlier, the podcast focused only on artificial intelligence, but Fridman now invites guests from other fields such as neuroscience, physics, chemistry, history, and mathematics. Started in 2018, it is a weekly podcast that quickly gained traction in the data science field; it is a must for any data science enthusiast.

Data Skeptic

Data Skeptic is one of the oldest data science podcasts and has covered a wide range of topics. Since 2014, it has been catering to the curiosity of machine learning practitioners every week. Episodes are usually 30 minutes long, an ideal length for most listeners. If you are interested in statistics, critical thinking, and the efficiency of machine learning approaches, this is the go-to podcast.

The TWIML AI Podcast

Hosted by Sam Charrington, The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) started in 2016 and has over 410 episodes. Charrington hosts top influencers from the data science field to discuss trends and best practices. Episodes are 40 to 60 minutes long, just enough for sharing ideas without repetition.

Practical AI

As the name suggests, Practical AI focuses on real-world implementation alongside new developments in data science. The weekly artificial intelligence podcast features hour-long episodes in which technology professionals, researchers, and developers engage in exciting conversations on machine learning. Started in 2018, this is one of the best data science podcasts to keep an eye on.

The AI Podcast

The AI Podcast is produced by NVIDIA, a leading graphics processing unit maker. Started in 2016, it has over 120 episodes. NVIDIA's podcast is top-rated among professionals interested in the computing side of artificial intelligence. Top developers, researchers, and leaders from NVIDIA share their experience and knowledge of machine learning, and influencers from organizations such as NASA, Lenovo, and Ford are invited to bring fresh perspectives to listeners.

Making Data Simple

Hosted by IBM's VP of Data and AI Development, Making Data Simple is another highly recommended show in the data science landscape. It differs from other data science podcasts in many ways: it focuses on demystifying the technology for the general public. Instead of in-depth research topics, its 30- to 40-minute conversations are a perfect starting point for beginners.

Brain Inspired

Brain Inspired is a long-form podcast, with episodes around two hours, intensely focused on neuroscience and artificial intelligence. Experts from different walks of life are invited to talk about deep learning and machine learning techniques. It is the best deep learning podcast for experts and practitioners who have a deep understanding of neural networks; for beginners, the episodes can be overwhelming due to the constant stream of information about data science techniques.

AI In Business

Although it started only late last year, AI in Business produces episodes at scale: you may see two to three episodes a week, in which host Daniel Faggella talks to the best minds in the data science space. The topics are more general, such as the evolution of AI chips, trends in machine learning, and the adoption of facial recognition, which makes it suitable for both beginners and industry experts.

Not So Standard Deviations

Not So Standard Deviations recently celebrated its fifth anniversary. Hosts Roger Peng and Hilary Parker have spent the last five years in conversation with data science experts, spreading knowledge of the latest trends. You can find numerous hour-long episodes to stay informed about the ever-changing data science market.

Talking Machines

Talking Machines is another classic machine learning podcast, started in 2015, in which the hosts talk to researchers from several blue-chip companies. Since June, however, the podcast has been on hiatus to reflect on anti-Black racism. Despite the break, you can still listen to the wealth of past talks and gain exciting insights.

Outlook

Several classic data science podcasts have been paused or stopped entirely, but you can still access their episodes to learn from data science influencers. Data Crunch, Data Stories, Partial Derivative, and Linear Digressions are some of the closed or inactive podcasts still worth listening to, depending on your interests. Besides, you can also follow other active machine learning podcasts such as SuperDataScience, Eye On AI, and the Data Engineering Podcast to gain data science knowledge.


Friends App, One Of The Fastest-Growing Indian Apps

Friends App

Friends App, India’s latest venture in the digital world, has attracted a large number of users within a few days of its launch. The app is growing at a considerable pace: with over 5,000 downloads already, the number is increasing with each passing day.

The app is a social networking app that lets you post short videos within India and anywhere across the globe.

The Government of India wants to encourage the use of Indian apps among its residents, giving Indian technology the opportunity and recognition needed to make the vision of Digital India a reality. The app can be perfectly categorized as a “Made in India” venture, having been founded by Manju Patil and Mrityunjay Patil from Bangalore.

The Friends app is a free and user-friendly application launched in July 2020. It is a single platform to create, post, and share content.

These days, when all we want is to stay connected to our friends and family, this application will keep you close to your near and dear ones across the world. The best part of the app is that you can watch and share viral videos in your language within seconds. The languages supported include Hindi, Malayalam, Tamil, Telugu, Kannada, Marathi, and English. 

Now that TikTok has been banned in various countries, this app has turned out to be a savior for the youth. You can capture fascinating moments in the form of videos and share them with friends with a single tap. Apart from this, the app also allows you to chat and interact with new people, browse through the feed, and follow the creators you like, among other things.

The application that is available on the Google Play Store is an all-purpose entertainment platform that works with lightning speed. 

What makes the Friends app even more appealing is that you can easily share your created videos and posts on Facebook or WhatsApp. Even people who do not have the app can see the videos through the shared link.

The Friends app is a splendid platform for all types of creators. For singers, actors, comedians, dancers, and other talented people, the app is a safe space to showcase their talent. Friends app users can get creative with WhatsApp statuses, videos, audio clips, GIFs, and photos.

The application has been trending on the Google Play Store for quite some time now. Moreover, the Friends app holds a 4.8 rating on the store.

It is easy to use and offers data privacy. The application is designed to understand user preferences and tries to filter content accordingly, meaning the app provides personalized content to all its users to curate their feeds.

You can follow your favorite artists, from the glam world to funny and viral videos, see the content you like, and get updated whenever a favored artist posts a video.

With over 5,000 downloads within a month, the Friends app is already trending on the Google Play Store. It is a secure platform for content creation, and the number of users is expected to keep rising in the coming days, making it one of India’s fastest-growing apps.


Microsoft Gets Exclusive License Of GPT-3

Exclusive License of GPT-3

Microsoft has announced an exclusive license of GPT-3, allowing it to integrate the largest language model into its products and services. The exclusive license will allow Microsoft to blaze the trail and build an advanced Azure platform to further the development of artificial intelligence.

“Our mission at Microsoft is to empower every person and every organization on the planet to achieve more, so we want to make sure that this AI platform is available to everyone–researchers, entrepreneurs, hobbyists, businesses–to empower their ambition to create something new and interesting,” wrote Kevin Scott, executive vice president and chief technology officer, Microsoft, in a blog post.

But, how did Microsoft get access to the exclusive license of GPT-3?

In June last year, Microsoft invested $1 billion in the AI research lab OpenAI to support the development of artificial general intelligence (AGI). During the announcement, they also shed light on their plan to build a supercomputer for OpenAI researchers to train large-scale AI models. In return, OpenAI agreed to license some of its intellectual property to Microsoft for commercializing its AGI technologies.

Although Microsoft gets the exclusive license of GPT-3, OpenAI will continue to offer access to the model through its Azure-hosted API. The pricing of GPT-3 was revealed earlier this month to users who had received access to OpenAI’s largest language model.

Since the API’s release to selected researchers, hobbyists, and developers, users have built solutions that can generate code automatically, write articles, and more. But people were equally critical of such solutions for misleading the general public into believing in a superiority of GPT-3 that does not actually exist. It is just a large model trained on a colossal amount of data; it does not have any cognitive abilities.

Even OpenAI CEO, Sam Altman, tweeted about the GPT-3’s hype.

However, with GPT-3 in Microsoft’s hands, the model can give the company an edge over its competitors; Microsoft will have access to the in-depth functionality and code of GPT-3, empowering it to innovate beyond what is possible with just the API.


With GitHub CLI, You Can Now Use GitHub In Your Terminal

GitHub CLI has been made generally available after being in beta since its announcement on 12 February 2020. With GitHub CLI, you can use GitHub from the command line to simplify your version-control workflow. Developers usually switch between terminal and browser while handling pull requests, merges, and changes. With GitHub CLI, they can now operate GitHub entirely from the terminal.

Is GitHub CLI Really Effective?

Since the announcement, developers have used GitHub CLI to create 250,000 pull requests, perform 350,000 merges, and file 20,000 issues. At release, it was only available for GitHub Team and Enterprise Cloud, not GitHub Enterprise Server. However, one can now use the tool on-premises with GitHub Enterprise Server, which should further grow GitHub CLI’s user base.

To make things easy for users, GitHub CLI commands can also be customized using gh alias set, so you are not forced to adapt to an entirely new workflow. “And with the powerful gh api allowing you to access the GitHub API directly, there’s no limit to what you can do with gh,” notes the author on the GitHub blog.
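
As a sketch, the alias and API features mentioned above look like this (the alias name and API path are illustrative; gh alias set and gh api are the documented subcommands):

```shell
# Create a shorthand: afterwards `gh co 123` checks out pull request 123.
gh alias set co 'pr checkout'

# Call the GitHub API directly, e.g. list releases of the current repository.
gh api repos/{owner}/{repo}/releases
```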

Besides, you can list pull requests, view the details of requested changes, and create pull requests, among other things. By bringing every functionality right into the terminal, GitHub has mitigated a significant pain point for developers, as they no longer need to switch between windows to manage projects.

Start cloning repositories and control the projects without leaving your terminal:
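
A hypothetical session might look like the following (the repository name, PR number, and titles are placeholders):

```shell
# Clone a repository (owner/repo is a placeholder).
gh repo clone owner/repo
cd repo

# Create a pull request from the current branch, then list and check out PRs.
gh pr create --title "Fix typo" --body "Small docs fix"
gh pr list
gh pr checkout 42

# Issues work the same way.
gh issue create --title "Bug report"
gh issue list
```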

The best part about GitHub CLI is that you do not need to configure anything in the terminal. Just download it, start using the commands, and bring GitHub into your terminal.

Over the months, this open-source initiative has been iterated on with the help of developers to bring advanced features and make users’ lives more comfortable. This helped GitHub release a stable version yesterday, which can enhance developers’ productivity without revamping how they manage projects.

Read more here.


How To Land A Data Science Internship?

data science internship

Data science internship opportunities are plentiful, but aspirants struggle to get internship offers that could open the doors to their careers. This is because beginners do not follow the right approach while applying for internships to differentiate themselves from other applicants. To land a data science internship, you need to follow best practices that increase your likelihood of success. A well-devised strategy while hunting for internships is essential not only to reduce friction but also to get internships at organizations you want to work with.

In this article, we list down a step-by-step guide that can help you get an internship in the data science domain.

  1. Programming And Courses

    A strong foundation is what you should focus on in the beginning. Learn any programming language, be it Python or R, and become proficient by practising on platforms like HackerRank. Do not fall for comparisons of the best programming language for data science; just get good at whatever language you choose to solve problems. “A data scientist is a problem solver, not only a Python programmer,” said Chiranjiv Roy. Do not waste your time debating or searching for answers about the best programming language; it leads nowhere.

    You can enrol in free courses on platforms like Udacity, Coursera, and edX to learn a programming language. Following this, you can take courses in statistics and mathematics. Further, you can also learn data analysis and machine learning techniques on any edtech platform.

  2. Certifications Are Not Necessary To Get Data Science Internships

    Another misconception among data science aspirants is that they think certifications are necessary to get internships. This misconception is usually spread by influencers on social media to promote paid data science courses on their pages. Organizations want skills, not certifications; even blue-chip companies do not seek certifications while hiring for data science jobs, let alone internships. If you prefer guided learning, you can enrol in paid courses, but do not pay for the sake of obtaining certificates.

  3. Build Portfolio

    Instead of certificates, showcase your worth with data science projects. However, doing the usual projects will not demonstrate your skills. You can start with common projects but should gradually progress toward solving real-world problems. Do not always rely on Kaggle or similar platforms to prepare data for you. In organizations, you are given a business problem and asked to solve it using data science skills; often, you will have to wrangle data from different sources before you can apply machine learning techniques. Doing a project that requires effort from the ground up will strengthen your position to receive internship offers.

  4. Avoid Relying On Job/Internship Portals

    Most aspirants randomly visit several online portals and apply to every internship posting. Data science being a lucrative field, thousands of people apply for any possible opportunity to get started in the domain. If you follow the crowd and rely only on job portals, you will fail to differentiate yourself from the rest. This does not mean you should completely ignore online portals, but do not burn yourself out by randomly applying to as many internships as you can. So where should you focus to get data science internships? LinkedIn.

    LinkedIn is the best platform you can use to get data science internships. Leverage it to share your data science learning and projects through posts and articles. This will increase your visibility among decision-makers and other data scientists, which will help you get an internship or job offer.

    In addition, you can personally reach out to data scientists on LinkedIn, discuss data-science related topics, and eventually ask for referrals for internships. Referrals are one of the best techniques to get internships.
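
To ground the portfolio advice above, here is a sketch of the kind of from-scratch wrangling a real project involves: joining records from two differently shaped sources. The inputs are invented stand-ins for a CSV export and a JSON API response:

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different data sources.
CSV_SALES = """order_id,amount
1,250
2,400
"""
JSON_CUSTOMERS = '[{"order_id": 1, "city": "Pune"}, {"order_id": 2, "city": "Delhi"}]'

def merge_sources(csv_text: str, json_text: str) -> list:
    """Join CSV order amounts with JSON customer records on order_id."""
    cities = {c["order_id"]: c["city"] for c in json.loads(json_text)}
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        oid = int(row["order_id"])
        rows.append({"order_id": oid, "amount": int(row["amount"]), "city": cities[oid]})
    return rows

print(merge_sources(CSV_SALES, JSON_CUSTOMERS))
```

A portfolio project that starts from messy, multi-source inputs like this tells an interviewer far more than one built on a pre-cleaned Kaggle CSV.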

Outlook

While there are numerous ways to showcase your ability, the steps mentioned above can help you land a data science internship quickly. Undoubtedly, winning hackathons and progressing on Kaggle will help you too, but those can be daunting for beginners, as they take time.


Microsoft Launches AI Classroom Series In India For Free

Microsoft AI Classroom

Microsoft and NASSCOM’s FutureSkills have collaborated to launch the AI Classroom Series, aiming to impart artificial intelligence skills to 1 million students by 2021. The initiative, however, is only for students who are enrolled in Indian universities and are residents of India. The step aligns with both organizations’ commitment to preparing people for new-age technologies: while Microsoft wants to skill 25 million people through its global skilling initiative, NASSCOM is devising and executing plans to promote skilling as a national priority.

Microsoft AI Classroom is divided into three modules: Data Science Basics and Introduction to Microsoft AI Platform, Building Machine Learning Models on Azure, and Building Intelligent Solutions using Cognitive Services, starting from 21 September 2020. Since the sessions will be delivered in numerous formats (live demos, workshops, and assignments), you will have to choose your preferred timeslots for the three days. Every session will be 150 minutes long.

AI Classroom Series

For registration, you will need to provide your college name, email id, name, and contact number. After verification, which usually takes 5-10 minutes, you will receive a confirmation mail about the booked timeslot. The tedious part, however, is that you will have to register separately for all three sessions.

The registration page fields can be confusing, as it looks like they are targeting working professionals with mandatory fields like Job Role, Company Name, and Company Size. Although Microsoft has mentioned that one can enter the college name in the ‘company name’ field, they have not specified what you should enter in the ‘job role’ field. But, we entered ‘student,’ and the registration was verified.

Source: Microsoft Blog

After completing the modules, you will have to take an assessment, available between 28 September and 10 October, to receive a certificate of participation. You will have 20 minutes to answer 30 multiple-choice questions and need to score 80% or more (24 correct answers) to qualify for the certificate. If you fail on the first attempt, you will have another two attempts to clear the assessment.

As a prerequisite, you will have to set up the environment by installing VS Code and setting up Jupyter Notebook in VS Code. You can follow the steps below.

Source: Microsoft Blog
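
For those who prefer the command line, a hypothetical equivalent of the setup steps might look like this (the extension IDs are the VS Code marketplace identifiers; verify against the official instructions):

```shell
# Install VS Code's Python and Jupyter extensions from the command line.
code --install-extension ms-python.python
code --install-extension ms-toolsai.jupyter

# Install Jupyter into the Python environment VS Code will use.
pip install notebook
```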

In addition, you will be required to activate your free Azure student account and create a GitHub profile. You can also leverage the GitHub Student Developer Pack, which offers access to paid tools for free.


YouTube Releases TikTok Clone Called Shorts

YouTube Shorts

YouTube has released Shorts, a TikTok clone for short-form video. The feature has been rolled out first in India and will later be introduced in other countries. For now, you can only create 15-second-long videos with YouTube Shorts. “User-generated short videos were born on YouTube starting with our first upload, a short 18-second video called ‘Me at the Zoo,’” mentioned YouTube in its blog.

Similar to TikTok, the beta of Shorts comes with a few tools that allow creators to edit videos on the go. YouTube has committed to enhancing the features and putting more power in users’ hands to make appealing videos. Currently, it has features to string together video clips, record with music, and control speed. A new create icon can be seen on the homepage of the Android application.

Since the ban of TikTok in India, a colossal number of applications have been trying to fill the void created by the most popular Chinese short-video app. TikTok had over 200 million users in India before its ban in June. While Instagram is striving to replicate TikTok with Reels, it has failed to engage users, mostly because of ineffective recommendation accuracy. Along with useful in-app editing features, TikTok’s state-of-the-art recommender system was its unique selling point.

Unlike other social media platforms, TikTok let creators produce engaging content right from the application. Such intuitive features were a huge miss on other social media platforms such as Instagram, Facebook, and LinkedIn, where one had to use outside tools to create intriguing videos.

While Reels, Moj, Roposo, and other applications are replicating the video editing features, recommending content according to users’ preferences has been a huge miss in these applications. As per reports, TikTok will not be sharing its code after the deal for its US operations with Oracle or any other firm.

We will have to wait and watch whether YouTube can triumph with Shorts, not only with in-app editing features but also with its recommender system.


With DeepSpeed, You Can Now Build 1-Trillion-Parameter NLP Models

Microsoft has enhanced DeepSpeed, its open-source deep learning optimization library, to empower developers and researchers to build 1-trillion-parameter models. When the library was initially released on 13 February 2020, it enabled users to build 100-billion-parameter models. With the library, practitioners of natural language processing (NLP) can train large models at reduced cost and compute while scaling to unlock new possibilities.

Today, many developers have embraced large NLP models as the go-to approach for developing superior NLP products with higher accuracy and precision. However, training large models is not straightforward: it requires computational resources for parallel processing, thereby increasing the cost. To mitigate such challenges, Microsoft, in February, also released the Zero Redundancy Optimizer (ZeRO), a parallelized optimizer that reduces the need for intensive resources while scaling models to more parameters.

ZeRO allowed users to train models with 100 billion parameters on existing GPU clusters, 3x-5x faster than previous approaches. In May, Microsoft released ZeRO-2, which further enhanced the workflow by allowing models with 200 billion parameters to be trained at 10x the speed of the then state-of-the-art approach.

Now, with the recent release of DeepSpeed, one can even use a single GPU for developing large models. The library includes four new system technologies that support long input sequences, high-end clusters, and low-end clusters.

Source: Microsoft Blog

With DeepSpeed, Microsoft offers a combination of three parallelism approaches—ZeRO powered data parallelism, pipeline parallelism, and tensor-slicing model parallelism. 

In addition, with ZeRO-Offload, NLP practitioners can train models 10x bigger by using both CPU and GPU memory. For instance, with a single NVIDIA V100 GPU, you can build a model with up to 13 billion parameters without running out of memory.
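
A DeepSpeed run is driven by a JSON configuration file; a minimal, illustrative sketch enabling mixed precision and ZeRO with CPU offload might look like this (field names and values should be checked against the DeepSpeed documentation for your version):

```json
{
  "train_batch_size": 32,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": true
  }
}
```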

For long sequences of text, image, and audio data, DeepSpeed provides sparse attention kernels, which power 10x longer sequences with 6x faster execution. Besides, its new 1-bit Adam algorithm reduces communication volume by up to 5x, making distributed training in communication-constrained scenarios 3.5x faster.
