In a move to push Google to make necessary changes in its ethical AI practices, three groups — Black In AI, Queer in AI, and Widening NLP — have released a joint statement. All three groups work toward elevating underrepresented voices in artificial intelligence and collaborate with Google, Apple, IBM, Microsoft, NVIDIA, DeepMind, and others.
Google and DeepMind are among the gold and diamond sponsors of these groups, making significant contributions to fund the community's initiatives. But the relationship has been strained by Google's recent controversy involving its top ethical AI researchers, Timnit Gebru and Dr. Margaret Mitchell.
While both researchers claim that Google fired them, the search engine giant says that Timnit resigned and that Margaret moved files out of the organization. According to Timnit, however, Google fired her because she voiced concerns about the company's pressure to remove her name from a paper that pinpointed flaws in large language models, especially their harms to communities of color.
“We strongly condemn Google’s actions to dismiss Dr. Timnit Gebru and Dr. Margaret Mitchell, disrupting the lives and work of both researchers and stymying efforts of the Ethical AI team they managed,” reads the joint statement.
In the joint statement, the three groups also extend their support to others who have faced similar treatment but whose cases did not become public. The groups believe that Google's handling of the situation has harmed the Black and queer communities by undermining the importance of inclusion and critical research.
“They not only have caused damage but set a dangerous precedent for what type of research, advocacy, and retaliation is permissible in our community,” the statement continues.
The joint statement by Black In AI, Queer in AI, and Widening NLP also asks Google to implement the steps demanded by the Google Walkout organizers to rebuild trust among researchers and the Black and queer communities.
As a part of Google's Contact Center AI (CCAI) initiative to integrate artificial intelligence into customer support, Google announces Agent Assist for Chat. The solution will help human support agents during live calls and chats by providing relevant answers and FAQs to resolve issues quickly in real time.
Agent Assist will understand the intent of the customer and pull up the most relevant resources to help human agents address problems quickly. According to Google, customers leveraging Agent Assist for Chat have increased the number of conversations agents handle concurrently by 28%. It has also improved response time by 15%, thereby reducing wait times on calls and chats for other customers.
Agent Assist for Chat consists of two crucial components — Smart Reply and Knowledge Assist. While Smart Reply suggests responses modeled on messages from top-performing agents, Knowledge Assist surfaces articles and FAQs so agents can instantly find the right answer to customers' problems.
One of the most critical aspects of Agent Assist is that the machine learning model is trained on your data to enhance the accuracy of recommendations. This is crucial because support needs can vary widely depending on how businesses conduct their operations.
With the public API, organizations can also integrate Agent Assist into their agent desktop to control the agent experience end to end. Superior customer support can become a differentiating factor for organizations in a competitive market. Using artificial intelligence can empower customer support representatives to deliver an exceptional customer experience and resolve queries quickly in real time.
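For teams integrating Agent Assist through the public API, the sketch below shows roughly how article and FAQ suggestions could be pulled for an incoming chat message via the Dialogflow v2 client library. It assumes a conversation profile, conversation, and end-user participant have already been created; the resource names, project IDs, and response fields shown are assumptions based on the publicly documented client, not a verified integration.

```python
# Rough sketch: requesting Agent Assist suggestions for a live chat message
# through the Dialogflow v2 API (pip install google-cloud-dialogflow).
# Assumes a conversation profile, conversation, and END_USER participant
# already exist; the resource name below is a placeholder.
from google.cloud import dialogflow_v2 as dialogflow

PARTICIPANT = (
    "projects/my-project/locations/global/"
    "conversations/my-conversation/participants/end-user-id"
)

def get_suggestions(customer_message: str) -> None:
    client = dialogflow.ParticipantsClient()
    text_input = dialogflow.TextInput(text=customer_message, language_code="en-US")

    # Analyze the end-user message; Agent Assist returns suggestion results
    # (Smart Reply messages, articles, FAQs) alongside the analysis.
    response = client.analyze_content(participant=PARTICIPANT, text_input=text_input)

    for result in response.human_agent_suggestion_results:
        for answer in result.suggest_articles_response.article_answers:
            print("Suggested article:", answer.title, answer.uri)
        for answer in result.suggest_faq_answers_response.faq_answers:
            print("Suggested FAQ answer:", answer.answer)

if __name__ == "__main__":
    get_suggestions("My order arrived damaged. How do I request a replacement?")
```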
As the rate at which organizations collect data keeps increasing, traditional relational databases have failed to offer the required scale, compute, and flow of data. Consequently, data warehouses became prominent across data-driven organizations for accelerating insights delivery. Today, almost every organization is switching from on-premise infrastructure to cloud data warehouses to streamline data workloads across departments. This enables companies to democratize data and improve decision-making for business growth. In a competitive landscape, the right data warehouse selection can be the differentiating factor that augments a company's business processes. However, with numerous cloud data warehouses available, enterprises have struggled to evaluate and find the best fit for their needs. To simplify the evaluation, this post compares three popular cloud data warehouses — Redshift vs BigQuery vs Snowflake.
While organizations can evaluate a data warehouse in numerous ways, some of the most prominent factors for determining the best fit are as follows:
Maintenance In Redshift vs BigQuery vs Snowflake
Organizations embrace managed data warehouses to eliminate maintenance overheads, making it one of the most vital factors while assessing different data warehouse service providers. While Snowflake and BigQuery require little to no maintenance, Redshift requires experts to perform manual maintenance occasionally.
Since storage and compute are not separated in AWS Redshift, you need to set up suitable clusters and optimize the workflow for better performance. With BigQuery and Snowflake, the initial configuration can be performed without anticipating every future requirement; their flexibility at a later stage removes the need for extensive due diligence at the very beginning.
Redshift also requires you to run VACUUM operations to reclaim the unoccupied space that builds up as data is deleted or updated over time. BigQuery and Snowflake, in contrast, automate this cleanup to optimize storage capacity for better performance. Overall, an expert familiar with AWS is vital for managing Redshift and removing any hindrance during operations; with BigQuery and Snowflake, you do not necessarily need an expert to manage the workflows.
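To make the maintenance gap concrete, here is a minimal sketch of the kind of manual housekeeping Redshift expects. It runs VACUUM and ANALYZE over the standard PostgreSQL protocol that Redshift exposes; the endpoint, credentials, and table name are placeholders, and in practice you would schedule this rather than run it by hand.

```python
# Minimal sketch: manually reclaiming space on a Redshift table with VACUUM.
# Redshift speaks the PostgreSQL wire protocol, so psycopg2 works as the driver.
# The host, credentials, and table name below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="admin",
    password="...",
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block

with conn.cursor() as cur:
    cur.execute("VACUUM FULL sales;")  # reclaim space and re-sort rows
    cur.execute("ANALYZE sales;")      # refresh statistics for the query planner

conn.close()
```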
Scalability In Redshift vs BigQuery vs Snowflake

The ability to scale vertically and horizontally was crucial to the proliferation of cloud data warehouses. While vertical scaling increases the capacity of existing resources to handle heavier loads, horizontal scaling adds resources to enable vast computation. Unlike BigQuery and Snowflake, where storage and compute are separate, Redshift groups the two into clusters.
Each cluster is a collection of computing resources called nodes, which host the databases. Reconfiguring these clusters is not instantaneous, so resizing them for vertical or horizontal scaling can disrupt workflows. With Google BigQuery and Snowflake, in contrast, scaling happens almost immediately, allowing users continuous access to the data warehouse while it scales.
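For illustration, resizing a Redshift cluster is an explicit administrative step. The hedged sketch below uses the boto3 modify_cluster call to change the node type and count; the cluster identifier and sizing values are assumptions, and a classic resize of this kind can leave the cluster read-only until it finishes.

```python
# Minimal sketch: resizing a Redshift cluster (classic resize) via boto3.
# The cluster identifier, node type, and node count are placeholders;
# a classic resize can take a while, during which the cluster is read-only.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

response = redshift.modify_cluster(
    ClusterIdentifier="my-analytics-cluster",
    NodeType="ra3.4xlarge",  # vertical scaling: move to bigger nodes
    NumberOfNodes=4,         # horizontal scaling: add more nodes
)
print(response["Cluster"]["ClusterStatus"])
```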
Pricing In Redshift vs BigQuery vs Snowflake
Since cluster configurations are mostly fixed, pricing with AWS Redshift is predictable. You can start at $0.25 per hour and scale according to your needs. However, to optimize costs during periods of lower usage, you would need to adjust the clusters on a daily or weekly basis. Therefore, AWS Redshift is popular among companies with steady data warehouse usage. Companies that witness idle time or surges in usage should consider Snowflake or BigQuery instead.
Since Snowflake and BigQuery price storage and compute separately, predicting costs is not as straightforward. For storage, BigQuery has two pricing models — active storage and long-term storage. While active storage covers any table that has been modified in the last 90 days, long-term storage applies to tables that have not been modified for 90 consecutive days.
The active storage plan costs $0.020 per GB per month, and the long-term storage plan costs $0.010 per GB per month. Google also offers two pricing models for compute — on-demand pricing and flat-rate pricing. With on-demand, you are charged per query at $5 per TB scanned (the first 1 TB per month is free). However, you are not charged for queries that return an error or serve results from the cache. For flat-rate pricing, you will shell out $2,000 per month for 100 slots — a dedicated query processing capacity.
Snowflake has set storage pricing at $23 per TB per month, which is roughly similar to BigQuery's storage cost. However, you will be charged $40 per TB per month if you opt for on-demand storage. For compute resources, Snowflake charges $0.00056 per second per credit on the Standard Edition.
Snowflake's pricing is more complicated, but it makes up for it with its cluster management, which suspends clusters when they are not in use. As a result, you save significantly on processing costs. As per one benchmark, Snowflake works out slightly cheaper than BigQuery for regular usage.
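To make these numbers easier to compare, here is a back-of-the-envelope estimate that plugs the list prices above into a hypothetical monthly workload. The data volume, terabytes scanned, warehouse hours, and the assumption of a Small Snowflake warehouse (2 credits per hour) are all illustrative, not benchmarks.

```python
# Back-of-the-envelope monthly cost estimate using the list prices quoted above.
# The workload figures (storage volume, TB scanned, warehouse hours) are
# purely illustrative assumptions.

storage_tb = 10        # data kept in the warehouse
tb_scanned = 40        # data scanned by queries (BigQuery on-demand)
warehouse_hours = 200  # hours a Snowflake warehouse is actually running

# BigQuery: active storage at $0.020/GB plus on-demand queries at $5/TB
# (the first 1 TB scanned each month is free)
bigquery_storage = storage_tb * 1000 * 0.020
bigquery_queries = max(tb_scanned - 1, 0) * 5
bigquery_total = bigquery_storage + bigquery_queries

# Snowflake: storage at $23/TB plus compute at $0.00056 per second per credit
# (Standard Edition), assuming a Small warehouse burning 2 credits per hour
credits_per_hour = 2
snowflake_storage = storage_tb * 23
snowflake_compute = warehouse_hours * 3600 * credits_per_hour * 0.00056
snowflake_total = snowflake_storage + snowflake_compute

print(f"BigQuery (on-demand): ~${bigquery_total:,.0f} per month")
print(f"Snowflake (Standard): ~${snowflake_total:,.0f} per month")
```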
Performance In Redshift vs BigQuery vs Snowflake
Evaluating a data warehouse's performance can be subjective and can differ based on the metric you want to consider. According to one benchmark, separating compute from storage offers a distinct advantage in processing speed. For instance, Snowflake processes 6 to 60 million rows in 2 to 10 seconds.
But, as per another benchmark that assessed Redshift vs BigQuery vs Snowflake on 24 tables, with the largest containing 4 million rows of data, the average runtime of the 99 TPC-DS queries was 11.18 seconds for BigQuery, 8.24 seconds for Redshift, and 8.21 seconds for Snowflake. If your usage is intensive, Redshift or Snowflake is an ideal choice.
Conclusion
Assessing Redshift vs BigQuery vs Snowflake becomes a lot easier when your requirements are well-defined. If you have heavy, steady usage and an expert to handle maintenance, Redshift's predictable pricing makes it the go-to service for data warehousing. If you want performance and flexibility without maintenance overhead, Snowflake would be the best choice. And in case of varied workloads, BigQuery shall cater to your needs with minimal cost since it charges you for the queries you run.
Apple hires Samy Bengio, a former Google scientist, to lead a new AI research unit. Bengio joins Apple as a research director after leaving Google on 28 April 2021.
According to reports, Bengio will report to his former colleague John Giannandrea, who is now a senior vice president of machine learning and AI strategy. Giannandrea had earlier worked at Google for a little less than 8 years and was a senior vice president of engineering.
It is believed that Bengio left Google due to the recent restructuring of the ethical AI team. In his email to the Google research team, obtained by CNBC, he wrote: “This is one of the most difficult emails I can think of sending to all of you: I have decided to leave Google in order to pursue other exciting opportunities. There's no doubt that leaving this wonderful team is really difficult.”
Google AI, especially its ethical AI team, has witnessed criticism from every corner of the AI community after the ouster of Timnit Gebru and Margaret Mitchell, who led the ethical AI team. As per Timnit, Google fired her for her work that exposed biases in large language models. Following Timnit's exit, Bengio shared his disbelief on social media and extended his support to the team and Timnit.
Bengio has moved on to a new challenge after working for more than 14 years at Google. During his tenure, Bengio worked on a wide range of projects, including search and speech. He is also the brother of Yoshua Bengio, a co-recipient of the 2018 Turing Award.
Julia Computing releases DataFrames.jl 1.0 to allow data scientists to work effectively with tabular data. DataFrames.jl is Julia's equivalent of the pandas library, allowing data scientists to manipulate large datasets to gain insights. The latest release brings new capabilities for users to handle and analyze data more effectively.
The Julia language is gaining popularity in the data science landscape due to its ability to quickly process colossal datasets. According to various reports, Julia is faster than Python, giving data scientists an edge when analyzing large amounts of information at once.
However, DataFrames is not the only tool for working with tabular data in Julia. Depending on the use case, one can also leverage TypedTables and JuliaDB. While TypedTables is used for optimized performance when the table does not contain thousands of columns, JuliaDB is ideal when you are handling large datasets that cannot fit in the available memory.
One of the crucial features of Julia is that it allows you to switch between these libraries effectively. For instance, you can use Query.jl code to manipulate data in a DataFrame, JuliaDB, and more. Julia DataFrames is available through Julia's package manager and can be installed using the command Pkg.add("DataFrames").
A wide range of libraries for statistics, machine learning, plotting, data wrangling, and more are integrated with Julia DataFrames to streamline the data science workflows.
The AWS Machine Learning Summit will bring industry experts, learners, and scientists together for a one-day conference in June. Across more than 30 sessions, you will learn from experts how machine learning is impacting business growth, how to get started with machine learning, and other best practices for building machine learning models. The event will also have live question-and-answer sessions where industry experts answer your questions.
With numerous sessions, the AWS Machine Learning Summit caters to a wide range of audiences, from people just getting started to machine learning experts and technology leaders.
Every session is bucketed under Level 100, Level 200, or Level 300. While Level 100 provides an overview of AWS services, Level 200 is focused on providing best practices. Level 300 is for experts who have deep knowledge of machine learning.
Some of the interesting sessions include the ethical algorithm, analyzing social media for suicide risk using natural language processing, building high-quality computer vision models using only a few examples, and deep graph learning at scale, among others.
Speakers like Swami Sivasubramanian, VP of AI and ML at AWS; Bratin Saha, VP of Machine Learning at AWS; Yoelle Marek, VP of Research, Alexa at AWS; Andrew Ng, founder of DeepLearning.AI; and more will share in-depth insights into the artificial intelligence landscape.
The event will be hosted across the Americas, Asia-Pacific, Japan, Europe, the Middle East, and Africa on 2 or 3 June 2021, depending on the geographical timezone.
With the latest release of CUDA 11.3, its software development platform for building GPU-accelerated applications, NVIDIA adds direct Python support.
Now, users can build applications that work with CUDA from Python without relying on third-party libraries or frameworks. Over the years, data scientists have leveraged libraries like TensorFlow, PyTorch, CuPy, scikit-cuda, RAPIDS, and more to use CUDA with the Python programming language.
Since each of these libraries had its own interoperability layer between the CUDA API and Python, data scientists had to remember different workflow structures. With official Python support for CUDA, developing data-driven applications on NVIDIA GPUs will become a lot easier. CUDA Python is also compatible with NVIDIA Nsight Compute, allowing data scientists to gain kernel insights for performance optimization.
Since Python is an interpreted language and parallel programming requires low-level control, NVIDIA ensured interoperability with the CUDA Driver API and NVRTC. As a result, you are still required to write the kernel code in C++, but this release can be the beginning of complete interoperability in the future.
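As a rough sketch of what that workflow looks like, the snippet below uses the NVRTC and Driver API bindings from the cuda-python package to compile a small C++ kernel to PTX and load it, following the pattern of NVIDIA's published examples. The kernel source, architecture flag, and buffer handling are simplified assumptions rather than a verified recipe, and error checking is omitted for brevity.

```python
# Minimal sketch: compiling a C++ CUDA kernel to PTX from Python using the
# NVRTC and Driver API bindings in the cuda-python package.
# Error checking is omitted; the architecture flag is an assumption.
import numpy as np
from cuda import cuda, nvrtc

kernel_source = b"""
extern "C" __global__ void scale(float a, float *x, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i];
}
"""

# Initialize the driver and create a context on device 0
err, = cuda.cuInit(0)
err, device = cuda.cuDeviceGet(0)
err, context = cuda.cuCtxCreate(0, device)

# Compile the C++ kernel to PTX with NVRTC
err, prog = nvrtc.nvrtcCreateProgram(kernel_source, b"scale.cu", 0, [], [])
err, = nvrtc.nvrtcCompileProgram(prog, 1, [b"--gpu-architecture=compute_70"])
err, ptx_size = nvrtc.nvrtcGetPTXSize(prog)
ptx = b" " * ptx_size
err, = nvrtc.nvrtcGetPTX(prog, ptx)

# Load the PTX as a module and look up the kernel for later launches
ptx_buffer = np.char.array(ptx)
err, module = cuda.cuModuleLoadData(ptx_buffer.ctypes.data)
err, kernel = cuda.cuModuleGetFunction(module, b"scale")
print("Kernel compiled and loaded:", err)
```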
On various performance tests, NVIDIA determined that the performance of CUDA Python is similar to that of working with CUDA using C++.
In upcoming releases, NVIDIA will publish the source code on GitHub and offer packages through pip and Conda to simplify access to CUDA Python.
With CUDA 11.3, NVIDIA also releases several other enhancements for developers using C++ and an improved CUDA API for CUDA Graphs. The full list of changes in the release can be accessed here.
Salesforce, a CRM provider, collaborates with NASSCOM's FutureSkills Prime program to upskill 1 lakh aspirants for free by 2024. The idea is to bring courses devised by Salesforce's experts to NASSCOM's online learning platform, FutureSkills.
Over the years, Salesforce has been offering free courses through its Trailhead platform, not only to aspirants within its own ecosystem but to learners from any corner of the world. Since Trailhead's launch in 2014, over 3 million people have learned in-demand skills for free.
However, the demand for digital skills has increased rapidly since the pandemic; as a result, there is a need to double down on such upskilling initiatives from organizations and governments alike. As per a report from the World Economic Forum, 50% of all employees will need new skills in the next five years.
Salesforce will also provide learners access to its career fairs and help them get a job after they upskill. “Trailhead is designed to remove barriers to learning by empowering anyone to skill up for the jobs of tomorrow. We are excited to be associated with NASSCOM to provide a platform for continuous learning and bridge the digital skills gap in India,” says Arundhati Bhattacharya, CEO and chairperson of Salesforce India.
NASSCOM's FutureSkills has been continuously adding new courses to the e-learning platform since its inception. With this collaboration with Salesforce, the platform will also allow aspirants to get help from mentors, bringing a new dimension to the platform.
Coursera, to celebrate its 9th anniversary, is offering some of its most popular courses with an option to get certified for free. The edtech platform has curated 9 courses for learners to enroll in and learn for free.
The featured selection includes courses that cater to the needs of data science aspirants and other developers, providing a much-needed opportunity for learners during the pandemic.
However, you can choose only one of the 9 curated courses being offered for free.
Free certification initiatives are not new for Coursera; at the start of the pandemic, the edtech platform collaborated with many universities to give free courses to students. It also collaborated with a few states in India to offer free Coursera certifications.
The last day to enroll in Coursera's anniversary free courses is 30 April 2021.
NVIDIA GTC 2021, one of the largest artificial intelligence conferences, kicked off on April 12 with several announcements by Jensen Huang, co-founder and CEO of NVIDIA. Unlike in the past, NVIDIA GTC 2021 was free for any enthusiast to attend. The free pass allowed people not only to watch the live sessions but also to engage with other attendees.
Here are the top announcements from NVIDIA GTC 2021:
NVIDIA GTC 2021 Announcements
1. NVIDIA Omniverse
“We are building virtual worlds with NVIDIA Omniverse — a miraculous platform that will help build the next wave of AI for robotics and self-driving cars,” said Jensen Huang, co-founder and CEO of NVIDIA. NVIDIA Omniverse is a multi-GPU, real-time simulation and collaboration platform for the 3D ecosystem.
With NVIDIA Omniverse, organizations can effortlessly integrate other NVIDIA technologies to augment their development workflows. NVIDIA Omniverse consists of Nucleus, Connect, Kit, Simulation, and RTX Renderer, making it suitable for most AI workloads for self-driving cars and robotics. One can also integrate third-party digital content creation (DCC) tools to further extend the capabilities of the NVIDIA Omniverse ecosystem and build state-of-the-art applications.
2. NVIDIA Grace

NVIDIA's proposed acquisition of Arm positions the largest graphics chip provider to become a one-stop shop for all artificial intelligence processing requirements. The company introduced NVIDIA Grace — a breakthrough data center CPU for AI and HPC workloads. According to NVIDIA, the Grace CPU leverages the flexibility of the Arm architecture to deliver up to 30x higher aggregate bandwidth than current servers and 10x the performance for applications running terabytes of data. “NVIDIA is now a three-chip [CPU, GPU, and DPU] company,” said Jensen at GTC 2021.
3. BlueField-3 DPU and DOCA 1.0
The increasing demand for AI-based solutions has strained cloud infrastructure's ability to deliver requested resources at scale, leading to the development of data processing units (DPUs). One of the top announcements at GTC 2021 was NVIDIA BlueField-3, which will have 22 billion transistors, provide 400 Gbps networking, and integrate 16 Arm CPU cores. NVIDIA BlueField-3 will offer 10x the processing capability of its predecessor, BlueField-2. NVIDIA also announced its first data center infrastructure SDK — DOCA 1.0 — to help enterprises program BlueField.
4. DGX SuperPOD
NVIDIA introduced an advanced DGX system called DGX SuperPOD, a fully integrated, network-optimized, AI-data-center-as-a-product. Several DGX systems are part of the top supercomputers around the world. Even the world's fifth-largest supercomputer, developed by NVIDIA, uses four DGX SuperPODs. The company also upgraded existing DGX systems like the DGX Station to help researchers develop advanced products faster.
5. NVIDIA Megatron
Transformer-based machine learning models like GPT-3 and Microsoft's Turing-NLG have billions of parameters. Building such models takes a huge amount of computational power, but they can handle a wide range of tasks, such as generating code, summarising documents, and powering superior chatbots. “Model sizes are growing exponentially at a pace of doubling every two and half months,” said Jensen at GTC 2021. To allow developers to build such large-scale models, NVIDIA announced Megatron.
NVIDIA Megatron works with the Triton Inference Server, which allows large models to be served to users efficiently, helping deliver output within seconds instead of minutes.
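For a sense of what serving a large model behind Triton looks like from the client side, here is a hedged sketch using the open-source tritonclient package. The model name, tensor names, shapes, and dtypes are illustrative assumptions and would depend entirely on how the model is actually deployed.

```python
# Minimal sketch: querying a model deployed behind Triton Inference Server
# over HTTP (pip install tritonclient[http]). Model name, input/output tensor
# names, and shapes are illustrative assumptions.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Token IDs for a single request; shape and dtype must match the model config
token_ids = np.random.randint(0, 50000, size=(1, 128)).astype(np.int32)

inputs = [httpclient.InferInput("input_ids", list(token_ids.shape), "INT32")]
inputs[0].set_data_from_numpy(token_ids)

outputs = [httpclient.InferRequestedOutput("logits")]

result = client.infer(model_name="megatron_gpt", inputs=inputs, outputs=outputs)
print(result.as_numpy("logits").shape)
```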
6. NVIDIA Morpheus
NVIDIA announced Morpheus, an AI framework that detects intrusions in networks and flags sensitive information that is exposed. Morpheus can visualize the entire network, point out where security is lacking, and show how that affects other applications. With Morpheus, organizations can reduce the manual effort required to evaluate networks across the enterprise.
7. NGC Pre-Trained Models
NVIDIA also introduced new pre-trained models — Jarvis (conversational AI), Merlin (recommender systems), Maxine (virtual collaboration), and others — as well as a platform called TAO that allows developers to further enhance the performance of NVIDIA's pre-trained models with their own data. Fleet Command, another NVIDIA offering, can be leveraged to securely deploy and monitor applications at the edge.
8. Drive AV
NVIDIA Drive AV is an open, programmable, full-stack platform that empowers developers to build products for self-driving vehicles and robotics, covering every need of self-driving car manufacturers. NVIDIA is committed to further improving the platform and has released Orin, an autonomous vehicle computer. “Orin will process in one central computer the cluster, infotainment, passenger interaction AI, and very importantly, the confidence view or the perception world model,” said Jensen.