www.analyticsdrift.com
What are the characteristics of a good data model?
A good data model has clean, transparent, and comprehensible data, scales proportionally as the data changes, has predictable performance, and is adaptive and responsive.
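Overfitting and underfitting are modeling errors that cause a model to make inaccurate predictions on new data. An overfit model learns the noise in its training data along with the underlying pattern, so it performs well on training data but poorly on unseen data; an underfit model is too simple to capture the pattern and performs poorly on both.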
Data cleansing (also called data cleaning, and often treated as part of data wrangling) is the process of identifying incorrect, incomplete, inaccurate, or missing data and then correcting, filling in, or removing it.
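As a minimal sketch of those steps, the Python below cleans a few hypothetical records: it trims and normalizes names, drops rows with missing or invalid ages, and removes duplicates that only become visible once formatting is normalized. The field names and rules are assumptions chosen for illustration; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records: names have stray whitespace and inconsistent
# case, one age is missing, one is invalid, and one row is a duplicate.
RAW = [
    {"name": " Alice ", "age": "34"},
    {"name": "bob", "age": ""},
    {"name": "ALICE", "age": "34"},
    {"name": "Carol", "age": "-5"},
    {"name": "dave", "age": "41"},
]


def clean(records):
    """Return cleaned records: normalized, validated, and de-duplicated."""
    cleaned = []
    seen = set()
    for rec in records:
        # Normalize formatting: trim whitespace and use title case.
        name = rec.get("name", "").strip().title()
        age_text = rec.get("age", "").strip()
        # Drop rows with missing values.
        if not name or not age_text:
            continue
        # Drop rows whose age is not a valid non-negative integer.
        try:
            age = int(age_text)
        except ValueError:
            continue
        if age < 0:
            continue
        # Drop duplicates that match after normalization.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


print(clean(RAW))  # [{'name': 'Alice', 'age': 34}, {'name': 'Dave', 'age': 41}]
```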
Data visualization is the process of representing data graphically to reflect the important information it contains.
Variance measures how far the values of a single variable spread out from their mean, whereas covariance measures how two variables vary together: a positive covariance means they tend to increase together, and a negative covariance means one tends to decrease as the other increases.
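A small sketch in plain Python, assuming the sample (n - 1) versions of both statistics, makes the difference concrete: variance takes one list of values, while covariance takes two paired lists, and its sign shows the direction of the relationship. The data is hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Spread of one variable around its mean (divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_covariance(xs, ys):
    """How two paired variables vary together (divides by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]  # rises with x -> positive covariance
z = [8, 6, 4, 2]  # falls as x rises -> negative covariance
print(sample_variance(x))       # ~1.667 (5/3)
print(sample_covariance(x, y))  # ~3.333 (10/3)
print(sample_covariance(x, z))  # ~-3.333
```

Note that the covariance of a variable with itself is just its variance, which is one way to see variance as the single-variable special case.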
The Python libraries most commonly used for data analytics include Pandas, NumPy, Matplotlib, and Seaborn.
An outlier is a data point that lies far away from the other values in a dataset. Common detection methods include the box plot (interquartile range) method and the Z-score method, among others.
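The sketch below applies both methods with Python's standard `statistics` module. The 1.5 × IQR fence and the threshold of 3 are common conventions rather than fixed rules, and the data is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Box plot rule: flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Z-score rule: flag points more than `threshold` std devs from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 100]
print(iqr_outliers(data))                    # [100]
print(zscore_outliers(data))                 # []  (z for 100 is just under 3)
print(zscore_outliers(data, threshold=2.5))  # [100]
```

Here the box plot rule flags 100, but the Z-score rule at a threshold of 3 does not, because the extreme value inflates the very mean and standard deviation it is measured against. The quartiles behind the IQR are much less affected, which is one reason the box plot method is often preferred for small samples.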
What are the data validation techniques used in data analytics?
There are four main data validation techniques:
– Field-level validation
– Form-level validation
– Data saving validation
– Search criteria validation
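As a rough illustration of the first two techniques, the sketch below checks each field on its own (field level) and then checks a rule that spans several fields (form level). The field names `email`, `age`, `start`, and `end` and their rules are hypothetical.

```python
from datetime import date


def is_iso_date(value):
    """Return True if value parses as a YYYY-MM-DD date."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical field-level rules: each one checks a single field in isolation.
FIELD_RULES = {
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "age": lambda v: v.isdigit() and 0 <= int(v) <= 120,
    "start": is_iso_date,
    "end": is_iso_date,
}


def validate_fields(form):
    """Field-level validation: return the names of fields that fail their rule."""
    return [name for name, rule in FIELD_RULES.items()
            if name not in form or not rule(form[name])]


def validate_form(form):
    """Form-level validation: field checks first, then cross-field checks."""
    errors = validate_fields(form)
    if not errors and date.fromisoformat(form["end"]) < date.fromisoformat(form["start"]):
        errors.append("end is before start")
    return errors


ok = {"email": "a@example.com", "age": "30", "start": "2024-01-01", "end": "2024-02-01"}
print(validate_form(ok))                           # []
print(validate_form({**ok, "age": "abc"}))         # ['age']
print(validate_form({**ok, "end": "2023-12-01"}))  # ['end is before start']
```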
Differentiate between the WHERE clause and HAVING clause in SQL.
The WHERE clause filters individual rows before any grouping or aggregation takes place. The HAVING clause filters groups after GROUP BY and aggregation, so its conditions can refer to aggregate functions such as SUM() or COUNT().
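The difference is easiest to see in a runnable query. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 200), ("bob", 30), ("bob", 40), ("carol", 500)],
)

query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 40          -- row filter: applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100    -- group filter: applied after aggregation
    ORDER BY customer
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('alice', 250), ('carol', 500)]
```

WHERE removes bob's row of 30 before grouping; bob's remaining total of 40 then fails the HAVING condition, so only alice (250) and carol (500) are returned.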
A pivot table in Excel summarizes large amounts of data by grouping rows by one or more fields and aggregating the related values (for example, with sums, counts, or averages), then presenting the result as a compact table.
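Excel builds pivot tables interactively, but the underlying group-and-aggregate idea can be sketched in a few lines of Python. The sales rows below are hypothetical, with regions as pivot rows, products as pivot columns, and summed amounts in the cells.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
SALES = [
    ("East", "Widget", 120),
    ("East", "Gadget", 80),
    ("West", "Widget", 200),
    ("East", "Widget", 30),
    ("West", "Gadget", 50),
]


def pivot_sum(rows):
    """Rows become regions, columns become products, cells hold summed amounts."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, amount in rows:
        table[region][product] += amount
    # Convert to plain dicts for readable output.
    return {region: dict(cols) for region, cols in table.items()}


print(pivot_sum(SALES))
# {'East': {'Widget': 150, 'Gadget': 80}, 'West': {'Widget': 200, 'Gadget': 50}}
```

In pandas, `DataFrame.pivot_table(index=..., columns=..., values=..., aggfunc="sum")` performs the same kind of summary.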
What is time series analysis and time series forecasting?
Time series analysis uses statistical methods to extract information, such as trends and seasonal patterns, from data recorded over time. Time series forecasting builds on this analysis, but its focus is on fitting a model that predicts future values from past observations.
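A minimal sketch of the two ideas, using a trailing moving average as an assumed example method: smoothing the series is an analysis step, and taking the most recent average as the next value is one of the simplest forecasting baselines. The series is hypothetical.

```python
def rolling_mean(series, window):
    """Analysis step: smooth the series with a trailing moving average."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def forecast_next(series, window):
    """Forecasting step: predict the next value as the mean of the last `window` points."""
    return rolling_mean(series, window)[-1]


sales = [10, 12, 14, 16, 18]
print(rolling_mean(sales, 3))   # [12.0, 14.0, 16.0]
print(forecast_next(sales, 3))  # 16.0
```

The forecast of 16.0 lags behind a series that rises by 2 at every step, which is why practical forecasting models (for example, Holt's linear trend method or ARIMA) also account for trend and seasonality.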
Collaborative filtering is a popular technique in recommender systems that predicts a user's interests from the past choices of many users, on the assumption that users who agreed in the past will tend to agree in the future.
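One common variant, user-based collaborative filtering, can be sketched as follows: measure how similar two users are from the items they both rated (cosine similarity here), then predict a missing rating as a similarity-weighted average of other users' ratings. The users, items, and ratings are hypothetical.

```python
import math

# Hypothetical user -> {item: rating} data.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 1},
}


def cosine_similarity(r1, r2):
    """Cosine similarity over the items both users have rated."""
    common = r1.keys() & r2.keys()
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (norm1 * norm2)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


print(round(predict(RATINGS, "alice", "D"), 2))  # 3.39
```

Bob rated items exactly as alice did and gave D a 5, while carol disagreed and gave it a 1, so the prediction lands closer to bob's rating. Because all ratings are positive, raw cosine similarity still scores carol as fairly similar (about 0.67); practical systems often mean-center ratings first to correct for this.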
What is hypothesis testing, and what are a few forms of hypothesis tests?
Hypothesis testing is a statistical technique for deciding whether sample data provide enough evidence to reject a null hypothesis, and therefore whether a finding is statistically significant. Common forms of hypothesis tests include the z-test, t-test, chi-square test, and ANOVA, among others.
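As a small worked example, the sketch below runs a two-sided one-sample z-test with Python's `statistics.NormalDist`. It assumes the population standard deviation is known, which is what distinguishes a z-test from a t-test; the sample, the hypothesized mean of 50, and the 0.05 significance level are illustrative choices.

```python
import math
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming the population std dev `sigma` is known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of 25 measurements with mean 52.
sample = [50, 54] * 12 + [52]
z, p = one_sample_z_test(sample, mu0=50, sigma=5)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
print("reject H0" if p < 0.05 else "fail to reject H0")  # reject H0
```

With z = 2.0 the p-value is about 0.0455, just below 0.05, so the null hypothesis that the mean is 50 is rejected at that level.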
Clustering is the process of grouping data points into clusters so that points in the same cluster are more similar to each other than to points in other clusters. Clustering algorithms differ in how they define similarity, and there are four basic types: centroid-based, density-based, distribution-based, and hierarchical clustering.
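Centroid-based clustering is perhaps the simplest of these to sketch. The code below is a plain-Python version of k-means (Lloyd's algorithm), one centroid-based method. The starting centroids are passed in explicitly so the result is deterministic; this is an assumption for illustration, since library implementations usually choose starting points randomly or with k-means++. The points are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Centroid-based clustering (Lloyd's k-means) from given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids (and so assignments) no longer change
        centroids = new_centroids
    return labels, centroids


pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(pts, centroids=[(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```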