Data Science and Machine Learning for Beginners: Q&A

Data Science and Machine Learning for Beginners: A Complete Step-by-Step Guide, Questions and Answers

1. Which one of these is a key feature of Big Data?
Ans: Velocity

2. What does Uniqueness mean when it comes to Data Quality?
Ans: Uniqueness means how distinctive the data is compared with data from other sources. Unique data can provide a competitive advantage.

3. Which of these is an example of MetaData?
Ans: The timestamp recorded for a voicemail left on your phone is metadata.

4. Should a histogram be used on a categorical feature or continuous feature?
Ans: Continuous

5. Should a Count Plot be used with a continuous or categorical feature on the x-axis?
Ans: Categorical
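
The distinction in answers 4 and 5 can be illustrated without a plotting library. The sketch below is only an approximation of what a histogram and a count plot compute before drawing: a histogram groups continuous values into bins, while a count plot tallies each category directly. The function names and the sample data are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Group continuous values into fixed-width bins and count each bin."""
    counts = Counter()
    for v in values:
        bin_start = (v // bin_width) * bin_width  # left edge of the bin holding v
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

def category_counts(labels):
    """Count how often each category appears (what a count plot draws)."""
    return dict(Counter(labels))

heights_cm = [151.2, 158.9, 160.4, 163.0, 167.5, 168.1, 171.3, 179.8]  # made-up data
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]      # made-up data

print(histogram_counts(heights_cm, 10))  # bins starting at 150, 160, 170
print(category_counts(eye_colors))       # one count per category
```

Binning is needed for the continuous feature because almost every height is a distinct value; the categorical feature has only a few possible values, so counting them directly is already meaningful.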

6. In a box plot, what does the line in the middle of the box represent?
Ans: Median
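
As a rough sketch of the numbers behind a box plot, the example below uses Python's standard `statistics` module to compute the quartiles of a small made-up sample. The middle value is the median (the line inside the box), and the first and third quartiles are the box edges. Plotting libraries may use a slightly different quartile method, so the box edges can differ a little from these values.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 30]  # made-up sample with one large outlier

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("Q1 (bottom of box):", q1)
print("Median (line in the box):", median)
print("Q3 (top of box):", q3)
print("Mean, for comparison:", statistics.mean(data))
```

The outlier pulls the mean well above the median, which is one reason the box plot marks the median rather than the mean.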

7. Which of these is true for using RMSE?
Ans: RMSE punishes larger errors and has the same units as the label.
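
To see why RMSE punishes larger errors, the following minimal sketch compares it with the mean absolute error (MAE) on two made-up sets of predictions that have the same total absolute error. The helper names `rmse` and `mae` and all numbers are illustrative assumptions.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square the errors, average, take the square root."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute errors."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

y_true = [100.0, 100.0, 100.0, 100.0]         # made-up labels, e.g. prices
even_errors = [95.0, 105.0, 95.0, 105.0]      # four errors of size 5
one_big_error = [100.0, 100.0, 100.0, 120.0]  # a single error of size 20

print(mae(y_true, even_errors), rmse(y_true, even_errors))      # 5.0 5.0
print(mae(y_true, one_big_error), rmse(y_true, one_big_error))  # 5.0 10.0
```

Both prediction sets have the same MAE, but the RMSE doubles when the error is concentrated in one large miss. Because of the final square root, the result is in the same units as the label.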

8. Which of the following situations is suitable for a regression task?
Ans: Predicting the price of a ship given various features

9. What does MAE stand for?
Ans: Mean Absolute Error

10. Which of the following is true about K-Means Clustering?
Ans: A user must choose K before running the K-Means algorithm.
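
The role of K can be seen in a bare-bones version of the algorithm. The sketch below is a simplified one-dimensional K-Means written for illustration only: the function name `kmeans_1d`, the naive choice of the first K points as starting centroids, and the fixed iteration count are all assumptions, and real libraries use more careful initialization and stopping rules.

```python
def kmeans_1d(points, k, iterations=10):
    """Simplified 1-D K-Means: k must be supplied by the caller."""
    centroids = list(points[:k])  # naive start: the first k points (assumption)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]  # made-up data with two obvious groups

print(kmeans_1d(points, k=2))  # two centroids, near 1 and 10
print(kmeans_1d(points, k=3))  # three centroids, because the caller asked for three
```

The algorithm never decides how many clusters exist; it returns exactly as many centroids as the value of K it is given.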

11. Which of the following is true about clustering?
Ans: Clustering is a form of unsupervised learning.

12. If a diagnostic test is 99% accurate, would it be suitable for release for the general public?
Ans: No, because without knowing precision and recall, we can't tell how well the diagnosis is performing on different cases.
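
A small made-up scenario shows how a 99% accurate test can still be useless. Suppose 10 of 1,000 people have the condition and a hypothetical test simply reports everyone as healthy. The sketch below computes accuracy, precision, and recall by hand; the function name `evaluate` and the data are illustrative assumptions.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, 1 = has condition."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # sick, flagged as sick
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # healthy, flagged as healthy
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # healthy, flagged as sick
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # sick, missed by the test
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else None     # undefined with no actual positives
    return accuracy, precision, recall

y_true = [1] * 10 + [0] * 990  # 10 sick people, 990 healthy (made-up)
y_pred = [0] * 1000            # the test says "healthy" for everyone

print(evaluate(y_true, y_pred))  # (0.99, None, 0.0)
```

Accuracy is 99%, yet recall is 0: the test misses every person who actually has the condition.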

13. What does PCA stand for?
Ans: Principal Component Analysis

14. Which statement below is TRUE about logistic regression?
Ans: Logistic Regression is used for categorical labels in classification problems.
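
As a minimal sketch of how logistic regression produces a categorical label, the example below passes a linear score through the sigmoid function to get a probability and then applies a threshold. The weight, bias, threshold, and the "hours studied" feature are made-up values for illustration, not a fitted model.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(hours_studied, weight=1.5, bias=-6.0, threshold=0.5):
    """Return a class label and its probability (hypothetical parameters)."""
    probability = sigmoid(weight * hours_studied + bias)
    label = "pass" if probability >= threshold else "fail"
    return label, probability

for hours in (2, 4, 6):
    print(hours, predict(hours))
```

The output of the model is a class ("pass" or "fail"), not a continuous quantity, which is why logistic regression belongs with classification despite its name.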

15. Which of the following is an unsupervised algorithm?
Ans: K-Means

16. Which of the following algorithms would you expect to have the best performance on a small dataset?
Ans: There is no way to know until you fit the models on the training data and evaluate the results.