Data Science and Machine Learning for Beginners: A Complete Step-by-Step Guide, Questions and Answers
1. Which one of these is a key feature for Big Data?
Ans: Velocity
2. What does Uniqueness mean when it comes to Data Quality?
Ans: Uniqueness means how distinctive this data is from other sources. This allows for a competitive advantage.
3. Which of these is an example of MetaData?
Ans: The timestamp record of a voicemail left on your phone is metadata.
4. Should a histogram be used on a categorical feature or continuous feature?
Ans: Continuous
5. Should a Count Plot have a Continuous or Categorical feature on the x-axis?
Ans: Categorical
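The difference between questions 4 and 5 can be sketched without a plotting library. A histogram groups continuous values into bins, while a count plot tallies each distinct category. The sample data, bin edges, and the helper `histogram_counts` below are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data, invented for this illustration.
heights_cm = [150.2, 161.5, 163.0, 170.4, 171.9, 175.3, 182.8, 189.6]  # continuous
colors = ["red", "blue", "red", "green", "blue", "red"]                 # categorical


def histogram_counts(values, bin_edges):
    """Count how many values fall in each half-open bin [lo, hi)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Histogram: the x axis is a numeric range cut into bins.
bin_edges = [150, 160, 170, 180, 190]
hist = histogram_counts(heights_cm, bin_edges)

# Count plot: the x axis is one bar per distinct category.
category_counts = Counter(colors)

print(hist)
print(dict(category_counts))
```

Binning only makes sense when values have an order and a distance between them, which is why the histogram belongs with the continuous feature.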
6. In a box plot, what does the line in the middle of the box represent?
Ans: Median
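As a rough check of the box plot answer, the quartiles can be computed directly with Python's `statistics` module. The values below are invented, and the `"inclusive"` quantile method is only one of several conventions that plotting libraries may use.

```python
import statistics

# Hypothetical values with one large outlier (30).
values = [2, 4, 4, 5, 7, 9, 30]

median = statistics.median(values)
mean = statistics.mean(values)

# Quartiles: q1 and q3 are the box edges, q2 is the line inside the box.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

print("median:", median)
print("mean:", mean)
print("box edges:", q1, q3, "middle line:", q2)
```

The middle line matches the median rather than the mean, so a single outlier such as 30 barely moves it.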
7. Which of these is true for using RMSE?
Ans: RMSE punishes larger errors and has the same units as the label.
8. Which of the following situations is suitable for a regression task?
Ans: Predicting the price of a ship given various features
9. What does MAE stand for?
Ans: Mean Absolute Error
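The two error metrics from questions 7 and 9 can be compared with a minimal sketch. The price values and the function names `mae` and `rmse` are hypothetical, chosen only to show how one large error affects each metric.

```python
import math

# Hypothetical true and predicted prices (same units, e.g. dollars).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 350.0]  # the last prediction is off by 50


def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: the square root of the average squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)

print("MAE:", mae_value)
print("RMSE:", rmse_value)
```

Both results are in the same units as the label. Squaring before averaging gives the single 50-unit miss more weight, so RMSE comes out larger than MAE here.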
10. Which of the following is true about K-Means Clustering?
Ans: A user must choose K before running the K-Means algorithm.
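A minimal one-dimensional sketch shows why K has to be supplied up front: the number of centroids is fixed before the first iteration. The function `kmeans_1d`, the sample points, and the naive "first k points" initialization are all assumptions made for illustration, not a production implementation.

```python
def kmeans_1d(points, k, n_iters=10):
    """Naive 1-D K-Means. k must be given by the caller."""
    # Assumption: use the first k points as starting centroids (deterministic).
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data with two visually obvious groups.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, k=2)

print(centroids)
print(clusters)
```

No labels are passed in at any point, which is also why the algorithm counts as unsupervised.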
11. Which of the following is true about clustering?
Ans: Clustering is an unsupervised learning task.
12. If a diagnostic test is 99% accurate, would it be suitable for release for the general public?
Ans: No, because without knowing precision and recall, we can't tell how well the diagnosis is performing on different cases.
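A small, made-up scenario illustrates the point. Assume 1,000 people of whom 10 actually have the condition, and a useless "test" that reports everyone as healthy. The population sizes are invented for this example.

```python
# Hypothetical population: 1 = has the condition, 0 = healthy.
y_true = [1] * 10 + [0] * 990
# A degenerate test that always predicts "healthy".
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when there are no positive predictions or cases.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

The test reaches 99% accuracy while detecting none of the 10 real cases, so accuracy alone says little when the classes are imbalanced.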
13. What does PCA stand for?
Ans: Principal Component Analysis
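For two features, the idea behind PCA can be sketched by hand: center the data, build the 2x2 covariance matrix, and take its eigenvectors as the principal components. The data points are invented, and the closed-form eigenvalue formula below only works for the 2x2 case with nonzero covariance. Real projects would typically use a library routine instead.

```python
import math

# Hypothetical 2-D data lying close to the line y = x.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 3.8)]
n = len(points)

mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Sample covariance matrix [[a, b], [b, c]] of the centered data.
a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (a + c) / 2
spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = half_trace + spread, half_trace - spread

# Eigenvector for the largest eigenvalue (valid because b != 0 here).
vx, vy = b, lam1 - a
norm = math.hypot(vx, vy)
first_component = (vx / norm, vy / norm)

# Share of the total variance captured by the first principal component.
explained_ratio = lam1 / (lam1 + lam2)

print("first component:", first_component)
print("explained variance ratio:", explained_ratio)
```

Because the points sit almost on a line, a single component captures nearly all of the variance, which is the situation where reducing two features to one loses little information.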
14. Which statement below is TRUE about logistic regression?
Ans: Logistic Regression is used for categorical labels in classification problems.
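A minimal sketch of how logistic regression arrives at a categorical label: a linear score is passed through the sigmoid function to get a probability, which is then thresholded. The `weight` and `bias` values are hand-picked for illustration and not fitted to any data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, chosen by hand rather than learned.
weight, bias = 1.5, -3.0


def predict_proba(x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(weight * x + bias)


def predict_label(x, threshold=0.5):
    """Turn the probability into a class label (0 or 1)."""
    return 1 if predict_proba(x) >= threshold else 0


for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x), 3), predict_label(x))
```

Despite the word "regression" in its name, the final output is a class, which is what makes it a classification method.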
15. Which of the following is an unsupervised algorithm?
Ans: K-Means
16. Which of the following algorithms would you expect to have the best performance on a small dataset?
Ans: There is no way to know until you fit the models on the training data and evaluate the results.