Databricks Databricks-Machine-Learning-Associate Actual Free Exam Questions & Community Discussion
A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an argument to fmin.
They use the following code block to create the objective_function:

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?
They use the following code block to create the objective_function:

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?
Correct Answer: D
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?
Correct Answer: E
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A data scientist has developed a random forest regressor rfr and included it as the final stage in a Spark MLPipeline pipeline. They then set up a cross-validation process with pipeline as the estimator in the following code block:

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?
Correct Answer: A
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.
batch_df has the following schema:
customer_id STRING
The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:

In which situation will the machine learning engineer's code block perform the desired inference?
batch_df has the following schema:
customer_id STRING
The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:

In which situation will the machine learning engineer's code block perform the desired inference?
Correct Answer: B
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
Which statement describes a Spark ML transformer?
Correct Answer: D
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A machine learning engineer has been notified that a new Staging version of a model registered to the MLflow Model Registry has passed all tests. As a result, the machine learning engineer wants to put this model into production by transitioning it to the Production stage in the Model Registry.
From which of the following pages in Databricks Machine Learning can the machine learning engineer accomplish this task?
From which of the following pages in Databricks Machine Learning can the machine learning engineer accomplish this task?
Correct Answer: D
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?
Correct Answer: D
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?
Correct Answer: C
Vote an answer
Explanation: Only visible for EduDump members. You can sign-up / login (it's free).
0
0
0
10
