Databricks Databricks-Machine-Learning-Associate Actual Free Exam Questions & Community Discussion

Exam Code/Number: Databricks-Machine-Learning-Associate
Exam Name/Title: Databricks Certified Machine Learning Associate Exam
Certification Provider: Databricks
Corresponding Certification: ML Data Scientist

Exam Questions: 76
Updated On: Jun 01, 2026

Question #33

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model. They elect to use the Hyperopt library's fmin operation to facilitate this process. Unfortunately, the final model is not very accurate. The data scientist suspects that there is an issue with the objective_function being passed as an argument to fmin.
They use the following code block to create the objective_function:

Which of the following changes does the data scientist need to make to their objective_function in order to produce a more accurate model?

A. Add a random_state argument to the RandomForestRegressor operation

B. Replace the fmin operation with the fmax operation

C. Remove the mean operation that is wrapping the cross_val_score operation

D. Replace the r2 return value with -r2

E. Add a test set validation process

Correct Answer: D
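
The reasoning behind answer D can be sketched without Hyperopt installed. Hyperopt's fmin minimizes whatever the objective returns, so an objective that returns r2 pushes the search toward the worst model, while returning -r2 turns the minimization into a maximization of r2. The block below is only an illustrative stand-in and does not reconstruct the question's code block: toy_fmin, toy_r2, and the candidate list are hypothetical names, and the R² curve is made up.

```python
# Hypothetical stand-in for a minimizer such as Hyperopt's fmin:
# it returns the candidate with the LOWEST objective value.
def toy_fmin(objective, candidates):
    return min(candidates, key=objective)


# Made-up R^2 as a function of max_depth; it peaks at max_depth == 6.
def toy_r2(max_depth):
    return 1.0 - 0.01 * (max_depth - 6) ** 2


candidates = list(range(1, 13))

# Objective returns r2: the minimizer selects the worst-scoring depth.
worst = toy_fmin(lambda d: toy_r2(d), candidates)

# Objective returns -r2: minimizing the negative maximizes r2.
best = toy_fmin(lambda d: -toy_r2(d), candidates)

print(worst, best)
```

With this made-up curve, the unnegated objective lands on the depth furthest from the peak, which matches the symptom described in the question.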

Question #34

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

A. customer_id

B. spend

C. customer_id, loyalty_tier

D. units

E. loyalty_tier

Correct Answer: E
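
Imputing with the most common value (the mode) is the usual choice for a categorical column. The schema is not reproduced in this copy, so the sketch below assumes loyalty_tier is a string column; impute_with_mode and the sample values are hypothetical and shown in plain Python rather than Spark.

```python
from collections import Counter


def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    observed = [v for v in values if v is not None]
    most_common_value = Counter(observed).most_common(1)[0][0]
    return [most_common_value if v is None else v for v in values]


# Made-up sample of an assumed categorical loyalty_tier column.
loyalty_tier = ["gold", None, "silver", "gold", None, "bronze"]
imputed = impute_with_mode(loyalty_tier)
print(imputed)
```

Numeric columns are more commonly imputed with a mean or median, and a primary key such as customer_id is not something to impute at all.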

Question #35

A data scientist has developed a random forest regressor rfr and included it as the final stage in a Spark ML Pipeline named pipeline. They then set up a cross-validation process with pipeline as the estimator in the following code block:

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?

A. The process will have a longer runtime because all stages of pipeline need to be refit or retransformed with each model

B. The process will leak data from the training set to the test set during the evaluation phase

C. The process will be unable to parallelize tuning due to the distributed nature of pipeline

D. The process will leak data prep information from the validation sets to the training sets for each model

Correct Answer: A

Question #36

A machine learning engineer is trying to perform batch model inference. They want to get predictions using the linear regression model saved at the path model_uri for the DataFrame batch_df.
batch_df has the following schema:
customer_id STRING
The machine learning engineer runs the following code block to perform inference on batch_df using the linear regression model at model_uri:

In which situation will the machine learning engineer's code block perform the desired inference?

A. When all of the features used by the model at model_uri are in a single Feature Store table

B. When the Feature Store feature set was logged with the model at model_uri

C. This code block will not perform the desired inference in any situation.

D. When the model at model_uri only uses customer_id as a feature

E. When all of the features used by the model at model_uri are in a Spark DataFrame in the PySpark

Correct Answer: B

Question #37

Which statement describes a Spark ML transformer?

A. A transformer is a learning algorithm that can use a DataFrame to train a model

B. A transformer is a hyperparameter grid that can be used to train a model

C. A transformer chains multiple algorithms together to transform an ML workflow

D. A transformer is an algorithm which can transform one DataFrame into another DataFrame

Correct Answer: D
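
The distinction between a transformer (option D) and an estimator (option A) can be sketched with toy classes. This is not Spark code: a "DataFrame" here is just a list of row dicts, and ToyScaler and ToyScalerModel are hypothetical names that only mirror the fit/transform split.

```python
class ToyScalerModel:
    """Transformer: maps one DataFrame to another via transform()."""

    def __init__(self, max_value):
        self.max_value = max_value

    def transform(self, df):
        # Return a new list of rows with an added "scaled" column.
        return [{**row, "scaled": row["x"] / self.max_value} for row in df]


class ToyScaler:
    """Estimator: learns from a DataFrame via fit() and returns a transformer."""

    def fit(self, df):
        return ToyScalerModel(max(row["x"] for row in df))


df = [{"x": 2.0}, {"x": 4.0}, {"x": 8.0}]
model = ToyScaler().fit(df)   # estimator -> transformer
out = model.transform(df)     # DataFrame -> DataFrame
print(out)
```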

Question #38

A machine learning engineer has been notified that a new Staging version of a model registered to the MLflow Model Registry has passed all tests. As a result, the machine learning engineer wants to put this model into production by transitioning it to the Production stage in the Model Registry.
From which of the following pages in Databricks Machine Learning can the machine learning engineer accomplish this task?

A. The home page of the MLflow Model Registry

B. The model page in the MLflow Model Registry

C. The experiment page in the Experiments observatory

D. The model version page in the MLflow Model Registry

Correct Answer: D

Question #39

A data scientist has developed a linear regression model using Spark ML and computed the predictions in a Spark DataFrame preds_df with the following schema:
prediction DOUBLE
actual DOUBLE
Which of the following code blocks can be used to compute the root mean-squared-error of the model according to the data in preds_df and assign it to the rmse variable?

Correct Answer: D
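
For reference, root mean squared error is the square root of the mean of the squared differences between prediction and actual. The plain-Python sketch below shows only that arithmetic on made-up rows that mirror the preds_df schema; it does not reconstruct the answer options, which are not reproduced here.

```python
import math

# Made-up (prediction, actual) rows mirroring the preds_df schema.
rows = [(3.0, 2.0), (5.0, 5.0), (7.0, 9.0), (4.0, 4.0)]

# Squared error per row, then the mean, then the square root.
squared_errors = [(prediction - actual) ** 2 for prediction, actual in rows]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
print(rmse)
```

In Spark ML the same metric is typically obtained from a RegressionEvaluator configured with predictionCol="prediction", labelCol="actual", and metricName="rmse".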

Question #40

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

A. A holdout set is not necessary when using a train-validation split

B. Reproducibility is achievable when using a train-validation split

C. Fewer models need to be trained when using a train-validation split

D. Fewer hyperparameter values need to be tested when using a train-validation split

E. Bias is avoidable when using a train-validation split

Correct Answer: C
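
The saving behind answer C is simple arithmetic: k-fold cross-validation fits one model per fold for every hyperparameter combination, while a train-validation split fits one model per combination. The sketch below uses hypothetical helper names and a made-up grid size, and it leaves out any final refit on the full training data.

```python
def models_trained_kfold(num_param_combos, k):
    # One fit per fold for every hyperparameter combination.
    return num_param_combos * k


def models_trained_train_val_split(num_param_combos):
    # One fit per hyperparameter combination on the single training split.
    return num_param_combos


grid_size = 12  # hypothetical: 4 values of one hyperparameter x 3 of another
kfold_fits = models_trained_kfold(grid_size, k=5)
split_fits = models_trained_train_val_split(grid_size)
print(kfold_fits, split_fits)
```

The number of hyperparameter values tested is the same in both cases, which is why option D does not describe a benefit.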
