Databricks-Machine-Learning-Associate Sample Questions Answers

Questions 4

Which statement describes a Spark ML transformer?

Options:

A transformer is an algorithm which can transform one DataFrame into another DataFrame

A transformer is a hyperparameter grid that can be used to train a model

A transformer chains multiple algorithms together to transform an ML workflow

A transformer is a learning algorithm that can use a DataFrame to train a model

Questions 5

A data scientist wants to parallelize the training of trees in a gradient boosted tree to speed up the training process. A colleague suggests that parallelizing a boosted tree algorithm can be difficult.

Which of the following describes why?

Options:

Gradient boosting is not a linear algebra-based algorithm which is required for parallelization

Gradient boosting requires access to all data at once which cannot happen during parallelization.

Gradient boosting calculates gradients in evaluation metrics using all cores which prevents parallelization.

Gradient boosting is an iterative algorithm that requires information from the previous iteration to perform the next step.
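
The sequential dependency mentioned in this question can be illustrated with a minimal plain-Python sketch. It is not Spark ML's implementation: the helper names (`fit_stump`, `boost`), the toy data, and the learning rate are all assumptions made for illustration. Each round fits a one-split stump to the residuals left by the previous rounds, so round n cannot start until round n-1 has finished.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave rows on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=5, learning_rate=0.5):
    """Run boosting rounds one after another; each needs the previous predictions."""
    predictions = [0.0] * len(xs)
    history = []  # squared error after each round
    for _ in range(n_rounds):
        # The residuals depend on every earlier round, which forces sequential training.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)))
    return predictions, history


# Hypothetical toy data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
predictions, history = boost(xs, ys)
```

Work inside a single round (for example, evaluating candidate split thresholds) can still be spread across workers; it is the loop over rounds that has to run in order.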

Questions 6

A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.

The Spark DataFrame train_df has the following schema:

The machine learning engineer shares the following code block:

Which of the following changes does the machine learning engineer need to make to complete the task?

Options:

They need to call the transform method on train_df

They need to convert the features column to be a vector

They do not need to make any changes

They need to utilize a Pipeline to fit the model

They need to split the features column out into one column for each feature

Questions 7

A data scientist has developed a random forest regressor rfr and included it as the final stage in a Spark ML Pipeline pipeline. They then set up a cross-validation process with pipeline as the estimator in the following code block:

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?

Options:

The process will have a longer runtime because all stages of pipeline need to be refit or retransformed with each model

The process will leak data from the training set to the test set during the evaluation phase

The process will be unable to parallelize tuning due to the distributed nature of pipeline

The process will leak data prep information from the validation sets to the training sets for each model

Questions 8

A data scientist is utilizing MLflow Autologging to automatically track their machine learning experiments. After completing a series of runs for the experiment experiment_id, the data scientist wants to identify the run_id of the run with the best root-mean-square error (RMSE).

Which of the following lines of code can be used to identify the run_id of the run with the best RMSE in experiment_id?

Options:

Option A

Option B

Option C

Option D

Questions 9

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.

Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

Options:

Logistic regression

Spark ML cannot distribute linear regression training

Iterative optimization

Least-squares method

Singular value decomposition

Questions 10

Which of the following machine learning algorithms typically uses bagging?

Options:

Gradient boosted trees

K-means

Random forest

Decision tree
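
For readers unfamiliar with the term, bagging (bootstrap aggregating) trains several models on bootstrap samples of the data and combines their outputs. The sketch below is a plain-Python illustration under stated assumptions: the "model" is just the sample mean, and the function names, toy data, and seed are hypothetical.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return rng.choices(data, k=len(data))


def bagged_mean(data, n_models=10, seed=42):
    """Fit one trivial 'model' (the sample mean) per bootstrap sample, then average them."""
    rng = random.Random(seed)
    samples = [bootstrap_sample(data, rng) for _ in range(n_models)]
    model_outputs = [mean(sample) for sample in samples]
    return samples, mean(model_outputs)


# Hypothetical toy data
data = [2.0, 4.0, 6.0, 8.0, 10.0]
samples, ensemble_prediction = bagged_mean(data)
```

Because each bootstrap sample is drawn independently, the individual models can be trained in parallel, unlike the rounds of a boosted ensemble.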

Questions 11

Which of the Spark operations can be used to randomly split a Spark DataFrame into a training DataFrame and a test DataFrame for downstream use?

Options:

TrainValidationSplit

DataFrame.where

CrossValidator

TrainValidationSplitModel

DataFrame.randomSplit
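
The idea of a weighted random split can be sketched in plain Python. This is a stand-in for illustration only, not how Spark implements it: `random_split`, its parameters, and the row data are hypothetical names chosen here. Each row is assigned to one split at random, in proportion to the weights, and a fixed seed makes the assignment repeatable.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row to exactly one split at random, in proportion to weights."""
    rng = random.Random(seed)
    total = sum(weights)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random() * total
        cumulative = 0.0
        for index, weight in enumerate(weights):
            cumulative += weight
            if draw < cumulative:
                splits[index].append(row)
                break
        else:
            splits[-1].append(row)  # guard against floating-point edge cases
    return splits


# Hypothetical data: 1000 row ids split roughly 80/20
rows = list(range(1000))
train_rows, test_rows = random_split(rows, [0.8, 0.2], seed=42)
```

Split sizes are approximate rather than exact, since every row is assigned independently.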

Questions 12

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.

Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

Options:

A holdout set is not necessary when using a train-validation split

Reproducibility is achievable when using a train-validation split

Fewer hyperparameter values need to be tested when using a train-validation split

Bias is avoidable when using a train-validation split

Fewer models need to be trained when using a train-validation split
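
The trade-off in this question comes down to arithmetic on the number of model fits. The sketch below assumes a hypothetical 3 x 2 hyperparameter grid and ignores any final refit on the full training data; the grid values and function names are illustrative only.

```python
from itertools import product

# Hypothetical hyperparameter grid: 3 * 2 = 6 combinations
param_grid = {"max_depth": [2, 5, 10], "num_trees": [20, 50]}
combinations = list(product(*param_grid.values()))


def fits_with_k_fold(n_combinations, k):
    """k-fold cross-validation trains every combination once per fold."""
    return n_combinations * k


def fits_with_train_validation_split(n_combinations):
    """A single train-validation split trains every combination once."""
    return n_combinations


k_fold_fits = fits_with_k_fold(len(combinations), k=5)          # 6 * 5
tvs_fits = fits_with_train_validation_split(len(combinations))  # 6 * 1
```

The cost of the cheaper option is a noisier estimate, because each combination is scored on one validation set instead of being averaged over k folds.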

Questions 13

A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:

Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?

Options:

The data will be limited to a single executor preventing the model from being loaded multiple times

The model will be limited to a single executor preventing the data from being distributed

The model only needs to be loaded once per executor rather than once per batch during the inference process

The data will be distributed across multiple executors during the inference process
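
The code block referenced in this question is not reproduced in this copy. As a general illustration of the iterator pattern, the plain-Python sketch below compares a function that is called once per batch with a function that receives an iterator over batches. It does not use Spark or pandas; `load_model`, the doubling "model", and the batch data are hypothetical stand-ins, and the counter only exists to make the number of model loads visible.

```python
stats = {"loads": 0}


def load_model():
    """Stand-in for an expensive model load; counts how often it is called."""
    stats["loads"] += 1
    return lambda value: value * 2  # hypothetical 'model' that doubles its input


def predict_per_batch(batch):
    """Called once per batch, so the model is loaded for every batch."""
    model = load_model()
    return [model(value) for value in batch]


def predict_iterator(batches):
    """Receives an iterator of batches, so the model is loaded once and reused."""
    model = load_model()
    for batch in batches:
        yield [model(value) for value in batch]


batches = [[1, 2], [3, 4], [5, 6]]

stats["loads"] = 0
per_batch_results = [predict_per_batch(batch) for batch in batches]
loads_per_batch_style = stats["loads"]

stats["loads"] = 0
iterator_results = list(predict_iterator(iter(batches)))
loads_iterator_style = stats["loads"]
```

Both styles produce the same predictions; the difference is only in how many times the load step runs.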

Questions 14

Which of the following statements describes a Spark ML estimator?

Options:

An estimator is a hyperparameter grid that can be used to train a model

An estimator chains multiple algorithms together to specify an ML workflow

An estimator is a trained ML model which turns a DataFrame with features into a DataFrame with predictions

An estimator is an algorithm which can be fit on a DataFrame to produce a Transformer

An estimator is an evaluation tool to assess the quality of a model

Questions 15

A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.

Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?

Options:

fmin

SparkTrials

quniform

search_space

objective_function

Questions 16

Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?

Options:

pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata

pandas API on Spark DataFrames are more performant than Spark DataFrames

pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata

pandas API on Spark DataFrames are less mutable versions of Spark DataFrames

Questions 17

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?

Options:

Open the MLmodel artifact in the MLflow run page

Click the "Models" link in the row corresponding to the run in the MLflow experiment page

Click the "Source" link in the row corresponding to the run in the MLflow experiment page

Click the "Start Time" link in the row corresponding to the run in the MLflow experiment page

Questions 18

A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference.

In which situation will the machine learning engineer be correct?

Options:

When the new solution requires if-else logic determining which model to use to compute each prediction

When the new solution's models have an average latency that is larger than the size of the original model

When the new solution requires the use of fewer feature variables than the original model

When the new solution requires that each model computes a prediction for every record

When the new solution's models have an average size that is larger than the size of the original model

Questions 19

A data scientist wants to use Spark ML to impute missing values in their PySpark DataFrame features_df. They want to replace missing values in all numeric columns in features_df with each respective numeric column’s median value.

They have developed the following code block to accomplish this task:

The code block is not accomplishing the task.

Which reason describes why the code block is not accomplishing the imputation task?

Options:

It does not impute both the training and test data sets.

The inputCols and outputCols need to be exactly the same.

The fit method needs to be called instead of transform.

It does not fit the imputer on the data to create an ImputerModel.

Questions 20

A data scientist has replaced missing values in their feature set with each respective feature variable’s median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.

Which of the following approaches can they take to include as much information as possible in the feature set?

Options:

Impute the missing values using each respective feature variable's mean value instead of the median value

Refrain from imputing the missing values in favor of letting the machine learning algorithm determine how to handle them

Remove all feature variables that originally contained missing values from the feature set

Create a binary feature variable for each feature that contained missing values indicating whether each row's value has been imputed

Create a constant feature variable for each feature that contained missing values indicating the percentage of rows from the feature that was originally missing
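
One way to keep track of which values were imputed is a binary indicator column alongside the filled-in feature. The plain-Python sketch below shows the idea for a single feature; the function name, the toy values, and the use of `None` to mark a missing value are assumptions for illustration, not part of the question.

```python
from statistics import median


def impute_with_indicator(values):
    """Replace missing values (None) with the median and flag which rows were filled."""
    observed = [v for v in values if v is not None]
    fill_value = median(observed)
    imputed = [fill_value if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


# Hypothetical feature column with two missing entries
feature_values = [3.0, None, 7.0, 5.0, None]
feature_imputed, feature_was_missing = impute_with_indicator(feature_values)
```

With the indicator kept as its own feature, a downstream model can still distinguish an observed median-valued row from a row that was filled in.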

Questions 21

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.

Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

Options:

customer_id, loyalty_tier

loyalty_tier

units

spend

customer_id

Questions 22

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

Options:

import pyspark.pandas as ps
df = ps.DataFrame(spark_df)

import pyspark.pandas as ps
df = ps.to_pandas(spark_df)

spark_df.to_sql()

import pandas as pd
df = pd.DataFrame(spark_df)

spark_df.to_pandas()

Exam Code: Databricks-Machine-Learning-Associate

Exam Name: Databricks Certified Machine Learning Associate Exam

Last Update: Jun 30, 2025

Questions: 74
