Consider scoring new observations in the SCORE procedure versus the SCORE statement in the LOGISTIC procedure.
Which statement is true?
An analyst fits a logistic regression model to predict whether a client will default on a loan. One of the predictors in the model is agent, and each agent serves 15-20 clients. The model fails to converge. The analyst prints the summarized data, showing the number of defaulted loans per agent. See the partial output below:
What is the most likely reason that the model fails to converge?
Refer to the REG procedure output:
Refer to the exhibit.
Output from a multiple linear regression analysis is shown.
What is the most appropriate statement concerning collinearity between the input variables?
What is a benefit to performing data cleansing (imputation, transformations, etc.) on data after partitioning the data for honest assessment as opposed to performing the data cleansing prior to partitioning the data?
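As a study aid, the idea behind cleansing after partitioning can be sketched in a few lines: the imputation value is learned from the training partition only and then reused on the validation partition, so the validation data plays no part in building the model. This is a minimal Python sketch, not SAS code; the function names (`fit_median`, `impute`) and the income values are hypothetical.

```python
from statistics import median

def fit_median(train_values):
    """Learn the imputation value from the training partition only."""
    observed = [v for v in train_values if v is not None]
    return median(observed)

def impute(values, fill):
    """Replace missing entries (None) with a previously learned value."""
    return [fill if v is None else v for v in values]

# Hypothetical income values; None marks a missing observation.
train = [40, None, 55, 70, 45]
valid = [None, 300, 65]

fill = fit_median(train)            # uses training rows only
train_clean = impute(train, fill)
valid_clean = impute(valid, fill)   # validation reuses the training median
```

The extreme value 300 in the validation partition has no influence on the fill value, which is the point of the exercise.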
The standard form of a linear regression model is:
Which statement best summarizes the assumptions placed on the errors?
Refer to the lift chart:
What does the reference line at lift = 1 correspond to?
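For reference, lift at a given depth is the response rate among the top-scored cases divided by the overall response rate. The sketch below assumes that definition; the function name `lift_at_depth` and the scored data are hypothetical, and the exhibit's actual values are not reproduced here.

```python
def lift_at_depth(scored, depth):
    """Lift = response rate in the top `depth` fraction / overall response rate.

    `scored` is a list of (predicted_probability, actual_outcome) pairs,
    where actual_outcome is 1 for an event and 0 otherwise.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(y for _, y in ranked) / len(ranked)
    return top_rate / overall_rate

# Hypothetical scored validation data: 10 cases, 4 events.
scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
          (0.4, 0), (0.3, 1), (0.2, 0), (0.1, 0), (0.05, 0)]

top_20 = lift_at_depth(scored, 0.2)   # top 2 cases are both events
full = lift_at_depth(scored, 1.0)     # whole file: numerator equals denominator
```

At a depth of 100% the two rates coincide, so the lift is 1 by construction.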
Refer to the exhibit.
Given alpha=0.02, which conclusion is justified regarding percentage of body fat, comparing small (S), medium (M), and large (L) wrist sizes?
This question will ask you to provide a missing statement. Given the following SAS program:
Which SAS statement will complete the program to correctly score the data set NEW_DATA?
Refer to the REG procedure output:
Calculate the coefficient of determination, R-Square.
Enter your numeric answer in the space below. Round to 4 decimal places (example: n.nnnn).
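For reference, the coefficient of determination can be computed from the sums of squares in the Analysis of Variance table. The symbols below are shorthand for the Model, Error, and Corrected Total rows; the exhibit's values are not reproduced here.

```latex
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Corrected Total}}}
    = 1 - \frac{SS_{\text{Error}}}{SS_{\text{Corrected Total}}}
```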
The PROC LOGISTIC options SELECTION=SCORE and BEST=2 are used in a MODEL statement to generate a series of predictive models. The models are assigned numbers in order from 1 to 99 reflecting the fact that there are 50 candidate input variables. Results from the collection of derived models are used to generate the following plot of overall average profit by model number. Results are restricted to models with at least 9 inputs and at most 40 inputs.
The maximum value for the training data occurs for model number 46, and the maximum value for the validation data occurs for model number 43.
If you base model selection solely on overall average profit, what is the correct choice?
Given the following GLM procedure output:
Which statement is correct at an alpha level of 0.05?
This question will ask you to provide a missing option.
Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure:
means Region /
A linear model has the following characteristics:
Which SAS program fits this model?
A marketing analyst assessed the effect of web page design (A, B, or C) on customers' intent to purchase an expensive product. The focus group was divided randomly into three sub-groups, each of which was asked to view one of the web pages and then give their intent to purchase on a scale from 0 to 100. The analyst also asked the customers to give their income, which was coded as: I (lowest), II (medium), or III (highest). After analyzing the data, the analyst claimed that there was significant interaction and that the web page design mainly influenced high-income people.
Which graph supports the analyst's conclusion?
A)
B)
C)
D)
Which SAS program will divide the original data set into 60% training and 40% validation data sets, stratified by county?
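As a language-neutral illustration of what a stratified 60/40 partition does, the sketch below splits the rows separately within each stratum so that every county contributes about 60% of its rows to training. It is a minimal Python sketch, not one of the SAS answer choices; the function name `stratified_split`, the seed, and the data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, train_frac=0.6, seed=12345):
    """Split rows into training and validation sets within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    train, valid = [], []
    for members in strata.values():
        rng.shuffle(members)                        # random order within the stratum
        n_train = round(len(members) * train_frac)  # 60% of this stratum
        train.extend(members[:n_train])
        valid.extend(members[n_train:])
    return train, valid

# Hypothetical data: 10 rows in each of two counties.
rows = [{"id": i, "county": "A" if i < 10 else "B"} for i in range(20)]
train, valid = stratified_split(rows, key="county")
```

Splitting within each stratum, rather than across the whole file, keeps the county proportions the same in both partitions.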
Spearman statistics in the CORR procedure are useful for screening for irrelevant variables by investigating the association between which function of the input variables?
Refer to the exhibit:
On the Gains Chart, what is the correct interpretation of the horizontal reference line?
Screening for non-linearity in binary logistic regression can be achieved by visualizing:
This question will ask you to provide a missing option.
A business analyst is investigating the differences in sales figures across 8 sales regions. The analyst is interested in viewing the regression equation parameter estimates for each of the design variables.
Which option completes the program to produce the regression equation parameter estimates?
A non-contributing predictor variable (Pr > |t| = 0.658) is added to an existing multiple linear regression model.
What will be the result?
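For reference, the two fit statistics usually compared in this situation are related as follows, where n is the number of observations and p is the number of predictors (both symbols are introduced here for illustration):

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

In ordinary least squares with an intercept, adding a predictor cannot lower R-Square, whereas the penalty term in the adjusted version grows with p, so Adjusted R-Square can fall when the new predictor explains very little.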