Data Science and Big Data Analytics


Question No : 1

What is required in a presentation for project sponsors?
A. The "Big Picture" takeaways for executive level stakeholders
B. Data warehouse design changes
C. Line by line review of the developed code
D. Detailed statistical basis for the modeling approach used in the project
Answer: A

Question No : 2

Review the following code:
SELECT pn, vn, sum(prc*qty)
FROM sale
GROUP BY CUBE(pn, vn)
ORDER BY 1, 2, 3;
Which combination of subtotals do you expect to be returned by the query?
A. (pn, vn)
B. ( (pn, vn), (pn) )
C. ( (pn, vn) , (pn), (vn) )
D. ( (pn, vn) , (pn), (vn) , ( ) )
Answer: D
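
Answer D lists the grouping sets that a GROUP BY CUBE(pn, vn) clause produces: every subset of the grouping columns, including the empty set for the grand total. The following is a minimal Python sketch of that expansion, assuming the query is meant to aggregate with CUBE. The rows in `sale` and the helper `cube` are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows of the sale table: (pn, vn, prc, qty)
sale = [
    (100, 10, 2.0, 3),
    (100, 20, 1.0, 5),
    (200, 10, 4.0, 1),
]

def cube(rows, dims=("pn", "vn")):
    """Return {grouping_set: {key: total}} for every subset of dims."""
    column = {"pn": 0, "vn": 1}
    result = {}
    # Walk from the full grouping (pn, vn) down to the empty grouping ().
    for size in range(len(dims), -1, -1):
        for grouping in combinations(dims, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[column[d]] for d in grouping)
                totals[key] += row[2] * row[3]  # sum(prc*qty)
            result[grouping] = dict(totals)
    return result

subtotals = cube(sale)
for grouping, totals in subtotals.items():
    print(grouping, totals)
```

The four keys of `subtotals` correspond to (pn, vn), (pn), (vn), and ( ), which is the combination given in option D.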

Question No : 3

Your company has 3 different sales teams. Each team's sales manager has developed incentive offers to increase the size of each sales transaction. Any sales manager whose incentive program can be shown to increase the size of the average sales transaction will receive a bonus.
Data are available for the number and average sale amount for transactions offering one of the incentives as well as transactions offering no incentive.
The VP of Sales has asked you to determine analytically if any of the incentive programs has resulted in a demonstrable increase in the average sale amount. Which analytical technique would be appropriate in this situation?
A. One-way ANOVA
B. Multi-way ANOVA
C. Student's t-test
D. Wilcoxon Rank Sum Test
Answer: A
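
One-way ANOVA compares the means of more than two groups on a single factor (here, which incentive was offered). As a rough sketch, the F statistic can be computed by hand as shown below. The sale amounts and the function name `one_way_anova_f` are hypothetical; a real analysis would also derive a p-value from the F distribution, for example with a statistics library.

```python
from statistics import mean

# Hypothetical sale amounts per group (three incentives plus no incentive).
groups = {
    "no_incentive": [20.0, 22.0, 19.0, 21.0],
    "team_1": [25.0, 27.0, 26.0, 24.0],
    "team_2": [21.0, 20.0, 22.0, 23.0],
    "team_3": [30.0, 29.0, 31.0, 28.0],
}

def one_way_anova_f(samples):
    """Return the F statistic: between-group variance over within-group variance."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    k = len(samples)      # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f(list(groups.values()))
print(f_stat)
```

A large F value suggests that at least one group mean differs from the others. It does not say which one, so a follow-up comparison would still be needed to identify the specific incentive program.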

Question No : 4

Consider the example of an analysis for fraud detection on credit card usage. You will need to ensure higher-risk transactions that may indicate fraudulent credit card activity are retained in your data for analysis, and not dropped as outliers during pre-processing. What will be your approach for loading data into the analytical sandbox for this analysis?
Answer: A

Question No : 5

Refer to the exhibit.

Which type of data issue would you suspect based on the exhibit?
A. "Saturated" data, indicating potential issues with data definitions
B. Incomplete data, indicating potential issues with data transmission
C. Mis-scaled data, indicating potential issues with data entry
D. The exhibit does not raise any obvious concerns with the data.
Answer: A

Question No : 6

Which type of numeric value does a logistic regression model estimate?
A. Probability
B. A p-value
C. Any integer
D. Any real number
Answer: A
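
The reason the answer is a probability is that logistic regression passes a linear score through the logistic (sigmoid) function, which maps any real number into the interval (0, 1). A minimal sketch, with hypothetical coefficients and a hypothetical observation:

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Return the modeled probability of the positive class."""
    linear_score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(linear_score)

# Hypothetical fitted coefficients and one hypothetical observation.
weights = [0.8, -1.5]
bias = 0.2
p = predict_probability([1.0, 0.5], weights, bias)
print(round(p, 3))
```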

Question No : 7

To ensure a successful analytic project, which key role can provide business domain expertise with a deep understanding of the data and key performance indicators?
A. Business Intelligence Analyst
B. Project Manager
C. Project Sponsor
D. Business User
Answer: A

Question No : 8

Consider a scale that has five (5) values that range from "not important" to "very important". Which data classification best describes this data?
A. Ordinal
B. Nominal
C. Real
D. Ratio
Answer: A

Question No : 9

Refer to the exhibit.

Click on the calculator icon in the upper left corner. An analyst is searching a corpus of documents for the topic "solid state disk". In the Exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term's frequency in four documents selected from the corpus. Which of the four documents is most relevant to the analyst's search?
A. Document B
B. Document A
C. Document C
D. Document D
Answer: A
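
Relevance in this kind of question is typically scored by summing term frequency times inverse document frequency (TF-IDF) over the query terms. The sketch below shows that calculation. All numbers and the document names `doc1` through `doc4` are hypothetical; they are not the values from the exhibit, which is not reproduced here.

```python
# Hypothetical inverse document frequencies for the query terms.
idf = {"solid": 1.2, "state": 0.8, "disk": 1.5}

# Hypothetical term frequencies in four documents.
tf = {
    "doc1": {"solid": 2, "state": 1, "disk": 0},
    "doc2": {"solid": 1, "state": 3, "disk": 2},
    "doc3": {"solid": 0, "state": 5, "disk": 0},
    "doc4": {"solid": 1, "state": 1, "disk": 1},
}

def tfidf_score(doc_tf, idf, query_terms):
    """Sum tf * idf over the query terms; missing terms contribute 0."""
    return sum(doc_tf.get(t, 0) * idf.get(t, 0.0) for t in query_terms)

query = ["solid", "state", "disk"]
scores = {name: tfidf_score(doc_tf, idf, query) for name, doc_tf in tf.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With these made-up numbers, the document with the highest score is the one that contains the rarer (higher-IDF) terms most often, not simply the one with the most term occurrences.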

Question No : 10

A business colleague who is new to Hadoop approaches you with a question. The colleague wants to know the best approach to access their data. The colleague has previously worked extensively with SQL and databases.
Which query interface should be recommended?
Answer: A

Question No : 11

Which word or phrase completes the statement? Structured data is to OLAP data as quasi-structured data is to ____
A. Clickstream data
B. XML data
C. Text documents
D. Image files
Answer: A

Question No : 12

To ensure a successful analytic project, which key role can consult and advise the project team on the value of end results and how these will be used on a daily basis?
A. Business User
B. Project Manager
C. Data Scientist
D. Business Intelligence Analyst
Answer: A

Question No : 13

A disk drive manufacturer has a defect rate of less than 1.0% with 98% confidence. A quality assurance team samples 1000 disk drives and finds 14 defective units. Which action should the team recommend?
A. The manufacturing process should be inspected for problems.
B. A larger sample size should be taken to determine if the plant is functioning properly.
C. A smaller sample size should be taken to determine if the plant is functioning properly.
D. The manufacturing process is functioning properly and no further action is required.
Answer: A

Question No : 14

Refer to the Exhibit.

In the Exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data?
A. Tree B
B. Tree A
C. Tree C
D. Tree D
Answer: A

Question No : 15

You have been assigned to do a study of the daily revenue effect of a pricing model of online transactions. You have tested all the theoretical models in the previous model planning stage, and all tests have yielded statistically insignificant results. What is your next step?
A. Report that the results are insignificant, and reevaluate the original business question.
B. Run all the models again against a larger sample, leveraging more historical data.
C. Move forward on the model with the highest significance scores relative to the others.
D. Modify samples used by the models and iterate until a significant result occurs.
Answer: A

Question No : 16

You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners". What can you determine from the lift calculation?
A. Support for the association is low
B. Leverage of the rules is low
C. The rule is coincidental
D. The rule is true
Answer: C
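
Lift compares the observed co-occurrence of two items with what would be expected if they were independent, so a lift close to 1 means the rule adds almost no information even when confidence is high. A minimal sketch of the calculation follows; the counts are hypothetical and were chosen only so that the confidence exceeds 75% and the lift rounds to 1.011.

```python
def lift(n_total, n_x, n_y, n_both):
    """Lift of the rule X -> Y: P(X and Y) / (P(X) * P(Y))."""
    p_x = n_x / n_total
    p_y = n_y / n_total
    p_both = n_both / n_total
    return p_both / (p_x * p_y)

# Hypothetical counts: X = good credit, Y = homeowner.
n_total = 10_000
n_good_credit = 5_000
n_homeowner = 7_500
n_both = 3_790

confidence = n_both / n_good_credit  # P(homeowner | good credit)
rule_lift = lift(n_total, n_good_credit, n_homeowner, n_both)
print(round(confidence, 3), round(rule_lift, 3))
```

In this made-up data, 75% of all people are homeowners anyway, so a rule confidence just above 75% is about what independence would predict.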

Question No : 17

What is required in a presentation for business analysts?
A. Budgetary considerations and requests
B. Operational process changes
C. Detailed statistical explanation of the applicable modeling theory
D. The presentation author's credentials
Answer: B

Question No : 18

You have plotted the distribution of savings account sizes for a bank.

Based on the distribution shown in the exhibit, how would you proceed?
A. Data is extremely skewed. Replot the data on a logarithmic scale to get a better understanding of it.
B. Data is extremely skewed but looks bimodal. Replot the data in the range 2,500 - 10,000 to be certain.
C. Accounts of sizes greater than 2,500 are rare and are most likely outliers. Eliminate them from future analysis.
D. Data is extremely skewed. Split the analysis into two cohorts; accounts less than 2,500 and accounts greater than 2,500.
Answer: A
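
A logarithmic scale spreads out values that span several orders of magnitude, so structure hidden in the long tail becomes visible. The sketch below bins hypothetical account balances by the integer part of their base-10 logarithm. The balances and the helper name `log10_bins` are assumptions for illustration, and the approach requires strictly positive values.

```python
import math
from collections import Counter

# Hypothetical, heavily right-skewed account balances.
balances = [12, 40, 85, 150, 300, 700, 1200, 2400, 9000, 48000, 250000]

def log10_bins(values):
    """Count values per order of magnitude (floor of log10)."""
    return Counter(math.floor(math.log10(v)) for v in values)

bins = log10_bins(balances)
for exponent in sorted(bins):
    # Each bin covers [10^exponent, 10^(exponent + 1)).
    print(f"10^{exponent} to 10^{exponent + 1}: {bins[exponent]}")
```

On a linear axis almost all of these points would collapse into the first bar; on the log scale they spread across five bins.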

Question No : 19

What is the primary bottleneck in text classification?
A. The availability of tagged training data.
B. The ability to parse unstructured text data.
C. The high dimensionality of text data.
D. The fact that text corpora are dynamic.
Answer: A

Question No : 20

What is a property of window functions in SQL commands?
A. Used to calculate moving averages over various intervals
B. Group rows into a single output row
C. Used between the keywords FROM and WHERE in a SELECT command
D. Ordering data within a window is not required
Answer: A
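
Unlike GROUP BY, a window function keeps every input row and computes its value over a sliding frame of neighboring rows, which is what makes moving averages straightforward. The sketch below runs a three-row moving average through Python's built-in sqlite3 module. The table `daily_sales` and its contents are hypothetical, and the example assumes the bundled SQLite is version 3.25 or later, which is when window function support was added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

# Average of the current row and up to two preceding rows, ordered by day.
rows = conn.execute(
    """
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM daily_sales
    ORDER BY day
    """
).fetchall()
conn.close()

for day, amount, moving_avg in rows:
    print(day, amount, moving_avg)
```

The query returns one output row per input row, which is why option B (grouping rows into a single output row) describes aggregation with GROUP BY rather than a window function.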
