CCA Spark and Hadoop Developer Exam CCA175 Exam Dumps

Still worried about preparing for the CCA Spark and Hadoop Developer Exam? We have released new CCA175 exam questions and answers online to help you prepare well for the Cloudera CCA175 exam. The CCA Spark and Hadoop Developer Exam proves that candidates have the core skills to ingest, transform, and process data using Apache Spark™ and core Cloudera Enterprise tools. Come to DumpsBase to get the CCA175 exam dumps as your preparation materials.

Read CCA175 Free Dumps To Check The Questions

1. Problem Scenario 1:

You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.categories

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

2. Problem Scenario 2:

There is a parent organization called "ABC Group Inc", which has two child companies named Tech Inc and MPTech.

Both companies' employee information is given in two separate text files as below. Please do the following activity for the employee details.

Tech Inc.txt

3. Problem Scenario 3: You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.categories

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

4. Problem Scenario 4: You have been given a MySQL DB with the following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.categories

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

Import the single table categories (subset of data) to a Hive managed table, where category_id is between 1 and 22.
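
A Sqoop import along these lines could cover this activity (a sketch only; the Hive table name categories_subset is an illustrative choice, everything else comes from the problem's connection details):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table categories \
  --where "category_id between 1 and 22" \
  --hive-import \
  --hive-table categories_subset \
  -m 1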

5. Import departments table as a text file in /user/cloudera/departments.
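
One possible command, reusing the retail_db connection details given above:

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --as-textfile \
  --target-dir /user/cloudera/departments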

6. Store all the generated Java files in a directory called java_output for further evaluation.
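
Sqoop writes the Java wrapper classes it generates to the directory given by --outdir, so a sketch could look like this (the target directory name is illustrative):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --target-dir /user/cloudera/departments_java \
  --outdir java_output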

7. Also make sure you have imported only two columns from the table: department_id and department_name.
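
The --columns option restricts the import to the listed columns (a sketch; the target directory name is illustrative):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --columns "department_id,department_name" \
  --target-dir /user/cloudera/departments_columns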

8. Also make sure you use the orderid column for Sqoop's boundary conditions.
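
Boundary conditions map to Sqoop's --boundary-query (with --split-by naming the split column); the orders table below is an assumption based on the orderid column, whose spelling is kept exactly as the problem gives it:

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table orders \
  --split-by orderid \
  --boundary-query "select min(orderid), max(orderid) from orders" \
  --target-dir /user/cloudera/orders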

9. Also make sure the result's fields are terminated by '|' and its lines are terminated by '\n'.
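
The delimiters map to --fields-terminated-by and --lines-terminated-by (a sketch; the target directory name is illustrative):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --target-dir /user/cloudera/departments_delimited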

10. Please import data into a non-existing table; that is, while importing, create a Hive table named hadoopexam.departments_new.
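
With --hive-import plus --create-hive-table, Sqoop creates the Hive table as part of the import; a sketch (it assumes the hadoopexam database already exists in Hive):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import \
  --create-hive-table \
  --hive-table hadoopexam.departments_new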

11. Now import only the newly inserted records and append them to the existing directory, which was created in the first step.
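
An append-mode incremental import might look like this (the check column and the --last-value are placeholders; in practice you would use the highest value already present in the directory):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --target-dir /user/cloudera/departments \
  --incremental append \
  --check-column department_id \
  --last-value 7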

12. Now do an incremental import based on the created_date column.
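
For a date or timestamp column, Sqoop's lastmodified mode is the usual fit; a sketch (the --last-value timestamp is a placeholder):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --target-dir /user/cloudera/departments \
  --incremental lastmodified \
  --check-column created_date \
  --last-value "2019-01-01 00:00:00" \
  --append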

13. Now import the data from the following directory into the departments_export table: /user/cloudera/departments new
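
Loading data from an HDFS directory into a MySQL table is done with sqoop export; a sketch (the directory path is quoted exactly as the problem gives it):

sqoop export \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments_export \
  --export-dir "/user/cloudera/departments new"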

14. Now export this data from HDFS to the MySQL retail_db.departments table. During the upload, make sure existing departments are only updated and no new departments are inserted.
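
Update-only behaviour maps to --update-key plus --update-mode updateonly; the department_id update key below is an assumption based on the departments table:

sqoop export \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --export-dir /user/cloudera/departments \
  --update-key department_id \
  --update-mode updateonly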

15. Please import the departments table into a directory called departments_enclosedby; the file should be processable by a downstream system.
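
Enclosing every field makes the file easier for downstream parsers to handle; a sketch using --enclosed-by (the double quote and the comma delimiter are common choices, not mandated by the problem):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --target-dir /user/cloudera/departments_enclosedby \
  --enclosed-by '"' \
  --fields-terminated-by ','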

16. Now import data from the MySQL table departments to this Hive table. Please make sure that the data is visible using the Hive command below: select * from departments_hive
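
A plain Hive import should make the data queryable with that statement (a sketch):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import \
  --hive-table departments_hive

hive -e "select * from departments_hive"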

17. Now import data from the MySQL table departments_hive01 to this Hive table. Please make sure that the data is visible using the Hive command below. Also, while importing, if a null value is found for the department_name column, replace it with "" (an empty string), and for the id column replace it with -999: select * from departments_hive;
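
Sqoop's --null-string (text columns) and --null-non-string (numeric columns) control what gets written in place of NULLs during the import; a sketch (the Hive table name mirrors the source table and is an assumption):

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments_hive01 \
  --hive-import \
  --hive-table departments_hive01 \
  --null-string '' \
  --null-non-string -999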

18. Now export data from the Hive table departments_hive01 into departments_hive02. While exporting, please note the following: wherever there is an empty string, it should be loaded as a null value in MySQL, and wherever there is a -999 value for an int field, it should be stored as a null value.
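
On the export side the matching options are --input-null-string and --input-null-non-string, which tell Sqoop which placeholder values in the files should become SQL NULL; a sketch (the warehouse path and the '\001' field delimiter assume a default Hive-managed table):

sqoop export \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments_hive02 \
  --export-dir /user/hive/warehouse/departments_hive01 \
  --input-fields-terminated-by '\001' \
  --input-null-string '' \
  --input-null-non-string -999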

19. Import the departments table from MySQL to HDFS as a Parquet file in the departments_parquet directory.
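
Sqoop can write Parquet directly via --as-parquetfile; a sketch:

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --as-parquetfile \
  --target-dir departments_parquet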

20. Write a Sqoop job which will import the "retaildb.categories" table to HDFS, in a directory named "categories_targetJob".
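
A saved Sqoop job wraps an ordinary import behind a name (note the space after the lone -- separator); the table is interpreted here as categories in retail_db, and the job can then be run with sqoop job --exec:

sqoop job --create categories_targetJob \
  -- import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table categories \
  --target-dir categories_targetJob

sqoop job --exec categories_targetJob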

21. Problem Scenario 21: You have been given a log generating service as below.

start_logs (It will generate continuous logs)

tail_logs (You can check what logs are being generated)

stop_logs (It will stop the log service)

Path where logs are generated using above service: /opt/gen_logs/logs/access.log

Now write a Flume configuration file named flume1.conf; using that configuration file, dump the logs into the HDFS file system in a directory called flume1. The Flume channel should have the following properties as well: it should commit after every 100 messages, use a non-durable/faster channel, and be able to hold a maximum of 1000 events.
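
A minimal sketch of such a configuration, written and started from the shell; the agent and component names are arbitrary, the /user/cloudera prefix on the HDFS path is an assumption, "committed after every 100 messages" maps to the memory channel's transactionCapacity, and the 1000-event limit maps to its capacity:

cat > flume1.conf <<'EOF'
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = snk1

# exec source tailing the generated access log
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /opt/gen_logs/logs/access.log
agent1.sources.src1.channels = ch1

# non-durable (memory) channel: commit every 100 events, hold at most 1000
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000
agent1.channels.ch1.transactionCapacity = 100

# HDFS sink writing plain text into the flume1 directory
agent1.sinks.snk1.type = hdfs
agent1.sinks.snk1.channel = ch1
agent1.sinks.snk1.hdfs.path = /user/cloudera/flume1
agent1.sinks.snk1.hdfs.fileType = DataStream
EOF

flume-ng agent -n agent1 -f flume1.conf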

22. Write a Hive query to read the average salary of all employees.
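
Assuming a table named employee with a salary column (both names are taken from the wording of the question, not from a given schema), this is a plain aggregate:

hive -e "select avg(salary) from employee"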

23. Problem Scenario 23: You have been given a log generating service as below.

Start_logs (It will generate continuous logs)

Tail_logs (You can check, what logs are being generated)

Stop_logs (It will stop the log service)

Path where logs are generated using above service: /opt/gen_logs/logs/access.log

Now write a Flume configuration file named flume3.conf; using that configuration file, dump the logs into the HDFS file system in a directory called flumeflume3/%Y/%m/%d/%H/%M

(This means a new directory should be created every minute.) Please use interceptors to provide timestamp information if the message header does not already contain it.

Also note that you have to preserve the existing timestamp if the message already contains one. The Flume channel should have the following properties as well: it should commit after every 100 messages, use a non-durable/faster channel, and be able to hold a maximum of 1000 events.
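
A sketch of flume3.conf along those lines: the timestamp interceptor with preserveExisting = true adds a timestamp header only when one is missing, which lets the %Y/%m/%d/%H/%M escapes in the HDFS path resolve; the /user/cloudera prefix and the agent name are assumptions.

cat > flume3.conf <<'EOF'
agent3.sources = src1
agent3.channels = ch1
agent3.sinks = snk1

agent3.sources.src1.type = exec
agent3.sources.src1.command = tail -F /opt/gen_logs/logs/access.log
agent3.sources.src1.channels = ch1

# add a timestamp header only if the event does not already carry one
agent3.sources.src1.interceptors = i1
agent3.sources.src1.interceptors.i1.type = timestamp
agent3.sources.src1.interceptors.i1.preserveExisting = true

# memory channel: commit every 100 events, hold at most 1000
agent3.channels.ch1.type = memory
agent3.channels.ch1.capacity = 1000
agent3.channels.ch1.transactionCapacity = 100

# HDFS sink; the time escapes create a new directory every minute
agent3.sinks.snk1.type = hdfs
agent3.sinks.snk1.channel = ch1
agent3.sinks.snk1.hdfs.path = /user/cloudera/flumeflume3/%Y/%m/%d/%H/%M
agent3.sinks.snk1.hdfs.fileType = DataStream
EOF

flume-ng agent -n agent3 -f flume3.conf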

24. While importing, make sure only male employee data is stored.
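
No table or schema is given for this step, so the command below is purely illustrative: the employee table and its gender column are hypothetical, and the point is only that the filtering belongs in Sqoop's --where clause:

sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table employee \
  --where "gender = 'male'" \
  --target-dir /user/cloudera/employee_male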

25. Problem Scenario 25: You have been given the comma-separated employee information below, which needs to be added to the /home/cloudera/flumetest/in.txt file (to be used with a tail source).

sex, name, city

1, alok, mumbai

1, jatin, chennai

1, yogesh, kolkata

2, ragini, delhi

2, jyotsana, pune

1, valmiki, banglore

Create a Flume conf file using the fastest non-durable channel, which writes data into the Hive warehouse directory, into two separate tables called flumemaleemployee1 and flumefemaleemployee1.

(Create the Hive tables as well for the given data.) Please use a tail source with the /home/cloudera/flumetest/in.txt file.

flumemaleemployee1 will contain only male employees' data; flumefemaleemployee1 will contain only female employees' data.
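
One way to sketch this (the agent and component names and the warehouse paths are assumptions, and reading 1 as male and 2 as female follows the sample names): a regex_extractor interceptor copies the leading sex code into a header, and a multiplexing selector routes code 1 to the male channel and code 2 to the female channel. The two Hive tables would still need matching CREATE TABLE statements pointing at those locations.

cat > flume25.conf <<'EOF'
agent25.sources = src1
agent25.channels = chMale chFemale
agent25.sinks = snkMale snkFemale

# tail source on the input file
agent25.sources.src1.type = exec
agent25.sources.src1.command = tail -F /home/cloudera/flumetest/in.txt
agent25.sources.src1.channels = chMale chFemale

# copy the leading sex code into a header named "sex"
agent25.sources.src1.interceptors = i1
agent25.sources.src1.interceptors.i1.type = regex_extractor
agent25.sources.src1.interceptors.i1.regex = ^([0-9]+),
agent25.sources.src1.interceptors.i1.serializers = s1
agent25.sources.src1.interceptors.i1.serializers.s1.name = sex

# route events to a channel based on that header
agent25.sources.src1.selector.type = multiplexing
agent25.sources.src1.selector.header = sex
agent25.sources.src1.selector.mapping.1 = chMale
agent25.sources.src1.selector.mapping.2 = chFemale

# fastest non-durable channels
agent25.channels.chMale.type = memory
agent25.channels.chFemale.type = memory

# HDFS sinks writing plain text under the Hive warehouse (default location assumed)
agent25.sinks.snkMale.type = hdfs
agent25.sinks.snkMale.channel = chMale
agent25.sinks.snkMale.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent25.sinks.snkMale.hdfs.fileType = DataStream

agent25.sinks.snkFemale.type = hdfs
agent25.sinks.snkFemale.channel = chFemale
agent25.sinks.snkFemale.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent25.sinks.snkFemale.hdfs.fileType = DataStream
EOF

flume-ng agent -n agent25 -f flume25.conf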

26. Data should be written as text to hdfs

27. Data should be written as text to hdfs

28. Data should be written as text to hdfs
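
In the Flume HDFS sink, writing plain text rather than the default SequenceFile is controlled by one property, as already used in the sketches above (agentX and snkX stand in for whatever names the configuration uses):

agentX.sinks.snkX.hdfs.fileType = DataStream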


 
