June 8, 2026

Amazon DEA-C01 Exam Questions 2026 V13.02: Free Dumps (Part 1, Q1-Q40) for AWS Data Engineer Prep

DumpsBase helps you prepare for the AWS Certified Data Engineer – Associate (DEA-C01) exam with updated DEA-C01 dumps. The dumps have been updated to V13.02 and are designed to reflect current question formats, so you can focus on relevant topics instead of outdated content. V13.02 contains 294 practice questions and answers, giving you a targeted way to build confidence and stay consistent throughout your preparation. To show you what the DEA-C01 dumps (V13.02) look like, we are sharing free demo questions that you can use to check the quality of the material. The updated DEA-C01 dumps (V13.02) are aligned with the latest exam objectives and difficulty levels. Preparing with current Amazon DEA-C01 dumps supports better retention and exam readiness.

Start with Amazon DEA-C01 free dumps (Part 1, Q1-Q40) of V13.02 below:

1. A transportation company wants to track vehicle movements by capturing geolocation records. The records are 10 bytes in size. The company receives up to 10,000 records every second. Data transmission delays of a few minutes are acceptable because of unreliable network conditions.

The transportation company wants to use Amazon Kinesis Data Streams to ingest the geolocation data. The company needs a reliable mechanism to send data to Kinesis Data Streams. The company needs to maximize the throughput efficiency of the Kinesis shards.

Which solution will meet these requirements in the MOST operationally efficient way?

Kinesis Agent

Kinesis Producer Library (KPL)

Amazon Data Firehose

Kinesis SDK
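
The shard-efficiency requirement in question 1 comes down to simple arithmetic, sketched below. The sketch assumes the commonly documented per-shard write limits for Kinesis Data Streams (about 1 MB per second and 1,000 records per second). The constant names, the function name, and the batch size of 100 are illustrative assumptions, not part of any AWS SDK.

```python
import math

# Assumed per-shard write limits for Kinesis Data Streams (illustrative constants).
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1_000_000  # roughly 1 MB per second


def shards_needed(records_per_sec: int, record_bytes: int) -> dict:
    """Estimate how many shards a write workload needs under each limit."""
    by_count = math.ceil(records_per_sec / SHARD_MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * record_bytes / SHARD_MAX_BYTES_PER_SEC)
    return {
        "by_record_count": by_count,
        "by_bytes": by_bytes,
        "required": max(by_count, by_bytes),
    }


# Workload from the question: 10,000 records per second, 10 bytes each.
unaggregated = shards_needed(records_per_sec=10_000, record_bytes=10)

# Hypothetical aggregation: pack 100 user records into one stream record
# (per-record framing overhead is ignored in this sketch).
BATCH = 100
aggregated = shards_needed(records_per_sec=10_000 // BATCH, record_bytes=10 * BATCH)

print(unaggregated)
print(aggregated)
```

With tiny records the record-count limit, not the byte limit, determines the shard count, which is why batching many small records into fewer, larger ones changes the estimate.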

2. A retail company uses Amazon Aurora PostgreSQL to process and store live transactional data. The company uses an Amazon Redshift cluster for a data warehouse.

An extract, transform, and load (ETL) job runs every morning to update the Redshift cluster with new data from the PostgreSQL database. The company has grown rapidly and needs to cost optimize the Redshift cluster.

A data engineer needs to create a solution to archive historical data. The data engineer must be able to run analytics queries that effectively combine data from live transactional data in PostgreSQL, current data in Redshift, and archived historical data. The solution must keep only the most recent 15 months of data in Amazon Redshift to reduce costs.

Which combination of steps will meet these requirements? (Select TWO.)

Configure the Amazon Redshift Federated Query feature to query live transactional data that is in the PostgreSQL database.

Configure Amazon Redshift Spectrum to query live transactional data that is in the PostgreSQL database.

Schedule a monthly job to copy data that is older than 15 months to Amazon S3 by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Amazon Redshift Spectrum to access historical data in Amazon S3.

Schedule a monthly job to copy data that is older than 15 months to Amazon S3 Glacier Flexible Retrieval by using the UNLOAD command. Delete the old data from the Redshift cluster. Configure Redshift Spectrum to access historical data from S3 Glacier Flexible Retrieval.

Create a materialized view in Amazon Redshift that combines live, current, and historical data from different sources.

3. A company needs a solution that restricts access to Amazon S3 data and encrypts the data by using AWS managed keys. The solution must manage database credentials that an AWS Lambda function uses and must rotate the credentials automatically.

Which solution will meet these requirements?

Use S3 bucket policies to control access. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the data. Store the database credentials as Lambda environment variables.

Use IAM policies to control access. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the data. Configure AWS Secrets Manager to store and automatically rotate the credentials by using a Lambda function.

Use S3 ACLs to control access. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the data. Store the credentials in AWS Systems Manager Parameter Store and automatically rotate the credentials by using a Lambda function.

Use IAM policies to control access. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the data. Store the credentials in AWS Systems Manager Parameter Store. Configure a scheduled Lambda function to rotate the credentials.

4. A company receives a data file from a partner each day in an Amazon S3 bucket. The company uses a daily AWS Glue extract, transform, and load (ETL) pipeline to clean and transform each data file. The output of the ETL pipeline is written to a CSV file named Daily.csv in a second S3 bucket.

Occasionally, the daily data file is empty or is missing values for required fields. When the file is missing data, the company can use the previous day's CSV file.

A data engineer needs to ensure that the previous day's data file is overwritten only if the new daily file is complete and valid.

Which solution will meet these requirements with the LEAST effort?

Invoke an AWS Lambda function to check the file for missing data and to fill in missing values in required fields.

Configure the AWS Glue ETL pipeline to use AWS Glue Data Quality rules. Develop rules in Data Quality Definition Language (DQDL) to check for missing values in required fields and empty files.

Use AWS Glue Studio to change the code in the ETL pipeline to fill in any missing values in the required fields with the most common values for each field.

Run a SQL query in Amazon Athena to read the CSV file and drop missing rows. Copy the corrected CSV file to the second S3 bucket.

5. A company uses an Amazon Redshift provisioned cluster as its database. The Redshift cluster has five reserved ra3.4xlarge nodes and uses key distribution.

A data engineer notices that one of the nodes frequently has a CPU load over 90%. SQL queries that run on the node are queued. The other four nodes usually have a CPU load under 15% during daily operations.

The data engineer wants to maintain the current number of compute nodes. The data engineer also wants to balance the load more evenly across all five compute nodes.

Which solution will meet these requirements?

Change the sort key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

Change the distribution key to the table column that has the largest dimension.

Upgrade the reserved node from ra3.4xlarge to ra3.16xlarge.

Change the primary key to be the data column that is most often used in a WHERE clause of the SQL SELECT statement.

6. A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a physical data model.

The data engineer encounters a de-normalized table that is growing in size. The table does not have a suitable column to use as the distribution key.

Which distribution style should the data engineer use to meet these requirements with the LEAST maintenance overhead?

ALL distribution

EVEN distribution

AUTO distribution

KEY distribution

7. A company receives a daily file that contains customer data in .xls format. The company stores the file in Amazon S3. The daily file is approximately 2 GB in size.

A data engineer concatenates the column in the file that contains customer first names and the column that contains customer last names. The data engineer needs to determine the number of distinct customers in the file.

Which solution will meet this requirement with the LEAST operational effort?

Create and run an Apache Spark job in an AWS Glue notebook. Configure the job to read the S3 file and calculate the number of distinct customers.

Create an AWS Glue crawler to create an AWS Glue Data Catalog of the S3 file. Run SQL queries from Amazon Athena to calculate the number of distinct customers.

Create and run an Apache Spark job in Amazon EMR Serverless to calculate the number of distinct customers.

Use AWS Glue DataBrew to create a recipe that uses the COUNT_DISTINCT aggregate function to calculate the number of distinct customers.

8. A data engineer needs to deploy a complex pipeline. The stages of the pipeline must run scripts, but only fully managed and serverless services can be used.

Which solution will meet these requirements?

Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.

Use Amazon MWAA to build and schedule the pipeline.

Deploy the script to EC2. Use EventBridge to schedule it.

Use AWS Glue DataBrew and EventBridge to run on a schedule.

9. A data engineer develops an AWS Glue Apache Spark ETL job to perform transformations on a dataset. When the data engineer runs the job, the job returns an error that reads, "No space left on device."

The data engineer needs to identify the source of the error and provide a solution.

Which combination of steps will meet this requirement MOST cost-effectively? (Select TWO.)

Scale out the workers vertically to address data skewness.

Use the Spark UI and AWS Glue metrics to monitor data skew in the Spark executors.

Scale out the number of workers horizontally to address data skewness.

Enable the --write-shuffle-files-to-s3 job parameter. Use the salting technique.

Use error logs in Amazon CloudWatch to monitor data skew.
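
The salting technique named in question 9 can be illustrated without Spark. The sketch below is a plain-Python stand-in: `partition_for` is a hypothetical hash partitioner built on `zlib.crc32`, and `NUM_PARTITIONS` and `NUM_SALTS` are assumed values chosen only for illustration. It shows how appending a rotating suffix to one hot key spreads its rows over several partitions instead of one.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical number of shuffle partitions
NUM_SALTS = 8       # hypothetical number of salt values per key


def partition_for(key: str) -> int:
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def salted_key(key: str, row_index: int) -> str:
    """Append a rotating salt so one hot key becomes several distinct keys."""
    return f"{key}#{row_index % NUM_SALTS}"


# A skewed dataset: every row carries the same key.
rows = ["hot_customer"] * 1_000

# Rows per partition without and with salting.
unsalted = Counter(partition_for(key) for key in rows)
salted = Counter(partition_for(salted_key(key, i)) for i, key in enumerate(rows))

print("partitions used without salt:", len(unsalted))
print("partitions used with salt:   ", len(salted))
```

In a real job the salt must be removed, or the aggregation done in two stages, after the shuffle so that results for the original key are combined again.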

10. A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR.

Which solution will meet these requirements with the LEAST operational overhead?

Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.

Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.

Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.

Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.

11. A company has an on-premises PostgreSQL database that contains customer data. The company wants to migrate the customer data to an Amazon Redshift data warehouse. The company has established a VPN connection between the on-premises database and AWS.

The on-premises database is continuously updated. The company must ensure that the data in Amazon Redshift is updated as quickly as possible.

Which solution will meet these requirements?

Use the pg_dump utility to generate a backup of the PostgreSQL database. Use the AWS Schema Conversion Tool (AWS SCT) to upload the backup to Amazon Redshift. Set up a cron job to perform a backup. Upload the backup to Amazon Redshift every night.

Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use the change data capture (CDC) feature.

Use the pg_dump utility to generate a backup of the PostgreSQL database. Upload the backup to an Amazon S3 bucket. Use the COPY command to import the data into Amazon Redshift.

Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to perform a full load of the database to Amazon Redshift every night.

12. A company has an application that uses a microservice architecture. The company hosts the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

The company wants to set up a robust monitoring system for the application. The company needs to analyze the logs from the EKS cluster and the application. The company needs to correlate the cluster's logs with the application's traces to identify points of failure in the whole application request flow.

Which combination of steps will meet these requirements with the LEAST development effort? (Select TWO.)

Use Fluent Bit to collect logs. Use OpenTelemetry to collect traces.

Use Amazon CloudWatch to collect logs. Use Amazon Kinesis to collect traces.

Use Amazon CloudWatch to collect logs. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to collect traces.

Use Amazon OpenSearch to correlate the logs and traces.

Use AWS Glue to correlate the logs and traces.

13. A company uses AWS Glue ETL pipelines to process data. The company uses Amazon Athena to analyze data in an Amazon S3 bucket.

To better understand shipping timelines, the company decides to collect and store shipping dates and delivery dates in addition to order data. The company adds a data quality check to ensure that the shipping date is later than the order date and that the delivery date is later than the shipping date. Orders that fail the quality check must be stored in a second Amazon S3 bucket.

Which solution will meet these requirements in the MOST cost-effective way?

Use AWS Glue DataBrew DATEDIFF functions to create two additional columns. Validate the new columns. Write failed records to a second S3 bucket.

Use Amazon Athena to query the three date columns and compare the values. Export failed records to a second S3 bucket.

Use AWS Glue Data Quality to create a custom rule that validates the three date columns. Route records that fail the rule to a second S3 bucket.

Use an AWS Glue crawler to populate the AWS Glue Data Catalog. Use the three date columns to create a filter.
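
The date-ordering check described in question 13 can be written as a small predicate. The sketch below is plain Python rather than any AWS API. The field names `order_date`, `shipping_date`, and `delivery_date` and the function name are assumptions made for illustration. Records that fail the rule are collected separately, mirroring the requirement to store them in a second location.

```python
from datetime import date


def split_by_date_rule(orders):
    """Split orders into (passed, failed) using order < shipping < delivery."""
    passed, failed = [], []
    for order in orders:
        is_valid = order["order_date"] < order["shipping_date"] < order["delivery_date"]
        (passed if is_valid else failed).append(order)
    return passed, failed


# Hypothetical sample records.
orders = [
    {"id": 1, "order_date": date(2026, 1, 1), "shipping_date": date(2026, 1, 2), "delivery_date": date(2026, 1, 5)},
    {"id": 2, "order_date": date(2026, 1, 3), "shipping_date": date(2026, 1, 2), "delivery_date": date(2026, 1, 6)},
    {"id": 3, "order_date": date(2026, 1, 4), "shipping_date": date(2026, 1, 5), "delivery_date": date(2026, 1, 5)},
]

passed, failed = split_by_date_rule(orders)
print([o["id"] for o in passed], [o["id"] for o in failed])
```

The comparison is strict because the question says "later than", so an order that ships and arrives on the same date fails the rule in this sketch.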

14. A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.

The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.

Which solution will meet these requirements with the LEAST operational overhead?

Use the AWS Glue API to manually update the Data Catalog.

Run an MSCK REPAIR TABLE command in Athena.

Schedule an AWS Glue crawler to periodically update the Data Catalog.

Run a REFRESH TABLE command in Athena.

15. A company uses an Amazon Redshift cluster as a data warehouse that is shared across two departments. To comply with a security policy, each department must have unique access permissions.

Department A must have access to tables and views for Department A. Department B must have access to tables and views for Department B.

The company often runs SQL queries that use objects from both departments in one query.

Which solution will meet these requirements with the LEAST operational overhead?

Group tables and views for each department into dedicated schemas. Manage permissions at the schema level.

Group tables and views for each department into dedicated databases. Manage permissions at the database level.

Update the names of the tables and views to follow a naming convention that contains the department names. Manage permissions based on the new naming convention.

Create an IAM user group for each department. Use identity-based IAM policies to grant table and view permissions based on the IAM user group.

16. A company uses Amazon Redshift as its data warehouse. Data encoding is applied to the existing tables of the data warehouse. A data engineer discovers that the compression encoding applied to some of the tables is not the best fit for the data.

The data engineer needs to improve the data encoding for the tables that have sub-optimal encoding.

Which solution will meet this requirement?

Run the ANALYZE command against the identified tables. Manually update the compression encoding of columns based on the output of the command.

Run the ANALYZE COMPRESSION command against the identified tables. Manually update the compression encoding of columns based on the output of the command.

Run the VACUUM REINDEX command against the identified tables.

Run the VACUUM RECLUSTER command against the identified tables.

17. A global finance company needs to implement near real-time cross-Region synchronization of trading data between trading centers in the us-east-1 Region, the eu-west-2 Region, and the ap-northeast-1 Region. The company must ensure that data is encrypted in transit. The solution must ensure data ordering and consistency and must support cross-Region disaster recovery. The solution must provide data latency of less than 500 milliseconds.

Which solution will meet these requirements with the LEAST operational effort?

Deploy Apache Kafka Connect in each AWS Region. Use custom-developed connectors to set up cross-Region data replication. Configure the SSL security protocol.

Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) Replicator to establish fully interconnected replication relationships between MSK clusters in the three AWS Regions. Enable TLS encryption and IAM authentication. Set up cross-Region backup configurations.

Deploy Apache Kafka MirrorMaker 2.0 in each AWS Region. Set up custom replication policies to handle cross-Region data synchronization. Configure the SSL security protocol.

Use Amazon Kinesis Data Streams to receive trading data from each AWS Region. Use Amazon Data Firehose to replicate data between Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in each Region. Configure AWS Key Management Service (AWS KMS) encryption and IAM roles to manage access.

18. A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift.

The ETL jobs need to handle failures and retries automatically. The data engineer needs to use Python to orchestrate the jobs.

Which service will meet these requirements?

Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

AWS Step Functions

AWS Glue

Amazon EventBridge

19. A company needs to store semi-structured transactional data for an application in a database. The database must be serverless. The application writes the data infrequently, but it reads the data frequently. The application must retrieve the data within milliseconds.

Which solution will meet these requirements with the LEAST operational overhead?

Store the data in an Amazon S3 Standard bucket. Enable S3 Transfer Acceleration.

Store the data in an Amazon S3 Apache Iceberg table. Enable S3 Transfer Acceleration.

Store the data in an Amazon RDS for MySQL cluster. Configure RDS Optimized Reads for the cluster.

Store the data in an Amazon DynamoDB table. Configure a DynamoDB Accelerator cache.

20. A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models.

The data engineer receives an access denied error when the data engineer tries to prepare the data by using SageMaker Studio.

Which change should the engineer make to gain access to SageMaker Studio?

Add the AWSGlueServiceRole managed policy to the data engineer's IAM user.

Add a policy to the data engineer's IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.

Add the AmazonSageMakerFullAccess managed policy to the data engineer's IAM user.

Add a policy to the data engineer's IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.

21. A company that operates globally must follow regulations that require data from an AWS Region to be accessible only within that Region.

A data engineer is creating a data pipeline that will create resources in the Region where the data engineer works. The data pipeline should have access to data only from the Region where the data engineer works. The pipeline uses Active Directory as an identity and authentication system. The pipeline uses a custom identity broker application to verify that employees are signed in to Active Directory and to obtain temporary credentials by using the AssumeRole API operation.

Which solution will meet the locality requirements with the LEAST administrative effort?

Create an IAM role that has permissions to create resources. Create a policy for each Region that ensures users can create resources only in that Region. Pass the policy as the session policy when employees obtain the temporary credentials.

Create an IAM role for data engineers in each Region separately. Instruct each data engineer to obtain temporary credentials by assuming the appropriate Region-specific IAM role.

Create an IAM group for each Region. Include the required IAM policies for each IAM group. Add users to each IAM group so that when users log in by obtaining the temporary credentials, the users will receive the appropriate access based on the IAM group.

Create individual IAM policies that allow users to create resources in a specific Region. Assign the policies to each data engineer. Allow users to assume the individually assigned role when the users log in to AWS.

22. A company runs an AWS Glue workflow every day to process time series data from an Amazon S3 bucket. The workflow loads the data into an Amazon Redshift Serverless table. The company observes that some of the jobs in the workflow occasionally fail.

A data engineer must receive a notification when the Redshift table does not contain the most recent data.

Which solution will meet this requirement in the MOST operationally efficient way?

Configure an Amazon EventBridge Scheduler to run an Amazon Macie job to scan the Redshift table for data freshness. Configure Macie to notify an Amazon Simple Notification Service (Amazon SNS) topic when an AWS Glue job fails.

Schedule an AWS Glue Data Quality job to check the freshness of the data. Create an Amazon EventBridge rule to notify an Amazon Simple Notification Service (Amazon SNS) topic when a data quality rule fails.

Load AWS Glue job logs to an Amazon S3 bucket. Configure an Amazon CloudWatch alarm to send a notification when the job logs in the S3 bucket contain Job.State=FAILED.

Create an Amazon CloudWatch dashboard that displays a metric named Failed AWS Glue Jobs that counts AWS Glue job failures during the previous day. Set a CloudWatch alarm to send a notification when the metric value exceeds zero.

23. A data engineer is designing a new data lake architecture for a company. The data engineer plans to use Apache Iceberg tables and AWS Glue Data Catalog to achieve fast query performance and enhanced metadata handling. The data engineer needs to query historical data for trend analysis and optimize storage costs for a large volume of event data.

Which solution will meet these requirements with the LEAST development effort?

Store Iceberg table data files in Amazon S3 Intelligent-Tiering.

Define partitioning schemes based on event type and event date.

Use AWS Glue Data Catalog to automatically optimize Iceberg storage.

Run a custom AWS Glue job to compact Iceberg table data files.

24. A car sales company maintains data about cars that are listed for sale in an area. The company receives data about new car listings from vendors who upload the data daily as compressed files into Amazon S3. The compressed files are up to 5 KB in size. The company wants to see the most up-to-date listings as soon as the data is uploaded to Amazon S3.

A data engineer must automate and orchestrate the data processing workflow of the listings to feed a dashboard. The data engineer must also provide the ability to perform one-time queries and analytical reporting. The query solution must be scalable.

Which solution will meet these requirements MOST cost-effectively?

Use an Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Apache Hive for one-time queries and analytical reporting. Use Amazon OpenSearch Service to bulk ingest the data into compute optimized instances. Use OpenSearch Dashboards in OpenSearch Service for the dashboard.

Use a provisioned Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.

Use AWS Glue to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Redshift Spectrum for one-time queries and analytical reporting. Use OpenSearch Dashboards in Amazon OpenSearch Service for the dashboard.

Use AWS Glue to process incoming data. Use AWS Lambda and S3 Event Notifications to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.

25. A data engineer needs to query data from multiple sources to generate an annual report. The analytics team uses Amazon Redshift for analysis. The data engineer needs to integrate Amazon Redshift data with 10 years of historical data from Amazon RDS for PostgreSQL and RDS for MySQL. All the databases are in the same VPC. The data engineer needs a solution that provides seamless data integration with Amazon Redshift.

Which solution will meet these requirements in the MOST cost-effective way?

Use federated queries in Amazon Redshift to fetch data from RDS for PostgreSQL and RDS for MySQL. Apply the necessary transformations within Amazon Redshift.

Use the SELECT INTO OUTFILE S3 statement to export data from Amazon RDS to Amazon S3. Use the COPY command to load the data into Amazon Redshift.

Create a visual extract, transform, and load (ETL) job in AWS Glue to extract the required data and load it to Amazon Redshift.

Use AWS Database Migration Service (AWS DMS) to ingest data from RDS for PostgreSQL and RDS for MySQL. Implement the necessary transformations within Amazon Redshift.

26. A data engineer needs to create an Amazon Athena table based on a subset of data from an existing Athena table named cities_world. The cities_world table contains cities that are located around the world. The data engineer must create a new table named cities_us to contain only the cities from cities_world that are located in the US.

Option A

Option B

Option C

Option D

27. A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII.

To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset.

Which solution will meet the requirements with the LEAST operational overhead?

Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.

Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.

Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.

Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.

28. A data engineer needs to make tabular data available in an Amazon S3-based data lake. Users must be able to query the data by using SQL queries in Amazon Redshift, Amazon Athena, and Amazon EMR. The data is updated daily. The data engineer must ensure that updates and deletions are reflected in the data lake.

Which solution will meet these requirements with the LEAST operational overhead?

Store the data in S3 Standard. Configure Apache Hudi with merge-on-read in Amazon EMR. Use Apache Spark SQL in Amazon EMR to perform the daily updates and deletions. Use Amazon EMR to schedule compaction jobs. Use AWS Glue to create a data catalog of Hudi tables that are stored in Amazon S3.

Create S3 tables for the tabular data. Use AWS Glue and an S3 tables catalog for Apache Iceberg JAR to perform the daily updates and deletions. Configure a compaction size target. Set up snapshot management and unreferenced file removal for the S3 tables bucket.

Load the data into an Amazon Redshift cluster. Use SQL to perform the daily updates and deletions. Upload the data to an Amazon S3 bucket in Apache Parquet format to create the data lake.

Load the data into an Amazon EMR cluster. Use Apache Spark to perform the daily updates and deletions. Upload the data into an Amazon S3 bucket in Apache Parquet format to create the data lake.

29. A data engineer uses AWS Lake Formation to manage access to data that is stored in an Amazon S3 bucket. The data engineer configures an AWS Glue crawler to discover data at a specific file location in the bucket, s3://examplepath. The crawler execution fails with the following error:

"The S3 location: s3://examplepath is not registered."

The data engineer needs to resolve the error.

Which solution will meet this requirement?

Attach an appropriate IAM policy to the IAM role of the AWS Glue crawler to grant the crawler permission to read the S3 location.

Create a new AWS Glue database. Assign the correct permissions to the database for the crawler.

Configure the S3 bucket policy to allow cross-account access.

30. A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3-based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data.

The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort.

Which solution will meet these requirements with the LEAST operational overhead?

AWS Glue workflows

AWS Step Functions tasks

AWS Lambda functions

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows

31. A company wants to use Apache Spark jobs that run on an Amazon EMR cluster to process streaming data. The Spark jobs will transform and store the data in an Amazon S3 bucket. The company will use Amazon Athena to perform analysis.

The company needs to optimize the data format for analytical queries.

Which solutions will meet these requirements with the SHORTEST query times? (Select TWO.)

Use Avro format. Use AWS Glue Data Catalog to track schema changes.

Use ORC format. Use AWS Glue Data Catalog to track schema changes.

Use Apache Parquet format. Use an external Amazon DynamoDB table to track schema changes.

Use Apache Parquet format. Use AWS Glue Data Catalog to track schema changes.

Use ORC format. Store schema definitions in separate files in Amazon S3.

32. A company maintains a data warehouse in an on-premises Oracle database. The company wants to build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and synchronize the tables with incremental data that arrives from the data warehouse every day.

Each table has a column that contains monotonically increasing values. The size of each table is less than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A business intelligence team queries the tables between 10 AM and 8 PM every day.

Which solution will meet these requirements in the MOST operationally efficient way?

Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.

Use an AWS Glue Java Database Connectivity (JDBC) connection. Configure a job bookmark for a column that contains monotonically increasing values. Write custom logic to append the daily incremental data to a full-load copy that is in Amazon S3.

Use an AWS Database Migration Service (AWS DMS) full load migration to load the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.

Use AWS Glue to load a full copy of the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.
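To make the idea of bookmarking on a monotonically increasing column concrete, here is a minimal sketch in plain Python. It is not the AWS Glue job bookmark API; the function name `incremental_rows`, the `id` column, and the in-memory rows are all hypothetical and only illustrate the high-water-mark pattern that such a bookmark relies on.

```python
def incremental_rows(rows, key, last_seen):
    """Return the rows whose key is greater than last_seen,
    together with the new high-water mark to store for the next run."""
    new_rows = [r for r in rows if r[key] > last_seen]
    # If nothing new arrived, keep the previous mark unchanged.
    new_mark = max((r[key] for r in new_rows), default=last_seen)
    return new_rows, new_mark


# Hypothetical nightly snapshot of a source table.
table = [{"id": 1}, {"id": 2}, {"id": 3}]

# A previous run already copied everything up to id 1.
delta, mark = incremental_rows(table, "id", last_seen=1)
print(delta, mark)
```

Because the column only ever increases, comparing against the stored mark is enough to find the daily increment without rereading rows that were already copied.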

33. A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's applications. The company uses AWS Glue jobs to process data for the dashboard. The company stores the data in a single Amazon S3 bucket. The company adds new data every day.

A data engineer discovers that dashboard queries are becoming slower over time. The data engineer determines that the root cause of the slowing queries is long-running AWS Glue jobs.

Which actions should the data engineer take to improve the performance of the AWS Glue jobs? (Choose two.)

Partition the data that is in the S3 bucket. Organize the data by year, month, and day.

Increase the AWS Glue instance size by scaling up the worker type.

Convert the AWS Glue schema to the DynamicFrame schema class.

Adjust AWS Glue job scheduling frequency so the jobs run half as many times each day.

Modify the IAM role that grants access to AWS Glue to grant access to all S3 features.
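For readers unfamiliar with what organizing data by year, month, and day looks like in practice, the following sketch builds a Hive-style partition prefix for an object key. It is an illustration only; the function name `partition_prefix` and the `usage` base prefix are hypothetical and do not come from the question.

```python
from datetime import date


def partition_prefix(base: str, d: date) -> str:
    """Build a Hive-style key prefix such as base/year=YYYY/month=MM/day=DD/."""
    return f"{base}/year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/"


# Hypothetical base prefix for the dashboard data.
print(partition_prefix("usage", date(2026, 6, 8)))
# prints "usage/year=2026/month=06/day=08/"
```

With keys laid out this way, a job that only needs one day of data can read a single prefix instead of listing and scanning the whole bucket.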

34. A company stores a large dataset in an Amazon S3 bucket. A data engineer frequently runs complex queries on the dataset by using Amazon Athena. The data engineer needs to optimize query performance and optimize costs for queries that are run multiple times with the same parameters.

Which solution will meet these requirements?

Convert the dataset to JSON format before running Athena queries.

Use Amazon EMR to pre-process the data before running Athena queries.

Configure query result reuse settings in the Athena workgroup.

Use Amazon Redshift Spectrum to query the data in Amazon S3.

35. A data engineer must manage the ingestion of real-time streaming data into AWS. The data engineer wants to perform real-time analytics on the incoming streaming data by using time-based aggregations over a window of up to 30 minutes. The data engineer needs a solution that is highly fault tolerant.

Which solution will meet these requirements with the LEAST operational overhead?

Use an AWS Lambda function that includes both the business and the analytics logic to perform time-based aggregations over a window of up to 30 minutes for the data in Amazon Kinesis Data Streams.

Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to analyze the data that might occasionally contain duplicates by using multiple types of aggregations.

Use an AWS Lambda function that includes both the business and the analytics logic to perform aggregations for a tumbling window of up to 30 minutes, based on the event timestamp.

Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to analyze the data by using multiple types of aggregations to perform time-based analytics over a window of up to 30 minutes.
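As background for the windowing terms used in these options, here is a minimal plain-Python sketch of a tumbling-window aggregation keyed on the event timestamp. It is neither Apache Flink nor AWS Lambda code, and it ignores fault tolerance, late data, and duplicates; the function name `tumbling_sums` and the sample events are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute window, as in the question


def tumbling_sums(events, window_seconds=WINDOW_SECONDS):
    """Sum values per fixed, non-overlapping window.

    events is an iterable of (epoch_seconds, value) pairs; the result maps
    each window's start time to the sum of the values that fall inside it.
    """
    totals = defaultdict(int)
    for ts, value in events:
        # Align the event timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        totals[window_start] += value
    return dict(totals)


# Hypothetical events: (event time in seconds, measurement).
sample = [(0, 1), (1799, 2), (1800, 5), (3601, 1)]
print(tumbling_sums(sample))
# prints {0: 3, 1800: 5, 3600: 1}
```

Each event belongs to exactly one window, which is what distinguishes a tumbling window from sliding or session windows.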

36. A company uses a variety of AWS and third-party data stores. The company wants to consolidate all the data into a central data warehouse to perform analytics. Users need fast response times for analytics queries.

The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run queries during a few hours each day with unpredictable spikes.

Which solution will meet these requirements with the LEAST operational overhead?

Use Amazon Redshift Serverless to load all the data into Amazon Redshift managed storage (RMS).

Use Amazon Athena to load all the data into Amazon S3 in Apache Parquet format.

Use Amazon Redshift provisioned clusters to load all the data into Amazon Redshift managed storage (RMS).

Use Amazon Aurora PostgreSQL to load all the data into Aurora.

37. A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.

Which solution will MOST speed up the Athena query performance?

Change the data format from .csv to JSON format. Apply Snappy compression.

Compress the .csv files by using Snappy compression.

Change the data format from .csv to Apache Parquet. Apply Snappy compression.

Compress the .csv files by using gzip compression.

38. A hotel management company receives daily data files from each of its hotels. The company wants to upload its data to AWS. The company plans to use Amazon Athena to access the files. The company needs to protect the files from accidental deletion. The company will develop an application on its on-premises servers to automatically forward the files to a fully managed AWS ingestion service.

Which solution will meet these requirements with the LEAST operational overhead?

Use AWS DataSync to replicate data from the on-premises servers to Amazon Elastic File System (Amazon EFS). Configure automatic backups in AWS Backup.

Use the Amazon Kinesis Agent on the on-premises servers to send data to Amazon Data Firehose. Store the data in an Amazon S3 bucket that has versioning enabled.

Use AWS Glue jobs to ingest data from the on-premises servers into Amazon RDS. Enable automated backups for data protection.

Use a self-managed Apache Kafka agent on the on-premises servers to stream data to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Store the data in an Amazon S3 bucket with versioning enabled.

39. A company has used an Amazon Redshift table that is named Orders for 6 months. The company performs weekly updates and deletes on the table. The table has an interleaved sort key on a column that contains AWS Regions.

The company wants to reclaim disk space so that the company will not run out of storage space. The company also wants to analyze the sort key column.

Which Amazon Redshift command will meet these requirements?

VACUUM FULL Orders

VACUUM DELETE ONLY Orders

VACUUM REINDEX Orders

VACUUM SORT ONLY Orders

40. A company wants to ingest streaming data into an Amazon Redshift data warehouse from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. A data engineer needs to develop a solution that provides low data access time and that optimizes storage costs.

Which solution will meet these requirements with the LEAST operational overhead?

Create an external schema that maps to the MSK cluster. Create a materialized view that references the external schema to consume the streaming data from the MSK topic.

Develop an AWS Glue streaming extract, transform, and load (ETL) job to process the incoming data from Amazon MSK. Load the data into Amazon S3. Use Amazon Redshift Spectrum to read the data from Amazon S3.

Create an external schema that maps to the streaming data source. Create a new Amazon Redshift table that references the external schema.

Create an Amazon S3 bucket. Ingest the data from Amazon MSK. Create an event-driven AWS Lambda function to load the data from the S3 bucket to a new Amazon Redshift table.

Tags: AWS Certified Data Engineer - Associate (DEA-C01), DEA-C01 exam questions

About The Author

dumps

On the DumpsBase platform, you can search for the exams you need and then test or practice online by yourself. You can also download the PDF file directly. For any other questions, you can mail [email protected]
