Microsoft 70-475 Actual Free Exam Questions & Community Discussion

  • Exam Code/Number: 70-475
  • Exam Name/Title: Design and Implement Big Data Analytics Solutions
  • Certification Provider: Microsoft
  • Corresponding Certification: Microsoft Azure
  • Exam Questions: 122
  • Updated On: Jun 01, 2026
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Your company has multiple databases that contain millions of sales transactions.
You plan to implement a data mining solution to identify purchasing fraud.
You need to design a solution that mines 10 terabytes (TB) of sales data. The solution must meet the following requirements:
* Run the analysis to identify fraud once per week.
* Continue to receive new sales transactions while the analysis runs.
* Be able to stop computing services when the analysis is NOT running.
Solution: You create a Microsoft Azure Data Lake job.
Does this meet the goal?
Correct Answer: B
You have an Apache Storm cluster.
You need to ingest data from a Kafka queue.
Which component should you use to consume data emitted from Kafka?
Correct Answer: B
A company named Fabrikam, Inc. plans to monitor financial markets and social networks, and then to correlate global stock movements to social network activity.
You need to recommend a Microsoft Azure HDInsight cluster solution that meets the following requirements:
* Provides continuous availability
* Can process asynchronous feeds
What is the best type of cluster to recommend to achieve the goal? More than one answer choice may achieve the goal. Select the BEST answer.
Correct Answer: D
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while the others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Microsoft Azure deployment that contains the following services:
* Azure Data Lake
* Azure Cosmos DB
* Azure Data Factory
* Azure SQL Database
You load several types of data to Azure Data Lake.
You need to load data from Azure SQL Database to Azure Data Lake.
Solution: You use the AzCopy utility.
Does this meet the goal?
Correct Answer: B
You plan to deploy a Microsoft Azure Data Factory pipeline to run an end-to-end data processing workflow.
You need to recommend which Azure Data Factory features must be used to meet the following requirements:
* Track the run status of the historical activity.
* Enable alerts and notifications on events and metrics.
* Monitor the creation, updating, and deletion of Azure resources.
Which features should you recommend? To answer, drag the appropriate features to the correct requirements.
Each feature may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Correct Answer:

Explanation

Box 1: Azure HDInsight logs
Logs contain historical activities.
Box 2: Azure Data Factory alerts
Box 3: Azure Data Factory events
You have a pipeline that contains an input dataset in Microsoft Azure Table Storage and an output dataset in Azure Blob storage. You have the following JSON data.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the JSON data.
NOTE: Each correct selection is worth one point.
Correct Answer:

Explanation

Box 1: Every three days at 10.00
anchorDateTime defines the absolute position in time used by the scheduler to compute dataset slice boundaries.
"frequency": "<Specifies the time unit for data slice production. Supported frequency: Minute, Hour, Day, Week, Month>",
"interval": "<Specifies the interval within the defined frequency. For example, frequency set to 'Hour' and interval set to 1 indicates that new data slices should be produced hourly>"
Box 2: Every minute up to three times.
retryInterval is the wait time between a failure and the next attempt. This setting applies to present time. If the previous try failed, the next try is after the retryInterval period.
Example: 00:01:00 (1 minute)
Example: If it is 1:00 PM right now, we begin the first try. If the duration to complete the first validation check is 1 minute and the operation failed, the next retry is at 1:00 + 1 min (duration) + 1 min (retry interval) = 1:02 PM.
For slices in the past, there is no delay. The retry happens immediately.
retryTimeout is the timeout for each retry attempt.
maximumRetry is the number of times to check for the availability of the external data.
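The retry arithmetic above can be checked with a short Python sketch. This is only an illustration of the timing rule as described in the explanation, not Data Factory code: the function names `parse_timespan` and `next_retry_time` and the calendar date are hypothetical, and the sketch assumes retryInterval is written as hh:mm:ss, as in the 00:01:00 example.

```python
from datetime import datetime, timedelta


def parse_timespan(text):
    """Convert an hh:mm:ss string such as "00:01:00" to a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def next_retry_time(attempt_start, check_duration, retry_interval):
    """Return when the next attempt begins after a failed check.

    Per the explanation: start of the failed attempt, plus the time the
    check took, plus the retryInterval wait.
    """
    return attempt_start + check_duration + retry_interval


# First try begins at 1:00 PM; the date itself is an arbitrary assumption.
first_try = datetime(2024, 1, 1, 13, 0)
check_duration = timedelta(minutes=1)          # the check ran for 1 minute, then failed
retry_interval = parse_timespan("00:01:00")    # retryInterval from the example

second_try = next_retry_time(first_try, check_duration, retry_interval)
print(second_try.time())  # 13:02:00, i.e. 1:02 PM
```

For slices in the past the explanation says there is no delay, so this calculation would apply only to slices at the present time.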
You have a Microsoft Azure Data Factory pipeline.
You discover that the pipeline fails to execute because data is missing.
You need to rerun the failure in the pipeline.
Which cmdlet should you use?
Correct Answer: B
You have a large datacenter.
You plan to track the hardware failure notifications that occur in the datacenter. You expect to collect approximately 2 TB of data each month. You need to recommend a solution that meets the following requirements:
* Operators must be informed by email as soon as a hardware failure occurs.
* All event data associated with a hardware failure must be preserved for 24 months.
The solution must minimize costs.
Correct Answer:
You have a Microsoft Azure data factory.
You assign administrative roles to the users in the following table.

You discover that several new data factory instances were created.
You need to ensure that only User5 can create a new data factory instance.
Which two roles should you change? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Correct Answer: B,E
You need to recommend a platform architecture for a big data solution that meets the following requirements:
* Supports batch processing
* Provides a holding area for a 3-petabyte (PB) dataset
* Minimizes the development effort to implement the solution
* Provides near real time relational querying across a multi-terabyte (TB) dataset
Which two platform architectures should you include in the recommendation? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Correct Answer: B,E