Microsoft 70-775 Actual Free Exam Questions & Community Discussion

  • Exam Code/Number: 70-775
  • Exam Name/Title: Perform Data Engineering on Microsoft Azure HDInsight
  • Certification Provider: Microsoft
  • Corresponding Certification: Microsoft Azure HDInsight
  • Exam Questions: 63
  • Updated On: Jun 03, 2026
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Apache Pig table named Sales in Apache HCatalog.
You need to make the data in the table accessible from Apache Pig.
Solution: You use the following script.

Does this meet the goal?
Correct Answer: B
Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in this series.
You are planning a big data infrastructure by using an Apache Spark cluster in Azure HDInsight. The cluster has 24 processor cores and 512 GB of memory.
The architecture of the infrastructure is shown in the exhibit. (Click the Exhibit button.)

The architecture will be used by the following users:
Support analysts who run applications that will use REST to submit Spark jobs.
Business analysts who use JDBC and ODBC client applications from a real-time view. The business analysts run monitoring queries to access aggregate results for 15 minutes. The results will be referenced by subsequent queries.
Data analysts who publish notebooks drawn from batch layer, serving layer, and speed layer queries. All of the notebooks must support native interpreters for data sources that are batch processed. The serving layer queries are written in Apache Hive and must support multiple sessions. Unique GUIDs are used across the data sources, which allow the data analysts to use Spark SQL.
The data sources in the batch layer share a common storage container. The following data sources are used:
Hive for sales data
Apache HBase for operations data
HBase for logistics data by using a single region server
You need to ensure that the data analysts can use the notebooks.
What should you install?
Correct Answer: D
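The scenario has support analysts submitting Spark jobs over REST. As a rough illustration only, the sketch below builds such a request, assuming the cluster exposes an Apache Livy batch endpoint at /livy/batches (the scenario does not name the REST service). The cluster name, storage path, class name, and the X-Requested-By value are hypothetical placeholders, and nothing is actually sent over the network.

```python
import json


def livy_batch_request(cluster, app_file, class_name=None, args=()):
    """Build (url, headers, body) for a Spark batch submission.

    Assumption: the cluster exposes a Livy endpoint at /livy/batches.
    The request is only constructed here, never sent.
    """
    url = f"https://{cluster}.azurehdinsight.net/livy/batches"
    payload = {"file": app_file, "args": list(args)}
    if class_name:
        payload["className"] = class_name
    headers = {
        "Content-Type": "application/json",
        # Assumed header value; some clusters require this header for POSTs.
        "X-Requested-By": "admin",
    }
    return url, headers, json.dumps(payload)


# Hypothetical names, for illustration only.
url, headers, body = livy_batch_request(
    "mycluster",
    "wasbs:///example/jars/support-job.jar",
    class_name="com.example.SupportJob",
    args=["--window", "15"],
)
```

A client would POST the body to the URL with the cluster credentials and then poll the returned batch ID for its state.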
HOTSPOT
Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in this series.
You have an initial dataset that contains the crime data from major cities.
You plan to build training models from the training data. You plan to automate the process of adding more data to the training models and to constantly tune the models by using the additional data, including data that is collected in near real-time. The system will be used to analyze event data gathered from many different sources, such as Internet of Things (IoT) devices, live video surveillance, and traffic activities, and to generate predictions of an increased crime risk at a particular time and place.
You have an incoming data stream from Twitter and an incoming data stream from Facebook, which are event-based only, rather than time-based. You also have a time interval stream every 10 seconds.
The data is in a key/value pair format. The value field represents a number that defines how many times a hashtag occurs within a Facebook post, or how many times a Tweet that contains a specific hashtag is retweeted.
You must use the appropriate data storage, stream analytics techniques, and Azure HDInsight cluster types for the various tasks associated to the processing pipeline.
You are using Microsoft Power BI Desktop to create visualizations of the crime data predictions. You connect to a JSON file that is stored in Azure Blob storage.
After loading the data into Power BI, the query shows the following metadata fields only:
Name
Content
Extension
Date created
Date accessed
Date modified
The actual columns have the following names:
Duration
Zip Code
Start Time
Probability
Crime Type
Likelihood Percent
You need to transform the query so that Power BI can access the actual columns rather than the metadata.
Which two actions should you perform from the Edit Queries menu on the Home ribbon? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Correct Answer:

References: https://social.technet.microsoft.com/wiki/contents/articles/37512.create-power-bi-reports-from-json-data-exposed-by-rest-service.aspx#Convert_JSON_to_table_data
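To make the goal of this question concrete, here is a minimal Python sketch of the underlying idea: the Content field holds the raw JSON bytes, and the column names listed in the question only appear once that content is parsed and its records are expanded. This is not Power BI code. The sample record values are invented for illustration, and only the six column names come from the question.

```python
import json

# Column names taken from the question text.
COLUMNS = [
    "Duration",
    "Zip Code",
    "Start Time",
    "Probability",
    "Crime Type",
    "Likelihood Percent",
]


def content_to_rows(content: bytes):
    """Parse the binary Content of a JSON blob into table rows."""
    records = json.loads(content.decode("utf-8"))
    if isinstance(records, dict):
        # A single JSON object is treated as a one-row table.
        records = [records]
    # Expand each record into one value per expected column.
    return [[record.get(column) for column in COLUMNS] for record in records]


# Hypothetical blob content; the values are made up.
sample_content = json.dumps([
    {"Duration": 30, "Zip Code": "98052", "Start Time": "21:00",
     "Probability": 0.42, "Crime Type": "Theft", "Likelihood Percent": 42},
]).encode("utf-8")

rows = content_to_rows(sample_content)
```

Until the content is parsed this way, a tool only sees file-level metadata such as Name, Extension, and the date fields.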
DRAG DROP
You are evaluating the use of Azure HDInsight clusters for various workloads.
Which type of HDInsight cluster should you create for each workload? To answer, drag the appropriate cluster types to the correct workloads. Each cluster type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Correct Answer:

References: https://www.blue-granite.com/blog/how-to-choose-the-right-hdinsight-cluster
Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question.
You are implementing a batch processing solution by using Azure HDInsight.
You have data stored in Azure.
You need to ensure that you can access the data by using Azure Active Directory (Azure AD) identities.
What should you do?
Correct Answer: C
Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in this series.
You are planning a big data infrastructure by using an Apache Spark cluster in Azure HDInsight. The cluster has 24 processor cores and 512 GB of memory.
The architecture of the infrastructure is shown in the exhibit. (Click the Exhibit button.)

The architecture will be used by the following users:
Support analysts who run applications that will use REST to submit Spark jobs.
Business analysts who use JDBC and ODBC client applications from a real-time view. The business analysts run monitoring queries to access aggregate results for 15 minutes. The results will be referenced by subsequent queries.
Data analysts who publish notebooks drawn from batch layer, serving layer, and speed layer queries. All of the notebooks must support native interpreters for data sources that are batch processed. The serving layer queries are written in Apache Hive and must support multiple sessions. Unique GUIDs are used across the data sources, which allow the data analysts to use Spark SQL.
The data sources in the batch layer share a common storage container. The following data sources are used:
Hive for sales data
Apache HBase for operations data
HBase for logistics data by using a single region server
You plan to create a new HBase table to collect the operations data. The workload will be distributed across region servers. Apache Phoenix will be used on top of HBase for the operations data.
You need to ensure that write operations are distributed across the region servers.
Which setting should you configure when you create the table?
Correct Answer: A
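The answer options are not reproduced on this page, so the following is only a sketch of one common technique for spreading writes: salting the row key. It assumes a scheme in which a single salt byte, computed as a hash of the row key modulo a bucket count, is prepended to the key. zlib.crc32 is a stand-in hash chosen for this example, not the hash Phoenix uses, and the timestamp-style keys are invented.

```python
import zlib
from collections import Counter


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the row key with one salt byte in the range [0, buckets)."""
    salt = zlib.crc32(row_key) % buckets  # stand-in hash, an assumption
    return bytes([salt]) + row_key


def bucket_histogram(keys, buckets):
    """Count how many keys land in each salt bucket."""
    return Counter(salted_key(key, buckets)[0] for key in keys)


# Monotonically increasing keys, which would otherwise all target the
# same key range (hypothetical sample data).
keys = [f"2017-06-03T00:00:{second:02d}".encode() for second in range(60)]
histogram = bucket_histogram(keys, buckets=4)
```

Because the salt byte leads the key, consecutive keys fall into different key ranges, and those ranges can be served by different region servers. The trade-off is that a range scan has to consult every bucket.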
You have an Apache HBase cluster in Azure HDInsight.
You plan to use Apache Pig, Apache Hive, and HBase to access the cluster simultaneously and to process data stored in a single platform.
You need to deliver consistent operations, security, and data governance.
What should you use?
Correct Answer: D