Chapter 10 focuses on optimizing the compute and availability of your data workloads running on Azure. Remember that when you provisioned the nodes that run the batch job, you selected the VM size on which the job runs, as well as the number of nodes and the number of vCPUs. With some configuration, you can create an autoscale formula that increases the number of nodes when they are required and decreases them when they are not. This saves you money by using only the compute resources you need. You can use attributes like CPU percentage and utilized memory, as well as the number of running tasks and inbound network traffic, to determine the scale-up or scale-down logic. In addition to scaling Azure Batch nodes, you can also scale both SQL and Spark pools in Azure Synapse Analytics automatically. Again, there is more to come about this in Chapter 10.
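To make the idea concrete, the following is a minimal sketch that assumes the azure-batch Python SDK; the account name, endpoint, pool name, node cap, and 70 percent CPU threshold are illustrative placeholders rather than values from the chapter exercises:

```python
from datetime import timedelta

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

# Hypothetical account details -- replace with your own Batch account values.
credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
batch_client = BatchServiceClient(
    credentials,
    batch_url="https://mybatchaccount.eastus.batch.azure.com")

# Sample autoscale formula: if average CPU over the last 10 minutes exceeds
# 70 percent, double the node count (capped at 10); otherwise shrink by one,
# never dropping below a single node. Running tasks finish before removal.
formula = """
$avgCpu = avg($CPUPercent.GetSample(TimeInterval_Minute * 10));
$TargetDedicatedNodes = ($avgCpu > 70)
    ? min($TargetDedicatedNodes * 2, 10)
    : max(1, $TargetDedicatedNodes - 1);
$NodeDeallocationOption = taskcompletion;
"""

# Apply the formula to a hypothetical pool; Batch re-evaluates it on the
# interval you specify.
batch_client.pool.enable_auto_scale(
    pool_id="my-batch-pool",
    auto_scale_formula=formula,
    auto_scale_evaluation_interval=timedelta(minutes=15))
```

The same pattern works with other service-defined variables such as $PendingTasks or $MemoryBytes when task backlog or memory pressure is a better scaling signal than CPU.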
Design and Configure Exception Handling
You can handle exceptions in source code written in languages such as Python, Java, Scala, or C# by wrapping your code in try/catch blocks (try/except in Python). When something unexpected happens in your code, the catch portion is written specifically to handle that exception. You can decide to either retry the transaction or stop the code path execution completely. T-SQL offers the same concept through TRY…CATCH blocks, which let your commands handle and respond to exceptions. Keep in mind that if you do not place your code within these blocks, the exception is considered unhandled and can crash the process, leaving the operating system or container in a bad state from which it cannot recover. This is known as a hung server and can cause schedules to fail until the server is identified as unhealthy, which results in a reboot, failover, or reconfiguration. From a pipeline perspective, you have learned about linking pipeline activities (refer to Figure 6.20 and Figure 6.28). Because you can link activities based on failure, you can handle the fact that something unexpectedly went wrong. You may want to stop further activities from running when a failure occurs. Or, in some cases, the failure is expected, and the activity that runs as a result of the failure contains the logic to correct it. Chapters 9 and 10 include most of the details and exercises about these topics.
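As a minimal, illustrative sketch (the function name, exception types, and retry limit below are assumptions for demonstration, not code from the exercises), the retry-or-stop decision might look like this in Python:

```python
import time

MAX_RETRIES = 3  # illustrative limit; tune for your workload


def submit_transaction(payload):
    """Placeholder for the operation that might fail, e.g., a database write."""
    raise TimeoutError("simulated transient failure")


def run_with_retry(payload):
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            submit_transaction(payload)
            return True                      # success: stop retrying
        except TimeoutError as ex:
            # Expected, transient exception: log it and retry after a backoff.
            print(f"Attempt {attempt} failed: {ex}")
            time.sleep(2 ** attempt)
        except Exception as ex:
            # Unexpected exception: stop the code path completely.
            print(f"Unrecoverable error: {ex}")
            raise
    return False                             # retries exhausted


run_with_retry({"id": 1})
```

The same split between expected, recoverable failures and unexpected, terminal ones is what you model in a pipeline when you route one activity's failure output to a corrective activity and let anything else stop the run.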
Debug Spark Jobs Using the Spark UI
As you saw in Figure 6.42 and experienced in Exercise 6.8, there are some very nice features for managing and viewing the status of Spark jobs. There are also techniques for debugging and troubleshooting them. The details of these features are discussed in Chapter 10. There is also some related content later in this chapter, in the section “Manage Spark Jobs in a Pipeline.”
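As a quick illustration (assuming a PySpark session such as the one provided in a Synapse Spark pool notebook; the application name below is just a placeholder), you can pull the Spark UI address and basic job status directly from the SparkContext:

```python
from pyspark.sql import SparkSession

# In a Synapse notebook a SparkSession named `spark` already exists;
# getOrCreate() simply reuses it in that case.
spark = SparkSession.builder.appName("spark-ui-demo").getOrCreate()
sc = spark.sparkContext

# Link to the Spark UI for this application.
print("Spark UI:", sc.uiWebUrl)

# Basic job and stage status -- the same information the Spark UI visualizes.
tracker = sc.statusTracker()
for job_id in tracker.getActiveJobsIds():
    info = tracker.getJobInfo(job_id)
    print(f"Job {job_id}: {info.status}, stages: {info.stageIds}")
```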