The Copy Data activity is a powerful and convenient way to ingest data into your data lake. Exercise 3.13 was your first exposure to the Copy Data activity, where you used it to convert data in an Excel spreadsheet to Parquet format. In Exercise 5.2 you used a Copy Data activity in Azure Data Factory to copy and transform data (refer to Figure 5.8). The “Copy Data” section covers the Copy Data activity in a bit more detail. Refer also to Figure 6.60. To enable Data Consistency Verification on a Copy Data activity, click the Settings tab and check the Data Consistency Verification check box, as shown in Figure 6.62.

FIGURE 6.62 Validate batch loads with the Copy Data activity
When this feature is enabled, the Copy Data activity validates the data between the source and the sink after the data has been copied. The validations include checksum, row count, and file size comparisons. One additional step when configuring Data Consistency Verification is to identify where the output log files are to be written. As shown in Figure 6.62, the Logging Settings group includes a Storage Connection Name drop‐down list box. It requires an Azure Blob Storage or ADLS linked service as the storage location. The log file is written in JSON format and contains information about the Copy Data activity, including the verification result and whether inconsistent data was found. Table 6.5 provides the possible values and descriptions of the verification results.
TABLE 6.5 Copy Data activity—verification results
Value | Description |
Verified | The data is consistent between the source and the sink. |
Not Verified | Data consistency validation is not enabled for this Copy Data activity. |
Unsupported | Not supported on this copy pair. |
Warning | The data is not consistent between the source and the sink. |
If the data validation fails, you can choose to abort the Copy Data activity or continue processing other files from the source location. By default, the Copy Data activity will abort when validation fails. If you want to continue, select the level of fault tolerance from the Fault Tolerance drop‐down list box. The options include the following:
- Skip Incompatible Rows
- Skip Missing Files
- Skip Forbidden Files
- Skip Files with Invalid Names
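The same settings can also be expressed in the pipeline definition rather than through the Settings tab. The following is a minimal sketch, assuming the property names `validateDataConsistency`, `skipErrorFile`, and `logSettings` as I recall them from the Azure Data Factory JSON schema; verify them against the current documentation before relying on them. The function name, the linked service name, and the log path are hypothetical, and the source and sink entries are placeholders only.

```python
import json


def build_copy_type_properties(log_linked_service: str,
                               log_path: str,
                               skip_inconsistent_files: bool = False) -> dict:
    """Build a typeProperties fragment for a Copy Data activity (sketch only)."""
    props = {
        # Placeholders: a real activity needs complete source and sink definitions.
        "source": {"type": "BinarySource"},
        "sink": {"type": "BinarySink"},
        # Assumed counterpart of the Data Consistency Verification check box.
        "validateDataConsistency": True,
        # Assumed counterpart of the Logging Settings group.
        "logSettings": {
            "enableCopyActivityLog": True,
            "copyActivityLogSettings": {"logLevel": "Warning"},
            "logLocationSettings": {
                "linkedServiceName": {
                    "referenceName": log_linked_service,
                    "type": "LinkedServiceReference",
                },
                "path": log_path,
            },
        },
    }
    if skip_inconsistent_files:
        # Assumed fault tolerance setting: continue past inconsistent files
        # instead of aborting the activity.
        props["skipErrorFile"] = {"dataInconsistency": True}
    return props


if __name__ == "__main__":
    fragment = build_copy_type_properties(
        "HypotheticalAdlsLinkedService", "copy-logs/", skip_inconsistent_files=True
    )
    print(json.dumps(fragment, indent=2))
```

Leaving `skip_inconsistent_files` at its default mirrors the default behavior described above, where the activity aborts when validation fails.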
Table 6.6 contains the possible values that the Copy Data activity reports for inconsistent data, which reflect the action taken when data validation fails.
TABLE 6.6 Copy Data activity—inconsistent data results
Value | Description |
Found | There was inconsistent data found during the Copy Data activity. |
Skipped | Inconsistent data was found during the Copy Data activity and was skipped. |
None | No inconsistent data was found, either because the data was verified as consistent or because validation is not enabled. |
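The values in Tables 6.5 and 6.6 can be combined to decide what a downstream step should do. The following is a minimal sketch that uses the value labels exactly as they appear in the tables; the function name and the returned action names (`continue`, `fail`, `review-log`, `unverified`) are hypothetical and not part of the product.

```python
# Value labels copied from Table 6.5 and Table 6.6.
VERIFICATION_RESULTS = {"Verified", "Not Verified", "Unsupported", "Warning"}
INCONSISTENT_DATA_RESULTS = {"Found", "Skipped", "None"}


def next_step(verification_result: str, inconsistent_data: str) -> str:
    """Map a pair of reported values to a hypothetical follow-up action."""
    if verification_result not in VERIFICATION_RESULTS:
        raise ValueError(f"Unknown verification result: {verification_result!r}")
    if inconsistent_data not in INCONSISTENT_DATA_RESULTS:
        raise ValueError(f"Unknown inconsistent data result: {inconsistent_data!r}")

    if inconsistent_data == "Skipped":
        # Inconsistent files were skipped, so the log shows what is missing.
        return "review-log"
    if inconsistent_data == "Found" or verification_result == "Warning":
        # Inconsistent data was found and not skipped.
        return "fail"
    if verification_result == "Verified":
        return "continue"
    # Not Verified or Unsupported: nothing was checked, so nothing is proven.
    return "unverified"
```

Treating `Not Verified` and `Unsupported` separately from `Verified` is a deliberate choice in this sketch: the absence of inconsistent data means little when no verification took place.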
The log file also contains a timestamp, the name of the file that was processed, the verification result, the inconsistent data result, and a message that explains why the file is considered inconsistent.
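Once the log has been written, a short script can list the files that need attention. The following is a minimal sketch, assuming one JSON object per line; the key names `fileName`, `inconsistentData`, and `message` are hypothetical stand-ins for the fields described above, so check an actual log file for the real layout before adapting it.

```python
import json
from typing import Iterable, List, Tuple


def inconsistent_files(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (file name, message) pairs for entries that report inconsistent data.

    Key names are assumptions made for illustration, not the documented schema.
    """
    flagged = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        entry = json.loads(line)
        # "None" means no inconsistent data was reported for this file.
        if entry.get("inconsistentData", "None") != "None":
            flagged.append((entry["fileName"], entry.get("message", "")))
    return flagged
```

A typical use would be to read the downloaded log with `open(path, encoding="utf-8")` and pass the file object directly to `inconsistent_files`.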
Validation Pipeline Activity
Before the activities in your pipeline attempt to process data files, it is prudent to confirm that the files exist. The Validation activity provides this capability. Complete Exercise 6.12, where you will add a Validation activity to an Azure Synapse Analytics pipeline.
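The idea behind an existence check can be illustrated outside of a pipeline. The following is a minimal local sketch, not the Validation activity itself: it polls the local file system until a file exists, optionally requires a minimum size, and gives up after a timeout. The function name and its parameters are hypothetical.

```python
import time
from pathlib import Path


def wait_for_file(path: str,
                  timeout_seconds: float = 60.0,
                  sleep_seconds: float = 5.0,
                  minimum_size: int = 0) -> bool:
    """Poll until the file exists with at least minimum_size bytes, or time out."""
    target = Path(path)
    deadline = time.monotonic() + timeout_seconds
    while True:
        if target.is_file() and target.stat().st_size >= minimum_size:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(sleep_seconds, remaining))
```

Downstream processing would then run only when `wait_for_file` returns `True`, which is the same gate that a validation step places in front of the other activities in a pipeline.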