Enable Copy Activity Data Consistency Verification – Create and Manage Batch Processing and Pipelines

The Copy Data activity is a powerful and convenient way to ingest data into your data lake. Exercise 3.13 was your first exposure to the Copy Data activity; there you used it to convert data in an Excel spreadsheet to Parquet format. In Exercise 5.2 you used a Copy Data activity in Azure Data Factory to copy and transform data (refer to Figure 5.8). The “Copy Data” section covers the Copy Data activity in more detail; refer also to Figure 6.60. To enable Data Consistency Verification on a Copy Data activity, select the Settings tab and check the Data Consistency Verification check box, as shown in Figure 6.62.

FIGURE 6.62 Validate batch loads with the Copy Data activity
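If you author pipelines as JSON rather than in the designer, the check box corresponds to the validateDataConsistency property in the Copy activity's typeProperties. The following Python sketch builds that block. The BinarySource and BinarySink settings are placeholders, the function name is hypothetical, and the property name reflects our understanding of the Azure Data Factory Copy activity schema, so compare it with the JSON view of your own pipeline.

```python
import json


def copy_type_properties(source: dict, sink: dict) -> dict:
    """Return a Copy activity typeProperties block with verification turned on."""
    return {
        "source": source,
        "sink": sink,
        # Assumed JSON counterpart of the Data Consistency Verification check box
        "validateDataConsistency": True,
    }


# Placeholder source and sink settings (assumed for illustration only)
props = copy_type_properties({"type": "BinarySource"}, {"type": "BinarySink"})
print(json.dumps(props, indent=2))
```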

When this feature is enabled, the Copy Data activity validates the data in the sink against the source after the copy completes, using checks such as checksum, row count, and file size. One additional step in configuring Data Consistency Verification is identifying where the output log files are written. As shown in Figure 6.62, the Logging Settings group includes a Storage Connection Name drop‐down list box, which requires an Azure Blob Storage or ADLS linked service as the storage location. The log file is written in JSON format and contains information about the Copy Data activity, including the verification result and whether inconsistent data was found. Table 6.5 lists the possible verification results and their descriptions.
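The Storage Connection Name selection ends up in the Copy activity's logging configuration. The following Python sketch builds a logSettings block as we understand the Azure Data Factory schema (enableCopyActivityLog, copyActivityLogSettings, and logLocationSettings with a linked service reference and path). The linked service name, path, and function name are hypothetical; check the property names against your pipeline's JSON view before relying on them.

```python
import json


def log_settings(linked_service: str, path: str, level: str = "Warning") -> dict:
    """Build an assumed logSettings block pointing the copy logs at a storage account."""
    return {
        "enableCopyActivityLog": True,
        "copyActivityLogSettings": {
            "logLevel": level,  # assumed values: "Warning" or "Info"
            "enableReliableLogging": False,
        },
        "logLocationSettings": {
            # Must reference an Azure Blob Storage or ADLS linked service
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "path": path,  # folder inside that storage account for the log files
        },
    }


# Hypothetical linked service name and folder
settings = log_settings("ADLSGen2LinkedService", "sessionlog/")
print(json.dumps(settings, indent=2))
```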

TABLE 6.5 Copy Data activity—verification results

Value          Description
Verified       The data is consistent between the source and the sink.
Not Verified   Data consistency verification is not enabled for this Copy Data activity.
Unsupported    Data consistency verification is not supported for this source and sink pair.
Warning        The data is not consistent between the source and the sink.
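If you check results programmatically, for example from a monitoring script, a small lookup keyed on Table 6.5 is enough. The following Python sketch assumes the result is exposed as dataConsistencyVerification.VerificationResult in the Copy Data activity output; that field path and the function name are assumptions, so confirm them against the output of one of your own runs. Spaces are stripped so that "Not Verified" and "NotVerified" match the same entry.

```python
# Descriptions taken from Table 6.5; keys have spaces removed so that
# "Not Verified" and "NotVerified" are treated the same way.
VERIFICATION_RESULTS = {
    "Verified": "The data is consistent between the source and the sink.",
    "NotVerified": "Data consistency verification is not enabled for this Copy Data activity.",
    "Unsupported": "Data consistency verification is not supported for this source and sink pair.",
    "Warning": "The data is not consistent between the source and the sink.",
}


def describe_verification(activity_output: dict) -> str:
    # Assumed location of the result in the Copy Data activity output
    check = activity_output.get("dataConsistencyVerification", {})
    result = check.get("VerificationResult", "")
    return VERIFICATION_RESULTS.get(
        result.replace(" ", ""), f"Unknown verification result: {result!r}"
    )


# Hypothetical activity output
sample = {"dataConsistencyVerification": {"VerificationResult": "Verified"}}
print(describe_verification(sample))
```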

If data verification fails, you can choose either to abort the Copy Data activity or to continue processing the remaining files from the source location. By default, the Copy Data activity aborts when verification fails. To continue instead, select the level of fault tolerance from the Fault Tolerance drop‐down list box. The options include the following:

  • Skip Incompatible Rows
  • Skip Missing Files
  • Skip Forbidden Files
  • Skip Files with Invalid Names
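Each Fault Tolerance option, along with the choice to skip inconsistent data rather than abort, is stored as a property on the Copy activity. The following Python sketch maps the options to the properties we believe the designer writes: enableSkipIncompatibleRow, plus the skipErrorFile flags fileMissing, fileForbidden, invalidFileName, and dataInconsistency. Treat this mapping as an assumption and confirm it in your pipeline's JSON view; the function and parameter names are hypothetical.

```python
import json


def fault_tolerance(skip_incompatible_rows: bool = False,
                    skip_missing_files: bool = False,
                    skip_forbidden_files: bool = False,
                    skip_invalid_names: bool = False,
                    skip_inconsistent_data: bool = False) -> dict:
    """Build fault-tolerance properties for a Copy activity (assumed mapping)."""
    props = {}
    if skip_incompatible_rows:
        props["enableSkipIncompatibleRow"] = True
    flags = {
        "fileMissing": skip_missing_files,
        "fileForbidden": skip_forbidden_files,
        "invalidFileName": skip_invalid_names,
        # Leaving this off keeps the default behavior: abort on inconsistent data
        "dataInconsistency": skip_inconsistent_data,
    }
    # Emit only the flags that are switched on; omitted flags keep the service default
    enabled = {name: True for name, on in flags.items() if on}
    if enabled:
        props["skipErrorFile"] = enabled
    return props


# Continue past inconsistent files instead of aborting the copy
print(json.dumps(fault_tolerance(skip_inconsistent_data=True), indent=2))
```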

Table 6.6 lists the possible outputs of the Copy Data activity that report whether inconsistent data was found and what action was taken when data verification failed.

TABLE 6.6 Copy Data activity—inconsistent data results

Value     Description
Found     Inconsistent data was found during the Copy Data activity.
Skipped   Inconsistent data was found during the Copy Data activity and was skipped.
None      No inconsistent data was found, either because the data was verified as consistent or because verification was not enabled.
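Because None can mean either "verified and consistent" or "never checked", it is safer to read the inconsistent data result together with the verification result. The following Python sketch does that. It assumes both values appear under dataConsistencyVerification in the Copy Data activity output as VerificationResult and InconsistentData; those field names and the function name are assumptions to confirm against a real run.

```python
def summarize_consistency(activity_output: dict) -> str:
    """Summarize the consistency fields of a Copy Data activity output (assumed shape)."""
    check = activity_output.get("dataConsistencyVerification", {})
    verification = check.get("VerificationResult", "").replace(" ", "")
    inconsistent = check.get("InconsistentData", "None")
    if inconsistent == "Found":
        return "inconsistent data found"
    if inconsistent == "Skipped":
        return "inconsistent data skipped; review the log file"
    # "None" alone does not prove consistency: verification may have been off
    if verification == "Verified":
        return "consistent"
    return "no inconsistencies reported, but the data was not verified"


# Hypothetical activity output where some files were skipped
print(summarize_consistency(
    {"dataConsistencyVerification": {"VerificationResult": "Verified",
                                     "InconsistentData": "Skipped"}}
))
```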

The log file also contains a timestamp, the name of the file that was processed, the verification result, the inconsistent data result, and a message explaining why the file is considered inconsistent.

Validation Pipeline Activity

Before the activities in your pipeline attempt to process data files, it is prudent to confirm that those files exist. The Validation activity provides this capability. Complete Exercise 6.12, where you will implement a Validation activity in an Azure Synapse Analytics pipeline.
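In a pipeline's JSON definition, a Validation activity points at a dataset and keeps checking for it until it exists or a timeout expires. The following Python sketch builds such a definition using the dataset, timeout, and sleep properties as we understand the schema; optional properties such as minimumSize and childItems are omitted. The activity name, dataset name, default values, and function name are hypothetical, so compare the result with what Exercise 6.12 produces in your workspace.

```python
import json


def validation_activity(name: str, dataset: str,
                        timeout: str = "0.00:10:00",
                        sleep_seconds: int = 10) -> dict:
    """Build an assumed Validation activity definition for a pipeline."""
    return {
        "name": name,
        "type": "Validation",
        "typeProperties": {
            "dataset": {"referenceName": dataset, "type": "DatasetReference"},
            "timeout": timeout,      # d.hh:mm:ss to keep checking before failing
            "sleep": sleep_seconds,  # seconds to wait between existence checks
        },
    }


# Hypothetical activity and dataset names
activity = validation_activity("ValidateSourceFile", "SourceCsvDataset")
print(json.dumps(activity, indent=2))
```

Activities that process the file would then depend on this activity succeeding, so they run only after the file has been found.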

Bill Mettler