This basic data validation script runs one of each type of data validation test case (T001-T066) shown in the Rule Set markdown (.md) pages. All the SQL validation test cases run sequentially, returning the test id, the test status (pass or fail), and the test description. Only one row is returned per validation test. To keep the process simple, output is streamed to the console (screen as text, not grids).
PROBLEM - The native pgAdmin tool does not support teh output of multiple SQL selects. Unfortunately, only the last SQL SELECT executed will be displayed. That means you cannot use pgAdmin to run this script. You must find another IDE like Toad or DB Visualizer to use this script, OR use the advanced script.
pgAdmin4 will only display the results of the last executed SQL statement when you run a batch of SELECTs like this basic script does. Unfortunately, that means you cannot run this script in pgAdmin.
Your options are:
- Use the advanced script
- Use another IDE like Toad Please email me if you have found a good, free SQL IDE that can output multiple postgersql SELECTs to text on screen, or to multiple grids (mpierce.email@yahoo.com)
Expand if you want to download and deploy the "demo_hr" test dataset...
If you'd like to run the test script as-is first, before copy-pasting the concepts out and applying to your own databases, then you will need to download and deploy the demo_hr test dataset.
- Download the "demo_hr_01_create_database.sql" script from here.
- Run the script on a SQL Server database where you have permissions (local is fine too).
- Download the "demo_hr_02_create_tables.sql" script from here.
- Run the script on the same SQL Server database.
- Download the "demo_hr_03_populate_tables.sql" script from here.
- Run the script on the same SQL Server database.
- Using pgAdmin (or an equivalent SQL IDE), confirm that the tables exist and the data is populated.
Expand if you need instructions on how to download and configure the basic script...
- Download the basic validation script from here.
- Pick an appropriate directory in which to save the script. Open your SQL Editor pointing to the appropriate Postgresql "demo_hr" database.
Expand if you would like to see a review of the script layout and what a typical data validation test case looks like ...>
The script currently consists of 1,156 lines of SQL code broken down as follows:
- Lines 1-52 are the comment block header, containing notes and definitions
- Lines 53-1,156 are the 66 individual example validation test cases (written as SQL SELECTs)
A typical data validation test has SQL code that looks something like this:
This test case validates that no carriage return (CR) or line feed (LF) characters exist in the last_name column across all rows.
Notice the following aspects of the SQL code:
-
Each data validation test case is written as one or more SQL SELECT statements.
-
There is one (or more) inner queries (lines 477-484 above)
- These return many detail rows with business validation logic applied.
- The columns returned vary by validation test case, but typically have a primary key or unique key value returned so you can easily identify which row failed
- There is also always a status field returned with a unique rejection code (eg: REJ-01 above) with the expected result (no CR or LFs), and the actual result including the position of the bad character in the source field.
- Note that you can highlight and run just the inner query SELECT(s) to see all relevant rows with specific failure details
-
There is one outer query (lines 473-476 and 485-485)
- It rolls all the detail rows up to a single summary row with pass or fail judgment.
- It returns column tst_id - the test ID (hard-coded when write script)
- It returns column status - the test result (re-calculated with every test run). Usually "P" for pass or "FAIL"...or add your own such as "WARN", "SKIP", or "BLOCK"
- It returns column tst_dscr - the data validation test description (hard-coded when write script)
The steps to execute the basic data validation script are going to depend on the IDE you are using. Since pgAdmin cannot run and output all these individual SQL SELECTs, you must choose a different editor with its own unique instructions (or use the advanced script)
You could just skip reading all the sections above and jump directly down here. Be sure to expand Step #3 to download a copy of the basic test case script that you will refactor to suit your needs.
Here are my recommendations for writing your validation script(s):
- Reference: I would suggest that you open the basic script as a reference in notepad or text editor of your choice and position it off on a second monitor to the side.
- Main Editor: Open SQL Server Management Studio with a blank new script that you are going to build out for your tests as follows:
- Comment Block: Create your own comment block at the top of your script, pulling anything of value from lines 1-44, but tailoring them to your specific scenario
- Select Schema: You will have a schema (database) that is "under test". Identify it. List it in the comment block. Prefix all SQL code with it going forward (below).
- Table #1: Pick the first table for which you are going to start writing validation cases for. Could be the first table alphabetically. Could be a highest priority table you want to begin writing validation checks against (biggest bang for the buck). Could be the simplest table to ease in with baby steps (low-hanging fruit). You decide.
- Rule Set #1 - Row Counts: Start off with an appropriate "Row Count" test case that applies to the entire table. Pick one from T001 - T004. Change the scehma name prefix, the table name, and if appropriate the column names. Highlight and run it in your SQL editor to validate the SQL. Change the "tst_id" number to your own. Change the tst_descr to your own.
- Rule Set #8 - Diff Checks: These diff checks in T059-T061 operate at the table-level just like Rule Set #1 Row Counts, so I suggest grouping them together at the top of each table's validation tests.
- Rule Set #2 - Keys: Next, if and only if there are missing unique key constraints or foreign key constraints, then I'd copy-paste the appropriate test cases T005-T007 into the SQL editor and make appropriate changes (schema name, table names, field names, tst_id, and tst_descr).
- Rule Set #3 - Heuristics: Next, if and only if it is appropriate to test Null Rates for specific fields of "Table #1" then I'd copy test case T008 into the SQL Editor and make all the appropriate changes (and there are a lot...schema name, table name, many column names, copy-paste or delete CASE WHEN rows and SELECT CAST rows for each column you want to add or remove...see my YouTube video for a demo).
Likewise, if there are particular fields in "Table #1" that you'd like to setup threshold alerts for on frequency of observed values, then you copy-paste-modify T009 the appropriate number of times...once per field. - Rule Set #4-#7: The bulk of the remaining example validation checks are field by field. You can write the test cases one table, and one field at a time...mixing and matching from these sample validation checks as needed. This granular approach can make fails readily apparent. However, you can alterantively combine dozens of these same checks into a single table-scan pass to greatly improve performance (best practice example in T065). Your choice. Bottom line, a lot of copy-paste-modifying SQL code from test cases T010-T058.
- Rule Set #9: Add in any defect regression tests as appropriate
- Finishing Touches: Don't forget to apply the best practice examples in data validation test cases T062-T066
And then repeat items 1-12 above for the next table. And again for the next table, and so on until you are done. Your choice whether to write one giant 5-10,000 line script for the entire schema, or break down into separate scripts for each table or logical table grouping.