Note: These text validation checks are important. Good data loading practices land data into text data type fields, then from there validate and transform into other data types (numeric, date, boolean, etc.). Thus, these are checks you will find yourself using over and over again over time.
- T021 - Not Null
- T022 - Not Null String
- T023 - No Leading or Trailing Spaces
- T024 - In Value List
- T025 - Not In Value List
- T026 - Multi Field Compare
- T027 - Text Length
- T028 - Upper and Lower Case Characters
- T029 - Alpha and Numeric Characters
- T030 - No Quote Characters
- T031 - No CRLF Characters
- T032 - No TAB Characters
- T033 - No NBS Characters
- T034 - No EmDash Characters
- T035 - No VTFFNEL Characters
- T036 - No Period or Dash Characters
- T037 - No Funky ",/:()&#?;" Characters
- T038 - Only Allowed Characters In List
- T039 - Like Wildcards
- T040 - IsNumeric()
- T041 - IsDate("yyyymmdd")
- T042 - IsDate("mm/dd/yyyy")
- T043 - IsDate("mm-dd-yyyy")
- T044 - IsDate("yyyy-mm-dd")
Verify text field is not null. For example, to verify that table countries has no NULLs in field country_name:
SELECT country_name
, CASE WHEN country_name IS NULL THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.countries;Verify text field is not null string "" (but in Oracle null strings don't exist, they are converted to nulls...therefore look for a space instead and treat this test case as a place holder equivalent to other database platforms). For example, to verify that table countries has rows where field country_name = a space:
SELECT country_name
, CASE WHEN country_name = '' THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.countries;Verify text field has no leading or trailing spaces. For example, to verify that table countries, field country_name has no rows with a leading and/or trailing space:
SELECT country_name
, CASE WHEN country_name LIKE ' %' THEN 'REJ-01: Verify no leading space at country_name|exp=noLeadSpace|act=''' || country_name ||''''
WHEN country_name LIKE '% ' THEN 'REJ-02: Verify no trailing space at country_name|exp=noTrailingSpace|act=''' || country_name ||''''
ELSE 'P'
END AS status
FROM demo_hr.countries;Verify text field value is in the list of approved values. For example, to verify that field job_id of table employees is always in the list of 19 approved values:
SELECT job_id
, CASE WHEN job_id NOT IN('ST_MAN','ST_CLERK','SH_CLERK','SA_REP','SA_MAN','PU_CLERK','PR_REP','MK_REP','MK_MAN','IT_PROG'
,'HR_REP','FI_MGR','FI_ACCOUNT','AD_VP','AD_PRES','AD_ASST','AC_MGR','AC_ACCOUNT','PU_MAN')
THEN 'FAIL'
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field value is not in the list of invalid values. For example, to verify that field job_id of table employees is never in the list of 5 invalid values:
SELECT job_id
, CASE WHEN job_id IN('CEO','CFO','COO','CIO','POTUS') THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field value is comprised of other field values. For example, use the SQL below to verify that field email = first letter of field first_name + field last_name in table employees. Note that there were exceptions to the rule in the data, so these were manually removed from the test in the WHERE clause.
SELECT email, first_name, last_name
, CASE WHEN email <> SUBSTRING(UPPER(SUBSTRING(first_name, 1, 1) || last_name), 1, 8) THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees
WHERE email NOT IN('DRAPHEAL', 'JAMRLOW', 'JMURMAN', 'LDEHAAN', 'JRUSSEL', 'TJOLSON')
-- DRAPHAEL vs DRAPHEAL, JMARLOW vs JAMRLOW, JMURMAN vs JURMAN, LDE HAAN VS LDEHAAN, JRUSSELL vs JRUSSEL, TOLSON vs TJOLSON
;Verify text field value length is an exact amount or within a range. For example, to verify that the field phone_number length is either 12 (US) or 18 (international) characters in length:
SELECT phone_number
, CASE WHEN LENGTH(phone_number) NOT IN(12,18) THEN 'REJ-01: Verify phone_number length is allowed|exp=112,18|act=' || LENGTH(phone_number)
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field characters are uppercase, lowercase, or a mix. For example, to verify that the field last_name is all lowercase after the first character, and that field job_id is all uppercase in table employees:
SELECT job_id, last_name
, CASE WHEN job_id ~ '[[:lower:]]' THEN 'REJ-01: Verify job_id does not contain lower case characters|exp=ucase|act=' || job_id
WHEN NOT SUBSTRING(last_name,1) ~ '[[:upper:]]' THEN 'REJ-02: Verify last_name after first char is all lower case|exp=lcase|act=' || last_name
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field characters are alpha, numeric, or a mix. For example, to verify that the field employee_id is numeric only, and field last_name is slpha only in table employees:
SELECT employee_id, last_name
, CASE WHEN some_date_fmt1 ~ '[[:alpha:]]' THEN 'REJ-01: Verify some_date_fmt1 does not contain alpha characters|exp=no-alphas|act=' || some_date_fmt1
WHEN last_name ~ '[[:digit:]]' THEN 'REJ-02: Verify last_name does not contain numeric digits|exp=no-digits|act=' || last_name
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field does not have ' or " characters. For example, to verify that the field first_name has no quotes or single quotes in table employees:
SELECT first_name
, CASE WHEN first_name LIKE '%''%' THEN 'REJ-01: Verify first_name does not contain single quote characters|exp=none|act=' || first_name
WHEN first_name LIKE '%"%' THEN 'REJ-02: Verify first_name does not contain quotation characters|exp=none|act=' || first_name
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field does not have carriage return (CHAR-13 / "CR") or line feed (CHAR-10 / "LF") characters. For example, to verify that the field last_name has no CRLFs in table employees:
SELECT last_name
, CASE WHEN POSITION(CHR(10) IN last_name) > 0 THEN 'REJ-01: Field last_name has a Line Feed (CHR-10)|exp=none|act=at position ' || CAST(POSITION(CHR(10) IN last_name) AS VARCHAR(4))
WHEN POSITION(CHR(13) IN last_name) > 0 THEN 'REJ-02: Field last_name has a Carriage Return (CHR-13)|exp=none|act=at position ' || CAST(POSITION(CHR(13) IN last_name) AS VARCHAR(4))
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field does not have tab (CHAR-9) characters. For example, to verify that the field last_name has no TABs in table employees:
SELECT last_name
, CASE WHEN POSITION(CHR(9) IN last_name) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field does not have non-breaking-space (CHAR-160 / "NBS") characters. For example, to verify that the field last_name has no NBS chars in table employees:
SELECT last_name
, CASE WHEN POSITION(CHR(160) IN last_name) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field does not have an em-dash character (CHAR-151; common Microsoft Office "--" copy-paste conversion causing data load issues). For example, to verify that the field last_name has no em-dashes in table employees:
SELECT last_name
, CASE WHEN POSITION(CHR(151) IN last_name) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field does not have any vertical tab (CHAR-11 / "VT"), form feed (CHAR-12 / "FF"), or next line (CHAR-133 / "NEL") characters. For example, use the SQL below to verify that the field last_name has no VT, FF, or NEL characters in table employees. Note that this SQL checks for all three characters, each on its own CASE...WHEN clause, and that it returns the location within a string where the bad character occurs.
SELECT last_name
, CASE WHEN POSITION(CHR(11) IN last_name) > 0 THEN 'REJ-01: Field last_name has a Vertical Tab (CHR-11)|exp=none|act=at position ' || CAST(POSITION(CHR(11) IN last_name) AS VARCHAR(4))
WHEN POSITION(CHR(12) IN last_name) > 0 THEN 'REJ-02: Field last_name has a Form Feed (CHR-12)|exp=none|act=at position ' || CAST(POSITION(CHR(12) IN last_name) AS VARCHAR(4))
WHEN POSITION(CHR(133) IN last_name) > 0 THEN 'REJ-03: Field last_name has a Next Line (CHR-133)|exp=none|act=at position ' || CAST(POSITION(CHR(133) IN last_name) AS VARCHAR(4))
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field does not have any periods or dashes. For example, to verify that the field last_name has no periods or dashes in table employees:
SELECT last_name
, CASE WHEN POSITION('.' IN last_name) > 0 OR POSITION('0' IN last_name) > 0 THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field does not have any funky ",/:()&#?;" characters. For example, to verify that the field last_name has commas, colons, etc. in table employees:
SELECT last_name
, CASE WHEN last_name ~ '[,/:()&#?;]' THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field contains only allowed characters from a specific list. For example, use the SQL below to verify that the field phone_number in table employees only has characters ".0123456789". The LIKE expression does the work. Specifically, the []'s indicating look for these characters, and the ^ means look for any character not in this list. So it reads: "find any phone numbers containing characters not in [.0123456789]".
SELECT phone_number
, CASE WHEN phone_number ~ '[^.0123456789]' THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field matches simple like patterns. For example, use the SQL below to verify that the field phone_number in table employees matches either the US (###.###.####) or international format (011.##.####.#####). The LIKE command use "%" to represent any number of any character and "_" to represent any single character.
SELECT phone_number
, CASE WHEN phone_number NOT LIKE '%.%' THEN 'REJ-01: Verify phone_number contains a ''.''|exp=contains-.|act=' || phone_number
WHEN phone_number NOT LIKE '___.___.____'
AND phone_number NOT LIKE '011.__.____._____%' THEN 'REJ-02: Verify phone_number like pattern "___.___.____" or "011.__.____._____"|exp=yes|act=' || phone_number
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field is numeric. For example, use the SQL below to verify that the field zip5 in table employees is numeric.
SELECT zip5
, CASE WHEN NOT zip5 ~ '^\d+(\.\d+)?$' THEN 'FAIL' ELSE 'P' END AS status
FROM demo_hr.employees;Verify text field is a date formatted as "yyyymmdd". For example, use the SQL below to verify that the field some_date_fmt1 in table employees is date format "yyyymmdd".
More details...
- Although it might be more concise to use a regular expression to implement this validation check, I went ahead and used only the native and thus more universal commands LIKE, REPLACE(), LENGTH(), TRIM(), etc. Also, it allows for more specific rejection codes below.
- Note in the first WHEN clause the use of multiple REPLACE() commands that take a date like '20210401" and convert all numeric digits to '' such that the actual converted test string is '' to match the expected value of ''.
- Note in the second WHEN clause that the value is confirmed to be 8 characters in length
- Note in the third thru fifth WHEN clauses that each date part (year, month, day) is confirmed to be within an appropriate range.
- Note that this simple format check is not date-aware; it will not detect leap years or months with < 31 days have the wrong values in place
- Note the use of rejection codes (REJ-01, etc.) at the inner query to clearly return why the validation check failed...run the query alone on fail to see details
SELECT some_date_fmt1
, CASE WHEN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
some_date_fmt1,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','')
> '' THEN 'REJ-01: Unexpected chars exist (numeric 0-9 only)|exp=Fmt="yyyymmdd"|act=' + some_date_fmt1
WHEN NOT LEN(TRIM(some_date_fmt1)) = 8 THEN 'REJ-02: Must be 8 Chars|exp=Fmt="yyyymmdd"|act=' + some_date_fmt1
WHEN NOT SUBSTRING(some_date_fmt1,1,4) BETWEEN '1753' AND '9999' THEN 'REJ-03: Year Not Btw 1753-9999|exp=Fmt="yyyymmdd"|act=' + some_date_fmt1
WHEN NOT SUBSTRING(some_date_fmt1,5,2) BETWEEN '01' AND '12' THEN 'REJ-04: Month Not Btw 01-12|exp=Fmt="yyyymmdd"|act=' + some_date_fmt1
WHEN NOT SUBSTRING(some_date_fmt1,7,2) BETWEEN '01' AND '31' THEN 'REJ-05: Day Not Btw 01-31|exp=Fmt="yyyymmdd"|act=' + some_date_fmt1
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field is a date formatted as "mm/dd/yyyy". For example, use the SQL below to verify that the field some_date_fmt2 in table employees is date format "mm/dd/yyyy".
More details...
- Although it might be more concise to use a regular expression to implement this validation check, I went ahead and used only the native and thus more universal commands LIKE, REPLACE(), LENGTH(), TRIM(), etc. Also, it allows for more specific rejection codes below.
- Note in the first WHEN clause the use of multiple REPLACE() commands that take a date like '04/01/2021" and convert all numeric digits or the slash character to '' such that the actual converted test string is '' to match the expected value of ''.
- Note in the second WHEN clause that the value is confirmed to be 10 characters in length
- Note in the third thru fifth WHEN clauses that each date part (year, month, day) is confirmed to be within an appropriate range.
- Note that this simple format check is not date-aware; it will not detect leap years or months with < 31 days have the wrong values in place
- Note the use of rejection codes (REJ-01, etc.) at the inner query to clearly return why the validation check failed...run the query alone on fail to see details
SELECT some_date_fmt2
, CASE WHEN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
some_date_fmt2,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'/','')
> '' THEN 'REJ-01: Unexpected Chars Exist|exp=Fmt="mm/dd/yyyy"|act=' + some_date_fmt2
WHEN NOT LEN(TRIM(some_date_fmt2)) = 10 THEN 'REJ-02: Must be 10 Chars|exp=Fmt="mm/dd/yyyy"|act=' + some_date_fmt2
WHEN NOT SUBSTRING(some_date_fmt2,7,4) BETWEEN '1753' AND '9999' THEN 'REJ-03: Year Not Btw 1753-9999|exp=Fmt="mm/dd/yyyy"|act=' + some_date_fmt2
WHEN NOT SUBSTRING(some_date_fmt2,1,2) BETWEEN '01' AND '12' THEN 'REJ-04: Month Not Btw 01-12|exp=Fmt="mm/dd/yyyy"|act=' + some_date_fmt2
WHEN NOT SUBSTRING(some_date_fmt2,4,2) BETWEEN '01' AND '31' THEN 'REJ-05: Day Not Btw 01-31|exp=Fmt="mm/dd/yyyy"|act=' + some_date_fmt2
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field is a date formatted as "mm-dd-yyyy". For example, use the SQL below to verify that the field some_date_fmt3 in table employees is date format "mm-dd-yyyy".
More details...
- Although it might be more concise to use a regular expression to implement this validation check, I went ahead and used only the native and thus more universal commands LIKE, REPLACE(), LENGTH(), TRIM(), etc. Also, it allows for more specific rejection codes below.
- Note in the first WHEN clause the use of multiple REPLACE() commands that take a date like '04-01-2021" and convert all numeric digits or the dash character to '' such that the actual converted test string is '' to match the expected value of ''.
- Note in the second WHEN clause that the value is confirmed to be 10 characters in length
- Note in the third thru fifth WHEN clauses that each date part (year, month, day) is confirmed to be within an appropriate range.
- Note that this simple format check is not date-aware; it will not detect leap years or months with < 31 days have the wrong values in place
- Note the use of rejection codes (REJ-01, etc.) at the inner query to clearly return why the validation check failed...run the query alone on fail to see details
SELECT some_date_fmt3
, CASE WHEN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
some_date_fmt3,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'-','')
> '' THEN 'REJ-01: Unexpected Chars Exist|exp=Fmt="mm-dd-yyyy"|act=' + some_date_fmt3
WHEN NOT LEN(TRIM(some_date_fmt3)) = 10 THEN 'REJ-02: Must be 10 Chars|exp=Fmt="mm-dd-yyyy"|act=' + some_date_fmt3
WHEN NOT SUBSTRING(some_date_fmt3,7,4) BETWEEN '1753' AND '9999' THEN 'REJ-03: Year Not Btw 1753-9999|exp=Fmt="mm-dd-yyyy"|act=' + some_date_fmt3
WHEN NOT SUBSTRING(some_date_fmt3,1,2) BETWEEN '01' AND '12' THEN 'REJ-04: Month Not Btw 01-12|exp=Fmt="mm-dd-yyyy"|act=' + some_date_fmt3
WHEN NOT SUBSTRING(some_date_fmt3,4,2) BETWEEN '01' AND '31' THEN 'REJ-05: Day Not Btw 01-31|exp=Fmt="mm-dd-yyyy"|act=' + some_date_fmt3
ELSE 'P'
END AS status
FROM demo_hr.employees;Verify text field is a date formatted as "yyyy-mm-dd". For example, use the SQL below to verify that the field some_date_fmt4 in table employees is date format "yyyy-mm-dd".
More details...
- Although it might be more concise to use a regular expression to implement this validation check, I went ahead and used only the native and thus more universal commands LIKE, REPLACE(), LENGTH(), TRIM(), etc. Also, it allows for more specific rejection codes below.
- Note in the first WHEN clause the use of multiple REPLACE() commands that take a date like '2021-04-01" and convert all numeric digits or the dash character to '' such that the actual converted test string is '' to match the expected value of ''.
- Note in the second WHEN clause that the value is confirmed to be 10 characters in length
- Note in the third thru fifth WHEN clauses that each date part (year, month, day) is confirmed to be within an appropriate range.
- Note that this simple format check is not date-aware; it will not detect leap years or months with < 31 days have the wrong values in place
- Note the use of rejection codes (REJ-01, etc.) at the inner query to clearly return why the validation check failed...run the query alone on fail to see details
SELECT some_date_fmt4
, CASE WHEN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
some_date_fmt4,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9',''),'-','')
> '' THEN 'REJ-01: Unexpected Chars Exist|exp=Fmt="yyyy-mm-dd"|act=' + some_date_fmt4
WHEN NOT LEN(TRIM(some_date_fmt4)) = 10 THEN 'REJ-02: Must be 10 Chars|exp=Fmt="yyyy-mm-dd"|act=' + some_date_fmt4
WHEN NOT SUBSTRING(some_date_fmt4,1,4) BETWEEN '1753' AND '9999' THEN 'REJ-03: Year Not Btw 1753-9999|exp=Fmt="yyyy-mm-dd"|act=' + some_date_fmt4
WHEN NOT SUBSTRING(some_date_fmt4,6,2) BETWEEN '01' AND '12' THEN 'REJ-04: Month Not Btw 01-12|exp=Fmt="yyyy-mm-dd"|act=' + some_date_fmt4
WHEN NOT SUBSTRING(some_date_fmt4,9,2) BETWEEN '01' AND '31' THEN 'REJ-05: Day Not Btw 01-31|exp=Fmt="yyyy-mm-dd"|act=' + some_date_fmt4
ELSE 'P'
END AS status
FROM demo_hr.employees;