While I was testing on Oracle SQL:
- missed LEFT JOINs (it was easy to add to the code)
- missed table names starting with schema name (SCHEMA.TABLE)
- missed aliases without AS (this is mentioned in the code but not handled)
- missed recognizing subselects (e.g. FROM (SELECT ... ) )
I touched the code to overcome these issues and found solution for them. Tested with a huge SQL but may need to do some more testing.
New code:
def get_query_table_aliases(query: str) -> Dict[str, str]:
"""
Returns tables aliases mapping from a given query
E.g. SELECT a.* FROM users1 AS a JOIN users2 AS b ON a.ip_address = b.ip_address
will give you {'a': 'users1', 'b': 'users2'}
"""
aliases = dict()
last_keyword_token = None
last_table_name = None
for token in get_query_tokens(query):
# print(token.ttype, token, last_table_name)
# handle "FROM foo alias" syntax (i.e, "AS" keyword is missing)
# if last_table_name and token.ttype is Name:
# aliases[token.value] = last_table_name
# last_table_name = False
if last_table_name:
if token.value=='.':
last_table_name = last_table_name + token.value #add the dot
if token.value==',' or token.is_keyword and token.value.upper()!='AS': #there is no alias
aliases[''] = last_table_name
last_table_name = False
if prev_token.value.upper()=='AS': #previous keyword was AS then we found the alias
aliases[token.value] = last_table_name
last_table_name = False
if token.ttype is Name:
if prev_token.value=='.':
last_table_name = last_table_name + token.value #add Name to last_table_name
else: #found alias
aliases[token.value] = last_table_name
last_table_name = False
if last_keyword_token:
if last_keyword_token.value.upper() in ["FROM", "JOIN", "INNER JOIN","LEFT JOIN"] and token.value!='(':
last_table_name = token.value
last_keyword_token = token if token.is_keyword else False
prev_token = token
return aliases
While I was testing on Oracle SQL:
I touched the code to overcome these issues and found solution for them. Tested with a huge SQL but may need to do some more testing.
New code: