Improve get_query_tokens iterator #98

@macbre

Currently it returns a list of sqlparse.sql.Token objects. In all cases, the sql-metadata code has to keep track of the previous keywords itself as it iterates over the tokens.

Instead of Token objects, return a dataclass (called QueryTokenState here, SQLToken in the sketch below) with:

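import dataclasses
from typing import Optional

from sqlparse.sql import Token


@dataclasses.dataclass
class SQLToken:
    # what kind of token this is
    value: str
    is_keyword: bool
    is_name: bool
    is_punctuation: bool
    is_wildcard: bool

    # and the state
    last_keyword: Optional[str]  # uppercased
    previous_token: Optional[Token]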

get_query_tokens will be responsible for keeping the state in returned tokens.
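A minimal sketch of how get_query_tokens could carry that state, so callers no longer track it themselves. To stay self-contained it does not use sqlparse: TOKEN_RE and KEYWORDS are hypothetical stand-ins for the real lexer and keyword list, and previous_token points at the previous SQLToken rather than a sqlparse Token. Both are assumptions, not the project's implementation.

```python
import dataclasses
import re
from typing import Iterator, Optional

# Illustrative keyword subset (an assumption); sqlparse recognises far more.
KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "ON", "AND", "OR", "AS"}
PUNCTUATION = {"(", ")", ",", ".", ";"}

# Hypothetical stand-in tokenizer: identifiers, integers, "*" and punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\*|[(),.;]")


@dataclasses.dataclass
class SQLToken:
    value: str
    is_keyword: bool
    is_name: bool
    is_punctuation: bool
    is_wildcard: bool
    last_keyword: Optional[str] = None  # uppercased
    previous_token: Optional["SQLToken"] = None


def get_query_tokens(query: str) -> Iterator[SQLToken]:
    """Yield tokens that already know the keyword and token preceding them."""
    last_keyword = None
    previous = None
    for match in TOKEN_RE.finditer(query):
        value = match.group(0)
        is_keyword = value.upper() in KEYWORDS
        token = SQLToken(
            value=value,
            is_keyword=is_keyword,
            is_name=not is_keyword and (value[0].isalpha() or value[0] == "_"),
            is_punctuation=value in PUNCTUATION,
            is_wildcard=value == "*",
            last_keyword=last_keyword,
            previous_token=previous,
        )
        yield token
        # Update the state only after yielding, so a keyword token reports
        # the keyword that came before it, not itself.
        if is_keyword:
            last_keyword = value.upper()
        previous = token


for token in get_query_tokens("SELECT 1 FROM dual"):
    print(token.value, token.last_keyword)
```

With this shape, a caller could test something like `token.is_name and token.last_keyword == "FROM"` without keeping any loop-local state.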

Token will also have sub-classes that will indicate their "function" within the query:

  • table names
  • keywords
  • column names
  • functions
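One way the sub-classes could look, as a rough sketch only. The class names (KeywordToken, TableNameToken, ColumnNameToken, FunctionToken), the classify helper, and the keyword sets are all hypothetical, and the rules are deliberately naive: a name followed by "(" is treated as a function, a name after FROM/JOIN/INTO/UPDATE as a table, and any other name as a column. Aliases and sub-queries are not handled.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class SQLToken:
    value: str
    last_keyword: Optional[str] = None  # uppercased


class KeywordToken(SQLToken):
    """A SQL keyword such as SELECT or FROM."""


class TableNameToken(SQLToken):
    """A name that appears where a table is expected."""


class ColumnNameToken(SQLToken):
    """A name that appears where a column is expected."""


class FunctionToken(SQLToken):
    """A name directly followed by an opening parenthesis."""


# Illustrative subsets (assumptions), not a complete SQL grammar.
KEYWORDS = {"SELECT", "FROM", "WHERE", "JOIN", "ON", "INTO", "UPDATE", "AND", "OR"}
TABLE_KEYWORDS = {"FROM", "JOIN", "INTO", "UPDATE"}


def classify(values: List[str]) -> List[SQLToken]:
    """Wrap already-split, non-empty token strings in the matching sub-class."""
    result: List[SQLToken] = []
    last_keyword = None
    for index, value in enumerate(values):
        next_value = values[index + 1] if index + 1 < len(values) else None
        upper = value.upper()
        if upper in KEYWORDS:
            cls = KeywordToken
        elif not (value[0].isalpha() or value[0] == "_"):
            cls = SQLToken  # punctuation, numbers, wildcards
        elif next_value == "(":
            cls = FunctionToken
        elif last_keyword in TABLE_KEYWORDS:
            cls = TableNameToken
        else:
            cls = ColumnNameToken
        result.append(cls(value=value, last_keyword=last_keyword))
        if cls is KeywordToken:
            last_keyword = upper
    return result


tokens = classify(["SELECT", "count", "(", "id", ")", "FROM", "users"])
tables = [t.value for t in tokens if isinstance(t, TableNameToken)]
print(tables)
```

Callers could then select tokens with isinstance checks instead of re-deriving each token's role from the surrounding keywords.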

http://datacharmer.blogspot.com/2008/03/mysql-proxy-recipes-tokenizing-query.html

MySQL Proxy ships equipped with a tokenizer, a method that, given a query, returns its components as an array of tokens. Each token contains three elements:
  • name, which is a human-readable name of the token (e.g. TK_SQL_SELECT)
  • id, which is the identifier of the token (e.g. 204)
  • text, which is the content of the token (e.g. "select")

For example, the query SELECT 1 FROM dual will be returned as the following tokens:

1:
   text          select
   token_name    TK_SQL_SELECT
   token_id      204
2: 
   text          1
   token_name    TK_INTEGER
   token_id      11
3:
   text          from
   token_name    TK_SQL_FROM
   token_id      105
4: 
   text          dual
   token_name    TK_SQL_DUAL
   token_id      87
