New parser, bug fixes and new functionality #121
Wow, this is massive! Thanks, @collerek. I'll take a deeper look into your changes over the next few days.
That's my idea behind aliases handling too. I'd like … Hence, it's a more complex topic, so let's move the discussion to a separate PR / issue and continue with the refactoring / new API here.
I do agree! Thank you very much!
Co-authored-by: Maciej Brencz <maciej.brencz@gmail.com>
…x aliases test, remove double black from pyproject.toml, fix date function format in complex query in sql file
Please do so.
Wow! That was quite a journey reviewing your changes, @collerek. Let's address the remaining open comments and then we're ready to merge this one. Once again, thank you! 🙂 🇵🇱
Hope that's not due to the crappy quality of my code 🤣 Happy to help again in the future, and thanks for the great lib 😉 🇵🇱
Not at all. |
Let's tackle aliases in the next PR, OK? I'll complete the review and merge this one tomorrow.
Sure, that's what I had in mind. I saw that you added more issues to the v.2.0 milestone, which is why I'm asking whether you are merging this one now or waiting for some other stuff. We still have two new issues coming out of this one: the ability to pass a file (or a file path?), and refactoring the token logic into the token init. That's why I'm asking what the plan is. No rush, just curious :)
I just realized that removing comments in `_preprocess` in init is not really necessary. Since we now use sqlparse in `remove_comments`, the query gets parsed twice, which hurts performance. Maybe we should revert it and drop that step, since everything works without it anyway? The parsed query will contain more tokens when comments are kept, but I think that's still quicker than removing them. Can you check the performance on your machine, both as it is now and without step 3 in preprocess?
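To compare both variants on one machine, a `timeit` micro-benchmark along these lines could be used. It's only a sketch under stated assumptions: `tokenize` is a hypothetical regex stand-in for sqlparse, and `remove_comments`, `parse_with_comment_removal` and `parse_keeping_comments` are illustrative names, not the library's code. What it shows is the shape of the trade-off: stripping comments with a tokenizer means every query is tokenized twice, while keeping comments costs one pass plus a few extra tokens.

```python
from __future__ import annotations

import re
import timeit

# Hypothetical stand-in for a real SQL tokenizer (NOT sqlparse): block
# comments, line comments, quoted strings, backticked names, words, symbols.
TOKEN_RE = re.compile(r"/\*.*?\*/|--[^\n]*|'[^']*'|`[^`]*`|\w+|[^\s\w]", re.S)


def tokenize(query: str) -> list[str]:
    """One full tokenizing pass over the query."""
    return TOKEN_RE.findall(query)


def remove_comments(query: str) -> str:
    """Comment removal that tokenizes the query itself (an extra pass)."""
    kept = [t for t in tokenize(query) if not t.startswith(("/*", "--"))]
    return " ".join(kept)


def parse_with_comment_removal(query: str) -> list[str]:
    # Two passes: one inside remove_comments, one for the actual parse.
    return tokenize(remove_comments(query))


def parse_keeping_comments(query: str) -> list[str]:
    # One pass: comment tokens stay in the list and are skipped later.
    return tokenize(query)


if __name__ == "__main__":
    query = "SELECT /*my random comment*/ foo, id FROM `db`.`test` -- x\n" * 50
    for fn in (parse_with_comment_removal, parse_keeping_comments):
        seconds = timeit.timeit(lambda: fn(query), number=200)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

To get real numbers, the two `parse_*` bodies would be swapped for the actual parser with and without step 3.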
Sure, please do that and add a test case in
|
    # normalize newlines
    assert (
        Parser("SELECT /*my random comment*/ foo, id FROM `db`.`test`").query
        == "SELECT /*my random comment*/ foo, id FROM db.test"
    )

I reverted step 3 and added a test in preprocess, but since we no longer remove comments there, there is not much to test 😅
We're ready to merge once CI passes 🙂 Thanks for your great work here, @collerek. Looking forward to your next PRs and ideas. Have a good one.
Hi, first of all thanks for an awesome job with the library!
The rules you invented to classify given types of metadata (columns, tables etc.) were of great value!
Now I needed something like #78, so I had a go at it.
New Parser
In the process I read your #83 and #98 issues, and they describe an API that I like much more than the old one, so I decided to implement the new parser and the iteration described in those issues along the way (more or less, since you don't describe how you would like to treat multi-token names like `db.schema.table` etc.). So it's not exactly how you described it, but it's close, and I think it is a step in the right direction towards how you see it in the future.

Functionalities
Apart from implementing #83 and #98, the new functionalities are:
- … (`select`, `where`, `order_by`, `join`, `insert` and `update`). Should close #78 ("Identify which part of the query columns are a part of"). (tested)
- … `as` keyword. Should close #97 ("get_query_table_aliases(query) is far from being final"), in a much simpler way. (tested)

Bug fixes
Fixed previously unreported bugs:
- … (`update table set column = 'aa'`), etc.

Tests
All tests that were commented out or skipped now run and pass. I divided the tests into smaller functions and multiple files, so that selected areas of functionality are easier to test if something fails. Coverage is 100%, and there are new tests for the new functionality.
So basically it should close all issues except #35, and at the same time it introduces a lot of improvements.
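Grouping columns by the clause they appear in (the #78 item in the list of new functionalities) could look roughly like the toy sketch below. It assumes a regex tokenizer and a small hard-coded keyword set; `columns_by_section`, `TOKEN_RE` and `KEYWORDS` are hypothetical names, not the new parser's API. It only recognises SELECT, FROM/JOIN, ON, WHERE and ORDER BY, skips names right after AS and names followed by `(` (function calls), and does not handle GROUP BY (those columns stay in the preceding section), HAVING, subqueries, INSERT or UPDATE.

```python
from __future__ import annotations

import re

# Toy tokenizer (hypothetical): quoted strings, identifiers that may contain
# dots (u.name), or any other single non-space character.
TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_][\w.]*|\S")
KEYWORDS = {"SELECT", "FROM", "JOIN", "INNER", "LEFT", "ON", "WHERE",
            "ORDER", "BY", "AND", "OR", "AS", "ASC", "DESC"}


def columns_by_section(query: str) -> dict[str, list[str]]:
    """Group column names by the clause they appear in (toy sketch)."""
    tokens = TOKEN_RE.findall(query)
    sections: dict[str, list[str]] = {}
    current = None  # "tables" marks FROM/JOIN targets, which are not columns
    for i, token in enumerate(tokens):
        upper = token.upper()
        prev = tokens[i - 1].upper() if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if upper == "SELECT":
            current = "select"
        elif upper in ("FROM", "JOIN"):
            current = "tables"
        elif upper == "ON":
            current = "join"
        elif upper == "WHERE":
            current = "where"
        elif upper == "BY" and prev == "ORDER":
            current = "order_by"
        elif upper in KEYWORDS or not (token[0].isalpha() or token[0] == "_"):
            continue  # other keywords, operators, numbers and string literals
        elif prev == "AS" or nxt == "(":
            continue  # alias names and function names are not columns
        elif current not in (None, "tables"):
            sections.setdefault(current, []).append(token)
    return sections
```

For `SELECT u.name FROM users u WHERE u.id = 1` this returns `{"select": ["u.name"], "where": ["u.id"]}`.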
Open issues
Now I also want to introduce column aliases, but I got a little stuck here, as some points require clarification.
Column aliases
As of now, when you have a query like:

`Select column1 as myname from table`

both `column1` and `myname` go into the columns, even though they are the same column. I believe aliases like this should be resolved back to the underlying base column, and aliases like `myname` should not end up in columns, but I need your opinion on this. That also leaves cases like `select count(*) as counter`: should `counter` be included in columns?

Motivation
My use case is that I need all real (and only real) database columns used in a parsed query, so that I can later reflect the databases used and check column types. That means that (in a perfect world) I would get:
- `select col1 as myname from table` should return only `table.col1` in columns.
- `select sub.test1 from (select aa as test1 from table) sub` should return columns = ["table.aa"], as `sub.test1` is a subquery alias and `test1` is an alias of `aa`.
- `select aa.col1, col2 from table aa` should return ["table.col1", "table.col2"], and there should be a way of returning each column together with its table in a dict. Of course, if the table cannot be resolved (like `select aa, bb from tab1, tab2`, where you cannot tell which column comes from which table), both columns should return a list of ["tab1", "tab2"].

I can already get and extract subqueries, but I would need column aliases (and later, optionally, a best-match assignment of tables to columns). For that, I need your confirmation that I'm thinking about this right and that column aliases should not end up in columns. Alternatively, there should be a way to get only the aliases (as they appear in the end result of the query, i.e. in the select) or only the real database columns.
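One way to get the behaviour described for the first and third cases, assuming column aliases are kept out of columns and returned separately, is sketched here. All names (`split_select_items`, `resolve_aliases`, `assign_tables`, `qualified_columns`) are hypothetical; the input is assumed to be already split into a select list and a map of table names/aliases, only bare column references count as columns (so `count(col1)` contributes nothing), and subqueries (the second case) are not handled.

```python
from __future__ import annotations

import re

# Hypothetical helpers for illustration only; not part of sql-metadata.
ALIAS_RE = re.compile(r"^(?P<expr>.+?)\s+as\s+(?P<alias>\w+)$", re.IGNORECASE)
NAME_RE = re.compile(r"^[A-Za-z_][\w.]*$")


def split_select_items(select_list: str) -> list[str]:
    """Split a select list on top-level commas (not inside parentheses)."""
    items, current, depth = [], [], 0
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


def resolve_aliases(select_list: str) -> tuple[list[str], dict[str, str]]:
    """Return (bare column references, alias -> aliased expression)."""
    columns: list[str] = []
    aliases: dict[str, str] = {}
    for item in split_select_items(select_list):
        match = ALIAS_RE.match(item)
        expr = match.group("expr").strip() if match else item
        if match:
            aliases[match.group("alias")] = expr
        if NAME_RE.match(expr):
            columns.append(expr)  # aliases themselves never land in columns
    return columns, aliases


def assign_tables(columns: list[str], tables: dict[str, str]) -> dict:
    """Map each column reference to its table, or to all candidate tables.

    `tables` maps every name a table is referred to by (alias or own name)
    to the real table name. Unknown prefixes (e.g. subquery aliases) raise
    KeyError; handling them is out of scope here.
    """
    candidates = sorted(set(tables.values()))
    result: dict = {}
    for column in columns:
        prefix, _, _ = column.rpartition(".")
        if prefix:
            result[column] = tables[prefix]
        elif len(candidates) == 1:
            result[column] = candidates[0]
        else:
            result[column] = list(candidates)  # ambiguous: every candidate
    return result


def qualified_columns(assignment: dict) -> list[str]:
    """'table.column' strings for every unambiguously assigned column."""
    return [
        f"{table}.{column.rpartition('.')[2]}"
        for column, table in assignment.items()
        if isinstance(table, str)
    ]
```

With this split, `resolve_aliases("column1 as myname, count(*) as counter")` keeps `column1` as the only column and reports both aliases separately, and `assign_tables(["aa", "bb"], {"tab1": "tab1", "tab2": "tab2"})` returns both candidate tables for each column, matching the ambiguous case.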
In my opinion, the end product is much more concise, cleaner and easier to maintain, but it's basically rewritten from scratch, so let me know what you think! :) I updated the readme, so it's best to start there; the rest is in the tests, since the project has no separate documentation.
Resolves #103, resolves #83, resolves #98, resolves #78 and resolves #120.