Fix duplicate nodes being removed in tree imports#7863
alesan99 wants to merge 13 commits into v7.12.0-prerelease from
I have personally checked the following trees so far:
This tree looks like it gains extra records, though this is likely only because the tree file is formatted differently. It should still have the same number of records if reformatted, but I haven't verified it.
lexiclevenger left a comment
- The taxon tree in Specify and the CSV file should have the same number of records.
Based on the query below, the counts for Ichthyology and Mammalogy match those in the CSV.
The following do not match:
Fungi: CSV 159,229, Query 159,232
Herpetology: CSV 24,730, Query 24,732
Invertebrate: CSV 79,276, Query 79,282
Database: lsumz_herps_2025_09_09
- The taxon tree in Specify and the CSV file should have the same number of records.
--
In some trees, there is a discrepancy between the number of records uploaded and the number of records in the query.
Botany (Bryophyta):
Uploaded: 14425
Queried: 14428
Mammals:
Uploaded: 13557
Queried: 13557
Minerals:
Uploaded: 6202
Queried: 6202
I am requesting changes because I had mixed results, and the number of records uploaded also differed from the .json file (e.g., Minerals had 6,189 rows, but 6,202 were uploaded).
It looks like nodes have been added to the wrong parent all along (even on main) whenever there were multiple parents with the same name in the same rank.
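The importer's lookup code is not shown in this thread, so the following is only a hypothetical sketch of how this class of bug can arise: if a row's parent is resolved by rank and name alone, a second parent with the same name in the same rank is never found, and its children are attached to the first one. Resolving the parent by its full path from the root keeps the two apart. All names here (`Node`, `path_of`, `find_parent_by_name`, `find_parent_by_path`, and the sample taxa) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    """Hypothetical tree node: a name, a rank, and a link to its parent."""
    name: str
    rank: str
    parent: Node | None = None


def path_of(node: Node) -> tuple[str, ...]:
    """Names from the root down to this node."""
    names: list[str] = []
    current: Node | None = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    return tuple(reversed(names))


def find_parent_by_name(nodes: list[Node], rank: str, name: str) -> Node:
    # Name-only lookup: returns the FIRST node with this rank and name,
    # regardless of which branch of the tree it sits in.
    return next(n for n in nodes if n.rank == rank and n.name == name)


def find_parent_by_path(nodes: list[Node], path: tuple[str, ...]) -> Node:
    # Path-aware lookup: the full ancestry must match, so same-named
    # parents under different grandparents stay distinct.
    return next(n for n in nodes if path_of(n) == path)


# Two genera named "Dup" in different families (hypothetical data).
root = Node("Life", "root")
fam_a = Node("FamilyA", "family", root)
fam_b = Node("FamilyB", "family", root)
dup_a = Node("Dup", "genus", fam_a)
dup_b = Node("Dup", "genus", fam_b)
existing = [root, fam_a, fam_b, dup_a, dup_b]

# A species row "Life > FamilyB > Dup > sp." should attach to dup_b.
wrong = find_parent_by_name(existing, "genus", "Dup")              # dup_a
right = find_parent_by_path(existing, ("Life", "FamilyB", "Dup"))  # dup_b
```

With the name-only lookup, every child of the second "Dup" ends up under the first one, which would shift records between branches without necessarily changing the total count.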
bhumikaguptaa left a comment
- The taxon tree in Specify and the CSV file should have the same number of records.
--
It works as expected. For the botany trees, the number of records matches the query. However, it is worth noting that the file for geology shows the wrong number of rows; I verified that the correct number is 6,202 (which is the number being uploaded).
@bhumikaguptaa I've updated this! It should be
"Stated" is the total shown in the progress dialog. "Actual" is the result of the query.
Mammals ⚠️
Stated: 13,556
Actual: 13,640
Minerals ✅
'Root' is not counted
Stated: 6,201
Actual: 6,202
Ichthyology ⚠️
Stated: 37,958
Actual: 38,175
Bryophyta ⚠️
Stated: 14,425
Actual: 14,512
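The stated-versus-actual comparison above is simple arithmetic and can be rechecked with a short script. This sketch assumes, as noted for Minerals, that the only expected difference is the 'Root' node, which the progress dialog does not count but the query does; applying that offset to every tree is an assumption. The names `COUNTS`, `ROOT_OFFSET`, and `extra_records` are hypothetical; the numbers are copied from the list above.

```python
# Counts copied from the review above: (stated in progress dialog, actual from query).
COUNTS = {
    "Mammals": (13_556, 13_640),
    "Minerals": (6_201, 6_202),
    "Ichthyology": (37_958, 38_175),
    "Bryophyta": (14_425, 14_512),
}

# Assumption: 'Root' is the only record the stated total leaves out.
ROOT_OFFSET = 1


def extra_records(stated: int, actual: int) -> int:
    """Records in the database beyond the stated total plus Root."""
    return actual - (stated + ROOT_OFFSET)


for tree, (stated, actual) in COUNTS.items():
    extra = extra_records(stated, actual)
    status = "OK" if extra == 0 else f"{extra} extra"
    print(f"{tree}: {status}")
```

Under this assumption, only Minerals reconciles exactly; Mammals, Ichthyology, and Bryophyta have 83, 216, and 86 unexplained records, respectively.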
kwhuber left a comment
Riffing off @grantfitzsimmons's stylistically pleasing review:
"Stated" is the total shown in the progress dialog (and the CSV). "Actual" is the result of the query.
Aves ⚠️
Stated: 34,038
Actual: 34,263
Fungi ⚠️
Stated: 159,228
Actual: 160,294
Herpetology ⚠️
Stated: 24,730
Actual: 24,880
Mammalogy ⚠️
Stated: 13,556
Actual: 13,640
Tested with the query:
It looks like there are still some inconsistencies.


Fixes #7853
Tree imports de-duplicate nodes with the same name, so that rows such as ParentNode -> ChildNode1 and ParentNode -> ChildNode2 share a single ParentNode.
However, it looks like some tree files intentionally contain non-parent (leaf) nodes with the same names. To allow those to be imported, I disabled de-duplication for leaf nodes.
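A minimal sketch of that rule, assuming a hypothetical input format in which each row lists node names from the top rank down to a leaf with no empty cells: interior nodes are reused when a child with the same name already exists under the same parent, while the last value in each row is always created as a new node. `TreeNode`, `import_rows`, and `count_nodes` are illustrative names, not the importer's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class TreeNode:
    name: str
    children: list[TreeNode] = field(default_factory=list)


def import_rows(rows: list[list[str]]) -> TreeNode:
    """Build a tree from root-to-leaf rows (hypothetical input format).

    Interior nodes are de-duplicated by name under the same parent, so
    several rows share one parent node. The last value in each row is a
    leaf and is always created fresh, so same-named leaves are kept.
    """
    root = TreeNode("Root")
    for row in rows:
        parent = root
        for depth, name in enumerate(row):
            is_leaf = depth == len(row) - 1
            node = None
            if not is_leaf:
                # Reuse an existing same-named child for interior levels only.
                node = next((c for c in parent.children if c.name == name), None)
            if node is None:
                node = TreeNode(name)
                parent.children.append(node)
            parent = node
    return root


def count_nodes(node: TreeNode) -> int:
    """Count all nodes below `node`, excluding the synthetic root."""
    return sum(1 + count_nodes(child) for child in node.children)


rows = [
    ["ParentNode", "ChildNode1"],
    ["ParentNode", "ChildNode2"],
    ["ParentNode", "ChildNode1"],  # intentional same-named leaf
]
tree = import_rows(rows)
```

Here `ParentNode` is created once and holds three children, including two separate `ChildNode1` leaves. One property of this sketch worth keeping in mind: if a row ends at a name that another row already used as an interior node, the leaf rule creates a second node with that name rather than reusing the first.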
Dev note: I also fixed some minor mistakes I noticed in the code.
Checklist
self-explanatory (or properly documented)
specify7/specifyweb/specify/management/commands/run_key_migration_functions.py
Line 50 in ea04665
Testing instructions
"file":)