csv backend updates #645

Merged
bghira merged 8 commits into bghira:fix/csv from williamzhuk:patch-1
Aug 23, 2024

@williamzhuk
Contributor

  • hashing instead of shortening
  • csv.py renamed to csv_.py to avoid conflict with pandas internal csv.py

- used hashing of filenames instead of shortening
- csv_ instead of csv to avoid potential issues with importing pandas (pandas can get confused as it has its own internal csv.py)
- a bit of debugging of some issues
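
To illustrate the first point, here is a minimal sketch of hashing a filename instead of truncating it. The helper name `hashed_path_sketch` and the choice of SHA-256 are assumptions made for illustration; this is not the PR's actual `path_to_hashed_path` implementation. The idea is that a digest always has the same length, and two long names that share a prefix do not collapse to the same shortened name.

```python
import hashlib
from pathlib import Path


def hashed_path_sketch(path, hash_filenames=True):
    """Hypothetical stand-in: swap the file stem for a SHA-256 hex digest.

    The parent directory and the extension are kept as they are.
    """
    path = Path(path)
    if not hash_filenames:
        return path
    digest = hashlib.sha256(path.stem.encode("utf-8")).hexdigest()
    return path.with_name(digest + path.suffix)


# A 300-character stem becomes a fixed-length name with the same extension.
long_name = Path("cache") / ("a" * 300 + ".png")
hashed = hashed_path_sketch(long_name)
print(hashed.suffix, len(hashed.name))
```

The same input always maps to the same output, so a cache lookup can recompute the hashed path from the original name.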
Comment thread helpers/data_backend/csv_.py Outdated
if isinstance(location, str) or isinstance(location, Path):
    if location not in self.df.index:
        self.df.loc[location] = pd.Series()
    location = path_to_hashed_path(location, self.hash_filenames)
Owner

can we update this method to use clone() before saving, like the local backend does?
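
The comment does not say why `clone()` is wanted. One common reason in PyTorch is that saving a tensor view serializes its whole backing storage, while a clone owns only the data it shows. The sketch below imitates that with a toy class so it runs without PyTorch; `ViewSketch`, `toy_save`, and `save_cloned` are hypothetical names, and in the real backend the save function would presumably be `torch.save`.

```python
import io
import pickle


class ViewSketch:
    """Toy stand-in for a tensor view: a window onto a larger shared buffer."""

    def __init__(self, storage, start, stop):
        self.storage = storage  # may be shared with other views
        self.start, self.stop = start, stop

    def clone(self):
        # Copy only the visible window into fresh, unshared storage.
        window = self.storage[self.start:self.stop]
        return ViewSketch(list(window), 0, len(window))


def toy_save(view, fileobj):
    # Imitates a serializer that writes the full backing storage and the bounds.
    pickle.dump((view.storage, view.start, view.stop), fileobj)


def save_cloned(data, fileobj, save_fn=toy_save):
    """Save a clone of `data` when it offers clone(); otherwise save it as-is."""
    to_save = data.clone() if hasattr(data, "clone") else data
    save_fn(to_save, fileobj)


view = ViewSketch(list(range(100_000)), 0, 10)

raw, cloned = io.BytesIO(), io.BytesIO()
toy_save(view, raw)        # writes all 100,000 items of the shared buffer
save_cloned(view, cloned)  # writes only the ten visible items
print(raw.getbuffer().nbytes, cloned.getbuffer().nbytes)
```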

Comment thread helpers/data_backend/factory.py Outdated
  from helpers.data_backend.local import LocalDataBackend
  from helpers.data_backend.aws import S3DataBackend
- from helpers.data_backend.csv import CSVDataBackend
+ from helpers.data_backend.csv_ import CSVDataBackend
Owner

i would prefer we call it csv_url_list if we're changing it

Owner

Suggested change
- from helpers.data_backend.csv_ import CSVDataBackend
+ from helpers.data_backend.csv_url import CSVDataBackend

Owner

@bghira bghira left a comment

thanks for the updates, hashing things makes sense. but i'd like the comments resolved first 👍

@bghira
Owner

bghira commented Aug 14, 2024

i updated it to search by file extension instead of str_pattern. it still works. @williamzhuk can you follow up with the other changes?
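
For readers unfamiliar with the approach, searching by file extension could look roughly like the sketch below. The function name `filter_by_extension` and its parameters are assumptions for illustration, not the backend's actual method, and URLs that carry query strings are not handled.

```python
from pathlib import PurePosixPath


def filter_by_extension(paths, extensions):
    """Keep only the paths whose suffix matches one of the given extensions.

    Matching is case-insensitive and accepts extensions with or without a dot.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [
        p for p in paths
        if PurePosixPath(p).suffix.lower().lstrip(".") in wanted
    ]


candidates = ["images/a.PNG", "notes.txt", "images/b.jpg", "README"]
print(filter_by_extension(candidates, ["png", ".jpg"]))
```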

williamzhuk and others added 3 commits August 19, 2024 09:44
Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>
@bghira bghira changed the base branch from main to fix/csv August 23, 2024 13:44
@bghira bghira merged commit eb92763 into bghira:fix/csv Aug 23, 2024

2 participants