
sources.takeout: add support for new youtube csv format #436

Merged
karlicoss merged 3 commits into karlicoss:master from purarue:youtube-chats-csv on Mar 31, 2024

purarue (Contributor) commented Mar 10, 2024

Google Takeout recently switched YouTube comments to a CSV format. I added support for it to google_takeout_parser a few weeks ago.

I haven't yet tried to de-dupe comments that exist in both the old HTML format and the new CSV one. It's on my todo list, but I thought it would be good to get this in so that people making a new export can at least access their comments. There might be some duplication, but that's better than erroring or the comments not showing up at all.
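For the de-dupe step, one possible approach (a minimal sketch, not what the PR implements) is to reduce each comment to a key and keep only the first occurrence. The `Comment` class and its `url`/`dt`/`text` fields are hypothetical stand-ins for whatever the HTML and CSV models actually expose. The sketch assumes both formats record the same timestamp and text for a given comment, which may not hold if they differ in timestamp precision or text rendering.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Set, Tuple


@dataclass(frozen=True)
class Comment:
    # hypothetical normalized shape; the real HTML/CSV models will differ
    url: str
    dt: datetime
    text: str


def _key(c: Comment) -> Tuple[str, datetime, str]:
    # collapse whitespace, assuming the two exports may render text slightly differently
    return (c.url, c.dt, " ".join(c.text.split()))


def dedupe(comments: Iterable[Comment]) -> Iterator[Comment]:
    """Yield each comment once, keeping the first occurrence of every key."""
    seen: Set[Tuple[str, datetime, str]] = set()
    for c in comments:
        k = _key(c)
        if k in seen:
            continue
        seen.add(k)
        yield c
```

Feeding it the HTML-derived comments first and the CSV ones second would keep the HTML version whenever both exist.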

This is very basic right now and has no error checking, so if the user is on an old version of google_takeout_parser, it will just error. Should I add a warning in the ImportError handler reminding them to upgrade? I wasn't sure if that was too much.
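A warning like the one asked about could look roughly like this. It is only a sketch: it assumes the new models are exposed as `google_takeout_parser.models.CSVYoutubeComment` and `CSVYoutubeLiveChat`, and `try_import_csv_models` is a hypothetical helper name. `AttributeError` is caught too, because `importlib.import_module` succeeds on an old version that simply lacks the names.

```python
import importlib
import warnings
from typing import Optional, Tuple


def try_import_csv_models(module_name: str = "google_takeout_parser.models") -> Optional[Tuple[type, type]]:
    """Return the CSV YouTube model classes, or None (with a warning) if unavailable."""
    try:
        mod = importlib.import_module(module_name)
        # assumption: these are the class names newer versions export
        return mod.CSVYoutubeComment, mod.CSVYoutubeLiveChat
    except (ImportError, AttributeError) as e:
        warnings.warn(
            f"Could not import YouTube CSV models ({e}); "
            "upgrade google_takeout_parser to index comments from newer Takeout exports",
            stacklevel=2,
        )
        return None
```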

If there's anything else you think should be changed or added for this, let me know.

purarue (Contributor, Author) commented Mar 10, 2024

Hmm, it looks like the Hypothesis test data may be gone:

  Error: fatal: repository 'https://github.com/judell/Hypothesis.git/' not found
  Error: fatal: clone of 'https://github.com/judell/Hypothesis.git' into submodule path '/home/runner/work/promnesia/promnesia/tests/testdata/hypexport/src/hypexport/Hypothesis' failed
  Failed to clone 'src/hypexport/Hypothesis' a second time, aborting

karlicoss (Owner)

Yeah, I also just noticed the CI failure and fixed it here: karlicoss/hypexport@b9f1cab (the commit also explains why I used a submodule in the first place). If you rebase, it should hopefully all be good!

karlicoss (Owner)

And thanks for the change! I don't think I've seen this data yet, but I haven't done an export for some months.
Yeah, I think it's worth making these new imports more defensive; otherwise the whole data source will go down. I would try to import the new models separately, and if that fails, warn or emit an exception. You could also assign the 'new' names to some dummy class,
e.g.:

# fallback when the new CSV models can't be imported
class dummy:
    pass

CSVYoutubeLiveChat = dummy

That way, the rest of the code that uses isinstance checks won't need any changes: since nothing is ever an instance of dummy, those branches are simply never taken.
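Putting the suggestion together, a sketch of the fallback import plus an unchanged isinstance check might look like this. The import path `google_takeout_parser.models.CSVYoutubeLiveChat` is assumed, and `describe` is a hypothetical stand-in for the real visit-extraction code. If the import fails, the branch is dead code rather than a crash.

```python
from typing import Any, Optional

try:
    # assumption: newer google_takeout_parser versions export this name
    from google_takeout_parser.models import CSVYoutubeLiveChat
except ImportError:
    # old version (or not installed): substitute a class nothing is ever an instance of
    class dummy:
        pass

    CSVYoutubeLiveChat = dummy


def describe(event: Any) -> Optional[str]:
    # hypothetical dispatcher; this isinstance check is identical whether or not the import succeeded
    if isinstance(event, CSVYoutubeLiveChat):
        return "youtube live chat"
    return None
```

Usage: `describe(some_parsed_event)` returns `"youtube live chat"` only when the real class is available and the event is one of them; otherwise it returns `None`.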

purarue (Contributor, Author) commented Mar 13, 2024

Yep, gotcha.

I'm a bit busy for the next few days, but I'll get to that when I have some time.

purarue (Contributor, Author) commented Mar 15, 2024

I haven't tested this on an old version yet, but I think something like this should work.

I'll test it on old and new versions of google_takeout_parser later and let you know.

It does at least seem to work on the new version:

[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: browser +92
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: error -2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: promnesia_sean.sources.zsh +2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: takeout +154

karlicoss (Owner)

Whoops, forgot to press merge! Thanks.

@karlicoss karlicoss merged commit 31ee24b into karlicoss:master Mar 31, 2024