Add data aggregations to data preparation #49
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
3c7774c to f696eb7
…ggregations Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…odel Signed-off-by: Frankie Siino <fsiino@nvidia.com>
nemo_gym/train_data_utils.py
```python
self, dataset_config: DatasetConfig
) -> DatasetValidatorState:
state = DatasetValidatorState()
data = []
```
Having `data` be a list here will increase memory consumption considerably, since every sample stays in memory until the end. Let's fold the `aggregate_other_metrics` call into `_validate_samples_and_aggregate_metrics_single_sample`.
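A minimal sketch of the suggested fold, assuming a simplified state object: each line is validated and its metrics are aggregated in the same step, so no list of samples is ever built. `ValidatorState`, `validate_and_aggregate_single_sample`, and `validate_dataset` are hypothetical names for illustration only, not the actual NeMo Gym code; "aggregation" here is just a per-key presence count.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class ValidatorState:
    # Hypothetical running state: only counters, never the samples themselves.
    num_samples: int = 0
    num_invalid: int = 0
    key_counts: Dict[str, int] = field(default_factory=dict)


def validate_and_aggregate_single_sample(state: ValidatorState, line: str) -> None:
    """Validate one JSONL line and fold its metrics into `state` right away."""
    state.num_samples += 1
    try:
        sample = json.loads(line)
    except json.JSONDecodeError:
        state.num_invalid += 1
        return
    if not isinstance(sample, dict):
        # A valid JSON value that is not an object is not a valid sample.
        state.num_invalid += 1
        return
    # Aggregate while the sample is in hand instead of appending it to a list.
    for key in sample:
        state.key_counts[key] = state.key_counts.get(key, 0) + 1


def validate_dataset(lines: Iterable[str]) -> ValidatorState:
    """Single pass over any iterable of lines, e.g. an open file handle."""
    state = ValidatorState()
    for line in lines:
        validate_and_aggregate_single_sample(state, line)
    return state
```

Passing an open file (`with open(path) as f: state = validate_dataset(f)`) keeps memory proportional to the number of distinct keys rather than the number of samples.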
nemo_gym/dataset_viewer.py
```python
def get_aggregate_metrics(data: List[DatasetViewerVerifyResponse], raw_lines: List[str]) -> Dict[str, Any]:
def get_aggregate_metrics(raw_lines: List[str]) -> Dict[str, Any]:
dataset_metrics = DatasetMetrics()
line_dicts = []
```
Same comment as on `train_data_utils.py`: ideally we would save on memory here as well.
This change updates `train_data_utils` (invoked through `ng_prepare_data`) to apply data aggregations to the other keys within an `example.jsonl` file.

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
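As an illustration of aggregating "other keys" without holding the dataset in memory, here is a minimal sketch, assuming numeric values are summarized by count, mean, min, and max. `RunningStats`, `aggregate_other_keys`, and the `exclude_keys` parameter (standing in for whatever keys the preparation step already handles) are hypothetical; the real implementation may aggregate different statistics or value types.

```python
import json
import math
from dataclasses import dataclass
from typing import AbstractSet, Any, Dict, Iterable


@dataclass
class RunningStats:
    """Running aggregates for one numeric key; O(1) memory per key."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def aggregate_other_keys(
    raw_lines: Iterable[str], exclude_keys: AbstractSet[str] = frozenset()
) -> Dict[str, RunningStats]:
    """Stream JSONL lines and aggregate every numeric key not in `exclude_keys`."""
    stats: Dict[str, RunningStats] = {}
    for line in raw_lines:
        if not line.strip():
            continue  # skip blank lines
        sample: Dict[str, Any] = json.loads(line)
        for key, value in sample.items():
            if key in exclude_keys:
                continue
            # bool is a subclass of int; skip it so True/False are not averaged.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            stats.setdefault(key, RunningStats()).update(float(value))
    return stats
```

Because `raw_lines` is any iterable, a file handle works directly (`with open("example.jsonl") as f: stats = aggregate_other_keys(f)`), which avoids building an intermediate list such as `line_dicts`.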