Is there a way to reduce memory utilization in dictPageReader/PageReader ? #90

@vikramarsid

Description


Describe the bug
I am loading multiple parquet files into memory concurrently and reading them row by row. I hit an OOM while reading 10 concurrent files of 50 MB each. Do you see anything obvious in the call graph? Thank you!

Unit test to reproduce
Please provide a unit test, either as a patch or text snippet or link to your fork. If you can't isolate it into a unit test then please provide steps to reproduce.

parquet-go specific details

  • What version are you using? v0.11.0

Misc Details

  • Are you using AWS Athena, Google BigQuery, presto... ? AWS S3
  • Any other relevant details... how big are the files / rowgroups you're trying to read/write? 10 - 100 MB
  • Do you have memory stats to share? Yes
  • Can you provide a stacktrace? Yes

parquet-go-pprof
