Chore: Improve perf benchmarking#299
Conversation
WalkthroughWalkthroughThe recent changes improve the time formatting in Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant User as User
participant Script as run_perf_test_reads_2.py
participant Cache as Cache (DuckDB/Snowflake/BigQuery)
participant Source as Source
User->>Script: Run with -e=SCALE
Script->>Source: Configure source
Script->>Cache: Set up cache type
Script->>Source: Check connection
Source-->>Script: Connection successful
Script->>Source: Read data
Source->>Cache: Read data from cache
Cache-->>Source: Data read successfully
Source-->>Script: Data read complete
Script-->>User: Performance profiling result
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- airbyte/progress.py (1 hunks)
- examples/run_perf_test_reads.py (1 hunks)
- examples/run_perf_test_reads_2.py (1 hunks)
Files skipped from review due to trivial changes (1)
- examples/run_perf_test_reads.py
Additional comments not posted (4)
examples/run_perf_test_reads_2.py (2)
1-42: LGTM!The header and docstring provide a clear and detailed overview of the script's purpose and usage.
44-57: LGTM!The imports and constants are appropriate and necessary for the script's functionality.
airbyte/progress.py (2)
96-114: LGTM!The modifications improve the readability and accuracy of the elapsed time strings.
Line range hint
188-191:
LGTM!The usage of
_get_elapsed_time_stris consistent and appropriate.
There was a problem hiding this comment.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- examples/run_perf_test_reads_2.py (1 hunks)
Additional context used
Ruff
examples/run_perf_test_reads_2.py
48-48:
pathlib.Pathimported but unusedRemove unused import:
pathlib.Path(F401)
Additional comments not posted (4)
examples/run_perf_test_reads_2.py (4)
55-55: LGTM!The constant
AIRBYTE_INTERNAL_GCP_PROJECTis correctly defined and used.
66-67: Replaceassertwith proper error handling.Using
assertfor error handling is not recommended in production code. Consider raising an exception instead.- assert secret is not None, "Secret not found." + if secret is None: + raise ValueError("Secret not found.")
70-127: Refactor cache initialization into a separate function.To improve readability and maintainability, consider refactoring the cache initialization into a separate function.
def initialize_cache(cache_type: str) -> CacheBase: if cache_type == "duckdb": return ab.new_local_cache() elif cache_type == "snowflake": secret_config = get_gsm_secret_json(secret_name="AIRBYTE_LIB_SNOWFLAKE_CREDS") return SnowflakeCache( account=secret_config["account"], username=secret_config["username"], password=secret_config["password"], database=secret_config["database"], warehouse=secret_config["warehouse"], role=secret_config["role"], ) elif cache_type == "bigquery": temp = tempfile.NamedTemporaryFile(mode="w+", delete=False, encoding="utf-8") secret_config = get_gsm_secret_json(secret_name="SECRET_DESTINATION-BIGQUERY_CREDENTIALS__CREDS") try: temp.write(secret_config["credentials_json"]) temp.flush() finally: temp.close() return BigQueryCache( project_name=secret_config["project_id"], dataset_name=secret_config.get("dataset_id", "pyairbyte_integtest"), credentials_path=temp.name, ) def main(e: int | None = None, n: int | None = None, cache_type: str = "duckdb") -> None: num_records: int = n or 5 * (10 ** (e or 3)) cache = initialize_cache(cache_type) source = ab.get_source( "source-e2-test", # No-op source docker_image="airbyte/source-e2e-test:cg10", streams="*", config={ "type": "BENCHMARK", "schema": "FIVE_STRING_COLUMNS", "terminationCondition": { "type": "MAX_RECORDS", "max": num_records, }, }, ) source.check() source.read(cache)
144-148: Clarify the help text for the-nargument.The help text for the
-nargument is slightly misleading. It should indicate that the value is the exact record count, not a recommended value.- "Recommended values: 2-3 (500 or 5_000) for read and overhead costs, " - " 4-6 (50K or 5MM) for write throughput. " + "The exact record count."
There was a problem hiding this comment.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- examples/run_perf_test_reads.py (6 hunks)
Files skipped from review as they are similar to previous changes (1)
- examples/run_perf_test_reads.py
Summary by CodeRabbit
New Features
viztracerfor profiling.Enhancements
Documentation