diff --git a/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_mysql.ipynb b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_mysql.ipynb
new file mode 100644
index 00000000..6a134279
--- /dev/null
+++ b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_mysql.ipynb
@@ -0,0 +1,227 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "eeeadbe0",
+   "metadata": {},
+   "source": [
+    "# MySQL 8.0.40\n",
+    "\n",
+    "## 0) Assumptions\n",
+    "\n",
+    "* Engine: **MySQL 8**\n",
+    "* Row order: arbitrary (no final `ORDER BY`)\n",
+    "* `NOT IN` is avoided because of its NULL pitfall\n",
+    "* Matching is done on **IDs**; the output uses exactly the specified column names and order\n",
+    "\n",
+    "## 1) Problem\n",
+    "\n",
+    "* From `Orders`, return the **`customer_number` of the customer who placed the most orders**\n",
+    "  (The test data have a unique maximum. Follow up: if several customers tie for the maximum, return all of them.)\n",
+    "\n",
+    "* Example input table:\n",
+    "\n",
+    "  ```markdown\n",
+    "  Table: Orders\n",
+    "  +-----------------+----------+\n",
+    "  | Column Name     | Type     |\n",
+    "  +-----------------+----------+\n",
+    "  | order_number    | int      |  -- PK\n",
+    "  | customer_number | int      |\n",
+    "  +-----------------+----------+\n",
+    "  ```\n",
+    "\n",
+    "* Output specification:\n",
+    "\n",
+    "  ```markdown\n",
+    "  +-----------------+\n",
+    "  | customer_number |\n",
+    "  +-----------------+\n",
+    "  ```\n",
+    "\n",
+    "## 2) Best solution (single query)\n",
+    "\n",
+    "> **Pre-aggregate, then rank with a window function**, all in one query. The `ORDER BY` inside `OVER` only drives the ranking, so the final `SELECT` needs no `ORDER BY`.\n",
+    "\n",
+    "```sql\n",
+    "WITH cnt AS (\n",
+    "  SELECT\n",
+    "    customer_number,\n",
+    "    COUNT(*) AS order_cnt\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "),\n",
+    "win AS (\n",
+    "  SELECT\n",
+    "    customer_number,\n",
+    "    DENSE_RANK() OVER (ORDER BY order_cnt DESC) AS rnk\n",
+    "  FROM cnt\n",
+    ")\n",
+    "SELECT\n",
+    "  customer_number\n",
+    "FROM win\n",
+    "WHERE rnk = 1;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 429 ms (beats 75.41%).\n",
+    "\n",
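+    "A quick check of the tie case, as a minimal sketch: the `WITH Orders AS (...)` block below is hypothetical test data (not from the problem) that shadows the real `Orders` table for this one statement, so nothing is created or modified. Customers 1 and 2 each have two orders, so both should be returned.\n",
+    "\n",
+    "```sql\n",
+    "-- Hypothetical fixture: customers 1 and 2 tie with 2 orders each, customer 3 has 1\n",
+    "WITH Orders AS (\n",
+    "  SELECT 1 AS order_number, 1 AS customer_number\n",
+    "  UNION ALL SELECT 2, 1\n",
+    "  UNION ALL SELECT 3, 2\n",
+    "  UNION ALL SELECT 4, 2\n",
+    "  UNION ALL SELECT 5, 3\n",
+    "),\n",
+    "cnt AS (\n",
+    "  SELECT customer_number, COUNT(*) AS order_cnt\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "),\n",
+    "win AS (\n",
+    "  SELECT customer_number,\n",
+    "         DENSE_RANK() OVER (ORDER BY order_cnt DESC) AS rnk\n",
+    "  FROM cnt\n",
+    ")\n",
+    "SELECT customer_number\n",
+    "FROM win\n",
+    "WHERE rnk = 1;\n",
+    "-- Expected rows (in any order): 1 and 2\n",
+    "```\n",
+    "\n",
+    "`RANK()` would return the same rows for `rnk = 1`; `ROW_NUMBER()` would not, because it numbers tied rows 1, 2, ... and would keep only one of them.\n",
+    "\n",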
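+    "* This handles both a **unique maximum** and **several customers tied for the maximum** (satisfies the Follow up).\n",
+    "\n",
+    "## 3) Alternative solution\n",
+    "\n",
+    "> **Compute the maximum in a subquery, then keep the rows that match it.** Useful where window functions are slow or unavailable; the `WITH` clause itself still needs MySQL 8.0+ (on 5.7, inline `cnt` and `mx` as derived tables).\n",
+    "\n",
+    "```sql\n",
+    "WITH cnt AS (\n",
+    "  SELECT customer_number, COUNT(*) AS order_cnt\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "),\n",
+    "mx AS (\n",
+    "  SELECT MAX(order_cnt) AS max_cnt FROM cnt\n",
+    ")\n",
+    "SELECT c.customer_number\n",
+    "FROM cnt AS c\n",
+    "JOIN mx ON c.order_cnt = mx.max_cnt;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 469 ms (beats 41.83%).\n",
+    "\n",
+    "Note: `NOT IN` is not used. `ORDER BY ... LIMIT 1` also works when the maximum is unique, but because **result order is arbitrary** and **all customers tied for the maximum** must be returned, the approaches above are the safe choice.\n",
+    "\n",
+    "## 4) Key points\n",
+    "\n",
+    "* **Approach**: first count orders per `customer_number`, then select the top (`DENSE_RANK` or comparison with `MAX`), then project only the specified column.\n",
+    "* **NULLs / duplicates**:\n",
+    "\n",
+    "  * If rows with a NULL `customer_number` can exist, add `WHERE customer_number IS NOT NULL` before aggregating. The problem does not expect them, but this makes the query robust; otherwise `GROUP BY` counts NULL as its own group.\n",
+    "  * `order_number` is the PK, so there are no duplicate orders.\n",
+    "* **Stability**: output order does not matter, so **no final `ORDER BY`** is needed. The top selection is completed inside the window function (or the `MAX` comparison).\n",
+    "\n",
+    "## 5) Complexity (rough)\n",
+    "\n",
+    "* `GROUP BY` aggregation: **O(N)** to **O(N log N)** (depending on hash vs. sort-based aggregation)\n",
+    "* Window `DENSE_RANK`: with M distinct customers after aggregation, **O(M log M)** (including its internal sort)\n",
+    "  The alternative (`MAX` comparison) is **O(M)** and slightly lighter.\n",
+    "\n",
+    "## 6) Diagram (Mermaid, minimal syntax)\n",
+    "\n",
+    "```mermaid\n",
+    "flowchart TD\n",
+    "  A[Orders] --> B[COUNT per customer_number]\n",
+    "  B --> C[rank with DENSE_RANK or compare with MAX]\n",
+    "  C --> D[keep rnk=1 or order_cnt=max_cnt]\n",
+    "  D --> E[output customer_number]\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4392213e",
+   "metadata": {},
+   "source": [
+    "## Conclusion (best choice by use case)\n",
+    "\n",
+    "### 1) With a unique maximum (this problem's assumption), the shortest query is usually the best bet\n",
+    "\n",
+    "```sql\n",
+    "SELECT customer_number\n",
+    "FROM Orders\n",
+    "GROUP BY customer_number\n",
+    "ORDER BY COUNT(*) DESC\n",
+    "LIMIT 1;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 470 ms (beats 41.18%).\n",
+    "\n",
+    "* No extra CTE, join, or window function.\n",
+    "* After `GROUP BY`, it returns **only the first row** in descending count order, so the SQL and the execution plan stay simple. It is often the fastest, although in the LeetCode measurement above it was not (differences this small are mostly noise).\n",
+    "\n",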
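+    "As a minimal sketch of why this query relies on the unique-maximum assumption, the same kind of hypothetical tie fixture (a `WITH Orders AS (...)` CTE that shadows the real table for this statement) returns only one row:\n",
+    "\n",
+    "```sql\n",
+    "-- Hypothetical fixture: customers 1 and 2 tie with 2 orders each, customer 3 has 1\n",
+    "WITH Orders AS (\n",
+    "  SELECT 1 AS order_number, 1 AS customer_number\n",
+    "  UNION ALL SELECT 2, 1\n",
+    "  UNION ALL SELECT 3, 2\n",
+    "  UNION ALL SELECT 4, 2\n",
+    "  UNION ALL SELECT 5, 3\n",
+    ")\n",
+    "SELECT customer_number\n",
+    "FROM Orders\n",
+    "GROUP BY customer_number\n",
+    "ORDER BY COUNT(*) DESC\n",
+    "LIMIT 1;\n",
+    "-- Returns exactly one row, 1 or 2; which one is not guaranteed\n",
+    "```\n",
+    "\n",
+    "A tie-breaker such as `ORDER BY COUNT(*) DESC, customer_number` makes that single row deterministic, but the other tied customer is still dropped.\n",
+    "\n",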
+    "### 2) Returning all customers tied for the maximum (general Follow-up)\n",
+    "\n",
+    "One query **without window functions**:\n",
+    "\n",
+    "```sql\n",
+    "SELECT customer_number\n",
+    "FROM Orders\n",
+    "GROUP BY customer_number\n",
+    "HAVING COUNT(*) = (\n",
+    "  SELECT COUNT(*) AS mx\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "  ORDER BY mx DESC\n",
+    "  LIMIT 1\n",
+    ");\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 433 ms (beats 72.43%).\n",
+    "\n",
+    "* The inner subquery fetches a single row holding the maximum count; the outer query keeps the customers whose count matches it.\n",
+    "* Compared with the `cnt → mx → JOIN` version (section 3 of the previous cell), it has no join, which often makes it lighter.\n",
+    "\n",
+    "### 3) If you prefer window functions (readability first), drop the CTEs and use a single level\n",
+    "\n",
+    "```sql\n",
+    "SELECT customer_number\n",
+    "FROM (\n",
+    "  SELECT\n",
+    "    customer_number,\n",
+    "    DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    ") t\n",
+    "WHERE rnk = 1;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 461 ms (beats 48.24%).\n",
+    "\n",
+    "* `COUNT(*)` is used directly in the window's `ORDER BY`, and `DENSE_RANK` is applied at the same level.\n",
+    "* This can help by avoiding materialization of the intermediate `cnt` CTE.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Practical tuning tips\n",
+    "\n",
+    "1. **Index**\n",
+    "\n",
+    "   ```sql\n",
+    "   CREATE INDEX idx_orders_customer ON Orders(customer_number);\n",
+    "   ```\n",
+    "\n",
+    "   * `GROUP BY customer_number` can then read this narrow index in key order (a covering full-index scan) instead of scanning the table and building a temporary table. Every row is still read.\n",
+    "\n",
+    "2. **Keep CTEs to a minimum**\n",
+    "   MySQL 8 cannot merge a CTE or derived table that contains `GROUP BY` or window functions into the outer query, so it materializes it. Removing an intermediate level, as in 3) above, can therefore save a temporary table.\n",
+    "\n",
+    "3. **LeetCode runtimes are very noisy**\n",
+    "   On a real system, check the plan with **EXPLAIN**: the `rows` estimates, and whether `Extra` shows `Using filesort` or `Using temporary`.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Summary\n",
+    "\n",
+    "* **Unique maximum guaranteed** ⇒ `GROUP BY ... ORDER BY COUNT(*) DESC LIMIT 1` is the simplest candidate.\n",
+    "* **Ties must also be returned** ⇒ `HAVING COUNT(*) = (SELECT ... LIMIT 1)` is simple and often fast.\n",
+    "* **Using a window function** ⇒ use the single-level form without an intermediate CTE.\n",
+    "* **Physical tuning** ⇒ index `customer_number`.\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_pandas.ipynb b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_pandas.ipynb
new file mode 100644
index 00000000..008ca8b5
--- /dev/null
+++ b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_pandas.ipynb
@@ -0,0 +1,219 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c14dd460",
+   "metadata": {},
+   "source": [
+    "# For pandas 2.2.2\n",
+    "\n",
+    "## 0) Assumptions\n",
+    "\n",
+    "* Environment: **Python 3.10.15 / pandas 2.2.2**\n",
+    "* **Keep the given signature exactly** (function name, argument name, returned columns and their order)\n",
+    "* No I/O; no unnecessary `print` or `sort_values`\n",
+    "\n",
+    "## 1) Problem\n",
+    "\n",
+    "* Return the **`customer_number` of the customer who placed the most orders** in `Orders`\n",
+    "  *(Follow up: if several customers tie for the maximum, return all of them)*\n",
+    "* Input DF: `Orders` (columns: `order_number`, `customer_number`)\n",
+    "* Output: only the column `customer_number` (any order, no duplicates)\n",
+    "\n",
+    "## 2) Implementation (given signature kept exactly)\n",
+    "\n",
+    "> Minimize columns → aggregate (`groupby.size`) → keep the rows with the maximum count. No `sort_values` needed.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "def find_customer_with_most_orders(orders: pd.DataFrame) -> pd.DataFrame:\n",
+    "    \"\"\"\n",
+    "    Returns:\n",
+    "        pd.DataFrame: columns (names and order) are ['customer_number']\n",
+    "    \"\"\"\n",
+    "    # Minimize columns (drop NULLs for robustness)\n",
+    "    base = orders[['customer_number']].dropna(subset=['customer_number'])\n",
+    "\n",
+    "    if base.empty:\n",
+    "        # Empty input or all NULL: return an empty DF with only the spec column\n",
+    "        return pd.DataFrame(columns=['customer_number'])\n",
+    "\n",
+    "    # Count rows per customer_number (note: groupby sorts the group keys by default, sort=True)\n",
+    "    cnt = base.groupby('customer_number', as_index=False).size()\n",
+    "\n",
+    "    # Maximum count\n",
+    "    max_cnt = cnt['size'].max()\n",
+    "\n",
+    "    # Keep only customers whose count equals the maximum (handles ties)\n",
+    "    out = cnt.loc[cnt['size'].eq(max_cnt), ['customer_number']].reset_index(drop=True)\n",
+    "\n",
+    "    return out\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 353 ms (beats 6.62%), memory 67.14 MB (beats 65.39%).\n",
+    "\n",
+    "* The only returned column is **`customer_number`**.\n",
+    "* Rows are selected by comparing against the maximum rather than with `sort_values`, which is enough because any output order is accepted.\n",
+    "\n",
+    "## 3) How it works\n",
+    "\n",
+    "* APIs used\n",
+    "\n",
+    "  * `DataFrame.dropna(subset=...)`: removes unwanted NULL rows\n",
+    "  * `DataFrame.groupby(...).size()`: lightweight per-group row count\n",
+    "  * Boolean indexing (`eq`): filters to the rows matching the maximum count\n",
+    "  * `reset_index(drop=True)`: tidies the result (the only column is `customer_number`)\n",
+    "* **NULLs / duplicates / dtypes**\n",
+    "\n",
+    "  * Rows with a NULL `customer_number` are excluded from the count, matching the `WHERE customer_number IS NOT NULL` guard suggested for the SQL versions (plain SQL `GROUP BY` would keep NULL as its own group; `groupby` drops NaN keys by default anyway)\n",
+    "  * `groupby.size()` counts every row, so it does not rely on `order_number` being unique\n",
+    "  * Only distinct customers are returned, with the dtype of the source column\n",
+    "\n",
+    "## 4) Complexity (rough)\n",
+    "\n",
+    "* `groupby.size`: **O(N)** for hash aggregation, plus **O(M log M)** for the default key sort (`sort=True`), where M is the number of groups; memory is O(M)\n",
+    "* Max comparison and filter: **O(M)**\n",
+    "\n",
+    "## 5) Diagram (Mermaid, minimal syntax)\n",
+    "\n",
+    "```mermaid\n",
+    "flowchart TD\n",
+    "  A[Orders] --> B[keep only customer_number]\n",
+    "  B --> C[count rows with groupby.size]\n",
+    "  C --> D[compute the maximum count]\n",
+    "  D --> E[keep rows whose count equals the maximum]\n",
+    "  E --> F[output customer_number]\n",
+    "```\n",
+    "\n",
+    "To go faster, **drop `groupby.size()`: narrow the data to a single-column Series and count it without sorting, using `value_counts` or `numpy.bincount`**. **`factorize + bincount`** in particular avoids sorting altogether.\n",
+    "\n",
+    "Below are two drop-in replacements with **the same signature**.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Revision A (general and concise: `value_counts(sort=False)`)\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "\n",
+    "def find_customer_with_most_orders(orders: pd.DataFrame) -> pd.DataFrame:\n",
+    "    \"\"\"\n",
+    "    Returns:\n",
+    "        pd.DataFrame: only ['customer_number']; all customers tied for the maximum are returned\n",
+    "    \"\"\"\n",
+    "    # Work on a single-column Series (avoids an intermediate DataFrame copy)\n",
+    "    s = orders['customer_number'].dropna()\n",
+    "    if s.empty:\n",
+    "        return pd.DataFrame(columns=['customer_number'])\n",
+    "\n",
+    "    # Skip sorting entirely (the default sort=True orders the result by count, which is slower)\n",
+    "    vc = s.value_counts(sort=False)  # index = customer ID, values = counts (unsorted)\n",
+    "    mx = int(vc.max())\n",
+    "    winners = vc.index[vc.values == mx]\n",
+    "\n",
+    "    # Only the specified column\n",
+    "    return pd.DataFrame({'customer_number': winners})\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 276 ms (beats 80.46%), memory 66.80 MB (beats 90.95%).\n",
+    "\n",
+    "**Key points**\n",
+    "\n",
+    "* `value_counts(sort=False)` **skips the sort by count** (likely the main gain)\n",
+    "* Handling only a single-column Series **reduces memory allocation**\n",
+    "* Ties are extracted in one step with `== mx` (no `sort_values` needed)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Revision B (speed-oriented: `factorize + numpy.bincount`)\n",
+    "\n",
+    "> Works for **any dtype of `customer_number` (even mixed int/str)**. The values are first encoded as integer codes, which **`np.bincount`** then counts in a single pass.\n",
+    "\n",
+    "```python\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "\n",
+    "def find_customer_with_most_orders(orders: pd.DataFrame) -> pd.DataFrame:\n",
+    "    \"\"\"\n",
+    "    Returns:\n",
+    "        pd.DataFrame: only ['customer_number']; all customers tied for the maximum are returned\n",
+    "    \"\"\"\n",
+    "    s = orders['customer_number'].dropna()\n",
+    "    if s.empty:\n",
+    "        return pd.DataFrame(columns=['customer_number'])\n",
+    "\n",
+    "    # factorize: value -> consecutive code (0..K-1); uniques holds the original values\n",
+    "    codes, uniques = pd.factorize(s, sort=False)  # sort=False avoids an extra sort\n",
+    "    # Count how often each code occurs, in one pass\n",
+    "    cnt = np.bincount(codes)  # shape=(K,)\n",
+    "    mx = int(cnt.max())\n",
+    "    winner_pos = np.flatnonzero(cnt == mx)  # codes of the customers tied for the maximum\n",
+    "    winners = uniques.take(winner_pos)  # map the codes back to customer numbers\n",
+    "\n",
+    "    return pd.DataFrame({'customer_number': winners})\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 280 ms (beats 76.08%), memory 66.74 MB (beats 90.95%).\n",
+    "\n",
+    "**Why it can be fast**\n",
+    "\n",
+    "* `factorize` extracts the **unique values** with a hash table implemented in C\n",
+    "* Counting is then a **pure NumPy O(N)** `np.bincount` over **consecutive integer codes**\n",
+    "* No sorting at all (no `idxmax` or `nlargest` needed)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Additional practical tuning\n",
+    "\n",
+    "* **Use a lighter dtype**: if `customer_number` is numeric, convert it to `Int32` / `int32` (object columns are slow)\n",
+    "\n",
+    "  ```python\n",
+    "  # Example: at load time or during preprocessing\n",
+    "  orders['customer_number'] = pd.to_numeric(orders['customer_number'], errors='coerce').astype('Int32')\n",
+    "  ```\n",
+    "* **Do not pass unneeded columns**: having the caller pass `orders[['customer_number']]` gives a further, small advantage\n",
+    "* **If missing values are impossible, omit `dropna()`** (saves a scan and a copy). Otherwise Revision B needs it: `factorize` encodes NaN as -1, which `np.bincount` rejects.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Complexity (revisions)\n",
+    "\n",
+    "* Revision A: `value_counts(sort=False)` is a hash count, **O(N)**; max and filter are **O(U)** (U = number of distinct customers)\n",
+    "* Revision B: `factorize` **O(N)** → `bincount` **O(N)** → max and comparison **O(U)**\n",
+    "  Neither sorts, so both tend to have **lower and more stable latency** than the original implementation.\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## Summary\n",
+    "\n",
+    "* Easy and fast: **Revision A (`value_counts(sort=False)`)**\n",
+    "* Speed-oriented: **Revision B (`factorize + np.bincount`)**\n",
+    "  It scales well to large inputs; on LeetCode, both revisions measured about the same (276 ms vs. 280 ms), clearly below the 353 ms of the first implementation.\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_posgres.ipynb b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_posgres.ipynb
new file mode 100644
index 00000000..817d38ac
--- /dev/null
+++ b/SQL/Leetcode/Basic select/586. Customer Placing the Largest Number of Orders/gpt/Customer_Placing_the_Largest_Number_of_Orders_posgres.ipynb
@@ -0,0 +1,165 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "84f78fb1",
+   "metadata": {},
+   "source": [
+    "# PostgreSQL 16.6+\n",
+    "\n",
+    "## 0) Assumptions\n",
+    "\n",
+    "* Engine: **PostgreSQL 16.6+**\n",
+    "* Row order: arbitrary (no `ORDER BY` on the final result)\n",
+    "* Avoid `NOT IN` (prefer `EXISTS` / `LEFT JOIN ... IS NULL`)\n",
+    "* Matching is done on **IDs**; the output follows the specification\n",
+    "\n",
+    "## 1) Problem\n",
+    "\n",
+    "* From `Orders`, return the **`customer_number` of the customer who placed the most orders**\n",
+    "  *Test condition: the maximum is unique. Follow up: if several customers tie for the maximum, return all of them.*\n",
+    "\n",
+    "* Input:\n",
+    "\n",
+    "  ```text\n",
+    "  Table: Orders\n",
+    "  +-----------------+----------+\n",
+    "  | Column Name     | Type     |\n",
+    "  +-----------------+----------+\n",
+    "  | order_number    | int      |  -- PK\n",
+    "  | customer_number | int      |\n",
+    "  +-----------------+----------+\n",
+    "  ```\n",
+    "\n",
+    "* Output:\n",
+    "\n",
+    "  ```text\n",
+    "  +-----------------+\n",
+    "  | customer_number |\n",
+    "  +-----------------+\n",
+    "  ```\n",
+    "\n",
+    "## 2) Best solution (single query)\n",
+    "\n",
+    "> **Aggregate → rank with a window function**. Also handles ties for the maximum (satisfies the Follow up).\n",
+    "\n",
+    "```sql\n",
+    "WITH cnt AS (\n",
+    "  SELECT\n",
+    "    customer_number,\n",
+    "    COUNT(*) AS order_cnt\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "),\n",
+    "win AS (\n",
+    "  SELECT\n",
+    "    customer_number,\n",
+    "    DENSE_RANK() OVER (ORDER BY order_cnt DESC) AS rnk\n",
+    "  FROM cnt\n",
+    ")\n",
+    "SELECT\n",
+    "  customer_number\n",
+    "FROM win\n",
+    "WHERE rnk = 1;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 229 ms (beats 89.86%).\n",
+    "\n",
+    "### Alternative (shortest, assuming a unique maximum)\n",
+    "\n",
+    "> If a unique maximum is all you need, this is often the cheapest plan.\n",
+    "\n",
+    "```sql\n",
+    "SELECT customer_number\n",
+    "FROM Orders\n",
+    "GROUP BY customer_number\n",
+    "ORDER BY COUNT(*) DESC\n",
+    "LIMIT 1;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 230 ms (beats 88.20%).\n",
+    "\n",
+    "### Alternative (Follow up: all customers tied for the maximum)\n",
+    "\n",
+    "> No window function: compute the **maximum in a subquery** and keep the matching customers.\n",
+    "\n",
+    "```sql\n",
+    "SELECT customer_number\n",
+    "FROM Orders\n",
+    "GROUP BY customer_number\n",
+    "HAVING COUNT(*) = (\n",
+    "  SELECT COUNT(*) AS mx\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "  ORDER BY mx DESC\n",
+    "  LIMIT 1\n",
+    ");\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 232 ms (beats 84.30%).\n",
+    "\n",
+    "### (Reference) The same top-k (here k = 1) query written with LATERAL\n",
+    "\n",
+    "```sql\n",
+    "SELECT s.customer_number\n",
+    "FROM LATERAL (\n",
+    "  SELECT customer_number\n",
+    "  FROM Orders\n",
+    "  GROUP BY customer_number\n",
+    "  ORDER BY COUNT(*) DESC\n",
+    "  LIMIT 1\n",
+    ") AS s;\n",
+    "```\n",
+    "\n",
+    "LeetCode result: runtime 231 ms (beats 86.09%).\n",
+    "\n",
+    "* Nothing precedes `LATERAL` in the `FROM` clause, so the keyword has no effect here: this is an ordinary derived table, equivalent to the `LIMIT 1` query above, and it returns a single row even when there are ties. `LATERAL` matters when the subquery references an earlier `FROM` item (for example, top-k per group).\n",
+    "\n",
+    "## 3) Key points\n",
+    "\n",
+    "* **Window function**: `DENSE_RANK() OVER (ORDER BY COUNT(*) DESC)` (or `ORDER BY order_cnt DESC` over the pre-aggregated `cnt`, as in section 2) naturally picks up every customer tied for the maximum.\n",
+    "  Ranking is applied only to the aggregated result, so `GROUP BY` first shrinks the data and the window runs on the smaller set.\n",
+    "* **The alternative `HAVING = (SELECT ... LIMIT 1)`**: needs no window function, reads easily, and involves no join, so it tends to produce a light plan.\n",
+    "* **Recommended index**: a B-tree on the grouping key\n",
+    "\n",
+    "  ```sql\n",
+    "  CREATE INDEX IF NOT EXISTS idx_orders_customer ON Orders (customer_number);\n",
+    "  ```\n",
+    "\n",
+    "  The planner may then use an index-only scan that returns rows in `customer_number` order, feeding a sort-free `GroupAggregate`; a hash aggregate gains little from it.\n",
+    "* **NULL handling**: if the spec makes `customer_number` mandatory, nothing is needed. If NULL rows may appear, add\n",
+    "  `WHERE customer_number IS NOT NULL` before aggregating to make the query robust.\n",
+    "\n",
+    "## 4) Complexity (rough)\n",
+    "\n",
+    "* `GROUP BY` (M customers, N rows in total): **O(N)** to **O(N log N)**\n",
+    "  (roughly O(N) with hash aggregation)\n",
+    "* Window `DENSE_RANK` runs on the M rows *after* aggregation: **O(M log M)** (internal sort).\n",
+    "  The alternative `HAVING = (SELECT ... LIMIT 1)` is roughly **O(M)**.\n",
+    "\n",
+    "## 5) Diagram (Mermaid, minimal syntax)\n",
+    "\n",
+    "```mermaid\n",
+    "flowchart TD\n",
+    "  A[Orders] --> B[\"COUNT per customer_number\"]\n",
+    "  B --> C[\"rank with window DENSE_RANK\"]\n",
+    "  C --> D[keep rnk=1]\n",
+    "  D --> E[output customer_number]\n",
+    "```\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}