|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "bc30cee6", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# MySQL 8.0.40\n", |
| 9 | + "\n", |
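| 10 | + "## 0) Assumptions\n", |
| 11 | + "\n", |
| 12 | + "* Engine: **MySQL 8**\n", |
| 13 | + "* Row order: unspecified (no `ORDER BY`)\n", |
| 14 | + "* `NOT IN` is avoided because of its NULL pitfall\n", |
| 15 | + "* Grouping is keyed on the **ID** (here `class`); the output uses the column names and order the spec requires\n", |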
| 16 | + "\n", |
| 17 | + "## 1) Problem\n", |
| 18 | + "\n", |
| 19 | + "* From `Courses`, **find every class that has at least five students**\n", |
| 20 | + "* Example input table: `Courses(student, class)` (primary key: `(student, class)`)\n", |
| 21 | + "* Output spec: a single column `class`; any order; no duplicates\n", |
| 22 | + "\n", |
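| 23 | + "## 2) Window-function solution (single query)\n", |
| 24 | + "\n", |
| 25 | + "> Count the students in each class with a window function, filter on the threshold, then project the final column.\n", |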
| 26 | + "\n", |
| 27 | + "```sql\n", |
| 28 | + "WITH win AS (\n", |
| 29 | + " SELECT\n", |
| 30 | + " class,\n", |
| 31 | + " COUNT(*) OVER (PARTITION BY class) AS cnt\n", |
| 32 | + " FROM Courses\n", |
| 33 | + ")\n", |
| 34 | + "SELECT DISTINCT\n", |
| 35 | + " class\n", |
| 36 | + "FROM win\n", |
| 37 | + "WHERE cnt >= 5;\n", |
| 38 | + "```\n", |
| 39 | + "\n", |
| 40 | + "Runtime: 311 ms\n", |
| 41 | + "\n", |
| 42 | + "Beats: 58.52%\n", |
| 43 | + "\n", |
| 44 | + "* Because `(student, class)` is the primary key, no student appears twice in the same class, so `COUNT(*)` is sufficient\n", |
| 45 | + "* The result order is unspecified, so there is no `ORDER BY`\n", |
| 46 | + "\n", |
| 47 | + "## 3) Alternative solution\n", |
| 48 | + "\n", |
| 49 | + "> When plain aggregation meets the data size and requirements, `GROUP BY ... HAVING` is the lightest option.\n", |
| 50 | + "\n", |
| 51 | + "```sql\n", |
| 52 | + "SELECT\n", |
| 53 | + " class\n", |
| 54 | + "FROM Courses\n", |
| 55 | + "GROUP BY class\n", |
| 56 | + "HAVING COUNT(*) >= 5;\n", |
| 57 | + "```\n", |
| 58 | + "\n", |
| 59 | + "Runtime: 308 ms\n", |
| 60 | + "\n", |
| 61 | + "Beats: 62.45%\n", |
| 62 | + "\n", |
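| 63 | + "## 4) Key points\n", |
| 64 | + "\n", |
| 65 | + "* **Approach**: count students per class and return only the classes at or above the threshold (5)\n", |
| 66 | + "* **NULLs / duplicates**: the primary key rules out duplicate `(student, class)` pairs. If rows with a NULL `class` were possible, add `class IS NOT NULL` to the filter (not needed under this problem's spec)\n", |
| 67 | + "* **Ordering**: omitting `ORDER BY` avoids an unnecessary sort\n", |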
| 68 | + "\n", |
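| 69 | + "## 5) Complexity (rough estimate)\n", |
| 70 | + "\n", |
| 71 | + "* Window function (section 2): **O(N)** to **O(N log N)** across the partitions (implementation-dependent)\n", |
| 72 | + "* Aggregation (section 3): approximately **O(N)** with hash aggregation\n", |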
| 73 | + "\n", |
| 74 | + "## 6) Diagram (Mermaid, deliberately conservative syntax)\n", |
| 75 | + "\n", |
| 76 | + "```mermaid\n", |
| 77 | + "flowchart TD\n", |
| 78 | + " A[Input table Courses] --> B[Count students per class]\n", |
| 79 | + " B --> C[Keep classes with 5 or more]\n", |
| 80 | + " C --> D[Output class column only]\n", |
| 81 | + "```\n", |
| 82 | + "In short, **`GROUP BY ... HAVING` is the simplest answer to this problem and the usual first choice in practice.**\n", |
| 83 | + "That said, there is still room to make it faster.\n", |
| 84 | + "\n", |
| 85 | + "## Practical ways to speed it up\n", |
| 86 | + "\n", |
| 87 | + "### 1) Add a secondary index (most effective)\n", |
| 88 | + "\n", |
| 89 | + "The most effective way to lighten the `GROUP BY class` aggregation is to add an **index on `class`**.\n", |
| 90 | + "\n", |
| 91 | + "```sql\n", |
| 92 | + "-- Goal: let the per-class aggregation read rows in index order\n", |
| 93 | + "CREATE INDEX idx_courses_class ON Courses(class);\n", |
| 94 | + "-- Optional: composite index if ordered access by student is also wanted\n", |
| 95 | + "-- CREATE INDEX idx_courses_class_student ON Courses(class, student);\n", |
| 96 | + "```\n", |
| 97 | + "\n", |
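| 98 | + "* The PK is `(student, class)`, so on its own it is poorly suited to aggregating by `class`.\n", |
| 99 | + "* With `idx_courses_class`, MySQL can **group while scanning in index order**,\n", |
| 100 | + " which can **avoid a temporary table or filesort** (in `EXPLAIN`, aim for `Using index` with no `Using temporary`; `Using index for group-by` marks a loose index scan, which `COUNT(*)` does not qualify for).\n", |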
| 101 | + "\n", |
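| 102 | + "### 2) Use `GROUP BY ... HAVING` for the query\n", |
| 103 | + "\n", |
| 104 | + "The window-function version **counts every row of each class and then applies DISTINCT**, an extra step that usually makes it slightly slower.\n", |
| 105 | + "The following query is sufficient.\n", |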
| 106 | + "\n", |
| 107 | + "```sql\n", |
| 108 | + "SELECT\n", |
| 109 | + " class\n", |
| 110 | + "FROM Courses\n", |
| 111 | + "GROUP BY class\n", |
| 112 | + "HAVING COUNT(*) >= 5;\n", |
| 113 | + "```\n", |
| 114 | + "\n", |
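| 115 | + "### 3) Existence-check optimization (an alternative that assumes an index)\n", |
| 116 | + "\n", |
| 117 | + "A correlated subquery that only checks **whether a fifth row exists** needs to read **at most five rows** per probe,\n", |
| 118 | + "so with an index on **`(class)` or `(class, student)`** it can be faster.\n", |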
| 119 | + "\n", |
| 120 | + "```sql\n", |
| 121 | + "-- More stable with a (class, student) index\n", |
| 122 | + "SELECT DISTINCT c.class\n", |
| 123 | + "FROM Courses c\n", |
| 124 | + "WHERE EXISTS (\n", |
| 125 | + " SELECT 1\n", |
| 126 | + " FROM Courses i\n", |
| 127 | + " WHERE i.class = c.class\n", |
| 128 | + " ORDER BY i.student\n", |
| 129 | + " LIMIT 4, 1 -- if a fifth row can be fetched, the class has at least five students\n", |
| 130 | + ");\n", |
| 131 | + "```\n", |
| 132 | + "\n", |
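| 133 | + "> Note: this depends heavily on **how well the index is used**. Check with `EXPLAIN` that the inner query uses `range` / `ref` access, and compare its results with the `GROUP BY` version before adopting it.\n", |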
| 134 | + "\n", |
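| 135 | + "### 4) Checking the execution plan\n", |
| 136 | + "\n", |
| 137 | + "What to look for in `EXPLAIN`:\n", |
| 138 | + "\n", |
| 139 | + "* `type`: `range` / `ref` for the subquery, `index` for the aggregate over the new index (avoid a full table scan, `ALL`)\n", |
| 140 | + "* `key`: whether the new index above is chosen\n", |
| 141 | + "* `Extra`: `Using index` without `Using temporary` or `Using filesort` is a good sign\n", |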
| 142 | + "\n", |
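| 143 | + "## Summary (recommendations by priority)\n", |
| 144 | + "\n", |
| 145 | + "1. ✅ **Add `CREATE INDEX idx_courses_class ON Courses(class)`**\n", |
| 146 | + "2. ✅ Use **`GROUP BY ... HAVING COUNT(*) >= 5`** as the production query\n", |
| 147 | + "3. ⭕ Depending on load and data distribution, A/B test the **`EXISTS + LIMIT 4,1`** variant and adopt whichever is faster\n", |
| 148 | + "\n", |
| 149 | + "These three steps may shorten the roughly 310 ms runtime shown above. Where adding an index is not possible (for example on LeetCode), the current **`GROUP BY ... HAVING` is the best choice**.\n" |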
| 150 | + ] |
| 151 | + } |
| 152 | + ], |
| 153 | + "metadata": { |
| 154 | + "language_info": { |
| 155 | + "name": "python" |
| 156 | + } |
| 157 | + }, |
| 158 | + "nbformat": 4, |
| 159 | + "nbformat_minor": 5 |
| 160 | +} |