|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# MySQL 8.0.40\n", |
| 8 | + "\n", |
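| 9 | + "## 0) Assumptions\n", |
| 10 | + "\n", |
| 11 | + "* Engine: **MySQL 8**\n", |
| 12 | + "* Ordering: unspecified by default (no `ORDER BY`)\n", |
| 13 | + " Note: this problem's spec requires **ascending `visit_date`**, so the final line is `ORDER BY visit_date`\n", |
| 14 | + "* `NOT IN` is avoided because of its NULL pitfall\n", |
| 15 | + "* The check is **ID-based** (consecutive IDs, each row with `people >= 100`); the output uses the column names and order given in the spec\n", |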
| 16 | + "\n", |
| 17 | + "## 1) Problem\n", |
| 18 | + "\n", |
| 19 | + "* `Show the records that belong to a run of three or more consecutive ids in which every row has people >= 100. Order the result by visit_date ascending.`\n", |
| 20 | + "* Example input table: `Stadium(id INT, visit_date DATE, people INT)`\n", |
| 21 | + "* Output spec: `id, visit_date, people`, only for rows in **islands of consecutive IDs (gaps-and-islands)** of length ≥ 3. The final order is `visit_date ASC`.\n", |
| 22 | + "\n", |
| 23 | + "## 2) Optimal solution (single query)\n", |
| 24 | + "\n", |
| 25 | + "> Filter to `people >= 100` first, build an **island key for consecutive IDs** with `id - ROW_NUMBER()`, and keep only the islands of length ≥ 3.\n", |
| 26 | + "\n", |
| 27 | + "```sql\n", |
| 28 | + "WITH pre AS (\n", |
| 29 | + " SELECT id, visit_date, people\n", |
| 30 | + " FROM Stadium\n", |
| 31 | + " WHERE people >= 100\n", |
| 32 | + "),\n", |
| 33 | + "grp AS (\n", |
| 34 | + " SELECT\n", |
| 35 | + " id,\n", |
| 36 | + " visit_date,\n", |
| 37 | + " people,\n", |
| 38 | + " id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n", |
| 39 | + " FROM pre\n", |
| 40 | + "),\n", |
| 41 | + "big_islands AS (\n", |
| 42 | + " SELECT grp_key\n", |
| 43 | + " FROM grp\n", |
| 44 | + " GROUP BY grp_key\n", |
| 45 | + " HAVING COUNT(*) >= 3\n", |
| 46 | + ")\n", |
| 47 | + "SELECT\n", |
| 48 | + " g.id,\n", |
| 49 | + " g.visit_date,\n", |
| 50 | + " g.people\n", |
| 51 | + "FROM grp AS g\n", |
| 52 | + "JOIN big_islands AS b\n", |
| 53 | + " USING (grp_key)\n", |
| 54 | + "ORDER BY g.visit_date;\n", |
| 55 | + "\n", |
| 56 | + "-- Runtime 352 ms\n", |
| 57 | + "-- Beats 56.94%\n", |
| 58 | + "\n", |
| 59 | + "```\n", |
| 60 | + "\n", |
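| 61 | + "## 3) Alternative solution\n", |
| 62 | + "\n", |
| 63 | + "> Use `LAG` to flag whether each row continues the previous ID, then number the islands with a running sum of the break flags.\n", |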
| 64 | + "\n", |
| 65 | + "```sql\n", |
| 66 | + "WITH pre AS (\n", |
| 67 | + " SELECT id, visit_date, people\n", |
| 68 | + " FROM Stadium\n", |
| 69 | + " WHERE people >= 100\n", |
| 70 | + "),\n", |
| 71 | + "marked AS (\n", |
| 72 | + " SELECT\n", |
| 73 | + " id,\n", |
| 74 | + " visit_date,\n", |
| 75 | + " people,\n", |
| 76 | + " CASE WHEN id = LAG(id) OVER (ORDER BY id) + 1 THEN 0 ELSE 1 END AS is_break\n", |
| 77 | + " FROM pre\n", |
| 78 | + "),\n", |
| 79 | + "islands AS (\n", |
| 80 | + " SELECT\n", |
| 81 | + " id,\n", |
| 82 | + " visit_date,\n", |
| 83 | + " people,\n", |
| 84 | + " SUM(is_break) OVER (ORDER BY id) AS grp_key\n", |
| 85 | + " FROM marked\n", |
| 86 | + "),\n", |
| 87 | + "big_islands AS (\n", |
| 88 | + " SELECT grp_key\n", |
| 89 | + " FROM islands\n", |
| 90 | + " GROUP BY grp_key\n", |
| 91 | + " HAVING COUNT(*) >= 3\n", |
| 92 | + ")\n", |
| 93 | + "SELECT i.id, i.visit_date, i.people\n", |
| 94 | + "FROM islands AS i\n", |
| 95 | + "JOIN big_islands AS b USING (grp_key)\n", |
| 96 | + "ORDER BY i.visit_date;\n", |
| 97 | + "\n", |
| 98 | + "-- Runtime 335 ms\n", |
| 99 | + "-- Beats 76.87%\n", |
| 100 | + "\n", |
| 101 | + "```\n", |
| 102 | + "\n", |
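| 103 | + "## 4) Key points\n", |
| 104 | + "\n", |
| 105 | + "* **The criterion is consecutive IDs**: the dates do not need to be consecutive (as the problem statement says).\n", |
| 106 | + "* **Gaps-and-islands pattern**: rows that share the same `id - ROW_NUMBER()` value form an island of consecutive IDs.\n", |
| 107 | + "* Filtering on `people >= 100` first is what makes the island key meaningful (row numbers are assigned only to qualifying rows), and it also shrinks the window input.\n", |
| 108 | + "* `NOT IN` is not used. The join is written as `JOIN ... USING (grp_key)`.\n", |
| 109 | + "* The result order follows the spec: **`visit_date ASC`**.\n", |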
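| | + "\n", |
| | + "As a small worked illustration of the island key, the sketch below applies `id - ROW_NUMBER()` to a hypothetical set of ids that are assumed to have already passed the `people >= 100` filter. The inline `pre` rows are made up for this example and are not taken from the original data.\n", |
| | + "\n", |
| | + "```sql\n", |
| | + "-- Hypothetical ids left after the people >= 100 filter (id 4 is missing)\n", |
| | + "WITH pre (id) AS (\n", |
| | + "  SELECT 2 UNION ALL SELECT 3 UNION ALL\n", |
| | + "  SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8\n", |
| | + ")\n", |
| | + "SELECT\n", |
| | + "  id,\n", |
| | + "  ROW_NUMBER() OVER (ORDER BY id) AS rn,\n", |
| | + "  id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n", |
| | + "FROM pre\n", |
| | + "ORDER BY id;\n", |
| | + "-- id  rn  grp_key\n", |
| | + "--  2   1   1\n", |
| | + "--  3   2   1\n", |
| | + "--  5   3   2\n", |
| | + "--  6   4   2\n", |
| | + "--  7   5   2\n", |
| | + "--  8   6   2\n", |
| | + "```\n", |
| | + "\n", |
| | + "In this sample, ids 2 and 3 share key 1 (an island of length 2, which is dropped), while ids 5 to 8 share key 2 (an island of length 4, which is kept).\n", |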
| 110 | + "\n", |
| 111 | + "## 5) Complexity (rough estimate)\n", |
| 112 | + "\n", |
| 113 | + "* With `n` rows remaining after the filter:\n", |
| 114 | + "\n", |
| 115 | + " * Window functions (`ROW_NUMBER` / `LAG`): **O(n log n)** (for the `ORDER BY id` sort)\n", |
| 116 | + " * `GROUP BY grp_key`: **O(n)** to **O(n log n)**\n", |
| 117 | + " * Join: approximately **O(n)**\n", |
| 118 | + "* Recommended indexes: `PRIMARY KEY(id)` / `INDEX(people)` (more effective the more selective `people >= 100` is)\n", |
| 119 | + "\n", |
| 120 | + "## 6) Diagram (Mermaid, very conservative syntax)\n", |
| 121 | + "\n", |
| 122 | + "```mermaid\n", |
| 123 | + "flowchart TD\n", |
| 124 | + " A[Input Stadium] --> B[Prefilter people >= 100]\n", |
| 125 | + " B --> C[Window id - ROW_NUMBER gives island key]\n", |
| 126 | + " C --> D[Keep islands with COUNT>=3]\n", |
| 127 | + " D --> E[Join to matching islands and project]\n", |
| 128 | + " E --> F[Output ordered by visit_date ascending]\n", |
| 129 | + "```\n", |
| 130 | + "\n", |
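| 131 | + "This can still be made a little simpler and possibly a little faster. The main changes are to **drop the join with `big_islands` and filter on island length directly with a window `COUNT()`**, and to add **suitable indexes**.\n", |
| 132 | + "\n", |
| 133 | + "---\n", |
| 134 | + "\n", |
| 135 | + "## Improved version (no JOIN; decided in a single pass)\n", |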
| 136 | + "\n", |
| 137 | + "```sql\n", |
| 138 | + "WITH pre AS (\n", |
| 139 | + " SELECT id, visit_date, people\n", |
| 140 | + " FROM Stadium\n", |
| 141 | + " WHERE people >= 100\n", |
| 142 | + "),\n", |
| 143 | + "grp AS (\n", |
| 144 | + " SELECT\n", |
| 145 | + " id,\n", |
| 146 | + " visit_date,\n", |
| 147 | + " people,\n", |
| 148 | + " id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n", |
| 149 | + " FROM pre\n", |
| 150 | + ")\n", |
| 151 | + "SELECT id, visit_date, people\n", |
| 152 | + "FROM (\n", |
| 153 | + " SELECT\n", |
| 154 | + " g.*,\n", |
| 155 | + " COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n", |
| 156 | + " FROM grp AS g\n", |
| 157 | + ") x\n", |
| 158 | + "WHERE island_len >= 3\n", |
| 159 | + "ORDER BY visit_date;\n", |
| 160 | + "\n", |
| 161 | + "-- Runtime 353 ms\n", |
| 162 | + "-- Beats 55.35%\n", |
| 163 | + "\n", |
| 164 | + "```\n", |
| 165 | + "\n", |
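| 166 | + "**Key points**\n", |
| 167 | + "\n", |
| 168 | + "* Removes `big_islands` and the `JOIN`, which avoids materialization and join cost\n", |
| 169 | + "* Computes the row count within each `grp_key` (an island of consecutive IDs) with `COUNT(*) OVER (PARTITION BY grp_key)` and filters on the outside with `WHERE island_len >= 3`\n", |
| 170 | + "* Also more readable\n", |
| 171 | + "\n", |
| 172 | + "In the execution plan this comes down to two window stages followed by the final filter: the `ORDER BY id` window, then the `PARTITION BY grp_key` window.\n", |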
| 173 | + "\n", |
| 174 | + "---\n", |
| 175 | + "\n", |
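| 176 | + "## Alternative equivalent rewrite (a single `ROW_NUMBER` level; fails in MySQL 8)\n", |
| 177 | + "\n", |
| 178 | + "MySQL does not allow a select-list alias to be used directly in another window function's `PARTITION BY` within the same SELECT, which is why the version above uses two levels. The attempt below packs everything into one derived table by nesting `ROW_NUMBER()` inside the `PARTITION BY`, and MySQL rejects it.\n", |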
| 179 | + "\n", |
| 180 | + "```sql\n", |
| 181 | + "SELECT id, visit_date, people\n", |
| 182 | + "FROM (\n", |
| 183 | + " SELECT\n", |
| 184 | + " id,\n", |
| 185 | + " visit_date,\n", |
| 186 | + " people,\n", |
| 187 | + " COUNT(*) OVER (PARTITION BY (id - ROW_NUMBER() OVER (ORDER BY id))) AS island_len\n", |
| 188 | + " FROM Stadium\n", |
| 189 | + " WHERE people >= 100\n", |
| 190 | + ") t\n", |
| 191 | + "WHERE island_len >= 3\n", |
| 192 | + "ORDER BY visit_date;\n", |
| 193 | + "\n", |
| 194 | + "-- Error\n", |
| 195 | + "-- 0 / 15 testcases passed\n", |
| 196 | + "-- You cannot nest a window function in the specification of window '<unnamed window>'.\n", |
| 197 | + "```\n", |
| 198 | + "\n", |
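| 199 | + "> The query above does not run on MySQL 8: a window function cannot be nested inside another window's `PARTITION BY`, as the error shows. Use the **two-level CTE** form (the improved version shown earlier) instead.\n", |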
| 200 | + "\n", |
| 201 | + "---\n", |
| 202 | + "\n", |
| 203 | + "## Index optimization\n", |
| 204 | + "\n", |
| 205 | + "The filter is `people >= 100`, the window uses `ORDER BY id`, and the output returns `visit_date`, so the following indexes are suggested.\n", |
| 206 | + "\n", |
| 207 | + "```sql\n", |
| 208 | + "-- Range scan on people, with id carried in the index for the window step\n", |
| 209 | + "CREATE INDEX ix_stadium_people_id ON Stadium (people, id);\n", |
| 210 | + "\n", |
| 211 | + "-- To make the index covering as well (weigh against storage and write cost)\n", |
| 212 | + "CREATE INDEX ix_stadium_people_id_date ON Stadium (people, id, visit_date);\n", |
| 213 | + "```\n", |
| 214 | + "\n", |
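| 215 | + "Effects:\n", |
| 216 | + "\n", |
| 217 | + "* `pre` can use a range scan for the `people` condition\n", |
| 218 | + "* Because `people >= 100` is a range on the leading column, rows come back ordered by `(people, id)` rather than by `id`, so `ROW_NUMBER() OVER (ORDER BY id)` still needs a sort; the gain is mainly in reading fewer rows\n", |
| 219 | + "* The final `ORDER BY visit_date` is a separate sort, but it only covers rows in islands of **length ≥ 3**, so its cost is small\n", |
| 220 | + " (the requirement is `visit_date ASC`, but since the spec has dates increasing with id, `ORDER BY id` is equivalent where that is acceptable)\n", |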
| 221 | + "\n", |
| 222 | + "---\n", |
| 223 | + "\n", |
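| 224 | + "## Additional tweaks\n", |
| 225 | + "\n", |
| 226 | + "* For small or medium data the current form is sufficient. For large tables (millions of rows and up), refresh statistics with `ANALYZE TABLE Stadium;` as needed.\n", |
| 227 | + "* MySQL 8 usually inlines CTEs, but depending on the environment a derived table may be materialized. If the execution plan shows this is expensive, consider the `/*+ NO_MERGE() */` / `/*+ MERGE() */` hints on the derived table (version-dependent).\n", |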
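| | + "\n", |
| | + "One way to check the plan before reaching for hints is sketched below. It assumes a MySQL 8 release that supports `EXPLAIN FORMAT=TREE` (8.0.16 or later) and reuses the JOIN-free query with the two CTEs folded into one; the output depends on the data and indexes, so none is shown.\n", |
| | + "\n", |
| | + "```sql\n", |
| | + "-- Refresh optimizer statistics first\n", |
| | + "ANALYZE TABLE Stadium;\n", |
| | + "\n", |
| | + "-- Show the plan as a tree: look for the window steps, sorts, and any materialization\n", |
| | + "EXPLAIN FORMAT=TREE\n", |
| | + "WITH grp AS (\n", |
| | + "  SELECT id, visit_date, people,\n", |
| | + "         id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n", |
| | + "  FROM Stadium\n", |
| | + "  WHERE people >= 100\n", |
| | + ")\n", |
| | + "SELECT id, visit_date, people\n", |
| | + "FROM (\n", |
| | + "  SELECT g.*, COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n", |
| | + "  FROM grp AS g\n", |
| | + ") AS x\n", |
| | + "WHERE island_len >= 3\n", |
| | + "ORDER BY visit_date;\n", |
| | + "```\n", |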
| 228 | + "\n", |
| 229 | + "---\n", |
| 230 | + "\n", |
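| 231 | + "## Summary\n", |
| 232 | + "\n", |
| 233 | + "* **Drop the JOIN and filter directly with a window `COUNT()`**: shorter, and potentially faster\n", |
| 234 | + "* **Composite index on `(people, id[, visit_date])`**: can reduce scan cost\n", |
| 235 | + "* The expected gain is roughly a few percent to the low teens, depending on data distribution and version; the runs recorded in this notebook ranged from 332 ms to 353 ms\n", |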
| 236 | + "\n", |
| 237 | + "MySQL 8 **forbids nesting window functions**, so an expression such as `PARTITION BY (id - ROW_NUMBER() OVER (...))` is not allowed. Instead, **compute `ROW_NUMBER()` first in a separate layer**, then use `COUNT() OVER (PARTITION BY ...)` in the layer outside it.\n", |
| 238 | + "\n", |
| 239 | + "## Working fix (two derived-table levels avoid the nesting)\n", |
| 240 | + "\n", |
| 241 | + "```sql\n", |
| 242 | + "SELECT id, visit_date, people\n", |
| 243 | + "FROM (\n", |
| 244 | + " SELECT\n", |
| 245 | + " t.*,\n", |
| 246 | + " COUNT(*) OVER (PARTITION BY (id - rn)) AS island_len\n", |
| 247 | + " FROM (\n", |
| 248 | + " SELECT\n", |
| 249 | + " id,\n", |
| 250 | + " visit_date,\n", |
| 251 | + " people,\n", |
| 252 | + " ROW_NUMBER() OVER (ORDER BY id) AS rn\n", |
| 253 | + " FROM Stadium\n", |
| 254 | + " WHERE people >= 100\n", |
| 255 | + " ) AS t\n", |
| 256 | + ") AS x\n", |
| 257 | + "WHERE island_len >= 3\n", |
| 258 | + "ORDER BY visit_date;\n", |
| 259 | + "\n", |
| 260 | + "-- Runtime 332 ms\n", |
| 261 | + "-- Beats 80.65%\n", |
| 262 | + "\n", |
| 263 | + "```\n", |
| 264 | + "\n", |
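| 265 | + "* Inner level: computes `ROW_NUMBER()` as `rn`\n", |
| 266 | + "* Middle level: uses `id - rn` as the island key (a plain column expression at this point) and computes the island length with `COUNT(*) OVER (PARTITION BY (id - rn))`\n", |
| 267 | + "* Outer level: keeps rows with `island_len >= 3` and sorts by `visit_date`\n", |
| 268 | + "\n", |
| 269 | + "> Key point: **never place another window function inside a window function's arguments or its `PARTITION BY` expression**. Always move it one level out first.\n", |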
| 270 | + "\n", |
| 271 | + "## CTE version (readability first, recommended)\n", |
| 272 | + "\n", |
| 273 | + "```sql\n", |
| 274 | + "WITH pre AS (\n", |
| 275 | + " SELECT id, visit_date, people\n", |
| 276 | + " FROM Stadium\n", |
| 277 | + " WHERE people >= 100\n", |
| 278 | + "),\n", |
| 279 | + "grp AS (\n", |
| 280 | + " SELECT\n", |
| 281 | + " id,\n", |
| 282 | + " visit_date,\n", |
| 283 | + " people,\n", |
| 284 | + " id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n", |
| 285 | + " FROM pre\n", |
| 286 | + ")\n", |
| 287 | + "SELECT id, visit_date, people\n", |
| 288 | + "FROM (\n", |
| 289 | + " SELECT\n", |
| 290 | + " g.*,\n", |
| 291 | + " COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n", |
| 292 | + " FROM grp AS g\n", |
| 293 | + ") x\n", |
| 294 | + "WHERE island_len >= 3\n", |
| 295 | + "ORDER BY visit_date;\n", |
| 296 | + "```\n", |
| 297 | + "\n", |
| 298 | + "This is the same JOIN-free version shown earlier; it has **no nesting** and reads well.\n", |
| 299 | + "\n", |
| 300 | + "---\n", |
| 301 | + "\n", |
| 302 | + "### Reference notes\n", |
| 303 | + "\n", |
| 304 | + "* MySQL 8 restriction: `You cannot nest a window function in the specification of window ...`\n", |
| 305 | + " → The standard remedy is to **compute in stages with subqueries (derived tables or CTEs)**.\n", |
| 306 | + "* For performance, both forms **remove the join and reduce the number of scans**, so they tend to do better than the earlier `big_islands` version.\n", |
| 307 | + "* Further optimization: `CREATE INDEX ix_stadium_people_id ON Stadium(people, id);` still applies.\n", |
| 308 | + "\n" |
| 309 | + ] |
| 310 | + } |
| 311 | + ], |
| 312 | + "metadata": { |
| 313 | + "language_info": { |
| 314 | + "name": "python" |
| 315 | + } |
| 316 | + }, |
| 317 | + "nbformat": 4, |
| 318 | + "nbformat_minor": 2 |
| 319 | +} |