Merged
2 changes: 1 addition & 1 deletion .gitignore
@@ -102,4 +102,4 @@ credentials.json
# Node.js関連
node_modules/
package-lock.json
-bun.lock
+bun.lock
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
-3.12.11
+3.12.11
@@ -0,0 +1,319 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MySQL 8.0.40\n",
"\n",
"## 0) 前提\n",
"\n",
"* エンジン: **MySQL 8**\n",
"* 並び順: 任意(`ORDER BY` を付けない)\n",
" ※本問題は仕様で **`visit_date` 昇順** を要求 → 最終行に `ORDER BY visit_date`\n",
"* `NOT IN` は NULL 罠のため回避\n",
"* 判定は **ID 基準**(連続 ID かつ各行 `people >= 100`)、表示は仕様どおりの列名と順序\n",
"\n",
"## 1) 問題\n",
"\n",
"* `3 つ以上の連続した id を持ち、各行の people >= 100 のレコードを表示する。結果は visit_date 昇順。`\n",
"* 入力テーブル例: `Stadium(id INT, visit_date DATE, people INT)`\n",
"* 出力仕様: `id, visit_date, people` を **連続 ID の島(gaps-and-islands)** のうち長さ ≥ 3 の行のみ。最終並びは `visit_date ASC`。\n",
"\n",
"## 2) 最適解(単一クエリ)\n",
"\n",
"> people ≥ 100 を先に絞り込み、`id - ROW_NUMBER()` で **連続 ID の島キー** を作り、長さ ≥ 3 の島だけ残す。\n",
"\n",
"```sql\n",
"WITH pre AS (\n",
" SELECT id, visit_date, people\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
"),\n",
"grp AS (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
" FROM pre\n",
"),\n",
"big_islands AS (\n",
" SELECT grp_key\n",
" FROM grp\n",
" GROUP BY grp_key\n",
" HAVING COUNT(*) >= 3\n",
")\n",
"SELECT\n",
" g.id,\n",
" g.visit_date,\n",
" g.people\n",
"FROM grp AS g\n",
"JOIN big_islands AS b\n",
" USING (grp_key)\n",
"ORDER BY g.visit_date;\n",
"\n",
"Runtime 352 ms\n",
"Beats 56.94%\n",
"\n",
"```\n",
"\n",
"## 3) 代替解\n",
"\n",
"> `LAG` を使って「連続しているか」をフラグ化し、累積和で島を採番する方法。\n",
"\n",
"```sql\n",
"WITH pre AS (\n",
" SELECT id, visit_date, people\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
"),\n",
"marked AS (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" CASE WHEN id = LAG(id) OVER (ORDER BY id) + 1 THEN 0 ELSE 1 END AS is_break\n",
" FROM pre\n",
"),\n",
"islands AS (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" SUM(is_break) OVER (ORDER BY id) AS grp_key\n",
" FROM marked\n",
"),\n",
"big_islands AS (\n",
" SELECT grp_key\n",
" FROM islands\n",
" GROUP BY grp_key\n",
" HAVING COUNT(*) >= 3\n",
")\n",
"SELECT i.id, i.visit_date, i.people\n",
"FROM islands AS i\n",
"JOIN big_islands AS b USING (grp_key)\n",
"ORDER BY i.visit_date;\n",
"\n",
"Runtime 335 ms\n",
"Beats 76.87%\n",
"\n",
"```\n",
"\n",
"## 4) 要点解説\n",
"\n",
"* **判定基準は ID の連続**:日付は連続でなくてよい(問題文のとおり)。\n",
"* **Gaps-and-Islands パターン**:`id - ROW_NUMBER()` が同じ値の集合は ID が連番の「島」になる。\n",
"* 先に `people >= 100` を絞ることでウィンドウ行数を縮小し、高速化。\n",
"* `NOT IN` 不使用。結合は `JOIN ... USING (grp_key)` を採用。\n",
"* 並び順は仕様に従い **`visit_date ASC`**。\n",
"\n",
"## 5) 計算量(概算)\n",
"\n",
"* フィルタ後レコード数を `n` とすると:\n",
"\n",
" * ウィンドウ関数(`ROW_NUMBER` / `LAG`): **O(n log n)**(`ORDER BY id`)\n",
" * `GROUP BY grp_key`: **O(n)**〜**O(n log n)**\n",
" * 結合: **O(n)** 近似\n",
"* インデックス推奨: `PRIMARY KEY(id)` / `INDEX(people)`(`people >= 100` の選択度が高いほど効く)\n",
"\n",
"## 6) 図解(Mermaid 超保守版)\n",
"\n",
"```mermaid\n",
"flowchart TD\n",
" A[入力 Stadium] --> B[前処理 people >= 100]\n",
" B --> C[ウィンドウ id - ROW_NUMBER で島キー]\n",
" C --> D[島ごとに COUNT>=3 を抽出]\n",
" D --> E[該当島と結合して投影]\n",
" E --> F[visit_date 昇順で出力]\n",
"```\n",
"\n",
"まだ少しだけ速く・シンプルにできます。主な改善点は **`big_islands` との結合をやめて、ウィンドウ `COUNT()` で島の長さを直接フィルタ**することと、**適切なインデックス**です。\n",
"\n",
"---\n",
"\n",
"## 改善版(JOIN 排除・1 回のスキャンで判定)\n",
"\n",
"```sql\n",
"WITH pre AS (\n",
" SELECT id, visit_date, people\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
"),\n",
"grp AS (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
" FROM pre\n",
")\n",
"SELECT id, visit_date, people\n",
"FROM (\n",
" SELECT\n",
" g.*,\n",
" COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n",
" FROM grp AS g\n",
") x\n",
"WHERE island_len >= 3\n",
"ORDER BY visit_date;\n",
"\n",
"Runtime 353 ms\n",
"Beats 55.35%\n",
"\n",
"```\n",
"\n",
"**ポイント**\n",
"\n",
"* `big_islands` と `JOIN` を削除 → マテリアライズや結合コストを削減\n",
"* 同一 `grp_key`(連番の島)内の行数を `COUNT(*) OVER (PARTITION BY grp_key)` で算出し、外側で `WHERE island_len >= 3`\n",
"* 可読性も向上\n",
"\n",
"実行計画上は「`ORDER BY id` のウィンドウ → `PARTITION BY grp_key` のウィンドウ → 最終フィルタ」の二段で済みます。\n",
"\n",
"---\n",
"\n",
"## 代替の等価書き換え(`ROW_NUMBER` を 1 回に)\n",
"\n",
"MySQL は同一 SELECT 句でエイリアスを別のウィンドウ関数の `PARTITION BY` に直接使えないため、上のように 2 段に分けます。もし 1 段に詰めたい場合は、CTE を 1 個にして派生表で包むのが最小です。\n",
"\n",
"```sql\n",
"SELECT id, visit_date, people\n",
"FROM (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" COUNT(*) OVER (PARTITION BY (id - ROW_NUMBER() OVER (ORDER BY id))) AS island_len\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
") t\n",
"WHERE island_len >= 3\n",
"ORDER BY visit_date;\n",
"\n",
"Error\n",
"0 / 15 testcases passed\n",
"You cannot nest a window function in the specification of window '<unnamed window>'.\n",
"```\n",
"\n",
"> ただし上記は一部バージョンでオプティマイザが式の再計算を増やす可能性があるため、安定運用なら **CTE 2 段**(前掲の改善版)を推奨します。\n",
"\n",
"---\n",
"\n",
"## インデックス最適化\n",
"\n",
"フィルタが `people >= 100`、ウィンドウが `ORDER BY id`、出力で `visit_date` を返すため、次を推奨します。\n",
"\n",
"```sql\n",
"-- people で範囲抽出しつつ id の順序性を活かす\n",
"CREATE INDEX ix_stadium_people_id ON Stadium (people, id);\n",
"\n",
"-- さらにカバリングさせたいなら(ストレージと更新コストと相談)\n",
"CREATE INDEX ix_stadium_people_id_date ON Stadium (people, id, visit_date);\n",
"```\n",
"\n",
"効果:\n",
"\n",
"* `pre` で `people` 条件の範囲スキャン\n",
"* そのまま `id` 昇順の並びを得やすく、`ROW_NUMBER() OVER (ORDER BY id)` のソートコストを低減\n",
"* 最終 `ORDER BY visit_date` は別ソートになりますが、対象行は **島長 ≥ 3** に絞られているためコストは小さくなります\n",
" (要件的には `visit_date ASC` 必須ですが、仕様上「id ↑ ⇒ date ↑」なので、許容される環境なら `ORDER BY id` で等価にできます)\n",
"\n",
"---\n",
"\n",
"## 追加の微調整\n",
"\n",
"* データ量が少ない/中程度なら現状でも十分。大規模(数百万行〜)なら統計更新と `ANALYZE TABLE Stadium;` を適宜実施。\n",
"* CTE は MySQL 8 では多くの場合インライン化されますが、環境によっては派生表のマテリアライズが起きます。実行計画を見て重い場合は **派生表に `/*+ NO_MERGE() */` / `/*+ MERGE() */` ヒント**の検討(バージョン依存)も。\n",
"\n",
"---\n",
"\n",
"## まとめ\n",
"\n",
"* **JOIN を外し、ウィンドウ `COUNT()` で直接フィルタ**:短く速く\n",
"* **`(people, id[, visit_date])` の複合インデックス**:ソート・走査コスト削減\n",
"* これで一般に **数〜十数 % 程度の短縮**が見込めます(データ分布とバージョン次第)\n",
"\n",
"MySQL 8 は **ウィンドウ関数の“入れ子”を禁止**しており、`PARTITION BY (id - ROW_NUMBER() OVER (...))` のような書き方はできません。そのため、**`ROW_NUMBER()` を先に別レイヤーで計算してから**、外側で `COUNT() OVER (PARTITION BY ...)` を使う形に分解してください。\n",
"\n",
"## 動く修正版(派生表2段でネスト回避)\n",
"\n",
"```sql\n",
"SELECT id, visit_date, people\n",
"FROM (\n",
" SELECT\n",
" t.*,\n",
" COUNT(*) OVER (PARTITION BY (id - rn)) AS island_len\n",
" FROM (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" ROW_NUMBER() OVER (ORDER BY id) AS rn\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
" ) AS t\n",
") AS x\n",
"WHERE island_len >= 3\n",
"ORDER BY visit_date;\n",
"\n",
"Runtime 332 ms\n",
"Beats 80.65%\n",
"\n",
"```\n",
"\n",
"* 内側:`ROW_NUMBER()` を `rn` として計算\n",
"* 中間:`grp_key = id - rn` を式で作る(ここでは単なる通常列演算)\n",
"* 外側:`COUNT(*) OVER (PARTITION BY (id - rn))` で島の長さを算出して `>= 3` を抽出\n",
"\n",
"> ポイント:**ウィンドウ関数の引数や `PARTITION BY` 式の中に別のウィンドウ関数を置かない**こと。必ず一段外に出してから使う。\n",
"\n",
"## CTE 版(読みやすさ重視・推奨)\n",
"\n",
"```sql\n",
"WITH pre AS (\n",
" SELECT id, visit_date, people\n",
" FROM Stadium\n",
" WHERE people >= 100\n",
"),\n",
"grp AS (\n",
" SELECT\n",
" id,\n",
" visit_date,\n",
" people,\n",
" id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
" FROM pre\n",
")\n",
"SELECT id, visit_date, people\n",
"FROM (\n",
" SELECT\n",
" g.*,\n",
" COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n",
" FROM grp AS g\n",
") x\n",
"WHERE island_len >= 3\n",
"ORDER BY visit_date;\n",
"```\n",
"\n",
"こちらは既にご提案済みの「JOIN 省略版」で、**ネストなし**・可読性良好です。\n",
"\n",
"---\n",
"\n",
"### 参考メモ\n",
"\n",
"* MySQL 8 の制約:`You cannot nest a window function in the specification of window ...`\n",
" → **サブクエリ(派生表 or CTE)で段階計算**が定石です。\n",
"* パフォーマンス面では、どちらの書き方も**結合を無くし、スキャン回数を減らせる**ため、先の `big_islands` 版より有利になりやすいです。\n",
"* 追加最適化:`CREATE INDEX ix_stadium_people_id ON Stadium(people, id);` は引き続き有効です。\n",
"\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}