
Commit 0397471

Merge pull request #192 from myoshi2891/dev-from-macmini
Dev from macmini
2 parents ad741a1 + 2e08c92 commit 0397471

6 files changed

Lines changed: 815 additions & 15 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -102,4 +102,4 @@ credentials.json
 # Node.js related
 node_modules/
 package-lock.json
-bun.lock
+bun.lock

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.12.11
+3.12.11
Lines changed: 319 additions & 0 deletions
@@ -0,0 +1,319 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MySQL 8.0.40\n",
    "\n",
    "## 0) Assumptions\n",
    "\n",
    "* Engine: **MySQL 8**\n",
    "* Row order: unspecified by default (no `ORDER BY`)\n",
    "  Note: this problem requires **ascending `visit_date`**, so the last line is `ORDER BY visit_date`.\n",
    "* `NOT IN` is avoided because of its NULL pitfall.\n",
    "* Rows are selected **by id** (consecutive ids, each row with `people >= 100`); the output uses the column names and order given in the spec.\n",
    "\n",
    "## 1) Problem\n",
    "\n",
    "* Display the records that form **three or more consecutive ids** where every row has `people >= 100`. Sort the result by `visit_date` ascending.\n",
    "* Input table: `Stadium(id INT, visit_date DATE, people INT)`\n",
    "* Output: `id, visit_date, people` for the rows of every **consecutive-id island (gaps-and-islands)** of length ≥ 3, finally sorted by `visit_date ASC`.\n",
    "\n",
    "## 2) Optimal solution (single query)\n",
    "\n",
    "> Filter `people >= 100` first, build a **consecutive-id island key** with `id - ROW_NUMBER()`, and keep only islands of length ≥ 3.\n",
    "\n",
    "```sql\n",
    "WITH pre AS (\n",
    "  SELECT id, visit_date, people\n",
    "  FROM Stadium\n",
    "  WHERE people >= 100\n",
    "),\n",
    "grp AS (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    -- constant within a run of consecutive ids\n",
    "    id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
    "  FROM pre\n",
    "),\n",
    "big_islands AS (\n",
    "  SELECT grp_key\n",
    "  FROM grp\n",
    "  GROUP BY grp_key\n",
    "  HAVING COUNT(*) >= 3\n",
    ")\n",
    "SELECT\n",
    "  g.id,\n",
    "  g.visit_date,\n",
    "  g.people\n",
    "FROM grp AS g\n",
    "JOIN big_islands AS b\n",
    "  USING (grp_key)\n",
    "ORDER BY g.visit_date;\n",
    "\n",
    "-- LeetCode result: Runtime 352 ms, beats 56.94%\n",
    "```\n",
    "\n",
    "## 3) Alternative solution\n",
    "\n",
    "> Use `LAG` to flag whether each row continues the previous id, then number the islands with a running sum of those flags.\n",
    "\n",
    "```sql\n",
    "WITH pre AS (\n",
    "  SELECT id, visit_date, people\n",
    "  FROM Stadium\n",
    "  WHERE people >= 100\n",
    "),\n",
    "marked AS (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    -- 0 = continues the previous id, 1 = starts a new island\n",
    "    -- (for the first row LAG(id) is NULL, so the CASE falls through to 1)\n",
    "    CASE WHEN id = LAG(id) OVER (ORDER BY id) + 1 THEN 0 ELSE 1 END AS is_break\n",
    "  FROM pre\n",
    "),\n",
    "islands AS (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    -- the running sum of break flags numbers the islands 1, 2, 3, ...\n",
    "    SUM(is_break) OVER (ORDER BY id) AS grp_key\n",
    "  FROM marked\n",
    "),\n",
    "big_islands AS (\n",
    "  SELECT grp_key\n",
    "  FROM islands\n",
    "  GROUP BY grp_key\n",
    "  HAVING COUNT(*) >= 3\n",
    ")\n",
    "SELECT i.id, i.visit_date, i.people\n",
    "FROM islands AS i\n",
    "JOIN big_islands AS b USING (grp_key)\n",
    "ORDER BY i.visit_date;\n",
    "\n",
    "-- LeetCode result: Runtime 335 ms, beats 76.87%\n",
    "```\n",
    "\n",
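    "## 4) Key points\n",
    "\n",
    "* **Islands are defined by consecutive ids**: the dates need not be consecutive (as the problem states).\n",
    "* **Gaps-and-islands pattern**: rows sharing the same `id - ROW_NUMBER()` value form a run of consecutive ids, i.e. one island. For example, filtered ids 2, 3, 5, 6, 7 get row numbers 1 to 5, giving keys 1, 1, 2, 2, 2.\n",
    "* Filtering `people >= 100` first shrinks the number of rows the window functions have to process.\n",
    "* No `NOT IN`; islands are matched with `JOIN ... USING (grp_key)`.\n",
    "* The final order follows the spec: **`visit_date ASC`**.\n",
    "\n",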
    "## 5) Complexity (rough)\n",
    "\n",
    "* With `n` = number of rows after the filter:\n",
    "\n",
    "  * Window functions (`ROW_NUMBER` / `LAG`): **O(n log n)** (the sort for `ORDER BY id`)\n",
    "  * `GROUP BY grp_key`: **O(n)** to **O(n log n)**\n",
    "  * Join: roughly **O(n)**\n",
    "* Recommended indexes: `PRIMARY KEY(id)` / `INDEX(people)` (the fewer rows satisfy `people >= 100`, the more they help)\n",
    "\n",
    "## 6) Diagram (Mermaid, conservative syntax)\n",
    "\n",
    "```mermaid\n",
    "flowchart TD\n",
    "  A[Input Stadium] --> B[Pre-filter people >= 100]\n",
    "  B --> C[Window id - ROW_NUMBER gives island key]\n",
    "  C --> D[Keep islands with COUNT>=3]\n",
    "  D --> E[Join matching islands and project columns]\n",
    "  E --> F[Output sorted by visit_date ASC]\n",
    "```\n",
    "\n",
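    "The query can still be made a little simpler, and possibly slightly faster. The main changes are **dropping the join with `big_islands` and filtering on the island length directly with a window `COUNT()`**, and **adding suitable indexes**.\n",
    "\n",
    "---\n",
    "\n",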
    "## Improved version (no JOIN; island length from a window COUNT)\n",
    "\n",
    "```sql\n",
    "WITH pre AS (\n",
    "  SELECT id, visit_date, people\n",
    "  FROM Stadium\n",
    "  WHERE people >= 100\n",
    "),\n",
    "grp AS (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
    "  FROM pre\n",
    ")\n",
    "SELECT id, visit_date, people\n",
    "FROM (\n",
    "  SELECT\n",
    "    g.*,\n",
    "    -- number of rows in the same island\n",
    "    COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n",
    "  FROM grp AS g\n",
    ") x\n",
    "WHERE island_len >= 3\n",
    "ORDER BY visit_date;\n",
    "\n",
    "-- LeetCode result: Runtime 353 ms, beats 55.35%\n",
    "```\n",
    "\n",
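    "**Points**\n",
    "\n",
    "* `big_islands` and its `JOIN` are removed, which avoids materializing it and the join cost.\n",
    "* The number of rows in each `grp_key` (one consecutive-id island) is computed with `COUNT(*) OVER (PARTITION BY grp_key)`, and the outer query keeps `WHERE island_len >= 3`.\n",
    "* The query is also easier to read.\n",
    "\n",
    "In the execution plan this needs two window passes (`ORDER BY id`, then `PARTITION BY grp_key`) followed by the final filter. On the LeetCode judge, however, it measured 353 ms against 352 ms for the `big_islands` version, so no speed-up showed up there.\n",
    "\n",
    "---\n",
    "\n",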
    "## Attempted single-level rewrite (fails on MySQL 8)\n",
    "\n",
    "MySQL cannot use a column alias defined in the same `SELECT` list inside another window function's `PARTITION BY`, so the improved version above splits the work into two levels. An obvious way to squeeze it into one level is to put `ROW_NUMBER()` directly into the `PARTITION BY` of `COUNT(*)` and wrap everything in a single derived table:\n",
    "\n",
    "```sql\n",
    "-- Does NOT run on MySQL 8: ROW_NUMBER() is nested in COUNT()'s window specification\n",
    "SELECT id, visit_date, people\n",
    "FROM (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    COUNT(*) OVER (PARTITION BY (id - ROW_NUMBER() OVER (ORDER BY id))) AS island_len\n",
    "  FROM Stadium\n",
    "  WHERE people >= 100\n",
    ") t\n",
    "WHERE island_len >= 3\n",
    "ORDER BY visit_date;\n",
    "\n",
    "-- LeetCode result: Error, 0 / 15 testcases passed\n",
    "-- You cannot nest a window function in the specification of window '<unnamed window>'.\n",
    "```\n",
    "\n",
    "> MySQL 8 rejects this query outright because a window function may not appear inside another window's specification. Use the two-level CTE version (the improved version above) or a two-level derived table instead.\n",
    "\n",
    "---\n",
    "\n",
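    "## Index optimization\n",
    "\n",
    "The filter is `people >= 100`, the window sorts by `id`, and the output returns `visit_date`, so the following indexes are worth considering:\n",
    "\n",
    "```sql\n",
    "-- Range scan on people; entries come back in (people, id) order\n",
    "CREATE INDEX ix_stadium_people_id ON Stadium (people, id);\n",
    "\n",
    "-- Covering variant (weigh the extra storage and write cost)\n",
    "CREATE INDEX ix_stadium_people_id_date ON Stadium (people, id, visit_date);\n",
    "```\n",
    "\n",
    "Effects:\n",
    "\n",
    "* `pre` can read the `people` condition with a range scan.\n",
    "* A range scan on `people` returns rows ordered by `(people, id)`, not by `id` alone, so `ROW_NUMBER() OVER (ORDER BY id)` generally still needs a sort; the gain is mainly fewer rows read and, with `visit_date` in the index, no lookups back to the clustered index.\n",
    "* The final `ORDER BY visit_date` is a separate sort, but it only covers rows already narrowed to **islands of length ≥ 3**, so its cost is small.\n",
    "  (The spec requires `visit_date ASC`, but it also guarantees that the date increases with `id`, so where allowed, `ORDER BY id` yields the same order.)\n",
    "\n",
    "---\n",
    "\n",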
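    "## Further tuning\n",
    "\n",
    "* For small to medium tables the current query is good enough. For large ones (millions of rows or more), refresh index statistics with `ANALYZE TABLE Stadium;` as needed.\n",
    "* MySQL 8 usually merges CTEs into the outer query, but depending on the environment a derived table may be materialized instead. If the execution plan shows this step is expensive, consider the **`/*+ NO_MERGE() */` / `/*+ MERGE() */` optimizer hints** on the derived table (version dependent).\n",
    "\n",
    "---\n",
    "\n",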
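    "## Summary\n",
    "\n",
    "* **Drop the JOIN and filter directly on a window `COUNT()`**: a shorter query.\n",
    "* **Composite index on `(people, id[, visit_date])`**: cheaper filtering and scanning.\n",
    "* Any gain depends on data distribution and MySQL version; on the LeetCode judge the measured runtimes (352 ms with the `JOIN`, 353 ms without) showed no clear difference.\n",
    "\n",
    "MySQL 8 **forbids nesting window functions**, so `PARTITION BY (id - ROW_NUMBER() OVER (...))` cannot be written. Instead, **compute `ROW_NUMBER()` in a separate inner level first**, then apply `COUNT() OVER (PARTITION BY ...)` in the outer level.\n",
    "\n",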
    "## Working fix (two derived-table levels to avoid nesting)\n",
    "\n",
    "```sql\n",
    "SELECT id, visit_date, people\n",
    "FROM (\n",
    "  SELECT\n",
    "    t.*,\n",
    "    -- id - rn is a plain column expression here, not a nested window function\n",
    "    COUNT(*) OVER (PARTITION BY (id - rn)) AS island_len\n",
    "  FROM (\n",
    "    SELECT\n",
    "      id,\n",
    "      visit_date,\n",
    "      people,\n",
    "      ROW_NUMBER() OVER (ORDER BY id) AS rn\n",
    "    FROM Stadium\n",
    "    WHERE people >= 100\n",
    "  ) AS t\n",
    ") AS x\n",
    "WHERE island_len >= 3\n",
    "ORDER BY visit_date;\n",
    "\n",
    "-- LeetCode result: Runtime 332 ms, beats 80.65%\n",
    "```\n",
    "\n",
    "* Inner derived table `t`: computes `ROW_NUMBER()` as `rn`.\n",
    "* Middle derived table `x`: `COUNT(*) OVER (PARTITION BY (id - rn))` gives the island length; `id - rn` is now an ordinary column expression.\n",
    "* Outer query: keeps rows with `island_len >= 3` and sorts by `visit_date`.\n",
    "\n",
    "> Key point: **never put a window function inside another window function's arguments or its `PARTITION BY` expression**. Always compute it one level further in first.\n",
    "\n",
    "## CTE version (readability first, recommended)\n",
    "\n",
    "```sql\n",
    "WITH pre AS (\n",
    "  SELECT id, visit_date, people\n",
    "  FROM Stadium\n",
    "  WHERE people >= 100\n",
    "),\n",
    "grp AS (\n",
    "  SELECT\n",
    "    id,\n",
    "    visit_date,\n",
    "    people,\n",
    "    id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key\n",
    "  FROM pre\n",
    ")\n",
    "SELECT id, visit_date, people\n",
    "FROM (\n",
    "  SELECT\n",
    "    g.*,\n",
    "    COUNT(*) OVER (PARTITION BY grp_key) AS island_len\n",
    "  FROM grp AS g\n",
    ") x\n",
    "WHERE island_len >= 3\n",
    "ORDER BY visit_date;\n",
    "```\n",
    "\n",
    "This is the same JOIN-free version proposed earlier: **no nesting** and easy to read.\n",
    "\n",
    "---\n",
    "\n",
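    "### Reference notes\n",
    "\n",
    "* MySQL 8 restriction: `You cannot nest a window function in the specification of window ...`\n",
    "  → The usual workaround is to **compute in stages with subqueries (derived tables or CTEs)**.\n",
    "* Performance-wise, both versions remove the join and can cut the number of scans, so they tend to do better than the earlier `big_islands` version. On the LeetCode judge the derived-table version measured 332 ms and the CTE version 353 ms, against 352 ms for `big_islands`.\n",
    "* Additional optimization: `CREATE INDEX ix_stadium_people_id ON Stadium(people, id);` still applies.\n",
    "\n"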
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
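The three working query shapes in the notebook (the `big_islands` JOIN, the `LAG` running sum, and the two-level derived table) should return identical rows. Below is a minimal sketch for checking that locally. It rests on two assumptions. First, Python's standard `sqlite3` module stands in for MySQL 8 (SQLite 3.25 or later supports `ROW_NUMBER`, `LAG` and windowed `COUNT`/`SUM`). Second, the `Stadium` rows are hypothetical sample data, not the judge's test cases.

```python
import sqlite3

# Hypothetical sample rows: (id, visit_date, people)
ROWS = [
    (1, "2017-01-01", 10),
    (2, "2017-01-02", 109),
    (3, "2017-01-03", 150),
    (4, "2017-01-04", 99),
    (5, "2017-01-05", 145),
    (6, "2017-01-06", 1455),
    (7, "2017-01-07", 199),
    (8, "2017-01-09", 188),
]

# Solution 2: island key id - ROW_NUMBER(), joined to islands of length >= 3
JOIN_VERSION = """
WITH pre AS (SELECT id, visit_date, people FROM Stadium WHERE people >= 100),
grp AS (
  SELECT id, visit_date, people, id - ROW_NUMBER() OVER (ORDER BY id) AS grp_key
  FROM pre
),
big_islands AS (SELECT grp_key FROM grp GROUP BY grp_key HAVING COUNT(*) >= 3)
SELECT g.id, g.visit_date, g.people
FROM grp AS g JOIN big_islands AS b USING (grp_key)
ORDER BY g.visit_date
"""

# Solution 3: LAG break flags, running sum as the island key
LAG_VERSION = """
WITH pre AS (SELECT id, visit_date, people FROM Stadium WHERE people >= 100),
marked AS (
  SELECT id, visit_date, people,
         CASE WHEN id = LAG(id) OVER (ORDER BY id) + 1 THEN 0 ELSE 1 END AS is_break
  FROM pre
),
islands AS (
  SELECT id, visit_date, people, SUM(is_break) OVER (ORDER BY id) AS grp_key
  FROM marked
),
big_islands AS (SELECT grp_key FROM islands GROUP BY grp_key HAVING COUNT(*) >= 3)
SELECT i.id, i.visit_date, i.people
FROM islands AS i JOIN big_islands AS b USING (grp_key)
ORDER BY i.visit_date
"""

# Working fix: ROW_NUMBER() in an inner derived table, windowed COUNT outside it
DERIVED_VERSION = """
SELECT id, visit_date, people
FROM (
  SELECT t.*, COUNT(*) OVER (PARTITION BY (id - rn)) AS island_len
  FROM (
    SELECT id, visit_date, people, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM Stadium
    WHERE people >= 100
  ) AS t
) AS x
WHERE island_len >= 3
ORDER BY visit_date
"""


def run(query: str) -> list[tuple]:
    """Load ROWS into an in-memory SQLite table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite has no DATE type; ISO-8601 text sorts chronologically
        conn.execute(
            "CREATE TABLE Stadium (id INTEGER PRIMARY KEY, visit_date TEXT, people INTEGER)"
        )
        conn.executemany("INSERT INTO Stadium VALUES (?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    results = [run(q) for q in (JOIN_VERSION, LAG_VERSION, DERIVED_VERSION)]
    assert results[0] == results[1] == results[2]
    print([row[0] for row in results[0]])  # [5, 6, 7, 8]
```

On this sample, ids 2 and 3 form an island of length 2, so only the island 5 to 8 survives. SQLite is only a convenience here. The queries still need to be confirmed on MySQL 8 itself.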
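Why does `id - ROW_NUMBER()` identify islands, and why does the `LAG` running sum produce the same grouping? Within a run of consecutive ids, both the id and the row number grow by 1 per row, so their difference stays constant. The difference jumps whenever an id is skipped, which is exactly where `LAG` flags a break. The pure-Python sketch below uses hypothetical helper names that are not part of the notebook. It works on the ids that survive the `people >= 100` filter. Both keys are non-decreasing over sorted ids, so `itertools.groupby` is enough here even though it only groups adjacent equal keys. SQL's `GROUP BY` does not need that adjacency.

```python
from itertools import groupby


def islands_by_row_number(ids: list[int]) -> list[list[int]]:
    """Group sorted, unique ids by the key id - row_number (row_number starts at 1)."""
    keyed = [(id_ - rn, id_) for rn, id_ in enumerate(ids, start=1)]
    return [[id_ for _, id_ in grp] for _, grp in groupby(keyed, key=lambda kv: kv[0])]


def islands_by_lag_running_sum(ids: list[int]) -> list[list[int]]:
    """Flag a break when id != previous id + 1, then number islands by a running sum."""
    running, keyed, prev = 0, [], None
    for id_ in ids:
        # The first row has no previous id (LAG would be NULL), so it always breaks
        is_break = 0 if prev is not None and id_ == prev + 1 else 1
        running += is_break
        keyed.append((running, id_))
        prev = id_
    return [[id_ for _, id_ in grp] for _, grp in groupby(keyed, key=lambda kv: kv[0])]


if __name__ == "__main__":
    # Hypothetical ids left after the people >= 100 filter, sorted ascending
    filtered_ids = [2, 3, 5, 6, 7, 8]
    a = islands_by_row_number(filtered_ids)
    b = islands_by_lag_running_sum(filtered_ids)
    assert a == b
    print([island for island in a if len(island) >= 3])  # [[5, 6, 7, 8]]
```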

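The notebook's index section depends on scan order. A range predicate on the leading column `people` returns index entries in `(people, id)` order. The sketch below models that with a plain sort over hypothetical `(id, people)` rows. This is a simplification of what a B-tree range scan yields, not a measurement of MySQL. The resulting ids are not ascending, so a window ordered by `id` still needs its own sort.

```python
def range_scan_ids(rows: list[tuple[int, int]], min_people: int) -> list[int]:
    """Model a range scan on a (people, id) index: entries are kept sorted by
    (people, id) and filtered on people >= min_people. Returns ids in scan order."""
    index_entries = sorted((people, id_) for id_, people in rows)
    return [id_ for people, id_ in index_entries if people >= min_people]


# Hypothetical sample rows: (id, people)
SAMPLE = [(1, 10), (2, 109), (3, 150), (4, 99), (5, 145), (6, 1455), (7, 199), (8, 188)]

if __name__ == "__main__":
    scanned = range_scan_ids(SAMPLE, 100)
    print(scanned)                     # [2, 5, 3, 8, 7, 6]
    print(scanned == sorted(scanned))  # False: not in id order
```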