
Commit 81add53

Fixed merge conflicts
2 parents 1524cf8 + af542b2 commit 81add53

276 files changed, +324304 -313960 lines changed

.claude/CLAUDE.md

Lines changed: 287 additions & 0 deletions
@@ -0,0 +1,287 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

A comprehensive geographical database (151k+ cities, 5k+ states, 250 countries) available in 9 formats. This is a **data repository** focused on data integrity and multi-format exports.

## Architecture: Two-Phase Build System

The repository uses a **bidirectional workflow** where data flows between JSON (version control) and MySQL (canonical state):

```
contributions/                → [Python Import] → MySQL       → [PHP Export]      → json/, csv/, xml/, yml/, sql/
├── cities/*.json                                 world         (Symfony Console)   sqlite/, mongodb/, etc.
├── states/states.json                            (canonical)
└── countries/countries.json
```

**Phase 1: Python Import** (`bin/scripts/sync/import_json_to_mysql.py`)
- Reads `contributions/` JSON files
- Dynamic schema detection (auto-adds new columns to MySQL)
- IDs auto-assigned by MySQL AUTO_INCREMENT
- Handles 209+ country-specific city files

**Phase 2: PHP Export** (`bin/Commands/Export*.php`)
- Symfony Console commands (one per format)
- Reads directly from MySQL via SELECT queries
- Memory limit: unlimited (handles 151k+ records)
- Auto-discovered by `bin/console` application

## Data Contribution Workflows

### Workflow 1: JSON-First (Contributors)
```bash
# 1. Edit contributions/cities/US.json (omit 'id' for new records)
# 2. Push changes
# 3. GitHub Actions auto-runs:
python3 bin/scripts/sync/import_json_to_mysql.py  # JSON → MySQL (IDs assigned)
cd bin && php console export:json                 # MySQL → JSON (one export:* command per format)
```

### Workflow 2: SQL-First (Maintainers)
```bash
mysql -uroot -proot world                         # Make changes
python3 bin/scripts/sync/sync_mysql_to_json.py    # Sync MySQL → JSON
git add contributions/ && git commit              # Commit JSON changes
```

### Optional: Auto-Normalize JSON
```bash
# Pre-assign IDs before committing (connects to MySQL for next ID)
python3 bin/scripts/sync/normalize_json.py contributions/cities/US.json
```

## Common Development Commands

### Initial Setup
```bash
cd bin
composer install --no-interaction --prefer-dist  # PHP dependencies (Symfony Console, etc.)
```

### Database Setup
```bash
# Start MySQL
sudo systemctl start mysql.service

# Create database
mysql -uroot -proot -e "CREATE DATABASE world CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

# Import canonical SQL dump
mysql -uroot -proot --default-character-set=utf8mb4 world < sql/world.sql

# Validate
mysql -uroot -proot -e "USE world; SELECT COUNT(*) FROM cities;"  # Should be ~151,024
```

### Import & Export (Local Testing)
```bash
# Import JSON to MySQL (from contributions/)
python3 bin/scripts/sync/import_json_to_mysql.py

# Export MySQL to all formats
cd bin
php console export:json          # 4 seconds
php console export:csv           # 1 second
php console export:xml           # 9 seconds
php console export:yaml          # 17 seconds
php console export:sql-server    # 3 seconds
php console export:mongodb       # 1 second

# Sync MySQL back to JSON (validation or SQL-first workflow)
python3 bin/scripts/sync/sync_mysql_to_json.py
```

### Database Migration (Optional)
```bash
# PostgreSQL
cd nmig
npm install && npm run build
cp ../nmig.config.json config/config.json
npm start

# SQLite
pip install mysql-to-sqlite3
mysql2sqlite -d world --mysql-password root -u root -f sqlite/world.sqlite3

# DuckDB (reference only, not supported)
pip install duckdb
python3 bin/scripts/export/import_duckdb.py --input sqlite/world.sqlite3 --output duckdb/world.db
```

## Key Architecture Patterns

### PHP Console Application (`bin/console`)
- Symfony Console Application extending `Application`
- Auto-discovers commands in `bin/Commands/` via DirectoryIterator
- Sets `memory_limit = -1` for large dataset exports
- Registers Phinx migration commands (migrate, seed, etc.)
- Each export = independent Command class

### Dynamic Schema Detection (`import_json_to_mysql.py`)
```python
# Auto-detects new columns in JSON and adds to MySQL
new_columns = self.detect_new_columns(table_name, json_data)
if new_columns:
    self.add_columns_to_table(table_name, new_columns)  # ALTER TABLE
```

**How it works:**
1. Compares JSON fields vs MySQL SHOW COLUMNS
2. Infers MySQL types from JSON values (smart type detection)
3. Executes ALTER TABLE for missing columns
4. Bidirectional schema evolution supported

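The steps above can be sketched in a few lines. This is a minimal illustration, not the code in `import_json_to_mysql.py`: the helper names (`infer_mysql_type`, `find_missing_columns`, `build_alter_statements`) and the type-mapping rules are assumptions made for the example, and the existing column names are passed in directly instead of being read with `SHOW COLUMNS`.

```python
def infer_mysql_type(value):
    """Map a JSON value to a plausible MySQL column type (illustrative rules only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TINYINT(1)"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DECIMAL(15,8)"
    if isinstance(value, (dict, list)):
        return "JSON"
    if isinstance(value, str) and len(value) > 255:
        return "TEXT"
    return "VARCHAR(255)"


def find_missing_columns(existing_columns, records):
    """Return {field: mysql_type} for JSON fields that the table does not have yet."""
    missing = {}
    for record in records:
        for field, value in record.items():
            if field in existing_columns or field in missing:
                continue
            if value is None:
                continue  # no type information yet; wait for a non-null sample
            missing[field] = infer_mysql_type(value)
    return missing


def build_alter_statements(table, missing):
    """Turn the missing-column map into ALTER TABLE statements (not executed here)."""
    return [
        f"ALTER TABLE `{table}` ADD COLUMN `{name}` {sql_type} NULL"
        for name, sql_type in missing.items()
    ]
```

Keeping the statement builder separate from execution makes it possible to log or review the generated `ALTER TABLE` statements before they touch the database.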
### Export Command Pattern (`bin/Commands/Export*.php`)
```php
// Abridged sketch: $rootDir and $json are prepared in the elided code.
class ExportJson extends Command {
    protected static $defaultName = 'export:json';

    protected function execute(InputInterface $input, OutputInterface $output): int {
        $db = Config::getConfig()->getDB();  // MySQL connection
        $result = $db->query("SELECT * FROM cities ORDER BY name");
        // Build arrays from $result, encode them, write to json/ directory
        $this->filesystem->dumpFile($rootDir . '/json/cities.json', $json);
        return Command::SUCCESS;  // execute() must return an int exit code
    }
}
```

**Pattern for new exports:**
1. Extend `Command`, set `$defaultName` (e.g., `export:csv`)
2. Read from MySQL via `Config::getConfig()->getDB()`
3. Transform data to target format
4. Write to corresponding directory (csv/, xml/, etc.)

### GitHub Actions Workflow (`.github/workflows/export.yml`)
**Auto-runs on:**
- Manual trigger (workflow_dispatch)
- Changes to `bin/Commands/Export**`

**Steps:**
1. Setup: MySQL, PostgreSQL, MongoDB
2. Create database + run schema migrations
3. Import: `import_json_to_mysql.py` (JSON → MySQL)
4. Validation: `sync_mysql_to_json.py` (MySQL → JSON round-trip)
5. Exports: All formats via PHP console commands
6. Compress: Large files (.gz for cities.json, world.sql, etc.)
7. Pull Request: Auto-created with all exports

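Step 6 can be approximated locally with the standard library. The sketch below assumes plain gzip compression that writes the `.gz` file next to the source and leaves the original in place; the `gzip_file` helper is hypothetical, and the workflow itself may well use a shell command instead.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src):
    """Write <src>.gz next to src and return the new path; the source file is kept."""
    src = Path(src)
    dest = src.with_name(src.name + ".gz")
    # Stream the copy so large exports are never loaded fully into memory.
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, `gzip_file("json/cities.json")` would produce `json/cities.json.gz`.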
## File Organization

```
contributions/                 # Source of truth (edit these)
├── cities/                    # 209+ country files (US.json, IN.json, etc.)
├── states/states.json
└── countries/countries.json

bin/
├── console                    # Symfony Console app (CLI entrypoint)
├── Commands/                  # Export*.php classes (auto-discovered)
├── config/
│   ├── app.yaml               # DB credentials for scripts
│   └── phinx.yaml             # Migration config
└── scripts/
    ├── sync/                  # Python: JSON ↔ MySQL bidirectional sync
    └── export/                # Python: Format conversions (SQLite, DuckDB)

sql/world.sql                  # Canonical MySQL dump (auto-generated)
```

## Data Schema Essentials

### Cities (Most Common)
- `id` - OMIT for new records (MySQL AUTO_INCREMENT)
- `name`, `state_id`, `state_code`, `country_id`, `country_code`, `latitude`, `longitude` - REQUIRED
- `timezone` (IANA), `wikiDataId` - Optional
- `created_at`, `updated_at`, `flag` - OMIT (auto-managed by MySQL)

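These field rules are easy to check before pushing. The function below is a hypothetical pre-commit helper, not part of the repository; it encodes only the rules listed above, and the sample values in the usage note are made up.

```python
REQUIRED_FIELDS = (
    "name", "state_id", "state_code",
    "country_id", "country_code", "latitude", "longitude",
)
AUTO_MANAGED_FIELDS = ("created_at", "updated_at", "flag")


def check_new_city(record):
    """Return a list of problems for a new city record; an empty list means it looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    if "id" in record:
        problems.append("omit 'id' for new records (MySQL assigns it)")
    for field in AUTO_MANAGED_FIELDS:
        if field in record:
            problems.append(f"omit auto-managed field: {field}")
    return problems
```

A record such as `{"name": "Exampleville", "state_id": 1, "state_code": "XX", "country_id": 1, "country_code": "XX", "latitude": "0.0", "longitude": "0.0"}` passes, while adding an `id` or `flag` key yields one problem each.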
### Finding Foreign Keys
```bash
grep -A 5 '"name": "California"' contributions/states/states.json  # state_id
grep -A 5 '"name": "United States"' contributions/countries/countries.json  # country_id
```

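The same lookup can be done by parsing the JSON, which avoids depending on key order or line layout. This sketch assumes each file is a JSON array of objects carrying `id` and `name` keys; `ids_by_name` and `load_records` are hypothetical helpers.

```python
import json


def load_records(path):
    """Load a contributions file, assumed to be a JSON array of objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def ids_by_name(records, name):
    """Return the ids of all records whose 'name' matches exactly."""
    return [r["id"] for r in records if r.get("name") == name]
```

Usage would look like `ids_by_name(load_records("contributions/states/states.json"), "California")`. A list is returned because names are not guaranteed to be unique across countries.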
## Configuration

**Database credentials:**
- Scripts: `bin/config/app.yaml`
- PHP exports: `Config::getConfig()->getDB()` reads from app.yaml
- Default: `root:root@localhost/world`
- **Local Environment**: MySQL runs without password (`root:@localhost/world`)

**Override for GitHub Actions:**
```bash
python3 bin/scripts/sync/import_json_to_mysql.py --host "$DB_HOST" --user "$DB_USER" --password "$DB_PASSWORD"
```

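For a new script that should accept the same overrides, an `argparse` setup along these lines would work. It is a sketch under assumptions: the flag names `--host`, `--user`, and `--password` come from the command above, while the `--database` flag, the environment-variable fallbacks, and the `parse_db_args` name are illustrative and may not match the existing scripts.

```python
import argparse
import os


def parse_db_args(argv=None):
    """Parse connection flags, falling back to env vars and then to the documented defaults."""
    parser = argparse.ArgumentParser(description="Database connection options")
    parser.add_argument("--host", default=os.environ.get("DB_HOST", "localhost"))
    parser.add_argument("--user", default=os.environ.get("DB_USER", "root"))
    parser.add_argument("--password", default=os.environ.get("DB_PASSWORD", "root"))
    parser.add_argument("--database", default="world")  # assumed flag, not confirmed by the source
    return parser.parse_args(argv)
```

Calling `parse_db_args(["--host", "db.internal"])` overrides only the host and leaves the other values at their defaults.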
**Local MySQL commands (no password):**
```bash
mysql -uroot world                           # Connect to database
mysql -uroot -e "USE world; SHOW TABLES;"    # Run queries
```

## Important Rules

**DO:**
- Edit `contributions/` JSON only (source of truth)
- Omit `id` for new records (auto-assigned)
- Run `normalize_json.py` to pre-assign IDs (optional)
- Document fixes in `.github/fixes-docs/FIX_<issue_number>_SUMMARY.md` (ONE file per issue)
- When adding states + cities: run JSON→MySQL→JSON between tasks for ID assignment
- Use `ExportJson.php` as reference for new export commands

**DO NOT:**
- Edit auto-generated dirs: `json/`, `csv/`, `xml/`, `yml/`, `sql/`, etc.
- Commit large exports without explicit request
- Edit `sql/world.sql` directly (prefer JSON-first workflow)
- Add `flag`, `created_at`, `updated_at` manually (MySQL manages these)
- Run exports locally without cleaning up afterward

## Performance Expectations

- MySQL import: ~3 seconds (151k+ records)
- JSON export: ~4 seconds
- CSV export: ~1 second
- XML export: ~9 seconds
- YAML export: ~17 seconds
- DuckDB conversion: ~8 minutes (set 20+ min timeout)
- GitHub Actions: 10-15 minutes (full pipeline)

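When a long step such as the DuckDB conversion is driven from Python, the timeout can be made explicit. The wrapper below is a hypothetical helper built on `subprocess.run`; the 20-minute figure in the usage note comes from the list above, everything else is an assumption.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return done.returncode
```

For the DuckDB conversion this might be called as `run_with_timeout(["python3", "bin/scripts/export/import_duckdb.py", "--input", "sqlite/world.sqlite3", "--output", "duckdb/world.db"], 20 * 60)`.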
## Validation Queries

```sql
-- Data integrity check (UNION ALL: no de-duplication pass needed)
SELECT 'Cities', COUNT(*) FROM cities UNION ALL
SELECT 'States', COUNT(*) FROM states UNION ALL
SELECT 'Countries', COUNT(*) FROM countries;

-- Sample validation
SELECT COUNT(*) FROM cities WHERE country_code = 'US';  -- ~19,824
```

## Common Issues

- **Composer hangs**: Use `--no-interaction --prefer-dist`
- **MySQL connection failed**: `sudo systemctl start mysql.service`
- **DuckDB timeout**: Takes 8+ minutes, set timeout to 20+ minutes
- **Export files missing**: Run exports from `bin/` directory
- **Round-trip validation fails**: Check for schema mismatches between JSON and MySQL

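For the round-trip case, a quick way to spot a schema mismatch is to compare the set of JSON keys against the table's column names in both directions. The helper below is a hypothetical debugging aid; it assumes the column names have already been collected (for example from `SHOW COLUMNS`) and does not connect to MySQL itself.

```python
def schema_mismatches(json_records, mysql_columns):
    """Report fields present on only one side of the JSON/MySQL boundary."""
    json_fields = set()
    for record in json_records:
        json_fields.update(record)  # adds the record's keys
    columns = set(mysql_columns)
    return {
        "only_in_json": sorted(json_fields - columns),
        "only_in_mysql": sorted(columns - json_fields),
    }
```

Fields that are managed by MySQL, such as `created_at`, are expected to show up under `only_in_mysql` when the JSON side omits them, so not every entry indicates a problem.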
## Timezone Management

**Tools (Keep these):**
- `bin/scripts/analysis/timezone_summary.py` - Generate timezone analysis reports
- `bin/scripts/fixes/timezone_mappings.json` - Geographic timezone reference data
- `.github/fixes-docs/TIMEZONE_FIX_SUMMARY.md` - Documentation of fixes applied (2025-10-18)

**Completed Fixes (2025-10-18):**
- Fixed 81 states across 10 countries (US, CA, RU, BR, MX, AU, AR, ID, KZ, CN)
- Increased the number of unique timezones in use from 240 to 299
- All changes validated against country timezone definitions

**Generate Timezone Reports:**
```bash
python3 bin/scripts/analysis/timezone_summary.py  # Full analysis report
```
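
A figure such as "299 unique timezones" can be reproduced with a simple tally over records. The sketch below only illustrates the idea and is not the logic of `timezone_summary.py`; `timezone_usage` is a hypothetical helper that assumes each record may carry a `timezone` key.

```python
from collections import Counter


def timezone_usage(records):
    """Count how many records use each timezone, skipping missing or empty values."""
    return Counter(r["timezone"] for r in records if r.get("timezone"))
```

`len(timezone_usage(records))` then gives the number of distinct timezones in use, and `most_common()` lists them by frequency.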
