# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

A comprehensive geographical database (151k+ cities, 5k+ states, 250 countries) available in 9 formats. This is a **data repository** focused on data integrity and multi-format exports.

## Architecture: Two-Phase Build System

The repository uses a **bidirectional workflow** where data flows between JSON (version control) and MySQL (canonical state):

```
contributions/                → [Python Import] → MySQL        → [PHP Export]       → json/, csv/, xml/, yml/, sql/
├── cities/*.json                                 world          (Symfony Console)    sqlite/, mongodb/, etc.
├── states/states.json                            (canonical)
└── countries/countries.json
```

**Phase 1: Python Import** (`bin/scripts/sync/import_json_to_mysql.py`)
- Reads `contributions/` JSON files
- Dynamic schema detection (auto-adds new columns to MySQL)
- IDs auto-assigned by MySQL AUTO_INCREMENT
- Handles 209+ country-specific city files
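
A minimal sketch of how the import phase might gather its input, assuming each `contributions/cities/<CC>.json` file holds a JSON array of city objects. `load_city_records` and `split_by_id` are hypothetical helpers, not functions from `import_json_to_mysql.py`; records without an `id` are the ones MySQL AUTO_INCREMENT will number.

```python
# Hypothetical sketch; not the repository's import implementation.
import json
from pathlib import Path


def load_city_records(cities_dir):
    """Read every <CC>.json file in cities_dir (assumed: each is a JSON array of objects)."""
    records = []
    for path in sorted(Path(cities_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records.extend(json.load(fh))
    return records


def split_by_id(records):
    """Separate new records (no 'id', left for AUTO_INCREMENT) from existing ones."""
    new = [r for r in records if r.get("id") is None]
    existing = [r for r in records if r.get("id") is not None]
    return new, existing
```

For example, `split_by_id(load_city_records("contributions/cities"))` returns the new and existing records as two lists.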

**Phase 2: PHP Export** (`bin/Commands/Export*.php`)
- Symfony Console commands (one per format)
- Reads directly from MySQL via SELECT queries
- Memory limit: unlimited (handles 151k+ records)
- Auto-discovered by `bin/console` application

## Data Contribution Workflows

### Workflow 1: JSON-First (Contributors)
```bash
# 1. Edit contributions/cities/US.json (omit 'id' for new records)
# 2. Push changes
# 3. GitHub Actions auto-runs:
python3 bin/scripts/sync/import_json_to_mysql.py  # JSON → MySQL (IDs assigned)
cd bin && php console export:json                 # MySQL → JSON (each other format has its own export:* command)
```

### Workflow 2: SQL-First (Maintainers)
```bash
mysql -uroot -proot world                       # Make changes
python3 bin/scripts/sync/sync_mysql_to_json.py  # Sync MySQL → JSON
git add contributions/ && git commit            # Commit JSON changes
```
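
What the MySQL → JSON step has to produce can be sketched as follows. This is a hypothetical illustration, not `sync_mysql_to_json.py`: it groups city rows (as dicts) by `country_code`, one `<CC>.json` file each, and drops `created_at`, `updated_at` and `flag`, on the assumption that the sync leaves out the columns MySQL manages itself.

```python
# Hypothetical sketch; not the actual sync_mysql_to_json.py.
import json
from collections import defaultdict
from pathlib import Path

# Columns MySQL manages itself (per the schema notes); assumed not to be written back to JSON.
AUTO_MANAGED = {"created_at", "updated_at", "flag"}


def group_cities_by_country(rows):
    """Group city rows (dicts) by country_code, dropping auto-managed columns."""
    grouped = defaultdict(list)
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in AUTO_MANAGED}
        grouped[row["country_code"]].append(clean)
    # Sort each country's cities by id so repeated syncs produce stable diffs.
    return {cc: sorted(cities, key=lambda c: c["id"]) for cc, cities in grouped.items()}


def write_country_files(grouped, out_dir):
    """Write one <CC>.json file per country into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cc, cities in grouped.items():
        text = json.dumps(cities, ensure_ascii=False, indent=2) + "\n"
        (out / f"{cc}.json").write_text(text, encoding="utf-8")
```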

### Optional: Auto-Normalize JSON
```bash
# Pre-assign IDs before committing (connects to MySQL for next ID)
python3 bin/scripts/sync/normalize_json.py contributions/cities/US.json
```
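
Conceptually, pre-assigning IDs means giving every record without an `id` the next free value. The sketch below shows that idea on an in-memory list; `assign_missing_ids` is a hypothetical helper, and the real `normalize_json.py` obtains the starting value from MySQL, modeled here by the `next_id` argument.

```python
# Hypothetical sketch of ID pre-assignment; not normalize_json.py itself.
def assign_missing_ids(records, next_id):
    """Give each record lacking an 'id' a sequential ID starting at next_id.

    next_id stands in for the value normalize_json.py reads from MySQL.
    Returns the next unused ID.
    """
    for record in records:
        if record.get("id") is None:
            record["id"] = next_id
            next_id += 1
    return next_id


cities = [{"id": 10, "name": "Existing"}, {"name": "New A"}, {"name": "New B"}]
following = assign_missing_ids(cities, next_id=1000)
print([c["id"] for c in cities], following)  # [10, 1000, 1001] 1002
```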

## Common Development Commands

### Initial Setup
```bash
cd bin
composer install --no-interaction --prefer-dist  # PHP dependencies (Symfony Console, etc.)
```

### Database Setup
```bash
# Start MySQL
sudo systemctl start mysql.service

# Create database
mysql -uroot -proot -e "CREATE DATABASE world CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

# Import canonical SQL dump
mysql -uroot -proot --default-character-set=utf8mb4 world < sql/world.sql

# Validate
mysql -uroot -proot -e "USE world; SELECT COUNT(*) FROM cities;"  # Should be ~151,024
```

### Import & Export (Local Testing)
```bash
# Import JSON to MySQL (from contributions/)
python3 bin/scripts/sync/import_json_to_mysql.py

# Export MySQL to all formats
cd bin
php console export:json        # 4 seconds
php console export:csv         # 1 second
php console export:xml         # 9 seconds
php console export:yaml        # 17 seconds
php console export:sql-server  # 3 seconds
php console export:mongodb     # 1 second

# Sync MySQL back to JSON (validation or SQL-first workflow)
cd ..  # back to the repository root
python3 bin/scripts/sync/sync_mysql_to_json.py
```

### Database Migration (Optional)
```bash
# PostgreSQL
cd nmig
npm install && npm run build
cp ../nmig.config.json config/config.json
npm start
cd ..  # back to the repository root

# SQLite
pip install mysql-to-sqlite3
mysql2sqlite -d world --mysql-password root -u root -f sqlite/world.sqlite3

# DuckDB (reference only, not supported)
pip install duckdb
python3 bin/scripts/export/import_duckdb.py --input sqlite/world.sqlite3 --output duckdb/world.db
```

## Key Architecture Patterns

### PHP Console Application (`bin/console`)
- Symfony Console Application extending `Application`
- Auto-discovers commands in `bin/Commands/` via DirectoryIterator
- Sets `memory_limit = -1` for large dataset exports
- Registers Phinx migration commands (migrate, seed, etc.)
- Each export = independent Command class

### Dynamic Schema Detection (`import_json_to_mysql.py`)
```python
# Auto-detects new columns in JSON and adds to MySQL
new_columns = self.detect_new_columns(table_name, json_data)
if new_columns:
    self.add_columns_to_table(table_name, new_columns)  # ALTER TABLE
```

**How it works:**
1. Compares JSON fields vs MySQL SHOW COLUMNS
2. Infers MySQL types from JSON values (smart type detection)
3. Executes ALTER TABLE for missing columns
4. Bidirectional schema evolution supported
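
A minimal sketch of steps 1 to 3, assuming the existing column names have already been read from `SHOW COLUMNS`. The type mapping (`TINYINT(1)`, `BIGINT`, `DOUBLE`, `JSON`, `TEXT`, `VARCHAR(255)`) is an illustrative guess, not the rule set `import_json_to_mysql.py` actually uses, and `infer_mysql_type`, `detect_new_columns` and `build_alter_statements` here are hypothetical stand-ins.

```python
# Hypothetical sketch of dynamic schema detection; the type mapping is illustrative only.
def infer_mysql_type(value):
    """Guess a MySQL column type from one sample JSON value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "TINYINT(1)"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, (dict, list)):
        return "JSON"
    if isinstance(value, str) and len(value) > 255:
        return "TEXT"
    return "VARCHAR(255)"  # short strings and None fall back to VARCHAR


def detect_new_columns(existing_columns, records):
    """Return {column: type} for JSON keys not in existing_columns (names from SHOW COLUMNS)."""
    samples = {}
    for record in records:
        for key, value in record.items():
            if key in existing_columns:
                continue
            if samples.get(key) is None:  # keep the first non-null sample per key
                samples[key] = value
    return {key: infer_mysql_type(value) for key, value in samples.items()}


def build_alter_statements(table, new_columns):
    """Render one nullable ADD COLUMN statement per new column."""
    return [
        f"ALTER TABLE `{table}` ADD COLUMN `{name}` {col_type} NULL"
        for name, col_type in new_columns.items()
    ]
```

New columns are added as `NULL`able so existing rows stay valid without a default value.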

### Export Command Pattern (`bin/Commands/Export*.php`)
```php
use Symfony\Component\Console\Command\Command;
use Symfony\Component\Console\Input\InputInterface;
use Symfony\Component\Console\Output\OutputInterface;

class ExportJson extends Command {
    protected static $defaultName = 'export:json';

    protected function execute(InputInterface $input, OutputInterface $output): int {
        $db = Config::getConfig()->getDB(); // MySQL connection
        $result = $db->query("SELECT * FROM cities ORDER BY name");
        // Build arrays, write to json/ directory
        $this->filesystem->dumpFile($rootDir . '/json/cities.json', ...);

        return Command::SUCCESS; // execute() must return an int exit code
    }
}
```

**Pattern for new exports:**
1. Extend `Command`, set `$defaultName` (e.g., `export:csv`)
2. Read from MySQL via `Config::getConfig()->getDB()`
3. Transform data to target format
4. Write to corresponding directory (csv/, xml/, etc.)

### GitHub Actions Workflow (`.github/workflows/export.yml`)
**Auto-runs on:**
- Manual trigger (workflow_dispatch)
- Changes to `bin/Commands/Export**`

**Steps:**
1. Setup: MySQL, PostgreSQL, MongoDB
2. Create database + run schema migrations
3. Import: `import_json_to_mysql.py` (JSON → MySQL)
4. Validation: `sync_mysql_to_json.py` (MySQL → JSON round-trip)
5. Exports: All formats via PHP console commands
6. Compress: Large files (.gz for cities.json, world.sql, etc.)
7. Pull Request: Auto-created with all exports
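
Step 6 amounts to writing a gzip copy of each large export. A standard-library sketch; the `LARGE_EXPORTS` list and keeping the uncompressed original next to the `.gz` are assumptions, not details taken from `export.yml`.

```python
# Hypothetical sketch of the compression step; not the workflow's actual commands.
import gzip
import shutil
from pathlib import Path


def gzip_file(src):
    """Write src + '.gz' next to src and return the new path; the original is kept."""
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks instead of loading the whole file
    return dst


# Hypothetical list of large exports; adjust to what the workflow really compresses.
LARGE_EXPORTS = ["json/cities.json", "sql/world.sql"]


def compress_large_exports(paths=LARGE_EXPORTS):
    """Compress each listed file that exists; return the .gz paths written."""
    return [gzip_file(p) for p in paths if Path(p).exists()]
```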

## File Organization

```
contributions/           # Source of truth (edit these)
├── cities/              # 209+ country files (US.json, IN.json, etc.)
├── states/states.json
└── countries/countries.json

bin/
├── console              # Symfony Console app (CLI entrypoint)
├── Commands/            # Export*.php classes (auto-discovered)
├── config/
│   ├── app.yaml         # DB credentials for scripts
│   └── phinx.yaml       # Migration config
└── scripts/
    ├── sync/            # Python: JSON ↔ MySQL bidirectional sync
    └── export/          # Python: Format conversions (SQLite, DuckDB)

sql/world.sql            # Canonical MySQL dump (auto-generated)
```
| 191 | + |
| 192 | +## Data Schema Essentials |
| 193 | + |
| 194 | +### Cities (Most Common) |
| 195 | +- `id` - OMIT for new records (MySQL AUTO_INCREMENT) |
| 196 | +- `name`, `state_id`, `state_code`, `country_id`, `country_code`, `latitude`, `longitude` - REQUIRED |
| 197 | +- `timezone` (IANA), `wikiDataId` - Optional |
| 198 | +- `created_at`, `updated_at`, `flag` - OMIT (auto-managed by MySQL) |
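
These rules can be checked before committing. `check_city` below is a hypothetical helper that flags missing required fields, auto-managed fields that should be omitted, and an `id` on a new record; the coordinate range check is an extra sanity check added for illustration, not part of the schema notes.

```python
# Hypothetical pre-commit check for one city record.
REQUIRED = ("name", "state_id", "state_code", "country_id", "country_code", "latitude", "longitude")
AUTO_MANAGED = ("created_at", "updated_at", "flag")


def check_city(city, is_new=True):
    """Return a list of problems found in one city record (empty list means OK)."""
    problems = [f"missing required field '{f}'" for f in REQUIRED if city.get(f) in (None, "")]
    problems += [f"remove auto-managed field '{f}'" for f in AUTO_MANAGED if f in city]
    if is_new and "id" in city:
        problems.append("omit 'id' for new records")
    # Extra sanity check (illustrative): coordinates may be strings, so parse before comparing.
    for field, limit in (("latitude", 90.0), ("longitude", 180.0)):
        value = city.get(field)
        if value in (None, ""):
            continue  # already reported as missing
        try:
            if abs(float(value)) > limit:
                problems.append(f"{field} out of range")
        except (TypeError, ValueError):
            problems.append(f"{field} is not a number")
    return problems
```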
| 199 | + |
| 200 | +### Finding Foreign Keys |
| 201 | +```bash |
| 202 | +grep -A 5 '"name": "California"' contributions/states/states.json # state_id |
| 203 | +grep -A 5 '"name": "United States"' contributions/countries/countries.json # country_id |
| 204 | +``` |
| 205 | + |
| 206 | +## Configuration |
| 207 | + |
| 208 | +**Database credentials:** |
| 209 | +- Scripts: `bin/config/app.yaml` |
| 210 | +- PHP exports: `Config::getConfig()->getDB()` reads from app.yaml |
| 211 | +- Default: `root:root@localhost/world` |
| 212 | +- **Local Environment**: MySQL runs without password (`root:@localhost/world`) |
| 213 | + |
| 214 | +**Override for GitHub Actions:** |
| 215 | +```bash |
| 216 | +python3 bin/scripts/sync/import_json_to_mysql.py --host $DB_HOST --user $DB_USER --password $DB_PASSWORD |
| 217 | +``` |
| 218 | + |
| 219 | +**Local MySQL commands (no password):** |
| 220 | +```bash |
| 221 | +mysql -uroot world # Connect to database |
| 222 | +mysql -uroot -e "USE world; SHOW TABLES;" # Run queries |
| 223 | +``` |
| 224 | + |
| 225 | +## Important Rules |
| 226 | + |
| 227 | +**DO:** |
| 228 | +- Edit `contributions/` JSON only (source of truth) |
| 229 | +- Omit `id` for new records (auto-assigned) |
| 230 | +- Run `normalize_json.py` to pre-assign IDs (optional) |
| 231 | +- Document fixes in `.github/fixes-docs/FIX_<issue_number>_SUMMARY.md` (ONE file per issue) |
| 232 | +- When adding states + cities: run JSON→MySQL→JSON between tasks for ID assignment |
| 233 | +- Use `ExportJson.php` as reference for new export commands |
| 234 | + |
| 235 | +**DO NOT:** |
| 236 | +- Edit auto-generated dirs: `json/`, `csv/`, `xml/`, `yml/`, `sql/`, etc. |
| 237 | +- Commit large exports without explicit request |
| 238 | +- Edit `sql/world.sql` directly (prefer JSON-first workflow) |
| 239 | +- Add `flag`, `created_at`, `updated_at` manually (MySQL manages these) |
| 240 | +- Run exports locally without cleaning up afterward |
| 241 | + |
| 242 | +## Performance Expectations |
| 243 | + |
| 244 | +- MySQL import: ~3 seconds (151k+ records) |
| 245 | +- JSON export: ~4 seconds |
| 246 | +- CSV export: ~1 second |
| 247 | +- XML export: ~9 seconds |
| 248 | +- YAML export: ~17 seconds |
| 249 | +- DuckDB conversion: ~8 minutes (set 20+ min timeout) |
| 250 | +- GitHub Actions: 10-15 minutes (full pipeline) |
| 251 | + |
| 252 | +## Validation Queries |
| 253 | + |
| 254 | +```sql |
| 255 | +-- Data integrity check |
| 256 | +SELECT 'Cities', COUNT(*) FROM cities UNION |
| 257 | +SELECT 'States', COUNT(*) FROM states UNION |
| 258 | +SELECT 'Countries', COUNT(*) FROM countries; |
| 259 | + |
| 260 | +-- Sample validation |
| 261 | +SELECT COUNT(*) FROM cities WHERE country_code = 'US'; -- ~19,824 |
| 262 | +``` |
| 263 | + |
| 264 | +## Common Issues |
| 265 | + |
| 266 | +- **Composer hangs**: Use `--no-interaction --prefer-dist` |
| 267 | +- **MySQL connection failed**: `sudo systemctl start mysql.service` |
| 268 | +- **DuckDB timeout**: Takes 8+ minutes, set timeout to 20+ minutes |
| 269 | +- **Export files missing**: Run exports from `bin/` directory |
| 270 | +- **Round-trip validation fails**: Check for schema mismatches between JSON and MySQL |
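
To locate a round-trip mismatch, compare the committed JSON with what `sync_mysql_to_json.py` writes back. The hypothetical `diff_records` helper below matches records by `id` and reports missing records, fields present on only one side (schema mismatches), and changed values; it assumes both inputs are lists of dicts that all carry an `id`.

```python
# Hypothetical round-trip diff helper.
def diff_records(before, after):
    """Compare two lists of records keyed by 'id'; return human-readable differences."""
    old = {r["id"]: r for r in before}
    new = {r["id"]: r for r in after}
    diffs = [f"id {i}: missing after round-trip" for i in sorted(old.keys() - new.keys())]
    diffs += [f"id {i}: only present after round-trip" for i in sorted(new.keys() - old.keys())]
    for i in sorted(old.keys() & new.keys()):
        a, b = old[i], new[i]
        for field in sorted(a.keys() ^ b.keys()):  # field exists on one side only
            side = "before" if field in a else "after"
            diffs.append(f"id {i}: field '{field}' only in {side}")
        for field in sorted(a.keys() & b.keys()):
            if a[field] != b[field]:  # also catches type drift such as "1.0" vs 1.0
                diffs.append(f"id {i}: {field} changed {a[field]!r} -> {b[field]!r}")
    return diffs
```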
| 271 | + |
| 272 | +## Timezone Management |
| 273 | + |
| 274 | +**Tools (Keep these):** |
| 275 | +- `bin/scripts/analysis/timezone_summary.py` - Generate timezone analysis reports |
| 276 | +- `bin/scripts/fixes/timezone_mappings.json` - Geographic timezone reference data |
| 277 | +- `.github/fixes-docs/TIMEZONE_FIX_SUMMARY.md` - Documentation of fixes applied (2025-10-18) |
| 278 | + |
| 279 | +**Completed Fixes (2025-10-18):** |
| 280 | +- Fixed 81 states across 9 countries (US, CA, RU, BR, MX, AU, AR, ID, KZ, CN) |
| 281 | +- Improved timezone utilization from 240 to 299 unique timezones |
| 282 | +- All changes validated against country timezone definitions |
| 283 | + |
| 284 | +**Generate Timezone Reports:** |
| 285 | +```bash |
| 286 | +python3 bin/scripts/analysis/timezone_summary.py # Full analysis report |
| 287 | +``` |