A fast, asset-complete website cloner built with Node.js, Puppeteer, Cheerio, and Axios. Crawls a website, downloads all HTML, CSS, JS, fonts, images, and media to a local folder. Suitable for archiving or offline analysis.
- 📄 Asset-complete crawl: CSS, JS, images, fonts, videos, audio, etc.
- 🔗 Recursive link following within root domain
- 🎨 CSS parsing: Handles `url()`, `@import`, and asset references
- 🧹 Query and fragment stripping for clean local files
- 🕷️ Uses Puppeteer (headless Chrome) for reliable page rendering
- ♻️ Download retry mechanism for network resilience
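
The CSS parsing feature above can be illustrated with a small sketch. This is not necessarily how the project implements it: `extractCssUrls` is a hypothetical helper that uses two regular expressions to collect `url()` and `@import` references from a stylesheet and resolves them against the stylesheet's own URL. Inline `data:` URIs are skipped on the assumption that they need no download.

```javascript
// Hypothetical helper (not part of the project): collect the absolute URLs
// of all assets referenced by a stylesheet.
function extractCssUrls(cssText, baseUrl) {
  const found = new Set();
  // url(...) with optional single or double quotes.
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  // @import "file.css"; the @import url(...) form is already caught above.
  const importPattern = /@import\s+(['"])([^'"]+)\1/gi;

  for (const pattern of [urlPattern, importPattern]) {
    for (const match of cssText.matchAll(pattern)) {
      const ref = match[2].trim();
      if (ref.startsWith('data:')) continue; // inline data, nothing to fetch
      try {
        // Relative references resolve against the stylesheet's URL.
        found.add(new URL(ref, baseUrl).href);
      } catch {
        // Skip references that cannot be resolved to a valid URL.
      }
    }
  }
  return [...found];
}

const css = '@import "base.css"; body { background: url(../img/bg.png?v=2); }';
console.log(extractCssUrls(css, 'https://example.com/css/site.css'));
```

A regex approach is short but not a full CSS parser; unusual escapes or nested parentheses inside `url()` would need a proper tokenizer.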
git clone https://github.com/NeaByteLab/Website-Cloner.git
cd Website-Cloner
npm install

`node index.js <website_url> [output_folder]`

- `<website_url>`: Root URL to clone (e.g. `https://example.com`)
- `[output_folder]`: (Optional) Output directory (default: `./output`)
Example:
`node index.js https://example.com ./my-archive`

All files are saved with their original folder structure in the output folder. Query strings and fragments are stripped from asset references for clean offline usage.
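
As a rough illustration of how a remote URL might map to a clean local file, here is a minimal sketch. `toLocalPath` is a hypothetical name, and the rule that directory-style URLs are saved as `index.html` is an assumption, not something the project documents.

```javascript
const path = require('node:path');

// Hypothetical helper (not part of the project): map a remote URL to a file
// path under the output folder, dropping the query string and fragment.
function toLocalPath(assetUrl, outputDir) {
  // Only `pathname` is used, so `?query` and `#fragment` are discarded.
  const { pathname } = new URL(assetUrl);
  // Assumption: URLs ending in "/" are stored as index.html in that folder.
  const relative = pathname.endsWith('/') ? pathname + 'index.html' : pathname;
  return path.join(outputDir, relative);
}

console.log(toLocalPath('https://example.com/css/site.css?v=3#top', './my-archive'));
console.log(toLocalPath('https://example.com/', './my-archive'));
```

Because the WHATWG `URL` parser normalizes `..` segments, the resulting path stays inside the output folder.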
- 🌐 Only follows links within the provided root domain.
- ✉️ Ignores mailto links and anchor jumps.
- 🎯 All CSS `url()` and `@import` asset links are also downloaded.
- 🔄 Minimal error output, retries up to 3 times for assets/pages.
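
The crawl rules listed above could be sketched roughly as follows. Both helpers are hypothetical and rest on assumptions: `shouldFollow` treats "within the root domain" as an exact hostname match (subdomains are not followed), and `withRetry` reads "up to 3 times" as three attempts in total with a fixed delay between them. The project's actual behavior may differ on both points.

```javascript
// Hypothetical helper: decide whether a link found on a page should be crawled.
function shouldFollow(href, pageUrl, rootUrl) {
  // Ignore empty links, anchor jumps, and mailto links.
  if (!href || href.startsWith('#') || href.toLowerCase().startsWith('mailto:')) {
    return false;
  }
  try {
    const target = new URL(href, pageUrl); // resolves relative links
    const isHttp = target.protocol === 'http:' || target.protocol === 'https:';
    // Assumption: "root domain" means the same hostname as the root URL.
    return isHttp && target.hostname === new URL(rootUrl).hostname;
  } catch {
    return false; // unparseable link
  }
}

// Hypothetical helper: run an async task, trying up to `attempts` times.
async function withRetry(task, attempts = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait briefly before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed
}
```

A download call would then be wrapped as `withRetry(() => download(url))`, where `download` stands for whatever function fetches a page or asset.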
MIT License © 2025 NeaByteLab