NeaByteLab/Website-Cloner

🚀 Website Cloner (Node.js)

A fast, asset-complete website cloner built with Node.js, Puppeteer, Cheerio, and Axios. It crawls a website and downloads its HTML, CSS, JS, fonts, images, and media to a local folder, which makes it suitable for archiving or offline analysis.


✨ Features

  • 📄 Asset-complete crawl: CSS, JS, images, fonts, videos, audio, etc.
  • 🔗 Recursive link following within the root domain
  • 🎨 CSS parsing: Handles url(), @import, and asset references
  • 🧹 Query and fragment stripping for clean local files
  • 🕷️ Uses Puppeteer (headless Chrome) for reliable page rendering
  • ♻️ Download retry mechanism for network resilience
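
The CSS parsing feature above can be illustrated with a small sketch. This is not the project's actual parser: `extractCssUrls` is a hypothetical helper name, and the regex-based approach is an assumption chosen for brevity. It collects `url()` and `@import` references from a stylesheet and resolves them against the stylesheet's own URL.

```javascript
// Hypothetical helper: collect asset references from a CSS string.
// Illustrative regex-based sketch, not the project's actual implementation.
function extractCssUrls(cssText, baseUrl) {
  const found = new Set();
  // url(...) with optional single or double quotes
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  // @import "file.css"; (the @import url(...) form is caught by urlPattern)
  const importPattern = /@import\s+(['"])([^'"]+)\1/gi;

  for (const pattern of [urlPattern, importPattern]) {
    for (const match of cssText.matchAll(pattern)) {
      const ref = match[2].trim();
      // Inline data URIs carry their own content; nothing to download
      if (ref === "" || ref.startsWith("data:")) continue;
      try {
        // Resolve relative references against the stylesheet URL
        found.add(new URL(ref, baseUrl).href);
      } catch {
        // Ignore references that cannot be resolved
      }
    }
  }
  return [...found];
}

const sampleCss = `
  @import "theme.css";
  body { background: url('../img/bg.png?v=2'); }
  @font-face { src: url(data:font/woff2;base64,AAAA); }
`;
console.log(extractCssUrls(sampleCss, "https://example.com/css/site.css"));
```

A regex keeps the sketch short, but it can misread unusual CSS (for example, `url()` text inside comments). A dedicated CSS parser would be more robust.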

📦 Install

git clone https://github.com/NeaByteLab/Website-Cloner.git
cd Website-Cloner
npm install

▶️ Usage

node index.js <website_url> [output_folder]
  • <website_url>: Root URL to clone (e.g. https://example.com)
  • [output_folder]: (Optional) Output directory (default: ./output)

Example:

node index.js https://example.com ./my-archive

📁 Output

All files are saved with their original folder structure inside the output folder. Query strings and fragments are stripped from asset references for clean offline usage.
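
As a rough sketch of how a URL could map to a clean local path, the helper below strips the query string and fragment and keeps the original folder structure. `urlToLocalPath` is a hypothetical name, and saving directory-style URLs as `index.html` is an assumption, not something this README confirms.

```javascript
// Hypothetical helper: derive a path relative to the output folder from a URL.
function urlToLocalPath(rawUrl) {
  const url = new URL(rawUrl);
  // url.pathname never includes the query string (?...) or fragment (#...)
  let pathname = url.pathname;
  // Assumption: directory-style URLs are stored as index.html
  if (pathname.endsWith("/")) pathname += "index.html";
  // Drop leading slashes so the result is relative to the output folder
  return pathname.replace(/^\/+/, "");
}

console.log(urlToLocalPath("https://example.com/assets/app.js?v=3#top"));
console.log(urlToLocalPath("https://example.com/blog/"));
```

The returned value would then be joined with the chosen output folder (for example with `path.join`) before writing the file.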

📝 Notes

  • 🌐 Only follows links within the provided root domain.
  • ✉️ Ignores mailto links and anchor jumps.
  • 🎯 All CSS url() and @import asset links are also downloaded.
  • 🔄 Minimal error output; pages and assets are retried up to 3 times.
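
The link rules and retry behavior in the notes above might look roughly like the following. Both helpers (`shouldFollow`, `withRetry`) are hypothetical names. The sketch assumes that "root domain" means an exact hostname match and that "up to 3 times" means three attempts in total; the actual tool may differ on both points.

```javascript
const MAX_ATTEMPTS = 3; // assumption: three attempts in total

// Hypothetical helper: decide whether a link found on a page should be crawled.
function shouldFollow(href, pageUrl, rootUrl) {
  if (!href || href.startsWith("#") || href.toLowerCase().startsWith("mailto:")) {
    return false; // anchor jump or mail link
  }
  let target;
  try {
    target = new URL(href, pageUrl); // resolves relative links
  } catch {
    return false; // unparseable link
  }
  if (target.protocol !== "http:" && target.protocol !== "https:") return false;
  // Assumption: "root domain" means an exact hostname match
  return target.hostname === new URL(rootUrl).hostname;
}

// Hypothetical helper: run an async task, retrying when it throws.
async function withRetry(task, attempts = MAX_ATTEMPTS) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed
}
```

A download call would be wrapped as `withRetry(() => download(url))`, where `download` stands in for whatever function fetches a page or asset.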

📜 License

MIT License © 2025 NeaByteLab
