Commit 4aa9482

committed
✨ chore(crawler): rename binary from mqcr to mq-crawl and update references
1 parent 1ab12b4

File tree

4 files changed: +23 -35 lines changed

README.md

1 addition & 0 deletions

@@ -375,6 +375,7 @@ The following external tools are available to extend mq's functionality:
 - [mq-check](https://github.com/harehare/mq/tree/main/crates/mq-check) - A syntax and semantic checker for mq files.
 - [mq-conv](https://github.com/harehare/mq-conv) - A CLI tool for converting various file formats to Markdown.
+- [mq-crawler](https://github.com/harehare/mq/tree/main/crates/mq-crawler) - A web crawler that extracts structured data from websites and outputs it in Markdown format.
 - [mq-docs](https://github.com/harehare/mq-docs) - A documentation generator for mq functions, macros, and selectors.
 - [mq-edit](https://github.com/harehare/mq-edit) - A terminal-based Markdown and code editor with WYSIWYG rendering and LSP support.
 - [mq-lsp](https://github.com/harehare/mq/tree/main/crates/mq-lsp) - Language Server Protocol (LSP) implementation for mq query files, providing IDE features like completion, hover, and diagnostics.

crates/mq-crawler/Cargo.toml

1 addition & 1 deletion

@@ -32,7 +32,7 @@ url = {workspace = true}

 [[bin]]
 doc = false
-name = "mqcr"
+name = "mq-crawl"
 path = "src/main.rs"

 [dev-dependencies]
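
For context on why this is a one-line change: Cargo's `[[bin]]` table sets the installed executable's name independently of the package name, so the package can remain `mq-crawler` while the binary it produces is renamed. The resulting table, reconstructed from the diff, reads:

```toml
# crates/mq-crawler/Cargo.toml — the [[bin]] table after this commit.
# The package name stays `mq-crawler`; only the executable name changes,
# so `cargo build -p mq-crawler` now emits a binary called `mq-crawl`.
[[bin]]
doc = false
name = "mq-crawl"
path = "src/main.rs"
```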

crates/mq-crawler/README.md

20 additions & 20 deletions

@@ -30,7 +30,7 @@ Make web scraping and content extraction effortless with intelligent Markdown co
 ### Homebrew

 ```sh
-brew install harehare/tap/mqcr
+brew install harehare/tap/mq-crawl
 ```

 ### Cargo

@@ -53,72 +53,72 @@ cargo build --release -p mq-crawler

 ```bash
 # Crawl a website and output to stdout
-mqcr https://example.com
+mq-crawl https://example.com

 # Save crawled content to directory
-mqcr -o ./output https://example.com
+mq-crawl -o ./output https://example.com

 # Crawl with custom delay (default: 0.5 seconds)
-mqcr -d 2.0 https://example.com
+mq-crawl -d 2.0 https://example.com

 # Limit crawl depth
-mqcr --depth 2 https://example.com
+mq-crawl --depth 2 https://example.com
 ```

 ### Processing with mq Queries

 ```bash
 # Extract only headings from crawled pages
-mqcr -m '.h | select(contains("News"))' https://example.com
+mq-crawl -m '.h | select(contains("News"))' https://example.com

 # Extract all code blocks
-mqcr -m '.code' https://developer.example.com
+mq-crawl -m '.code' https://developer.example.com

 # Extract and transform links
-mqcr -m '.link | to_text()' https://example.com
+mq-crawl -m '.link | to_text()' https://example.com
 ```

 ### Parallel Crawling

 ```bash
 # Crawl with 3 concurrent workers
-mqcr -c 3 https://example.com
+mq-crawl -c 3 https://example.com

 # High-speed crawling with 10 workers
-mqcr -c 10 -d 0.1 https://example.com
+mq-crawl -c 10 -d 0.1 https://example.com
 ```

 ### Custom Robots.txt

 ```bash
 # Use custom robots.txt file
-mqcr --robots-path ./custom-robots.txt https://example.com
+mq-crawl --robots-path ./custom-robots.txt https://example.com
 ```

 ### HTML to Markdown Options

 ```bash
 # Extract scripts as code blocks
-mqcr --extract-scripts-as-code-blocks https://example.com
+mq-crawl --extract-scripts-as-code-blocks https://example.com

 # Generate YAML front matter with metadata
-mqcr --generate-front-matter https://example.com
+mq-crawl --generate-front-matter https://example.com

 # Use page title as H1 heading
-mqcr --use-title-as-h1 https://example.com
+mq-crawl --use-title-as-h1 https://example.com

 # Combine multiple options
-mqcr --generate-front-matter --use-title-as-h1 -o ./docs https://example.com
+mq-crawl --generate-front-matter --use-title-as-h1 -o ./docs https://example.com
 ```

 ### Output Formats

 ```bash
 # Output as JSON
-mqcr --format json https://example.com
+mq-crawl --format json https://example.com

 # Output as text (default)
-mqcr --format text https://example.com
+mq-crawl --format text https://example.com
 ```

 ### Browser-Based Crawling

@@ -130,10 +130,10 @@ For JavaScript-heavy sites, use WebDriver (requires Selenium):
 # docker run -d -p 4444:4444 selenium/standalone-chrome

 # Crawl with WebDriver
-mqcr -U http://localhost:4444 https://spa-example.com
+mq-crawl -U http://localhost:4444 https://spa-example.com

 # Custom timeouts
-mqcr -U http://localhost:4444 \
+mq-crawl -U http://localhost:4444 \
   --page-load-timeout 60 \
   --script-timeout 30 \
   --implicit-timeout 10 \

@@ -145,7 +145,7 @@ mqcr -U http://localhost:4444 \
 ```sh
 A simple web crawler that fetches HTML, converts it to Markdown, and optionally processes it with an mq query

-Usage: mqcr [OPTIONS] <URL>
+Usage: mq-crawl [OPTIONS] <URL>

 Arguments:
   <URL> The initial URL to start crawling from

scripts/install_crawler.sh

1 addition & 14 deletions

@@ -188,7 +188,7 @@ install_mqcr() {
   local os="$2"
   local arch="$3"
   local download_url
-  local binary_name="mqcr"
+  local binary_name="mq-crawl"
   local ext=""
   local target=""

@@ -240,19 +240,6 @@ install_mqcr() {
   chmod +x "$MQ_BIN_DIR/$binary_name"

   log "mq-crawler installed successfully to $MQ_BIN_DIR/$binary_name"
-
-  # Create symlink mq-crawl -> mqcr
-  local symlink_name="mq-crawl"
-  if [[ "$os" == "windows" ]]; then
-    symlink_name="mq-crawl.exe"
-  fi
-
-  if [[ -L "$MQ_BIN_DIR/$symlink_name" ]]; then
-    rm "$MQ_BIN_DIR/$symlink_name"
-  fi
-
-  ln -s "$MQ_BIN_DIR/$binary_name" "$MQ_BIN_DIR/$symlink_name"
-  log "Created symlink $symlink_name -> $binary_name"
 }

 # Add mq to PATH by updating shell profile
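
With the symlink logic removed from the installer, scripts that still invoke the old `mqcr` name will stop working once they pick up this release. A user can recreate the shim themselves, now pointing in the opposite direction (`mqcr` -> `mq-crawl`). The sketch below is an illustration only: it fabricates a stand-in `mq-crawl` in a temporary directory so it runs anywhere, and `MQ_BIN_DIR` is an assumed location, not necessarily the installer's default.

```shell
# MQ_BIN_DIR is an assumption; point it at your real install directory.
MQ_BIN_DIR="${MQ_BIN_DIR:-$(mktemp -d)}"

# Stand-in for the real mq-crawl binary so this sketch is runnable anywhere.
if [ ! -e "$MQ_BIN_DIR/mq-crawl" ]; then
  printf '#!/bin/sh\necho "mq-crawl $*"\n' > "$MQ_BIN_DIR/mq-crawl"
  chmod +x "$MQ_BIN_DIR/mq-crawl"
fi

# Recreate the old name as a symlink, mirroring what the installer used to
# do, but in the new direction: mqcr -> mq-crawl.
ln -sf "$MQ_BIN_DIR/mq-crawl" "$MQ_BIN_DIR/mqcr"
"$MQ_BIN_DIR/mqcr" --version
```

This keeps existing `mqcr` invocations working during a transition while leaving the renamed binary as the single source of truth.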
