Skip to content

Add case insensitive file systems support in rustdoc#83612

Closed
GuillaumeGomez wants to merge 2 commits into
rust-lang:masterfrom
GuillaumeGomez:case-insensitive
Closed

Add case insensitive file systems support in rustdoc#83612
GuillaumeGomez wants to merge 2 commits into
rust-lang:masterfrom
GuillaumeGomez:case-insensitive

Conversation

@GuillaumeGomez

Copy link
Copy Markdown
Member

Fixes #25879.
Fixes #80504.

This PR adds the support for case insensitive file system. To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

Now for the front-end: we generate a file for the conflicted URL which contains JS which will load the expected file. The conflicted files are put in non-conflict file paths (simply adding a number depending on their position in the vector).

If you want to test it locally even on linux, you can check the Self and self keywords.

@GuillaumeGomez GuillaumeGomez added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. A-rustdoc-ui Area: Rustdoc UI (generated HTML) labels Mar 28, 2021
@rust-highfive

Copy link
Copy Markdown
Contributor

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez

@rust-highfive

Copy link
Copy Markdown
Contributor

r? @ollie27

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 28, 2021
@rust-log-analyzer

This comment has been minimized.

@jyn514 jyn514 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

I think we should do this unconditionally, not just when the current filesystem happens to be case-insensitive. The error in #83154 comes from trying to install linux docs on MacOS - if you keep the linux docs the same, they'll still give the same error.

Comment on lines +249 to +254
format!(
"var script = document.createElement('script');\
script.src = {:?};\
document.body.appendChild(script);",
script_src,
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be dynamic with createElement('script')? Couldn't you just add the JS to the page directly?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but that would make the page a lot heavier (minimum twice the normal size since there at least two "items" to be loaded).

@jyn514 jyn514 Mar 29, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

@GuillaumeGomez

GuillaumeGomez commented Mar 29, 2021

Copy link
Copy Markdown
Member Author

To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

I think we should do this unconditionally, not just when the current filesystem happens to be case-insensitive. The error in #83154 comes from trying to install linux docs on MacOS - if you keep the linux docs the same, they'll still give the same error.

I'm not sure to understand: currently, this is checked everytime you run rustdoc, unless you use --generate-case-insensitive which will then enforce to check for conflicts. So you can get linux docs on macos and as long as you generate them on your mac, it'll work locally (or with --generate-case-insensitive option if you are on a case sensitive file system and want to export it).

EDIT: Oh, I think I just understood what you meant. Unfortunately, the current solution doesn't work on case sensitive file systems because if you have "a" and "A", it'll be two different files for the system. I can go around this limitation by creating a link to the first file if you want? Or simply duplicating it, as you prefer.

@kinnison

Copy link
Copy Markdown
Contributor

I think it should always generate safe filenames.

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

They are safe on case sensitive file systems. I think we shouldn't do it automatically if unnecessary. But the case insensitive support should work on both kind of systems, so I'll fix the current issue.

@kinnison

Copy link
Copy Markdown
Contributor

If you intend to make the stdlib docs to always be generated with the flag to force case-insensitive support then okay. There is an issue that installing the linux docs on a macos filesystem (for use in a cross-compilation docker) fails because the Linux docs are built assuming a case-sensitive FS.

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

I don't mind us using this flag when generating std docs so it can be redistributed everywhere without issue. But for websites like docs.rs, it makes more sense to just keep the current system (less files to handle, less JS, etc).

@jyn514 jyn514 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind us using this flag when generating std docs so it can be redistributed everywhere without issue. But for websites like docs.rs, it makes more sense to just keep the current system (less files to handle, less JS, etc).

I don't really like special casing the standard library docs more than we have to - it seems useful to me to have the docs redistributable anywhere, regardless of the project. docs.rs doesn't have downloadable docs for now, so it's less of an issue, but it may in the future. Projects that want to distribute their own docs, like windows-rs and rocket, would also benefit from having this.

If the current solution is too JS heavy, I would rather work on making that more efficient, since I expect people to read the standard library docs at least as much as they read docs.rs.

Comment on lines +249 to +254
format!(
"var script = document.createElement('script');\
script.src = {:?};\
document.body.appendChild(script);",
script_src,
)

@jyn514 jyn514 Mar 29, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

I don't really like special casing the standard library docs more than we have to - it seems useful to me to have the docs redistributable anywhere, regardless of the project. docs.rs doesn't have downloadable docs for now, so it's less of an issue, but it may in the future. Projects that want to distribute their own docs, like windows-rs and rocket, would also benefit from having this.

It's not special casing, it's simply about enabling an option. I guess the debate here is wether or not we want to enable it by default. I'd prefer not though because some parts (not much but still) of the documentation wouldn't work without JS enabled.

If the current solution is too JS heavy, I would rather work on making that more efficient, since I expect people to read the standard library docs at least as much as they read docs.rs.

The original page is very small, so that part isn't an issue from what I could see. The parsing of the JS is also very quick since it's mostly strings.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

Because if we do that, the content will be loaded in all cases (if it's part of the DOM, it's loaded). So if we have 3 conflicts, we'll load the three items even though we only need one.

@jyn514

jyn514 commented Mar 29, 2021

Copy link
Copy Markdown
Member

Because if we do that, the content will be loaded in all cases (if it's part of the DOM, it's loaded). So if we have 3 conflicts, we'll load the three items even though we only need one.

Ok I see, I misunderstood how it works - it takes the entire page of the item and loads it in the current page using JS. That's also why it needs JS to work.

Rather than doing that, can you redirect to the disambiguated page? That shouldn't need any JS at all, just <meta http-refresh>, and avoids the issue with dynamic loading (which @pietroalbini will be happy about).

@est31

est31 commented Mar 29, 2021

Copy link
Copy Markdown
Member

I'd prefer not though because some parts (not much but still) of the documentation wouldn't work without JS enabled.

Which parts of the std libs documentation have collisions right now? It seems to me that they are quite rare.

@jyn514

jyn514 commented Mar 29, 2021

Copy link
Copy Markdown
Member

@est31 #80504 (it's in the PR description)

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

Rather than doing that, can you redirect to the disambiguated page? That shouldn't need any JS at all, just <meta http-refresh>, and avoids the issue with dynamic loading (which @pietroalbini will be happy about).

And yet another thing I didn't know about. If it works, it'll be so awesome! Definitely gonna give it a try. If this work, I'll remove the split between sensitive and insensitive case like we currently have.

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

I've been thinking even more about distributing the case insensitive by default, but it'd be problematic: on case sensitive file systems, we'll need to create file links to the file to handle all URLs (which isn't a problem). However, those file links might override the original file on the case insensitive systems. So independently of the solution suggested by @jyn514, how could we solve this problem?

@est31

est31 commented Mar 29, 2021

Copy link
Copy Markdown
Member

Just did a quick check in the browser console, apparently it's indeed the only doubling in the std docs:

{ let b = new Map(); searchIndex.std.i.map(function(v) {return [v[1].toLowerCase(), v[1], v[0]];}).filter(args => {let [lc, nc, v] = args; const k = lc+v; if (b.has(k)) { if (b.get(k) !== nc) { return true; } } b.set(k, nc); return false;});}

Outputs:

[
  [
    "self",
    "Self",
    21
  ]
]

@est31

est31 commented Mar 29, 2021

Copy link
Copy Markdown
Member

So independently of the solution suggested by @jyn514, how could we solve this problem?

One way might be to generate files for each case version, but each file generated contains a hidden div for each casing. On load, there is a little bit of js that looks at window.location.href and auto expands the correct div. That way, if the docs get messed up by a case insensitive file system, there is still a recovery path. To support non-js users, one can make the div that corresponds to the intended casing of the file be shown by default, and only be hidden by the js.

It's probably a nightmare though as I guess most of the content of the page has to be in that div :).

@jyn514

jyn514 commented Mar 29, 2021

Copy link
Copy Markdown
Member

I've been thinking even more about distributing the case insensitive by default, but it'd be problematic: on case sensitive file systems, we'll need to create file links to the file to handle all URLs (which isn't a problem). However, those file links might override the original file on the case insensitive systems. So independently of the solution suggested by @jyn514, how could we solve this problem?

I don't understand this problem, can you describe it more? How would it go wrong?

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

Let's say you don't generate links on case sensitive file system: if you have a capital letter, the file won't be found. Which is why you need links if case sensitive. But links remain files, so on case insensitive, they might overwrite the "good" file (the conflict file like I call it). If that makes more sense?

@jyn514

jyn514 commented Mar 29, 2021

Copy link
Copy Markdown
Member

We discussed this in DMs - the issue is that keyword.self.html and keyword.Self.html both still need to exist on case-sensitive file systems. So we end up with the same issue as before, there needs to be a file in both cases, which breaks on case-insensitive file-systems. The only fix if we want docs generated on ext4 to work on NTFS is to change the names of the files we emit. @GuillaumeGomez is hoping to write up an RFC sometime soon.

@bors

bors commented Apr 2, 2021

Copy link
Copy Markdown
Collaborator

☔ The latest upstream changes (presumably #83781) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514

jyn514 commented Apr 2, 2021

Copy link
Copy Markdown
Member

@GuillaumeGomez IIUC we're currently planning to take a different approach, right? Does it make sense to keep this PR open?

@GuillaumeGomez

Copy link
Copy Markdown
Member Author

No. I guess that the FS kind detection part will be useful though. :)

@GuillaumeGomez GuillaumeGomez deleted the case-insensitive branch April 2, 2021 17:15
@GuillaumeGomez GuillaumeGomez restored the case-insensitive branch September 1, 2021 10:24
@GuillaumeGomez GuillaumeGomez deleted the case-insensitive branch August 19, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-rustdoc-ui Area: Rustdoc UI (generated HTML) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rustdoc: pages are being overwritten on the file systems with case-insensitive names Keyword doc of self and Self points to the same file on windows.

8 participants