Skip to content

Support context-aware includes for images and anchors#826

Merged
yamgent merged 5 commits into
MarkBind:masterfrom
sijie123:opRelativeSrc
Apr 18, 2019
Merged

Support context-aware includes for images and anchors#826
yamgent merged 5 commits into
MarkBind:masterfrom
sijie123:opRelativeSrc

Conversation

@sijie123

@sijie123 sijie123 commented Apr 9, 2019

Copy link
Copy Markdown
Contributor

• [x] Enhancement to an existing feature

Fixes #822

What is the rationale for this request?
After our parser includes a file, it's context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the file.

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

This makes it impossible to support relative src references written using Markdown, as markdown code is only parsed after all the includes have been made.

What changes did you make? (Give an overview)
To support context-based references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

Breaking changes
Previously, some relative URLs are allowed. The reference was always made against files which initiate the include.
Now, any reference is always made against files using the relative URLs.
This should make it more intuitive for authors to reuse content, but may break existing code especially if they are generated using nunjucks/macros.

Provide some example code that this change will affect:
Given this directory structure:

/
├ userGuide/
├-- subFolder/
├---- content.md
├-- index.md
├-- tutorial.md

Assume that we have a tutorial file that simply collates a list of content:

<include src="subFolder/content.md" />

And assuming we have the existing content file:

Topic One
Please refer to [our user guide here](index.html)

The existing content file is correct because the generated tutorial.html file will have a correct relative reference to index.html, the homepage of the user guide.
However, after this PR, the existing content file is incorrect. index.html will refer to an index file within subFolder. All relative references to resources and links are now with respect to the file that writes these references, so in this case, they are made with respect to content.md.
content.md should be updated as

Topic One
Please refer to [our user guide here](../index.html)

Is there anything you'd like reviewers to focus on?
As this PR is significantly breaking and likely to be buggy, please help me test as much as possible and point out any bugs so that I can correct them ASAP.
Sites confirmed working:
[x] Our system test site
[x] CS2103T website
[x] TE3201 website

List of known issues:
All known issues to date are resolved.

Progress:
[x] Finish testing against all existing websites and ensure that there are no bugs
[x] Update documentation to reflect change against relative URL references
[x] Determine reason behind unwrapping div bug. Update: bug is due to difference between span and div used for inline and non-inline includes. Bug squashed.
[x] Determine why tests aren't correct - and whether to update tests or to find bug. Update: bug squashed.

Testing instructions:
Please check out this PR and build it against existing, known good sites and verify that the sites are working as expected.

@sijie123

sijie123 commented Apr 9, 2019

Copy link
Copy Markdown
Contributor Author

@Chng-Zhi-Xuan there were some issues with the commit previously - i've updated and rebased. Please pull the changes again. Thanks!

@Chng-Zhi-Xuan

Copy link
Copy Markdown
Contributor

I will be testing in-depth more tomorrow due to time constraints.

Not immediately after requesting for review from myself :P

@acjh

acjh commented Apr 9, 2019

Copy link
Copy Markdown
Contributor

Do test with boilerplate files.

@yamgent

yamgent commented Apr 9, 2019

Copy link
Copy Markdown
Member

A brief comment from a quick glance... commit 11bb1e8 seems to be doing two different things at the same time (adding test + pushing the parsing of img and anchors to a later stage), is it not possible to split it up?

@sijie123

sijie123 commented Apr 10, 2019

Copy link
Copy Markdown
Contributor Author

A brief comment from a quick glance... commit 11bb1e8 seems to be doing two different things at the same time (adding test + pushing the parsing of img and anchors to a later stage), is it not possible to split it up?

Oh no. One of the commits disappeared / got squashed together... I'll fix that up in a bit. Thanks for the heads up.

Edit: It appears okay now.

@sijie123 sijie123 closed this Apr 10, 2019
@sijie123 sijie123 reopened this Apr 10, 2019
@sijie123 sijie123 force-pushed the opRelativeSrc branch 2 times, most recently from 2e39f3f to 40988ee Compare April 10, 2019 11:56
@Chng-Zhi-Xuan

Copy link
Copy Markdown
Contributor

So I've updated the test_site files to pass the tests.

Then I've notice the image links are broken for the 2 lines are are replaced here, in testReuse.md:
unexpected-behaviour

So the src is wrong and there is a href attached to those links.

Here is the line within testReuse.md. Can you take a look into this weird bug?

<pic src="images/I'm not allowed to use my favorite tool.png"></pic>
<pic src="{{imgFolder}}/I'm not allowed to use my favorite tool.png"></pic>

@sijie123 sijie123 force-pushed the opRelativeSrc branch 3 times, most recently from 3313e79 to 3635062 Compare April 11, 2019 07:17
Comment thread test/functional/test_site/expected/sub_site/testReuse._include_.html Outdated
@sijie123

Copy link
Copy Markdown
Contributor Author

Thanks @yamgent. That was dangerous - I had thought that the relative URLs in the testReuse.include.html was something expected from the new behaviour.
Lesson learnt: Always trust your tests :P

I've fixed the oversight. Now, included files (via panels) should be processed correctly too.

@sijie123 sijie123 requested a review from yamgent April 12, 2019 12:31

@yamgent yamgent left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "Parser: Support context-aware includes"

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

Um, the England here doesn't make sense? :P Are you trying to say that
paths need to be valid regardless of wherever it is included from?

Comment thread src/lib/markbind/src/parser.js Outdated
Comment thread test/unit/parser.test.js
@sijie123

Copy link
Copy Markdown
Contributor Author

Commit "Parser: Support context-aware includes"

This means that content in an included file must be valid wherever
it is included from. For example, resource references need to be
absolute paths, so that they remain valid wherever they are included
from.

Um, the England here doesn't make sense? :P Are you trying to say that
paths need to be valid regardless of wherever it is included from?

I was trying to say something like if we have 2 files from different places, e.g.
/a/index.md and /index.md both includes /a/sub-folder/b.md, any reference in b.md needs to remain valid whether it ends up in /a/index.md or /index.md.
For example, if b.md refers to a css file located in /a/file.css, it cannot refer to it as file.css, because this reference is invalid from the perspective of /index.md.

I've updated the commit message and fixed the {{hostBaseUrl}} bug. I still don't really like my commit message, but I can't find a clearer way to express the idea above^

@yamgent

yamgent commented Apr 15, 2019

Copy link
Copy Markdown
Member

I put in the example that you gave, and also invented the word "host file", see if that makes it clearer. The first two paragraphs are modified.

After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

@Chng-Zhi-Xuan

Chng-Zhi-Xuan commented Apr 15, 2019

Copy link
Copy Markdown
Contributor

I just pulled the latest branch. Tested and found this:

Both Pic and Img tags test have a stay C:/ within the src, which results in a broken link
bug

@sijie123

Copy link
Copy Markdown
Contributor Author

I just pulled the latest branch. Tested and found this:

Both Pic and Img tags test have a stay C:/ within the src, which results in a broken link
bug

Confirmed. I'll just have to add a couple more ensurePosix then :(
On a side note, we really have to get our AppVeyor up and running :P

@sijie123 sijie123 force-pushed the opRelativeSrc branch 2 times, most recently from 6f20e97 to c9213fb Compare April 15, 2019 12:47
@sijie123

Copy link
Copy Markdown
Contributor Author

I put in the example that you gave, and also invented the word "host file", see if that makes it clearer. The first two paragraphs are modified.

After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.

Yes! The use of "host file" made it much clearer. Thank you so much - I will adopt this commit message :D

I've also attempted a fix for the Windows bug @Chng-Zhi-Xuan by forcing it to use posix notation. I've tested it on a Windows VM and it looks fine now.

@sijie123 sijie123 requested a review from yamgent April 15, 2019 12:50
yamgent
yamgent previously approved these changes Apr 15, 2019
@yamgent yamgent added this to the v2.2.1 milestone Apr 15, 2019
@sijie123 sijie123 requested a review from yamgent April 15, 2019 17:14
@sijie123 sijie123 dismissed yamgent’s stale review April 15, 2019 17:14

Updated code base

@sijie123

sijie123 commented Apr 15, 2019

Copy link
Copy Markdown
Contributor Author

Thanks for reviewing @yamgent. Sorry I had to dismiss your review because I found an issue with authoringContents.md (and some CS2103T pages) that happened as a result of inline includes.
When we do inline includes, we use a span instead of a div. However, my implementation "broke" inline includes because I always use a div. I have now added checks: if it's an inline include, use a span; otherwise, use a div.
(The reason why I don't use span all the time is because somehow spans introduce extra <p></p> that cause the formatting to go wrong)

I've taken a quick glance at the resulting CS2103T and TE3201 websites, they look more fine now :P

I'll still need to update the documentation to reflect the change in our includes. Will do that tomorrow.

@Chng-Zhi-Xuan

Chng-Zhi-Xuan commented Apr 16, 2019

Copy link
Copy Markdown
Contributor

Hi @sijie123 Thanks for updating the PR, and doing additional tests.

It will be good to add a Pending Item or To-Do list either in the PR description or latest comment to inform reviewers that there are still items being added to the PR.

Just a minor QoL nit :P

@sijie123 sijie123 changed the title Support context-aware includes for images and anchors [WIP] Support context-aware includes for images and anchors Apr 16, 2019
@sijie123 sijie123 force-pushed the opRelativeSrc branch 2 times, most recently from 9597ebc to f108f1c Compare April 16, 2019 05:48
@sijie123 sijie123 changed the title [WIP] Support context-aware includes for images and anchors Support context-aware includes for images and anchors Apr 16, 2019
@sijie123

sijie123 commented Apr 16, 2019

Copy link
Copy Markdown
Contributor Author

Finally! I think I'm done with everything I set out to do...
I'll do one last test on a Windows VM, but in the meantime it's up for review. (Yup, looks good on a Windows VM)
Thanks @yamgent @Chng-Zhi-Xuan for your help!

@Chng-Zhi-Xuan Chng-Zhi-Xuan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with panels and nested panels and seems fine so far 👍

Just a couple of comments, feel free to give your perspective if needed.

Comment thread src/lib/markbind/src/parser.js Outdated
Comment thread docs/userGuide/syntax/links.mbdf
After our parser includes a file, its context is discarded.
This means that in subsequent passes of parser, we do not know where
the content has been included from, and included content is treated
uniformly as if they were originally part of the host file.

The path of the references in the included file may not resolve
correctly if it belongs to a different directory from the host file.
For example, if /a/b/c.md refers to a file in '../file.css', and
/index.md is the host file that includes /a/b/c.md, then /index.md
should resolve this path to '/a/b/file.css'. It cannot retain the
original path '../file.css', as that will lead to the wrong css file.

To support context-aware references, i.e. the included segments
retain awareness of their original include locations, let's add
a data-included-from meta whenever we include files. To maintain
the existing behaviour and to prevent leaking file structure data,
this meta tag is removed before we finish generating the final
website.
MarkBind does not support relative references if they are written
using Markdown syntax. The existing MarkBind parser only processes
relative URLs once during the preprocess stage, during which
Markdown code has not yet been transcribed into html.

With context-based includes available in a previous commit, we can
now store the context from which files are included from. We can
then use this information to derive the absolute path of a resource
referenced by an included file.

Let's rewrite the support for relative URLs by processing all
such relative URLs after the site is generated. This way, we can
convert relative URLs in Markdown code to absolute URLs too.

With this new method of processing relative URLs, let's also remove
the old implementation.
As parser now supports context-aware includes, we need to update
our unit tests for parser to check that our data-included-from
tags are correct.

Let's update our unit tests in parser.test.js.
With context-aware includes, all relative URLs should now be with
respect to the file that contains the relative URLs, instead of
being with respect to the files that are expected to include them.

FilterTags needs to be updated so that the reference to Hiding
Tags is with respect to its location in userGuide/plugins/.
Our user guide does not mention how users should use relative URL
references. Now that parser fully supports relative URL references
for images and links, let's update our user guide to document
MarkBind's behavior towards relative references.

In particular, the intra-site links section has been updated with
an important note on how users should specify relative references,
while images and pictures section have been updated with a note
teaching users that they can now use relative references.
@sijie123

Copy link
Copy Markdown
Contributor Author

Updated with @Chng-Zhi-Xuan's comments. Thanks :))

@Chng-Zhi-Xuan Chng-Zhi-Xuan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 , don't notice any other glaring issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Markdown syntax for images doesn't work without {{ baseUrl }}

4 participants