Update Tokenizer to treat Markdown code as text instead of HTML by danielbrzn · Pull Request #1 · MarkBind/htmlparser2

danielbrzn · 2018-01-25T16:23:23Z

This fix allows Markdown code to contain '<' , '<=' without having it affect other HTML elements as it is now treated as a text element. Furthermore, no spaces are required when typing these symbols within the back ticks.

As such, inequalities like the above can be rendered normally as shown below.

`x<y`
`<`
`<=`
`x<=y`

Resolves MarkBind/markbind#101

acjh · 2018-01-25T16:41:27Z

 	this._ended = false;
 	this._xmlMode = !!(options && options.xmlMode);
 	this._decodeEntities = !!(options && options.decodeEntities);
+    this._isMarkdownCode = false;


Tabs vs spaces 😨

Let's be consistent with the rest of the file (tabs).

acjh · 2018-01-25T16:42:26Z

 	while(this._index < this._buffer.length && this._running){
 		var c = this._buffer.charAt(this._index);
+		// Detect Markdown code so that it is parsed as text instead of HTML
+		if (c === '`')


No space before opening parentheses 😢

Let's be consistent with L152 of the file (braces even for single line of code).

acjh · 2018-01-25T16:51:14Z

Off-topic: Add a white space around operators :)

x < y
x <= y

We don't have a JS coding standard but:

danielbrzn · 2018-01-25T17:14:35Z

Updated with the requested changes. Somehow my WebStorm was set to indent with spaces, and I didn't catch the difference in the editor.

Thanks for the tip about the white space!

acjh · 2018-01-25T17:18:31Z


 Tokenizer.prototype._stateText = function(c){
-	if(c === "<"){
+	// parse open tags if it is not Markdown


Parse (capital P)

tag (singular)

acjh · 2018-01-25T17:22:26Z

 	while(this._index < this._buffer.length && this._running){
 		var c = this._buffer.charAt(this._index);
+		// Detect Markdown code so that it is parsed as text instead of HTML
+		if (c === '`') {


No spaces before/after parentheses.

- Allows Markdown code to contain '<' , '<=' without having it affect other HTML elements

acjh · 2018-01-26T06:20:03Z

We should treat <␣ and <= as text as well:

a < b
a <= b

This is reasonable: create a .html file with the above and open it in your browser (tested in Chrome).

danielbrzn · 2018-01-26T15:35:54Z

Seems like the first case is handled fine. I've modified Tokenizer.js to treat <= as text, but there's a peculiar bug with the beautifying process that uses js-beautify

x <= y will get beautified to x <=y. I've tried using this fix mentioned here but it doesn't work. I'd reckon that the beautifier thinks <= is a valid open tag. Any ideas on how I could fix this?

acjh · 2018-01-26T17:53:24Z

I've modified Tokenizer.js to treat <= as text, but there's a peculiar bug with the beautifying process that uses js-beautify

Can you commit and push, so we can attempt to repro?

Any ideas on how I could fix this?

Try updating js-beautify from 1.6.12 to 1.7.5 and see if the problem still exists.

danielbrzn · 2018-01-26T18:50:43Z

js-beautify is at version 1.7.5 and the problem still persists unfortunately.

acjh · 2018-01-26T19:06:27Z

No repro:

danielbrzn · 2018-01-27T04:56:20Z

Are you generating the site from an index.md or an index.html? I get the bug when it's an .html file, but not when it's an .md file.

acjh · 2018-01-27T05:38:11Z

Ah, I see that I suggested to "create a .html file" to see how the browser treats those strings.
Repro-ed when using markbind build with a .html file.

We don't have to solve that in this PR since:

- it works with .md files which we're primarily concerned with,
- it doesn't break anything, and
- it's not caused by bad code in this PR.

So it's partial support for .html files: Given "a <= b", this PR gives "a <=b" instead of just "a".

acjh · 2018-01-27T05:39:58Z

 		this._sectionStart = this._index;
+	} else if(this._isInequality){
+		// Next character should be parsed normally
+		this._isInequality = !this._isInequality;


This should be this._isInequality = false; since it's not a toggle.

acjh · 2018-01-27T05:41:01Z

 Tokenizer.prototype._stateText = function(c){
-	if(c === "<"){
+	// Parse open tag if it is not Markdown and not part of an inequality
+	if(c === "<" && !this._isMarkdownCode && !this._isInequality){


Why is && !this._isInequality necessary?

This is such that the Tokenizer doesn't think that the < of a <= is the start of an open HTML tag.

acjh · 2018-01-27T05:42:55Z

+		} else if(c === '<'){
+			var nextChar = this._buffer.charAt(this._index + 1);
+			if(nextChar === '='){
+				this._isInequality = !this._isInequality;


Should this be this._isInequality = true;?

acjh · 2018-01-27T05:44:06Z

 	this._xmlMode = !!(options && options.xmlMode);
 	this._decodeEntities = !!(options && options.decodeEntities);
+	this._isMarkdownCode = false;
+	this._isInequality = false;


Reorder just these 2 in alphabetical order.

acjh · 2018-01-27T08:47:23Z

 		this._sectionStart = this._index;
+	} else if(this._isInequality){
+		// Next character should be parsed normally
+		this._isInequality = false;


Can this be the first if condition?

If it's the first if condition, this._isInequality would be set to false and < would then be treated as a valid open tag.

It won't enter the else if block though?

Woops, yes that's right. Will resolve it ASAP.

acjh · 2018-01-27T08:51:21Z

+			this._isMarkdownCode = !this._isMarkdownCode;
+		} else if(c === '<'){
+			var nextChar = this._buffer.charAt(this._index + 1);
+			if(nextChar === '='){


This needs a comment for consistency.

Index should also be checked: if(c === '<' && this._index + 1 < this._buffer.length){

Good point about the index, will do so.

Should the comment be inside the else if block or outside of it?

It can be inside if you add a section name.
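For illustration, the lookahead with the requested bounds check could be pulled into a small helper like the one below. This is only a sketch: `isInequalityAt` is a hypothetical name, not something in htmlparser2. Since `String.prototype.charAt` returns an empty string for an out-of-range index, the bounds check mainly makes the intent explicit rather than preventing an exception.

```javascript
// Hypothetical helper (not part of htmlparser2): true when "<=" starts at `index`
function isInequalityAt(buffer, index){
	return buffer.charAt(index) === "<"
		&& index + 1 < buffer.length // explicit bounds check, as requested in review
		&& buffer.charAt(index + 1) === "=";
}
```

One caveat that follows from any lookahead on a partial buffer: if input arrives in chunks and a chunk ends exactly at `<`, the `=` is not visible yet, so the helper returns false for that position.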

acjh · 2018-01-27T09:16:16Z

+			if(nextChar === '='){
+				this._isInequality = true;
+			}
+		}


Add a newline before and after this entire block.

Maybe add a section name like the ones below.

By section name, do you mean changing '=' into something like EQUALS?

if(nextChar === EQUALS){
	this._isInequality = true;
}

I mean these: https://github.com/MarkBind/htmlparser2/pull/1/files#diff-00550ec11d6b5101df5a54c5fee7cc2eR670

Would special conditions be an appropriate section name?

Fine for now.

acjh · 2018-02-01T04:37:01Z

+	if(this._isInequality){
+		// Next character will be parsed normally
+		this._isInequality = false;
+	} else if(c === "<" && !this._isMarkdownCode && !this._isInequality){


&& !this._isInequality should be removed.
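The reasoning: the else if branch is only reached when the first condition is false, so `!this._isInequality` is always true there. A quick way to convince yourself, using hypothetical stand-in functions (these are not the real Tokenizer methods, and they return labels instead of changing state):

```javascript
// Hypothetical stand-ins for the two versions of the condition chain
function withRedundantCheck(c, isMarkdownCode, isInequality){
	if(isInequality){
		return "text";
	} else if(c === "<" && !isMarkdownCode && !isInequality){
		return "open tag";
	}
	return "text";
}

function withoutRedundantCheck(c, isMarkdownCode, isInequality){
	if(isInequality){
		return "text";
	} else if(c === "<" && !isMarkdownCode){
		return "open tag";
	}
	return "text";
}

// Compare both versions over every combination of inputs
function versionsAgree(){
	var chars = ["<", "a"], bools = [true, false];
	return chars.every(function(c){
		return bools.every(function(md){
			return bools.every(function(ineq){
				return withRedundantCheck(c, md, ineq) === withoutRedundantCheck(c, md, ineq);
			});
		});
	});
}
```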

acjh · 2018-02-01T04:42:07Z

+		*	special conditions
+		*/
+		if(c === '`'){
+			// Detect Markdown code to be parsed as text


~~Detect~~ Toggle

acjh · 2018-02-01T04:45:05Z

+			this._isMarkdownCode = !this._isMarkdownCode;
+		} else if(c === '<' && this._index + 1 < this._buffer.length){
+			var nextChar = this._buffer.charAt(this._index + 1);
+			// Detect '<=' inequality to be parsed as text


~~Detect~~ Set

Also, move this comment into the if block.

danielbrzn · 2018-02-01T05:15:59Z

Made the necessary changes.

acjh · 2018-02-01T06:15:14Z

+				this._isInequality = true;
+			}
+		}
+


Hmm, this still looks out-of-place.

Let's introduce a new state MARKDOWN instead of tracking this._isMarkdownCode and this._isInequality.

Near top of file:

MARKDOWN = i++,
TEXT = i++, // No change

In this function:

if(this._state === MARKDOWN){
	this._stateMarkdown(c);
} else if(this._state === TEXT){
	this._stateText(c); // No change

Other functions:

Tokenizer.prototype._stateMarkdown = function(c){
	if(c === '`'){
		this._state = TEXT;
	}
}

Tokenizer.prototype._stateText = function(c){
	if(c === '`'){
		this._state = MARKDOWN;
	} else if(c === "<"){
		let isInequality = (this._index + 1 < this._buffer.length) && this._buffer.charAt(this._index + 1) === '=';
		if(!isInequality){
			if(this._index > this._sectionStart){
				this._cbs.ontext(this._getSection());
			}
			this._state = BEFORE_TAG_NAME;
			this._sectionStart = this._index;
		}
	}
}
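To see the state-based idea in isolation, here is a self-contained miniature. It is a sketch under stated assumptions, not htmlparser2's Tokenizer: `tokenize`, the `IN_TAG` state, and the `{ type, value }` token shape are all made up for illustration, and a tag is simply taken to end at the next `>`.

```javascript
// Hypothetical miniature tokenizer; NOT htmlparser2's Tokenizer
var TEXT = 0, MARKDOWN = 1, IN_TAG = 2;

function tokenize(input){
	var tokens = [], state = TEXT, sectionStart = 0;

	// Emit input[sectionStart, end) as one token, skipping empty sections
	function flush(type, end){
		if(end > sectionStart){
			tokens.push({ type: type, value: input.slice(sectionStart, end) });
		}
		sectionStart = end;
	}

	for(var i = 0; i < input.length; i++){
		var c = input.charAt(i);
		if(state === MARKDOWN){
			// Inside backticks: everything is text until the closing backtick
			if(c === "`"){ state = TEXT; }
		} else if(state === TEXT){
			if(c === "`"){
				state = MARKDOWN;
			} else if(c === "<"){
				var isInequality = (i + 1 < input.length) && input.charAt(i + 1) === "=";
				if(!isInequality){
					flush("text", i);
					state = IN_TAG;
				}
			}
		} else if(c === ">"){
			// IN_TAG: in this sketch a tag ends at the next ">"
			flush("tag", i + 1);
			state = TEXT;
		}
	}
	// Whatever is left (including an unfinished tag) is emitted as text
	flush("text", input.length);
	return tokens;
}
```

With this sketch, `tokenize("a <= b")` yields a single text token, and the `<` inside the backticks of ``tokenize("a `x<y` <b>c</b>")`` stays part of the first text token instead of opening a tag.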

acjh · 2018-02-01T16:05:08Z


    i = 0,
-
+


Remove whitespace.

acjh · 2018-02-01T16:06:04Z

+	} else if(c === "<"){
+		var isInequality = (this._index + 1 < this._buffer.length) && (this._buffer.charAt(this._index + 1) === '=');
+		if(!isInequality){
+			if (this._index > this._sectionStart) {


No spaces before/after parentheses 😢

acjh · 2018-02-02T06:18:10Z

    xmlMap    = require("entities/maps/xml.json"),

    i = 0,
-


Restore newline (without whitespace).

You added whitespace again 😕

Sorry, fixed it now.

Gisonrg · 2018-02-02T14:57:06Z

Have we tested the code block (```) case?

danielbrzn · 2018-02-02T16:37:48Z

Do you mean whether code block cases render as before?

Just tried this out, seems to be fine. Is there something else I should test?

In the current version of the CS2103 website however, this fix will cause the rest of the page to not render as intended as there's an extra backtick; specifically in this page under the code snippet where it says
//Solution below adpated from https://stackoverflow.com/a/16252290`

If this backtick is removed, the page renders as per normal.
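This follows from the toggling scheme in this PR: every backtick flips between text and Markdown code, so a stray one leaves the tokenizer in code mode for the rest of the input. A rough way to spot such a backtick in a page is sketched below; `findUnclosedBacktick` is a hypothetical helper, not something that exists in MarkBind or htmlparser2.

```javascript
// Hypothetical helper: returns the index of a backtick that is never closed, or -1
function findUnclosedBacktick(input){
	var openIndex = -1;
	for(var i = 0; i < input.length; i++){
		if(input.charAt(i) === "`"){
			// Each backtick toggles between "text" and "Markdown code"
			openIndex = (openIndex === -1) ? i : -1;
		}
	}
	return openIndex;
}
```

Because a fenced block contributes six backticks in total, a balanced pair of fences still leaves the count even under this simple toggle.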

damithc · 2018-02-03T02:01:40Z

In the current version of the CS2103 website however, this fix will cause the rest of the page to not render as intended as there's an extra backtick;

Removed the extra backtick.

Gisonrg

Great work :P

Let's patch Tokenizer to treat Markdown code as text instead of HTML.

From MarkBind/htmlparser2#1:

> This fix allows Markdown code to contain '<' , '<=' without having it
> affect other HTML elements as it is now treated as a text element.
> Furthermore, no spaces are required when typing these symbols within
> the back ticks.
>
> As such, inequalities like the above can be rendered normally as
> shown below.
>
> `x<y`
> `<`
> `<=`
> `x<=y`

acjh requested changes Jan 25, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from c3efc5a to 28804b6 Compare January 25, 2018 17:12

acjh reviewed Jan 25, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from 28804b6 to 43fe9b5 Compare January 25, 2018 17:31

Update Tokenizer to treat Markdown code as text instead of HTML

3880c5c

- Allows Markdown code to contain '<' , '<=' without having it affect other HTML elements

danielbrzn force-pushed the markdown-parsing-fix branch from 43fe9b5 to 3880c5c Compare January 25, 2018 18:00

acjh approved these changes Jan 26, 2018

acjh requested a review from Gisonrg January 26, 2018 04:00

acjh requested changes Jan 27, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from 38ac8da to 21ddebe Compare January 27, 2018 06:21

acjh requested changes Jan 27, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from 21ddebe to ed1f971 Compare January 27, 2018 14:59

acjh requested changes Feb 1, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from ed1f971 to c09d6ee Compare February 1, 2018 05:12

danielbrzn force-pushed the markdown-parsing-fix branch from c09d6ee to f11e76a Compare February 1, 2018 05:20

acjh reviewed Feb 1, 2018

acjh added this to the v3.10.0-markbind.1 milestone Feb 1, 2018

acjh mentioned this pull request Feb 1, 2018

Update package.json to point to htmlparser2 fork MarkBind/markbind#126

Merged

danielbrzn force-pushed the markdown-parsing-fix branch 2 times, most recently from 6943bc6 to 7aecd9b Compare February 1, 2018 15:34

danielbrzn force-pushed the markdown-parsing-fix branch from 7aecd9b to 369b0ba Compare February 1, 2018 15:37

acjh requested changes Feb 1, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from 369b0ba to b348228 Compare February 1, 2018 16:41

acjh requested changes Feb 2, 2018

danielbrzn force-pushed the markdown-parsing-fix branch from b348228 to 89cde72 Compare February 2, 2018 07:18

Update Tokenizer to recognise inequalities and parse them as text

6e614fb

danielbrzn force-pushed the markdown-parsing-fix branch from 89cde72 to 6e614fb Compare February 2, 2018 14:55

acjh approved these changes Feb 3, 2018

Gisonrg approved these changes Feb 3, 2018

acjh merged commit 815b507 into MarkBind:master Feb 3, 2018

acjh mentioned this pull request Dec 7, 2019

Patch htmlparser2 instead of rely on MarkBind fork MarkBind/markbind#948

Merged

ang-zeyu mentioned this pull request Dec 30, 2020

Remove markdown - htmlparser2 patch MarkBind/markbind#1435

Merged

10 tasks
