Added logging, changed some directory structure

2018-01-13 21:33:40 -05:00
parent f079a5f067
commit 8e72ffb917
73656 changed files with 35284 additions and 53718 deletions

@@ -0,0 +1,69 @@
## 1.5.1
* Fix: Qualified tag name emission in Serializer (GH [#79](https://github.com/inikulin/parse5/issues/79)).
## 1.5.0
* Add: Location info for the element start and end tags (by @sakagg).
## 1.4.2
* Fix: htmlparser2 tree adapter `DocumentType.data` property rendering (GH [#45](https://github.com/inikulin/parse5/issues/45)).
## 1.4.1
* Fix: Location info handling for the implicitly generated `<html>` and `<body>` elements (GH [#44](https://github.com/inikulin/parse5/issues/44)).
## 1.4.0
* Add: Parser [decodeHtmlEntities](https://github.com/inikulin/parse5#optionsdecodehtmlentities) option.
* Add: SimpleApiParser [decodeHtmlEntities](https://github.com/inikulin/parse5#optionsdecodehtmlentities-1) option.
* Add: Parser [locationInfo](https://github.com/inikulin/parse5#optionslocationinfo) option.
* Add: SimpleApiParser [locationInfo](https://github.com/inikulin/parse5#optionslocationinfo-1) option.
## 1.3.2
* Fix: `<form>` processing in `<template>` (GH [#40](https://github.com/inikulin/parse5/issues/40)).
## 1.3.1
* Fix: text node in `<template>` serialization problem with custom tree adapter (GH [#38](https://github.com/inikulin/parse5/issues/38)).
## 1.3.0
* Add: Serializer `encodeHtmlEntities` option.
## 1.2.0
* Add: `<template>` support
* `parseFragment` now uses `<template>` as the default `contextElement`. This makes parsing more "forgiving".
* `TreeSerializer` was renamed to `Serializer`. However, the serializer is still accessible as `parse5.TreeSerializer` for backward compatibility.
## 1.1.6
* Fix: apply latest changes to the `htmlparser2` tree format (DOM Level1 node emulation).
## 1.1.5
* Add: [jsdom](https://github.com/tmpvar/jsdom)-specific parser with scripting support. Undocumented; intended for internal use by `jsdom` only.
## 1.1.4
* Add: logo
* Fix: use fake `document` element for fragment parsing (required by [jsdom](https://github.com/tmpvar/jsdom)).
## 1.1.3
* Development files (e.g. `.travis.yml`, `.editorconfig`) are removed from NPM package.
## 1.1.2
* Fix: crash on Linux due to upper-case leading character in module name used in `require()`.
## 1.1.1
* Add: [SimpleApiParser](https://github.com/inikulin/parse5/#class-simpleapiparser).
* Fix: new line serialization in `<pre>`.
* Fix: `SYSTEM`-only `DOCTYPE` serialization.
* Fix: quotes serialization in `DOCTYPE` IDs.
## 1.0.0
* First stable release, switch to semantic versioning.
## 0.8.3
* Fix: siblings calculation bug in `appendChild` in `htmlparser2` tree adapter.
## 0.8.1
* Add: [TreeSerializer](https://github.com/inikulin/parse5/#class-serializer).
* Add: [htmlparser2 tree adapter](https://github.com/inikulin/parse5/#-treeadaptershtmlparser2).
## 0.6.1
* Fix: incorrect `<menuitem>` handling in `<body>`.
## 0.6.0
* Initial release.

@@ -0,0 +1,19 @@
Copyright (c) 2013-2015 Ivan Nikulin (ifaaan@gmail.com)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

@@ -0,0 +1,247 @@
<p align="center">
<img src="https://raw.github.com/inikulin/parse5/master/logo.png" alt="parse5" />
</p>
[![Build Status](https://api.travis-ci.org/inikulin/parse5.svg)](https://travis-ci.org/inikulin/parse5)
[![npm](https://img.shields.io/npm/v/parse5.svg)](https://www.npmjs.com/package/parse5)
*WHATWG HTML5 specification-compliant, fast and production-ready HTML parsing/serialization toolset for Node and io.js.*
I needed a fast, production-ready HTML parser that would parse HTML the way a modern browser's parser does.
Existing solutions were either too slow or their output was too inaccurate. That is how parse5 was born.
**Included tools:**
* [Parser](#class-parser) - HTML to DOM-tree parser.
* [SimpleApiParser](#class-simpleapiparser) - [SAX](http://en.wikipedia.org/wiki/Simple_API_for_XML)-style parser for HTML.
* [Serializer](#class-serializer) - DOM-tree to HTML code serializer.
## Install
```
$ npm install parse5
```
## Usage
```js
var Parser = require('parse5').Parser;
// Instantiate parser
var parser = new Parser();
// Then feed it with an HTML document
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
// Now let's parse an HTML snippet
var fragment = parser.parseFragment('<title>Parse5 is &#102;&#117;&#99;&#107;ing awesome!</title><h1>42</h1>');
```
## Is it fast?
Check out [this benchmark](https://github.com/inikulin/node-html-parser-bench).
```
Starting benchmark. Fasten your seatbelts...
html5 (https://github.com/aredridel/html5) x 0.18 ops/sec ±5.92% (5 runs sampled)
htmlparser (https://github.com/tautologistics/node-htmlparser/) x 3.83 ops/sec ±42.43% (14 runs sampled)
htmlparser2 (https://github.com/fb55/htmlparser2) x 4.05 ops/sec ±39.27% (15 runs sampled)
parse5 (https://github.com/inikulin/parse5) x 3.04 ops/sec ±51.81% (13 runs sampled)
Fastest is htmlparser2 (https://github.com/fb55/htmlparser2),parse5 (https://github.com/inikulin/parse5)
```
So parse5 is roughly as fast as simple, specification-incompatible parsers and more than 15 times faster (3.04 vs. 0.18 ops/sec in the run above) than the specification-compatible parser currently available for Node.
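
The relative speeds can be recomputed from the ops/sec figures in the benchmark output above. The snippet below only does that arithmetic on the copied numbers. Because the ± margins are large (up to ±51.81% for parse5), the ratios are rough.

```js
// ops/sec figures copied from the benchmark output above.
const opsPerSec = {
  html5: 0.18,
  htmlparser: 3.83,
  htmlparser2: 4.05,
  parse5: 3.04,
};

// How many times faster parser `a` is than parser `b` (ratio of ops/sec).
function speedup(a, b) {
  return opsPerSec[a] / opsPerSec[b];
}

const vsHtml5 = speedup('parse5', 'html5');
const vsHtmlparser2 = speedup('parse5', 'htmlparser2');

console.log(`parse5 vs html5: ${vsHtml5.toFixed(1)}x`); // prints "parse5 vs html5: 16.9x"
console.log(`parse5 vs htmlparser2: ${vsHtmlparser2.toFixed(2)}x`); // prints "parse5 vs htmlparser2: 0.75x"
```
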
## API reference
### Enum: TreeAdapters
Provides built-in tree adapters which can be passed as an optional argument to the `Parser` and `Serializer` constructors.
#### &bull; TreeAdapters.default
Default tree format for parse5.
#### &bull; TreeAdapters.htmlparser2
The popular [htmlparser2](https://github.com/fb55/htmlparser2) tree format (used, for example, by [cheerio](https://github.com/MatthewMueller/cheerio) and [jsdom](https://github.com/tmpvar/jsdom)).
---------------------------------------
### Class: Parser
Provides HTML parsing functionality.
#### &bull; Parser.ctor([treeAdapter, options])
Creates a new reusable instance of the `Parser`. The optional `treeAdapter` argument specifies the resulting tree format. If `treeAdapter` is not specified, the `default` tree adapter is used.
The `options` object modifies the parsing algorithm:
##### options.decodeHtmlEntities
Decode HTML entities like `&amp;`, `&nbsp;`, etc. Default: `true`. **Warning:** disabling this option may produce output that does not conform to the HTML5 specification.
##### options.locationInfo
Enables source code location information for the nodes. Default: `false`. When enabled, each node (except the root node) has a `__location` property, which contains the `start` and `end` indices of the node in the source code. If an element was implicitly created by the parser, its `__location` property is `null`. If the node is an element that is not empty, `__location` has two additional properties, `startTag` and `endTag`, which contain location information for the individual tags in the same form as `__location`.
*Example:*
```js
var parse5 = require('parse5');
//Instantiate new parser with default tree adapter
var parser1 = new parse5.Parser();
//Instantiate new parser with htmlparser2 tree adapter
var parser2 = new parse5.Parser(parse5.TreeAdapters.htmlparser2);
```
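
The `decodeHtmlEntities` option controls whether character references such as `&amp;` or `&#72;` are turned into the characters they stand for. The sketch below makes that concrete with a deliberately tiny, hypothetical decoder (`decodeEntities` is not part of parse5). It knows only a handful of named references plus decimal and hexadecimal numeric ones. parse5's tokenizer follows the full WHATWG character reference rules (the complete named reference table, missing semicolons, invalid code points), so treat this only as an illustration.

```js
// Hypothetical helper: a handful of named references only
// (the real HTML named character reference table has more than 2000 entries).
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00a0' };

// Decode &name; &#NNN; and &#xHHH; forms. Anything unrecognized is left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, ref) => {
    if (ref[0] === '#') {
      const isHex = ref[1] === 'x' || ref[1] === 'X';
      const codePoint = parseInt(ref.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points as-is instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : match;
  });
}

console.log(decodeEntities('Fish &amp; chips &lt;3 &#72;&#x69;!')); // prints "Fish & chips <3 Hi!"
```
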
#### &bull; Parser.parse(html)
Parses specified `html` string. Returns `document` node.
*Example:*
```js
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
```
#### &bull; Parser.parseFragment(htmlFragment, [contextElement])
Parses the given `htmlFragment` and returns a `documentFragment` node. The optional `contextElement` argument specifies the context in which `htmlFragment` is parsed (think of it as setting the `contextElement.innerHTML` property). If `contextElement` is not specified, a `<template>` element is used as the context and the fragment is parsed in a 'forgiving' manner.
*Example:*
```js
var documentFragment = parser.parseFragment('<table></table>');
//Parse html fragment in context of the parsed <table> element
var trFragment = parser.parseFragment('<tr><td>Shake it, baby</td></tr>', documentFragment.childNodes[0]);
```
---------------------------------------
### Class: SimpleApiParser
Provides [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style HTML parsing functionality.
#### &bull; SimpleApiParser.ctor(handlers, [options])
Creates a new reusable instance of the `SimpleApiParser`. The `handlers` argument is an object containing the parser's event handlers. The possible events and their signatures are shown in the example.
The `options` object modifies the parsing algorithm:
##### options.decodeHtmlEntities
Decode HTML entities like `&amp;`, `&nbsp;`, etc. Default: `true`. **Warning:** disabling this option may produce output that does not conform to the HTML5 specification.
##### options.locationInfo
Enables source code location information for the tokens. Default: `false`. When enabled, each handler receives a `location` object as its last argument. The `location` object contains the `start` and `end` indices of the token in the source code.
*Example:*
```js
var parse5 = require('parse5');
var parser = new parse5.SimpleApiParser({
    doctype: function(name, publicId, systemId /*, [location] */) {
        //Handle doctype here
    },
    startTag: function(tagName, attrs, selfClosing /*, [location] */) {
        //Handle start tags here
    },
    endTag: function(tagName /*, [location] */) {
        //Handle end tags here
    },
    text: function(text /*, [location] */) {
        //Handle texts here
    },
    comment: function(text /*, [location] */) {
        //Handle comments here
    }
});
```
#### &bull; SimpleApiParser.parse(html)
Raises parser events for the given `html`.
*Example:*
```js
var parse5 = require('parse5');
var parser = new parse5.SimpleApiParser({
    text: function(text) {
        console.log(text);
    }
});
parser.parse('<body>Yo!</body>');
```
---------------------------------------
### Class: Serializer
Provides tree-to-HTML serialization functionality.
**Note:** prior to v1.2.0 this class was called `TreeSerializer`. However, it's still accessible as `parse5.TreeSerializer` for backward compatibility.
#### &bull; Serializer.ctor([treeAdapter, options])
Creates a new reusable instance of the `Serializer`. The optional `treeAdapter` argument specifies the input tree format. If `treeAdapter` is not specified, the `default` tree adapter is used.
The `options` object modifies the serialization algorithm:
##### options.encodeHtmlEntities
HTML-encode characters like `<`, `>`, `&`, etc. Default: `true`. **Warning:** disabling this option may produce output that does not conform to the HTML5 specification.
*Example:*
```js
var parse5 = require('parse5');
//Instantiate new serializer with default tree adapter
var serializer1 = new parse5.Serializer();
//Instantiate new serializer with htmlparser2 tree adapter
var serializer2 = new parse5.Serializer(parse5.TreeAdapters.htmlparser2);
```
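
The `encodeHtmlEntities` option controls the escaping of characters that would otherwise change the meaning of the serialized markup. Below is a minimal sketch of the escaping rules from the WHATWG HTML fragment serialization algorithm (the 'escaping a string' step, in its classic form). It shows that text and attribute values escape different characters. `escapeString` and its `attrMode` flag are hypothetical names; this is not parse5's source code.

```js
// Sketch of the classic "escaping a string" rules from the WHATWG
// fragment serialization algorithm (hypothetical helper, not parse5's code).
function escapeString(str, attrMode) {
  // Always escape '&' first so later replacements are not double-escaped.
  let out = str.replace(/&/g, '&amp;').replace(/\u00a0/g, '&nbsp;');
  if (attrMode) {
    // Attribute values are emitted inside double quotes.
    out = out.replace(/"/g, '&quot;');
  } else {
    // In text content, '<' and '>' must not start or end markup.
    out = out.replace(/</g, '&lt;').replace(/>/g, '&gt;');
  }
  return out;
}

console.log(escapeString('a < b & "c"', false)); // prints: a &lt; b &amp; "c"
console.log(escapeString('a < b & "c"', true)); // prints: a < b &amp; &quot;c&quot;
```
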
#### &bull; Serializer.serialize(node)
Serializes the given `node`. Returns HTML string.
*Example:*
```js
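var parse5 = require('parse5');
var parser = new parse5.Parser();
var serializer = new parse5.Serializer();
var document = parser.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
//Serialize document
var html = serializer.serialize(document);
//Serialize <body> element content (document.childNodes[0] is the DOCTYPE node)
var bodyInnerHtml = serializer.serialize(document.childNodes[1].childNodes[1]);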
```
---------------------------------------
## Testing
Test data is adopted from the [html5lib project](https://github.com/html5lib). The parser is covered by more than 8000 test cases.
To run tests:
```
$ npm test
```
## Custom tree adapter
You can create a custom tree adapter so parse5 can work with your own DOM-tree implementation.
Just pass your adapter implementation to the parser's constructor as an argument:
```js
var Parser = require('parse5').Parser;
var myTreeAdapter = {
//Adapter methods...
};
//Instantiate parser
var parser = new Parser(myTreeAdapter);
```
A sample implementation can be found [here](https://github.com/inikulin/parse5/blob/master/lib/tree_adapters/default.js).
A custom tree adapter should implement all methods exposed via `exports` in the sample implementation.
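
As a starting point, here is a partial, hypothetical sketch of an adapter over plain JavaScript objects. It implements only a few of the methods from the sample implementation (`createDocument`, `createElement`, `createCommentNode`, `appendChild`, `getChildNodes`, `getTagName`, `getAttrList`), and the node shapes are made up for illustration. The parser calls many more methods (for example `insertText` or `setDocumentType`), so this object cannot be passed to `Parser` as-is. The snippet builds a tiny tree by hand instead.

```js
// Partial, hypothetical tree adapter: plain objects with a `type` field.
// A real adapter must implement every method exported by the sample
// implementation; this covers just enough to build a tiny tree by hand.
const myTreeAdapter = {
  createDocument() {
    return { type: 'document', childNodes: [] };
  },
  createElement(tagName, namespaceURI, attrs) {
    return { type: 'element', tagName, namespaceURI, attrs, childNodes: [], parentNode: null };
  },
  createCommentNode(data) {
    return { type: 'comment', data, parentNode: null };
  },
  appendChild(parentNode, newNode) {
    parentNode.childNodes.push(newNode);
    newNode.parentNode = parentNode;
  },
  getChildNodes(node) {
    return node.childNodes;
  },
  getTagName(element) {
    return element.tagName;
  },
  getAttrList(element) {
    return element.attrs;
  },
};

// Build <html lang="en"><!--hi--></html> through the adapter methods.
const HTML_NS = 'http://www.w3.org/1999/xhtml';
const doc = myTreeAdapter.createDocument();
const html = myTreeAdapter.createElement('html', HTML_NS, [{ name: 'lang', value: 'en' }]);
myTreeAdapter.appendChild(doc, html);
myTreeAdapter.appendChild(html, myTreeAdapter.createCommentNode('hi'));

console.log(myTreeAdapter.getTagName(myTreeAdapter.getChildNodes(doc)[0])); // prints "html"
```
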
## Questions or suggestions?
If you have any questions, please feel free to create an issue [here on GitHub](https://github.com/inikulin/parse5/issues).
## Author
[Ivan Nikulin](https://github.com/inikulin) (ifaaan@gmail.com)

View File

@@ -0,0 +1,963 @@
<p align="center">
<img src="https://raw.github.com/inikulin/parse5/master/docs/logo.png" alt="parse5" />
</p>
<p align="center">
<a href="https://www.npmjs.com/package/parse5"><img alt="NPM Version" src="https://img.shields.io/npm/v/parse5.svg"></a>
</p>
<p align="center">
<i>WHATWG HTML5 specification-compliant, fast and ready for production HTML parsing/serialization toolset for Node.</i>
</p>
<b><i>parse5</i></b> contains nearly everything you will need to deal with HTML. It's the fastest spec-compliant HTML parser
for Node to date and parses HTML the way the latest version of your browser does. It's stable and used
by projects such as [jsdom](https://github.com/tmpvar/jsdom), [Angular2](https://github.com/angular/angular),
[Polymer](https://www.polymer-project.org/1.0/) and many more.
# Table of contents
* [Install](#install)
* [Usage](#usage)
* [API Reference](#api-reference)
* [FAQ](#faq)
* [Version history](#version-history)
* [License](#license-and-author-information)
# Install
```
$ npm install parse5
```
# Usage
```js
var parse5 = require('parse5');
var document = parse5.parse('<!DOCTYPE html><html><body>Hi there!</body></html>');
var documentHtml = parse5.serialize(document);
var fragment = parse5.parseFragment('<td>Yo!</td>');
var fragmentHtml = parse5.serialize(fragment);
```
For more advanced examples see [API reference](#api-reference) and [FAQ](#faq).
# API Reference
## Objects
<dl>
<dt><a href="#parse5">parse5</a> : <code>object</code></dt>
<dd></dd>
</dl>
## Typedefs
<dl>
<dt><a href="#ElementLocationInfo">ElementLocationInfo</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#LocationInfo">LocationInfo</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#ParserOptions">ParserOptions</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#SAXParserOptions">SAXParserOptions</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#SerializerOptions">SerializerOptions</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#TreeAdapter">TreeAdapter</a> : <code>Object</code></dt>
<dd></dd>
</dl>
<a name="parse5"></a>
## parse5 : <code>object</code>
**Kind**: global namespace
* [parse5](#parse5) : <code>object</code>
* [.ParserStream](#parse5+ParserStream) ⇐ <code>stream.Writable</code>
* [new ParserStream(options)](#new_parse5+ParserStream_new)
* [.document](#parse5+ParserStream+document) : <code>ASTNode.&lt;document&gt;</code>
* ["script" (scriptElement, documentWrite(html), resume)](#parse5+ParserStream+event_script)
* [.SAXParser](#parse5+SAXParser) ⇐ <code>stream.Transform</code>
* [new SAXParser(options)](#new_parse5+SAXParser_new)
* [.stop()](#parse5+SAXParser+stop)
* ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
* ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
* ["comment" (text, [location])](#parse5+SAXParser+event_comment)
* ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
* ["text" (text, [location])](#parse5+SAXParser+event_text)
* [.SerializerStream](#parse5+SerializerStream) ⇐ <code>stream.Readable</code>
* [new SerializerStream(node, [options])](#new_parse5+SerializerStream_new)
* [.treeAdapters](#parse5+treeAdapters)
* [.parse(html, [options])](#parse5+parse) ⇒ <code>ASTNode.&lt;Document&gt;</code>
* [.parseFragment([fragmentContext], html, [options])](#parse5+parseFragment) ⇒ <code>ASTNode.&lt;DocumentFragment&gt;</code>
* [.serialize(node, [options])](#parse5+serialize) ⇒ <code>String</code>
<a name="parse5+ParserStream"></a>
### parse5.ParserStream ⇐ <code>stream.Writable</code>
**Kind**: instance class of <code>[parse5](#parse5)</code>
**Extends:** <code>stream.Writable</code>
* [.ParserStream](#parse5+ParserStream) ⇐ <code>stream.Writable</code>
* [new ParserStream(options)](#new_parse5+ParserStream_new)
* [.document](#parse5+ParserStream+document) : <code>ASTNode.&lt;document&gt;</code>
* ["script" (scriptElement, documentWrite(html), resume)](#parse5+ParserStream+event_script)
<a name="new_parse5+ParserStream_new"></a>
#### new ParserStream(options)
Streaming HTML parser with scripting support.
It is a [Writable stream](https://nodejs.org/api/stream.html#stream_class_stream_writable).
| Param | Type | Description |
| --- | --- | --- |
| options | <code>[ParserOptions](#ParserOptions)</code> | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
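// Fetch google.com content and obtain its <body> node
http.get('http://google.com', function(res) {
    var parser = new parse5.ParserStream();
    parser.on('finish', function() {
        // Look nodes up by name: childNodes[0] may be the DOCTYPE node,
        // and whitespace text nodes may sit between <head> and <body>
        var html = parser.document.childNodes.filter(function(node) { return node.nodeName === 'html'; })[0];
        var body = html.childNodes.filter(function(node) { return node.nodeName === 'body'; })[0];
    });
    res.pipe(parser);
});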
```
<a name="parse5+ParserStream+document"></a>
#### parserStream.document : <code>ASTNode.&lt;document&gt;</code>
Resulting document node.
**Kind**: instance property of <code>[ParserStream](#parse5+ParserStream)</code>
<a name="parse5+ParserStream+event_script"></a>
#### "script" (scriptElement, documentWrite(html), resume)
Raised when the parser encounters a `<script>` element.
If the event has listeners, parsing is suspended when the event is emitted.
So, if the `<script>` has a `src` attribute, you can fetch it, execute it and then
resume the parser, just like browsers do.
**Kind**: event emitted by <code>[ParserStream](#parse5+ParserStream)</code>
| Param | Type | Description |
| --- | --- | --- |
| scriptElement | <code>ASTNode</code> | Script element that caused the event. |
| documentWrite(html) | <code>function</code> | Write additional `html` at the current parsing position. Suitable for the DOM `document.write` and `document.writeln` methods implementation. |
| resume | <code>function</code> | Resumes the parser. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var parser = new parse5.ParserStream();
parser.on('script', function(scriptElement, documentWrite, resume) {
    var src = parse5.treeAdapters.default.getAttrList(scriptElement)[0].value;
    http.get(src, function(res) {
        // Fetch script content, execute it with DOM built around `parser.document` and
        // `document.write` implemented using `documentWrite`
        // ...
        // Then resume the parser
        resume();
    });
});
parser.end('<script src="http://example.com/script.js"></script>');
```
<a name="parse5+SAXParser"></a>
### parse5.SAXParser ⇐ <code>stream.Transform</code>
**Kind**: instance class of <code>[parse5](#parse5)</code>
**Extends:** <code>stream.Transform</code>
* [.SAXParser](#parse5+SAXParser) ⇐ <code>stream.Transform</code>
* [new SAXParser(options)](#new_parse5+SAXParser_new)
* [.stop()](#parse5+SAXParser+stop)
* ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
* ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
* ["comment" (text, [location])](#parse5+SAXParser+event_comment)
* ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
* ["text" (text, [location])](#parse5+SAXParser+event_text)
<a name="new_parse5+SAXParser_new"></a>
#### new SAXParser(options)
Streaming [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style HTML parser.
[Transform stream](https://nodejs.org/api/stream.html#stream_class_stream_transform)
(which means you can pipe *through* it, see example).
| Param | Type | Description |
| --- | --- | --- |
| options | <code>[SAXParserOptions](#SAXParserOptions)</code> | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');
var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();
parser.on('text', function(text) {
    // Handle page text content
    // ...
});
http.get('http://google.com', function(res) {
    // SAXParser is a Transform stream, which means you can pipe
    // through it. So you can analyze page content and, e.g., save it
    // to a file at the same time:
    res.pipe(parser).pipe(file);
});
```
<a name="parse5+SAXParser+stop"></a>
#### saxParser.stop()
Stops parsing. Useful if you want the parser to stop consuming
CPU time once you've obtained the desired info from the input stream.
Doesn't prevent piping, so data will flow through the parser as usual.
**Kind**: instance method of <code>[SAXParser](#parse5+SAXParser)</code>
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');
var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();
parser.on('doctype', function(name, publicId, systemId) {
    // Process doctype info and stop parsing
    // ...
    parser.stop();
});
http.get('http://google.com', function(res) {
    // Even though parser.stop() was called, the whole
    // content of the page will be written to the file
    res.pipe(parser).pipe(file);
});
```
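
The "stop analyzing but keep piping" behaviour can be reproduced with Node's `stream.Transform`. The sketch below is not parse5's implementation. `InspectingStream` is a hypothetical class that runs a callback over each chunk until `stop()` is called. It always forwards the chunk unchanged, so downstream consumers (such as a file stream) still receive everything.

```js
const { Transform } = require('stream');

// Hypothetical pass-through stream that inspects chunks until stopped.
class InspectingStream extends Transform {
  constructor(onChunk) {
    super();
    this.onChunk = onChunk;
    this.stopped = false;
  }

  stop() {
    // Only the inspection stops; piping is unaffected.
    this.stopped = true;
  }

  _transform(chunk, encoding, callback) {
    if (!this.stopped) {
      this.onChunk(chunk.toString());
    }
    // Always forward the original chunk downstream.
    callback(null, chunk);
  }
}

// Usage: inspect only the first chunk, but let every chunk through.
const seen = [];
const inspector = new InspectingStream((text) => {
  seen.push(text);
  inspector.stop();
});
const received = [];
inspector.on('data', (chunk) => received.push(chunk.toString()));
inspector.on('end', () => {
  console.log(`inspected ${seen.length} chunk(s), forwarded: ${received.join('')}`);
});
inspector.write('<!DOCTYPE html>');
inspector.end('<html></html>');
```
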
<a name="parse5+SAXParser+event_startTag"></a>
#### "startTag" (name, attributes, selfClosing, [location])
Raised when the parser encounters a start tag.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Tag name. |
| attributes | <code>Array</code> | List of attributes in `{ name: String, value: String }` form. |
| selfClosing | <code>Boolean</code> | Indicates if tag is self-closing. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Start tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_endTag"></a>
#### "endTag" (name, [location])
Raised when the parser encounters an end tag.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Tag name. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | End tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_comment"></a>
#### "comment" (text, [location])
Raised when the parser encounters a comment.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | Comment text. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Comment source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_doctype"></a>
#### "doctype" (name, publicId, systemId, [location])
Raised when the parser encounters a [document type declaration](https://en.wikipedia.org/wiki/Document_type_declaration).
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Document type name. |
| publicId | <code>String</code> | Document type public identifier. |
| systemId | <code>String</code> | Document type system identifier. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Document type declaration source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_text"></a>
#### "text" (text, [location])
Raised when the parser encounters text content.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | Text content. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Text content source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
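
`SAXParser` is a stream, so its events are ordinary `EventEmitter` events, and handler logic can be written and tested without a network or a real parser. The sketch below uses a bare `EventEmitter` as a stand-in for `SAXParser` so that it runs without parse5, and emits a few events by hand using the signatures documented above. The hypothetical `collectVisibleText` handlers gather visible text while skipping `<script>` and `<style>` content.

```js
const { EventEmitter } = require('events');

// Attach handlers that gather visible text, ignoring <script>/<style> content.
// Works with anything that emits "startTag", "endTag" and "text" events.
function collectVisibleText(parser) {
  const chunks = [];
  let hiddenDepth = 0; // > 0 while inside <script> or <style>

  parser.on('startTag', (name, attributes, selfClosing) => {
    if ((name === 'script' || name === 'style') && !selfClosing) hiddenDepth++;
  });
  parser.on('endTag', (name) => {
    if ((name === 'script' || name === 'style') && hiddenDepth > 0) hiddenDepth--;
  });
  parser.on('text', (text) => {
    if (hiddenDepth === 0) chunks.push(text);
  });

  return () => chunks.join('');
}

// Stand-in for parse5.SAXParser: the events are emitted by hand here.
const fakeParser = new EventEmitter();
const getText = collectVisibleText(fakeParser);

fakeParser.emit('startTag', 'p', [], false);
fakeParser.emit('text', 'Hello, ');
fakeParser.emit('startTag', 'script', [], false);
fakeParser.emit('text', 'ignored();');
fakeParser.emit('endTag', 'script');
fakeParser.emit('text', 'world');
fakeParser.emit('endTag', 'p');

console.log(getText()); // prints "Hello, world"
```
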
<a name="parse5+SerializerStream"></a>
### parse5.SerializerStream ⇐ <code>stream.Readable</code>
**Kind**: instance class of <code>[parse5](#parse5)</code>
**Extends:** <code>stream.Readable</code>
<a name="new_parse5+SerializerStream_new"></a>
#### new SerializerStream(node, [options])
Streaming AST node to HTML serializer.
[Readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable).
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node to serialize. |
| [options] | <code>[SerializerOptions](#SerializerOptions)</code> | Serialization options. |
**Example**
```js
var parse5 = require('parse5');
var fs = require('fs');
var file = fs.createWriteStream('/home/index.html');
// Serialize parsed document to the HTML and write it to file
var document = parse5.parse('<body>Who is John Galt?</body>');
var serializer = new parse5.SerializerStream(document);
serializer.pipe(file);
```
<a name="parse5+treeAdapters"></a>
### parse5.treeAdapters
Provides built-in tree adapters which can be used for parsing and serialization.
**Kind**: instance property of <code>[parse5](#parse5)</code>
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| default | <code>[TreeAdapter](#TreeAdapter)</code> | Default tree format for parse5. |
| htmlparser2 | <code>[TreeAdapter](#TreeAdapter)</code> | The popular [htmlparser2](https://github.com/fb55/htmlparser2) tree format (used, for example, by [cheerio](https://github.com/MatthewMueller/cheerio) and [jsdom](https://github.com/tmpvar/jsdom)). |
**Example**
```js
var parse5 = require('parse5');
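// Use default tree adapter for parsing
var document = parse5.parse('<div></div>', { treeAdapter: parse5.treeAdapters.default });
// Use htmlparser2 tree adapter with SerializerStream; the node passed in
// must have been produced with the same tree adapter
var htmlparser2Adapter = parse5.treeAdapters.htmlparser2;
var htmlparser2Document = parse5.parse('<div></div>', { treeAdapter: htmlparser2Adapter });
var serializer = new parse5.SerializerStream(htmlparser2Document, { treeAdapter: htmlparser2Adapter });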
```
<a name="parse5+parse"></a>
### parse5.parse(html, [options]) ⇒ <code>ASTNode.&lt;Document&gt;</code>
Parses HTML string.
**Kind**: instance method of <code>[parse5](#parse5)</code>
**Returns**: <code>ASTNode.&lt;Document&gt;</code> - document
| Param | Type | Description |
| --- | --- | --- |
| html | <code>string</code> | Input HTML string. |
| [options] | <code>[ParserOptions](#ParserOptions)</code> | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
```
<a name="parse5+parseFragment"></a>
### parse5.parseFragment([fragmentContext], html, [options]) ⇒ <code>ASTNode.&lt;DocumentFragment&gt;</code>
Parses HTML fragment.
**Kind**: instance method of <code>[parse5](#parse5)</code>
**Returns**: <code>ASTNode.&lt;DocumentFragment&gt;</code> - documentFragment
| Param | Type | Description |
| --- | --- | --- |
| [fragmentContext] | <code>ASTNode</code> | Parsing context element. If specified, given fragment will be parsed as if it was set to the context element's `innerHTML` property. |
| html | <code>string</code> | Input HTML fragment string. |
| [options] | <code>[ParserOptions](#ParserOptions)</code> | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var documentFragment = parse5.parseFragment('<table></table>');
// Parse html fragment in context of the parsed <table> element
var trFragment = parse5.parseFragment(documentFragment.childNodes[0], '<tr><td>Shake it, baby</td></tr>');
```
<a name="parse5+serialize"></a>
### parse5.serialize(node, [options]) ⇒ <code>String</code>
Serializes AST node to HTML string.
**Kind**: instance method of <code>[parse5](#parse5)</code>
**Returns**: <code>String</code> - html
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node to serialize. |
| [options] | <code>[SerializerOptions](#SerializerOptions)</code> | Serialization options. |
**Example**
```js
var parse5 = require('parse5');
var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
// Serialize document
var html = parse5.serialize(document);
// Serialize <body> element content (document.childNodes[0] is the DOCTYPE node)
var bodyInnerHtml = parse5.serialize(document.childNodes[1].childNodes[1]);
```
<a name="ElementLocationInfo"></a>
## ElementLocationInfo : <code>Object</code>
**Kind**: global typedef
**Extends:** <code>[LocationInfo](#LocationInfo)</code>
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| startTag | <code>[LocationInfo](#LocationInfo)</code> | Element's start tag [LocationInfo](#LocationInfo). |
| endTag | <code>[LocationInfo](#LocationInfo)</code> | Element's end tag [LocationInfo](#LocationInfo). |
<a name="LocationInfo"></a>
## LocationInfo : <code>Object</code>
**Kind**: global typedef
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| line | <code>Number</code> | One-based line index |
| col | <code>Number</code> | One-based column index |
| startOffset | <code>Number</code> | Zero-based first character index |
| endOffset | <code>Number</code> | Zero-based last character index |
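
`line`/`col` and `startOffset` describe the same position in two ways. If you only have an offset, you can derive the one-based line and column yourself. The helper below is a hypothetical sketch that assumes `\n` is the only line terminator (for example, after normalizing `\r\n`). It is not part of parse5 and may not match how parse5 counts positions in every edge case.

```js
// Hypothetical helper: convert a zero-based character offset into the
// one-based line/column pair used by LocationInfo. Assumes "\n" line breaks.
function offsetToLineCol(source, offset) {
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line
  for (let i = 0; i < offset; i++) {
    if (source[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, col: offset - lineStart + 1 };
}

const html = '<html>\n  <body>Hi</body>\n</html>';
const bodyOffset = html.indexOf('<body>');
console.log(offsetToLineCol(html, bodyOffset)); // prints { line: 2, col: 3 }
```
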
<a name="ParserOptions"></a>
## ParserOptions : <code>Object</code>
**Kind**: global typedef
**Properties**
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| locationInfo | <code>Boolean</code> | <code>false</code> | Enables source code location information for the nodes. When enabled, each node (except the root node) has a `__location` property. If the node is an element that is not empty, `__location` will be an [ElementLocationInfo](#ElementLocationInfo) object; otherwise it's a [LocationInfo](#LocationInfo) object. If an element was implicitly created by the parser, its `__location` property will be `null`. |
| treeAdapter | <code>[TreeAdapter](#TreeAdapter)</code> | <code>parse5.treeAdapters.default</code> | Specifies resulting tree format. |
<a name="SAXParserOptions"></a>
## SAXParserOptions : <code>Object</code>
**Kind**: global typedef
**Properties**
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| locationInfo | <code>Boolean</code> | <code>false</code> | Enables source code location information for the tokens. When enabled, each token event handler will receive [LocationInfo](#LocationInfo) object as the last argument. |
<a name="SerializerOptions"></a>
## SerializerOptions : <code>Object</code>
**Kind**: global typedef
**Properties**
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| treeAdapter | <code>[TreeAdapter](#TreeAdapter)</code> | <code>parse5.treeAdapters.default</code> | Specifies input tree format. |
<a name="TreeAdapter"></a>
## TreeAdapter : <code>Object</code>
**Kind**: global typedef
* [TreeAdapter](#TreeAdapter) : <code>Object</code>
* [.createDocument()](#TreeAdapter.createDocument) ⇒ <code>ASTNode.&lt;Document&gt;</code>
* [.createDocumentFragment()](#TreeAdapter.createDocumentFragment) ⇒ <code>ASTNode.&lt;DocumentFragment&gt;</code>
* [.createElement(tagName, namespaceURI, attrs)](#TreeAdapter.createElement) ⇒ <code>ASTNode.&lt;Element&gt;</code>
* [.createCommentNode(data)](#TreeAdapter.createElement) ⇒ <code>ASTNode.&lt;CommentNode&gt;</code>
* [.setDocumentType(document, name, publicId, systemId)](#TreeAdapter.setDocumentType)
* [.setQuirksMode(document)](#TreeAdapter.setQuirksMode)
* [.isQuirksMode(document)](#TreeAdapter.setQuirksMode) ⇒ <code>Boolean</code>
* [.detachNode(node)](#TreeAdapter.detachNode)
* [.insertText(parentNode, text)](#TreeAdapter.insertText)
* [.insertTextBefore(parentNode, text, referenceNode)](#TreeAdapter.insertTextBefore)
* [.adoptAttributes(recipientNode, attrs)](#TreeAdapter.adoptAttributes)
* [.getFirstChild(node)](#TreeAdapter.getFirstChild) ⇒ <code>ASTNode</code>
* [.getChildNodes(node)](#TreeAdapter.getChildNodes) ⇒ <code>Array</code>
* [.getParentNode(node)](#TreeAdapter.getParentNode) ⇒ <code>ASTNode</code>
* [.getAttrList(node)](#TreeAdapter.getAttrList) ⇒ <code>Array</code>
* [.getTagName(element)](#TreeAdapter.getTagName) ⇒ <code>String</code>
* [.getNamespaceURI(element)](#TreeAdapter.getNamespaceURI) ⇒ <code>String</code>
* [.getTextNodeContent(textNode)](#TreeAdapter.getTextNodeContent) ⇒ <code>String</code>
* [.getTextNodeContent(commentNode)](#TreeAdapter.getTextNodeContent) ⇒ <code>String</code>
* [.getDocumentTypeNodeName(doctypeNode)](#TreeAdapter.getDocumentTypeNodeName) ⇒ <code>String</code>
* [.getDocumentTypeNodePublicId(doctypeNode)](#TreeAdapter.getDocumentTypeNodePublicId) ⇒ <code>String</code>
* [.getDocumentTypeNodeSystemId(doctypeNode)](#TreeAdapter.getDocumentTypeNodeSystemId) ⇒ <code>String</code>
* [.isTextNode(node)](#TreeAdapter.isTextNode) ⇒ <code>Boolean</code>
* [.isCommentNode(node)](#TreeAdapter.isCommentNode) ⇒ <code>Boolean</code>
* [.isDocumentTypeNode(node)](#TreeAdapter.isDocumentTypeNode) ⇒ <code>Boolean</code>
* [.isElementNode(node)](#TreeAdapter.isElementNode) ⇒ <code>Boolean</code>
<a name="TreeAdapter.createDocument"></a>
### TreeAdapter.createDocument() ⇒ <code>ASTNode.&lt;Document&gt;</code>
Creates document node
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode.&lt;Document&gt;</code> - document
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L19)
<a name="TreeAdapter.createDocumentFragment"></a>
### TreeAdapter.createDocumentFragment() ⇒ <code>ASTNode.&lt;DocumentFragment&gt;</code>
Creates document fragment node
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode.&lt;DocumentFragment&gt;</code> - fragment
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L37)
<a name="TreeAdapter.createElement"></a>
### TreeAdapter.createElement(tagName, namespaceURI, attrs) ⇒ <code>ASTNode.&lt;Element&gt;</code>
Creates element node
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode.&lt;Element&gt;</code> - element
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L61)
| Param | Type | Description |
| --- | --- | --- |
| tagName | <code>String</code> | Tag name of the element. |
| namespaceURI | <code>String</code> | Namespace of the element. |
| attrs | <code>Array</code> | Attribute name-value pair array. Foreign attributes may contain `namespace` and `prefix` fields as well. |
<a name="TreeAdapter.createElement"></a>
### TreeAdapter.createCommentNode(data) ⇒ <code>ASTNode.&lt;CommentNode&gt;</code>
Creates comment node
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode.&lt;CommentNode&gt;</code> - comment
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L85)
| Param | Type | Description |
| --- | --- | --- |
| data | <code>String</code> | Comment text. |
<a name="TreeAdapter.setDocumentType"></a>
### TreeAdapter.setDocumentType(document, name, publicId, systemId)
Sets the document type. If `document` already has a document type node in it, then
the `name`, `publicId` and `systemId` properties of that node are updated with
the provided values. Otherwise, creates a new document type node with the given
properties and inserts it into `document`.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L131)
| Param | Type | Description |
| --- | --- | --- |
| document | <code>ASTNode.&lt;Document&gt;</code> | Document node. |
| name | <code>String</code> | Document type name. |
| publicId | <code>String</code> | Document type public identifier. |
| systemId | <code>String</code> | Document type system identifier. |
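
A minimal sketch of the update-or-insert behaviour described above, assuming a hypothetical plain-object tree in which the document keeps its children in `childNodes` and the document type node is marked with `nodeName: '#documentType'`. It illustrates the contract only; it is not the default implementation linked above.

```javascript
'use strict';

// Hypothetical node shapes used only for this sketch.
function createDocument() {
    return { nodeName: '#document', quirksMode: false, childNodes: [] };
}

// Updates the existing document type node in place, or creates one and appends it.
function setDocumentType(document, name, publicId, systemId) {
    var doctypeNode = null;

    for (var i = 0; i < document.childNodes.length; i++) {
        if (document.childNodes[i].nodeName === '#documentType') {
            doctypeNode = document.childNodes[i];
            break;
        }
    }

    if (doctypeNode) {
        doctypeNode.name = name;
        doctypeNode.publicId = publicId;
        doctypeNode.systemId = systemId;
    }
    else {
        document.childNodes.push({
            nodeName: '#documentType',
            name: name,
            publicId: publicId,
            systemId: systemId,
            parentNode: document
        });
    }
}

var doc = createDocument();
setDocumentType(doc, 'html', null, null);
setDocumentType(doc, 'html', '-//W3C//DTD HTML 4.01//EN', null);

console.log(doc.childNodes.length);      // → 1
console.log(doc.childNodes[0].publicId); // → -//W3C//DTD HTML 4.01//EN
```

The second call reuses the node created by the first one, so the document never ends up with two document type nodes.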
<a name="TreeAdapter.setQuirksMode"></a>
### TreeAdapter.setQuirksMode(document)
Sets document quirks mode flag.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L167)
| Param | Type | Description |
| --- | --- | --- |
| document | <code>ASTNode.&lt;Document&gt;</code> | Document node. |
<a name="TreeAdapter.setQuirksMode"></a>
### TreeAdapter.isQuirksMode(document) ⇒ <code>Boolean</code>
Determines if document quirks mode flag is set.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L183)
| Param | Type | Description |
| --- | --- | --- |
| document | <code>ASTNode.&lt;Document&gt;</code> | Document node. |
<a name="TreeAdapter.detachNode"></a>
### TreeAdapter.detachNode(node)
Removes the node from its parent.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L197)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
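
For a custom adapter, detaching usually means updating both sides of the parent-child link. The sketch below assumes a hypothetical node shape where every node has `parentNode` and containers have a `childNodes` array; calling it on an already detached node is a no-op.

```javascript
'use strict';

// Removes `node` from its parent's `childNodes` and clears its `parentNode` link.
function detachNode(node) {
    var parent = node.parentNode;

    if (!parent)
        return; // Already detached: nothing to do.

    var idx = parent.childNodes.indexOf(node);

    if (idx !== -1)
        parent.childNodes.splice(idx, 1);

    node.parentNode = null;
}

var ul = { nodeName: 'ul', childNodes: [], parentNode: null };
var first = { nodeName: 'li', childNodes: [], parentNode: ul };
var second = { nodeName: 'li', childNodes: [], parentNode: ul };
ul.childNodes.push(first, second);

detachNode(first);
console.log(ul.childNodes.length);      // → 1
console.log(first.parentNode === null); // → true
```
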
<a name="TreeAdapter.insertText"></a>
### TreeAdapter.insertText(parentNode, text)
Inserts text into the node. If the last child of the node is a text node, the
provided text is appended to that text node's content. Otherwise, inserts a
new text node with the given text.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L220)
| Param | Type | Description |
| --- | --- | --- |
| parentNode | <code>ASTNode</code> | Node to insert text into. |
| text | <code>String</code> | Text to insert. |
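
The merge rule above keeps consecutive chunks of character data in a single text node instead of many adjacent ones. A minimal sketch, assuming hypothetical text nodes shaped like `{ nodeName: '#text', value, parentNode }`:

```javascript
'use strict';

// Appends to a trailing text node if there is one; otherwise adds a new text node.
function insertText(parentNode, text) {
    var children = parentNode.childNodes;
    var lastChild = children[children.length - 1]; // undefined when there are no children

    if (lastChild && lastChild.nodeName === '#text')
        lastChild.value += text; // Merge into the trailing text node.
    else
        children.push({ nodeName: '#text', value: text, parentNode: parentNode });
}

var p = { nodeName: 'p', childNodes: [], parentNode: null };
insertText(p, 'Hello');
insertText(p, ', world');

console.log(p.childNodes.length);   // → 1
console.log(p.childNodes[0].value); // → Hello, world
```
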
<a name="TreeAdapter.insertTextBefore"></a>
### TreeAdapter.insertTextBefore(parentNode, text, referenceNode)
Inserts text into the node before the referenced child node. If the node
immediately before the referenced child node is a text node, the provided text
is appended to that text node's content. Otherwise, inserts a new text node with
the given text before the referenced child node.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L249)
| Param | Type | Description |
| --- | --- | --- |
| parentNode | <code>ASTNode</code> | Node to insert text into. |
| text | <code>String</code> | Text to insert. |
| referenceNode | <code>ASTNode</code> | Node to insert text before. |
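
In HTML parsing this situation arises, for example, with foster parenting, where stray text found inside a `<table>` is moved in front of the table. The sketch below applies the same merge rule as `insertText`, but relative to `referenceNode`; the node shape is a hypothetical one, and throwing on a missing reference node is a choice made for this sketch.

```javascript
'use strict';

// Merges into the text node just before `referenceNode`, or inserts a new one there.
function insertTextBefore(parentNode, text, referenceNode) {
    var children = parentNode.childNodes;
    var refIdx = children.indexOf(referenceNode);

    if (refIdx === -1)
        throw new Error('referenceNode is not a child of parentNode');

    var prevNode = children[refIdx - 1]; // undefined when referenceNode is the first child

    if (prevNode && prevNode.nodeName === '#text')
        prevNode.value += text;
    else
        children.splice(refIdx, 0, { nodeName: '#text', value: text, parentNode: parentNode });
}

var body = { nodeName: 'body', childNodes: [], parentNode: null };
var table = { nodeName: 'table', childNodes: [], parentNode: body };
body.childNodes.push(table);

insertTextBefore(body, 'a', table); // Creates a text node in front of the table.
insertTextBefore(body, 'b', table); // Merges into that text node.

console.log(body.childNodes.length);   // → 2
console.log(body.childNodes[0].value); // → ab
```
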
<a name="TreeAdapter.adoptAttributes"></a>
### TreeAdapter.adoptAttributes(recipientNode, attrs)
Copies attributes to the given node. Only those attributes
that are not yet present on the node are copied.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L270)
| Param | Type | Description |
| --- | --- | --- |
| recipientNode | <code>ASTNode</code> | Node to copy attributes into. |
| attrs | <code>Array</code> | Attributes to copy. |
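
The HTML parsing algorithm needs this when it meets a second `<html>` or `<body>` start tag: attributes from the stray tag are added to the existing element without overwriting those already set. A minimal sketch, assuming attributes are `{ name, value }` objects; it compares by attribute name only.

```javascript
'use strict';

// Copies attributes whose names are not yet present on `recipientNode`.
function adoptAttributes(recipientNode, attrs) {
    var present = Object.create(null); // No prototype, so names like "constructor" are safe.

    recipientNode.attrs.forEach(function (attr) {
        present[attr.name] = true;
    });

    attrs.forEach(function (attr) {
        if (!present[attr.name]) {
            recipientNode.attrs.push(attr);
            present[attr.name] = true;
        }
    });
}

var body = { nodeName: 'body', attrs: [{ name: 'class', value: 'a' }] };
adoptAttributes(body, [{ name: 'class', value: 'b' }, { name: 'id', value: 'main' }]);

console.log(body.attrs.map(function (attr) { return attr.name + '=' + attr.value; }).join(' '));
// → class=a id=main
```
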
<a name="TreeAdapter.getFirstChild"></a>
### TreeAdapter.getFirstChild(node) ⇒ <code>ASTNode</code>
Returns first child of the given node.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode</code> - firstChild
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L297)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.getChildNodes"></a>
### TreeAdapter.getChildNodes(node) ⇒ <code>Array</code>
Returns array of the given node's children.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>Array</code> - children
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L313)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.getParentNode"></a>
### TreeAdapter.getParentNode(node) ⇒ <code>ASTNode</code>
Returns given node's parent.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>ASTNode</code> - parent
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L329)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.getAttrList"></a>
### TreeAdapter.getAttrList(node) ⇒ <code>Array</code>
Returns an array of the given node's attributes as name-value pairs.
Foreign attributes may contain `namespace` and `prefix` fields as well.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>Array</code> - attributes
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L346)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
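
Because foreign attributes can carry a namespace, a lookup by name alone may be ambiguous. The sketch below assumes an SVG `xlink:href` attribute shaped like the adjusted form used by the foreign content code (`prefix: 'xlink'`, `name: 'href'`, XLink namespace) plus a `value`; `getAttrValue()` is a hypothetical helper, not a parse5 API.

```javascript
'use strict';

var XLINK_NS = 'http://www.w3.org/1999/xlink';

// Finds an attribute value by local name and optional namespace; returns null if absent.
function getAttrValue(attrs, name, namespace) {
    for (var i = 0; i < attrs.length; i++) {
        var attr = attrs[i];

        // Treat a missing namespace on either side as "no namespace".
        if (attr.name === name && (attr.namespace || null) === (namespace || null))
            return attr.value;
    }

    return null;
}

var attrs = [
    { name: 'width', value: '16' },
    { name: 'href', value: '#icon', prefix: 'xlink', namespace: XLINK_NS }
];

console.log(getAttrValue(attrs, 'href', XLINK_NS)); // → #icon
console.log(getAttrValue(attrs, 'href'));           // → null
console.log(getAttrValue(attrs, 'width'));          // → 16
```
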
<a name="TreeAdapter.getTagName"></a>
### TreeAdapter.getTagName(element) ⇒ <code>String</code>
Returns given element's tag name.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - tagName
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L364)
| Param | Type | Description |
| --- | --- | --- |
| element | <code>ASTNode.&lt;Element&gt;</code> | Element. |
<a name="TreeAdapter.getNamespaceURI"></a>
### TreeAdapter.getNamespaceURI(element) ⇒ <code>String</code>
Returns given element's namespace.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - namespaceURI
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L380)
| Param | Type | Description |
| --- | --- | --- |
| element | <code>ASTNode.&lt;Element&gt;</code> | Element. |
<a name="TreeAdapter.getTextNodeContent"></a>
### TreeAdapter.getTextNodeContent(textNode) ⇒ <code>String</code>
Returns given text node's content.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - text
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L396)
| Param | Type | Description |
| --- | --- | --- |
| textNode | <code>ASTNode.&lt;Text&gt;</code> | Text node. |
<a name="TreeAdapter.getTextNodeContent"></a>
### TreeAdapter.getCommentNodeContent(commentNode) ⇒ <code>String</code>
Returns given comment node's content.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - commentText
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L412)
| Param | Type | Description |
| --- | --- | --- |
| commentNode | <code>ASTNode.&lt;Comment&gt;</code> | Comment node. |
<a name="TreeAdapter.getDocumentTypeNodeName"></a>
### TreeAdapter.getDocumentTypeNodeName(doctypeNode) ⇒ <code>String</code>
Returns given document type node's name.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - name
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L428)
| Param | Type | Description |
| --- | --- | --- |
| doctypeNode | <code>ASTNode.&lt;DocumentType&gt;</code> | Document type node. |
<a name="TreeAdapter.getDocumentTypeNodePublicId"></a>
### TreeAdapter.getDocumentTypeNodePublicId(doctypeNode) ⇒ <code>String</code>
Returns given document type node's public identifier.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - publicId
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L444)
| Param | Type | Description |
| --- | --- | --- |
| doctypeNode | <code>ASTNode.&lt;DocumentType&gt;</code> | Document type node. |
<a name="TreeAdapter.getDocumentTypeNodeSystemId"></a>
### TreeAdapter.getDocumentTypeNodeSystemId(doctypeNode) ⇒ <code>String</code>
Returns given document type node's system identifier.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**Returns**: <code>String</code> - systemId
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L460)
| Param | Type | Description |
| --- | --- | --- |
| doctypeNode | <code>ASTNode.&lt;DocumentType&gt;</code> | Document type node. |
<a name="TreeAdapter.isTextNode"></a>
### TreeAdapter.isTextNode(node) ⇒ <code>Boolean</code>
Determines if given node is a text node.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L477)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.isCommentNode"></a>
### TreeAdapter.isCommentNode(node) ⇒ <code>Boolean</code>
Determines if given node is a comment node.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L493)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.isDocumentTypeNode"></a>
### TreeAdapter.isDocumentTypeNode(node) ⇒ <code>Boolean</code>
Determines if given node is a document type node.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L509)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
<a name="TreeAdapter.isElementNode"></a>
### TreeAdapter.isElementNode(node) ⇒ <code>Boolean</code>
Determines if given node is an element.
**Kind**: static method of <code>[TreeAdapter](#TreeAdapter)</code>
**See**: [default implementation.](https://github.com/inikulin/parse5/blob/tree-adapter-docs-rev/lib/tree_adapters/default.js#L525)
| Param | Type | Description |
| --- | --- | --- |
| node | <code>ASTNode</code> | Node. |
# FAQ
## Q: I want to work with my own document tree format. How can I achieve this?
You can create a custom tree adapter so parse5 can work with your own DOM-tree implementation.
Then pass it to the parser or serializer via the `treeAdapter` option:
```js
var parse5 = require('parse5');
var myTreeAdapter = {
//Adapter methods...
};
var document = parse5.parse('<div></div>', { treeAdapter: myTreeAdapter });
var html = parse5.serialize(document, { treeAdapter: myTreeAdapter });
```
You can find descriptions of the methods a tree adapter should expose, along with links to their
default implementations, in the [API reference](#TreeAdapter).
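If you only need to observe or tweak a few operations, one option (a sketch, not something parse5 provides) is to wrap an existing adapter such as `parse5.treeAdapters.default` and forward every call to it. The hypothetical `withCallCounts()` helper below counts calls per method; the demo uses a two-method stand-in adapter so it runs without parse5 installed.

```javascript
'use strict';

// Wraps every function on `baseAdapter` so each call is counted, then forwarded.
// Non-function properties are copied through unchanged.
function withCallCounts(baseAdapter) {
    var counts = Object.create(null);
    var adapter = {};

    Object.keys(baseAdapter).forEach(function (key) {
        var member = baseAdapter[key];

        if (typeof member !== 'function') {
            adapter[key] = member;
            return;
        }

        adapter[key] = function () {
            counts[key] = (counts[key] || 0) + 1;
            return member.apply(baseAdapter, arguments);
        };
    });

    return { adapter: adapter, counts: counts };
}

// Tiny stand-in adapter for the demo only.
var fakeBase = {
    createDocument: function () { return { nodeName: '#document', childNodes: [] }; },
    isTextNode: function (node) { return node.nodeName === '#text'; }
};

var tracked = withCallCounts(fakeBase);
tracked.adapter.createDocument();
tracked.adapter.isTextNode({ nodeName: '#text' });
tracked.adapter.isTextNode({ nodeName: 'div' });

console.log(tracked.counts.isTextNode); // → 2
```

With parse5 installed, the wrapped adapter would be passed the same way as in the example above, e.g. `parse5.parse(html, { treeAdapter: withCallCounts(parse5.treeAdapters.default).adapter })`. Copying keys this way assumes the base adapter exposes its methods as own enumerable properties.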
## Q: How can I use parse5 in the browser?
Just compile it with [browserify](http://browserify.org/) and you're set.
## Q: I'm parsing `<img src="foo">` with the `SAXParser` and I expect the `selfClosing` flag to be `true` for the `<img>` tag. But it's not. Is there something wrong with the parser?
No. A self-closing tag is a tag that has `/` before the closing bracket, e.g. `<br/>`, `<meta/>`.
In the provided example, the tag simply doesn't have an end tag. That is a different thing, and the `selfClosing` flag only records the former.
How the parser builds the tree depends on the element, not on the flag alone. Elements in the list of [void elements](https://html.spec.whatwg.org/multipage/syntax.html#void-elements) never have content, so for void
elements self-closing tags and tags without end tags are semantically equivalent. For foreign (SVG and MathML) elements, the self-closing flag closes the element immediately.
For any other HTML element, such as `<div/>`, the trailing `/` is a parse error and is ignored: the parser treats everything after the start tag (with a few exceptions) as the element's content.
**TL;DR**: `selfClosing` is part of the lexical information and is set only if the tag in the source code has `/` before the closing bracket.
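To make the distinction concrete, here is a simplified model of how tree construction uses the lexical flag: HTML elements ignore it (only void elements are left without content), while foreign elements honour it. This is not parse5's code; `canHaveContent()` is hypothetical, the void element list is a snapshot of the current WHATWG list (it has changed over time), and other special cases such as raw-text elements are ignored.

```javascript
'use strict';

var NS_HTML = 'http://www.w3.org/1999/xhtml';

// Snapshot of the WHATWG void element list.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                     'link', 'meta', 'source', 'track', 'wbr'];

// Simplified model: does tree construction leave the element open for content?
function canHaveContent(tagName, namespaceURI, selfClosing) {
    if (namespaceURI === NS_HTML)
        return VOID_ELEMENTS.indexOf(tagName) === -1; // The flag is ignored for HTML elements.

    return !selfClosing; // Foreign (SVG/MathML) elements honour the flag.
}

console.log(canHaveContent('img', NS_HTML, false)); // → false
console.log(canHaveContent('img', NS_HTML, true));  // → false
console.log(canHaveContent('div', NS_HTML, true));  // → true
console.log(canHaveContent('circle', 'http://www.w3.org/2000/svg', true)); // → false
```
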
## Q: I get some weird output from the parser; it seems like a bug.
Most likely, it's not. There are a lot of weird edge cases in the HTML5 parsing algorithm, e.g.:
```html
<b>1<p>2</b>3</p>
```
will be parsed as
```html
<b>1</b><p><b>2</b>3</p>
```
Please try the same markup in the latest version of your browser before submitting an issue.
# Version history
## 2.0.0
* Add: [ParserStream](http://inikulin.github.io/parse5/#parse5+ParserStream) with the scripting support.
* Add: [SerializerStream](http://inikulin.github.io/parse5/#parse5+SerializerStream)
* Add: Line/column location info.
* Update (**breaking**): `SimpleApiParser` was renamed to [SAXParser](http://inikulin.github.io/parse5/#parse5+SAXParser).
* Update (**breaking**): [SAXParser](http://inikulin.github.io/parse5/#parse5+SAXParser) is now a [transform stream](https://nodejs.org/api/stream.html#stream_class_stream_transform).
* Update (**breaking**): [SAXParser](http://inikulin.github.io/parse5/#parse5+SAXParser) handler subscription is done via events now.
* Add: [SAXParser.stop()](http://inikulin.github.io/parse5/#parse5+SAXParser+stop)
* Add (**breaking**): [parse5.parse()](http://inikulin.github.io/parse5/#parse5+parse) and [parse5.parseFragment()](http://inikulin.github.io/parse5/#parse5+parseFragment) methods as a replacement for the `Parser` class.
* Add (**breaking**): [parse5.serialize()](http://inikulin.github.io/parse5/#parse5+serialize) method as a replacement for the `Serializer` class.
* Update: parsing algorithm was updated with the latest [HTML spec](https://html.spec.whatwg.org/) changes.
* Remove (**breaking**): `decodeHtmlEntities` and `encodeHtmlEntities` options. [Discussion](https://github.com/inikulin/parse5/issues/75).
## 1.5.0
* Add: Location info for the element start and end tags (by @sakagg).
## 1.4.2
* Fix: htmlparser2 tree adapter `DocumentType.data` property rendering (GH [#45](https://github.com/inikulin/parse5/issues/45)).
## 1.4.1
* Fix: Location info handling for the implicitly generated `<html>` and `<body>` elements (GH [#44](https://github.com/inikulin/parse5/issues/44)).
## 1.4.0
* Add: Parser [decodeHtmlEntities](https://github.com/inikulin/parse5#optionsdecodehtmlentities) option.
* Add: SimpleApiParser [decodeHtmlEntities](https://github.com/inikulin/parse5#optionsdecodehtmlentities-1) option.
* Add: Parser [locationInfo](https://github.com/inikulin/parse5#optionslocationinfo) option.
* Add: SimpleApiParser [locationInfo](https://github.com/inikulin/parse5#optionslocationinfo-1) option.
## 1.3.2
* Fix: `<form>` processing in `<template>` (GH [#40](https://github.com/inikulin/parse5/issues/40)).
## 1.3.1
* Fix: text node in `<template>` serialization problem with custom tree adapter (GH [#38](https://github.com/inikulin/parse5/issues/38)).
## 1.3.0
* Add: Serializer `encodeHtmlEntities` option.
## 1.2.0
* Add: `<template>` support
* `parseFragment` now uses `<template>` as the default `contextElement`. This results in more "forgiving" parsing.
* `TreeSerializer` was renamed to `Serializer`. However, the serializer is still accessible as `parse5.TreeSerializer` for backward compatibility.
## 1.1.6
* Fix: apply latest changes to the `htmlparser2` tree format (DOM Level1 node emulation).
## 1.1.5
* Add: [jsdom](https://github.com/tmpvar/jsdom)-specific parser with scripting support. Undocumented for `jsdom` internal use only.
## 1.1.4
* Add: logo
* Fix: use fake `document` element for fragment parsing (required by [jsdom](https://github.com/tmpvar/jsdom)).
## 1.1.3
* Development files (e.g. `.travis.yml`, `.editorconfig`) are removed from the NPM package.
## 1.1.2
* Fix: crash on Linux due to upper-case leading character in module name used in `require()`.
## 1.1.1
* Add: [SimpleApiParser](https://github.com/inikulin/parse5/#class-simpleapiparser).
* Fix: new line serialization in `<pre>`.
* Fix: `SYSTEM`-only `DOCTYPE` serialization.
* Fix: quotes serialization in `DOCTYPE` IDs.
## 1.0.0
* First stable release, switch to semantic versioning.
## 0.8.3
* Fix: siblings calculation bug in `appendChild` in `htmlparser2` tree adapter.
## 0.8.1
* Add: [TreeSerializer](https://github.com/inikulin/parse5/#class-serializer).
* Add: [htmlparser2 tree adapter](https://github.com/inikulin/parse5/#-treeadaptershtmlparser2).
## 0.6.1
* Fix: incorrect `<menuitem>` handling in `<body>`.
## 0.6.0
* Initial release.
# License
Copyright (c) 2013-2015 Ivan Nikulin (ifaaan@gmail.com, https://github.com/inikulin)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

@@ -0,0 +1,12 @@
'use strict';
exports.Parser = require('./lib/tree_construction/parser');
exports.SimpleApiParser = require('./lib/simple_api/simple_api_parser');
exports.TreeSerializer =
exports.Serializer = require('./lib/serialization/serializer');
exports.JsDomParser = require('./lib/jsdom/jsdom_parser');
exports.TreeAdapters = {
    default: require('./lib/tree_adapters/default'),
    htmlparser2: require('./lib/tree_adapters/htmlparser2')
};

@@ -0,0 +1,134 @@
'use strict';
//Const
var VALID_DOCTYPE_NAME = 'html',
QUIRKS_MODE_SYSTEM_ID = 'http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd',
QUIRKS_MODE_PUBLIC_ID_PREFIXES = [
"+//silmaril//dtd html pro v0r11 19970101//en",
"-//advasoft ltd//dtd html 3.0 aswedit + extensions//en",
"-//as//dtd html 3.0 aswedit + extensions//en",
"-//ietf//dtd html 2.0 level 1//en",
"-//ietf//dtd html 2.0 level 2//en",
"-//ietf//dtd html 2.0 strict level 1//en",
"-//ietf//dtd html 2.0 strict level 2//en",
"-//ietf//dtd html 2.0 strict//en",
"-//ietf//dtd html 2.0//en",
"-//ietf//dtd html 2.1e//en",
"-//ietf//dtd html 3.0//en",
"-//ietf//dtd html 3.0//en//",
"-//ietf//dtd html 3.2 final//en",
"-//ietf//dtd html 3.2//en",
"-//ietf//dtd html 3//en",
"-//ietf//dtd html level 0//en",
"-//ietf//dtd html level 0//en//2.0",
"-//ietf//dtd html level 1//en",
"-//ietf//dtd html level 1//en//2.0",
"-//ietf//dtd html level 2//en",
"-//ietf//dtd html level 2//en//2.0",
"-//ietf//dtd html level 3//en",
"-//ietf//dtd html level 3//en//3.0",
"-//ietf//dtd html strict level 0//en",
"-//ietf//dtd html strict level 0//en//2.0",
"-//ietf//dtd html strict level 1//en",
"-//ietf//dtd html strict level 1//en//2.0",
"-//ietf//dtd html strict level 2//en",
"-//ietf//dtd html strict level 2//en//2.0",
"-//ietf//dtd html strict level 3//en",
"-//ietf//dtd html strict level 3//en//3.0",
"-//ietf//dtd html strict//en",
"-//ietf//dtd html strict//en//2.0",
"-//ietf//dtd html strict//en//3.0",
"-//ietf//dtd html//en",
"-//ietf//dtd html//en//2.0",
"-//ietf//dtd html//en//3.0",
"-//metrius//dtd metrius presentational//en",
"-//microsoft//dtd internet explorer 2.0 html strict//en",
"-//microsoft//dtd internet explorer 2.0 html//en",
"-//microsoft//dtd internet explorer 2.0 tables//en",
"-//microsoft//dtd internet explorer 3.0 html strict//en",
"-//microsoft//dtd internet explorer 3.0 html//en",
"-//microsoft//dtd internet explorer 3.0 tables//en",
"-//netscape comm. corp.//dtd html//en",
"-//netscape comm. corp.//dtd strict html//en",
"-//o'reilly and associates//dtd html 2.0//en",
"-//o'reilly and associates//dtd html extended 1.0//en",
"-//spyglass//dtd html 2.0 extended//en",
"-//sq//dtd html 2.0 hotmetal + extensions//en",
"-//sun microsystems corp.//dtd hotjava html//en",
"-//sun microsystems corp.//dtd hotjava strict html//en",
"-//w3c//dtd html 3 1995-03-24//en",
"-//w3c//dtd html 3.2 draft//en",
"-//w3c//dtd html 3.2 final//en",
"-//w3c//dtd html 3.2//en",
"-//w3c//dtd html 3.2s draft//en",
"-//w3c//dtd html 4.0 frameset//en",
"-//w3c//dtd html 4.0 transitional//en",
"-//w3c//dtd html experimental 19960712//en",
"-//w3c//dtd html experimental 970421//en",
"-//w3c//dtd w3 html//en",
"-//w3o//dtd w3 html 3.0//en",
"-//w3o//dtd w3 html 3.0//en//",
"-//webtechs//dtd mozilla html 2.0//en",
"-//webtechs//dtd mozilla html//en"
],
QUIRKS_MODE_NO_SYSTEM_ID_PUBLIC_ID_PREFIXES = [
'-//w3c//dtd html 4.01 frameset//',
'-//w3c//dtd html 4.01 transitional//'
],
QUIRKS_MODE_PUBLIC_IDS = [
'-//w3o//dtd w3 html strict 3.0//en//',
'-/w3c/dtd html 4.0 transitional/en',
'html'
];
//Utils
function enquoteDoctypeId(id) {
    var quote = id.indexOf('"') !== -1 ? '\'' : '"';
    return quote + id + quote;
}
//API
exports.isQuirks = function (name, publicId, systemId) {
    if (name !== VALID_DOCTYPE_NAME)
        return true;
    if (systemId && systemId.toLowerCase() === QUIRKS_MODE_SYSTEM_ID)
        return true;
    if (publicId !== null) {
        publicId = publicId.toLowerCase();
        if (QUIRKS_MODE_PUBLIC_IDS.indexOf(publicId) > -1)
            return true;
        var prefixes = QUIRKS_MODE_PUBLIC_ID_PREFIXES;
        if (systemId === null)
            prefixes = prefixes.concat(QUIRKS_MODE_NO_SYSTEM_ID_PUBLIC_ID_PREFIXES);
        for (var i = 0; i < prefixes.length; i++) {
            if (publicId.indexOf(prefixes[i]) === 0)
                return true;
        }
    }
    return false;
};
exports.serializeContent = function (name, publicId, systemId) {
    var str = '!DOCTYPE ' + name;
    if (publicId !== null)
        str += ' PUBLIC ' + enquoteDoctypeId(publicId);
    else if (systemId !== null)
        str += ' SYSTEM';
    if (systemId !== null)
        str += ' ' + enquoteDoctypeId(systemId);
    return str;
};

@@ -0,0 +1,257 @@
'use strict';
var Tokenizer = require('../tokenization/tokenizer'),
    HTML = require('./html');
//Aliases
var $ = HTML.TAG_NAMES,
    NS = HTML.NAMESPACES,
    ATTRS = HTML.ATTRS;
//MIME types
var MIME_TYPES = {
    TEXT_HTML: 'text/html',
    APPLICATION_XML: 'application/xhtml+xml'
};
//Attributes
var DEFINITION_URL_ATTR = 'definitionurl',
ADJUSTED_DEFINITION_URL_ATTR = 'definitionURL',
SVG_ATTRS_ADJUSTMENT_MAP = {
'attributename': 'attributeName',
'attributetype': 'attributeType',
'basefrequency': 'baseFrequency',
'baseprofile': 'baseProfile',
'calcmode': 'calcMode',
'clippathunits': 'clipPathUnits',
'contentscripttype': 'contentScriptType',
'contentstyletype': 'contentStyleType',
'diffuseconstant': 'diffuseConstant',
'edgemode': 'edgeMode',
'externalresourcesrequired': 'externalResourcesRequired',
'filterres': 'filterRes',
'filterunits': 'filterUnits',
'glyphref': 'glyphRef',
'gradienttransform': 'gradientTransform',
'gradientunits': 'gradientUnits',
'kernelmatrix': 'kernelMatrix',
'kernelunitlength': 'kernelUnitLength',
'keypoints': 'keyPoints',
'keysplines': 'keySplines',
'keytimes': 'keyTimes',
'lengthadjust': 'lengthAdjust',
'limitingconeangle': 'limitingConeAngle',
'markerheight': 'markerHeight',
'markerunits': 'markerUnits',
'markerwidth': 'markerWidth',
'maskcontentunits': 'maskContentUnits',
'maskunits': 'maskUnits',
'numoctaves': 'numOctaves',
'pathlength': 'pathLength',
'patterncontentunits': 'patternContentUnits',
'patterntransform': 'patternTransform',
'patternunits': 'patternUnits',
'pointsatx': 'pointsAtX',
'pointsaty': 'pointsAtY',
'pointsatz': 'pointsAtZ',
'preservealpha': 'preserveAlpha',
'preserveaspectratio': 'preserveAspectRatio',
'primitiveunits': 'primitiveUnits',
'refx': 'refX',
'refy': 'refY',
'repeatcount': 'repeatCount',
'repeatdur': 'repeatDur',
'requiredextensions': 'requiredExtensions',
'requiredfeatures': 'requiredFeatures',
'specularconstant': 'specularConstant',
'specularexponent': 'specularExponent',
'spreadmethod': 'spreadMethod',
'startoffset': 'startOffset',
'stddeviation': 'stdDeviation',
'stitchtiles': 'stitchTiles',
'surfacescale': 'surfaceScale',
'systemlanguage': 'systemLanguage',
'tablevalues': 'tableValues',
'targetx': 'targetX',
'targety': 'targetY',
'textlength': 'textLength',
'viewbox': 'viewBox',
'viewtarget': 'viewTarget',
'xchannelselector': 'xChannelSelector',
'ychannelselector': 'yChannelSelector',
'zoomandpan': 'zoomAndPan'
},
XML_ATTRS_ADJUSTMENT_MAP = {
'xlink:actuate': {prefix: 'xlink', name: 'actuate', namespace: NS.XLINK},
'xlink:arcrole': {prefix: 'xlink', name: 'arcrole', namespace: NS.XLINK},
'xlink:href': {prefix: 'xlink', name: 'href', namespace: NS.XLINK},
'xlink:role': {prefix: 'xlink', name: 'role', namespace: NS.XLINK},
'xlink:show': {prefix: 'xlink', name: 'show', namespace: NS.XLINK},
'xlink:title': {prefix: 'xlink', name: 'title', namespace: NS.XLINK},
'xlink:type': {prefix: 'xlink', name: 'type', namespace: NS.XLINK},
'xml:base': {prefix: 'xml', name: 'base', namespace: NS.XML},
'xml:lang': {prefix: 'xml', name: 'lang', namespace: NS.XML},
'xml:space': {prefix: 'xml', name: 'space', namespace: NS.XML},
'xmlns': {prefix: '', name: 'xmlns', namespace: NS.XMLNS},
'xmlns:xlink': {prefix: 'xmlns', name: 'xlink', namespace: NS.XMLNS}
};
//SVG tag names adjustment map
var SVG_TAG_NAMES_ADJUSTMENT_MAP = {
'altglyph': 'altGlyph',
'altglyphdef': 'altGlyphDef',
'altglyphitem': 'altGlyphItem',
'animatecolor': 'animateColor',
'animatemotion': 'animateMotion',
'animatetransform': 'animateTransform',
'clippath': 'clipPath',
'feblend': 'feBlend',
'fecolormatrix': 'feColorMatrix',
'fecomponenttransfer': 'feComponentTransfer',
'fecomposite': 'feComposite',
'feconvolvematrix': 'feConvolveMatrix',
'fediffuselighting': 'feDiffuseLighting',
'fedisplacementmap': 'feDisplacementMap',
'fedistantlight': 'feDistantLight',
'feflood': 'feFlood',
'fefunca': 'feFuncA',
'fefuncb': 'feFuncB',
'fefuncg': 'feFuncG',
'fefuncr': 'feFuncR',
'fegaussianblur': 'feGaussianBlur',
'feimage': 'feImage',
'femerge': 'feMerge',
'femergenode': 'feMergeNode',
'femorphology': 'feMorphology',
'feoffset': 'feOffset',
'fepointlight': 'fePointLight',
'fespecularlighting': 'feSpecularLighting',
'fespotlight': 'feSpotLight',
'fetile': 'feTile',
'feturbulence': 'feTurbulence',
'foreignobject': 'foreignObject',
'glyphref': 'glyphRef',
'lineargradient': 'linearGradient',
'radialgradient': 'radialGradient',
'textpath': 'textPath'
};
//Tags that causes exit from foreign content
var EXITS_FOREIGN_CONTENT = {};
EXITS_FOREIGN_CONTENT[$.B] = true;
EXITS_FOREIGN_CONTENT[$.BIG] = true;
EXITS_FOREIGN_CONTENT[$.BLOCKQUOTE] = true;
EXITS_FOREIGN_CONTENT[$.BODY] = true;
EXITS_FOREIGN_CONTENT[$.BR] = true;
EXITS_FOREIGN_CONTENT[$.CENTER] = true;
EXITS_FOREIGN_CONTENT[$.CODE] = true;
EXITS_FOREIGN_CONTENT[$.DD] = true;
EXITS_FOREIGN_CONTENT[$.DIV] = true;
EXITS_FOREIGN_CONTENT[$.DL] = true;
EXITS_FOREIGN_CONTENT[$.DT] = true;
EXITS_FOREIGN_CONTENT[$.EM] = true;
EXITS_FOREIGN_CONTENT[$.EMBED] = true;
EXITS_FOREIGN_CONTENT[$.H1] = true;
EXITS_FOREIGN_CONTENT[$.H2] = true;
EXITS_FOREIGN_CONTENT[$.H3] = true;
EXITS_FOREIGN_CONTENT[$.H4] = true;
EXITS_FOREIGN_CONTENT[$.H5] = true;
EXITS_FOREIGN_CONTENT[$.H6] = true;
EXITS_FOREIGN_CONTENT[$.HEAD] = true;
EXITS_FOREIGN_CONTENT[$.HR] = true;
EXITS_FOREIGN_CONTENT[$.I] = true;
EXITS_FOREIGN_CONTENT[$.IMG] = true;
EXITS_FOREIGN_CONTENT[$.LI] = true;
EXITS_FOREIGN_CONTENT[$.LISTING] = true;
EXITS_FOREIGN_CONTENT[$.MENU] = true;
EXITS_FOREIGN_CONTENT[$.META] = true;
EXITS_FOREIGN_CONTENT[$.NOBR] = true;
EXITS_FOREIGN_CONTENT[$.OL] = true;
EXITS_FOREIGN_CONTENT[$.P] = true;
EXITS_FOREIGN_CONTENT[$.PRE] = true;
EXITS_FOREIGN_CONTENT[$.RUBY] = true;
EXITS_FOREIGN_CONTENT[$.S] = true;
EXITS_FOREIGN_CONTENT[$.SMALL] = true;
EXITS_FOREIGN_CONTENT[$.SPAN] = true;
EXITS_FOREIGN_CONTENT[$.STRONG] = true;
EXITS_FOREIGN_CONTENT[$.STRIKE] = true;
EXITS_FOREIGN_CONTENT[$.SUB] = true;
EXITS_FOREIGN_CONTENT[$.SUP] = true;
EXITS_FOREIGN_CONTENT[$.TABLE] = true;
EXITS_FOREIGN_CONTENT[$.TT] = true;
EXITS_FOREIGN_CONTENT[$.U] = true;
EXITS_FOREIGN_CONTENT[$.UL] = true;
EXITS_FOREIGN_CONTENT[$.VAR] = true;
//Check whether a start tag causes an exit from foreign content
exports.causesExit = function (startTagToken) {
var tn = startTagToken.tagName;
if (tn === $.FONT && (Tokenizer.getTokenAttr(startTagToken, ATTRS.COLOR) !== null ||
Tokenizer.getTokenAttr(startTagToken, ATTRS.SIZE) !== null ||
Tokenizer.getTokenAttr(startTagToken, ATTRS.FACE) !== null)) {
return true;
}
return EXITS_FOREIGN_CONTENT[tn];
};
//Token adjustments
exports.adjustTokenMathMLAttrs = function (token) {
for (var i = 0; i < token.attrs.length; i++) {
if (token.attrs[i].name === DEFINITION_URL_ATTR) {
token.attrs[i].name = ADJUSTED_DEFINITION_URL_ATTR;
break;
}
}
};
exports.adjustTokenSVGAttrs = function (token) {
for (var i = 0; i < token.attrs.length; i++) {
var adjustedAttrName = SVG_ATTRS_ADJUSTMENT_MAP[token.attrs[i].name];
if (adjustedAttrName)
token.attrs[i].name = adjustedAttrName;
}
};
exports.adjustTokenXMLAttrs = function (token) {
for (var i = 0; i < token.attrs.length; i++) {
var adjustedAttrEntry = XML_ATTRS_ADJUSTMENT_MAP[token.attrs[i].name];
if (adjustedAttrEntry) {
token.attrs[i].prefix = adjustedAttrEntry.prefix;
token.attrs[i].name = adjustedAttrEntry.name;
token.attrs[i].namespace = adjustedAttrEntry.namespace;
}
}
};
exports.adjustTokenSVGTagName = function (token) {
var adjustedTagName = SVG_TAG_NAMES_ADJUSTMENT_MAP[token.tagName];
if (adjustedTagName)
token.tagName = adjustedTagName;
};
//Integration points
exports.isMathMLTextIntegrationPoint = function (tn, ns) {
return ns === NS.MATHML && (tn === $.MI || tn === $.MO || tn === $.MN || tn === $.MS || tn === $.MTEXT);
};
exports.isHtmlIntegrationPoint = function (tn, ns, attrs) {
if (ns === NS.MATHML && tn === $.ANNOTATION_XML) {
for (var i = 0; i < attrs.length; i++) {
if (attrs[i].name === ATTRS.ENCODING) {
var value = attrs[i].value.toLowerCase();
return value === MIME_TYPES.TEXT_HTML || value === MIME_TYPES.APPLICATION_XML;
}
}
}
return ns === NS.SVG && (tn === $.FOREIGN_OBJECT || tn === $.DESC || tn === $.TITLE);
};

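The foreign-content helpers above work by looking up lowercased names in plain-object maps and mutating the token in place. The sketch below shows the same lookup pattern in isolation, assuming a hypothetical plain token shape `{tagName, attrs: [{name, value}]}` and trimmed-down copies of the maps. It is an illustration, not the module's real exports. Unlike the original `causesExit()`, which returns `undefined` for unlisted tags, this version always returns a boolean.

```javascript
'use strict';

// Trimmed-down copies of the maps above; the real module lists many more entries.
var SVG_TAG_NAMES_ADJUSTMENT_MAP = {
    'clippath': 'clipPath',
    'foreignobject': 'foreignObject',
    'lineargradient': 'linearGradient'
};

var EXITS_FOREIGN_CONTENT = {b: true, div: true, p: true, table: true};

// Hypothetical helper: find an attribute value on a plain token object.
function getAttr(token, name) {
    for (var i = 0; i < token.attrs.length; i++) {
        if (token.attrs[i].name === name)
            return token.attrs[i].value;
    }
    return null;
}

// Mirrors causesExit(): <font> only breaks out of SVG/MathML when it carries
// color, size or face; every other tag is looked up in the table.
function causesExit(token) {
    if (token.tagName === 'font') {
        return getAttr(token, 'color') !== null ||
               getAttr(token, 'size') !== null ||
               getAttr(token, 'face') !== null;
    }
    return EXITS_FOREIGN_CONTENT[token.tagName] === true;
}

// Mirrors adjustTokenSVGTagName(): restore the camel-cased SVG spelling.
function adjustSVGTagName(token) {
    var adjusted = SVG_TAG_NAMES_ADJUSTMENT_MAP[token.tagName];
    if (adjusted)
        token.tagName = adjusted;
    return token;
}

var fontWithColor = {tagName: 'font', attrs: [{name: 'color', value: 'red'}]};
var plainFont = {tagName: 'font', attrs: []};
console.log(causesExit(fontWithColor)); // true
console.log(causesExit(plainFont));     // false
console.log(adjustSVGTagName({tagName: 'clippath', attrs: []}).tagName); // clipPath
```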
@@ -0,0 +1,268 @@
'use strict';
var NS = exports.NAMESPACES = {
HTML: 'http://www.w3.org/1999/xhtml',
MATHML: 'http://www.w3.org/1998/Math/MathML',
SVG: 'http://www.w3.org/2000/svg',
XLINK: 'http://www.w3.org/1999/xlink',
XML: 'http://www.w3.org/XML/1998/namespace',
XMLNS: 'http://www.w3.org/2000/xmlns/'
};
exports.ATTRS = {
TYPE: 'type',
ACTION: 'action',
ENCODING: 'encoding',
PROMPT: 'prompt',
NAME: 'name',
COLOR: 'color',
FACE: 'face',
SIZE: 'size'
};
var $ = exports.TAG_NAMES = {
A: 'a',
ADDRESS: 'address',
ANNOTATION_XML: 'annotation-xml',
APPLET: 'applet',
AREA: 'area',
ARTICLE: 'article',
ASIDE: 'aside',
B: 'b',
BASE: 'base',
BASEFONT: 'basefont',
BGSOUND: 'bgsound',
BIG: 'big',
BLOCKQUOTE: 'blockquote',
BODY: 'body',
BR: 'br',
BUTTON: 'button',
CAPTION: 'caption',
CENTER: 'center',
CODE: 'code',
COL: 'col',
COLGROUP: 'colgroup',
COMMAND: 'command',
DD: 'dd',
DESC: 'desc',
DETAILS: 'details',
DIALOG: 'dialog',
DIR: 'dir',
DIV: 'div',
DL: 'dl',
DT: 'dt',
EM: 'em',
EMBED: 'embed',
FIELDSET: 'fieldset',
FIGCAPTION: 'figcaption',
FIGURE: 'figure',
FONT: 'font',
FOOTER: 'footer',
FOREIGN_OBJECT: 'foreignObject',
FORM: 'form',
FRAME: 'frame',
FRAMESET: 'frameset',
H1: 'h1',
H2: 'h2',
H3: 'h3',
H4: 'h4',
H5: 'h5',
H6: 'h6',
HEAD: 'head',
HEADER: 'header',
HGROUP: 'hgroup',
HR: 'hr',
HTML: 'html',
I: 'i',
IMG: 'img',
IMAGE: 'image',
INPUT: 'input',
IFRAME: 'iframe',
ISINDEX: 'isindex',
KEYGEN: 'keygen',
LABEL: 'label',
LI: 'li',
LINK: 'link',
LISTING: 'listing',
MAIN: 'main',
MALIGNMARK: 'malignmark',
MARQUEE: 'marquee',
MATH: 'math',
MENU: 'menu',
MENUITEM: 'menuitem',
META: 'meta',
MGLYPH: 'mglyph',
MI: 'mi',
MO: 'mo',
MN: 'mn',
MS: 'ms',
MTEXT: 'mtext',
NAV: 'nav',
NOBR: 'nobr',
NOFRAMES: 'noframes',
NOEMBED: 'noembed',
NOSCRIPT: 'noscript',
OBJECT: 'object',
OL: 'ol',
OPTGROUP: 'optgroup',
OPTION: 'option',
P: 'p',
PARAM: 'param',
PLAINTEXT: 'plaintext',
PRE: 'pre',
RP: 'rp',
RT: 'rt',
RUBY: 'ruby',
S: 's',
SCRIPT: 'script',
SECTION: 'section',
SELECT: 'select',
SOURCE: 'source',
SMALL: 'small',
SPAN: 'span',
STRIKE: 'strike',
STRONG: 'strong',
STYLE: 'style',
SUB: 'sub',
SUMMARY: 'summary',
SUP: 'sup',
TABLE: 'table',
TBODY: 'tbody',
TEMPLATE: 'template',
TEXTAREA: 'textarea',
TFOOT: 'tfoot',
TD: 'td',
TH: 'th',
THEAD: 'thead',
TITLE: 'title',
TR: 'tr',
TRACK: 'track',
TT: 'tt',
U: 'u',
UL: 'ul',
SVG: 'svg',
VAR: 'var',
WBR: 'wbr',
XMP: 'xmp'
};
var SPECIAL_ELEMENTS = exports.SPECIAL_ELEMENTS = {};
SPECIAL_ELEMENTS[NS.HTML] = {};
SPECIAL_ELEMENTS[NS.HTML][$.ADDRESS] = true;
SPECIAL_ELEMENTS[NS.HTML][$.APPLET] = true;
SPECIAL_ELEMENTS[NS.HTML][$.AREA] = true;
SPECIAL_ELEMENTS[NS.HTML][$.ARTICLE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.ASIDE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BASE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BASEFONT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BGSOUND] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BLOCKQUOTE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BODY] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BR] = true;
SPECIAL_ELEMENTS[NS.HTML][$.BUTTON] = true;
SPECIAL_ELEMENTS[NS.HTML][$.CAPTION] = true;
SPECIAL_ELEMENTS[NS.HTML][$.CENTER] = true;
SPECIAL_ELEMENTS[NS.HTML][$.COL] = true;
SPECIAL_ELEMENTS[NS.HTML][$.COLGROUP] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DD] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DETAILS] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DIR] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DIV] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DL] = true;
SPECIAL_ELEMENTS[NS.HTML][$.DT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.EMBED] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FIELDSET] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FIGCAPTION] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FIGURE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FOOTER] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FORM] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FRAME] = true;
SPECIAL_ELEMENTS[NS.HTML][$.FRAMESET] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H1] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H2] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H3] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H4] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H5] = true;
SPECIAL_ELEMENTS[NS.HTML][$.H6] = true;
SPECIAL_ELEMENTS[NS.HTML][$.HEAD] = true;
SPECIAL_ELEMENTS[NS.HTML][$.HEADER] = true;
SPECIAL_ELEMENTS[NS.HTML][$.HGROUP] = true;
SPECIAL_ELEMENTS[NS.HTML][$.HR] = true;
SPECIAL_ELEMENTS[NS.HTML][$.HTML] = true;
SPECIAL_ELEMENTS[NS.HTML][$.IFRAME] = true;
SPECIAL_ELEMENTS[NS.HTML][$.IMG] = true;
SPECIAL_ELEMENTS[NS.HTML][$.INPUT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.ISINDEX] = true;
SPECIAL_ELEMENTS[NS.HTML][$.LI] = true;
SPECIAL_ELEMENTS[NS.HTML][$.LINK] = true;
SPECIAL_ELEMENTS[NS.HTML][$.LISTING] = true;
SPECIAL_ELEMENTS[NS.HTML][$.MAIN] = true;
SPECIAL_ELEMENTS[NS.HTML][$.MARQUEE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.MENU] = true;
SPECIAL_ELEMENTS[NS.HTML][$.MENUITEM] = true;
SPECIAL_ELEMENTS[NS.HTML][$.META] = true;
SPECIAL_ELEMENTS[NS.HTML][$.NAV] = true;
SPECIAL_ELEMENTS[NS.HTML][$.NOEMBED] = true;
SPECIAL_ELEMENTS[NS.HTML][$.NOFRAMES] = true;
SPECIAL_ELEMENTS[NS.HTML][$.NOSCRIPT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.OBJECT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.OL] = true;
SPECIAL_ELEMENTS[NS.HTML][$.P] = true;
SPECIAL_ELEMENTS[NS.HTML][$.PARAM] = true;
SPECIAL_ELEMENTS[NS.HTML][$.PLAINTEXT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.PRE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.SCRIPT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.SECTION] = true;
SPECIAL_ELEMENTS[NS.HTML][$.SELECT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.SOURCE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.STYLE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.SUMMARY] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TABLE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TBODY] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TD] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TEMPLATE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TEXTAREA] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TFOOT] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TH] = true;
SPECIAL_ELEMENTS[NS.HTML][$.THEAD] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TITLE] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TR] = true;
SPECIAL_ELEMENTS[NS.HTML][$.TRACK] = true;
SPECIAL_ELEMENTS[NS.HTML][$.UL] = true;
SPECIAL_ELEMENTS[NS.HTML][$.WBR] = true;
SPECIAL_ELEMENTS[NS.HTML][$.XMP] = true;
SPECIAL_ELEMENTS[NS.MATHML] = {};
SPECIAL_ELEMENTS[NS.MATHML][$.MI] = true;
SPECIAL_ELEMENTS[NS.MATHML][$.MO] = true;
SPECIAL_ELEMENTS[NS.MATHML][$.MN] = true;
SPECIAL_ELEMENTS[NS.MATHML][$.MS] = true;
SPECIAL_ELEMENTS[NS.MATHML][$.MTEXT] = true;
SPECIAL_ELEMENTS[NS.MATHML][$.ANNOTATION_XML] = true;
SPECIAL_ELEMENTS[NS.SVG] = {};
SPECIAL_ELEMENTS[NS.SVG][$.TITLE] = true;
SPECIAL_ELEMENTS[NS.SVG][$.FOREIGN_OBJECT] = true;
SPECIAL_ELEMENTS[NS.SVG][$.DESC] = true;

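`SPECIAL_ELEMENTS` is keyed first by namespace URI and then by tag name, so `title` counts as special in both HTML and SVG while `mi` is special only in MathML. A minimal sketch of a lookup over that shape, using a trimmed copy of the table and a hypothetical `isSpecialElement(tn, ns)` helper that is not part of this module:

```javascript
'use strict';

var NS = {
    HTML: 'http://www.w3.org/1999/xhtml',
    MATHML: 'http://www.w3.org/1998/Math/MathML',
    SVG: 'http://www.w3.org/2000/svg'
};

// Trimmed copy of SPECIAL_ELEMENTS: namespace URI -> {tagName: true}.
var SPECIAL_ELEMENTS = {};
SPECIAL_ELEMENTS[NS.HTML] = {address: true, div: true, p: true, title: true};
SPECIAL_ELEMENTS[NS.MATHML] = {mi: true, mtext: true};
SPECIAL_ELEMENTS[NS.SVG] = {title: true, foreignObject: true, desc: true};

// Hypothetical helper: both the namespace and the tag name must match.
// Comparing against `true` ignores inherited keys such as 'constructor'.
function isSpecialElement(tn, ns) {
    var byTagName = SPECIAL_ELEMENTS[ns];
    return !!byTagName && byTagName[tn] === true;
}

console.log(isSpecialElement('title', NS.SVG)); // true
console.log(isSpecialElement('mi', NS.HTML));   // false
```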
@@ -0,0 +1,48 @@
'use strict';
exports.REPLACEMENT_CHARACTER = '\uFFFD';
exports.CODE_POINTS = {
EOF: -1,
NULL: 0x00,
TABULATION: 0x09,
CARRIAGE_RETURN: 0x0D,
LINE_FEED: 0x0A,
FORM_FEED: 0x0C,
SPACE: 0x20,
EXCLAMATION_MARK: 0x21,
QUOTATION_MARK: 0x22,
NUMBER_SIGN: 0x23,
AMPERSAND: 0x26,
APOSTROPHE: 0x27,
HYPHEN_MINUS: 0x2D,
SOLIDUS: 0x2F,
DIGIT_0: 0x30,
DIGIT_9: 0x39,
SEMICOLON: 0x3B,
LESS_THAN_SIGN: 0x3C,
EQUALS_SIGN: 0x3D,
GREATER_THAN_SIGN: 0x3E,
QUESTION_MARK: 0x3F,
LATIN_CAPITAL_A: 0x41,
LATIN_CAPITAL_F: 0x46,
LATIN_CAPITAL_X: 0x58,
LATIN_CAPITAL_Z: 0x5A,
GRAVE_ACCENT: 0x60,
LATIN_SMALL_A: 0x61,
LATIN_SMALL_F: 0x66,
LATIN_SMALL_X: 0x78,
LATIN_SMALL_Z: 0x7A,
BOM: 0xFEFF,
REPLACEMENT_CHARACTER: 0xFFFD
};
exports.CODE_POINT_SEQUENCES = {
DASH_DASH_STRING: [0x2D, 0x2D], //--
DOCTYPE_STRING: [0x44, 0x4F, 0x43, 0x54, 0x59, 0x50, 0x45], //DOCTYPE
CDATA_START_STRING: [0x5B, 0x43, 0x44, 0x41, 0x54, 0x41, 0x5B], //[CDATA[
CDATA_END_STRING: [0x5D, 0x5D, 0x3E], //]]>
SCRIPT_STRING: [0x73, 0x63, 0x72, 0x69, 0x70, 0x74], //script
PUBLIC_STRING: [0x50, 0x55, 0x42, 0x4C, 0x49, 0x43], //PUBLIC
SYSTEM_STRING: [0x53, 0x59, 0x53, 0x54, 0x45, 0x4D] //SYSTEM
};

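The `CODE_POINT_SEQUENCES` entries are stored as arrays of code points, which lets a tokenizer compare them directly against `charCodeAt()` results without building substrings. A minimal sketch of such a comparison, assuming hypothetical `toCodePoints()` and `matchesSequenceAt()` helpers (not part of this module) and an optional ASCII case-insensitive mode of the kind HTML uses when matching `DOCTYPE`:

```javascript
'use strict';

// Hypothetical helper: turn an ASCII string into the array form used above.
function toCodePoints(str) {
    var cps = [];
    for (var i = 0; i < str.length; i++)
        cps.push(str.charCodeAt(i));
    return cps;
}

// Lower-case ASCII A-Z (0x41-0x5A) by adding 0x20; leave other code points alone.
function toAsciiLower(cp) {
    return cp >= 0x41 && cp <= 0x5A ? cp + 0x20 : cp;
}

// Hypothetical helper: does `input` contain `seq` starting at index `pos`?
function matchesSequenceAt(input, pos, seq, caseSensitive) {
    if (pos + seq.length > input.length)
        return false;
    for (var i = 0; i < seq.length; i++) {
        var cp = input.charCodeAt(pos + i),
            expected = seq[i];
        if (!caseSensitive) {
            cp = toAsciiLower(cp);
            expected = toAsciiLower(expected);
        }
        if (cp !== expected)
            return false;
    }
    return true;
}

var DOCTYPE_STRING = [0x44, 0x4F, 0x43, 0x54, 0x59, 0x50, 0x45];
console.log(toCodePoints('DOCTYPE').join() === DOCTYPE_STRING.join()); // true
console.log(matchesSequenceAt('<!doctype html>', 2, DOCTYPE_STRING, false)); // true
```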
@@ -0,0 +1,13 @@
'use strict';
exports.mergeOptions = function (defaults, options) {
options = options || {};
return [defaults, options].reduce(function (merged, optObj) {
Object.keys(optObj).forEach(function (key) {
merged[key] = optObj[key];
});
return merged;
}, {});
};

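`mergeOptions()` performs a shallow merge into a fresh object, so user options override defaults without mutating the shared `DEFAULT_OPTIONS` object, and a missing `options` argument simply yields a copy of the defaults. A short usage sketch with the function copied verbatim (in ES2015+ code, `Object.assign({}, defaults, options)` behaves much the same for ordinary string keys):

```javascript
'use strict';

// Copy of mergeOptions() from the module above.
function mergeOptions(defaults, options) {
    options = options || {};
    return [defaults, options].reduce(function (merged, optObj) {
        Object.keys(optObj).forEach(function (key) {
            merged[key] = optObj[key];
        });
        return merged;
    }, {});
}

var DEFAULT_OPTIONS = {decodeHtmlEntities: true, locationInfo: false};

var merged = mergeOptions(DEFAULT_OPTIONS, {locationInfo: true});
console.log(merged);          // { decodeHtmlEntities: true, locationInfo: true }
console.log(DEFAULT_OPTIONS); // { decodeHtmlEntities: true, locationInfo: false }
```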
@@ -0,0 +1,39 @@
'use strict';
var Parser = require('../tree_construction/parser'),
ParsingUnit = require('./parsing_unit');
//API
exports.parseDocument = function (html, treeAdapter) {
//NOTE: this should be reentrant, so we create a new parser here
var parser = new Parser(treeAdapter),
parsingUnit = new ParsingUnit(parser);
//NOTE: override parser loop method
parser._runParsingLoop = function () {
parsingUnit.parsingLoopLock = true;
while (!parsingUnit.suspended && !this.stopped)
this._iterateParsingLoop();
parsingUnit.parsingLoopLock = false;
if (this.stopped)
parsingUnit.callback(this.document);
};
//NOTE: wait until the parsing unit has been adopted by the calling code, then
//start parsing
process.nextTick(function () {
parser.parse(html);
});
return parsingUnit;
};
exports.parseInnerHtml = function (innerHtml, contextElement, treeAdapter) {
//NOTE: this should be reentrant, so we create a new parser here
var parser = new Parser(treeAdapter);
return parser.parseFragment(innerHtml, contextElement);
};

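`parseDocument()` returns its `ParsingUnit` synchronously and only calls `parser.parse(html)` inside `process.nextTick()`, which gives the caller a chance to chain `.done()` or `.handleScripts()` before any parsing happens. A minimal sketch of that deferral pattern, using hypothetical `Job` and `startDeferredJob()` stand-ins (upper-casing the input is placeholder work, not parsing). Unlike the original loop, which calls `parsingUnit.callback` unconditionally, the sketch skips the call when no callback was attached.

```javascript
'use strict';

// Hypothetical stand-in for ParsingUnit: collects a callback via done().
function Job() {
    this.callback = null;
}

Job.prototype.done = function (callback) {
    this.callback = callback;
    return this;
};

// Hypothetical stand-in for parseDocument(): hand back the job first and start
// the work on the next tick, so done() can be attached synchronously.
function startDeferredJob(input) {
    var job = new Job();
    process.nextTick(function () {
        if (job.callback)
            job.callback(input.toUpperCase());
    });
    return job;
}

var finished = false;
startDeferredJob('<p>hi</p>').done(function (result) {
    finished = true;
    console.log(result); // <P>HI</P>
});
console.log(finished); // false: the deferred work has not run yet
```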
@@ -0,0 +1,53 @@
'use strict';
var ParsingUnit = module.exports = function (parser) {
this.parser = parser;
this.suspended = false;
this.parsingLoopLock = false;
this.callback = null;
};
ParsingUnit.prototype._stateGuard = function (suspend) {
if (this.suspended && suspend)
throw new Error('parse5: Parser was already suspended. Please, check your control flow logic.');
else if (!this.suspended && !suspend)
throw new Error('parse5: Parser was already resumed. Please, check your control flow logic.');
return suspend;
};
ParsingUnit.prototype.suspend = function () {
this.suspended = this._stateGuard(true);
return this;
};
ParsingUnit.prototype.resume = function () {
this.suspended = this._stateGuard(false);
//NOTE: don't enter the parsing loop if it is locked. Without this lock _runParsingLoop() may be called
//while the parsing loop is still running, e.g. when suspend() and resume() are called synchronously.
if (!this.parsingLoopLock)
this.parser._runParsingLoop();
return this;
};
ParsingUnit.prototype.documentWrite = function (html) {
this.parser.tokenizer.preprocessor.write(html);
return this;
};
ParsingUnit.prototype.handleScripts = function (scriptHandler) {
this.parser.scriptHandler = scriptHandler;
return this;
};
ParsingUnit.prototype.done = function (callback) {
this.callback = callback;
return this;
};

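The guard and lock in `ParsingUnit` can be exercised without a real parser. The sketch below is a hypothetical `Unit` whose "work" is a countdown standing in for `_iterateParsingLoop()`. It shows that a synchronous `suspend().resume()` pair inside the loop does not re-enter the loop (the loop runs once), and that redundant calls are rejected the same way `_stateGuard()` rejects them.

```javascript
'use strict';

// Sketch of ParsingUnit's suspend/resume guard and loop lock. The work loop is
// a hypothetical countdown standing in for parser._iterateParsingLoop().
function Unit(steps) {
    this.remaining = steps;
    this.suspended = false;
    this.loopLock = false;
    this.iterations = 0;
    this.runCalls = 0;
}

// Same contract as _stateGuard(): refuse redundant suspend()/resume() calls.
Unit.prototype._stateGuard = function (suspend) {
    if (this.suspended && suspend)
        throw new Error('already suspended');
    if (!this.suspended && !suspend)
        throw new Error('already resumed');
    return suspend;
};

Unit.prototype.suspend = function () {
    this.suspended = this._stateGuard(true);
    return this;
};

Unit.prototype.resume = function () {
    this.suspended = this._stateGuard(false);
    // Only (re)enter the loop if it is not already on the call stack.
    if (!this.loopLock)
        this.run();
    return this;
};

Unit.prototype.run = function () {
    this.runCalls++;
    this.loopLock = true;
    while (!this.suspended && this.remaining > 0) {
        this.remaining--;
        this.iterations++;
        // Simulate a handler that pauses and resumes synchronously mid-loop.
        if (this.iterations === 2)
            this.suspend().resume();
    }
    this.loopLock = false;
};

var unit = new Unit(5);
unit.run();
console.log(unit.iterations, unit.runCalls); // 5 1
```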
@@ -0,0 +1,178 @@
'use strict';
var DefaultTreeAdapter = require('../tree_adapters/default'),
Doctype = require('../common/doctype'),
Utils = require('../common/utils'),
HTML = require('../common/html');
//Aliases
var $ = HTML.TAG_NAMES,
NS = HTML.NAMESPACES;
//Default serializer options
var DEFAULT_OPTIONS = {
encodeHtmlEntities: true
};
//Escaping regexes
var AMP_REGEX = /&/g,
NBSP_REGEX = /\u00a0/g,
DOUBLE_QUOTE_REGEX = /"/g,
LT_REGEX = /</g,
GT_REGEX = />/g;
//Escape string
function escapeString(str, attrMode) {
str = str
.replace(AMP_REGEX, '&amp;')
.replace(NBSP_REGEX, '&nbsp;');
if (attrMode)
str = str.replace(DOUBLE_QUOTE_REGEX, '&quot;');
else {
str = str
.replace(LT_REGEX, '&lt;')
.replace(GT_REGEX, '&gt;');
}
return str;
}
//Serializer
var Serializer = module.exports = function (treeAdapter, options) {
this.treeAdapter = treeAdapter || DefaultTreeAdapter;
this.options = Utils.mergeOptions(DEFAULT_OPTIONS, options);
};
//API
Serializer.prototype.serialize = function (node) {
this.html = '';
this._serializeChildNodes(node);
return this.html;
};
//Internals
Serializer.prototype._serializeChildNodes = function (parentNode) {
var childNodes = this.treeAdapter.getChildNodes(parentNode);
if (childNodes) {
for (var i = 0, cnLength = childNodes.length; i < cnLength; i++) {
var currentNode = childNodes[i];
if (this.treeAdapter.isElementNode(currentNode))
this._serializeElement(currentNode);
else if (this.treeAdapter.isTextNode(currentNode))
this._serializeTextNode(currentNode);
else if (this.treeAdapter.isCommentNode(currentNode))
this._serializeCommentNode(currentNode);
else if (this.treeAdapter.isDocumentTypeNode(currentNode))
this._serializeDocumentTypeNode(currentNode);
}
}
};
Serializer.prototype._serializeElement = function (node) {
var tn = this.treeAdapter.getTagName(node),
ns = this.treeAdapter.getNamespaceURI(node);
this.html += '<' + tn;
this._serializeAttributes(node);
this.html += '>';
if (tn !== $.AREA && tn !== $.BASE && tn !== $.BASEFONT && tn !== $.BGSOUND && tn !== $.BR &&
tn !== $.COL && tn !== $.EMBED && tn !== $.FRAME && tn !== $.HR && tn !== $.IMG && tn !== $.INPUT &&
tn !== $.KEYGEN && tn !== $.LINK && tn !== $.MENUITEM && tn !== $.META && tn !== $.PARAM && tn !== $.SOURCE &&
tn !== $.TRACK && tn !== $.WBR) {
if (tn === $.PRE || tn === $.TEXTAREA || tn === $.LISTING) {
var firstChild = this.treeAdapter.getFirstChild(node);
if (firstChild && this.treeAdapter.isTextNode(firstChild)) {
var content = this.treeAdapter.getTextNodeContent(firstChild);
if (content[0] === '\n')
this.html += '\n';
}
}
var childNodesHolder = tn === $.TEMPLATE && ns === NS.HTML ?
this.treeAdapter.getChildNodes(node)[0] :
node;
this._serializeChildNodes(childNodesHolder);
this.html += '</' + tn + '>';
}
};
Serializer.prototype._serializeAttributes = function (node) {
var attrs = this.treeAdapter.getAttrList(node);
for (var i = 0, attrsLength = attrs.length; i < attrsLength; i++) {
var attr = attrs[i],
value = this.options.encodeHtmlEntities ? escapeString(attr.value, true) : attr.value;
this.html += ' ';
if (!attr.namespace)
this.html += attr.name;
else if (attr.namespace === NS.XML)
this.html += 'xml:' + attr.name;
else if (attr.namespace === NS.XMLNS) {
if (attr.name !== 'xmlns')
this.html += 'xmlns:';
this.html += attr.name;
}
else if (attr.namespace === NS.XLINK)
this.html += 'xlink:' + attr.name;
else
//NOTE: emit the qualified name (prefix:localName), not the namespace URI
this.html += (attr.prefix ? attr.prefix + ':' : '') + attr.name;
this.html += '="' + value + '"';
}
};
Serializer.prototype._serializeTextNode = function (node) {
var content = this.treeAdapter.getTextNodeContent(node),
parent = this.treeAdapter.getParentNode(node),
parentTn = void 0;
if (parent && this.treeAdapter.isElementNode(parent))
parentTn = this.treeAdapter.getTagName(parent);
if (parentTn === $.STYLE || parentTn === $.SCRIPT || parentTn === $.XMP || parentTn === $.IFRAME ||
parentTn === $.NOEMBED || parentTn === $.NOFRAMES || parentTn === $.PLAINTEXT || parentTn === $.NOSCRIPT) {
this.html += content;
}
else
this.html += this.options.encodeHtmlEntities ? escapeString(content, false) : content;
};
Serializer.prototype._serializeCommentNode = function (node) {
this.html += '<!--' + this.treeAdapter.getCommentNodeContent(node) + '-->';
};
Serializer.prototype._serializeDocumentTypeNode = function (node) {
var name = this.treeAdapter.getDocumentTypeNodeName(node),
publicId = this.treeAdapter.getDocumentTypeNodePublicId(node),
systemId = this.treeAdapter.getDocumentTypeNodeSystemId(node);
this.html += '<' + Doctype.serializeContent(name, publicId, systemId) + '>';
};

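`escapeString()` has two modes: double-quoted attribute values need `&`, U+00A0 and `"` escaped, while text content needs `&`, U+00A0, `<` and `>` escaped. The sketch below copies that function and pairs it with a lookup table of the same void elements the serializer tests with its chain of `tn !== ...` comparisons. The table and the `serializeSimpleElement()` helper are a hypothetical alternative for illustration, not code from this module.

```javascript
'use strict';

// Copy of escapeString() from the serializer above.
function escapeString(str, attrMode) {
    str = str
        .replace(/&/g, '&amp;')
        .replace(/\u00a0/g, '&nbsp;');
    if (attrMode)
        str = str.replace(/"/g, '&quot;');
    else {
        str = str
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
    }
    return str;
}

// Same element names the serializer skips closing tags for.
var VOID_ELEMENTS = {
    area: true, base: true, basefont: true, bgsound: true, br: true, col: true,
    embed: true, frame: true, hr: true, img: true, input: true, keygen: true,
    link: true, menuitem: true, meta: true, param: true, source: true,
    track: true, wbr: true
};

// Hypothetical helper: serialize one element with plain-object attributes
// and a single text child.
function serializeSimpleElement(tn, attrs, text) {
    var html = '<' + tn;
    Object.keys(attrs).forEach(function (name) {
        html += ' ' + name + '="' + escapeString(attrs[name], true) + '"';
    });
    html += '>';
    if (VOID_ELEMENTS[tn] !== true)
        html += escapeString(text, false) + '</' + tn + '>';
    return html;
}

console.log(serializeSimpleElement('a', {title: 'x "y" & z'}, '1 < 2'));
// <a title="x &quot;y&quot; &amp; z">1 &lt; 2</a>
console.log(serializeSimpleElement('br', {}, ''));
// <br>
```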
@@ -0,0 +1,107 @@
'use strict';
var Tokenizer = require('../tokenization/tokenizer'),
TokenizerProxy = require('./tokenizer_proxy'),
Utils = require('../common/utils');
//Default options
var DEFAULT_OPTIONS = {
decodeHtmlEntities: true,
locationInfo: false
};
//Skipping handler
function skip() {
//NOTE: do nothing =)
}
//SimpleApiParser
var SimpleApiParser = module.exports = function (handlers, options) {
this.options = Utils.mergeOptions(DEFAULT_OPTIONS, options);
this.handlers = {
doctype: this._wrapHandler(handlers.doctype),
startTag: this._wrapHandler(handlers.startTag),
endTag: this._wrapHandler(handlers.endTag),
text: this._wrapHandler(handlers.text),
comment: this._wrapHandler(handlers.comment)
};
};
SimpleApiParser.prototype._wrapHandler = function (handler) {
var parser = this;
handler = handler || skip;
if (this.options.locationInfo) {
return function () {
var args = Array.prototype.slice.call(arguments);
args.push(parser.currentTokenLocation);
handler.apply(handler, args);
};
}
return handler;
};
//API
SimpleApiParser.prototype.parse = function (html) {
var token = null;
this._reset(html);
do {
token = this.tokenizerProxy.getNextToken();
if (token.type === Tokenizer.CHARACTER_TOKEN ||
token.type === Tokenizer.WHITESPACE_CHARACTER_TOKEN ||
token.type === Tokenizer.NULL_CHARACTER_TOKEN) {
if (this.options.locationInfo) {
if (this.pendingText === null)
this.currentTokenLocation = token.location;
else
this.currentTokenLocation.end = token.location.end;
}
this.pendingText = (this.pendingText || '') + token.chars;
}
else {
this._emitPendingText();
this._handleToken(token);
}
} while (token.type !== Tokenizer.EOF_TOKEN);
};
//Internals
SimpleApiParser.prototype._handleToken = function (token) {
if (this.options.locationInfo)
this.currentTokenLocation = token.location;
if (token.type === Tokenizer.START_TAG_TOKEN)
this.handlers.startTag(token.tagName, token.attrs, token.selfClosing);
else if (token.type === Tokenizer.END_TAG_TOKEN)
this.handlers.endTag(token.tagName);
else if (token.type === Tokenizer.COMMENT_TOKEN)
this.handlers.comment(token.data);
else if (token.type === Tokenizer.DOCTYPE_TOKEN)
this.handlers.doctype(token.name, token.publicId, token.systemId);
};
SimpleApiParser.prototype._reset = function (html) {
this.tokenizerProxy = new TokenizerProxy(html, this.options);
this.pendingText = null;
this.currentTokenLocation = null;
};
SimpleApiParser.prototype._emitPendingText = function () {
if (this.pendingText !== null) {
this.handlers.text(this.pendingText);
this.pendingText = null;
}
};

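`SimpleApiParser.parse()` buffers consecutive character tokens in `pendingText` and emits them as a single `text()` call just before the next non-character token. With `locationInfo` on, the start location comes from the first character token and the end from the last one. A minimal sketch of that coalescing, assuming hypothetical token shapes (`{type: 'CHAR', chars, location}` and `{type: 'TAG', tagName, location}`). Two deliberate differences: the sketch copies the location object instead of mutating the first token's `location`, and it flushes at the end of the array because there is no EOF token.

```javascript
'use strict';

// Hypothetical token shapes; location is {start, end}.
function coalesceText(tokens, onText, onTag) {
    var pendingText = null,
        pendingLocation = null;

    function flush() {
        if (pendingText !== null) {
            onText(pendingText, pendingLocation);
            pendingText = null;
            pendingLocation = null;
        }
    }

    tokens.forEach(function (token) {
        if (token.type === 'CHAR') {
            // The first character token fixes the start; later ones extend the end.
            if (pendingText === null)
                pendingLocation = {start: token.location.start, end: token.location.end};
            else
                pendingLocation.end = token.location.end;
            pendingText = (pendingText || '') + token.chars;
        }
        else {
            flush();
            onTag(token.tagName, token.location);
        }
    });
    flush();
}

var tokens = [
    {type: 'CHAR', chars: 'Hello', location: {start: 0, end: 5}},
    {type: 'CHAR', chars: ' ', location: {start: 5, end: 6}},
    {type: 'CHAR', chars: 'world', location: {start: 6, end: 11}},
    {type: 'TAG', tagName: 'br', location: {start: 11, end: 15}}
];
var events = [];
coalesceText(tokens, function (text, loc) {
    events.push('text:' + text + '@' + loc.start + '-' + loc.end);
}, function (tagName, loc) {
    events.push('tag:' + tagName + '@' + loc.start + '-' + loc.end);
});
console.log(events); // [ 'text:Hello world@0-11', 'tag:br@11-15' ]
```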
@@ -0,0 +1,122 @@
'use strict';
var Tokenizer = require('../tokenization/tokenizer'),
ForeignContent = require('../common/foreign_content'),
UNICODE = require('../common/unicode'),
HTML = require('../common/html');
//Aliases
var $ = HTML.TAG_NAMES,
NS = HTML.NAMESPACES;
//Tokenizer proxy
//NOTE: this proxy simulates the Tokenizer adjustments that the standard parser performs during tree construction.
var TokenizerProxy = module.exports = function (html, options) {
this.tokenizer = new Tokenizer(html, options);
this.namespaceStack = [];
this.namespaceStackTop = -1;
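//NOTE: start in the HTML namespace, so leaving a top-level <svg> or <math> element
//restores HTML content instead of leaving the namespace stack empty.
this._enterNamespace(NS.HTML);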
};
//API
TokenizerProxy.prototype.getNextToken = function () {
var token = this.tokenizer.getNextToken();
if (token.type === Tokenizer.START_TAG_TOKEN)
this._handleStartTagToken(token);
else if (token.type === Tokenizer.END_TAG_TOKEN)
this._handleEndTagToken(token);
else if (token.type === Tokenizer.NULL_CHARACTER_TOKEN && this.inForeignContent) {
token.type = Tokenizer.CHARACTER_TOKEN;
token.chars = UNICODE.REPLACEMENT_CHARACTER;
}
return token;
};
//Namespace stack mutations
TokenizerProxy.prototype._enterNamespace = function (namespace) {
this.namespaceStackTop++;
this.namespaceStack.push(namespace);
this.inForeignContent = namespace !== NS.HTML;
this.currentNamespace = namespace;
this.tokenizer.allowCDATA = this.inForeignContent;
};
TokenizerProxy.prototype._leaveCurrentNamespace = function () {
this.namespaceStackTop--;
this.namespaceStack.pop();
this.currentNamespace = this.namespaceStack[this.namespaceStackTop];
this.inForeignContent = this.currentNamespace !== NS.HTML;
this.tokenizer.allowCDATA = this.inForeignContent;
};
//Token handlers
TokenizerProxy.prototype._ensureTokenizerMode = function (tn) {
if (tn === $.TEXTAREA || tn === $.TITLE)
this.tokenizer.state = Tokenizer.MODE.RCDATA;
else if (tn === $.PLAINTEXT)
this.tokenizer.state = Tokenizer.MODE.PLAINTEXT;
else if (tn === $.SCRIPT)
this.tokenizer.state = Tokenizer.MODE.SCRIPT_DATA;
else if (tn === $.STYLE || tn === $.IFRAME || tn === $.XMP ||
tn === $.NOEMBED || tn === $.NOFRAMES || tn === $.NOSCRIPT) {
this.tokenizer.state = Tokenizer.MODE.RAWTEXT;
}
};
TokenizerProxy.prototype._handleStartTagToken = function (token) {
var tn = token.tagName;
if (tn === $.SVG)
this._enterNamespace(NS.SVG);
else if (tn === $.MATH)
this._enterNamespace(NS.MATHML);
else {
if (this.inForeignContent) {
if (ForeignContent.causesExit(token))
this._leaveCurrentNamespace();
else if (ForeignContent.isMathMLTextIntegrationPoint(tn, this.currentNamespace) ||
ForeignContent.isHtmlIntegrationPoint(tn, this.currentNamespace, token.attrs)) {
this._enterNamespace(NS.HTML);
}
}
else
this._ensureTokenizerMode(tn);
}
};
TokenizerProxy.prototype._handleEndTagToken = function (token) {
var tn = token.tagName;
if (!this.inForeignContent) {
var previousNs = this.namespaceStack[this.namespaceStackTop - 1];
//NOTE: check for exit from integration point
if (ForeignContent.isMathMLTextIntegrationPoint(tn, previousNs) ||
ForeignContent.isHtmlIntegrationPoint(tn, previousNs, token.attrs)) {
this._leaveCurrentNamespace();
}
else if (tn === $.SCRIPT)
this.tokenizer.state = Tokenizer.MODE.DATA;
}
else if ((tn === $.SVG && this.currentNamespace === NS.SVG) ||
(tn === $.MATH && this.currentNamespace === NS.MATHML))
this._leaveCurrentNamespace();
};

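`TokenizerProxy` tracks namespaces with a stack: `<svg>` and `<math>` push a foreign namespace, an HTML integration point pushes HTML back, and the matching end tags pop. The tokenizer's `allowCDATA` flag follows whether the top of the stack is foreign. A minimal sketch of that bookkeeping with a hypothetical `NamespaceTracker`; this sketch seeds the stack with the HTML namespace and never pops that entry, so leaving a top-level `<svg>` always returns to HTML.

```javascript
'use strict';

var NS = {
    HTML: 'http://www.w3.org/1999/xhtml',
    MATHML: 'http://www.w3.org/1998/Math/MathML',
    SVG: 'http://www.w3.org/2000/svg'
};

// Hypothetical tracker; `allowCDATA` stands in for the tokenizer flag.
function NamespaceTracker() {
    this.stack = [];
    this.allowCDATA = false;
    this.enter(NS.HTML); // seed with HTML so popping <svg>/<math> returns to HTML
}

NamespaceTracker.prototype.current = function () {
    return this.stack[this.stack.length - 1];
};

NamespaceTracker.prototype.inForeignContent = function () {
    return this.current() !== NS.HTML;
};

NamespaceTracker.prototype.enter = function (ns) {
    this.stack.push(ns);
    this.allowCDATA = this.inForeignContent();
};

NamespaceTracker.prototype.leave = function () {
    // Never pop the seeded HTML entry.
    if (this.stack.length > 1)
        this.stack.pop();
    this.allowCDATA = this.inForeignContent();
};

var tracker = new NamespaceTracker();
tracker.enter(NS.SVG);           // <svg>
console.log(tracker.allowCDATA); // true: CDATA sections are allowed in SVG
tracker.enter(NS.HTML);          // <foreignObject> is an HTML integration point
console.log(tracker.allowCDATA); // false
tracker.leave();                 // </foreignObject>
tracker.leave();                 // </svg>
console.log(tracker.inForeignContent()); // false
```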
@@ -0,0 +1,80 @@
'use strict';
exports.assign = function (tokenizer) {
//NOTE: obtain Tokenizer proto this way to avoid module circular references
var tokenizerProto = Object.getPrototypeOf(tokenizer);
tokenizer.tokenStartLoc = -1;
//NOTE: add location info builder method
tokenizer._attachLocationInfo = function (token) {
token.location = {
start: this.tokenStartLoc,
end: -1
};
};
//NOTE: patch token creation methods and attach location objects
tokenizer._createStartTagToken = function (tagNameFirstCh) {
tokenizerProto._createStartTagToken.call(this, tagNameFirstCh);
this._attachLocationInfo(this.currentToken);
};
tokenizer._createEndTagToken = function (tagNameFirstCh) {
tokenizerProto._createEndTagToken.call(this, tagNameFirstCh);
this._attachLocationInfo(this.currentToken);
};
tokenizer._createCommentToken = function () {
tokenizerProto._createCommentToken.call(this);
this._attachLocationInfo(this.currentToken);
};
tokenizer._createDoctypeToken = function (doctypeNameFirstCh) {
tokenizerProto._createDoctypeToken.call(this, doctypeNameFirstCh);
this._attachLocationInfo(this.currentToken);
};
tokenizer._createCharacterToken = function (type, ch) {
tokenizerProto._createCharacterToken.call(this, type, ch);
this._attachLocationInfo(this.currentCharacterToken);
};
//NOTE: patch token emission methods to determine end location
tokenizer._emitCurrentToken = function () {
//NOTE: if we have a pending character token, make its end location equal to the
//current token's start location.
if (this.currentCharacterToken)
this.currentCharacterToken.location.end = this.currentToken.location.start;
this.currentToken.location.end = this.preprocessor.pos + 1;
tokenizerProto._emitCurrentToken.call(this);
};
tokenizer._emitCurrentCharacterToken = function () {
//NOTE: if we have a character token and its location wasn't set in _emitCurrentToken(),
//then set its location at the current preprocessor position
if (this.currentCharacterToken && this.currentCharacterToken.location.end === -1) {
//NOTE: we don't need to increment the preprocessor position, since character token
//emission is always forced by the start of the next character token here,
//so the position has already advanced.
this.currentCharacterToken.location.end = this.preprocessor.pos;
}
tokenizerProto._emitCurrentCharacterToken.call(this);
};
//NOTE: patch initial states for each mode to obtain token start position
Object.keys(tokenizerProto.MODE)
.map(function (modeName) {
return tokenizerProto.MODE[modeName];
})
.forEach(function (state) {
tokenizer[state] = function (cp) {
this.tokenStartLoc = this.preprocessor.pos;
tokenizerProto[state].call(this, cp);
};
});
};

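The location-info module patches one tokenizer instance: it shadows prototype methods with own properties, calls the original through `Object.getPrototypeOf(tokenizer)`, then decorates the result, so only that instance pays for location tracking. A minimal sketch of the same mixin pattern on a hypothetical `MiniTokenizer` (not parse5's Tokenizer):

```javascript
'use strict';

// Hypothetical minimal "tokenizer" whose prototype creates tokens.
function MiniTokenizer(input) {
    this.input = input;
    this.pos = 0;
    this.currentToken = null;
}

MiniTokenizer.prototype._createToken = function (type) {
    this.currentToken = {type: type};
};

// Mixin in the style of exports.assign(): shadow a prototype method on one
// instance, call the original through the prototype, then decorate the token.
function assignLocationInfo(tokenizer) {
    var proto = Object.getPrototypeOf(tokenizer);

    tokenizer._createToken = function (type) {
        proto._createToken.call(this, type);
        this.currentToken.location = {start: this.pos, end: -1};
    };
}

var plain = new MiniTokenizer('<a>');
var located = new MiniTokenizer('<a>');
assignLocationInfo(located);

plain._createToken('START_TAG');
located._createToken('START_TAG');
console.log(plain.currentToken);   // { type: 'START_TAG' }
console.log(located.currentToken); // { type: 'START_TAG', location: { start: 0, end: -1 } }
```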
File diff suppressed because one or more lines are too long

@@ -0,0 +1,115 @@
'use strict';
var UNICODE = require('../common/unicode');
//Aliases
var $ = UNICODE.CODE_POINTS;
//Utils
//OPTIMIZATION: these utility functions should not be moved out of this module. V8 Crankshaft will not inline
//these functions if they are located in another module, due to the context switch.
//Always perform an inlining check before modifying these functions ('node --trace-inlining').
function isReservedCodePoint(cp) {
return cp >= 0xD800 && cp <= 0xDFFF || cp > 0x10FFFF;
}
function isSurrogatePair(cp1, cp2) {
return cp1 >= 0xD800 && cp1 <= 0xDBFF && cp2 >= 0xDC00 && cp2 <= 0xDFFF;
}
function getSurrogatePairCodePoint(cp1, cp2) {
return (cp1 - 0xD800) * 0x400 + 0x2400 + cp2;
}
//Preprocessor
//NOTE: HTML input preprocessing
//(see: http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream)
var Preprocessor = module.exports = function (html) {
this.write(html);
//NOTE: one leading U+FEFF BYTE ORDER MARK character must be ignored if any are present in the input stream.
this.pos = this.html.charCodeAt(0) === $.BOM ? 0 : -1;
this.gapStack = [];
this.lastGapPos = -1;
this.skipNextNewLine = false;
};
Preprocessor.prototype.write = function (html) {
if (this.html) {
this.html = this.html.substring(0, this.pos + 1) +
html +
this.html.substring(this.pos + 1, this.html.length);
}
else
this.html = html;
this.lastCharPos = this.html.length - 1;
};
Preprocessor.prototype.advanceAndPeekCodePoint = function () {
this.pos++;
if (this.pos > this.lastCharPos)
return $.EOF;
var cp = this.html.charCodeAt(this.pos);
//NOTE: any U+000A LINE FEED (LF) characters that immediately follow a U+000D CARRIAGE RETURN (CR) character
//must be ignored.
if (this.skipNextNewLine && cp === $.LINE_FEED) {
this.skipNextNewLine = false;
this._addGap();
return this.advanceAndPeekCodePoint();
}
//NOTE: all U+000D CARRIAGE RETURN (CR) characters must be converted to U+000A LINE FEED (LF) characters
if (cp === $.CARRIAGE_RETURN) {
this.skipNextNewLine = true;
return $.LINE_FEED;
}
this.skipNextNewLine = false;
//OPTIMIZATION: first check whether the code point is in the range that covers the most common
//HTML input (e.g. ASCII codes) to avoid costly operations for high-range code points.
return cp >= 0xD800 ? this._processHighRangeCodePoint(cp) : cp;
};
Preprocessor.prototype._processHighRangeCodePoint = function (cp) {
//NOTE: try to peek a surrogate pair
if (this.pos !== this.lastCharPos) {
var nextCp = this.html.charCodeAt(this.pos + 1);
if (isSurrogatePair(cp, nextCp)) {
//NOTE: we have a surrogate pair. Peek pair character and recalculate code point.
this.pos++;
cp = getSurrogatePairCodePoint(cp, nextCp);
//NOTE: add gap that should be avoided during retreat
this._addGap();
}
}
if (isReservedCodePoint(cp))
cp = $.REPLACEMENT_CHARACTER;
return cp;
};
Preprocessor.prototype._addGap = function () {
this.gapStack.push(this.lastGapPos);
this.lastGapPos = this.pos;
};
Preprocessor.prototype.retreat = function () {
if (this.pos === this.lastGapPos) {
this.lastGapPos = this.gapStack.pop();
this.pos--;
}
this.pos--;
};

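`getSurrogatePairCodePoint()` uses the standard UTF-16 decoding formula in folded form: since `0x2400 === 0x10000 - 0xDC00`, `(hi - 0xD800) * 0x400 + 0x2400 + lo` equals `(hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000`. The preprocessor also turns CRLF and lone CR into LF one code point at a time, recording gaps so `retreat()` can step back over skipped characters. The sketch below checks the arithmetic against `String.prototype.codePointAt()` and shows a hypothetical whole-string equivalent of the newline rule, for illustration only.

```javascript
'use strict';

// Same arithmetic as getSurrogatePairCodePoint() above.
function getSurrogatePairCodePoint(cp1, cp2) {
    return (cp1 - 0xD800) * 0x400 + 0x2400 + cp2;
}

// Hypothetical whole-string equivalent of the CR handling in
// advanceAndPeekCodePoint(): CRLF and a lone CR both become LF.
function normalizeNewlines(str) {
    return str.replace(/\r\n?/g, '\n');
}

var emoji = '\uD83D\uDE00'; // U+1F600
console.log(getSurrogatePairCodePoint(emoji.charCodeAt(0), emoji.charCodeAt(1)).toString(16)); // 1f600
console.log(emoji.codePointAt(0).toString(16)); // 1f600
console.log(JSON.stringify(normalizeNewlines('a\r\nb\rc\n'))); // "a\nb\nc\n"
```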
File diff suppressed because it is too large

@@ -0,0 +1,200 @@
'use strict';
//Node construction
exports.createDocument = function () {
return {
nodeName: '#document',
quirksMode: false,
childNodes: []
};
};
exports.createDocumentFragment = function () {
return {
nodeName: '#document-fragment',
quirksMode: false,
childNodes: []
};
};
exports.createElement = function (tagName, namespaceURI, attrs) {
return {
nodeName: tagName,
tagName: tagName,
attrs: attrs,
namespaceURI: namespaceURI,
childNodes: [],
parentNode: null
};
};
exports.createCommentNode = function (data) {
return {
nodeName: '#comment',
data: data,
parentNode: null
};
};
var createTextNode = function (value) {
return {
nodeName: '#text',
value: value,
parentNode: null
};
};
//Tree mutation
exports.setDocumentType = function (document, name, publicId, systemId) {
var doctypeNode = null;
for (var i = 0; i < document.childNodes.length; i++) {
if (document.childNodes[i].nodeName === '#documentType') {
doctypeNode = document.childNodes[i];
break;
}
}
if (doctypeNode) {
doctypeNode.name = name;
doctypeNode.publicId = publicId;
doctypeNode.systemId = systemId;
}
else {
appendChild(document, {
nodeName: '#documentType',
name: name,
publicId: publicId,
systemId: systemId
});
}
};
exports.setQuirksMode = function (document) {
document.quirksMode = true;
};
exports.isQuirksMode = function (document) {
return document.quirksMode;
};
var appendChild = exports.appendChild = function (parentNode, newNode) {
parentNode.childNodes.push(newNode);
newNode.parentNode = parentNode;
};
var insertBefore = exports.insertBefore = function (parentNode, newNode, referenceNode) {
var insertionIdx = parentNode.childNodes.indexOf(referenceNode);
parentNode.childNodes.splice(insertionIdx, 0, newNode);
newNode.parentNode = parentNode;
};
exports.detachNode = function (node) {
if (node.parentNode) {
var idx = node.parentNode.childNodes.indexOf(node);
node.parentNode.childNodes.splice(idx, 1);
node.parentNode = null;
}
};
exports.insertText = function (parentNode, text) {
if (parentNode.childNodes.length) {
var prevNode = parentNode.childNodes[parentNode.childNodes.length - 1];
if (prevNode.nodeName === '#text') {
prevNode.value += text;
return;
}
}
appendChild(parentNode, createTextNode(text));
};
exports.insertTextBefore = function (parentNode, text, referenceNode) {
var prevNode = parentNode.childNodes[parentNode.childNodes.indexOf(referenceNode) - 1];
if (prevNode && prevNode.nodeName === '#text')
prevNode.value += text;
else
insertBefore(parentNode, createTextNode(text), referenceNode);
};
exports.adoptAttributes = function (recipientNode, attrs) {
var recipientAttrsMap = [];
for (var i = 0; i < recipientNode.attrs.length; i++)
recipientAttrsMap.push(recipientNode.attrs[i].name);
for (var j = 0; j < attrs.length; j++) {
if (recipientAttrsMap.indexOf(attrs[j].name) === -1)
recipientNode.attrs.push(attrs[j]);
}
};
//Tree traversing
exports.getFirstChild = function (node) {
return node.childNodes[0];
};
exports.getChildNodes = function (node) {
return node.childNodes;
};
exports.getParentNode = function (node) {
return node.parentNode;
};
exports.getAttrList = function (node) {
return node.attrs;
};
//Node data
exports.getTagName = function (element) {
return element.tagName;
};
exports.getNamespaceURI = function (element) {
return element.namespaceURI;
};
exports.getTextNodeContent = function (textNode) {
return textNode.value;
};
exports.getCommentNodeContent = function (commentNode) {
return commentNode.data;
};
exports.getDocumentTypeNodeName = function (doctypeNode) {
return doctypeNode.name;
};
exports.getDocumentTypeNodePublicId = function (doctypeNode) {
return doctypeNode.publicId;
};
exports.getDocumentTypeNodeSystemId = function (doctypeNode) {
return doctypeNode.systemId;
};
//Node types
exports.isTextNode = function (node) {
return node.nodeName === '#text';
};
exports.isCommentNode = function (node) {
return node.nodeName === '#comment';
};
exports.isDocumentTypeNode = function (node) {
return node.nodeName === '#documentType';
};
exports.isElementNode = function (node) {
return !!node.tagName;
};
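
The default tree adapter stores nodes as plain objects. In `insertText`, it appends to a trailing `#text` node instead of creating a new sibling. The minimal sketch below reproduces that merging behavior on its own. It re-declares simplified copies of `createElement`, `appendChild` and `insertText` rather than requiring parse5, so the names mirror the adapter but nothing is imported from it.

```javascript
'use strict';

// Simplified copies of the default adapter's node factories (not imported from parse5).
function createElement(tagName, namespaceURI, attrs) {
    return { nodeName: tagName, tagName: tagName, attrs: attrs, namespaceURI: namespaceURI, childNodes: [], parentNode: null };
}

function createTextNode(value) {
    return { nodeName: '#text', value: value, parentNode: null };
}

function appendChild(parentNode, newNode) {
    parentNode.childNodes.push(newNode);
    newNode.parentNode = parentNode;
}

// Append text, merging it into the last child when that child is already a #text node.
function insertText(parentNode, text) {
    var last = parentNode.childNodes[parentNode.childNodes.length - 1];
    if (last && last.nodeName === '#text')
        last.value += text;
    else
        appendChild(parentNode, createTextNode(text));
}

var p = createElement('p', 'http://www.w3.org/1999/xhtml', []);
insertText(p, 'Hello, ');
insertText(p, 'world');
console.log(p.childNodes.length, JSON.stringify(p.childNodes[0].value)); // prints: 1 "Hello, world"
```

The merge means that a character run split across several tokenizer tokens still produces a single text node.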

@@ -0,0 +1,317 @@
'use strict';
var Doctype = require('../common/doctype');
//Conversion tables for DOM Level1 structure emulation
var nodeTypes = {
element: 1,
text: 3,
cdata: 4,
comment: 8
};
var nodePropertyShorthands = {
tagName: 'name',
childNodes: 'children',
parentNode: 'parent',
previousSibling: 'prev',
nextSibling: 'next',
nodeValue: 'data'
};
//Node
var Node = function (props) {
for (var key in props) {
if (props.hasOwnProperty(key))
this[key] = props[key];
}
};
Node.prototype = {
get firstChild() {
var children = this.children;
return children && children[0] || null;
},
get lastChild() {
var children = this.children;
return children && children[children.length - 1] || null;
},
get nodeType() {
return nodeTypes[this.type] || nodeTypes.element;
}
};
Object.keys(nodePropertyShorthands).forEach(function (key) {
var shorthand = nodePropertyShorthands[key];
Object.defineProperty(Node.prototype, key, {
get: function () {
return this[shorthand] || null;
},
set: function (val) {
this[shorthand] = val;
return val;
}
});
});
//Node construction
exports.createDocument =
exports.createDocumentFragment = function () {
return new Node({
type: 'root',
name: 'root',
parent: null,
prev: null,
next: null,
children: []
});
};
exports.createElement = function (tagName, namespaceURI, attrs) {
var attribs = {},
attribsNamespace = {},
attribsPrefix = {};
for (var i = 0; i < attrs.length; i++) {
var attrName = attrs[i].name;
attribs[attrName] = attrs[i].value;
attribsNamespace[attrName] = attrs[i].namespace;
attribsPrefix[attrName] = attrs[i].prefix;
}
return new Node({
type: tagName === 'script' || tagName === 'style' ? tagName : 'tag',
name: tagName,
namespace: namespaceURI,
attribs: attribs,
'x-attribsNamespace': attribsNamespace,
'x-attribsPrefix': attribsPrefix,
children: [],
parent: null,
prev: null,
next: null
});
};
exports.createCommentNode = function (data) {
return new Node({
type: 'comment',
data: data,
parent: null,
prev: null,
next: null
});
};
var createTextNode = function (value) {
return new Node({
type: 'text',
data: value,
parent: null,
prev: null,
next: null
});
};
//Tree mutation
exports.setDocumentType = function (document, name, publicId, systemId) {
var data = Doctype.serializeContent(name, publicId, systemId),
doctypeNode = null;
for (var i = 0; i < document.children.length; i++) {
if (document.children[i].type === 'directive' && document.children[i].name === '!doctype') {
doctypeNode = document.children[i];
break;
}
}
if (doctypeNode) {
doctypeNode.data = data;
doctypeNode['x-name'] = name;
doctypeNode['x-publicId'] = publicId;
doctypeNode['x-systemId'] = systemId;
}
else {
appendChild(document, new Node({
type: 'directive',
name: '!doctype',
data: data,
'x-name': name,
'x-publicId': publicId,
'x-systemId': systemId
}));
}
};
exports.setQuirksMode = function (document) {
document.quirksMode = true;
};
exports.isQuirksMode = function (document) {
return document.quirksMode;
};
var appendChild = exports.appendChild = function (parentNode, newNode) {
var prev = parentNode.children[parentNode.children.length - 1];
if (prev) {
prev.next = newNode;
newNode.prev = prev;
}
parentNode.children.push(newNode);
newNode.parent = parentNode;
};
var insertBefore = exports.insertBefore = function (parentNode, newNode, referenceNode) {
var insertionIdx = parentNode.children.indexOf(referenceNode),
prev = referenceNode.prev;
if (prev) {
prev.next = newNode;
newNode.prev = prev;
}
referenceNode.prev = newNode;
newNode.next = referenceNode;
parentNode.children.splice(insertionIdx, 0, newNode);
newNode.parent = parentNode;
};
exports.detachNode = function (node) {
if (node.parent) {
var idx = node.parent.children.indexOf(node),
prev = node.prev,
next = node.next;
node.prev = null;
node.next = null;
if (prev)
prev.next = next;
if (next)
next.prev = prev;
node.parent.children.splice(idx, 1);
node.parent = null;
}
};
exports.insertText = function (parentNode, text) {
var lastChild = parentNode.children[parentNode.children.length - 1];
if (lastChild && lastChild.type === 'text')
lastChild.data += text;
else
appendChild(parentNode, createTextNode(text));
};
exports.insertTextBefore = function (parentNode, text, referenceNode) {
var prevNode = parentNode.children[parentNode.children.indexOf(referenceNode) - 1];
if (prevNode && prevNode.type === 'text')
prevNode.data += text;
else
insertBefore(parentNode, createTextNode(text), referenceNode);
};
exports.adoptAttributes = function (recipientNode, attrs) {
for (var i = 0; i < attrs.length; i++) {
var attrName = attrs[i].name;
if (typeof recipientNode.attribs[attrName] === 'undefined') {
recipientNode.attribs[attrName] = attrs[i].value;
recipientNode['x-attribsNamespace'][attrName] = attrs[i].namespace;
recipientNode['x-attribsPrefix'][attrName] = attrs[i].prefix;
}
}
};
//Tree traversing
exports.getFirstChild = function (node) {
return node.children[0];
};
exports.getChildNodes = function (node) {
return node.children;
};
exports.getParentNode = function (node) {
return node.parent;
};
exports.getAttrList = function (node) {
var attrList = [];
for (var name in node.attribs) {
if (node.attribs.hasOwnProperty(name)) {
attrList.push({
name: name,
value: node.attribs[name],
namespace: node['x-attribsNamespace'][name],
prefix: node['x-attribsPrefix'][name]
});
}
}
return attrList;
};
//Node data
exports.getTagName = function (element) {
return element.name;
};
exports.getNamespaceURI = function (element) {
return element.namespace;
};
exports.getTextNodeContent = function (textNode) {
return textNode.data;
};
exports.getCommentNodeContent = function (commentNode) {
return commentNode.data;
};
exports.getDocumentTypeNodeName = function (doctypeNode) {
return doctypeNode['x-name'];
};
exports.getDocumentTypeNodePublicId = function (doctypeNode) {
return doctypeNode['x-publicId'];
};
exports.getDocumentTypeNodeSystemId = function (doctypeNode) {
return doctypeNode['x-systemId'];
};
//Node types
exports.isTextNode = function (node) {
return node.type === 'text';
};
exports.isCommentNode = function (node) {
return node.type === 'comment';
};
exports.isDocumentTypeNode = function (node) {
return node.type === 'directive' && node.name === '!doctype';
};
exports.isElementNode = function (node) {
return !!node.attribs;
};
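
The htmlparser2 adapter emulates DOM Level 1 by defining accessors such as `childNodes` and `nextSibling` on `Node.prototype` that read and write htmlparser2's short field names (`children`, `next`, ...). It also keeps `prev`/`next` links in sync in `appendChild`. The standalone sketch below shows that technique with a hypothetical subset of the shorthand table. It is not the adapter's own `Node`, and it uses `Object.assign` in place of the original copy loop.

```javascript
'use strict';

// DOM Level 1 name -> htmlparser2 field name (a subset of nodePropertyShorthands above).
var shorthands = { childNodes: 'children', parentNode: 'parent', previousSibling: 'prev', nextSibling: 'next' };

function Node(props) {
    Object.assign(this, props);
}

// One accessor per DOM name that forwards to the shorthand field.
Object.keys(shorthands).forEach(function (key) {
    var shorthand = shorthands[key];
    Object.defineProperty(Node.prototype, key, {
        get: function () { return this[shorthand] || null; },
        set: function (val) { this[shorthand] = val; }
    });
});

// Keep sibling links consistent while appending, as the adapter's appendChild does.
function appendChild(parent, child) {
    var prev = parent.children[parent.children.length - 1];
    if (prev) {
        prev.next = child;
        child.prev = prev;
    }
    parent.children.push(child);
    child.parent = parent;
}

function makeTag(name) {
    return new Node({ type: 'tag', name: name, children: [], parent: null, prev: null, next: null });
}

var ul = makeTag('ul'), a = makeTag('li'), b = makeTag('li');
appendChild(ul, a);
appendChild(ul, b);
console.log(a.nextSibling === b, b.previousSibling === a, b.parentNode === ul, ul.childNodes.length); // prints: true true true 2
```

Putting the accessors on the prototype avoids storing duplicate fields on each node, at the cost of one extra property lookup per DOM-style access.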

@@ -0,0 +1,167 @@
'use strict';
//Const
var NOAH_ARK_CAPACITY = 3;
//List of formatting elements
var FormattingElementList = module.exports = function (treeAdapter) {
this.length = 0;
this.entries = [];
this.treeAdapter = treeAdapter;
this.bookmark = null;
};
//Entry types
FormattingElementList.MARKER_ENTRY = 'MARKER_ENTRY';
FormattingElementList.ELEMENT_ENTRY = 'ELEMENT_ENTRY';
//Noah's Ark condition
//OPTIMIZATION: at first we try to find possible candidates for exclusion using
//lightweight heuristics without thorough attributes check.
FormattingElementList.prototype._getNoahArkConditionCandidates = function (newElement) {
var candidates = [];
if (this.length >= NOAH_ARK_CAPACITY) {
var neAttrsLength = this.treeAdapter.getAttrList(newElement).length,
neTagName = this.treeAdapter.getTagName(newElement),
neNamespaceURI = this.treeAdapter.getNamespaceURI(newElement);
for (var i = this.length - 1; i >= 0; i--) {
var entry = this.entries[i];
if (entry.type === FormattingElementList.MARKER_ENTRY)
break;
var element = entry.element,
elementAttrs = this.treeAdapter.getAttrList(element);
if (this.treeAdapter.getTagName(element) === neTagName &&
this.treeAdapter.getNamespaceURI(element) === neNamespaceURI &&
elementAttrs.length === neAttrsLength) {
candidates.push({idx: i, attrs: elementAttrs});
}
}
}
return candidates.length < NOAH_ARK_CAPACITY ? [] : candidates;
};
FormattingElementList.prototype._ensureNoahArkCondition = function (newElement) {
var candidates = this._getNoahArkConditionCandidates(newElement),
cLength = candidates.length;
if (cLength) {
var neAttrs = this.treeAdapter.getAttrList(newElement),
neAttrsLength = neAttrs.length,
neAttrsMap = {};
//NOTE: build attrs map for the new element so we can perform fast lookups
for (var i = 0; i < neAttrsLength; i++) {
var neAttr = neAttrs[i];
neAttrsMap[neAttr.name] = neAttr.value;
}
for (var i = 0; i < neAttrsLength; i++) {
for (var j = 0; j < cLength; j++) {
var cAttr = candidates[j].attrs[i];
if (neAttrsMap[cAttr.name] !== cAttr.value) {
//NOTE: step back so the candidate shifted into slot j is not skipped
candidates.splice(j, 1);
cLength--;
j--;
}
if (candidates.length < NOAH_ARK_CAPACITY)
return;
}
}
//NOTE: remove the bottommost (earliest) candidates so that only NOAH_ARK_CAPACITY - 1 of them remain
for (var i = cLength - 1; i >= NOAH_ARK_CAPACITY - 1; i--) {
this.entries.splice(candidates[i].idx, 1);
this.length--;
}
}
};
//Mutations
FormattingElementList.prototype.insertMarker = function () {
this.entries.push({type: FormattingElementList.MARKER_ENTRY});
this.length++;
};
FormattingElementList.prototype.pushElement = function (element, token) {
this._ensureNoahArkCondition(element);
this.entries.push({
type: FormattingElementList.ELEMENT_ENTRY,
element: element,
token: token
});
this.length++;
};
FormattingElementList.prototype.insertElementAfterBookmark = function (element, token) {
var bookmarkIdx = this.length - 1;
for (; bookmarkIdx >= 0; bookmarkIdx--) {
if (this.entries[bookmarkIdx] === this.bookmark)
break;
}
this.entries.splice(bookmarkIdx + 1, 0, {
type: FormattingElementList.ELEMENT_ENTRY,
element: element,
token: token
});
this.length++;
};
FormattingElementList.prototype.removeEntry = function (entry) {
for (var i = this.length - 1; i >= 0; i--) {
if (this.entries[i] === entry) {
this.entries.splice(i, 1);
this.length--;
break;
}
}
};
FormattingElementList.prototype.clearToLastMarker = function () {
while (this.length) {
var entry = this.entries.pop();
this.length--;
if (entry.type === FormattingElementList.MARKER_ENTRY)
break;
}
};
//Search
FormattingElementList.prototype.getElementEntryInScopeWithTagName = function (tagName) {
for (var i = this.length - 1; i >= 0; i--) {
var entry = this.entries[i];
if (entry.type === FormattingElementList.MARKER_ENTRY)
return null;
if (this.treeAdapter.getTagName(entry.element) === tagName)
return entry;
}
return null;
};
FormattingElementList.prototype.getElementEntry = function (element) {
for (var i = this.length - 1; i >= 0; i--) {
var entry = this.entries[i];
if (entry.type === FormattingElementList.ELEMENT_ENTRY && entry.element === element)
return entry;
}
return null;
};
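
`FormattingElementList` enforces the HTML spec's "Noah's Ark" clause. If three entries after the last marker already share the new element's tag name, namespace and attributes, the earliest of them is removed before the new entry is pushed. The simplified standalone sketch below shows that rule under some hypothetical assumptions:

- entries are plain `{ tagName, attrs, seq }` objects, with `attrs` as a name-to-value map;
- a shared `MARKER` object stands in for marker entries;
- namespaces are not compared.

It does not use the tree-adapter-based shape or the candidate heuristics of the class above.

```javascript
'use strict';

var MARKER = { marker: true };
var CAPACITY = 3;

// True when two attribute maps contain exactly the same names and values.
function sameAttrs(a, b) {
    var aKeys = Object.keys(a), bKeys = Object.keys(b);
    if (aKeys.length !== bKeys.length)
        return false;
    return aKeys.every(function (k) {
        return Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k];
    });
}

// Push `entry`, first removing the earliest match if CAPACITY matches already follow the last marker.
function pushWithNoahsArk(list, entry) {
    var matches = [];
    for (var i = list.length - 1; i >= 0; i--) {
        var e = list[i];
        if (e === MARKER)
            break;
        if (e.tagName === entry.tagName && sameAttrs(e.attrs, entry.attrs))
            matches.push(i);
    }
    // matches is ordered newest-first, so its last index is the earliest matching entry.
    if (matches.length >= CAPACITY)
        list.splice(matches[matches.length - 1], 1);
    list.push(entry);
}

var list = [];
for (var n = 0; n < 4; n++)
    pushWithNoahsArk(list, { tagName: 'b', attrs: { 'class': 'x' }, seq: n });
console.log(list.map(function (e) { return e.seq; })); // prints [ 1, 2, 3 ]
```

Entries before a marker are never counted, so the clause only limits duplicates within the current marker scope (for example, inside one table cell).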

@@ -0,0 +1,197 @@
'use strict';
var OpenElementStack = require('./open_element_stack'),
Tokenizer = require('../tokenization/tokenizer'),
HTML = require('../common/html');
//Aliases
var $ = HTML.TAG_NAMES;
function setEndLocation(element, closingToken, treeAdapter) {
var loc = element.__location;
if (!loc)
return;
if (!loc.startTag) {
loc.startTag = {
start: loc.start,
end: loc.end
};
}
if (closingToken.location) {
var tn = treeAdapter.getTagName(element),
// NOTE: an element can be closed by a token other than its own end tag: in <p> <p> </p> the first 'p'
// is closed by the second <p> start tag, and in <td> <p> </td> the 'p' is closed by </td>.
isClosingEndTag = closingToken.type === Tokenizer.END_TAG_TOKEN &&
tn === closingToken.tagName;
if (isClosingEndTag) {
loc.endTag = {
start: closingToken.location.start,
end: closingToken.location.end
};
}
loc.end = closingToken.location.end;
}
}
//NOTE: patch open elements stack, so we can assign end location for the elements
function patchOpenElementsStack(stack, parser) {
var treeAdapter = parser.treeAdapter;
stack.pop = function () {
setEndLocation(this.current, parser.currentToken, treeAdapter);
OpenElementStack.prototype.pop.call(this);
};
stack.popAllUpToHtmlElement = function () {
for (var i = this.stackTop; i > 0; i--)
setEndLocation(this.items[i], parser.currentToken, treeAdapter);
OpenElementStack.prototype.popAllUpToHtmlElement.call(this);
};
stack.remove = function (element) {
setEndLocation(element, parser.currentToken, treeAdapter);
OpenElementStack.prototype.remove.call(this, element);
};
}
exports.assign = function (parser) {
//NOTE: obtain Parser proto this way to avoid module circular references
var parserProto = Object.getPrototypeOf(parser),
treeAdapter = parser.treeAdapter;
//NOTE: patch _reset method
parser._reset = function (html, document, fragmentContext) {
parserProto._reset.call(this, html, document, fragmentContext);
this.attachableElementLocation = null;
this.lastFosterParentingLocation = null;
this.currentToken = null;
patchOpenElementsStack(this.openElements, parser);
};
parser._processTokenInForeignContent = function (token) {
this.currentToken = token;
parserProto._processTokenInForeignContent.call(this, token);
};
parser._processToken = function (token) {
this.currentToken = token;
parserProto._processToken.call(this, token);
//NOTE: <body> and <html> are never popped from the stack, so we need to update
//their end location explicitly.
if (token.type === Tokenizer.END_TAG_TOKEN &&
(token.tagName === $.HTML ||
(token.tagName === $.BODY && this.openElements.hasInScope($.BODY)))) {
for (var i = this.openElements.stackTop; i >= 0; i--) {
var element = this.openElements.items[i];
if (this.treeAdapter.getTagName(element) === token.tagName) {
setEndLocation(element, token, treeAdapter);
break;
}
}
}
};
//Doctype
parser._setDocumentType = function (token) {
parserProto._setDocumentType.call(this, token);
var documentChildren = this.treeAdapter.getChildNodes(this.document),
cnLength = documentChildren.length;
for (var i = 0; i < cnLength; i++) {
var node = documentChildren[i];
if (this.treeAdapter.isDocumentTypeNode(node)) {
node.__location = token.location;
break;
}
}
};
//Elements
parser._attachElementToTree = function (element) {
//NOTE: _attachElementToTree is called from the _appendElement, _insertElement and _insertTemplate methods,
//so we use the token location stored by these methods for the element.
element.__location = this.attachableElementLocation || null;
this.attachableElementLocation = null;
parserProto._attachElementToTree.call(this, element);
};
parser._appendElement = function (token, namespaceURI) {
this.attachableElementLocation = token.location;
parserProto._appendElement.call(this, token, namespaceURI);
};
parser._insertElement = function (token, namespaceURI) {
this.attachableElementLocation = token.location;
parserProto._insertElement.call(this, token, namespaceURI);
};
parser._insertTemplate = function (token) {
this.attachableElementLocation = token.location;
parserProto._insertTemplate.call(this, token);
var tmplContent = this.treeAdapter.getChildNodes(this.openElements.current)[0];
tmplContent.__location = null;
};
parser._insertFakeRootElement = function () {
parserProto._insertFakeRootElement.call(this);
this.openElements.current.__location = null;
};
//Comments
parser._appendCommentNode = function (token, parent) {
parserProto._appendCommentNode.call(this, token, parent);
var children = this.treeAdapter.getChildNodes(parent),
commentNode = children[children.length - 1];
commentNode.__location = token.location;
};
//Text
parser._findFosterParentingLocation = function () {
//NOTE: store last foster parenting location, so we will be able to find inserted text
//in case of foster parenting
this.lastFosterParentingLocation = parserProto._findFosterParentingLocation.call(this);
return this.lastFosterParentingLocation;
};
parser._insertCharacters = function (token) {
parserProto._insertCharacters.call(this, token);
var hasFosterParent = this._shouldFosterParentOnInsertion(),
parentingLocation = this.lastFosterParentingLocation,
parent = (hasFosterParent && parentingLocation.parent) ||
this.openElements.currentTmplContent ||
this.openElements.current,
siblings = this.treeAdapter.getChildNodes(parent),
textNodeIdx = hasFosterParent && parentingLocation.beforeElement ?
siblings.indexOf(parentingLocation.beforeElement) - 1 :
siblings.length - 1,
textNode = siblings[textNodeIdx];
//NOTE: if we have location assigned by another token, then just update end position
if (textNode.__location)
textNode.__location.end = token.location.end;
else
textNode.__location = token.location;
};
};
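
The location-info mixin does not subclass the parser. It shadows individual methods on one parser instance and on its open-element stack, then delegates to the original implementation with `.call(this, ...)`. The standalone sketch below shows the same pattern on a hypothetical minimal `Stack` (not parse5's `OpenElementStack`). The patched instance records each element being closed before delegating, while an unpatched instance keeps the plain prototype behavior.

```javascript
'use strict';

// Hypothetical stand-in for OpenElementStack: only push/pop and `current`.
function Stack() {
    this.items = [];
    this.current = null;
}
Stack.prototype.push = function (item) {
    this.items.push(item);
    this.current = item;
};
Stack.prototype.pop = function () {
    this.items.pop();
    this.current = this.items[this.items.length - 1] || null;
};

// Shadow `pop` on a single instance: record the element being closed, then
// delegate to the prototype method, similar to patchOpenElementsStack above.
function patchStack(stack, log) {
    var proto = Object.getPrototypeOf(stack);
    stack.pop = function () {
        log.push(this.current);
        proto.pop.call(this);
    };
}

var log = [];
var patched = new Stack();
var plain = new Stack();
patchStack(patched, log);
['html', 'body', 'p'].forEach(function (t) { patched.push(t); plain.push(t); });
patched.pop();
patched.pop();
plain.pop();
console.log(log, patched.current, plain.current); // prints: [ 'p', 'body' ] html body
```

Patching per instance leaves parsers created without the `locationInfo` option untouched, so they do not pay for the extra bookkeeping.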

@@ -0,0 +1,379 @@
'use strict';
var HTML = require('../common/html');
//Aliases
var $ = HTML.TAG_NAMES,
NS = HTML.NAMESPACES;
//Element utils
//OPTIMIZATION: Integer comparisons are low-cost, so we can use very fast tag name length filters here.
//It's faster than using a dictionary.
function isImpliedEndTagRequired(tn) {
switch (tn.length) {
case 1:
return tn === $.P;
case 2:
return tn === $.RP || tn === $.RT || tn === $.DD || tn === $.DT || tn === $.LI;
case 6:
return tn === $.OPTION;
case 8:
return tn === $.OPTGROUP;
}
return false;
}
function isScopingElement(tn, ns) {
switch (tn.length) {
case 2:
if (tn === $.TD || tn === $.TH)
return ns === NS.HTML;
else if (tn === $.MI || tn === $.MO || tn === $.MN || tn === $.MS)
return ns === NS.MATHML;
break;
case 4:
if (tn === $.HTML)
return ns === NS.HTML;
else if (tn === $.DESC)
return ns === NS.SVG;
break;
case 5:
if (tn === $.TABLE)
return ns === NS.HTML;
else if (tn === $.MTEXT)
return ns === NS.MATHML;
else if (tn === $.TITLE)
return ns === NS.SVG;
break;
case 6:
return (tn === $.APPLET || tn === $.OBJECT) && ns === NS.HTML;
case 7:
return (tn === $.CAPTION || tn === $.MARQUEE) && ns === NS.HTML;
case 8:
return tn === $.TEMPLATE && ns === NS.HTML;
case 13:
return tn === $.FOREIGN_OBJECT && ns === NS.SVG;
case 14:
return tn === $.ANNOTATION_XML && ns === NS.MATHML;
}
return false;
}
//Stack of open elements
var OpenElementStack = module.exports = function (document, treeAdapter) {
this.stackTop = -1;
this.items = [];
this.current = document;
this.currentTagName = null;
this.currentTmplContent = null;
this.tmplCount = 0;
this.treeAdapter = treeAdapter;
};
//Index of element
OpenElementStack.prototype._indexOf = function (element) {
var idx = -1;
for (var i = this.stackTop; i >= 0; i--) {
if (this.items[i] === element) {
idx = i;
break;
}
}
return idx;
};
//Update current element
OpenElementStack.prototype._isInTemplate = function () {
if (this.currentTagName !== $.TEMPLATE)
return false;
return this.treeAdapter.getNamespaceURI(this.current) === NS.HTML;
};
OpenElementStack.prototype._updateCurrentElement = function () {
this.current = this.items[this.stackTop];
this.currentTagName = this.current && this.treeAdapter.getTagName(this.current);
this.currentTmplContent = this._isInTemplate() ? this.treeAdapter.getChildNodes(this.current)[0] : null;
};
//Mutations
OpenElementStack.prototype.push = function (element) {
this.items[++this.stackTop] = element;
this._updateCurrentElement();
if (this._isInTemplate())
this.tmplCount++;
};
OpenElementStack.prototype.pop = function () {
this.stackTop--;
if (this.tmplCount > 0 && this._isInTemplate())
this.tmplCount--;
this._updateCurrentElement();
};
OpenElementStack.prototype.replace = function (oldElement, newElement) {
var idx = this._indexOf(oldElement);
this.items[idx] = newElement;
if (idx === this.stackTop)
this._updateCurrentElement();
};
OpenElementStack.prototype.insertAfter = function (referenceElement, newElement) {
var insertionIdx = this._indexOf(referenceElement) + 1;
this.items.splice(insertionIdx, 0, newElement);
if (insertionIdx === ++this.stackTop)
this._updateCurrentElement();
};
OpenElementStack.prototype.popUntilTagNamePopped = function (tagName) {
while (this.stackTop > -1) {
var tn = this.currentTagName;
this.pop();
if (tn === tagName)
break;
}
};
OpenElementStack.prototype.popUntilTemplatePopped = function () {
while (this.stackTop > -1) {
var tn = this.currentTagName,
ns = this.treeAdapter.getNamespaceURI(this.current);
this.pop();
if (tn === $.TEMPLATE && ns === NS.HTML)
break;
}
};
OpenElementStack.prototype.popUntilElementPopped = function (element) {
while (this.stackTop > -1) {
var poppedElement = this.current;
this.pop();
if (poppedElement === element)
break;
}
};
OpenElementStack.prototype.popUntilNumberedHeaderPopped = function () {
while (this.stackTop > -1) {
var tn = this.currentTagName;
this.pop();
if (tn === $.H1 || tn === $.H2 || tn === $.H3 || tn === $.H4 || tn === $.H5 || tn === $.H6)
break;
}
};
OpenElementStack.prototype.popAllUpToHtmlElement = function () {
//NOTE: here we assume that root <html> element is always first in the open element stack, so
//we perform this fast stack clean up.
this.stackTop = 0;
this._updateCurrentElement();
};
OpenElementStack.prototype.clearBackToTableContext = function () {
while (this.currentTagName !== $.TABLE && this.currentTagName !== $.TEMPLATE && this.currentTagName !== $.HTML)
this.pop();
};
OpenElementStack.prototype.clearBackToTableBodyContext = function () {
while (this.currentTagName !== $.TBODY && this.currentTagName !== $.TFOOT &&
this.currentTagName !== $.THEAD && this.currentTagName !== $.TEMPLATE &&
this.currentTagName !== $.HTML) {
this.pop();
}
};
OpenElementStack.prototype.clearBackToTableRowContext = function () {
while (this.currentTagName !== $.TR && this.currentTagName !== $.TEMPLATE && this.currentTagName !== $.HTML)
this.pop();
};
OpenElementStack.prototype.remove = function (element) {
for (var i = this.stackTop; i >= 0; i--) {
if (this.items[i] === element) {
this.items.splice(i, 1);
this.stackTop--;
this._updateCurrentElement();
break;
}
}
};
//Search
OpenElementStack.prototype.tryPeekProperlyNestedBodyElement = function () {
//Properly nested <body> element (should be second element in stack).
var element = this.items[1];
return element && this.treeAdapter.getTagName(element) === $.BODY ? element : null;
};
OpenElementStack.prototype.contains = function (element) {
return this._indexOf(element) > -1;
};
OpenElementStack.prototype.getCommonAncestor = function (element) {
var elementIdx = this._indexOf(element);
return --elementIdx >= 0 ? this.items[elementIdx] : null;
};
OpenElementStack.prototype.isRootHtmlElementCurrent = function () {
return this.stackTop === 0 && this.currentTagName === $.HTML;
};
//Element in scope
OpenElementStack.prototype.hasInScope = function (tagName) {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === tagName)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if (isScopingElement(tn, ns))
return false;
}
return true;
};
OpenElementStack.prototype.hasNumberedHeaderInScope = function () {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === $.H1 || tn === $.H2 || tn === $.H3 || tn === $.H4 || tn === $.H5 || tn === $.H6)
return true;
if (isScopingElement(tn, this.treeAdapter.getNamespaceURI(this.items[i])))
return false;
}
return true;
};
OpenElementStack.prototype.hasInListItemScope = function (tagName) {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === tagName)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if (((tn === $.UL || tn === $.OL) && ns === NS.HTML) || isScopingElement(tn, ns))
return false;
}
return true;
};
OpenElementStack.prototype.hasInButtonScope = function (tagName) {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === tagName)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if ((tn === $.BUTTON && ns === NS.HTML) || isScopingElement(tn, ns))
return false;
}
return true;
};
OpenElementStack.prototype.hasInTableScope = function (tagName) {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === tagName)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if ((tn === $.TABLE || tn === $.TEMPLATE || tn === $.HTML) && ns === NS.HTML)
return false;
}
return true;
};
OpenElementStack.prototype.hasTableBodyContextInTableScope = function () {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === $.TBODY || tn === $.THEAD || tn === $.TFOOT)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if ((tn === $.TABLE || tn === $.HTML) && ns === NS.HTML)
return false;
}
return true;
};
OpenElementStack.prototype.hasInSelectScope = function (tagName) {
for (var i = this.stackTop; i >= 0; i--) {
var tn = this.treeAdapter.getTagName(this.items[i]);
if (tn === tagName)
return true;
var ns = this.treeAdapter.getNamespaceURI(this.items[i]);
if (tn !== $.OPTION && tn !== $.OPTGROUP && ns === NS.HTML)
return false;
}
return true;
};
//Implied end tags
OpenElementStack.prototype.generateImpliedEndTags = function () {
while (isImpliedEndTagRequired(this.currentTagName))
this.pop();
};
OpenElementStack.prototype.generateImpliedEndTagsWithExclusion = function (exclusionTagName) {
while (isImpliedEndTagRequired(this.currentTagName) && this.currentTagName !== exclusionTagName)
this.pop();
};
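
The `has...InScope` methods walk the stack from the top. Each returns `true` when it reaches the target and `false` at the first element that bounds that kind of scope. The simplified standalone sketch below covers only the default scope. It models the stack as an array of tag names, checks only the HTML-namespace scoping elements listed in `isScopingElement` (the SVG and MathML ones are left out, and namespaces are ignored), and returns `false` if the stack runs out.

```javascript
'use strict';

// HTML-namespace scoping elements from isScopingElement above (SVG/MathML omitted).
var SCOPING = ['applet', 'caption', 'html', 'table', 'td', 'th', 'marquee', 'object', 'template'];

// Walk from the top of the stack; succeed on the target, fail at the first scoping boundary.
function hasInScope(stack, tagName) {
    for (var i = stack.length - 1; i >= 0; i--) {
        if (stack[i] === tagName)
            return true;
        if (SCOPING.indexOf(stack[i]) !== -1)
            return false;
    }
    return false;
}

var stack = ['html', 'body', 'table', 'tbody', 'tr', 'td', 'p'];
console.log(hasInScope(stack, 'p'), hasInScope(stack, 'td'), hasInScope(stack, 'table'), hasInScope(stack, 'body'));
// prints: true true false false
```

Because `<td>` is a scoping element, an end tag inside the cell cannot close anything opened outside it, such as the enclosing `<table>` or `<body>`.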

File diff suppressed because it is too large

@@ -0,0 +1,96 @@
{
"_args": [
[
"parse5@1.5.1",
"C:\\Users\\deranjer\\go\\src\\github.com\\deranjer\\goTorrent\\torrent-project"
]
],
"_from": "parse5@1.5.1",
"_id": "parse5@1.5.1",
"_inBundle": false,
"_integrity": "sha1-m387DeMr543CQBsXVzzK8Pb1nZQ=",
"_location": "/react-scripts/parse5",
"_phantomChildren": {},
"_requested": {
"type": "version",
"registry": true,
"raw": "parse5@1.5.1",
"name": "parse5",
"escapedName": "parse5",
"rawSpec": "1.5.1",
"saveSpec": null,
"fetchSpec": "1.5.1"
},
"_requiredBy": [
"/react-scripts/jsdom"
],
"_resolved": "https://registry.npmjs.org/parse5/-/parse5-1.5.1.tgz",
"_spec": "1.5.1",
"_where": "C:\\Users\\deranjer\\go\\src\\github.com\\deranjer\\goTorrent\\torrent-project",
"author": {
"name": "Ivan Nikulin",
"email": "ifaaan@gmail.com",
"url": "https://github.com/inikulin"
},
"bugs": {
"url": "https://github.com/inikulin/parse5/issues"
},
"contributors": [
{
"name": "Alan Clarke",
"url": "https://github.com/alanclarke"
},
{
"name": "Saksham Aggarwal",
"email": "s.agg2021@gmail.com"
},
{
"name": "Sebastian Mayr",
"email": "sebmaster16@gmail.com",
"url": "http://blog.smayr.name"
},
{
"name": "Sean Lang",
"email": "slang800@gmail.com",
"url": "http://slang.cx"
}
],
"description": "WHATWG HTML5 specification-compliant, fast and ready for production HTML parsing/serialization toolset for Node and io.js.",
"devDependencies": {
"mocha": "1.21.4"
},
"homepage": "http://inikulin.github.io/parse5/",
"keywords": [
"html",
"parser",
"html5",
"WHATWG",
"specification",
"fast",
"html parser",
"html5 parser",
"htmlparser",
"parse5",
"serializer",
"html serializer",
"htmlserializer",
"sax",
"simple api"
],
"licenses": [
{
"type": "MIT",
"url": "https://raw.github.com/inikulin/parse5/master/LICENSE"
}
],
"main": "./index.js",
"name": "parse5",
"repository": {
"type": "git",
"url": "git://github.com/inikulin/parse5.git"
},
"scripts": {
"test": "node test/run_tests.js"
},
"version": "1.5.1"
}