parse5-sax-parser: Streaming SAX-style HTML parser.
npm install --save parse5-sax-parser
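"Streaming SAX-style" means two things. The parser does not build a tree; it reports each token through a callback as soon as that token is complete. It also accepts input in arbitrary chunks.

The TypeScript sketch below is a minimal, self-contained toy that illustrates only that idea. ToySaxParser, SaxHandlers and their callback names are hypothetical and are not parse5-sax-parser's API; see the package documentation for that. The toy ignores attributes, comments, doctypes, quoted ">" characters and raw-text elements such as <script>.

```typescript
// Hypothetical callbacks for this toy; NOT the parse5-sax-parser API.
interface SaxHandlers {
  startTag?: (tagName: string, selfClosing: boolean) => void;
  endTag?: (tagName: string) => void;
  text?: (text: string) => void;
}

// Toy streaming tokenizer: understands only "<name ...>", "<name/>", "</name>" and text.
class ToySaxParser {
  private readonly handlers: SaxHandlers;
  private buffer = "";

  constructor(handlers: SaxHandlers) {
    this.handlers = handlers;
  }

  // Feed an arbitrary chunk; complete tokens are emitted immediately.
  write(chunk: string): void {
    this.buffer += chunk;
    this.drain(false);
  }

  // Signal end of input so that any buffered text is flushed.
  end(): void {
    this.drain(true);
  }

  private drain(final: boolean): void {
    for (;;) {
      const lt = this.buffer.indexOf("<");
      if (lt === -1) {
        // Pure text: the next chunk may continue it, so only flush at end().
        if (final && this.buffer !== "") {
          this.handlers.text?.(this.buffer);
          this.buffer = "";
        }
        return;
      }
      if (lt > 0) {
        // Text in front of a "<" is complete.
        this.handlers.text?.(this.buffer.slice(0, lt));
        this.buffer = this.buffer.slice(lt);
        continue;
      }
      const gt = this.buffer.indexOf(">");
      if (gt === -1) {
        // Tag split across chunks: wait for more input (at end(), emit it as text).
        if (final) {
          this.handlers.text?.(this.buffer);
          this.buffer = "";
        }
        return;
      }
      this.emitTag(this.buffer.slice(1, gt));
      this.buffer = this.buffer.slice(gt + 1);
    }
  }

  private emitTag(raw: string): void {
    if (raw.startsWith("/")) {
      this.handlers.endTag?.(raw.slice(1).trim().toLowerCase());
      return;
    }
    const selfClosing = raw.endsWith("/");
    const body = selfClosing ? raw.slice(0, -1) : raw;
    // The tag name is everything up to the first whitespace; attributes are ignored.
    const tagName = (body.trim().split(/\s+/)[0] ?? "").toLowerCase();
    this.handlers.startTag?.(tagName, selfClosing);
  }
}

const events: string[] = [];
const parser = new ToySaxParser({
  startTag: (name, selfClosing) => events.push(`start:${name}${selfClosing ? "/" : ""}`),
  endTag: (name) => events.push(`end:${name}`),
  text: (text) => events.push(`text:${text}`),
});

// The <p> tag and the word "Hello" are deliberately split across chunks.
parser.write("<div><");
parser.write("p>Hel");
parser.write("lo</p><br/></div>tail");
parser.end();
```

The key design point is the buffer. A chunk may end in the middle of a tag (here "<" followed later by "p>"), so the parser must hold back the partial token rather than misreport it as text. The end() call is where any trailing text is finally flushed.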
Welcome to parse5@7.0.0! ✨ This is a huge release with many changes, features and fixes.
From an organisational perspective, the most important change is that parse5 is now maintained by a team, consisting of James (@43081j), Titus (@wooorm) and me (@fb55). We come from three projects that rely on parse5 — namely Cheerio, rehype, and Lit.
We need your support to continue the project! If you care about parse5, please support us financially on OpenCollective.
Headlining features of this release are ES Modules, TypeScript, and performance improvements: 7.0.0 is 45% faster than 6.0.1 with default options, and 167% faster with location information enabled (for the bench/perf benchmark, on an M1 Mac). Version 7.0.0 is a revamp of every part of the library. There are too many changes to list them all here, so here is a high-level overview:
All of parse5’s packages are now ECMAScript Modules. We are providing dual packages for parse5-htmlparser2-tree-adapter for now (see https://github.com/inikulin/parse5/pull/418 and https://github.com/inikulin/parse5/pull/496).
To migrate, please read this Gist on how to update. Note that private internals are no longer available; instead, everything that you need should be imported from the main package.
Implemented by @43081j in #351
The codebase has been ported to TypeScript. This helped uncover a number of subtle logic bugs, such as dc4e269022ebbae0767d8f790a29d6be1835fe1e, b4b5d4ad6f90b3c9fd03a90e2ed5267929979a11, or a0aff9578bb44511bc169c1d7f9e2f2780f7f8a0. TypeScript also helps us refactor with confidence and a lot of the changes in this release would have been much harder to do without it.
To migrate, please remove @types/parse5*, as we now ship our own types.
Implemented by @fb55 in #362
parse5-serializer-stream package was removed https://github.com/inikulin/parse5/pull/481
serialize function exported by
domhandler’s node interface (https://github.com/inikulin/parse5/pull/327 by @TrySound)
If you are using deep imports for any parts of the codebase, you will likely encounter some breakages:
OpenElementStack now uses callbacks https://github.com/inikulin/parse5/pull/429
getNextToken was removed https://github.com/inikulin/parse5/pull/461
_bootstrap method was removed https://github.com/inikulin/parse5/pull/384
parse5 now uses the entities module for encoding and decoding entities, sharing maintenance & optimisation work with projects such as htmlparser2. entities adopted a variant of parse5’s approach to decoding entities. As a result, decoding performance is equivalent, while memory consumption is slightly lower.
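As a rough illustration of what "decoding entities" involves, here is a simplified TypeScript decoder. It handles decimal and hexadecimal numeric character references and a handful of semicolon-terminated named references.

The tiny NAMED table and the decodeEntities name are hypothetical. This is not the entities module's implementation. The real module uses the full HTML named-reference table and also handles legacy references without a trailing semicolon and the spec's remapping of code points 0x80 to 0x9F. This sketch does none of that.

```typescript
// Hypothetical, deliberately tiny subset of the HTML named character references.
const NAMED: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  nbsp: "\u00a0",
};

// Decode &name; &#ddd; and &#xhh; references; unknown names are left untouched.
function decodeEntities(input: string): string {
  return input.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g,
    (match: string, body: string): string => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Null, surrogate and out-of-range code points become U+FFFD, as in HTML.
        if (codePoint === 0 || codePoint > 0x10ffff || (codePoint >= 0xd800 && codePoint <= 0xdfff)) {
          return "\ufffd";
        }
        return String.fromCodePoint(codePoint);
      }
      return NAMED[body] ?? match;
    },
  );
}

const decoded = decodeEntities("a &lt; b &amp;&amp; c &#62; d &#x263A; &bogus; &#0;");
```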
<< in comments parsed wrongly as
endTag for mixed-case foreign elements (#353)
Thanks @anko, @TrySound, @samouri, @alan-agius4, and @pmdartus!
Full Changelog: https://github.com/inikulin/parse5/compare/v6.0.1...v7.0.0
<hr> tags (by @43081j).
updateNodeSourceCodeLocation method, which enables usage of custom location info formats (GH #314) (by @DMartens).
RewritingStream now contains the correct endLine, covering all concatenated raw tokens (GH #266).
RewritingStream now flushes the last buffered chunk when calling .end() with no parameters (GH #271).
RewritingStream no longer assumes that each binary chunk is a valid, complete UTF-8 chunk; instead, it accepts only decoded strings (GH #269).
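The last change means that callers who read bytes (for example, from a file or a socket) must decode them before writing to RewritingStream. This matters because a multi-byte UTF-8 character can straddle a chunk boundary.

The TypeScript sketch below uses the standard TextDecoder with { stream: true }, which holds back an incomplete byte sequence until the next chunk arrives. A final decode() call with no argument flushes anything still buffered, much like the end() fix above. The decodeChunks helper and the sample chunks are illustrative assumptions, not code from parse5.

```typescript
// Decode byte chunks to strings without corrupting characters split across
// chunk boundaries (hypothetical helper, not part of parse5).
function decodeChunks(chunks: Uint8Array[]): string[] {
  const decoder = new TextDecoder("utf-8");
  const out: string[] = [];
  for (const chunk of chunks) {
    // stream: true keeps a trailing partial byte sequence inside the decoder.
    out.push(decoder.decode(chunk, { stream: true }));
  }
  // A final call without arguments flushes any bytes still buffered.
  out.push(decoder.decode());
  return out;
}

// "\u00e9" (é) is encoded as the two bytes 0xC3 0xA9; split them across two chunks.
const bytes = new TextEncoder().encode("<p>caf\u00e9</p>");
const chunks = [bytes.subarray(0, 7), bytes.subarray(7)];

// Decoding each chunk on its own turns the split character into two U+FFFD.
const naive = chunks.map((c) => new TextDecoder("utf-8").decode(c)).join("");
const streamed = decodeChunks(chunks).join("");
```

Here streamed is the original markup, while naive contains two replacement characters where é should be. That mismatch is exactly the failure that assuming "each binary chunk is a valid, complete UTF-8 chunk" would cause.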
Starting from this release, parse5 functionality will be shipped in separate packages, with the parse5 package containing only basic functionality. Please refer to the list of packages for more info.
Updated (breaking): source code location is now inserted by the tree adapter, so tree adapter developers have control over the location info property name. Tree adapters should implement the setNodeSourceCodeLocation and getNodeSourceCodeLocation methods. The location info property name added by the currently implemented tree adapters has been renamed to sourceCodeLocation (GH #189).
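Because the tree adapter now owns location info, an adapter can store it under whatever property it likes.

The TypeScript sketch below models only the location hooks named in these notes: setNodeSourceCodeLocation, getNodeSourceCodeLocation, and an update hook like the updateNodeSourceCodeLocation method mentioned above. It uses a hypothetical node type that keeps its location under a custom loc property. The ToyLocation and ToyNode types and the merge behaviour of the update hook are assumptions made for illustration. parse5's real TreeAdapter interface has many more methods and its own location types.

```typescript
// Hypothetical location shape; parse5's own location types carry more fields.
interface ToyLocation {
  startLine: number;
  startCol: number;
  startOffset: number;
  endLine?: number;
  endCol?: number;
  endOffset?: number;
}

// Hypothetical node type that keeps location info under a custom "loc" property.
interface ToyNode {
  tagName: string;
  loc?: ToyLocation;
}

// Only the location-related hooks of a tree adapter are modelled here.
const toyLocationAdapter = {
  setNodeSourceCodeLocation(node: ToyNode, location: ToyLocation): void {
    // The adapter, not the parser, decides where location info lives.
    node.loc = location;
  },
  getNodeSourceCodeLocation(node: ToyNode): ToyLocation | undefined {
    return node.loc;
  },
  // Merge later-known info (e.g. end positions) into the stored location (assumed behaviour).
  updateNodeSourceCodeLocation(node: ToyNode, update: Partial<ToyLocation>): void {
    if (node.loc) {
      node.loc = { ...node.loc, ...update };
    }
  },
};

// Simulate a parser reporting "<div></div>": the start first, the end later.
const div: ToyNode = { tagName: "div" };
toyLocationAdapter.setNodeSourceCodeLocation(div, { startLine: 1, startCol: 1, startOffset: 0 });
toyLocationAdapter.updateNodeSourceCodeLocation(div, { endLine: 1, endCol: 12, endOffset: 11 });
```

Because the parser only goes through these hooks, renaming the property (for example, from loc to sourceCodeLocation) is a change to the adapter alone.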
parse5 no longer ships TypeScript definitions. The existing TypeScript definitions have been moved to the DefinitelyTyped repo. Please track the PR in the DefinitelyTyped repo for updates.