cheerio

v1.0.0-rc.12

Tiny, fast, and elegant implementation of core jQuery designed specifically for the server

npm install cheerio

Fast, flexible & lean implementation of core jQuery designed specifically for the server.

中文文档 (Chinese Readme)

const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Note

We are currently working on the 1.0.0 release of cheerio on the main branch. The source code for the last published version, 0.22.0, can be found here.

Installation

npm install cheerio

Features

❤ Familiar syntax: Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API.

ϟ Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient.

❁ Incredibly flexible: Cheerio wraps around parse5 parser and can optionally use @FB55's forgiving htmlparser2. Cheerio can parse nearly any HTML or XML document.

Cheerio is not a web browser

Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure. It does not interpret the result as a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript, which is common for a SPA (single-page application). This makes Cheerio much, much faster than other solutions. If your use case requires any of this functionality, you should consider browser automation software like Puppeteer and Playwright or DOM emulation projects like jsdom.

API

Markup example we'll be using:

<ul id="fruits">
  <li class="apple">Apple</li>
  <li class="orange">Orange</li>
  <li class="pear">Pear</li>
</ul>

This is the HTML markup we will be using in all of the API examples.

Loading

First you need to load in the HTML. This step in jQuery is implicit, since jQuery operates on the one, baked-in DOM. With Cheerio, we need to pass in the HTML document.

This is the preferred method:

// ES6 or TypeScript:
import * as cheerio from 'cheerio';

// In other environments:
const cheerio = require('cheerio');

const $ = cheerio.load('<ul id="fruits">...</ul>');

$.html();
//=> <html><head></head><body><ul id="fruits">...</ul></body></html>

Similar to web browser contexts, load will introduce <html>, <head>, and <body> elements if they are not already present. You can set load's third argument to false to disable this.

const $ = cheerio.load('<ul id="fruits">...</ul>', null, false);

$.html();
//=> '<ul id="fruits">...</ul>'

Optionally, you can also load in the HTML by passing the string as the context:

$('ul', '<ul id="fruits">...</ul>');

Or as the root:

$('li', 'ul', '<ul id="fruits">...</ul>');

If you need to modify parsing options for XML input, you may pass an extra object to .load():

const $ = cheerio.load('<ul id="fruits">...</ul>', {
  xml: {
    normalizeWhitespace: true,
  },
});

The options in the xml object are taken directly from htmlparser2, therefore any options that can be used in htmlparser2 are valid in cheerio as well. When xml is set, the default options are:

{
    xmlMode: true,
    decodeEntities: true, // Decode HTML entities.
    withStartIndices: false, // Add a `startIndex` property to nodes.
    withEndIndices: false, // Add an `endIndex` property to nodes.
}

For a full list of options and their effects, see domhandler and htmlparser2's options.

Using htmlparser2

Cheerio ships with two parsers, parse5 and htmlparser2. The former is the default for HTML, the latter the default for XML.

Some users may wish to parse markup with the htmlparser2 library, and traverse/manipulate the resulting structure with Cheerio. This may be the case for those upgrading from pre-1.0 releases of Cheerio (which relied on htmlparser2), for those dealing with invalid markup (because htmlparser2 is more forgiving), or for those operating in performance-critical situations (because htmlparser2 may be faster in some cases). Note that "more forgiving" means htmlparser2 has error-correcting mechanisms that aren't always a match for the standards observed by web browsers. This behavior may be useful when parsing non-HTML content.

To support these cases, load also accepts a htmlparser2-compatible data structure as its first argument. Users may install htmlparser2, use it to parse input, and pass the result to load:

// Usage as of htmlparser2 version 6:
const htmlparser2 = require('htmlparser2');
const dom = htmlparser2.parseDocument(document, options);

const $ = cheerio.load(dom);

If you want to save some bytes, you can use Cheerio's slim export, which always uses htmlparser2:

const cheerio = require('cheerio/lib/slim');

Selectors

Cheerio's selector implementation is nearly identical to jQuery's, so the API is very similar.

$( selector, [context], [root] )

selector searches within the context scope, which in turn searches within the root scope. selector and context can be a string expression, DOM Element, array of DOM elements, or cheerio object. root is typically the HTML document string.

This selector method is the starting point for traversing and manipulating the document. Like jQuery, it's the primary method for selecting elements in the document.

$('.apple', '#fruits').text();
//=> Apple

$('ul .pear').attr('class');
//=> pear

$('li[class=orange]').html();
//=> Orange

XML Namespaces

You can select with XML Namespaces but due to the CSS specification, the colon (:) needs to be escaped for the selector to be valid.

$('[xml\\:id="main"]');

Rendering

When you're ready to render the document, you can call the html method on the "root" selection:

$.root().html();
//=>  <html>
//      <head></head>
//      <body>
//        <ul id="fruits">
//          <li class="apple">Apple</li>
//          <li class="orange">Orange</li>
//          <li class="pear">Pear</li>
//        </ul>
//      </body>
//    </html>

If you want to render the outerHTML of a selection, you can use the html utility function:

cheerio.html($('.pear'));
//=> <li class="pear">Pear</li>

You may also render the text content of a Cheerio object using the text static method:

const $ = cheerio.load('This is <em>content</em>.');
cheerio.text($('body'));
//=> This is content.

Plugins

Once you have loaded a document, you may extend the prototype or the equivalent fn property with custom plugin methods:

const $ = cheerio.load('<html><body>Hello, <b>world</b>!</body></html>');
$.prototype.logHtml = function () {
  console.log(this.html());
};

$('body').logHtml(); // logs "Hello, <b>world</b>!" to the console

If you're using TypeScript, you should add a type definition for your new method:

declare module 'cheerio' {
  interface Cheerio<T> {
    logHtml(this: Cheerio<T>): void;
  }
}

The "DOM Node" object

Cheerio collections are made up of objects that bear some resemblance to browser-based DOM nodes. You can expect them to define the following properties:

  • tagName
  • parentNode
  • previousSibling
  • nextSibling
  • nodeValue
  • firstChild
  • childNodes
  • lastChild

Screencasts

https://vimeo.com/31950192

This video tutorial is a follow-up to Nettut's "How to Scrape Web Pages with Node.js and jQuery", using cheerio instead of JSDOM + jQuery. This video shows how easy it is to use cheerio and how much faster cheerio is than JSDOM + jQuery.

Cheerio in the real world

Are you using cheerio in production? Add it to the wiki!

Sponsors

Does your company use Cheerio in production? Please consider sponsoring this project! Your help will allow maintainers to dedicate more time and resources to its development and support.

GitHub CryptoCasinos Casinoonlineaams.com Casinofiables.com Apify Free Bets Casino utan svensk licens Casino utan svensk licens

Backers

Become a backer to show your support for Cheerio and help us maintain and improve this open source project.

Airbnb Vasy Kafidoff Espen Klem Jarrod Davis Nishant Singh Gautham Chandra Charles Severance

Special Thanks

This library stands on the shoulders of some incredible developers. A special thanks to:

• @FB55 for node-htmlparser2 & CSSSelect: Felix has a knack for writing speedy parsing engines. He completely re-wrote both @tautologistic's node-htmlparser and @harry's node-soupselect from the ground up, making both of them much faster and more flexible. Cheerio would not be possible without his foundational work.

• @jQuery team for jQuery: The core API is the best of its class and despite dealing with all the browser inconsistencies the code base is extremely clean and easy to follow. Much of cheerio's implementation and documentation is from jQuery. Thanks guys.

• @visionmedia: The style, the structure, the open-source"-ness" of this library comes from studying TJ's style and using many of his libraries. This dude consistently pumps out high-quality libraries and has always been more than willing to help or answer questions. You rock TJ.

License

MIT

Release Notes

1.0.0-rc.12
By Felix • Published on June 26, 2022

Bugfix release. Fixed issues:

New Contributors

Full Changelog: https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.11...v1.0.0-rc.12

1.0.0-rc.11
By Felix • Published on May 20, 2022

cheerio@1.0.0-rc.11 is hopefully the last RC before the 1.0.0 release of Cheerio. There are two APIs that will be added for the next major release: an extract method (https://github.com/cheeriojs/cheerio/issues/2523) and Node.js-specific loader methods (https://github.com/cheeriojs/cheerio/issues/2051). These are still in flux and I'd appreciate feedback on the proposals.

A big thank you to everyone that contributed to this release! This includes code contributors, as well as the amazing financial support on GitHub Sponsors!

Under the hood, a lot of work for this release went into updating parse5, cheerio's default HTML parser. Have a look at parse5's release notes to see what has changed there.

Breaking

Features

Fixes

Refactor

Development Experience

Docs

New Contributors

Full Changelog: https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.10...v1.0.0-rc.11

1.0.0-rc.10
By Felix • Published on June 8, 2021

Fixes:

  • .html(node) now moves passed nodes (#1923, fixes #940) 258b26b
  • Boolean attributes are no longer special in xmlMode (#1903, fixes #1805) b393e4a
  • Rename parser adapter files (#1873, fixes #1847) 8f55dd8
  • Make filter work on all collections (#1870, fixes #1867) fb8d31e
  • Bump cheerio-select (#1922, fixes https://www.npmjs.com/advisories/1754) 5cd2b9c

Documentation:

  • Document how to define TS types for Plug-Ins (#1915, fixes #1778) 880fd2c
  • Remove obsolete Testing section e0c7cbb
  • Remove now-invalid require 5dfbd35

Refactors:

  • Wrap shared behavior in traversing (#1909) 58e090a
  • Move is to traversing, optimize (#1908) 1c6fa3e
  • Change order of arguments of internal domEach (#1892) feda230
  • Have load export a function (#1869) c370f4e

https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.9...v1.0.0-rc.10

1.0.0-rc.9
By Felix • Published on May 6, 2021

Port to TypeScript

Cheerio has been ported entirely to TypeScript (in #1816)! This eliminates a lot of edge-cases within Cheerio and will allow you to use Cheerio with confidence. This release also features a new documentation website based on TypeDoc, allowing you to quickly navigate all available methods: https://cheerio.js.org


Breaking change: If you were using the function exported by Cheerio directly instead of first load()ing a document, you will now have to update the require to use the default export.

- const cheerio = require("cheerio");
+ const cheerio = require("cheerio").default;

cheerio('div', dom)

Please note that this way of using Cheerio is deprecated and might be removed in a future version. Please consider updating your code to:

const cheerio = require("cheerio");

const $ = cheerio.load(dom)
$('div')

Note: Cheerio uses template literal types to determine return types. These are available starting with TypeScript 4.1, so you might have to bump your TypeScript version.

For TypeScript types, Cheerio now implements the ArrayLike<T> interface. That means that Cheerio instances can contain objects of arbitrary types, but not all methods can be called on them.

The TypeScript compiler will figure out what structures you are operating on:

  • When calling a loaded Cheerio instance with an HTML string like $('<div>'), it will produce a Cheerio<Node> type.
    • Node is the base class for DOM elements and includes e.g. comment and text nodes.
  • When calling Cheerio with a selector like $('.foo'), it will produce a Cheerio<Element>, as only Elements can be part of the result set.
    • Element is the class representing tags.
  • You can still use $('...').map() to map to arbitrary values, and will get a compiler error when trying to call methods that are not supported.
    • E.g. $('.foo').map((i, el) => $(el).text()).attr('test') will no longer be possible, as .attr is not allowed to be called on a Cheerio<string>.

This release does not contain other changes to functionality. Feedback is greatly appreciated; if you encounter a problem, please file an issue!

https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.6...v1.0.0-rc.9

1.0.0-rc.8
By Felix • Published on May 6, 2021

Second botched release. Please use v1.0.0-rc.9 instead.

1.0.0-rc.7
By Felix • Published on May 6, 2021

Published without a lib directory — please ignore.

1.0.0-rc.6
By Felix • Published on April 8, 2021

Breaking:

  • Fixed the ordering of the output of several methods, including prevAll, prevUntil and parentsUntil. The new order matches jQuery.

This release contains three breaking changes inherited from dependencies.

  • Selectors (see [email protected]):
    • Several pseudo selectors are now stricter, in line with the HTML spec.
    • Some attributes are now case-insensitive based on the HTML spec.
  • DOM:
    • In XML mode, all elements will have type: 'tag'.

New features:

  • Add .unwrap (#1651 by @5saviahv) 2037d83
  • Add .wrapAll (#1590 by @5saviahv) cd4a4d9
  • Support prop('innerHTML') (#1578 by @fb55) c58258f
  • Expose the scriptingEnabled parse5 option (#1707 by @5saviahv) 7eb4cc4
    • By setting scriptingEnabled to false, it is now possible to parse the contents of <noscript> tags.

Types:

  • Improve .load type (#1584 by @f0x52) 6a90bda
  • Improve type for .get (#1759 by @karlhorky) d706976
  • Add .wrapAll (#1740 by @5saviahv) b360762
  • Allow for of loops (#1704 by @mcpiroman) 8fef5aa
  • Rename exported variable (#1682 by @dominik-korsa) 897b37f
  • Fix AttrFunction arguments (#1669 by @maxma241) 5f2e9c3

Bug fixes:

  • Fix handling of undefined as value in .attr() (#1757 by @5saviahv) 98186e8
  • Fix filter for {prev,next}Until (#1728 by @fb55) f2615d2
  • Fix parentsUntil filtering (#1708 by @5saviahv) bf899d5
  • Remove module caching dependency (#1691 by @5saviahv) a9d6a43
  • Filter text nodes in find function (#1680 by @5saviahv) 9b28b49
  • Fix .add modifying previous selections (#1656 by @5saviahv) 9f9b493
  • Stop parent() from throwing an error in some cases (#1637 by @5saviahv) 43592d6
  • Make it possible to .find siblings (#1583 by @fb55) 1062a6c
  • Fix replaceWith replacing element with itself (#1581 by @fb55) 88ae636
  • Fix attr handling of undefined as value (#1582 by @fb55) 3b35ae4
  • Support passing a single element to load (#1580 by @fb55) 0855be6
  • Update attribute value when setting prop in .prop (#1579 by @fb55) db3fce7
  • Add length instance property to all Cheerio instances (#1681 by @5saviahv) b3010d7
  • Enforce LF (#1602 by @XhmikosR) 0bbad23
  • .html() send context to parse5 (#1627 by @5saviahv) bf04330

Documentation updates:

  • Switch website to clean-jsdoc-theme (#1648 by @XhmikosR) a336301
  • Switch to shields.io for badges (#1611 by @XhmikosR) 290aa9f
  • Minor typo fixes (#1606 by @XhmikosR) f5d6ac3
  • Fix a few redirects (#1603 by @XhmikosR) 2ae9b14
  • Use https when possible. (#1597 by @XhmikosR) c6679ae
  • Add JSDoc docs, standardize (#1593 by @fb55) 8273e4c
  • Document after, before, slice arguments, improve handling (#1721 by @5saviahv) 732d539
  • Make link explicit (#1616 by @XhmikosR) d7c3817
  • Fix readme link (#1618 by @XhmikosR) bdd6018
  • Fix parameter types (#1615 by @fb55) 15e39cf
  • Enable Markdown IDs in JSDoc (#1610 by @XhmikosR) f3e1a4c
  • Add bugs and homepage properties to package.json (#1609 by @XhmikosR) ad3e30b
  • Explicitly set type for wrap(Inner) (#1668 by @5saviahv) 5b8dd60

Refactors:

  • Use [email protected] (#1594 by @fb55) e8f5e98
    • Fixes a deprecation warning.
  • Simplify quickExpr (#1716 by @fb55) 4aa3d39
  • Enable strict mode for all files (#1650 by @fb55) 208bce1
  • Simplify wrapAll, add some tests (#1640 by @5saviahv) b6d3840
  • Remove unneeded escapes (#1635 by @XhmikosR) bfa114e
  • Move parsers to their own files (#1589 by @fb55) 63e4616
  • Declare vars when used, streamline code (#1588 by @fb55) 69ae308
  • Enable several eslint rules (#1617 by @XhmikosR) 21de2c5
  • Enable eqeqeq eslint rule except for null (#1638 by @XhmikosR) 52f37a1
  • Enable block-scoped-var eslint rule (#1631 by @XhmikosR) b072df8
  • Enable no-unused-expressions eslint rule (#1630 by @XhmikosR) fc2c7d5
  • Remove eslint --ignore-path (#1612 by @XhmikosR) 17f0d08
  • Merge .gitattributes (#1646 by @XhmikosR) 2fb25aa
  • Remove Makefile, .prettierignore (#1614 by @fb55) 36c4c77
  • Add a "bench" alias script (#1629 by @XhmikosR) bb6cb38
  • Add jest & node eslint plugins (#1642 by @fb55) 075cc5d
  • Remove the now unused xyz devDependency (#1628 by @XhmikosR) b93931a
  • Remove the now unused entities package (#1613 by @XhmikosR) 18c0038
  • Remove the now unused scripts (#1647 by @XhmikosR) a3f6846

CI:

  • Switch to GitHub Actions CI (#1600 by @XhmikosR) b9453ea
  • Update CI config (#1673 by @XhmikosR) 8ace785
  • Fix benchmark skip check (#1634 by @XhmikosR) 703ec16
  • Use actions-gh-pages (#1626 by @fb55) 9ee60cc
  • Update GitHub Actions too (#1605 by @XhmikosR) 05a4757
  • Add versioning-strategy: increase for dependabot, format 71d2aaf

Test changes:

  • Nesting level of some deeply nested tests decreased (#1734 by @5saviahv) 9743030
  • Add tests for mixed elements and text (#1747 by @5saviahv) ca7cd9b
  • test(cheerio): Fix typos in test names (#1748 by @atimidguy) fafae51
  • Add test for #1092 (#1733 by @5saviahv) 1a86118
  • test(tsc): Add expectType test (#1726 by @5saviahv) 6f35a39
  • .map test was actually calling .each (#1711 by @Pustur) 456fbe5
  • Prefer using .toBeUndefined() (#1659 by @XhmikosR) 5aa4272
  • Prefer Jest's toBe true/false matcher. (#1639 by @XhmikosR) 4859684
  • Add some tests (#1653 by @5saviahv) 1ebe05a
  • Migrate to Jest (#1596 by @fb55) d60bac9
  • Add CodeCQL Action (#1601 by @XhmikosR) 990e963

Commit Range: https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.5...v1.0.0-rc.6

1.0.0-rc.5
By Felix • Published on December 21, 2020

Hotfix release

  • fix(package): Use cheerio-select-tmp until naming issue is resolved 3751929

https://github.com/cheeriojs/cheerio/compare/v1.0.0-rc.4...v1.0.0-rc.5

1.0.0-rc.4
By Felix • Published on December 21, 2020

Welcome to cheerio@1.0.0-rc.4! This is the last pre-release before a full 1.0.0 release. Please make sure to test this release and report any issues you might find.

This release was made possible by our supporters on Open Collective. If you want to support this project going forward, have a look at sponsorship options!

Breaking:

  • After upgrading parse5, cheerio temporarily has a minimum Node version of Node 6. See #1585 for details.
  • Use the parser's DOM without modifications (#1559 by @fb55)
    • Nodes no longer have a root reference. The root node is now referenced by the parent property.
  • Use parse5 to serialize the DOM (04091a4 by @fb55)
    • We will no longer encode non-ASCII characters by default. Please make sure that you pass on the appropriate content encoding.

New Features:

  • Support for jQuery's positional selectors (:eq, :last, :odd, etc.) (#1565 by @fb55)
  • Add TypeScript typings (#1491 by @paulmelnikow)
  • Introduces a new documentation website (#1571 by @fb55, based on work by @jugglinmike)
  • Implement for...of iterator via Symbol.iterator (#1197 by @papandreou)
  • Add wrapInner (9ffc557 by @fb55, based on work by @tomjw64 and @warrengm)
  • Have removeAttr accept a list of attributes to remove (#1561 by @fb55)
  • Add prop('outerHTML') implementation (#945 by @bill-bishop)

Bug Fixes:

  • Prevent stale data on .prev() after replaceWith() (#1254 by @Gei0r)
  • Fix .xml calls on HTML documents (#1572 by @fb55)
  • Correct rendering of root node (#1307 by @jugglinmike)
  • Throw a useful error on invalid input to cheerio.load() (#1087 by @zeke)
  • Fix data attribute shadowing issue (#1139 by @tai2)
  • Pass locationInfo option to parse5 (#1155 by @trevorhreed)

Other notable changes:

  • Dropped Lodash as a dependency
    • Avoid lodash where possible (#1500 by @TrySound)
    • Use domhandler nodes directly (#1564 by @fb55)
  • Formally test deprecated APIs (#1184 by @jugglinmike)

https://github.com/cheeriojs/cheerio/compare/1.0.0-rc.2...v1.0.0-rc.4

0.4.2
By Felix • Published on January 17, 2012

  • Multiple selectors support: $('.apple, .orange'). Thanks @siddMahen!
  • Update package.json to always use latest cheerio-soupselect
  • Fix memory leak in index.js

General

License: MIT
TypeScript Types: Built-in
Tree-shakeable: Yes

Popularity

GitHub Stargazers: 25.5K
Community Interest: 27K
Number of Forks: 1,568

Maintenance

[Chart: commits over time]
Open Issues: 15
Closed Issues: 1,082
Open Pull Requests: 2
Closed Pull Requests: 393

Versions

[Chart: versions released over time]
Latest Version Released: Jun 26, 2022
Current Tags: latest (1.0.0-rc.12)

Contributors

  • matthewmueller (299 commits)
  • fb55 (199 commits)
  • jugglinmike (169 commits)
  • davidchambers (53 commits)
  • kpdecker (33 commits)
  • XhmikosR (32 commits)
  • 5saviahv (21 commits)
  • arb (11 commits)
  • nleush (7 commits)
  • 0xBADC0FFEE (5 commits)
  • alexindigo (5 commits)
  • cvrebert (5 commits)
  • stevenvachon (5 commits)
  • bensheldon (5 commits)
  • greenkeeper[bot] (4 commits)