pagehtml

A tool to grab and process a website's html.

<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>JSDoc: Class: PageHTML</title> <script src="scripts/prettify/prettify.js"> </script> <script src="scripts/prettify/lang-css.js"> </script> <!--[if lt IE 9]> <script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script> <![endif]--> <link type="text/css" rel="stylesheet" href="styles/prettify-tomorrow.css"> <link type="text/css" rel="stylesheet" href="styles/jsdoc-default.css"> </head> <body> <div id="main"> <h1 class="page-title">Class: PageHTML</h1> <section> <header> <h2><span class="attribs"><span class="type-signature"></span></span>PageHTML<span class="signature">()</span><span class="type-signature"></span></h2> <div class="class-description"><p>A class representing HTML pages.</p></div> </header> <article> <div class="container-overview"> <h2>Constructor</h2> <h4 class="name" id="PageHTML"><span class="type-signature"></span>new PageHTML<span class="signature">()</span><span class="type-signature"></span></h4> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line11">line 11</a> </li></ul></dd> </dl> </div> <h3 class="subsection-title">Methods</h3> <h4 class="name" id="clear"><span class="type-signature"></span>clear<span class="signature">()</span><span class="type-signature"> &rarr; {null}</span></h4> <div class="description"> <p>A method to clear the array of the dom property of the class.</p> </div> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line68">line 68</a> </li></ul></dd> </dl> <h5>Returns:</h5> <dl> <dt> Type </dt> <dd> <span class="param-type">null</span> </dd> </dl> <h4 class="name" id="close"><span class="type-signature"></span>close<span class="signature">()</span><span 
class="type-signature"></span></h4> <div class="description"> <p>Closes the browser instance of Puppeteer and sets the page and browser properties of the PageHTML class to null.</p> </div> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line272">line 272</a> </li></ul></dd> </dl> <h4 class="name" id="content"><span class="type-signature"></span>content<span class="signature">(index, elementString)</span><span class="type-signature"> &rarr; {Array.&lt;{elementText: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}>}</span></h4> <div class="description"> <p>A method that returns a content object containing the text content and other relevant parameters for an HTML element.</p> </div> <h5>Parameters:</h5> <table class="params"> <thead> <tr> <th>Name</th> <th>Type</th> <th class="last">Description</th> </tr> </thead> <tbody> <tr> <td class="name"><code>index</code></td> <td class="type"> <span class="param-type">number</span> | <span class="param-type">Array.&lt;number></span> </td> <td class="description last"><p>The index of the dom property to perform this method on. This may be a single number or an array of numbers. 
If left undefined it will apply to all indexes.</p></td> </tr> <tr> <td class="name"><code>elementString</code></td> <td class="type"> <span class="param-type">string</span> </td> <td class="description last"><p>The selector string of the element to retrieve, such as <code>li#rawUa</code>.</p></td> </tr> </tbody> </table> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line232">line 232</a> </li></ul></dd> </dl> <h5>Returns:</h5> <div class="param-desc"> <p>An array of element objects.</p> </div> <dl> <dt> Type </dt> <dd> <span class="param-type">Array.&lt;{elementText: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}></span> </dd> </dl> <h5>Example</h5> <pre class="prettyprint"><code>// create a PageHTML instance.
let pHtml = new PageHTML();
// load the webpage.
await pHtml.get('https://www.whatsmyua.info/');
// close the connection.
pHtml.close();
// return the list element with id rawUa.
// setting index to 0 selects the 0th JSDOM element in the pHtml.dom property.
// additionally we take array element 0 so that only the first match is returned.
let userAgentDetected = pHtml.content(0, 'li#rawUa')[0];
console.log(userAgentDetected);
// logs the element object containing the user agent detected by www.whatsmyua.info.</code></pre> <h4 class="name" id="get"><span class="type-signature">(async) </span>get<span class="signature">(url)</span><span class="type-signature"> &rarr; {jsdom.JSDOM}</span></h4> <div class="description"> <p>A function that retrieves data from a webpage.</p> </div> <h5>Parameters:</h5> <table class="params"> <thead> <tr> <th>Name</th> <th>Type</th> <th class="last">Description</th> </tr> </thead> <tbody> <tr> <td class="name"><code>url</code></td> <td class="type"> <span class="param-type">string</span> </td> <td class="description last"><p>The target URL from which to retrieve data.</p></td> </tr> </tbody> </table> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line47">line 47</a> </li></ul></dd> </dl> <h5>Returns:</h5> <div class="param-desc"> <p>The document object model (dom).</p> </div> <dl> <dt> Type </dt> <dd> <span class="param-type">jsdom.JSDOM</span> </dd> </dl> <h4 class="name" id="links"><span class="type-signature"></span>links<span class="signature">(index)</span><span class="type-signature"> &rarr; {Array.&lt;{href: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}>}</span></h4> <div class="description"> <p>A method that returns an array of link objects, one for each element with an href attribute in a JSDOM instance.</p> </div> <h5>Parameters:</h5> <table class="params"> <thead> <tr> <th>Name</th> <th>Type</th> <th class="last">Description</th> </tr> </thead> <tbody> <tr> <td class="name"><code>index</code></td> <td class="type"> <span class="param-type">number</span> </td> <td class="description last"><p>The index of the dom property to perform this method on.
This may be a single number or an array of numbers. If left undefined it will apply to all indexes.</p></td> </tr> </tbody> </table> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line163">line 163</a> </li></ul></dd> </dl> <h5>Returns:</h5> <div class="param-desc"> <p>An array of link objects.</p> </div> <dl> <dt> Type </dt> <dd> <span class="param-type">Array.&lt;{href: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}></span> </dd> </dl> <h5>Example</h5> <pre class="prettyprint"><code>// create a PageHTML instance.
let pHtml = new PageHTML();
// grab the webpage content.
await pHtml.get('https://en.wikipedia.org/wiki/List_of_Formula_One_Grand_Prix_winners');
// close the webpage.
pHtml.close();
// setting index to 0 returns the link objects for the 0th JSDOM element in the pHtml.dom property.
// additionally we take array element 2 so that only the third link is returned.
let link = pHtml.links(0)[2];
console.log(link);
// logs the third href link from the webpage as a link object.</code></pre> <h4 class="name" id="tables"><span class="type-signature"></span>tables<span class="signature">(index)</span><span class="type-signature"> &rarr; {Array.&lt;string>}</span></h4> <div class="description"> <p>A method that returns all tables present in a JSDOM instance.</p> </div> <h5>Parameters:</h5> <table class="params"> <thead> <tr> <th>Name</th> <th>Type</th> <th class="last">Description</th> </tr> </thead> <tbody> <tr> <td class="name"><code>index</code></td> <td class="type"> <span class="param-type">number</span> </td> <td class="description last"><p>The index of the dom property to perform this method on. This may be a single number or an array of numbers.
If left undefined it will apply to all indexes.</p></td> </tr> </tbody> </table> <dl class="details"> <dt class="tag-source">Source:</dt> <dd class="tag-source"><ul class="dummy"><li> <a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line97">line 97</a> </li></ul></dd> </dl> <h5>Returns:</h5> <div class="param-desc"> <p>An array of arrays representing the rows and columns of an HTML table.</p> </div> <dl> <dt> Type </dt> <dd> <span class="param-type">Array.&lt;string></span> </dd> </dl> <h5>Example</h5> <pre class="prettyprint"><code>// create a PageHTML instance.
let pHtml = new PageHTML();
// grab the webpage content.
await pHtml.get('https://en.wikipedia.org/wiki/List_of_Formula_One_Grand_Prix_winners');
// close the webpage.
pHtml.close();
// setting index to 0 returns the tables for the 0th JSDOM element in the pHtml.dom property.
// additionally we take array element 1 so that only the second table is returned.
let table = pHtml.tables(0)[1];
console.log(table);
// logs the second HTML table from the webpage as an array of arrays.</code></pre> </article> </section> </div> <nav> <h2><a href="index.html">Home</a></h2><h3>Classes</h3><ul><li><a href="PageHTML.html">PageHTML</a></li></ul><h3>Global</h3><ul><li><a href="global.html#getElementText">getElementText</a></li><li><a href="global.html#getLinks">getLinks</a></li><li><a href="global.html#makeTable">makeTable</a></li></ul> </nav> <br class="clear"> <footer> Documentation generated by <a href="https://github.com/jsdoc/jsdoc">JSDoc 4.0.4</a> on Wed Feb 05 2025 14:23:55 GMT-0800 (Pacific Standard Time) </footer> <script> prettyPrint(); </script> <script src="scripts/linenumber.js"> </script> </body> </html>