A tool to grab and process a website's html.
<html lang="en">
<head>
<meta charset="utf-8">
<title>JSDoc: Class: PageHTML</title>
<script src="scripts/prettify/prettify.js"> </script>
<script src="scripts/prettify/lang-css.js"> </script>
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<link type="text/css" rel="stylesheet" href="styles/prettify-tomorrow.css">
<link type="text/css" rel="stylesheet" href="styles/jsdoc-default.css">
</head>
<body>
<div id="main">
<h1 class="page-title">Class: PageHTML</h1>
<section>
<header>
<h2><span class="attribs"><span class="type-signature"></span></span>PageHTML<span class="signature">()</span><span class="type-signature"></span></h2>
<div class="class-description"><p>A class representing HTML pages.</p></div>
</header>
<article>
<div class="container-overview">
<h2>Constructor</h2>
<h4 class="name" id="PageHTML"><span class="type-signature"></span>new PageHTML<span class="signature">()</span><span class="type-signature"></span></h4>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line11">line 11</a>
</li></ul></dd>
</dl>
</div>
<h3 class="subsection-title">Methods</h3>
<h4 class="name" id="clear"><span class="type-signature"></span>clear<span class="signature">()</span><span class="type-signature"> → {null}</span></h4>
<div class="description">
<p>Clears the array held in the dom property of the class.</p>
</div>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line68">line 68</a>
</li></ul></dd>
</dl>
<h5>Returns:</h5>
<dl>
<dt>
Type
</dt>
<dd>
<span class="param-type">null</span>
</dd>
</dl>
<h4 class="name" id="close"><span class="type-signature"></span>close<span class="signature">()</span><span class="type-signature"></span></h4>
<div class="description">
<p>Closes the browser instance of Puppeteer and sets the page and browser properties of the PageHTML class to null.</p>
</div>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line272">line 272</a>
</li></ul></dd>
</dl>
<h4 class="name" id="content"><span class="type-signature"></span>content<span class="signature">(index, elementString)</span><span class="type-signature"> → {Array.&lt;{elementText: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}>}</span></h4>
<div class="description">
<p>Returns an array of content objects, each holding the text content
and related properties (node name, outer HTML, inner HTML, and parent element) of a matching HTML element.</p>
</div>
<h5>Parameters:</h5>
<table class="params">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th class="last">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="name"><code>index</code></td>
<td class="type">
<span class="param-type">number</span>
|
<span class="param-type">Array.&lt;number></span>
</td>
<td class="description last"><p>The index of the dom property to perform this method on.
This may be a single number or an array of numbers. If left undefined it will apply to all indexes.</p></td>
</tr>
<tr>
<td class="name"><code>elementString</code></td>
<td class="type">
<span class="param-type">string</span>
</td>
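<td class="description last"><p>A string identifying the target element(s), such as the selector <code>li#rawUa</code> used in the example for this method.</p></td>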
</tr>
</tbody>
</table>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line232">line 232</a>
</li></ul></dd>
</dl>
<h5>Returns:</h5>
<div class="param-desc">
<p>An array of element objects.</p>
</div>
<dl>
<dt>
Type
</dt>
<dd>
<span class="param-type">Array.&lt;{elementText: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}></span>
</dd>
</dl>
<h5>Example</h5>
<pre class="prettyprint"><code>```javascript
// create the PageHTML class.
let pHtml = new PageHTML();
// load the webpage.
await pHtml.get('https://www.whatsmyua.info/');
// close the connection.
pHtml.close();
// return the list element with id rawUa.
// setting index to 0 returns the 0th JSDOM element in the pHtml.dom property.
// additionally we set the array index to 0 to confirm we only want the first instance.
let userAgentDetected = pHtml.content('li#rawUa',0)[0];
console.log(userAgentDetected);
```
Logs the content object for the user agent string detected by www.whatsmyua.info.</code></pre>
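<p>The <code>index</code> rule described for this method (a single number, an array of numbers, or undefined for all indexes) could be handled along the following lines. This is only a sketch: <code>resolveIndexes</code> is a hypothetical helper that is not taken from src/htmlScraper.js, and the actual implementation may differ.</p>
<pre class="prettyprint"><code>
```javascript
// Hypothetical helper (an assumption, not part of PageHTML): turns an index
// argument into the list of positions in the dom array it refers to.
function resolveIndexes(index, length) {
  if (index === undefined) {
    // undefined selects every stored DOM: 0, 1, ..., length - 1
    return Array.from({ length }, (_, i) => i);
  }
  // wrap a single number so both accepted forms are handled the same way
  return Array.isArray(index) ? index : [index];
}

// assume three pages have been loaded into the dom property
const all = resolveIndexes(undefined, 3);
const one = resolveIndexes(1, 3);
const some = resolveIndexes([0, 2], 3);
console.log(all, one, some);
```
</code></pre>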
<h4 class="name" id="get"><span class="type-signature">(async) </span>get<span class="signature">(url)</span><span class="type-signature"> → {jsdom.JSDOM}</span></h4>
<div class="description">
<p>Retrieves the HTML of a webpage and returns it as a JSDOM document object model.</p>
</div>
<h5>Parameters:</h5>
<table class="params">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th class="last">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="name"><code>url</code></td>
<td class="type">
<span class="param-type">string</span>
</td>
<td class="description last"><p>The target URL from which to grab data.</p></td>
</tr>
</tbody>
</table>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line47">line 47</a>
</li></ul></dd>
</dl>
<h5>Returns:</h5>
<div class="param-desc">
<p>The document object model (dom).</p>
</div>
<dl>
<dt>
Type
</dt>
<dd>
<span class="param-type">jsdom.JSDOM</span>
</dd>
</dl>
<h4 class="name" id="links"><span class="type-signature"></span>links<span class="signature">(index)</span><span class="type-signature"> → {Array.&lt;{href: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}>}</span></h4>
<div class="description">
<p>Returns an array of link objects describing every element with an href in an instance of the JSDOM class.</p>
</div>
<h5>Parameters:</h5>
<table class="params">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th class="last">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="name"><code>index</code></td>
<td class="type">
<span class="param-type">number</span>
</td>
<td class="description last"><p>The index of the dom property to perform this method on.
This may be a single number or an array of numbers. If left undefined it will apply to all indexes.</p></td>
</tr>
</tbody>
</table>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line163">line 163</a>
</li></ul></dd>
</dl>
<h5>Returns:</h5>
<div class="param-desc">
<p>An array of link objects.</p>
</div>
<dl>
<dt>
Type
</dt>
<dd>
<span class="param-type">Array.&lt;{href: string, nodeName: string, outerHTML: string, innerHTML: string, parentElement: string}></span>
</dd>
</dl>
<h5>Example</h5>
<pre class="prettyprint"><code>```javascript
// create the PageHTML class.
let pHtml = new PageHTML();
// grab the webpage content.
await pHtml.get('https://en.wikipedia.org/wiki/List_of_Formula_One_Grand_Prix_winners');
// close the webpage.
pHtml.close();
// setting index to 0 returns the link objects for the 0th JSDOM element in the pHtml.dom property.
// additionally we set the array index to 2 to confirm we only want the third link.
let link = pHtml.links(0)[2];
console.log(link);
```
returns the third href link from the webpage as a link object.</code></pre>
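<p>Once an array of link objects is available, it can be processed with ordinary array methods. The sketch below uses made-up sample data shaped like the documented return type (the URLs and field values are assumptions for illustration, with the markup fields left empty) and keeps only the absolute https links.</p>
<pre class="prettyprint"><code>
```javascript
// Made-up sample data shaped like the documented return type of links().
// In real use this array would come from pHtml.links(0).
const sampleLinks = [
  { href: 'https://example.com/a', nodeName: 'A', outerHTML: '', innerHTML: 'First', parentElement: 'LI' },
  { href: '/relative/path', nodeName: 'A', outerHTML: '', innerHTML: 'Second', parentElement: 'P' },
  { href: 'https://example.com/b', nodeName: 'A', outerHTML: '', innerHTML: 'Third', parentElement: 'TD' },
];

// keep only links whose href is an absolute https URL
const httpsLinks = sampleLinks.filter((link) => link.href.startsWith('https://'));

// collect just the href strings
const hrefs = httpsLinks.map((link) => link.href);
console.log(hrefs);
```
</code></pre>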
<h4 class="name" id="tables"><span class="type-signature"></span>tables<span class="signature">(index)</span><span class="type-signature"> → {Array.&lt;string>}</span></h4>
<div class="description">
<p>A method to return all tables present in an instance of the JSDOM class.</p>
</div>
<h5>Parameters:</h5>
<table class="params">
<thead>
<tr>
<th>Name</th>
<th>Type</th>
<th class="last">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="name"><code>index</code></td>
<td class="type">
<span class="param-type">number</span>
</td>
<td class="description last"><p>The index of the dom property to perform this method on.
This may be a single number or an array of numbers. If left undefined it will apply to all indexes.</p></td>
</tr>
</tbody>
</table>
<dl class="details">
<dt class="tag-source">Source:</dt>
<dd class="tag-source"><ul class="dummy"><li>
<a href="src_htmlScraper.js.html">src/htmlScraper.js</a>, <a href="src_htmlScraper.js.html#line97">line 97</a>
</li></ul></dd>
</dl>
<h5>Returns:</h5>
<div class="param-desc">
<p>An array of arrays representing the rows and columns of an HTML table.</p>
</div>
<dl>
<dt>
Type
</dt>
<dd>
<span class="param-type">Array.&lt;string></span>
</dd>
</dl>
<h5>Example</h5>
<pre class="prettyprint"><code>```javascript
// create the PageHTML class.
let pHtml = new PageHTML();
// grab the webpage content.
await pHtml.get('https://en.wikipedia.org/wiki/List_of_Formula_One_Grand_Prix_winners');
// close the webpage.
pHtml.close();
// setting index to 0 returns the tables for the 0th JSDOM element in the pHtml.dom property.
// additionally we set the array index to 1 to confirm we only want the second table.
let table = pHtml.tables(0)[1];
console.log(table);
```
returns the second html table from the webpage as an array of arrays.</code></pre>
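<p>A table returned as an array of arrays can be turned into an array of records keyed by column name. The sketch below uses made-up sample rows and assumes the first row of the table holds the column headers, which may not be true for every table on a page.</p>
<pre class="prettyprint"><code>
```javascript
// Made-up sample data in the array-of-arrays shape described above.
// In real use this would come from pHtml.tables(0)[1].
const sampleTable = [
  ['Driver', 'Wins'],
  ['Driver A', '10'],
  ['Driver B', '7'],
];

// assumption: the first row is the header row, the rest are data rows
const [header, ...rows] = sampleTable;

// pair each cell with the header at the same column position
const records = rows.map((row) =>
  Object.fromEntries(header.map((name, i) => [name, row[i]]))
);
console.log(records);
```
</code></pre>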
</article>
</section>
</div>
<nav>
<h2><a href="index.html">Home</a></h2><h3>Classes</h3><ul><li><a href="PageHTML.html">PageHTML</a></li></ul><h3>Global</h3><ul><li><a href="global.html#getElementText">getElementText</a></li><li><a href="global.html#getLinks">getLinks</a></li><li><a href="global.html#makeTable">makeTable</a></li></ul>
</nav>
<br class="clear">
<footer>
Documentation generated by <a href="https://github.com/jsdoc/jsdoc">JSDoc 4.0.4</a> on Wed Feb 05 2025 14:23:55 GMT-0800 (Pacific Standard Time)
</footer>
<script> prettyPrint(); </script>
<script src="scripts/linenumber.js"> </script>
</body>
</html>