![logo](https://github.com/kasha-io/kasha/raw/master/static/kasha.png)

# Kasha

Pre-render your Single-Page Application.

![workflow](https://github.com/kasha-io/kasha/raw/master/static/workflow.png)

## Features

* Prerender the Single-Page Application.
* Automatically collect sitemaps from `<meta>`s.
* Generate `robots.txt` with sitemap directives.
* Sync prerendering.
* Async prerendering with callback URL.
* URL rewriting.
* Works as a proxy server.
* Rich APIs.
* Caching.

## Requirements

* [MongoDB](https://www.mongodb.com/)
* [nsq](http://nsq.io/)

## SPA compatibility adjustments

For the pre-rendered SPA to work correctly in the client-side browser, you need to make a few adjustments:

* When pre-rendering, intercept the anonymous AJAX requests and store the responses in a `<script>` tag, so that the AJAX requests are not sent again on the client side.
  Our AJAX library [teleman](https://github.com/kasha-io/teleman) and [teleman-ssr-cache](https://github.com/kasha-io/teleman-ssr-cache) may help you.
* On the client side, mount the SPA and replace the pre-rendered content.
* Set `<meta>` tags so that search engines can learn more about the page. You can use [set-meta](https://github.com/kasha-io/set-meta).

## Installation

```sh
npm i -g kasha
```

Docker:

```sh
docker pull kasha/kasha
```

## Configuration

See [config.sample.js](config.sample.js)

## Running

### Start the server:

```sh
kasha server --config=/path/to/config.js
```

Docker:

```sh
docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha server --config=/dest/to/config.js
```

### Start the worker:

```sh
kasha worker --config=/path/to/config.js

# async worker
# requests with 'callbackURL' parameter will be dispatched to async workers.
kasha worker --async --config=/path/to/config.js
```

Docker:

```sh
docker run -v /path/to/config.js:/dest/to/config.js kasha/kasha worker [--async] --config=/dest/to/config.js
```

## Site Config

```js
db.sites.insert({
  // The hostname of your site.
  host: 'www.example.com',

  // In proxy mode, if the request doesn't contain 'X-Forwarded-Proto' or 'Forwarded:...proto=...' header,
  // then use 'defaultProtocol'.
  defaultProtocol: 'https',

  // If your site uses REST-style URLs, like /article/123, and the query string isn't necessary to the page,
  // you can remove the query string to improve the cache hit rate:
  // keepQuery: false,

  // You can also keep the required query parameters of some URLs
  keepQuery: [
    [
      '/search', // the first element is the pathname of the URL.
      'type', // starting from the second element, specifies the query names you need to keep.
      'keyword'
    ],

    // another URL and its query names
    ['/product', 'id']
  ],

  // You can use the '/render' API to crawl a hash-based Single-Page Application.
  // For example, you can crawl https://www.example.com/app/#/home via
  // /render?url=https%3A%2F%2Fwww.example.com%2Fapp%2F%23%2Fhome
  // But if this site is not hash-based, you can remove the hash:
  keepHash: false,

  // Rewrites the request URL.
  rewrites: [
    // [from, to]
    // If 'to' is an empty string, the request will be aborted.
    // pattern syntax see https://github.com/jiangfengming/url-router#pattern

    // route all requests to the entry point HTML file
    ['https://www.example.com/(.*)', 'https://static.example.com/index.html'],

    // except robots.txt
    ['https://www.example.com/robots.txt', 'https://static.example.com/robots.txt'],

    // or block it if you do not have one
    // ['https://www.example.com/robots.txt', ''],

    // block google analytics requests
    ['https://www.googletagmanager.com/(.*)', '']
  ],

  // Excludes the pages that don't need pre-rendering.
  excludes: [
    '/your-account/(.*)',
    '/your-orders/(.*)'
  ],

  // But include these pages even though they match the excludes patterns
  includes: [
    'your-account/signin'
  ],

  // Specifies the User-Agent
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',

  // You can create profiles for different device types.
  // A profile can override keepQuery, keepHash, rewrites, excludes, includes, userAgent.
  profiles: {
    desktop: {
      userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Safari/537.36',
      rewrites: [
        [
          'https://www.example.com/(.*)',
          'https://static.example.com/desktop/index.html'
        ]
      ]
    },

    mobile: {
      userAgent: 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/73.0.3683.103 Mobile Safari/537.36',
      rewrites: [
        [
          'https://www.example.com/(.*)',
          'https://static.example.com/mobile/index.html'
        ]
      ]
    }
  },

  // If the profile param of the request isn't set, use this profile
  defaultProfile: 'desktop'
})
```

## APIs

Please confirm that `apiHost` has been set correctly.
For example, if you set `apiHost: '127.0.0.1:3000'`, then only requests addressed to `http(s)://127.0.0.1:3000/*` can access the APIs.
All other hosts are served in proxy mode.

### GET /render

Renders the page.

#### Query string params:

`url`: The encoded URL of the webpage to render.

`profile`: The profile to use.

`type`: Sets the response type. Defaults to `json`.

* `html`: Returns html with header `Content-Type: text/html`.
* `json`: Returns json with header `Content-Type: application/json`.
* `static`: Returns html with header `Content-Type: text/html`, with the `<script>` tags and `on*` event handlers stripped.

`callbackURL`: Don't wait for the result. Once the job is done, `POST` the result to the given URL in `json` format.
If `callbackURL` is set, `type` is ignored.

`metaOnly`: If `type` is `json`, only returns metadata without html content.

`followRedirect`: Follows the redirects if the page returns `301`/`302`.

`refresh`: Forces a refresh of the cache.

`noWait`: Don't wait for the response. It is useful for pre-caching the page.

`fallback`: If no cache is found or the cache has expired, the request is proxied to the origin directly.
If `fallback` is set, `type` must be `html`, and `callbackURL`, `metaOnly`, `followRedirect`, `refresh` and `noWait` cannot be set.

For boolean parameters, an absent param or a value of `0` means `false`.
A value of `1` or an empty value (e.g., `&refresh`, `&refresh=`, `&refresh=1`) means `true`.

Example: `http://localhost:3000/render?url=https%3A%2F%2Fdavidwalsh.name%2Ffacebook-meta-tags`

#### The returned JSON format example:

```json
{
  "url": "https://davidwalsh.name/facebook-meta-tags",
  "profile": "",
  "status": 200,
  "redirect": null,
  "meta": {
    "title": "Facebook Open Graph META Tags",
    "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
    "image": "https://davidwalsh.name/demo/facebook-developers-logo.png",
    "canonicalUrl": "https://davidwalsh.name/facebook-meta-tags",
    "author": "David Walsh",
    "keywords": null
  },
  "openGraph": {
    "og": {
      "locale": {
        "current": "en_US"
      },
      "type": "article",
      "title": "Facebook Open Graph META Tags",
      "description": "Facebook's Open Graph protocol allows for web developers to turn their websites into Facebook \"graph\" objects, allowing a certain level of customization over how information is carried over from a non-Facebook website to Facebook when a page is \"recommended\" and \"liked\".",
      "url": "https://davidwalsh.name/facebook-meta-tags",
      "site_name": "David Walsh Blog",
      "updated_time": "2016-02-23T00:44:54+00:00",
      "image": [
        {
          "url": "https://davidwalsh.name/demo/facebook-developers-logo.png",
          "secure_url": "https://davidwalsh.name/demo/facebook-developers-logo.png"
        },
        {
          "url": "https://davidwalsh.name/demo/david-facebook-share.png",
          "secure_url": "https://davidwalsh.name/demo/david-facebook-share.png"
        }
      ]
    },
    "article": {
      "publisher": "https://www.facebook.com/davidwalshblog",
      "section": "APIs",
      "published_time": "2011-04-25T09:24:28+00:00",
      "modified_time": "2016-02-23T00:44:54+00:00"
    }
  },
  "content": "<!DOCTYPE html><html>...</html>",
  "date": "2018-03-13T09:53:00.921Z"
}
```

### GET /:url

Alias of `/render?url=ENCODED_URL&type=html`.

For example, `http://localhost:3000/https://www.example.com/` is equivalent to
`http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html`

The `profile` param can be set via the `Kasha-Profile` header, and `fallback` via the `Kasha-Fallback` header.

Notice: the `hash` of the URL won't be sent to the server. If you need the `hash` to be sent to the server, use the `/render` API.

### Proxy mode

If the `Host` header of the request is not `apiHost`, or an `X-Forwarded-Host` or `Forwarded:...host=...` header is set,
then the requested URL is treated as the `url` query param of the `/render` API, and `type` is set to `html`.

For example, the following request

```
GET /
Host: www.example.com
Kasha-Profile: mobile
Kasha-Fallback: 1
```

is equivalent to `http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2F&type=html&profile=mobile&fallback=1`

### GET /cache?url=URL

Alias of `/render?url=ENCODED_URL&noWait`

### GET /:site/robots.txt

Get the `robots.txt` file with the sitemaps collected by kasha.

e.g.:

```
http://localhost:3000/https://www.example.com/robots.txt
```

It will fetch the `https://www.example.com/robots.txt` file, then append sitemap directives at the end.

The result example:

```txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

Sitemap: https://www.example.com/sitemaps.index.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.news.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.image.1.xml
Sitemap: https://www.example.com/sitemaps.index.google.video.1.xml
```

### GET /:site/sitemaps.:page.xml

Get [sitemap](https://www.sitemaps.org/protocol.html) of page N.
For example:

```
http://localhost:3000/https://www.example.com/sitemaps.1.xml
```

### GET /:site/sitemaps.google.:page.xml

Get [Google sitemap](https://support.google.com/webmasters/answer/183668) of page N.

### GET /:site/sitemaps.google.news.:page.xml

Get [Google news sitemap](https://support.google.com/webmasters/answer/74288) of page N.

### GET /:site/sitemaps.google.image.:page.xml

Get [Google image sitemap](https://support.google.com/webmasters/answer/178636) of page N.

### GET /:site/sitemaps.google.video.:page.xml

Get [Google video sitemap](https://support.google.com/webmasters/answer/80471) of page N.

### GET /:site/sitemaps.index.:page.xml

Get [sitemap index file](https://www.sitemaps.org/protocol.html#index) of page N.

### GET /:site/sitemaps.index.google.:page.xml

Get Google sitemap index file of page N.

### GET /:site/sitemaps.index.google.news.:page.xml

Get Google news sitemap index file of page N.

### GET /:site/sitemaps.index.google.image.:page.xml

Get Google image sitemap index file of page N.

### GET /:site/sitemaps.index.google.video.:page.xml

Get Google video sitemap index file of page N.

## Collecting sitemap data

kasha can collect sitemap data from custom Open Graph `<meta>` tags. For example:

```html
<head prefix="og: http://ogp.me/ns# sitemap: https://kasha-io.github.io/kasha/ns/sitemap#">
  <!--
    canonical url is used as <loc> tag of sitemap xml.
    <meta property="og:url" content="..."> can be used also.
  -->
  <link rel="canonical" href="https://www.example.com/test.html">

  <meta property="sitemap:changefreq" content="hourly">
  <meta property="sitemap:priority" content="1">

  <meta property="sitemap:news:publication:name" content="The Example Times">
  <meta property="sitemap:news:publication:language" content="en">
  <meta property="sitemap:news:publication_date" content="2018-05-25T09:19:54.000Z">
  <meta property="sitemap:news:title" content="Page Title">

  <meta property="sitemap:image:loc" content="http://examples.opengraphprotocol.us/media/images/train.jpg">
  <meta property="sitemap:image:caption" content="The caption of the image.">
  <meta property="sitemap:image:geo_location" content="Limerick, Ireland">
</head>
```

Sitemap data will be collected only if the `origin` of the canonical URL is the same as that of the current page.

See here for available tags: [sitemap protocol](https://www.sitemaps.org/protocol.html)
and [Google sitemap extensions](https://support.google.com/webmasters/answer/183668)

## License

[MIT](LICENSE)

The logo is made from [Prosymbols](https://www.flaticon.com/authors/prosymbols)'s [camera](https://www.flaticon.com/free-icon/camera_204286) icon, licensed under [Creative Commons BY 3.0](https://creativecommons.org/licenses/by/3.0/).
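
## Example: building `/render` request URLs

The `GET /render` section asks for an encoded `url` param and describes a rule for boolean params. The following is a minimal sketch of both in JavaScript, not code taken from kasha: `buildRenderUrl` and `parseBooleanParam` are hypothetical helper names, and the API host `http://localhost:3000` is borrowed from the examples in this README. This document does not say how kasha treats boolean values other than `0`, `1` or empty, so the sketch simply assumes they mean `false`.

```javascript
// Hypothetical helper (not part of kasha): builds a /render URL for a kasha API host.
function buildRenderUrl(apiBase, pageUrl, options = {}) {
  // The page URL must be percent-encoded, including any '#' fragment.
  const params = [`url=${encodeURIComponent(pageUrl)}`]

  for (const [name, value] of Object.entries(options)) {
    if (value === true) {
      // Boolean true is sent as "name=1"; "&name" and "&name=" are documented as equivalent.
      params.push(`${name}=1`)
    } else if (value !== false && value != null) {
      // String params such as 'type', 'profile' or 'callbackURL'.
      params.push(`${name}=${encodeURIComponent(String(value))}`)
    }
    // false, null and undefined are omitted; an absent param is documented as false.
  }

  return `${apiBase}/render?${params.join('&')}`
}

// Hypothetical helper mirroring the documented boolean rule on the receiving side:
// absent or "0" means false; "1" or an empty value means true.
// Assumption: any other value is treated as false here.
function parseBooleanParam(searchParams, name) {
  if (!searchParams.has(name)) return false
  const value = searchParams.get(name)
  return value === '' || value === '1'
}

// Usage: the hash-based SPA example from the Site Config comments.
console.log(buildRenderUrl('http://localhost:3000', 'https://www.example.com/app/#/home'))
// prints "http://localhost:3000/render?url=https%3A%2F%2Fwww.example.com%2Fapp%2F%23%2Fhome"
```

Using `encodeURIComponent` rather than `encodeURI` matters here, because only the former escapes `:`, `/` and `#`, which would otherwise be read as part of the kasha request URL itself.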