Categorized data on third party entities on the web.

# [Third Party Web](https://www.thirdpartyweb.today/)

## Check out the shiny new web UI https://www.thirdpartyweb.today/

Data on third party entities and their impact on the web.

This document is a summary of which third party scripts are most responsible for excessive JavaScript execution on the web today.

## Table of Contents

1. [Goals](#goals)
1. [Methodology](#methodology)
1. [npm Module](#npm-module)
1. [Updates](#updates)
1. [Data](#data)
    1. [Summary](#summary)
    1. [How to Interpret](#how-to-interpret)
    1. [Third Parties by Category](#by-category)
<%= category_table_of_contents %>
    1. [Third Parties by Total Impact](#by-total-impact)
1. [Future Work](#future-work)
1. [FAQs](#faqs)
1. [Contributing](#contributing)

## Goals

<%= partials.goals %>

## Methodology

<%= partials.methodology %>

## npm Module

The entity classification data is available as an npm module.

```js
const {getEntity} = require('third-party-web')
const entity = getEntity('https://d36mpcpuzc4ztk.cloudfront.net/js/visitor.js')
console.log(entity)
//   {
//     "name": "Freshdesk",
//     "homepage": "https://freshdesk.com/",
//     "category": "customer-success",
//     "domains": ["d36mpcpuzc4ztk.cloudfront.net"]
//   }
```

## Updates

<%= updates_contents %>

## Data

### Summary

Across the top ~4 million sites, ~2700 origins account for ~57% of all script execution time, with the top 50 entities alone accounting for ~47%. Third party script execution makes up the majority of script execution on the web today, and it's important to make informed choices.

### How to Interpret

Each entity has a number of data points available.

1. **Usage (Total Number of Occurrences)** - how many scripts from their origins were included on pages
1. **Total Impact (Total Execution Time)** - how many seconds were spent executing their scripts across the web
1. **Average Impact (Average Execution Time)** - on average, how many milliseconds were spent executing each script
1. **Category** - what type of script is this

<a name="by-category"></a>

### Third Parties by Category

This section breaks down third parties by category. The third parties in each category are ranked from first to last based on the average impact of their scripts. Perhaps the most important comparisons lie here. You always need to pick an analytics provider, but at least you can pick the most well-behaved analytics provider.

#### Overall Breakdown

Unsurprisingly, ads account for the largest identifiable chunk of third party script execution.

![breakdown by category](./by-category.png)

<%= category_contents %>

<a name="by-total-impact"></a>

### Third Parties by Total Impact

This section highlights the entities responsible for the most script execution across the web. This helps inform which improvements would have the largest total impact.

<%= all_data %>

## Future Work

1. Introduce URL-level data for more fine-grained analysis, e.g. which libraries from Cloudflare/Google CDNs are most expensive.
1. Expand the scope, i.e. include more third parties and have greater entity/category coverage.

## FAQs

<%= partials.faqs %>

## Contributing

### Thanks

A **huge** thanks to [@simonhearne](https://twitter.com/simonhearne) and [@soulgalore](https://twitter.com/soulislove) for their assistance in classifying additional domains!

### Updating the Entities

The domain->entity mapping can be found in `data/entities.js`. Adding a new entity is as simple as adding a new array item with the following form.

```js
{
  "name": "Facebook",
  "homepage": "https://www.facebook.com",
  "category": "social",
  "domains": [
    "*.facebook.com",
    "*.fbcdn.net"
  ],
  "examples": [
    "www.facebook.com",
    "connect.facebook.net",
    "staticxx.facebook.com",
    "static.xx.fbcdn.net",
    "m.facebook.com"
  ]
}
```

### Updating Attribution Logic

The logic for attribution to individual script URLs can be found in the [Lighthouse repo](https://github.com/GoogleChrome/lighthouse). File an issue over there to discuss further.

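Before opening a PR that adds an entry to `data/entities.js`, it can help to check that the new object has the shape shown under "Updating the Entities". The sketch below is one way to do that. `validateEntity` is a hypothetical helper, not part of this package, and treating `name`, `homepage`, `category`, and `domains` as required (with `examples` optional) is an assumption drawn from the example entry rather than a documented rule.

```js
// Hypothetical helper, NOT part of third-party-web.
// Returns a list of problems found in a candidate entity entry.
function validateEntity(entity) {
  const problems = []

  // Assumption: name, homepage, and category are required non-empty strings.
  for (const field of ['name', 'homepage', 'category']) {
    if (typeof entity[field] !== 'string' || entity[field].trim() === '') {
      problems.push(`"${field}" must be a non-empty string`)
    }
  }

  // The homepage should at least parse as an absolute URL.
  if (typeof entity.homepage === 'string' && entity.homepage.trim() !== '') {
    try {
      new URL(entity.homepage)
    } catch (err) {
      problems.push('"homepage" must be an absolute URL')
    }
  }

  // Assumption: domains is required, examples is optional;
  // both are arrays of non-empty strings when present.
  for (const field of ['domains', 'examples']) {
    const value = entity[field]
    if (field === 'examples' && value === undefined) continue
    const ok =
      Array.isArray(value) &&
      value.length > 0 &&
      value.every(item => typeof item === 'string' && item.length > 0)
    if (!ok) problems.push(`"${field}" must be a non-empty array of strings`)
  }

  return problems
}

// Usage: an empty array means no problems were found.
const candidate = {
  name: 'Facebook',
  homepage: 'https://www.facebook.com',
  category: 'social',
  domains: ['*.facebook.com', '*.fbcdn.net'],
}
console.log(validateEntity(candidate)) // prints []
```

This only checks the shape of an entry. It does not verify that the category is one the project recognizes or that the domain patterns are correct.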
### Updating the Data

This is now automated! Run `yarn start:update-ha-data` with a `gcp-credentials.json` file in the root directory of this project (look at `bin/automated-update.js` for the steps involved).

### Updating this README

This README is auto-generated from the templates in `lib/` and the computed data. In order to update the charts, you'll need to make sure you have `cairo` installed locally in addition to `yarn install`.

```bash
# Install `cairo` and dependencies for node-canvas
brew install pkg-config cairo pango libpng jpeg giflib
# Build the requirements in this repo
yarn build
# Regenerate the README
yarn start
```

### Updating the website

The web code is located in the `www/` directory of this repository. Open a PR to make changes.