# [Third Party Web](https://www.thirdpartyweb.today/)
## Check out the shiny new web UI https://www.thirdpartyweb.today/
Data on third party entities and their impact on the web.
This document is a summary of which third party scripts are most responsible for excessive JavaScript execution on the web today.
## Table of Contents
1. [Goals](#goals)
1. [Methodology](#methodology)
1. [npm Module](#npm-module)
1. [Updates](#updates)
1. [Data](#data)
1. [Summary](#summary)
1. [How to Interpret](#how-to-interpret)
1. [Third Parties by Category](#by-category)
<%= category_table_of_contents %>
1. [Third Parties by Total Impact](#by-total-impact)
1. [Future Work](#future-work)
1. [FAQs](#faqs)
1. [Contributing](#contributing)
## Goals
<%= partials.goals %>
## Methodology
<%= partials.methodology %>
## npm Module
The entity classification data is available as an npm module.
```js
const {getEntity} = require('third-party-web')
const entity = getEntity('https://d36mpcpuzc4ztk.cloudfront.net/js/visitor.js')
console.log(entity)
// {
//   "name": "Freshdesk",
//   "homepage": "https://freshdesk.com/",
//   "category": "customer-success",
//   "domains": ["d36mpcpuzc4ztk.cloudfront.net"]
// }
```
## Updates
<%= updates_contents %>
## Data
### Summary
Across the top ~4 million sites, ~2700 origins account for ~57% of all script execution time, with the top 50 entities alone accounting for ~47%. Third party scripts make up the majority of script execution on the web today, so it's important to make informed choices.
### How to Interpret
Each entity has a number of data points available.
1. **Usage (Total Number of Occurrences)** - how many scripts from their origins were included on pages
1. **Total Impact (Total Execution Time)** - how many seconds were spent executing their scripts across the web
1. **Average Impact (Average Execution Time)** - on average, how many milliseconds were spent executing each script
1. **Category** - what type of script is this
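
As a rough illustration of how these data points relate, the sketch below derives an average impact from usage and total impact. It assumes the average is simply total execution time divided by the number of occurrences; the record, its field names, and its numbers are hypothetical and not taken from the dataset.

```js
// Hypothetical record for one entity; field names and numbers are
// illustrative only and do not come from the real data.
const entity = {
  name: 'Example Analytics',
  category: 'analytics',
  totalOccurrences: 200000, // Usage: scripts observed across pages
  totalExecutionTime: 30000, // Total Impact, in seconds
}

// Average Impact (ms per script) = Total Impact (s) * 1000 / Usage.
// This relationship is an assumption of the sketch.
function averageImpactMs({totalExecutionTime, totalOccurrences}) {
  if (totalOccurrences === 0) return 0
  return (totalExecutionTime * 1000) / totalOccurrences
}

console.log(averageImpactMs(entity)) // 150
```

A widely used script can have a large total impact even when its average impact is small, which is why both numbers are reported.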
<a name="by-category"></a>
### Third Parties by Category
This section breaks down third parties by category. The third parties in each category are ranked from first to last based on the average impact of their scripts. Perhaps the most important comparisons lie here. You always need to pick an analytics provider, but at least you can pick the most well-behaved analytics provider.
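
The per-category ranking can be sketched as a group-then-sort step. This is only an illustration, not the code that generates this README: the rows and the `averageMs` field are made up, and it assumes a lower average impact ranks earlier.

```js
// Made-up sample rows; names and numbers are placeholders.
const rows = [
  {name: 'Analytics A', category: 'analytics', averageMs: 120},
  {name: 'Ad Network X', category: 'ad', averageMs: 900},
  {name: 'Analytics B', category: 'analytics', averageMs: 45},
  {name: 'Ad Network Y', category: 'ad', averageMs: 310},
]

// Group rows by category, then sort each group by average impact,
// lowest (assumed best-behaved) first.
function rankByCategory(entities) {
  const groups = {}
  for (const entity of entities) {
    if (!groups[entity.category]) groups[entity.category] = []
    groups[entity.category].push(entity)
  }
  for (const list of Object.values(groups)) {
    list.sort((a, b) => a.averageMs - b.averageMs)
  }
  return groups
}

const ranked = rankByCategory(rows)
console.log(ranked.analytics.map(e => e.name)) // [ 'Analytics B', 'Analytics A' ]
```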
#### Overall Breakdown
Unsurprisingly, ads account for the largest identifiable chunk of third party script execution.

<%= category_contents %>
<a name="by-total-impact"></a>
### Third Parties by Total Impact
This section highlights the entities responsible for the most script execution across the web. This helps inform which improvements would have the largest total impact.
<%= all_data %>
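
To see how a handful of entities can dominate total impact, the sketch below sorts a made-up list by total execution time and computes the share held by the top `n`. The rows and the `totalSeconds` field are hypothetical, and the share is relative to the listed rows only, not to all script execution on the web.

```js
// Made-up totals in seconds; not real measurements.
const sample = [
  {name: 'Entity A', totalSeconds: 500},
  {name: 'Entity B', totalSeconds: 300},
  {name: 'Entity C', totalSeconds: 150},
  {name: 'Entity D', totalSeconds: 50},
]

// Fraction of the summed execution time attributable to the n entities
// with the largest totals.
function topShare(entities, n) {
  const sorted = [...entities].sort((a, b) => b.totalSeconds - a.totalSeconds)
  const total = sorted.reduce((sum, e) => sum + e.totalSeconds, 0)
  const top = sorted.slice(0, n).reduce((sum, e) => sum + e.totalSeconds, 0)
  return total === 0 ? 0 : top / total
}

console.log(topShare(sample, 2)) // 0.8
```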
## Future Work
1. Introduce URL-level data for more fine-grained analysis, e.g. which libraries from Cloudflare/Google CDNs are most expensive.
1. Expand the scope, i.e. include more third parties and have greater entity/category coverage.
## FAQs
<%= partials.faqs %>
## Contributing
### Thanks
A **huge** thanks to [@simonhearne](https://twitter.com/simonhearne) and [@soulgalore](https://twitter.com/soulislove) for their assistance in classifying additional domains!
### Updating the Entities
The domain->entity mapping can be found in `data/entities.js`. Adding a new entity is as simple as adding a new array item with the following form.
```js
{
  "name": "Facebook",
  "homepage": "https://www.facebook.com",
  "category": "social",
  "domains": [
    "*.facebook.com",
    "*.fbcdn.net"
  ],
  "examples": [
    "www.facebook.com",
    "connect.facebook.net",
    "staticxx.facebook.com",
    "static.xx.fbcdn.net",
    "m.facebook.com"
  ]
}
```
```
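
Before opening a PR, it can help to sanity-check that each hostname in `examples` is covered by a pattern in `domains`. The sketch below is one way to do that, not the matching logic this project uses: `matchesPattern` and `unmatchedExamples` are hypothetical helpers, the entity is invented, and it assumes a `*.` prefix matches the bare domain as well as any subdomain.

```js
// Assumption: "*.example.com" matches "example.com" and any subdomain;
// a pattern without a wildcard must match the hostname exactly.
function matchesPattern(hostname, pattern) {
  if (pattern.startsWith('*.')) {
    const base = pattern.slice(2)
    return hostname === base || hostname.endsWith('.' + base)
  }
  return hostname === pattern
}

// Return the example hostnames that no domain pattern covers.
function unmatchedExamples(entity) {
  return entity.examples.filter(
    host => !entity.domains.some(pattern => matchesPattern(host, pattern))
  )
}

// Hypothetical entity in the same shape as the entry above.
const candidate = {
  name: 'Example Widgets',
  homepage: 'https://widgets.example',
  category: 'utility',
  domains: ['*.widgets.example', 'cdn.widgets-static.example'],
  examples: ['app.widgets.example', 'cdn.widgets-static.example', 'widgets.test'],
}

console.log(unmatchedExamples(candidate)) // [ 'widgets.test' ]
```

An empty result suggests the listed patterns cover every example under this sketch's matching assumption.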
### Updating Attribution Logic
The logic for attribution to individual script URLs can be found in the [Lighthouse repo](https://github.com/GoogleChrome/lighthouse). File an issue over there to discuss further.
### Updating the Data
This is now automated! Run `yarn start:update-ha-data` with a `gcp-credentials.json` file in the root directory of this project (look at `bin/automated-update.js` for the steps involved).
### Updating this README
This README is auto-generated from the templates in `lib/` and the computed data. To update the charts, you'll need `cairo` installed locally in addition to running `yarn install`.
```bash
# Install `cairo` and dependencies for node-canvas
brew install pkg-config cairo pango libpng jpeg giflib
# Build the requirements in this repo
yarn build
# Regenerate the README
yarn start
```
### Updating the website
The web code is located in the `www/` directory of this repository. Open a PR to make changes.