<html>
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8">
<title>The Open Graph protocol</title>
<meta name="description" content="The Open Graph protocol enables any web page to become a rich object in a social graph.">
<script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
<link rel="stylesheet" href="base.css" type="text/css">
<meta property="og:title" content="Open Graph protocol">
<meta property="og:type" content="website">
<meta property="og:url" content="http://ogp.me/">
<meta property="og:image" content="http://ogp.me/logo.png">
<meta property="og:image:type" content="image/png">
<meta property="og:image:width" content="300">
<meta property="og:image:height" content="300">
<meta property="og:description" content="The Open Graph protocol enables any web page to become a rich object in a social graph.">
<meta property="og:audio" content="http://example.com/bond/theme.mp3" />
<meta property="og:description"
content="Sean Connery found fame and fortune as the suave, sophisticated British agent, James Bond." />
<meta property="og:determiner" content="the" />
<meta property="og:locale" content="en_GB" />
<meta property="og:locale:alternate" content="fr_FR" />
<meta property="og:locale:alternate" content="es_ES" />
<meta property="og:site_name" content="IMDb" />
<meta property="og:video" content="http://example.com/bond/trailer.swf" />
<meta prefix="fb: http://ogp.me/ns/fb#" property="fb:app_id" content="115190258555800">
<link rel="alternate" type="application/rdf+xml" href="http://ogp.me/ns/ogp.me.rdf">
<link rel="alternate" type="text/turtle" href="http://ogp.me/ns/ogp.me.ttl">
</head>
<body>
<div id="body">
<div id="header">
<h1>The Open Graph protocol</h1>
<img alt="Open Graph protocol logo" src="http://ogp.me/logo.png" width="300" height="300">
</div>
<div id="content">
<h2><a id="intro" href="#intro">Introduction</a></h2>
<p>The <a href="http://ogp.me/">Open Graph protocol</a> enables any web page to become a
rich object in a social graph. For instance, this is used on Facebook to allow
any web page to have the same functionality as any other object on Facebook.</p>
<p>While many different technologies and schemas exist and could be combined
together, there isn't a single technology which provides enough information to
richly represent any web page within the social graph. The Open Graph protocol
builds on these existing technologies and gives developers one thing to
implement. Developer simplicity is a key goal of the Open Graph protocol, which
has informed many of
<a href="http://www.scribd.com/doc/30715288/The-Open-Graph-Protocol-Design-Decisions">the technical design decisions</a>.</p>
<hr />
<h2><a id="metadata" href="#metadata">Basic Metadata</a></h2>
<p>To turn your web pages into graph objects, you need to add basic metadata to
your page. We've based the initial version of the protocol on
<a href="http://en.wikipedia.org/wiki/RDFa">RDFa</a> which means that you'll place
additional <code>&lt;meta&gt;</code> tags in the <code>&lt;head&gt;</code> of your web page. The four required
properties for every page are:</p>
<ul>
<li><code>og:title</code> - The title of your object as it should appear within the graph,
e.g., "The Rock".</li>
<li><code>og:type</code> - The <a href="#types">type</a> of your object, e.g., "video.movie". Depending on
the type you specify, other properties may also be required.</li>
<li><code>og:image</code> - An image URL which should represent your object within the
graph.</li>
<li><code>og:url</code> - The canonical URL of your object that will be used as its
permanent ID in the graph, e.g., "http://www.imdb.com/title/tt0117500/".</li>
</ul>
<p>As an example, the following is the Open Graph protocol markup for <a href="http://www.imdb.com/title/tt0117500/">The Rock on
IMDB</a>:</p>
<pre><code>&lt;html prefix="og: http://ogp.me/ns#"&gt;
&lt;head&gt;
&lt;title&gt;The Rock (1996)&lt;/title&gt;
&lt;meta property="og:title" content="The Rock" /&gt;
&lt;meta property="og:type" content="video.movie" /&gt;
&lt;meta property="og:url" content="http://www.imdb.com/title/tt0117500/" /&gt;
&lt;meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /&gt;
...
&lt;/head&gt;
...
&lt;/html&gt;
</code></pre>
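<p>As a rough illustration of how a consumer might check for the four required
properties, here is a minimal Python sketch built on the standard library's
<code>html.parser</code>. The names <code>OpenGraphParser</code>, <code>missing_required</code>,
and <code>SAMPLE</code> are hypothetical and not part of the protocol; a real
consumer would likely need more careful handling of prefixes and malformed markup.</p>

```python
from html.parser import HTMLParser

# The four properties the text above lists as required for every page.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collects (property, content) pairs from <meta property="og:..."> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        if prop and prop.startswith("og:") and content is not None:
            self.pairs.append((prop, content))


def missing_required(html):
    """Return the required property names that do not appear in the markup."""
    parser = OpenGraphParser()
    parser.feed(html)
    present = {prop for prop, _ in parser.pairs}
    return [name for name in REQUIRED if name not in present]


# Hypothetical input: the markup from the example above.
SAMPLE = """
<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>
"""

print(missing_required(SAMPLE))  # prints []
```

<p>The sketch keeps the pairs in document order, since the order of tags matters
for arrays and structured properties.</p>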
<h3><a id="optional" href="#optional">Optional Metadata</a></h3>
<p>The following properties are optional for any object and are generally
recommended:</p>
<ul>
<li><code>og:audio</code> - A URL to an audio file to accompany this object.</li>
<li><code>og:description</code> - A one to two sentence description of your object.</li>
<li><code>og:determiner</code> - The word that appears before this object's title
in a sentence. An <a href="#enum">enum</a> of (a, an, the, "", auto). If <code>auto</code> is
chosen, the consumer of your data should choose between "a" and "an".
Default is "" (blank).</li>
<li><code>og:locale</code> - The locale these tags are marked up in.
Of the format <code>language_TERRITORY</code>. Default is <code>en_US</code>.</li>
<li><code>og:locale:alternate</code> - An <a href="#array">array</a> of other locales this page is
available in.</li>
<li><code>og:site_name</code> - If your object is part of a larger web site, the name which
should be displayed for the overall site, e.g., "IMDb".</li>
<li><code>og:video</code> - A URL to a video file that complements this object.</li>
</ul>
<p>For example (line-break solely for display purposes):</p>
<pre><code>&lt;meta property="og:audio" content="http://example.com/bond/theme.mp3" /&gt;
&lt;meta property="og:description"
content="Sean Connery found fame and fortune as the
suave, sophisticated British agent, James Bond." /&gt;
&lt;meta property="og:determiner" content="the" /&gt;
&lt;meta property="og:locale" content="en_GB" /&gt;
&lt;meta property="og:locale:alternate" content="fr_FR" /&gt;
&lt;meta property="og:locale:alternate" content="es_ES" /&gt;
&lt;meta property="og:site_name" content="IMDb" /&gt;
&lt;meta property="og:video" content="http://example.com/bond/trailer.swf" /&gt;
</code></pre>
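<p>To make the <code>og:determiner</code> rules concrete, the following Python sketch
shows one way a consumer might apply them. The function name
<code>with_determiner</code> is hypothetical, and the vowel check used for
<code>auto</code> is only an assumed heuristic: the protocol leaves the choice between
"a" and "an" to the consumer.</p>

```python
# Values allowed by the og:determiner enum described above.
ALLOWED_DETERMINERS = ("a", "an", "the", "", "auto")


def with_determiner(title, determiner=""):
    """Prefix a title with its determiner; the default is "" (blank)."""
    if determiner not in ALLOWED_DETERMINERS:
        raise ValueError(f"unsupported og:determiner: {determiner!r}")
    if determiner == "auto":
        # Assumption: a simple first-letter check. Real English needs more
        # care (e.g. "an hour", "a university").
        starts_with_vowel = title[:1].lower() in {"a", "e", "i", "o", "u"}
        determiner = "an" if starts_with_vowel else "a"
    return f"{determiner} {title}".strip()


print(with_determiner("Rock", "the"))    # prints: the Rock
print(with_determiner("apple", "auto"))  # prints: an apple
```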
<p>The RDF schema (in <a href="http://en.wikipedia.org/wiki/Turtle_(syntax)">Turtle</a>)
can be found at <a href="http://ogp.me/ns/ogp.me.ttl">ogp.me/ns</a>.</p>
<hr />
<h2><a id="structured" href="#structured">Structured Properties</a></h2>
<p>Some properties can have extra metadata attached to them.
These are specified in the same way as other metadata with <code>property</code> and
<code>content</code>, but the <code>property</code> name will contain an extra <code>:</code>.</p>
<p>The <code>og:image</code> property has some optional structured properties:</p>
<ul>
<li><code>og:image:url</code> - Identical to <code>og:image</code>.</li>
<li><code>og:image:secure_url</code> - An alternate url to use if the webpage requires
HTTPS.</li>
<li><code>og:image:type</code> - A <a href="http://en.wikipedia.org/wiki/Internet_media_type">MIME type</a> for this image.</li>
<li><code>og:image:width</code> - The number of pixels wide.</li>
<li><code>og:image:height</code> - The number of pixels high.</li>
</ul>
<p>A full image example:</p>
<pre><code>&lt;meta property="og:image" content="http://example.com/ogp.jpg" /&gt;
&lt;meta property="og:image:secure_url" content="https://secure.example.com/ogp.jpg" /&gt;
&lt;meta property="og:image:type" content="image/jpeg" /&gt;
&lt;meta property="og:image:width" content="400" /&gt;
&lt;meta property="og:image:height" content="300" /&gt;
</code></pre>
<p>The <code>og:video</code> tag has the same structured properties as <code>og:image</code>. Here is an example:</p>
<pre><code>&lt;meta property="og:video" content="http://example.com/movie.swf" /&gt;
&lt;meta property="og:video:secure_url" content="https://secure.example.com/movie.swf" /&gt;
&lt;meta property="og:video:type" content="application/x-shockwave-flash" /&gt;
&lt;meta property="og:video:width" content="400" /&gt;
&lt;meta property="og:video:height" content="300" /&gt;
</code></pre>
<p>The <code>og:audio</code> tag only has the first 3 properties available
(since size doesn't make sense for sound):</p>
<pre><code>&lt;meta property="og:audio" content="http://example.com/sound.mp3" /&gt;
&lt;meta property="og:audio:secure_url" content="https://secure.example.com/sound.mp3" /&gt;
&lt;meta property="og:audio:type" content="audio/mpeg" /&gt;
</code></pre>
<hr />
<h2><a id="array" href="#array">Arrays</a></h2>
<p>If a tag can have multiple values, just put multiple versions of the same
<code>&lt;meta&gt;</code> tag on your page. The first tag (from top to bottom) is given
preference during conflicts.</p>
<pre><code>&lt;meta property="og:image" content="http://example.com/rock.jpg" /&gt;
&lt;meta property="og:image" content="http://example.com/rock2.jpg" /&gt;
</code></pre>
<p>Put structured properties after you declare their root tag. Whenever
another root element is parsed, that structured property
is considered to be done and another one is started.</p>
<p>For example:</p>
<pre><code>&lt;meta property="og:image" content="http://example.com/rock.jpg" /&gt;
&lt;meta property="og:image:width" content="300" /&gt;
&lt;meta property="og:image:height" content="300" /&gt;
&lt;meta property="og:image" content="http://example.com/rock2.jpg" /&gt;
&lt;meta property="og:image" content="http://example.com/rock3.jpg" /&gt;
&lt;meta property="og:image:height" content="1000" /&gt;
</code></pre>
<p>means there are 3 images on this page: the first image is <code>300x300</code>, the middle
one has unspecified dimensions, and the last one is <code>1000</code>px tall.</p>
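<p>The grouping rule can be sketched in Python as follows. This is only one
possible reading of the rule, under stated assumptions: the input is a list of
(property, content) pairs in document order, the function name
<code>group_structured</code> and the <code>"url"</code> key are hypothetical, and a
structured property that appears before any root tag is silently ignored.</p>

```python
def group_structured(pairs, root="og:image"):
    """Group structured properties under the root tag that precedes them.

    `pairs` is a list of (property, content) tuples in document order.
    """
    objects = []
    current = None  # the object still accepting structured properties
    for prop, content in pairs:
        if prop == root:
            # A new root tag starts a new object.
            current = {"url": content}
            objects.append(current)
        elif prop.startswith(root + ":") and current is not None:
            # e.g. "og:image:width" -> key "width" on the current object.
            current[prop[len(root) + 1:]] = content
        else:
            # Another root element (or an orphaned structured property):
            # the current object is considered done.
            current = None
    return objects


# The pairs from the example above, in document order.
PAIRS = [
    ("og:image", "http://example.com/rock.jpg"),
    ("og:image:width", "300"),
    ("og:image:height", "300"),
    ("og:image", "http://example.com/rock2.jpg"),
    ("og:image", "http://example.com/rock3.jpg"),
    ("og:image:height", "1000"),
]

for image in group_structured(PAIRS):
    print(image)
```

<p>Because the first tag is given preference during conflicts, a consumer that
needs a single image would take the first element of the returned list. The same
function can be called with <code>root="og:video"</code> or <code>root="og:audio"</code>.</p>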
<hr />
<h2><a id="types" href="#types">Object Types</a></h2>
<p>In order for your object to be represented within the graph, you need to
specify its type. This is done using the <code>og:type</code> property:</p>
<pre><code>&lt;meta property="og:type" content="website" /&gt;
</code></pre>
<p>When the community agrees on the schema for a type, it is added to the list
of global types. All other objects in the type system are
<a href="http://en.wikipedia.org/wiki/CURIE">CURIEs</a> of the form</p>
<pre><code>&lt;head prefix="my_namespace: http://example.com/ns#"&gt;
&lt;meta property="og:type" content="my_namespace:my_type" /&gt;
</code></pre>
<p>The global types are grouped into verticals. Each vertical has its
own namespace. The <code>og:type</code> values for a namespace are always prefixed with
the namespace and then a period.
This is to reduce confusion with user-defined namespaced types which always
have colons in them.</p>
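<p>The period-versus-colon convention makes it straightforward to tell the two
kinds of type apart. A minimal Python sketch, assuming a hypothetical helper named
<code>classify_type</code> that returns a (kind, namespace, name) tuple and does not
check whether the type is actually defined:</p>

```python
def classify_type(value):
    """Split an og:type value into (kind, namespace, name).

    User-defined types are CURIEs and contain a colon; global types in a
    vertical contain a period; global types outside a vertical contain neither.
    """
    if ":" in value:
        prefix, name = value.split(":", 1)
        return ("custom", prefix, name)
    if "." in value:
        vertical, name = value.split(".", 1)
        return ("global", vertical, name)
    return ("global", None, value)


print(classify_type("video.movie"))           # ('global', 'video', 'movie')
print(classify_type("my_namespace:my_type"))  # ('custom', 'my_namespace', 'my_type')
print(classify_type("website"))               # ('global', None, 'website')
```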
<h3><a id="type_music" href="#type_music">Music</a></h3>
<ul>
<li>Namespace URI: <a href="http://ogp.me/ns/music"><code>http://ogp.me/ns/music#</code></a></li>
</ul>
<p><code>og:type</code> values:</p>
<p><a name="type_music.song" href="#type_music.song"><code>music.song</code></a></p>
<ul>
<li><code>music:duration</code> - <a href="#integer">integer</a> >=1 - The song's length in seconds.</li>
<li><code>music:album</code> - <a href="#type_music.album">music.album</a> <a href="#array">array</a> -
The album this song is from.</li>
<li><code>music:album:disc</code> - <a href="#integer">integer</a> >=1 -
Which disc of the album this song is on.</li>
<li><code>music:album:track</code> - <a href="#integer">integer</a> >=1 -
Which track this song is.</li>
<li><code>music:musician</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> -
The musician that made this song.</li>
</ul>
<p><a name="type_music.album" href="#type_music.album"><code>music.album</code></a></p>
<ul>
<li><code>music:song</code> - <a href="#type_music.song">music.song</a> - The song on this album.</li>
<li><code>music:song:disc</code> - <a href="#integer">integer</a> >=1 -
The same as <code>music:album:disc</code> but in reverse.</li>
<li><code>music:song:track</code> - <a href="#integer">integer</a> >=1 -
The same as <code>music:album:track</code> but in reverse.</li>
<li><code>music:musician</code> - <a href="#type_profile">profile</a> -
The musician that made this album.</li>
<li><code>music:release_date</code> - <a href="#datetime">datetime</a> -
The date the album was released.</li>
</ul>
<p><a name="type_music.playlist" href="#type_music.playlist"><code>music.playlist</code></a></p>
<ul>
<li><code>music:song</code> - Identical to the ones on <a href="#type_music.album">music.album</a></li>
<li><code>music:song:disc</code></li>
<li><code>music:song:track</code></li>
<li><code>music:creator</code> - <a href="#type_profile">profile</a> - The creator of this playlist.</li>
</ul>
<p><a name="type_music.radio_station" href="#type_music.radio_station"><code>music.radio_station</code></a></p>
<ul>
<li><code>music:creator</code> - <a href="#type_profile">profile</a> - The creator of this station.</li>
</ul>
<h3><a id="type_video" href="#type_video">Video</a></h3>
<ul>
<li>Namespace URI: <a href="http://ogp.me/ns/video"><code>http://ogp.me/ns/video#</code></a></li>
</ul>
<p><code>og:type</code> values:</p>
<p><a name="type_video.movie" href="#type_video.movie"><code>video.movie</code></a></p>
<ul>
<li><code>video:actor</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> -
Actors in the movie.</li>
<li><code>video:actor:role</code> - <a href="#string">string</a> - The role they played.</li>
<li><code>video:director</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> -
Directors of the movie.</li>
<li><code>video:writer</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> -
Writers of the movie.</li>
<li><code>video:duration</code> - <a href="#integer">integer</a> >=1 -
The movie's length in seconds.</li>
<li><code>video:release_date</code> - <a href="#datetime">datetime</a> -
The date the movie was released.</li>
<li><code>video:tag</code> - <a href="#string">string</a> <a href="#array">array</a> -
Tag words associated with this movie.</li>
</ul>
<p><a name="type_video.episode" href="#type_video.episode"><code>video.episode</code></a></p>
<ul>
<li><code>video:actor</code> - Identical to <a href="#type_video.movie">video.movie</a></li>
<li><code>video:actor:role</code></li>
<li><code>video:director</code></li>
<li><code>video:writer</code></li>
<li><code>video:duration</code></li>
<li><code>video:release_date</code></li>
<li><code>video:tag</code></li>
<li><code>video:series</code> - <a href="#type_video.tv_show">video.tv_show</a> -
Which series this episode belongs to.</li>
</ul>
<p><a name="type_video.tv_show" href="#type_video.tv_show"><code>video.tv_show</code></a></p>
<p>A multi-episode TV show.
The metadata is identical to <a href="#type_video.movie">video.movie</a>.</p>
<p><a name="type_video.other" href="#type_video.other"><code>video.other</code></a></p>
<p>A video that doesn't belong in any other category.
The metadata is identical to <a href="#type_video.movie">video.movie</a>.</p>
<h3><a id="no_vertical" href="#no_vertical">No Vertical</a></h3>
<p>These are globally defined objects that just don't fit into a vertical but
yet are broadly used and agreed upon.</p>
<p><code>og:type</code> values:</p>
<p><a name="type_article" href="#type_article"><code>article</code></a> - Namespace URI: <a href="http://ogp.me/ns/article"><code>http://ogp.me/ns/article#</code></a></p>
<ul>
<li><code>article:published_time</code> - <a href="#datetime">datetime</a> -
When the article was first published.</li>
<li><code>article:modified_time</code> - <a href="#datetime">datetime</a> -
When the article was last changed.</li>
<li><code>article:expiration_time</code> - <a href="#datetime">datetime</a> -
When the article becomes out of date.</li>
<li><code>article:author</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> -
Writers of the article.</li>
<li><code>article:section</code> - <a href="#string">string</a> - A high-level section name. E.g. Technology</li>
<li><code>article:tag</code> - <a href="#string">string</a> <a href="#array">array</a> -
Tag words associated with this article.</li>
</ul>
<p><a name="type_book" href="#type_book"><code>book</code></a> - Namespace URI: <a href="http://ogp.me/ns/book"><code>http://ogp.me/ns/book#</code></a></p>
<ul>
<li><code>book:author</code> - <a href="#type_profile">profile</a> <a href="#array">array</a> - Who wrote this book.</li>
<li><code>book:isbn</code> - <a href="#string">string</a> -
The <a href="http://en.wikipedia.org/wiki/International_Standard_Book_Number">ISBN</a></li>
<li><code>book:release_date</code> - <a href="#datetime">datetime</a> - The date the book was released.</li>
<li><code>book:tag</code> - <a href="#string">string</a> <a href="#array">array</a> -
Tag words associated with this book.</li>
</ul>
<p><a name="type_profile" href="#type_profile"><code>profile</code></a> - Namespace URI: <a href="http://ogp.me/ns/profile"><code>http://ogp.me/ns/profile#</code></a></p>
<ul>
<li><code>profile:first_name</code> - <a href="#string">string</a> - A name normally given to an individual by a parent or self-chosen.</li>
<li><code>profile:last_name</code> - <a href="#string">string</a> - A name inherited from a family or marriage and by which the individual is commonly known.</li>
<li><code>profile:username</code> - <a href="#string">string</a> - A short unique string to identify them.</li>
<li><code>profile:gender</code> - <a href="#enum">enum</a>(male, female) - Their gender.</li>
</ul>
<p><a name="type_website" href="#type_website"><code>website</code></a> - Namespace URI: <a href="http://ogp.me/ns/website"><code>http://ogp.me/ns/website#</code></a></p>
<p>No additional properties other than the basic ones.
Any non-marked up webpage should be treated as <code>og:type</code> website.</p>
<hr />
<h2><a id="data_types" href="#data_types">Types</a></h2>
<p>The following types are used when defining attributes in Open Graph protocol.</p>
<table>
<tr>
<th width=150><b>Type</b></th>
<th width=300><b>Description</b></th>
<th width=200><b>Literals</b></th>
</tr>
<tr>
<td><a name="bool" href="#bool">Boolean</a></td>
<td>A Boolean represents a true or false value</td>
<td>true, false, 1, 0</td>
</tr>
<tr>
<td><a name="datetime" href="#datetime">DateTime</a></td>
<td>A DateTime represents a temporal value composed of a date
(year, month, day) and an optional time component (hours, minutes)</td>
<td><a href="http://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a></td>
</tr>
<tr>
<td><a name="enum" href="#enum">Enum</a></td>
<td>A type consisting of a bounded set of constant string values
(enumeration members).</td>
<td>A string value that is a member of the enumeration</td>
</tr>
<tr>
<td><a name="float" href="#float">Float</a></td>
<td>A 64-bit signed floating point number</td>
<td>All literals that conform to the following formats:<br><br>
1.234<br>
-1.234<br>
1.2e3<br>
-1.2e3<br>
7E-10<br>
</td>
</tr>
<tr>
<td><a name="integer" href="#integer">Integer</a></td>
<td>A 32-bit signed integer. In many languages integers over 32-bits become
floats, so we limit Open Graph protocol for easy multi-language use.</td>
<td>All literals that conform to the following formats:<br><br>
1234<br>
-123<br>
</td>
</tr>
<tr>
<td><a name="string" href="#string">String</a></td>
<td>A sequence of Unicode characters</td>
<td>All literals composed of Unicode characters with no escape characters</td>
</tr>
<tr>
<td><a name="url" href="#url">URL</a></td>
<td>A sequence of Unicode characters that identify an Internet resource.</td>
<td>All valid URLs that utilize the http:// or https:// protocols</td>
</tr>
</table>
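<p>The literal formats in the table can be checked with a few small parsers.
The following Python sketch is illustrative only: the function names are
hypothetical, the regular expressions are an assumed reading of the listed
example literals, and <code>datetime.fromisoformat</code> accepts only a subset of
ISO 8601 on older Python versions.</p>

```python
import re
from datetime import datetime

# Bounds of a 32-bit signed integer, as described in the table.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def parse_bool(text):
    """Accept the four Boolean literals: true, false, 1, 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a Boolean literal: {text!r}")


def parse_integer(text):
    """Accept literals like 1234 or -123 that fit in 32 signed bits."""
    if not re.fullmatch(r"-?[0-9]+", text):
        raise ValueError(f"not an Integer literal: {text!r}")
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"out of 32-bit range: {text!r}")
    return value


def parse_float(text):
    """Accept literals like 1.234, -1.2e3 or 7E-10."""
    if not re.fullmatch(r"-?[0-9]+(\.[0-9]+)?([eE]-?[0-9]+)?", text):
        raise ValueError(f"not a Float literal: {text!r}")
    return float(text)


def parse_datetime(text):
    """Parse an ISO 8601 date with an optional time component."""
    return datetime.fromisoformat(text)


print(parse_bool("1"), parse_integer("-123"), parse_float("1.2e3"))
print(parse_datetime("2014-10-20T23:03:05+00:00"))
```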
<hr />
<h2><a id='discuss' href="#discuss">Discussion and support</a></h2>
<p>You can discuss the Open Graph Protocol in
<a href="https://www.facebook.com/groups/opengraph/">the Facebook group</a> or on
<a href="http://groups.google.com/group/open-graph-protocol">the developer mailing list</a>.
It is currently being consumed by Facebook
(<a href="http://developers.facebook.com/docs/opengraph/">see their documentation</a>), Google (<a href="https://developers.google.com/+/web/+1button/#plus-snippet">see their documentation</a>), and
<a href="http://developer.mixi.co.jp/en/connect/mixi_plugin/mixi_check/spec_mixi_check/">mixi</a>.
It is being published by IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes,
TIME, Yelp, and many many others.</p>
<hr />
<h2><a id='implementations' href="#implementations">Implementations</a></h2>
<p>The open source community has developed a number of parsers and publishing
tools. Let the Facebook group know if you've built something awesome too!</p>
<ul>
<li><a href="http://developers.facebook.com/tools/debug/">Facebook Object Debugger</a> -
Facebook's official parser and debugger</li>
<li><a href="http://www.google.com/webmasters/tools/richsnippets">Google Rich Snippets Testing Tool</a> - Open Graph protocol support in specific verticals and Search Engines.</li>
<li><a href="https://github.com/niallkennedy/open-graph-protocol-tools">PHP Validator and Markup Generator</a> - OGP 2011 input validator and markup generator in PHP5 objects</li>
<li><a href="http://github.com/scottmac/opengraph">PHP Consumer</a> -
a small library for accessing of Open Graph Protocol data in PHP</li>
<li><a href="http://buzzword.org.uk/2010/opengraph/#php">OpenGraphNode in PHP</a> -
a simple parser for PHP</li>
<li><a href="http://pypi.python.org/pypi/PyOpenGraph">PyOpenGraph</a> -
a library written in Python for parsing Open Graph protocol
information from web sites</li>
<li><a href="http://github.com/intridea/opengraph">OpenGraph Ruby</a> -
Ruby Gem which parses web pages and extracts Open Graph protocol markup</li>
<li><a href="http://github.com/callumj/opengraph-java">OpenGraph for Java</a> -
small Java class used to represent the Open Graph protocol</li>
<li><a href="http://search.cpan.org/~tobyink/RDF-RDFa-Parser/lib/RDF/RDFa/Parser.pm">RDF::RDFa::Parser</a> -
Perl RDFa parser which understands the Open Graph protocol</li>
<li><a href="http://wordpress.org/plugins/facebook/">WordPress plugin</a> -
Facebook's official WordPress plugin, which adds Open Graph metadata to WordPress powered sites. </li>
<li><a href="http://wordpress.org/plugins/wp-facebook-open-graph-protocol/">Alternate WordPress OGP plugin</a> -
A simple lightweight WordPress plugin which adds Open Graph metadata to WordPress powered sites.</li>
</ul>
</div>
<div id="footer">
<iframe src="//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fogp.me%2F&send=false&layout=standard&width=450&show_faces=true&action=like&colorscheme=light&height=80&appId=115190258555800" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:450px; height:80px;" allowTransparency="true"></iframe>
<p>The Open Graph protocol was originally created at Facebook and is inspired by <a href="http://en.wikipedia.org/wiki/Dublin_Core">Dublin Core</a>, <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">link-rel canonical</a>, <a href="http://microformats.org/">Microformats</a>, and <a href="http://en.wikipedia.org/wiki/RDFa">RDFa</a>. The specification described on this page is available under the <a href="http://openwebfoundation.org/legal/the-0-9-agreements---necessary-claims">Open Web Foundation Agreement, Version 0.9</a>. This website is <a href="https://github.com/facebook/open-graph-protocol">Open Source</a>. Last updated <time pubdate="2014-10-20T23:03:05+00:00">October 20th, 2014</time></p>
</div>
</div>
<script type="text/javascript">
var _sf_async_config={uid:1415,domain:"ogp.me"};
(function(){
function loadChartbeat() {
window._sf_endpt=(new Date()).getTime();
var e = document.createElement('script');
e.setAttribute('language', 'javascript');
e.setAttribute('type', 'text/javascript');
e.setAttribute('src',
(("https:" == document.location.protocol) ? "https://s3.amazonaws.com/" : "http://") +
"static.chartbeat.com/js/chartbeat.js");
document.body.appendChild(e);
}
var oldonload = window.onload;
window.onload = (typeof window.onload != 'function') ?
loadChartbeat : function() { oldonload(); loadChartbeat(); };
})();
</script>
</body>
</html>