A node project that aims to replicate the functionality of the Google Social Graph API

<!doctype html> <!-- _____ __ ___ __ | \| |~-.~-~.-.-~-~.-~-~-~-~.-~-.-.' _| |.-~.~-. | | | | _ | _| | _ | _| || | | |_____/|__|__|___._|__| |__|__|__|___._|__| |__||___ | |_____| dharmafly.com --> <!--[if IE 9]><html class="ie" lang="en"><![endif]--> <!--[if lt IE 9 ]> <html class="ie ltIE9" lang="en"> <![endif]--> <!--[if !IE ]><!--> <html lang="en"> <!--<![endif]--> <head> <meta charset="utf-8"> <meta name="viewport" content=""> <title>Elsewhere</title> <script> var satya = satya || {}; satya.isIPad = navigator.userAgent.match(/iPad/i) != null; if(document.getElementsByName && !satya.isIPad){ document.getElementsByName('viewport')[0] .setAttribute('content','width=device-width,initial-scale=0.75,user-scalable=no,maximum-scale=0.75'); } </script> <!--[if lt IE 9]><script src="/javascript/html5shiv.js"></script><![endif]--> <script type="text/javascript"> var WebFontConfig = { custom: { families: [ 'Open+Sans:400,600:latin', 'Zeyada::latin', 'Ubuntu+Mono::latin' ], urls: [ '/css/fonts/seagrass.css' ] } }; (function() { var wf = document.createElement('script'); wf.src = ('https:' == document.location.protocol ? 
'https' : 'http') + '://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js'; wf.type = 'text/javascript'; wf.async = 'true'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(wf, s); })(); </script> <link href="/css/seagrass.css?v3" rel="stylesheet"> <link rel="shortcut icon" sizes="16x16" type="image/vnd.microsoft.icon" href="/img/seagrass-favicon.ico"> <link rel="apple-touch-icon" href="/img/seagrass-apple-touch-icon.png"> </head> <body class="home loading"> <header> <h1 class="title"> <a href="/"> Elsewhere </a> </h1> </header> <nav id="navigation"> <ul> <li class="show-subnav"><a href="#subnav" class="icon">Go to navigation for this page</a></li> <li> <a href="/" class="active"> Overview </a> </li> <li> <a href="/reference/"> Reference </a> </li> </ul> </nav> <section class="content"> <nav id="subnav"> <ul> <li> <a href="#Overview">Overview</a> </li> <li> <a href="#download">Download</a> </li> <li> <a href="#How-does-it-work"> How does it work? </a> </li> <li> <a href="#Getting-Started"> Getting Started </a> </li> <li> <a href="#Changelog"> Changelog </a> </li> </ul> </nav> <section class="overview"> <h1> <a id="Overview" class="permalink" href="#Overview">&#9875;</a>Overview </h1> <article class="embedded markdown"> <p>&#8202;<span class='project-name'>Elsewhere</span> is a <a href='http://nodejs.org'>Node.js</a> project that aims to replicate part of the functionality of Google&#8217;s now-discontinued <a href='http://ajaxian.com/archives/google-social-graph-api-released'>Social Graph API</a>. When given the URL of a person&#8217;s website or social media profile (e.g. a <a href='https://twitter.com/dharmafly'>Twitter account</a>), it outputs a JSON-formatted list of the other websites and social media profiles that belong to that person.
In other words, it can determine a person&#8217;s <a href='https://en.wikipedia.org/wiki/Social_graph'>&#8216;social graph&#8217;</a> from a single URL in the graph.</p> <p>Elsewhere can be set up as a web service, providing a JSON API that can be easily queried over a network. It can also be included as a Node module and used directly within a server-side project.</p> </article> </section> <aside class="icons"> <ul> <li class="github"><a href="https://github.com/dharmafly/elsewhere" title="Elsewhere on Github">Elsewhere on Github</a></li> <li class="twitter"><a href="https://twitter.com/dharmafly" title="Dharmafly on Twitter"> Dharmafly on Twitter</a></li> <li class="code-javascript" title="This is a JavaScript project"><span>This is a JavaScript project</span></li> </ul> </aside> <section> <h1><a id="download" class="permalink" href="#download">&#9875;</a> Download</h1> <p><a href="https://github.com/dharmafly/elsewhere" target="_blank">Star the project</a> on GitHub, or download it:</p> <p class="buttons"> <a class="badge github" href="https://github.com/dharmafly/elsewhere" title="Elsewhere on GitHub" target="_blank"><span>Elsewhere on GitHub</span></a> <a class="button" href="https://github.com/dharmafly/elsewhere/zipball/master" target="_blank" title="Master branch zip"> Elsewhere <span class="subtext">v0.0.4</span> </a> </p> </section> <section> <h1> <a id="How-does-it-work" class="permalink" href="#How-does-it-work">&#9875;</a> How does it work? </h1> <article class="embedded markdown"> <p>To use Elsewhere, simply provide it with a URL. 
Elsewhere will use this URL as the entry point to the graph and will search the page for links that carry the attribute <a href='http://microformats.org/wiki/rel-me'><code>rel=me</code></a>:</p> <pre><code>&lt;a href=&quot;http://dharmafly.com&quot; rel=&quot;me&quot;&gt;Dharmafly&lt;/a&gt;</code></pre> <p>The <code>rel=me</code> attribute is a microformat asserting that the link points to a website, page or resource that is owned by (or is about) the same person as the page the link is on.</p> <p>For example, a Twitter profile usually contains a link to that person&#8217;s or company&#8217;s webpage, and this link carries the <code>rel=me</code> attribute.</p> <p>When Elsewhere finds one or more <code>rel=me</code> links at a URL, it searches each linked page for more, building a comprehensive graph along the way.</p> <p>For example, a person&#8217;s Twitter profile page may link to their home page, which in turn links to their Last.fm, Flickr, Facebook, GitHub, LinkedIn and Google+ profiles, etc.</p> <p>Elsewhere can only search public profiles and webpages for links. If a page isn&#8217;t public, Elsewhere can&#8217;t search it for links. It&#8217;s also worth noting that profile owners deliberately place these links on their profiles to make them discoverable. If the profile owner has neglected to place a link there, Elsewhere won&#8217;t find one.</p> <p>Once Elsewhere has run out of new <code>rel=me</code> links to search, it returns a list of all the URLs it has found. This list is the &#8216;social graph&#8217; of the person who owns the URL you initially gave Elsewhere.</p> <h2 id='strict_mode_and_verified_links'>Strict Mode and verified links</h2> <p>Elsewhere can make strict checks to verify that each linked URL is indeed owned by the same person as the original site.
After all, anyone could create a website, add a <code>rel=me</code> link to <a href='http://www.elvis.com'>Elvis Presley</a>&#8217;s website and claim to be him.</p> <p>Elsewhere checks whether the linked page itself has a <code>rel=me</code> link back to the original URL. If there is such a reciprocal link, then the relationship is deemed to be &#8216;verified&#8217;.</p> <p>The reciprocal link doesn&#8217;t have to run directly between the two sites. For example, if a Twitter account links to a GitHub account, which links to a home page, which links back to the Twitter account, then the relationship between the Twitter account and home page will be verified, even though the two don&#8217;t directly link to each other.</p> <p>Elsewhere operates in non-strict mode by default, in which it will return both verified and unverified URLs. This mode is useful because many profile pages and personal websites lack <code>rel=me</code> links, which makes links to them hard to verify; returning only verified URLs would cause many legitimate links to be missed.</p> <p>To be absolutely sure of the stated relationships, turn on strict mode (by setting the <code>strict</code> option to <code>true</code>) and only verified URLs will be returned.</p> <h2 id='url_shortners_and_redirects'>URL shorteners and redirects</h2> <p>When Elsewhere follows a link and that link resolves to a different URL, the resolved URL takes precedence over the original.
For instance:</p>
<pre><code>http://github.com/chrisnewtn -&gt; https://github.com/chrisnewtn
http://t.co/vV5BWNxil2 -&gt; http://chrisnewtn.com</code></pre>
<p>The original links to a page are still shown in the graph in that page&#8217;s <code>urlAliases</code> collection, but as far as the rest of the graph is concerned, that page is now known by its resolved URL.</p>
<p>Were URL shorteners and redirects ignored, you&#8217;d end up with a situation where both <code>http://github.com/user</code> and <code>https://github.com/user</code> were in your graph as two separate pages, which is clearly incorrect.</p>
</article> </section> <section> <h1> <a id="Getting-Started" class="permalink" href="#Getting-Started">&#9875;</a> Getting Started </h1> <article class="embedded markdown">
<p>Elsewhere requires <a href='http://nodejs.org'>Node.js</a> to be installed first.</p>
<p>Clone the repo and start the server by running these commands in the terminal:</p>
<pre><code>git clone git@github.com:dharmafly/elsewhere.git
cd elsewhere
npm install
bin/elsewhere</code></pre>
<p>Now head to <a href='http://localhost:8888'><code>localhost:8888</code></a>. You can test the API on this page by entering a URL into the &#8216;url&#8217; box and clicking &#8216;Parse&#8217;. This will render the graph as a list on the page, complete with the title and favicon of each page in the graph.</p>
<p>You can also test the API by appending the target URL to the address in your address bar, like so:</p>
<pre><code>http://localhost:8888/?url=chrisnewtn.com</code></pre>
<p>This will return a JSON version of the graph, e.g.:</p>
<pre><code>{
  results: [
    {
      url: &quot;http://chrisnewtn.com&quot;,
      title: &quot;Chris Newton&quot;,
      favicon: &quot;http://chrisnewtn.com/favicon.ico&quot;,
      outboundLinks: {
        verified: [ ... ],
        unverified: [ ]
      },
      inboundCount: {
        verified: 4,
        unverified: 0
      },
      verified: true,
      urlAliases: [
        &quot;http://t.co/vV5BWNxil2&quot;
      ]
    }
  ],
  warnings: [
    &quot;http error: 404 (Not Found) - http://twitter.com/statuses/user_timeline/chrisnewtn.rss&quot;
  ],
  query: &quot;http://chrisnewtn.com&quot;,
  created: &quot;2012-10-12T16:30:57.270Z&quot;,
  crawled: 9,
  verified: 9
}</code></pre>
<p>The initial crawl will take a while, as each page needs to be visited, checked and cached. Once cached though, it should be pretty snappy.</p>
<p><strong><a href='reference/'>See the API Reference</a></strong> for more details.</p>
</article> </section> <section> <h1> <a id="Changelog" class="permalink" href="#Changelog">&#9875;</a> Changelog </h1> <article class="embedded markdown"> <ul>
<li>v0.0.4 - Changed error handling and surfaced warnings in the graph data. Various other changes.</li>
<li>v0.0.3 - Replace JSDOM with Cheerio. Add domain limiter. Various other changes.</li>
<li>v0.0.2 - Tweaks to the signature of the graph method.
Various internal changes.</li> <li>v0.0.1 - First viable version of</li> </ul> </article> </section> </section> <footer> <div> <p> <em>by</em> <a href="http://dharmafly.com">Dharmafly</a> </p> </div> </footer> <script> satya.narrowScreen = window.screen.width < 480; satya.isltIE10 = false; satya.relative_path = './'; satya.noConflict = 'true' === 'true' || false; </script> <!--[if lt IE 10]> <script> satya.isltIE10 = true; </script> <![endif]--> <!--[if IE 9]> <script src="/javascript/IEshims.js"></script> <![endif]--> <script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.1/jquery.min.js"></script> <script>window.jQuery || document.write('<script src="/javascript/jquery-1.8.1.min.js"><\/script>')</script> <script src="/javascript/main.js"></script> <script src="/javascript/hijs.js"></script> <script src="/javascript/demo.js?v2"></script> <script src="https://github.com/downloads/dharmafly/pablo/pablo.min.js"></script> <script> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-34978047-3']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); }()); </script> </body> </html>