phpjs

php.js offers community-built PHP functions in JavaScript

<!-- Generated by Rakefile:build -->
<strong> Joris van der Wel </strong> on 2009-09-29 16:06:36 <br /> heh :) Well, if a high surrogate is found, the i++; is just there so we do not loop over the low surrogate the next time. It then goes all the way to <pre><code>if (code &gt;= 65536) { // 4 byte</code></pre> to turn it into UTF-8. That's just me accounting for the remote possibility that the specification changes (i.e., charCodeAt returning something bigger than 65535). Funny thing is, I actually wrote my own rawurlencode function before finding this one, and it was nearly identical. <hr />
<strong> <a href="http://brett-zamir.me" rel="nofollow">Brett Zamir</a> </strong> on 2009-09-10 04:12:09 <br /> @Joris: Good catch about the non-BMP code points. It's ironic that you caught me making the mistake, since I was the one who edited the article you cited to point this problem out! :) That's what I get for adapting someone else's pattern without thinking. Anyway, your addition is good, except that it should not assign to &quot;code&quot; but instead append to &quot;ret&quot;, and then either do a &quot;continue&quot; after the &quot;i++&quot; or keep everything in one continuous if/else-if chain (I chose the latter). Also, thanks for catching that each hex value needs at least two characters. Fixed in git. <hr />
<strong> Joris </strong> on 2009-09-09 15:15:51 <br /> This function does not work properly for 4-byte Unicode characters. Browsers use UTF-16 for strings, which means any Unicode character at or above 65536 (U+10000) is split into two surrogate values. So &quot;code &gt;= 65536&quot; is NEVER true. Also, PHP always makes sure a percent-encoded value is composed of two hex digits. Here is a version that URL-encodes the string as if it were really UTF-8: <pre><code>function rawurlencode (str) {
  var hexStr = function (dec) {
    return '%' + (dec &lt; 16 ? '0' : '') + dec.toString(16).toUpperCase();
  };

  var ret = '',
    unreserved = /[\w.~-]/; // A-Za-z0-9_.~-
  str = (str + '').toString();

  for (var i = 0, dl = str.length; i &lt; dl; i++) {
    var ch = str.charAt(i);
    if (unreserved.test(ch)) {
      ret += ch;
    } else {
      var code = str.charCodeAt(i);
      // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters);
      // https://developer.mozilla.org/index.php?title=en/Core_JavaScript_1.5_Reference/Global_Objects/String/charCodeAt&amp;revision=39
      if (0xD800 &lt;= code &amp;&amp; code &lt;= 0xDBFF) {
        // Combine the surrogate pair into a single code point
        code = ((code - 0xD800) * 0x400) + (str.charCodeAt(i + 1) - 0xDC00) + 0x10000;
        i++; // skip the next one
      }
      // We never come across a low surrogate because we skip them
      // Reserved characters are assumed to be in UTF-8, as in PHP
      if (code &lt; 128) { // 1 byte
        ret += hexStr(code);
      } else if (code &gt;= 128 &amp;&amp; code &lt; 2048) { // 2 bytes
        ret += hexStr((code &gt;&gt; 6) | 0xC0);
        ret += hexStr((code &amp; 0x3F) | 0x80);
      } else if (code &gt;= 2048 &amp;&amp; code &lt; 65536) { // 3 bytes
        ret += hexStr((code &gt;&gt; 12) | 0xE0);
        ret += hexStr(((code &gt;&gt; 6) &amp; 0x3F) | 0x80);
        ret += hexStr((code &amp; 0x3F) | 0x80);
      } else if (code &gt;= 65536) { // 4 bytes
        ret += hexStr((code &gt;&gt; 18) | 0xF0);
        ret += hexStr(((code &gt;&gt; 12) &amp; 0x3F) | 0x80);
        ret += hexStr(((code &gt;&gt; 6) &amp; 0x3F) | 0x80);
        ret += hexStr((code &amp; 0x3F) | 0x80);
      }
    }
  }
  return ret;
}</code></pre> Gr. Joris <hr />
<strong> <a href="http://bahai-library.com" rel="nofollow">Brett Zamir</a> </strong> on 2009-06-02 02:46:00 <br /> Even encodeURIComponent differs from rawurlencode. See http://www.devpro.it/examples/php_js_escaping.php <hr />
<strong> <a href="http://www.webfaktory.info/" rel="nofollow">Kankrelune</a> </strong> on 2009-06-02 00:01:39 <br /> escape and rawurlencode do not encode exactly the same set of characters. The escape and unescape functions do not work properly for non-ASCII characters and have been deprecated.
In JavaScript 1.5 and later, use encodeURI or encodeURIComponent. ;o) @ tchaOo° <hr />
<strong> Me </strong> on 2009-06-01 23:15:34 <br /> Isn't this simpler, and doesn't it achieve the same result: <pre><code>escape(str);</code></pre> <hr />
<strong> <a href="http://bahai-library.com" rel="nofollow">Brett Zamir</a> </strong> on 2009-04-21 08:43:35 <br /> Good catch! I'm not sure how that happened, but it is now fixed in SVN. I've actually been meaning to review these functions, as I'm no longer 100% sure that the recent changes to the histogram have all been correct, at least for all functions. <hr />
<strong> Michael Grier </strong> on 2009-04-21 06:09:56 <br /> Leaving spaces unencoded is not the behavior of rawurlencode, or of urlencode for that matter. urlencode and rawurlencode both encode anything that is not &quot;A to Z&quot;, &quot;a to z&quot;, &quot;0 to 9&quot;, &quot;-&quot;, &quot;_&quot; or &quot;.&quot;; the only difference between them is how spaces are encoded: urlencode encodes spaces as &quot;+&quot; and rawurlencode encodes them as &quot;%20&quot;. <hr />
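Several comments above compare rawurlencode with escape and encodeURIComponent, and Michael's comment pins the urlencode/rawurlencode difference on spaces. As a minimal sketch (not the php.js implementation), both PHP functions can be approximated on top of encodeURIComponent, which already emits uppercase two-digit UTF-8 escapes and combines surrogate pairs correctly. This assumes PHP 5.3+ semantics, where rawurlencode follows RFC 3986 and leaves "~" unencoded; older PHP versions encoded "~" too, as Michael's comment implies. The names rawurlencodeSketch and urlencodeSketch are hypothetical.

```javascript
// Hypothetical helpers: a sketch assuming PHP 5.3+ rawurlencode/urlencode semantics.
function rawurlencodeSketch(str) {
  // encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped.
  // PHP's rawurlencode also escapes ! ' ( ) *, so encode those five by hand.
  return encodeURIComponent(String(str)).replace(/[!'()*]/g, function (ch) {
    return '%' + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}

function urlencodeSketch(str) {
  // urlencode differs by writing spaces as '+' and by also escaping '~'.
  return rawurlencodeSketch(str).replace(/%20/g, '+').replace(/~/g, '%7E');
}

console.log(rawurlencodeSketch('a b')); // prints: a%20b
console.log(urlencodeSketch('a b'));    // prints: a+b
```

Note that encodeURIComponent throws a URIError on a lone (unpaired) surrogate, whereas Joris's loop would silently produce bytes for it. For comparison, the deprecated escape() leaves @*_+-./ unescaped and writes characters above U+00FF as %uXXXX (escape('\u20AC') gives '%u20AC'), so it is not a drop-in replacement for either function.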
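Joris's point that "code >= 65536" is never true for a raw charCodeAt value can be checked directly. The sketch below applies the surrogate-pair formula from his code to U+1F600 and compares the result with String.prototype.codePointAt (an ES2015 method that did not exist when this thread was written). combineSurrogates is a hypothetical helper name.

```javascript
// Combine a UTF-16 high/low surrogate pair into one code point,
// using the same arithmetic as the surrogate branch in Joris's code.
// combineSurrogates is a hypothetical helper name.
function combineSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// U+1F600 (grinning face) is stored as two UTF-16 code units.
var s = '\uD83D\uDE00';
var high = s.charCodeAt(0); // 0xD83D: below 65536, like every charCodeAt result
var low = s.charCodeAt(1);  // 0xDE00
var code = combineSurrogates(high, low);

// Only the combined value reaches the 4-byte UTF-8 range.
console.log(s.length, high < 65536, code.toString(16)); // prints: 2 true 1f600
```

Fed through the 4-byte branch, 0x1F600 yields the bytes F0 9F 98 80, i.e. %F0%9F%98%80, which is also what encodeURIComponent produces for the same string.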