phpjs

php.js offers community built php functions in javascript

<!-- Generated by Rakefile:build --> <strong> <a href="http://brett-zamir.me" rel="nofollow">Brett Zamir</a> </strong> on 2010-02-13 08:29:31 <br /> @Bug?: Yes, you are correct. Thanks for the feedback. I have now fixed it in Git: http://github.com/kvz/phpjs/raw/master/functions/strings/str_word_count.js . Note that the new version requires ctype_alpha() (which I also had to update, along with many other functions that depend on RegExp.test()), and that ctype_alpha() in turn depends on setlocale(), because str_word_count() should check letters in a way that can support what other locales consider a word. I also added support for the very rare non-BMP characters and, as in PHP, allowed hyphens in the middle of a word and apostrophes in the middle or at the end (and anywhere if the charlist includes them). <hr /> <strong> Bug? </strong> on 2010-02-03 12:12:59 <br /> The JavaScript function returns 5 words for this string: <pre><code> Lorem ipsum dolor asdf asdf asdf </code></pre> and 6 words for this one: <pre><code> Lorem ipsum dolor asdf asdf asdf. </code></pre> The PHP function returns 6 for both. Cheers, Chris <hr /> <strong> <a href="http://bahai-library.com" rel="nofollow">Brett Zamir</a> </strong> on 2009-06-18 08:03:12 <br /> I believe that to do this correctly, not only for Chinese but for other languages too, we'll need to take a good look at the PHP source code, specifically for PHP 6, since that is where full Unicode support is being added. We might be able to use XRegExp (see http://stevenlevithan.com/regex/xregexp/ ) and its Unicode plug-in (at http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin ) for our preg_ functions and then make str_word_count() depend on it. That won't really help determine what a &quot;word&quot; is (in Chinese, a character is technically only a graphical morpheme, and not necessarily an independent word), but at least it will tell us definitively what a &quot;letter&quot; is. 
Of course, we can just go back to the source to see how PHP interprets a &quot;word&quot;, since we're aiming for that anyway, but again, that will take some work, especially if we want it to work for other languages as well as Chinese. I'm pretty busy for now, but feel free to take a shot at it if you like. FYI, as you can see from your Chinese characters getting mangled, the site is having some problems with Unicode characters at the moment, so if you need to refer to any in the future, maybe you could use entities or JavaScript Unicode escape sequences instead (e.g., \u0020). But I think the issue goes beyond Chinese. Chinese in particular does raise the need to handle characters beyond the Basic Multilingual Plane (BMP), since some Chinese characters fall outside this plane. In JavaScript, such characters must be represented by a pair of UTF-16 code units called surrogates (which are not used outside of such pairs), so we can't rely on the length of the string; see http://phpjs.org/functions/strlen for a solution. <hr /> <strong> Chris </strong> on 2009-06-13 01:18:53 <br /> This function doesn't work quite like PHP's, in that it fails to count each single Chinese character as an entire word, e.g.: ?? hello ? is four words... <hr />
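The surrogate-pair point raised in the thread can be illustrated with a minimal sketch. The helper name countCodePoints is hypothetical and this is not the phpjs strlen() implementation; it only assumes that a high surrogate (0xD800 to 0xDBFF) immediately followed by a low surrogate (0xDC00 to 0xDFFF) should count as one character.

```javascript
// Hypothetical helper for illustration only (not the phpjs strlen code):
// counts characters by treating each valid surrogate pair as one unit.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes a single
    // character beyond the Basic Multilingual Plane.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate so the pair is counted once
      }
    }
    count++;
  }
  return count;
}
```

For example, '\uD842\uDFB7' (the non-BMP character U+20BB7) has a length of 2 in JavaScript, while countCodePoints('\uD842\uDFB7') returns 1. An unpaired surrogate is simply counted as one unit in this sketch.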