Support for storage and retrieval of all Unicode characters in Vanilla?

matt ✭✭
edited August 2012 in Vanilla 2.0 - 2.8

Emoji (絵文字) is the Japanese term for the picture characters or emoticons used in Japanese electronic messages and webpages. They are emoticons, but they are a ratified or standardised set. They've been used in Japan for years and with the likes of iOS their use is on the increase.

The core emoji set as of Unicode 6.0 consists of 722 characters, of which 114 characters map to sequences of one or more characters in the pre-6.0 Unicode standard, and the remaining 608 characters map to sequences of one or more characters introduced in Unicode 6.0. There is no block specifically set aside for emoji – the new symbols were encoded in seven different blocks (some newly created), and there exists a Unicode data file called EmojiSources.txt that includes mappings to and from the Japanese vendors' legacy character sets.

I can't post any in here because Vanilla doesn't get along with the range of Unicode characters required to display them.

GitHub lets people use them through emoticon shortcodes, e.g. :wink:, but I'm not sure that's the best way.

So, that's my request. That we can be given the power.

GitHub issue: https://github.com/vanillaforums/Garden/issues/1405

References:
http://en.wikipedia.org/wiki/Emoji
http://www.emoji-cheat-sheet.com

Comments

  • Note that I would expect that issue to be closed soon, because the core team have better things to do. This could be done as a plugin.

    Short code and image substitution would generally be the preferred way, due to the lack of universal support and lack of standardisation. I would look at the existing smilies plugins for guidance (I think the JS-loaded ones would suit best).

    You could use dynamically loaded apple emoji though
    http://code.kwint.in/emoji/

    That detects the character rather than using a substitution code. I would simply incorporate that in a plugin, or put the source code in the js/custom.js of your theme.

    grep is your friend.
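
A minimal sketch of the short-code-to-image substitution suggested above. It is in Python for brevity (an actual Vanilla plugin would be PHP), and the `EMOJI_IMAGES` table, the image paths, and the function name are all made up for illustration; this is not how any existing smilies plugin is implemented.

```python
import html
import re

# Hypothetical short-code table; names and image paths are illustrative only.
EMOJI_IMAGES = {
    "wink": "/emoji/wink.png",
    "smile": "/emoji/smile.png",
}

# Matches codes such as :wink: or :thumbs_up:
SHORTCODE = re.compile(r":([a-z0-9_+-]+):")


def replace_shortcodes(text):
    """Escape the text for HTML, then swap known :shortcodes: for <img> tags."""
    def to_img(match):
        name = match.group(1)
        src = EMOJI_IMAGES.get(name)
        if src is None:
            return match.group(0)  # unknown code: leave it exactly as typed
        return '<img class="emoji" src="{}" alt=":{}:">'.format(src, name)

    # Escape first so the generated <img> tags are not escaped themselves.
    return SHORTCODE.sub(to_img, html.escape(text))


print(replace_shortcodes("Nice one :wink:"))
```

Unknown codes pass through untouched, so ordinary text that happens to contain colons is not mangled.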

  • x00 MVP
    edited July 2012

    It isn't Vanilla that doesn't get along with the characters, however. You could set it to UTF-16 and it still wouldn't display most of the characters correctly in most browsers. The support on the iPhone is no different from other smilies; it simply knows what to do.

    Client browsers have too broad a range of functions to be sidetracked by something as superficial as this. It is up to client code, or substitution, to define how it works in that case.

    grep is your friend.

  • matt ✭✭

    My issue is that Vanilla is chewing the characters up and not getting them all the way to the final page.

    Safari on Mac OS X 10.7 (and 10.8), and of course on iOS, supports Emoji display. So browsers are already becoming aware enough that this sort of text encoding support needs to be there.

    Anyway, thanks for your opinion. Emoji is pretty big in Japan and the far east. Remember, there is a world outside of our own. ;)

  • Todd Chief Product Officer Vanilla Staff

    You know, I can't seem to figure out exactly what emoji are from those links. The Wikipedia page is mangled in my browser and the emoji cheat sheet is just pictures and English text.

    Maybe the emoji characters are just not available on western browsers?

  • Todd Chief Product Officer Vanilla Staff

    @x00 said:

    You could use dynamically loaded apple emoji though
    http://code.kwint.in/emoji/

    Do you know how these images are licensed? I can't seem to find that on the page.

  • They are UTF-16, not exclusively UTF-8. Regardless of character set, it doesn't mean they know how to render the characters.

    I think you answered your own question: it isn't Vanilla's fault.

    grep is your friend.
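
For reference on the UTF-8 versus UTF-16 point: an emoji is a Unicode code point, and both encodings can represent it. A small Python sketch, using U+1F609 (WINKING FACE) as an arbitrary example, shows that it takes four bytes in UTF-8 and a surrogate pair in UTF-16. Whether a browser can draw it is a separate matter of fonts.

```python
# U+1F609 WINKING FACE, written as an escape so this source stays ASCII.
wink = "\U0001F609"

utf8_bytes = wink.encode("utf-8")
utf16_bytes = wink.encode("utf-16-be")  # big-endian, no byte order mark

print(len(wink))          # 1 (a single code point)
print(utf8_bytes.hex())   # f09f9889 (four bytes)
print(utf16_bytes.hex())  # d83dde09 (a surrogate pair, two 16-bit units)
```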

  • x00 MVP
    edited August 2012

    Todd said:

    @x00 said:

    You could use dynamically loaded apple emoji though
    http://code.kwint.in/emoji/

    Do you know how these images are licensed? I can't seem to find that on the page.

    No idea. This one has a GPL licence:
    http://www.iamcal.com/emoji-in-web-apps/

    grep is your friend.

  • 422 Developer MVP

    @matt I think you meant to click awesome or insightful on @x00's posts, not the lol.

  • matt ✭✭
    edited August 2012

    Great to see there's some discussion about this :)

    @Todd:
    Emoji are "picture letters" where each character or letter is represented by an image - think of them as single character emoticons. The tables on the wiki page should be blank if you do not have the capability of viewing Emoji characters, but it should not be mangled any more than any other Wikipedia page.

    The images used on the page @x00 linked are flattened versions of those in the "Apple Color Emoji" typeface. I'm not sure about its licence, but you can read more about the TTF, introduced with OS X 10.7 Lion a year ago, at http://typographica.org/typeface-reviews/apple-color-emoji/ where the first comment links to a page that shows the whole set and explains how Windows users can install a free TTF called "Symbola" to get (limited) Emoji support while they wait for Windows to gain native support.

    @x00:
    That GPL link is great, and contains some good reasoning for why these should be supported. It was written a few years ago by a real forward thinker, Cal Henderson, one of the founders of Flickr. Interestingly, as of right now, Flickr does not support storage and retrieval of Emoji characters as far as I can see.

    As for the Emoji problem, it has two parts: (1) Browsers, or perhaps more accurately operating systems, need to support the display of the Emoji range of characters. See above re: font installation. Of course, this has nothing to do with Vanilla. (2) Software that deals with text needs to support proper storage and retrieval of the characters used to represent Emoji. Right now Vanilla does not do this, as you can see if you try to make a post containing Emoji. This is what I'm trying to raise as an issue.

    Even Apple are not completely up to speed with Emoji; for example their OSes support Emoji display in Safari and most standard text controls, but some software like their Pages word processor does not yet support storage and retrieval of the Emoji range of characters.

    @422:
    I have marked a number of @x00's posts as insightful or awesome, but the particular post you refer to - well - I just thought it was funny.

  • matt ✭✭
    edited August 2012

    In hindsight, the topic title should really be something more like:
    "Support for storage and retrieval of all Unicode characters in Vanilla?"

    (edit: thanks to the mods for making this so)

    This issue is not specific to Emoji, but Emoji is a good way for Westerners to see the issue, or not, as the case may be ;)

    I'm sure the guys doing the Chinese fork of Vanilla have already dealt with some of the problems involved in supporting storage and retrieval of the full Unicode character set.

  • Todd Chief Product Officer Vanilla Staff

    We store everything in UTF-8, which can store all Unicode characters as far as I know.

  • matt ✭✭
    edited August 2012

    @Todd the db charset needs to be utf8mb4, I think?

    If it is, then the issue is somewhere else in the pipeline from storage to display.

    If you try posting some Emoji characters you will see for yourself that they disappear. By that I mean the characters are no longer in the post. In fact, if you type English followed by Emoji and then English again, the post is truncated at the position the Emoji characters are at.
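
A sketch of why the charset matters here. MySQL's legacy `utf8` charset stores at most three bytes per character, which covers only code points up to U+FFFF; `utf8mb4` stores up to four bytes. Emoji such as U+1F609 sit above U+FFFF, so a three-byte column cannot hold them, which would be consistent with the truncation described above. The helper below is a hypothetical illustration in Python, not Vanilla code: it finds where such a post would be cut.

```python
def first_four_byte_char(text):
    """Return the index of the first code point above U+FFFF, or -1 if none.

    Code points above U+FFFF need four bytes in UTF-8, which is more than
    a three-byte-per-character column can store.
    """
    for index, char in enumerate(text):
        if ord(char) > 0xFFFF:
            return index
    return -1


post = "Hello \U0001F609 world"
cut = first_four_byte_char(post)

print(cut)               # 6
print(repr(post[:cut]))  # 'Hello '
```

A check like this could also be used to warn the user, or to substitute a placeholder, before the text reaches the database.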

  • This is a big change to rush into. I'm not saying it couldn't happen in the future, but the consequences need to be fully understood.

    You can set the character set and collation yourself.

    There are also often issues with switching to multi-byte characters that may need to be sorted out.

    grep is your friend.

  • Todd Chief Product Officer Vanilla Staff

    Completely agree with @x00 here. This is not a change we are going to make lightly, nor anytime soon. The fact of the matter is that encoding support in MySQL and PHP is just not that robust, and switching encodings often leads to mangled results.

    We could most likely make this change on the forums we host, but the skill level of most of the open source community just isn't there to handle a change like this.

  • matt ✭✭
    edited August 2012

    Of course, I'm not pushing for it short-term but rather raising it as an issue for the medium- or long-term.

    The guys doing the Chinese fork of Vanilla will have a lot of knowledge that will be useful when the time comes. Maybe @chuck911 can contribute to this thread?

  • chuck911
    edited August 2012

    @matt it's really funny!
    The GitHub version looks easy to integrate with Vanilla, so I will try to write a plugin.

    Unicode emoji may not be supported by some systems or browsers, so converting them to real images may be a better solution.
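
A rough sketch of the "convert to real images" idea, detecting the characters themselves instead of short codes. It is in Python for brevity, checks only two Unicode blocks (Miscellaneous Symbols and Pictographs, U+1F300 to U+1F5FF, and Emoticons, U+1F600 to U+1F64F), and assumes a made-up naming scheme where each image file is named after the hex code point. A real plugin would need a fuller table.

```python
import html


def emoji_to_img(text, base_url="/emoji/"):
    """Replace characters in two emoji blocks with <img> tags.

    base_url and the <hex code point>.png file naming are assumptions
    made for this example only.
    """
    out = []
    for char in text:
        code = ord(char)
        if 0x1F300 <= code <= 0x1F64F:
            out.append(
                '<img class="emoji" src="{}{:x}.png" alt="{}">'.format(
                    base_url, code, char
                )
            )
        else:
            out.append(html.escape(char))  # everything else is escaped as usual
    return "".join(out)


print(emoji_to_img("ok \U0001F609"))
```

Keeping the original character in the `alt` attribute means copy and paste still yields the emoji on systems that can display it.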

  • matt ✭✭

    @chuck911 any thoughts on adding support for full Unicode to Vanilla, like you guys seem to have done with your Chinese adaptation?

  • A bit off topic, but you guys seem to be the Unicode experts.

    What preg_match statement do you use for Unicode, and can you give an example? I would like to make one of my plugins work with Unicode and I really can't experiment on my computer. Disregard if you think this is too off-topic.

    I may not provide the completed solution you might desire, but I do try to provide honest suggestions to help you solve your issue.
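
On the regular expression question: in PHP the usual starting point is the `u` pattern modifier, which makes `preg_match` treat the pattern and subject as UTF-8, together with property classes such as `\p{L}` for letters. The sketch below shows the same idea in Python rather than PHP, where `\w` is Unicode-aware by default and `re.ASCII` switches that off. The sample text is arbitrary.

```python
import re

text = "héllo 世界 wörld"

# Default (Unicode) matching: \w covers accented letters and CJK characters.
unicode_words = re.findall(r"\w+", text)

# ASCII-only matching: words are split at every non-ASCII character.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['héllo', '世界', 'wörld']
print(ascii_words)    # ['h', 'llo', 'w', 'rld']
```

The ASCII-only result shows the kind of breakage a plugin would produce if its patterns were not Unicode-aware.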

  • @matt integrating Emoji for PHP into Vanilla will get the thing done.

  • @matt the plugin is done!
    It's here: http://vanillaforums.org/addon/emoji-plugin-0.1a

    I just tested it on my iPod touch. If you have an Android phone, you can help to test it; it should work.
