
I wrote the best text formatting library ever, can we make something out of it?

Hello Vanillians,

I'm the author of a text formatting library, s9e\TextFormatter. It's a PHP library that handles various markup via plugins. It supports BBCodes, Markdown, a configurable subset of HTML, emoticons and a bunch of other things. It's fast, configurable and extensible. You can read a more complete description in this document.

I'm looking into Vanilla to see whether it would be a good idea to integrate s9e\TextFormatter, either by creating a plugin for it or possibly as an alternative formatter. From a technical standpoint, I've examined a couple of plugins and I haven't seen anything to make me think it would be impossible, or harder than existing plugins. From a community standpoint though, I have no idea. I've known about Vanilla for a long time but I don't use it and I don't follow your community closely enough to know what users want.

And I guess that's where you can help. Is there a need for a better BBCode engine? A different Markdown implementation? Are there markup-related features that are not getting implemented because the software isn't there?

The benefits of using s9e\TextFormatter in Vanilla would be:

  • Better performance
  • Cost to add new features greatly reduced
  • Less potential for incompatibilities between markup(?) -- The plugins are designed to play well together, you don't see emoticons in the middle of a link or a code block, etc...
    • That only applies to s9e\TextFormatter plugins

The cost would mostly be in development time I think, although there are some downsides. For instance, other Vanilla plugins could not be applied before s9e\TextFormatter because altering the text first could break its parsing. They could be applied afterwards, I guess.

The library works differently from what you usually see, too. Things like NBBC or PHP-Markdown transform a text into HTML as a single step. That's why you have to execute them for every post on every page. This library separates parsing from rendering. You run the parser on the user input and store the result (which is XML.) When the post is displayed, you run the renderer on the XML and you get the HTML. It's efficient both in processing time and memory used, and you can still configure the output to change its style, locale, etc...
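
To make the two-step idea concrete, here is a minimal toy sketch. It is not s9e\TextFormatter's parser or its API: the function names `toyParse` and `toyRender`, the single supported `[b]` tag and the XML layout are all assumptions made for illustration.

```php
<?php
// Toy illustration only: NOT s9e\TextFormatter's parser or API.
// It understands a single BBCode, [b]...[/b], to show the parse-once / render-many split.

/** Step 1, run once at submission: turn user input into XML that keeps the original markup. */
function toyParse(string $text): string
{
    // Escape the user's text so that it is safe to embed in XML.
    $escaped = htmlspecialchars($text, ENT_NOQUOTES);

    // Wrap each [b]...[/b] pair in a <B> element, keeping the raw tags inside <s> and <e>.
    $xml = preg_replace(
        '#\[b\](.*?)\[/b\]#s',
        '<B><s>[b]</s>$1<e>[/b]</e></B>',
        $escaped
    );

    return '<r>' . $xml . '</r>';
}

/** Step 2, run on every page view: a cheap transformation from the stored XML to HTML. */
function toyRender(string $xml): string
{
    // Drop the preserved markup, then map the remaining elements to HTML.
    $withoutMarkup = preg_replace('#<[se]>.*?</[se]>#s', '', $xml);

    return strtr($withoutMarkup, [
        '<r>'  => '',
        '</r>' => '',
        '<B>'  => '<b>',
        '</B>' => '</b>',
    ]);
}

$stored = toyParse('Tom & Jerry are [b]friends[/b]'); // saved to the database once
$html   = toyRender($stored);                         // repeated on each display
```

The expensive part (making sense of the markup) happens in the first function only; the second one never has to look for BBCode again.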

Anyway, that's the quick rundown. If anybody has any questions, feel free. I love to write walls of text.

  • Does it seem like something that the Vanilla community would like to use?
  • Is there a developer who knows Vanilla and would be willing to collaborate? Ideally, I'd like to find someone who'd handle interfacing with Vanilla while I'd handle interfacing with s9e\TextFormatter.

Comments

  • BleistivtBleistivt Moderator
    edited September 2014

    A new formatter is easy to integrate as a plugin (look at the NBBC plugin).

    However, Vanilla doesn't store any kind of pre-rendered text, so if you wanna go that way, your plugin would have to add a new Column to the Comments and Discussions Table and you would have to make sure that the contents are always synchronized.

    I read on your website that the conversion is reversible, so maybe a second column is not even needed.
    For editing or quoting a post, the intermediary XML could be converted "on the fly". This should be possible with the AfterGet event on the CommentModel.

  • LincLinc Detroit Admin

    @Bleistivt said:
    However, Vanilla doesn't store any kind of pre-rendered text

    That's all we store. Currently, Vanilla always renders on the fly.

  • @Linc that's actually what I meant with pre-rendered, as in rendered, before the actual rendering process takes place. :smiley:

    As far as I understand, the speed benefit of @JoshyPHP's formatter comes from the 2-step conversion process, so you would have to store the rendered XML somewhere.

  • LincLinc Detroit Admin

    Eventually, we are going to do that. Our text handling right now is a pain point - trying to hook into a 1-step find-and-replace process isn't fun or a good plan. We're planning to introduce a pipeline for text rendering in the future.

    I'll poke @Todd here in case he wants to add more; that's something being noodled around by him so I don't have a lot of insight into what's being considered beyond that.

  • x00x00 MVP
    edited September 2014

    XML? That makes me doubt it is better performing. Better performing how? First you have to convert into a common format, then convert again into HTML.

    But if you want to edit you have to convert back.

    Don't get me wrong, it has other benefits, like the idea of a universal format.

    Also why care about BBCode? It is not a standard. Markdown is more of a standard, although not officially yet.

    grep is your friend.

  • @Bleistivt Yup, I looked into the NBBC plugin, the Emotify plugin and a few others. Weirdly enough it seems like every plugin has their own preferred event to hook into. You are correct about the XML being reversible, I didn't mention it because the post was getting pretty long. All the metadata needed to do XML->HTML or XML->Text is contained in the XML so that it can be stored in place of the original text in the database, without changing the column type.

    @Linc Sounds like I posted at the right time then? Please feel free to tag me in any discussion on the subject of text rendering. While I'm at it, I've been told I should bug you to get write access to the Developers forum. If you could move this discussion there, that would be awesome too. :D

  • So that it can be stored in place of the original text in the database, without changing the column type

    So what you are saying is you are storing the user markup and the halfway solution.

    Yes, I think this will not work with many plugins because this isn't the expected behaviour or what is expected in the body.

    grep is your friend.

  • LincLinc Detroit Admin

    @JoshyPHP Moved you to Developer role. For the future, joining a community and claiming you have created "the best text formatting library ever" is the best possible way to make everyone immediately skeptical :p I'll have a closer look at it when I have a minute.

  • I don’t necessarily hate the idea of XML-like markup as such. But I have a general disdain for the whole concept and ideology of XML as extensible, especially the number of years the W3C wasted on XML pursuits, making HTML5 about a decade too late and leaving it to be proposed by other people, whilst XML got precisely nowhere.

    If it was me I would use a decent serialisation format, like JSON, and store based on MIME.

    grep is your friend.

  • LincLinc Detroit Admin
    edited September 2014

    @x00 said:
    If it was me I would use a decent serialisation format, like JSON, and store based on MIME.

    As an aside, we intend to move the config to use JSON in the future, possibly 2.5.

  • I hope with a prettify. :)

    grep is your friend.

  • LincLinc Detroit Admin

    @x00 said:
    I hope with a prettify. :)

    Yeah, it would be whitespaced appropriately.

  • @x00 When you use a library like NBBC or PHP-Markdown, they parse and render at the same time. They make sense of the text, then they transform it to HTML, either with a bunch of ad-hoc string replacements with a bit of logic in between, or by constructing an AST and rendering it to HTML. And you have to do that for every post every time it's displayed. In s9e\TextFormatter, the parsing is only performed once, when the text is submitted, so you're only left with the rendering phase at the time you want to display the post. You still have to transform the XML but that's much less complex than parsing custom markup like Markdown or emoticons, etc...

    If you want to get some numbers for yourself there's a script in my Git that you can use with your own input. If you're not running PHP 5.6, you'll need to save the script and run it on the branch that corresponds to your version of PHP.

    @Linc said:
    JoshyPHP Moved you to Developer role. For the future, joining a community and claiming you have created "the best text formatting library ever" is the best possible way to make everyone immediately skeptical :p I'll have a closer look at it when I have a minute.

    Haha, that's where my plan worked perfectly! B) I fully expect anyone to be skeptical about anything related to software. I don't want to convince anyone, I want to give you enough information that you can see for yourself whether you're interested. As long as the title doesn't turn off the kind of people I'm targeting, I guess it's alright. With that said, if you have any advice as to what kind of arguments people like you (leading or developing for big open source projects) are sensitive to, I will definitely take it into consideration.

  • BleistivtBleistivt Moderator
    edited September 2014

    Since we are talking about formatting, something I always found a little ugly:

    Couldn't this part of the format class
    https://github.com/vanilla/vanilla/blob/master/library/core/class.format.php#L268-L302
    ...be moved into a separate BBCode formatter class?

    @x00 said:
    Yes, I think this will not work with many plugins because this isn't the expected behaviour or what is expected in the body.

    You also wouldn't want the xml tags to appear in search.
    A separate column is probably better.

  • @JoshyPHP

    Sure, basically you are caching on entry. The only point of XML is to switch between formats.

    Sure, this is performant in one sense, but it also increases your stored data by more than double, in fact quite a bit more.

    grep is your friend.

  • A separate column is probably better.

    Agreed.

    grep is your friend.

  • JoshyPHPJoshyPHP New
    edited September 2014

    @x00 Sure, a lot of people hate XML for whatever reasons. As far as I'm concerned, I couldn't dream of a better format for storing parsed text and here's why. This is all the code you need to transform the XML back to text:

    return htmlspecialchars_decode(strip_tags($xml), ENT_QUOTES);
    

    If the XML is loaded in a DOM, all you need is to read its textContent to get the same result. If I die, every copy of the library disappears and XML is outlawed by a planet-wide decree, any programmer would only need one look at the data to understand how to get the original text back.
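
As a quick sanity check of the two approaches above, here is a small sketch. The sample XML is hand-written for illustration and its element names are an assumption, not output captured from the library.

```php
<?php
// Hand-written sample of what stored XML might look like (element names are an assumption).
$xml = '<r><B><s>[b]</s>Tom &amp; Jerry<e>[/b]</e></B> say "hi"</r>';

// Approach 1: the one-liner from the post. Strip the tags, then decode the entities.
$viaStripTags = htmlspecialchars_decode(strip_tags($xml), ENT_QUOTES);

// Approach 2: load the XML into a DOM and read the root element's textContent.
$dom = new DOMDocument();
$dom->loadXML($xml);
$viaDom = $dom->documentElement->textContent;
```

Both variables end up holding the text the user typed, markup included, because the raw tags are kept as ordinary text nodes inside the XML.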

    As for the increased size due to metadata, I invite you to run your own experiment on actual data and see that this does not double. Here's a gist containing this post as text, XML, and HTML.

    @Bleistivt If you want to add a column to your database schema, I recommend you do it for search. Because if you index the markup in the original text you still index BBCode names, the names of HTML elements, etc... Better clean up the text, remove all of its markup and index that instead. Incidentally, I have a method that can help with that.
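
A rough sketch of that clean-up for a search column, using only PHP's DOM extension. `plainTextForSearch` is a hypothetical helper written for this example, not the library method mentioned above, and it assumes the raw markup is kept in `<s>` and `<e>` elements.

```php
<?php
// Hypothetical helper, not part of any library: removes the markup itself
// (assumed to live in <s> and <e> elements) and keeps only the readable text.
function plainTextForSearch(string $xml): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    // Snapshot the matches into an array so that removing nodes cannot disturb the iteration.
    $xpath = new DOMXPath($dom);
    foreach (iterator_to_array($xpath->query('//s | //e')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // What is left is the text without any BBCode names or tags.
    return $dom->documentElement->textContent;
}

$indexable = plainTextForSearch('<r>Read <B><s>[b]</s>this<e>[/b]</e></B> first</r>');
```

The result is what you would store in the extra column and feed to the search index, while the body column keeps the full XML.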

  • That there may be a native abstract function that happens to suit your particular implementation of XML isn't itself a defence of XML.

    In fact that is "XML like", extensibility is actually irrelevant to that, schema is unimportant to stripping tags.

    grep is your friend.

  • Good thing I'm not here to defend XML then. I needed a storage format that can be reversed easily and parsed quickly. XML did both (and more.) If another format had been better suited to the task, I would have used that instead.

  • @JoshyPHP good thing I get that and I'm pulling your leg. :p

    grep is your friend.
