Fork me on GitHub
Current releases are 2.1.4 (30 Oct 2014) and 2.0.18.13 (5 Aug 2014)

Ready for 2.1? Find out if your plugins are compatible. 2.0 will no longer be updated after Dec 2014.

Building a vBulletin 3.x to Vanilla 2 importer

LincLinc Community InstigatorDetroit Vanilla Staff
edited December 2009 in Vanilla 2.0 Help
I'm taking a crack at the foundation of a vBulletin to Vanilla importer this week. I'm going to use this discussion as a place to throw ideas and questions out as I move along.

About me: I'd call myself an advanced (not expert) PHP programmer, mostly because I've never had the opportunity to work closely with another PHP programmer. I only know as much as I've read (snore) and figured out as I've gone along my way the last few years. I understand and have written OOP and MVC-based code, but only have gotten that far in the past year or so. I'm operating at the boundaries of my comfort zone, which is a good place to be I suppose :)

I say this mostly as way of disclaimer up front that whatever I do is going to be going through progressive improvements and will be a learning experience for me. I'm not going to poop out a polished importer in 10 days or something.

I have Vanilla 2 running (finally) and have sorta gotten my head around the database schema. I know vBulletin's data structure pretty well as I've been hacking at it for over three years (including writing a custom vBulletin/Wordpress bridge). I'm at the base of a big learning curve on the Garden/Vanilla code with only a modest and incomplete amount of documentation available so far. On the other hand I've wrestled with PDFlib in PHP so I'm not a stranger to this situation, haha. Any suggestions and help will be much appreciated.

[more in a minute - stupid max post size]

Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

DaniPascal
«13

Comments

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited September 2009
    OK, to business.

    I'm breaking the importer up into parts by data type (users, categories, discussions, etc) and making it as modular as possible. I have to make some decisions about which data to bring along and which to cull. For now, I'm focusing on the core of vBulletin. Stuff like Albums I'm just not interested in worrying about presently. I can always add more later. My priorities are:

    • Usergroups / Users

    • Categories / Discussions

    • Private messages -> Conversations (very delicate / awkward, I expect)

    • Subscriptions -> Bookmarks

    Beyond these, I'd also like to move the data for polls, smilies, tags, wall posts, and stats. However, I'm not really sure what format they will take on the Vanilla side of the equation since the functionality for those doesn't exist afaik. It quickly gets into "however I write the importer is going to lock in how any future poll/tagging/etc plugins work" territory.

    Starting with users, I run into this immediately. vBulletin as lots of "extra" userfields like Location, Interests, AIM, etc. and the thing is extensible which means just about everyone has custom fields. I can either start inserting a lot of columns into Garden's user table, or go the Wordpress "usermeta" table route. Given that every import is going to be different, I'm putting my money on the Wordpress method and creating a GDN_UserMeta table in the same style. All the "extra" stuff is going to go there in anticipation of being unlocked by some simple profile addons.

    [/wall-o-text]

    Feedback and smacks upside the head welcome. I'm well-stocked on tea and time; hopefully I will have more to say soon.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • SS ✭✭
    edited September 2009
    > GDN_UserMeta

    In Vanilla 1 all custom fields are stored in User Table as serialized array. I prefer this method instead of new table data.
  • LincLinc Community Instigator Detroit Vanilla Staff
    Unless I'm not reading the documentation correctly, Garden's models depend on single values going into each database field. All the validation, etc. depends on the table's structure. If I start grabbing all the values and serializing them into an ever-growing array, it means you wouldn't be able to use a lot of Garden's functionality when building addons (or creating the importer) which seems self-defeating.

    Am I mistaken? Do you have a reason you'd prefer a serialized array?

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • MarkMark Vanilla Staff
    First of all, thanks for making a go of this!

    Second, take a look at the Vanilla 1 to Vanilla 2 import script. It's kind of broken from a front-end perspective, but it should give you a good idea of how we're tackling that problem for Vanilla. It is in /applications/garden/controllers/import.php

    Third, you can add columns to the user table if you want, but serializing the data is a decent option as well that requires far less effort. There is an "Attributes" field on the user table that takes all of that serialized data. You just need to pull the data already in that table out, unserialize it to an array using Format::Unserialize($Data), then add your values to it, then serialize it again with Format::Serialize($Data), and save it back into the Attributes field for that user.

  • @Lincoln, have a look at http://wiki.github.com/aeden/activewarehouse-etl/documentation as it did wonders for me. It's a little tricky to get working off the bat -- errors are arcane -- but does the job.
  • MarkMark Vanilla Staff
    One big problem you're going to run into is password encryption differences. Vanilla does encryption one way, and vBulletin does it another way (I don't know how vBulletin does it, actually, but I'd be very interested to know). This means that when Vanilla tries to authenticate users with the vBulletin encrypted password, it's going to assume the Vanilla encryption was used and fail to authenticate.

    I have an idea for how to handle this, but it requires understanding how vBulletin encrypts those passwords. So, please pass along that info if you have it, and @Todd or I will help out with that part of the process.
  • MarkMark Vanilla Staff
    @yann - that's really cool! I'm going to have to look into that...
  • @mark - It is but really could do with a nice example. You can do transforms of the type:
    copy :realname, :LastName
    transform (:LastName) { |n,v,r| v.split(/[\s]+/)[1] }
    which does assume that the real name has only two parts separated by some space. Mostly, that will be true. I think you could do something with passwords as well. All that matters is that you can decrypt them and re-encrypt them via ruby.
  • LincLinc Community Instigator Detroit Vanilla Staff
    edited December 2009
    @Mark, yup, I was thinking about that encryption problem.

    vBulletin stores passwords in the database thusly: md5(md5(password) . salt) //EDIT: corrected Dec 8 '09//

    and in cookies thusly: md5(ENCRYPTED_PASS . COOKIE_SALT)

    where ENCRYPTED_PASS is the value stored in the database, and COOKIE_SALT is a unique value to each vBulletin installation (defined in /path/to/vbulletin/includes/functions.php).

    My first thought was it would be necessary to bring along the encrypted password in a new field and add a plugin that transfers the password at each users' first login after the migration.

    //edit: I forgot the COOKIE_SALT is actually just the license number (mine is 7 numbers followed by a single letter). It's in almost every PHP file as a comment on the fourth line in this format:

    || # vBulletin 3.8.1 Patch Level 1 - Licence Number 1111111a

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited September 2009
    @Mark thanks for pointing out the Vanilla importer. Hadn't thought of that (dur).

    You really think the edit process being unserialize->change->serialize is easier than just having a UserMeta table and editing values directly? The developer in me likes that serializing packs it all in one place very efficiently, but the designer in me hates out-of-control arrays that make the database more confusing and adds a step to every edit.

    //edit: I guess this is how I feel about my experience with both so far: vBulletin uses serialized arrays EVERYWHERE and it's a pain (especially for novice devs) to get basic info out of it. You have to unserialize it just to figure out if it's even the info you were looking for. Wordpress uses a key-based UserMeta table and it's always felt super-simple to get info out of it. You can open the table in phpmyadmin and ta-da! it all becomes clear.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • SS ✭✭
    @Lincoln

    > Am I mistaken?

    No, You are correct.

    > Do you have a reason you'd prefer a serialized array?

    Yes, I have two.

    1. Allow ANY "field-value" pair (auto garden's validation model of your fields will be restricted your table structure/columns).

    2. I'm not planning to use this fields in search, sort, etc., only on "profile" page. Faster. No waste data tables (columns).
  • LincLinc Community Instigator Detroit Vanilla Staff
    edited September 2009
    OK, but let's say you have a field for Steam (Valve's gaming platform) names in profiles, and now you want to make a Steam directory of your members. Locking that up in a serialized array makes that query basically impossible, doesn't it? (and I hadn't thought of that until you mentioned using the fields elsewhere - this is a major red flag to me as I already have this sort of thing implemented on my boards)

    Also, I'm not sure we're all talking about the same thing as far as a UserMeta table is concerned. Are you familiar with Wordpress's model? It has 4 columns: unique ID, user ID, metakey, and metavalue. To store something like a Steam name you'd set the user ID, a metakey of 'steam' and then a metavalue of their Steam name. This model prevents wasted columns, and it doesn't create any more duplicated key text than a serialized array would.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • SS ✭✭
    edited September 2009
    Yes, serialized data doesn't allow to operate with it (structured queries, etc,) But like I said that it doesn't need me now. If you need - you are Master of youself.

    >It has 4 columns

    Is this UserMeta concept what you mentioned above?

    If I understood correctly, that will be example.

    Example data for three users (1,2,3).
    UID UserID Key Value
    # 1 steam st1
    # 2 steam st2
    # 3 steam st3


    Is not "steam" duplicated key here?

    IMH. The game is not worth the candle.
  • LincLinc Community Instigator Detroit Vanilla Staff
    edited September 2009
    Yes, the metakey is duplicated, but it'd be duplicated in the serialized array too wouldn't it? :) It's just stuck in the middle of the text field instead of separated into its own field. You could abstract it out to a second table of metakey values, but again that's substituting only marginally better efficiency for more complexity.

    I guess my point is that I'd rather err on the side of the format that will enable the larger number of use cases. "I don't have a use for it yet" doesn't seem like a good rationale for picking a format that others will likely need to use.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • SS ✭✭
    edited September 2009
    Yes, the metakey is duplicated, but it'd be duplicated in the serialized array too wouldn't it? :)

    It does. But no need new data tables for it.
  • MarkMark Vanilla Staff
    If you decide you want to add fields instead of taking advantage of the serialized column, adding fields is super easy to do. For example, Vanilla adds some columns to the user table when it is enabled for the first time. Check out /applications/vanilla/settings/structure.php, and these lines specifically:
    // Add extra columns to user table for tracking discussions & comments
    $Construct->Table('User')
    ->Column('CountDiscussions', 'int', 11, FALSE, '0')
    ->Column('CountUnreadDiscussions', 'int', 11, FALSE, '0')
    ->Column('CountComments', 'int', 11, FALSE, '0')
    ->Column('CountDrafts', 'int', 11, FALSE, '0')
    ->Column('CountBookmarks', 'int', 11, FALSE, '0')
    ->Set();
    Todd has created a new version of the "Column" method (I think it's called Column2 right now), and it's a lot easier to use than the original one. We're going to be switching over to it soon.
  • LincLinc Community Instigator Detroit Vanilla Staff
    edited September 2009
    @Mark, thanks, I think I am indeed going to create a Wordpress-style UserMeta table to handle these fields. I think it's going to be important that they not be serialized going forward and that we not have to append a new column to the User table for each one.

    I'm also going to be importing Attachments and "Smilies" as I think these are two things that most vBulletin communities would be loathe to lose. I plan to follow up the importer with quick Vanilla addons for parsing emoticons and showing existing Comment Attachments. I want to at least enable switchers to not *lose* these things even if I don't immediately create robust enough addons to add and manage them.

    Attachments present a particularly nasty problem because of the way vBulletin stores them (as extension-less hashes in a directory that can exist outside the web root, or as database blobs). I'm going to require that they be moved out of the database before the migration (this is an option in vBulletin) and move and rename every attachment during the migration to overcome this (to a format of /path/to/vanilla/foldername/year/month/actual-filename).

    Lastly, I'm going to create a permanent ImportID column for discussions, users, comments, and attachments to enable a redirect addon. Breaking links would be fatal to my site.

    I've decided to leave polls, tags, and stats outside the scope of my first draft as I really want to hone in on the basics to get something working before I start adding things of more dubious cost:benefit ratios.

    ...and if anyone actually read all that and has suggestions I'm all ears. :)

    tl;dr: UserMeta table = yes, importing attachments (ugh) and emoticons, ignoring everything else non-core for now.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    I'm starting to make serious headway on the learning curve.

    I've managed to import roles, users, and user/role relationships so far.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited October 2009
    I've got user profile/meta data (including custom fields created in vBulletin) and avatars imported now too. Aiming to have a 0.1 version done by the end of the week.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited October 2009
    Also, I'm pretty sure I've figured out how to import not only vBulletin 3.x attachments, but any attachments added in 2.x as well. This will be important for boards that have been around since 2003 or earlier (like mine). (ps: what a mess)

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • ToddTodd Chief Product Officer Vanilla Staff
    @Lincoln this is seriously awesome stuff. I think the user meta table would also make a good general purpose plugin.
  • Very interested in this. Thanks! (And good luck!)
  • LincLinc Community Instigator Detroit Vanilla Staff
    Thanks @Todd :)

    As a note to other import makers (especially ones importing smaller boards), I'm running into timeouts quite a bit because of the size of my board. I can't do single-query imports for things like private messages because their format is so different; I need to process each one with PHP. Doing this for 68,800 private messages while trying to thread them into conversations is a problem.

    If you're doing PHP processing during an import, be sure to chunk it up to make sure that larger boards can use your importer too. There's a BIG difference between importing a 200-user, 5K-discussion board and a 20K-user, 100K-discussion board!

    I've taken to creating "batches". I set a limit to how many it will process at a time, store the ID# it left off with in the config file, and then keep repeating the step (incrementing batch #s so the user knows it's still working), each time starting where it left off, until I run out of records to process. Then it goes to the next step.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • ToddTodd Chief Product Officer Vanilla Staff
    Yeah, that's why the Vanilla 1 import chunks things. Another note is if you put a series of inserts into one insert statement then it is waaay faster.
  • LincLinc Community Instigator Detroit Vanilla Staff
    I haven't abandoned this; I've had to reorder my priorities twice since my last post. I'm probably 2 weeks away from getting back to this. Mah bad :(

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited November 2009
    Or 4 weeks. Close enough? (I was redesigning Icrontic.com and moving it from 3 Wordpress blogs to Wordpress MU).

    Back now.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    @Todd That was an excellent tip on the multiple inserts, by the way. I ran into that big time migrating Wordpress. I'll make sure to build up my queries before sending them; thanks.

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

  • LincLinc Community Instigator Detroit Vanilla Staff
    edited December 2009
    Avatars have been imported as Pictures, and private messages have been imported into Conversations.

    A) vBulletin doesn't thread private messages (well, not until 3.8 or so anyway). The importer threads them with title/contributor matching instead. That creates some significant overhead so it imports in batches of 200 (and keeps the user updated where it is in the process).

    B) [kvetch] vBulletin creates invalid serialized arrays of who private messages are To (by including members' names which have been passed through htmlentities - semicolons, yay!) and instead of telling me that unserialize was failing, the error backtrace told me a foreach loop was failing AND pointed me at the wrong loop. It took over 7 hours of failed imports to untangle this. >:-| Hate vBulletin. [/kvetch]

    What's left: threads/posts and attachments. Then I need to write the password converter so I can actually log into the mess and make sure it works in Vanilla (I'm just eyeballing the database to confirm it's working for now).

    Sr Developer at Vanilla Forums, humble servant of Lord Brackos, & pro smatcher. [GitHub, Twitter]

«13
Sign In or Register to comment.