I'm taking a crack at the foundation of a vBulletin to Vanilla importer this week. I'm going to use this discussion as a place to throw ideas and questions out as I move along.
About me: I'd call myself an advanced (not expert) PHP programmer, mostly because I've never had the opportunity to work closely with another PHP programmer. I only know as much as I've read (snore) and figured out as I've gone along my way the last few years. I understand and have written OOP and MVC-based code, but only have gotten that far in the past year or so. I'm operating at the boundaries of my comfort zone, which is a good place to be I suppose :)
I say this mostly by way of disclaimer up front: whatever I do is going to go through progressive improvements and will be a learning experience for me. I'm not going to poop out a polished importer in 10 days or something.
I have Vanilla 2 running (finally) and have sorta gotten my head around the database schema. I know vBulletin's data structure pretty well as I've been hacking at it for over three years (including writing a custom vBulletin/Wordpress bridge). I'm at the base of a big learning curve on the Garden/Vanilla code with only a modest and incomplete amount of documentation available so far. On the other hand I've wrestled with PDFlib in PHP so I'm not a stranger to this situation, haha. Any suggestions and help will be much appreciated.
[more in a minute - stupid max post size]
OK, to business.
I'm breaking the importer up into parts by data type (users, categories, discussions, etc) and making it as modular as possible. I have to make some decisions about which data to bring along and which to cull. For now, I'm focusing on the core of vBulletin. Stuff like Albums I'm just not interested in worrying about presently. I can always add more later. My priorities are:
Unless I'm not reading the documentation correctly, Garden's models depend on single values going into each database field. All the validation, etc. depends on the table's structure. If I start grabbing all the values and serializing them into an ever-growing array, it means you wouldn't be able to use a lot of Garden's functionality when building addons (or creating the importer) which seems self-defeating.
Am I mistaken? Do you have a reason you'd prefer a serialized array?
First of all, thanks for making a go of this!
Second, take a look at the Vanilla 1 to Vanilla 2 import script. It's kind of broken from a front-end perspective, but it should give you a good idea of how we're tackling that problem for Vanilla. It is in /applications/garden/controllers/import.php
Third, you can add columns to the user table if you want, but serializing the data is a decent option as well that requires far less effort. There is an "Attributes" field on the user table that takes all of that serialized data. You just need to pull the data already in that table out, unserialize it to an array using Format::Unserialize($Data), then add your values to it, then serialize it again with Format::Serialize($Data), and save it back into the Attributes field for that user.
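For anyone following along, the read/merge/write cycle described above might look roughly like this. It's only a sketch: PHP's built-in serialize()/unserialize() stand in for Garden's Format::Serialize()/Format::Unserialize(), and the function name addUserAttributes is made up for illustration.

```php
<?php
// Sketch only: native serialize()/unserialize() stand in for Garden's
// Format::Serialize()/Format::Unserialize(). addUserAttributes() is a
// hypothetical helper, not part of Garden.
function addUserAttributes(?string $storedAttributes, array $newValues): string
{
    $attributes = [];
    if ($storedAttributes !== null && $storedAttributes !== '') {
        // Pull the existing data out of the Attributes field.
        $decoded = unserialize($storedAttributes, ['allowed_classes' => false]);
        if (is_array($decoded)) {
            $attributes = $decoded;
        }
    }
    // Add the new values, then serialize again for saving.
    return serialize(array_merge($attributes, $newValues));
}

$existing = serialize(['Skype' => 'lincoln']);
$updated = addUserAttributes($existing, ['Steam' => 'st1']);
```

The string in $updated is what would be saved back into the Attributes field for that user.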
One big problem you're going to run into is password encryption differences. Vanilla does encryption one way, and vBulletin does it another way (I don't know how vBulletin does it, actually, but I'd be very interested to know). This means that when Vanilla tries to authenticate users with the vBulletin encrypted password, it's going to assume the Vanilla encryption was used and fail to authenticate.
I have an idea for how to handle this, but it requires understanding how vBulletin encrypts those passwords. So, please pass along that info if you have it, and @Todd or I will help out with that part of the process.
@mark - It is but really could do with a nice example. You can do transforms of the type:
copy :realname, :LastName
transform (:LastName) { |n,v,r| v.split(/[\s]+/)[1] }
which does assume that the real name has only two parts separated by some space. Mostly, that will be true. I think you could do something with passwords as well. All that matters is that you can decrypt them and re-encrypt them via ruby.
@Mark, yup, I was thinking about that encryption problem.
vBulletin stores passwords in the database thusly: md5(md5(password) . salt) //EDIT: corrected Dec 8 '09//
and in cookies thusly: md5(ENCRYPTED_PASS . COOKIE_SALT)
where ENCRYPTED_PASS is the value stored in the database, and COOKIE_SALT is a unique value to each vBulletin installation (defined in /path/to/vbulletin/includes/functions.php).
My first thought was it would be necessary to bring along the encrypted password in a new field and add a plugin that transfers the password at each user's first login after the migration.
//edit: I forgot the COOKIE_SALT is actually just the license number (mine is 7 numbers followed by a single letter). It's in almost every PHP file as a comment on the fourth line in this format:
|| # vBulletin 3.8.1 Patch Level 1 - Licence Number 1111111a
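To make the two formulas concrete, here is a rough sketch of the database hash, the cookie hash, and the "convert at first login" idea. The function names are hypothetical, and password_hash() is only a stand-in for whatever hashing Vanilla actually applies; I'm assuming the imported row carries both the old hash and its salt.

```php
<?php
// Database format described above: md5(md5(password) . salt)
function vbPasswordHash(string $password, string $salt): string
{
    return md5(md5($password) . $salt);
}

// Cookie format described above: md5(ENCRYPTED_PASS . COOKIE_SALT)
function vbCookieHash(string $storedHash, string $cookieSalt): string
{
    return md5($storedHash . $cookieSalt);
}

// Hypothetical first-login step: if the typed password matches the imported
// vBulletin hash, return a fresh hash to store instead; otherwise null.
// password_hash() is a stand-in, not Vanilla's own scheme.
function migratePasswordOnLogin(string $password, string $legacyHash, string $salt): ?string
{
    if (!hash_equals($legacyHash, vbPasswordHash($password, $salt))) {
        return null;
    }
    return password_hash($password, PASSWORD_DEFAULT);
}

$legacyHash = vbPasswordHash('secret', 'abc');
$newHash = migratePasswordOnLogin('secret', $legacyHash, 'abc');
```

Once the new hash is saved, the imported vBulletin hash and salt for that user can be dropped.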
@Mark thanks for pointing out the Vanilla importer. Hadn't thought of that (dur).
You really think the edit process being unserialize->change->serialize is easier than just having a UserMeta table and editing values directly? The developer in me likes that serializing packs it all in one place very efficiently, but the designer in me hates out-of-control arrays that make the database more confusing and add a step to every edit.
//edit: I guess this is how I feel about my experience with both so far: vBulletin uses serialized arrays EVERYWHERE and it's a pain (especially for novice devs) to get basic info out of it. You have to unserialize it just to figure out if it's even the info you were looking for. Wordpress uses a key-based UserMeta table and it's always felt super-simple to get info out of it. You can open the table in phpmyadmin and ta-da! it all becomes clear.
@Lincoln
> Am I mistaken?
No, you are correct.
> Do you have a reason you'd prefer a serialized array?
Yes, I have two.
1. It allows ANY "field-value" pair (Garden's automatic validation model would restrict your fields to your table structure/columns).
2. I'm not planning to use these fields in search, sort, etc., only on the "profile" page. It's faster, and there are no wasted tables (columns).
OK, but let's say you have a field for Steam (Valve's gaming platform) names in profiles, and now you want to make a Steam directory of your members. Locking that up in a serialized array makes that query basically impossible, doesn't it? (and I hadn't thought of that until you mentioned using the fields elsewhere - this is a major red flag to me as I already have this sort of thing implemented on my boards)
Also, I'm not sure we're all talking about the same thing as far as a UserMeta table is concerned. Are you familiar with Wordpress's model? It has 4 columns: unique ID, user ID, metakey, and metavalue. To store something like a Steam name you'd set the user ID, a metakey of 'steam' and then a metavalue of their Steam name. This model prevents wasted columns, and it doesn't create any more duplicated key text than a serialized array would.
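Here's a small sketch of why that layout makes the Steam directory easy. The rows are plain PHP arrays standing in for the table, and the column names (MetaID, UserID, MetaKey, MetaValue) are my guesses at naming, not an existing schema.

```php
<?php
// Rows shaped like the four-column table: unique ID, user ID, key, value.
// Column names are assumptions for illustration.
$userMeta = [
    ['MetaID' => 1, 'UserID' => 1, 'MetaKey' => 'steam', 'MetaValue' => 'st1'],
    ['MetaID' => 2, 'UserID' => 2, 'MetaKey' => 'steam', 'MetaValue' => 'st2'],
    ['MetaID' => 3, 'UserID' => 2, 'MetaKey' => 'skype', 'MetaValue' => 'sk2'],
];

// In-memory equivalent of:
//   SELECT UserID, MetaValue FROM UserMeta WHERE MetaKey = 'steam'
function metaDirectory(array $rows, string $key): array
{
    $directory = [];
    foreach ($rows as $row) {
        if ($row['MetaKey'] === $key) {
            $directory[$row['UserID']] = $row['MetaValue'];
        }
    }
    return $directory;
}

$steamDirectory = metaDirectory($userMeta, 'steam');
```

With a serialized column, the same lookup would mean unserializing every user's data in PHP before you could filter on it.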
Yes, serialized data doesn't allow you to operate on it (structured queries, etc.). But like I said, I don't need that now. If you need it, you are your own master.
>It has 4 columns
Is this UserMeta concept what you mentioned above?
If I understood correctly, this would be an example.
Example data for three users (1,2,3).
UID UserID Key Value
# 1 steam st1
# 2 steam st2
# 3 steam st3
Isn't "steam" a duplicated key here?
IMHO, the game is not worth the candle.
Yes, the metakey is duplicated, but it'd be duplicated in the serialized array too wouldn't it? :) It's just stuck in the middle of the text field instead of separated into its own field. You could abstract it out to a second table of metakey values, but again that's substituting only marginally better efficiency for more complexity.
I guess my point is that I'd rather err on the side of the format that will enable the larger number of use cases. "I don't have a use for it yet" doesn't seem like a good rationale for picking a format that others will likely need to use.
If you decide you want to add fields instead of taking advantage of the serialized column, adding fields is super easy to do. For example, Vanilla adds some columns to the user table when it is enabled for the first time. Check out /applications/vanilla/settings/structure.php, and these lines specifically:
// Add extra columns to user table for tracking discussions & comments
$Construct->Table('User')
   ->Column('CountDiscussions', 'int', 11, FALSE, '0')
   ->Column('CountUnreadDiscussions', 'int', 11, FALSE, '0')
   ->Column('CountComments', 'int', 11, FALSE, '0')
   ->Column('CountDrafts', 'int', 11, FALSE, '0')
   ->Column('CountBookmarks', 'int', 11, FALSE, '0')
   ->Set();
@Mark, thanks, I think I am indeed going to create a Wordpress-style UserMeta table to handle these fields. I think it's going to be important that they not be serialized going forward and that we not have to append a new column to the User table for each one.
I'm also going to be importing Attachments and "Smilies" as I think these are two things that most vBulletin communities would be loath to lose. I plan to follow up the importer with quick Vanilla addons for parsing emoticons and showing existing Comment Attachments. I want to at least enable switchers to not *lose* these things even if I don't immediately create robust enough addons to add and manage them.
Attachments present a particularly nasty problem because of the way vBulletin stores them (as extension-less hashes in a directory that can exist outside the web root, or as database blobs). I'm going to require that they be moved out of the database before the migration (this is an option in vBulletin) and move and rename every attachment during the migration to overcome this (to a format of /path/to/vanilla/foldername/year/month/actual-filename).
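A rough sketch of building the target path in that year/month/filename format. The function name and arguments are hypothetical; it assumes the upload timestamp and original filename are available for each attachment, and it only computes the destination (the actual copy/rename is left out).

```php
<?php
// Hypothetical helper: builds /base/year/month/actual-filename from an
// attachment's upload timestamp and its original filename.
function attachmentTargetPath(string $baseDir, int $uploadedAt, string $originalName): string
{
    // Keep only the final path component so a stored name cannot escape $baseDir.
    $safeName = basename(str_replace('\\', '/', $originalName));

    return sprintf(
        '%s/%s/%s/%s',
        rtrim($baseDir, '/'),
        gmdate('Y', $uploadedAt),
        gmdate('m', $uploadedAt),
        $safeName
    );
}

$path = attachmentTargetPath(
    '/path/to/vanilla/foldername',
    gmmktime(0, 0, 0, 12, 8, 2009),
    'photo.jpg'
);
// $path is '/path/to/vanilla/foldername/2009/12/photo.jpg'
```

Two attachments with the same filename in the same month would collide in this scheme, so a real importer would need some way to disambiguate them.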
Lastly, I'm going to create a permanent ImportID column for discussions, users, comments, and attachments to enable a redirect addon. Breaking links would be fatal to my site.
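To show what the ImportID column buys you, here's a sketch of the lookup a redirect addon could do. Everything here is assumed for illustration: the function name, the lookup array (which would really be a query on the ImportID column), and the /discussion/ID target format.

```php
<?php
// Hypothetical redirect lookup. $discussionIdByImportId stands in for a
// query like "find the discussion whose ImportID equals the old thread ID".
function redirectTargetForThread(array $query, array $discussionIdByImportId): ?string
{
    // Old vBulletin thread links look like showthread.php?t=123
    $threadId = isset($query['t']) ? (int) $query['t'] : 0;

    if (!isset($discussionIdByImportId[$threadId])) {
        return null; // Unknown thread: let the caller show a 404.
    }

    // Assumed target URL format.
    return '/discussion/' . $discussionIdByImportId[$threadId];
}

$map = [123 => 456];
$target = redirectTargetForThread(['t' => '123'], $map);
// $target is '/discussion/456'
```

The same pattern would apply to users, comments, and attachments, each with its own ImportID lookup.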
I've decided to leave polls, tags, and stats outside the scope of my first draft as I really want to home in on the basics to get something working before I start adding things of more dubious cost:benefit ratios.
...and if anyone actually read all that and has suggestions I'm all ears. :)
tl;dr: UserMeta table = yes, importing attachments (ugh) and emoticons, ignoring everything else non-core for now.
Thanks @Todd :)
As a note to other import makers (especially ones importing smaller boards), I'm running into timeouts quite a bit because of the size of my board. I can't do single-query imports for things like private messages because their format is so different; I need to process each one with PHP. Doing this for 68,800 private messages while trying to thread them into conversations is a problem.
If you're doing PHP processing during an import, be sure to chunk it up to make sure that larger boards can use your importer too. There's a BIG difference between importing a 200-user, 5K-discussion board and a 20K-user, 100K-discussion board!
I've taken to creating "batches". I set a limit to how many it will process at a time, store the ID# it left off with in the config file, and then keep repeating the step (incrementing batch #s so the user knows it's still working), each time starting where it left off, until I run out of records to process. Then it goes to the next step.
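The batching idea can be sketched like this. It's not the importer's actual code: the rows are an in-memory array standing in for a table, runBatch is a made-up name, and the while loop stands in for repeated page loads where the last ID would be saved to the config file between requests.

```php
<?php
// One batch: handle up to $limit rows with an id greater than $lastId.
// Stand-in for: SELECT ... WHERE id > :lastId ORDER BY id LIMIT :limit
function runBatch(array $rows, int $lastId, int $limit, callable $handler): array
{
    $pending = array_filter($rows, fn($row) => $row['id'] > $lastId);
    usort($pending, fn($a, $b) => $a['id'] <=> $b['id']);
    $batch = array_slice($pending, 0, $limit);

    foreach ($batch as $row) {
        $handler($row);
        $lastId = $row['id']; // Remember where we left off.
    }

    // A short batch means there is nothing left for this step.
    return ['lastId' => $lastId, 'done' => count($batch) < $limit];
}

$rows = array_map(fn($id) => ['id' => $id], range(1, 5));
$state = ['lastId' => 0, 'done' => false];
$processed = [];
$batchNumber = 0;

while (!$state['done']) {
    $state = runBatch($rows, $state['lastId'], 2, function (array $row) use (&$processed) {
        $processed[] = $row['id'];
    });
    $batchNumber++; // Shown to the user so they know it is still working.
}
```

With five rows and a limit of two, this takes three batches and leaves lastId at 5.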
Avatars have been imported as Pictures, and private messages have been imported into Conversations.
A) vBulletin doesn't thread private messages (well, not until 3.8 or so anyway). The importer threads them with title/contributor matching instead. That creates some significant overhead so it imports in batches of 200 (and keeps the user updated where it is in the process).
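One way title/contributor matching could work, as a sketch: strip reply prefixes from the title, sort the participant IDs, and group messages that share both. This is a guess at the approach for illustration, not the importer's code, and the message fields (id, title, from, to) are assumed.

```php
<?php
// Build a grouping key from the normalized title plus the sorted
// participant IDs. Hypothetical helper.
function conversationKey(string $title, array $participantIds): string
{
    // Drop any leading "Re:" / "Fw:" / "Fwd:" prefixes.
    $normalized = preg_replace('/^(\s*(re|fw|fwd)\s*:\s*)+/i', '', $title);
    $normalized = strtolower(trim($normalized));

    $ids = array_unique(array_map('intval', $participantIds));
    sort($ids);

    return $normalized . '|' . implode(',', $ids);
}

// Group message IDs into conversations by that key.
function threadMessages(array $messages): array
{
    $conversations = [];
    foreach ($messages as $message) {
        $participants = array_merge([$message['from']], $message['to']);
        $key = conversationKey($message['title'], $participants);
        $conversations[$key][] = $message['id'];
    }
    return $conversations;
}

$messages = [
    ['id' => 1, 'title' => 'Lunch?', 'from' => 1, 'to' => [2]],
    ['id' => 2, 'title' => 'Re: Lunch?', 'from' => 2, 'to' => [1]],
    ['id' => 3, 'title' => 'Lunch?', 'from' => 3, 'to' => [1]],
];
$conversations = threadMessages($messages);
```

Messages 1 and 2 land in one conversation; message 3 has the same title but different participants, so it starts its own.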
B) [kvetch] vBulletin creates invalid serialized arrays of who private messages are To (by including members' names which have been passed through htmlentities - semicolons, yay!) and instead of telling me that unserialize was failing, the error backtrace told me a foreach loop was failing AND pointed me at the wrong loop. It took over 7 hours of failed imports to untangle this. >:-| Hate vBulletin. [/kvetch]
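For anyone else hitting this: checking the result of unserialize() before looping over it makes the failure show up at the right spot. A minimal sketch, with a made-up function name and a corruption that I simulate by changing a string after serializing it (so the stored length no longer matches):

```php
<?php
// Returns the decoded array, or null when the stored string is not a
// valid serialized array. Hypothetical helper.
function unserializeRecipients(string $raw): ?array
{
    // unserialize() returns false (and raises a notice/warning) on bad input.
    $value = @unserialize($raw, ['allowed_classes' => false]);
    return is_array($value) ? $value : null;
}

$good = serialize(['cc' => [2 => 'Alice']]);

// Simulated corruption: the name grows after serialization, so the
// recorded string length is wrong.
$bad = str_replace('Alice', 'Al&amp;ce', $good);

$recipients = unserializeRecipients($bad);
if ($recipients === null) {
    // Log or skip this message instead of passing false to foreach.
    $recipients = [];
}
```

The explicit null check is what keeps a bad row from turning into a confusing foreach error further down.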
What's left: threads/posts and attachments. Then I need to write the password converter so I can actually log into the mess and make sure it works in Vanilla (I'm just eyeballing the database to confirm it's working for now).
@Mark I structured this parallel to the Vanilla importer (i.e. I created a /apps/garden/controllers/importvb.php file and accompanying Views). When it's ready, do you want me to drop those files into my branch and send a pull request, or do you want me to put 'em somewhere else?
Sorry to jump in on this conversation but I think importers should be plugins. Adding one or two to the core may set a precedent going forward and it could never end - and for someone like me who is building a brand new website with nothing to import from, I don't really want my installation to have importers that will never get used, just my two cents though...
@Lincoln Sounds like a plan to isolate it. However, depending on the update/upgrade plan for Vanilla 2, when someone downloads the latest tarball/zip and extracts it into their working directory, the "importer" would effectively be re-installed?
@benno Sure, but A) I don't think having import files around is any more of a big deal than having the install files and B) I've deleted the install files out of every download of vBulletin for the last 6 years so it's not that big of a deal to me :) I'd value a good, simple first-run experience over potentially adding a step to upgrades if you're worried about having the extra files there.
Fair enough! Maybe I'm biased as I'm building from nothing with a virgin install.... :)
However, on the flip side, most other packages are shipped with an importer of some description.
I know they are not going to do any harm in that folder, providing they are written to be "admin" access only. It'd be a shame to bulk up the Garden application with lots of importers where no more than one would be used at the outset, so if it was to happen I'd vote either plugins / separate importer application.
On a totally unrelated note - I may however, @Lincoln, come asking for "community building" tips as you seem to have a fairly big setup going! ;-)
Sweet, sweet victory. I logged into Vanilla tonight with my freshly-imported vBulletin account, old password and all.
//edit: And I figured out how to make a new repo for the password importer plugin: http://github.com/lincolnwebs/VbulletinPassword
Thrown online in all its grubby bug-ridden glory: http://github.com/lincolnwebs/VanillaImport
Ehhh, kinda... this is the "here's the code to play with yourself" version. I strongly recommend NOT using it to import a production forum yet. If you have a dev server and want to play with it, go for it.
It will MOSTLY import users, roles, categories, discussions, comments, and private messages. However, there are several notable bugs, like that private messages don't get correct message counts yet because the queries to do so crash the importer for large boards. I've only successfully used it a couple times so far and haven't completely validated the results.
I think it needs a lot of work before I'd use it to import my own boards.
Not a problem, my intention in asking was a bit different ;) I have a board with fewer than 50 posts and 10 users, and I doubt there are any PMs yet. It is actually an IPB, so maybe I can import IPB into vBulletin and get it into V2 using your code.
Which version of vBulletin are you using?
The importer is coded for 3.8. If you removed the wall posts (step 5) and reference to the "skype" field (step 8 + column checker at beginning), it would probably work for the entire 3.x branch.
//edit: I just added comments to the file about compatibility tweaks that need to be made.
Strike that, I'm making the necessary adjustments to accommodate versions 3.6 thru 4.0. There are fewer database differences than I expected. The only thing that's different between them (in the relevant areas) is that visitor messages were added and private message addressing was changed at 3.8. I've added seamless workarounds for both.
Decided to nix importing emoticons, but am going to import "subscribed threads" as bookmarked discussions. Still refining the private message import; it's very messy and incomplete. Saving attachments for last. Everything else is working fine now as far as I can tell.
I'm going to put attachment importing somewhere else; I'd like to keep the importer as basic to core Vanilla 2 as possible for now. (Take THAT, feature creep!)
The latest version on GitHub closes all my known bugs. You could call it a "Beta" at this point. It should fully import:
- users
- usergroups (primary and secondary as roles - MAX 25!)
- forums (as categories - skips those with zero threads)
- threads
- posts
- subscriptions (as bookmarks)
- avatars (as pictures)
- wall posts (where applicable)
- basic and custom profile fields (creates a WP-style UserMeta table)
- private messages (as conversations)
I'm not delighted with its PM -> conversation import, but I believe it works as conversations are intended and should be satisfactory to someone looking to import immediately.
Additionally, it will convert HTML entities in category names and discussion titles as part of the clean up. With the Vbulletin Password Converter it will allow your users to sign in seamlessly after the import. As with the updated Vanilla importer, it will merge your new admin account into an imported account with the same name (if one exists).
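The entity clean-up for titles can be as small as this sketch (the function name is made up, and I'm assuming UTF-8 output):

```php
<?php
// Convert HTML entities in an imported title back to plain characters.
function cleanTitle(string $title): string
{
    return html_entity_decode($title, ENT_QUOTES, 'UTF-8');
}

$title = cleanTitle('Tom &amp; Jerry&#039;s &quot;Forum&quot;');
// $title is: Tom & Jerry's "Forum"
```

The decoded title then needs escaping again wherever it is output as HTML.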
It's ready for others to test and critique. (Especially the private messages in step 11. It is particularly unwieldy, but it works.)
Yeah, if you try to have more than 32 it'll die with a nasty error, and there are 4 to start, so I rounded down so you're not screwed with your max # of groups right away. It'd be easier for them to condense their usergroups on the vBulletin side than to realize they messed up and have to fix in Vanilla.
@Lincoln, first up, this is amazing, and thanks for doing it.
Do you know if there is any way to clean up the BBcode that was done in vbulletin when you import?
Anyway, I did test this out and it worked pretty seamlessly on first run, and this is the first time I've tried Vanilla 2 and an importer of any type. Really outstanding. Thanks again.
@oinkfu I'm definitely going to come up with a solution for the BBCode. I don't want that solution in the base importer though, because some people may elect to use a BBCode plugin rather than stripping it out or converting it all to HTML. My solution will likely be a combination of stripping out the color, size, and font BBCodes while converting the rest to HTML. As with anything I do for Vanilla, I'll make it available on GitHub at the very least.
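As a very rough sketch of that strip-and-convert approach (not the eventual solution; the function name is hypothetical and it only handles a handful of tags): escape the text first, drop the color/size/font tags while keeping their contents, then turn a few common tags into HTML.

```php
<?php
// Hypothetical converter covering only [color], [size], [font],
// [b], [i], [u], and unquoted [url=http...] tags.
function convertBbcode(string $text): string
{
    // Escape first so user text cannot inject HTML of its own.
    $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Strip presentational tags but keep what they wrap.
    $text = preg_replace('/\[\/?(?:color|size|font)(?:=[^\]]*)?\]/i', '', $text);

    // Simple paired tags map straight onto HTML elements.
    $simple = ['b' => 'strong', 'i' => 'em', 'u' => 'u'];
    foreach ($simple as $bb => $html) {
        $text = preg_replace(
            '/\[' . $bb . '\](.*?)\[\/' . $bb . '\]/is',
            '<' . $html . '>$1</' . $html . '>',
            $text
        );
    }

    // Only http/https links are converted; anything else is left as text.
    return preg_replace(
        '/\[url=(https?:\/\/[^\]\s]+)\](.*?)\[\/url\]/is',
        '<a href="$1">$2</a>',
        $text
    );
}

$html = convertBbcode('[b]Hi[/b] [color=red]there[/color] [url=http://example.com]link[/url]');
```

Nested quotes, lists, and images would all need their own handling, which is part of why it makes sense to keep this outside the base importer.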
Thanks for the feedback! Much appreciated.
@Lincoln
Hold the phone, I am getting this error whenever I'm trying to post (once I've completed the import): VbulletinImportID is required.
Also I can post new discussions, but the comment/post part of the discussion doesn't post, only the title.
@Convoe Heh, I'll support it as best I can, mate. I highly recommend doing test runs on a separate server and thoroughly testing the result. Feedback appreciated. Hopefully you'll see more from me in January, especially a way for dealing with all the bbcode left behind.
I'm starting to regret having a PM importer at all. I may remove it. It's very jarring to try and start using Conversations if you expect it to work like Private Messages did. It would probably be better to warn users to download their PMs first and wipe the slate clean so they don't have an expectation of parallel functionality.
Haha :D Fair enough, but sometimes you don't want to let other people get to your email address and PM allows back and forth while (potentially) maintaining a certain level of anonymity.
If you are going to drop PM from your addon, please keep the code knocking about somewhere, as it would be a great starting point for anyone who wished to transfer the PMs over.
Sure, that code took too long to write to just delete it anyway. ;)
Considering it takes 5 minutes to sign up for a Gmail account and link it to your existing account as a secondary address I don't really buy into the email hiding idea. It's one thing to not want it published on the Internet, but if you want to have a serious conversation with someone, it's time to cough up an addy.
OK, so the importer imports all 182,146 of my users, but after that it just fails. There are a lot of categories and sub categories. There are 234,300+ threads and 1,645,000+ posts. I commented out the avatar, attachment, and personal message imports because it isn't necessary (for me).
As for the sub categories, V2 has no sub categories, so should I wait for the "labels/tags" which will replace sub categories or will there be something in the V2 core?
It says in the docs that you can only have 25 categories. You can comment out the limit, but it's still going to die at 32 categories because that's all the V2 database can handle.
I don't expect any subcategory or label functionality in the V2 core.