Please upgrade here. These earlier versions are no longer being updated and have security issues.
HackerOne users: Testing against this community violates our program's Terms of Service and will result in your bounty being denied.

New Carmen Sandiego (GeoIP) Plugin

dericdddericdd Montreal New
edited July 2015 in Vanilla 2.0 - 2.8

Hi Everyone,

I've built a plugin that provides some GeoIP functionality. I called it Carmen Sandiego... of course :)

The plugin provides 2 pieces of functionality. It can display the author's flag for discussions and comments as well as log geoip information when a user logs in (UserMeta table).

Here's where the work got interesting. Since lots of hosting servers do not include the PHP geoip library as well as not enabling "LOAD DATA" in MySQL, I've had to make my own functions as well as manually parse the CSV into the MySQL database during the plugins import routine.

I've also added a caching layer to make sure this stuff stays fast. I strongly recommend enabling Memcached if you want to run this thing properly. I also have proper logic in my caching, so each IP is cached and available in other discussions.

Please let me know what you think,

Cheers

DDD

«1

Comments

  • businessdadbusinessdad Stealth contributor MVP

    Why not using a ready made library, such as the MaxMind one, for Geolocation? I wrote over two dozens of solutions that use geolocation, all based on that database, and they work perfectly.

  • dericdddericdd Montreal New

    I've used them before and they are great... But those libraries are not always included in shared hosting. I wanted something more standalone.

  • phreakphreak Vanilla*APP (White Label) & Vanilla*Skins Shop MVP
    edited July 2015

    @dericdd: Can you line out what can be done with the plugin exactly? Does it provide a map functionality to create flags for users on a world map, or might there be an option that allows to find "users in your area"?

    I guess the plugin provides the data, that can than be mapped on whatever a forum admin likes... but doesn't have the functionality already built in.

    • VanillaAPP | iOS & Android App for Vanilla - White label app for Vanilla Forums OS
    • VanillaSkins | Plugins, Themes, Graphics and Custom Development for Vanilla
  • dericdddericdd Montreal New

    The plan is still ongoing, lol . I am certainly open to suggestions ;)

    For now it can put a country flag near the author's name in discussions and comments. It can also log geoip info of a user when they login. (simply check which features to enable in Settings panel).

    I built this plugin for the experience of building a proper Vanilla plugin. For now the plugin has very little experience out in the wild. I am hoping that feedback will guide the next steps of it's development.

  • businessdadbusinessdad Stealth contributor MVP

    @dericdd said:
    I've used them before and they are great... But those libraries are not always included in shared hosting. I wanted something more standalone.

    Those libraries are just PHP files, which you can include with your plugin using Composer. None of my plugins assume that they are pre-installed on the server, they are included with the distribution. :)

  • dericdddericdd Montreal New

    @businessdad (love the name btw). Not sure i follow what you mean...

    I actually started making this plugin using PHP's geoip lib. It occurred to me that not all servers will have that lib included. (http://php.net/manual/en/book.geoip.php). I then decided to write something that would import MaxMind's GeoIP-Lite City CSV file, parse it and load it into MySQL tables. Another reason is that having the data in MySQL makes it easier to operate on for future dev (not to mention faster).

    I had also originally started using LOAD DATA function in MySQL to load the CSV into the database and then also realized that this is supported even less often on servers. This is why I parse through the file manually... Which works pretty fast. My dev VM takes takes less than a minute for the complete import (1 core, 1 gig of RAM).

  • dericdddericdd Montreal New

    @businessdad Have you tried installing this plugin? I am hoping to get feedback.

  • businessdadbusinessdad Stealth contributor MVP

    What I mean is that you don't have to rely on the PHP GeoIP extension, there is a ready made library to query the MaxMind database: https://github.com/maxmind/geoip-api-php

    The library doesn't require anything installed on the server.

  • dericdddericdd Montreal New

    Ah i see... Yeah i am trying to move away from DAT file. I wanted the data in our own MySQL DB.

  • businessdadbusinessdad Stealth contributor MVP

    I'm not sure that moving to MySQL would be a good idea. The RDBMS is the bottleneck of every website, while the DAT file could be queried independently for each request. Personally, I find that the DAT file scales better.

  • dericdddericdd Montreal New

    Perhaps this Plugin is not for you then

  • TimTim Operations Vanilla Staff

    @businessdad said:
    I'm not sure that moving to MySQL would be a good idea. The RDBMS is the bottleneck of every website, while the DAT file could be queried independently for each request. Personally, I find that the DAT file scales better.

    Personally, we tested it and the DAT file is slow and bloats the process. And if you're directly hitting the database and ignoring the cache then I don't know what to tell you.

    Vanilla Forums COO [GitHub, Twitter, About.me]

  • businessdadbusinessdad Stealth contributor MVP
    edited July 2015

    @Tim said:
    Personally, we tested it and the DAT file is slow and bloats the process. And if you're directly hitting the database and ignoring the cache then I don't know what to tell you.

    Again, the (wrong) assumption that any site has a cache layer installed?

    Paraphrasing: if you assume that all self-hosted Vanilla installations have a cache layer, then I don't know what to tell you. :lol:

  • businessdadbusinessdad Stealth contributor MVP
    edited July 2015

    @dericdd said:
    Perhaps this Plugin is not for you then

    I don't have a Vanilla Forums site to run (correction: not anymore), I wouldn't need it anyway. :)

  • TimTim Operations Vanilla Staff

    One implementation (DAT) is slow and will always be slow. It balloons the size of the executing process whenever you query it, and supports only single IP lookups. At 5-10ms per lookup (experimentally obtained), that's 200-400ms per 40 comment discussion.

    The other implementation (MySQL) is slow, but can be sped up by querying in parallel. That means you're suffering a 20-40ms penalty whether you ask for 1 or 100 IPs, and you leave the PHP process lean and fast (which is one of the biggest contributors to poor concurrency at scale).

    Additionally, having the data in MySQL allows for more extensive queries that surpass those offered by the extension's API and Maxmind's PHP library.

    Vanilla Forums COO [GitHub, Twitter, About.me]

  • businessdadbusinessdad Stealth contributor MVP

    @Tim said:
    One implementation (DAT) is slow and will always be slow. It balloons the size of the executing process whenever you query it, and supports only single IP lookups. At 5-10ms per lookup (experimentally obtained), that's 200-400ms per 40 comment discussion.

    Based on my experience, single IP lookups should be enough. If you query the DAT when the comment is posted, then store the result against the post, that should be all you need. As I mentioned, I implemented geolocation on dozens of sites, and there is a maximum of one lookup per visitor (location data is cached after the lookup, so that visitor's location is known from that moment on). That's 10ms once-off, on a single page load (other pages loaded by the same visitor are not affected) which seems quite good to me.

    The other implementation (MySQL) is slow, but can be sped up by querying in parallel. That means you're suffering a 20-40ms penalty whether you ask for 1 or 100 IPs, and you leave the PHP process lean and fast (which is one of the biggest contributors to poor concurrency at scale).

    I don't see why someone would have to query hundreds of IP addresses (unless it's to populate a catalogue), but, in that case, a RBAR approach would indeed be slow.

    Additionally, having the data in MySQL allows for more extensive queries that surpass those offered by the extension's API and Maxmind's PHP library.

    Yet, it still depends on if those queries are needed. I've been dealing with geolocation on a daily basis for for three years straight now, and I still have to find a scenario where I need to query multiple IP addresses in one shot. If I had such need, then I would probably dump the data into a database as well.

  • businessdadbusinessdad Stealth contributor MVP
    edited July 2015

    @dericdd For the record, and to be clear, I'm not implying that the solution you implemented is wrong. I was just wondering why you decided to rewrite the whole geolocation logic, when there are ready-made libraries available. My personal approach is to use what's available as much as possible, and write my own solution only when there aren't existing ones, or when the available ones don't cover my requirements (and, as I mentioned, in all geolocation scenarios I worked with, the PHP libraries worked perfectly fine).

    To be clear, if I had to write a geolocation solution for Vanilla, I would have used the MaxMind PHP library straight away, until the need for something more complex arised.

  • x00x00 MVP
    edited July 2015

    When businessdad the SQL guy doesn't use SQL you better sit up and listen, and if you are no-SQL guy you feel slightly smug.

    Just teasing ;) Of course use the right tool for the job.

    grep is your friend.

  • businessdadbusinessdad Stealth contributor MVP

    @x00 said:
    When businessdad the SQL guy doesn't use SQL you better sit up and listen, and if you are no-SQL guy you feel slightly smug.

    Just teasing ;) Of course use the right tool for the job.

    Precisely. And, for the record, I have nothing against NoSQL, I simply haven't yet come across a scenario where it would actually bring any significant benefit over a RDBMS. :)

  • dericdddericdd Montreal New

    @businessdad said:
    Yet, it still depends on if those queries are needed. I've been dealing with geolocation on a daily basis for for three years straight now, and I still have to find a scenario where I need to query multiple IP addresses in one shot. If I had such need, then I would probably dump the data into a database as well.

    Well the discussion thread is a clear example of needing multiple queries at a time. If we have 20-30 comments and we query them one at a time it takes forever. So at the beginning of the discussion thread we tie into "discussionController_beforeDiscussionDisplay_Handler" hook to pre-query all the IPs from the discussion and comments. Then we use the "base_authorInfo_Handler" to display flag using pre-loaded geoip info.

    You made a good point that we should perhaps record the geoIP data while posting discussion/comment. I will probably start doing this as well in future versions to lower that query count... However I wanted it to work for old data as well.

    I did not re-invent the wheel or rewrite the logic. I am using CSV data/structure from MaxMind. Does not take a rocket scientist to figure out how to query it if you understand how IPs and subnets work.

    I agree that one should not build something if it is already made... it just wasn't made the way I needed it.

Sign In or Register to comment.