
"Undefined foreign content" when fetching wordpress post

ludalex New
edited May 2012 in Vanilla 2.0 - 2.8

I set up Vanilla 2.0.18.4 (w/ Vanilla plugin) and WordPress (w/ Vanilla Forums). All updated to the latest version.

I correctly connected the Vanilla install through the WP plugin and chose the forum category where the WP posts end up.

When someone comments, it does create a new post in the chosen category, but the content is just a broken link to the post and the title is "Undefined foreign content".

What should I do?

Best Answer

  • x00 MVP
    edited May 2012 Answer ✓

    If Vanilla cannot access the site to scrape the title, it will not be able to find it. The facility uses FetchPageInfo, which uses ProxyRequest, which requires cURL. FetchPageInfo requires DOMDocument: it first searches for the title element and the meta description; if it doesn't find a description, it looks for the first p element with content longer than 90 characters and chops it at 400 characters.

    I'm not really sure why they implemented it this way. Personally, I would have used the excellent JSON API that is available as a WordPress plugin.
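The lookup order described above can be sketched as a standalone snippet (a simplified illustration, not Vanilla's actual function; the `$PageHtml` sample stands in for whatever cURL fetched):

```php
<?php
// Sketch of the FetchPageInfo fallback order: <title>, then
// <meta name="description">, then the first <p> longer than 90
// characters, chopped at 400. $PageHtml stands in for the cURL result.
$PageHtml = '<html><head><title>Example</title></head>'
          . '<body><p>short</p></body></html>';

$Dom = new DOMDocument();
@$Dom->loadHTML($PageHtml);

// 1. <title> element
$TitleNodes = $Dom->getElementsByTagName('title');
$Title = $TitleNodes->length > 0 ? $TitleNodes->item(0)->nodeValue : '';

// 2. <meta name="description">
$Description = '';
foreach ($Dom->getElementsByTagName('meta') as $Meta) {
    if (strtolower($Meta->getAttribute('name')) == 'description') {
        $Description = $Meta->getAttribute('content');
    }
}

// 3. First <p> with more than 90 characters of content
if ($Description == '') {
    foreach ($Dom->getElementsByTagName('p') as $P) {
        if (strlen($P->nodeValue) > 90) {
            $Description = $P->nodeValue;
            break;
        }
    }
}

// 4. Chop the description at 400 characters
if (strlen($Description) > 400) {
    $Description = substr($Description, 0, 400);
}

echo $Title . "\n";
```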

    grep is your friend.

Answers

  • ludalex New
    edited May 2012

    Vanilla 2.0.18.4 (w/ Vanilla plugin) - I meant w/ < embed > Vanilla plugin.

  • x00 said:
    If Vanilla cannot access the site to scrape the title, it will not be able to find it. The facility uses FetchPageInfo, which uses ProxyRequest, which requires cURL. FetchPageInfo requires DOMDocument: it first searches for the title element and the meta description; if it doesn't find a description, it looks for the first p element with content longer than 90 characters and chops it at 400 characters.

    I'm not really sure why they implemented it this way. Personally, I would have used the excellent JSON API that is available as a WordPress plugin.

    well then what's the problem?

  • I don't know. I'm pointing you in the right direction, but I'm not going to investigate further.

    grep is your friend.

  • A simple gotcha is a private site. Scraping needs to be done on public content; if not, this solution is not suitable.

    grep is your friend.

  • ludalex New
    edited May 2012

    x00 said:
    A simple gotcha is a private site. Scraping needs to be done on public content; if not, this solution is not suitable.

    I've put the blog offline for non-administrators with a plugin; could that be the problem?

  • Likely. Basically, if it can't access the page publicly via cURL, it can't scrape the title.

    This solution only works with public content.

    grep is your friend.

  • Nope, tried disabling it and commenting on a post; I still get an "Undefined foreign content" post from the "System" user with a broken link to the WP post.

  • Well, I would work through the dependencies.

    grep is your friend.

  • x00 said:
    well I would work through the dependencies.

    what do you mean?

  • I said I could only take this so far; that is my lot. I mentioned the dependencies of this system above. If you don't know them, you need to find someone who can work through them for you. It is not something that can just be sorted out in the discussion.

    grep is your friend.

  • I'm surprised to be the only one having this issue. I performed various Google searches and apparently no one else has had the same problem.

  • Mark Vanilla Staff

    You get "Undefined foreign content" when Vanilla fails to retrieve the page in question. So, either your page is unavailable to unauthenticated users (i.e. in draft mode), or cURL is not set up or working properly.
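A quick way to rule out the second cause (a diagnostic sketch; the two extension names are the dependencies x00 listed above, cURL for ProxyRequest and DOM for DOMDocument):

```php
<?php
// Check the two PHP extensions Vanilla's scraper depends on:
// curl (used by ProxyRequest) and dom (used for DOMDocument parsing).
foreach (array('curl', 'dom') as $Ext) {
    echo $Ext . ': ' . (extension_loaded($Ext) ? 'loaded' : 'MISSING') . "\n";
}
```

Also confirm the post URL returns the actual page, not a login redirect, when fetched while logged out.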

  • Mark said:
    You get "Undefined foreign content" when Vanilla fails to retrieve the page in question. So, either your page is unavailable to unauthenticated users (i.e. in draft mode), or cURL is not set up or working properly.

    This is the curl configuration of the server I'm using.

  • Shadowdare said:
    @futuretalk and @ludalex:

    The Vanilla WP plugin should do it automatically; if not, all you have to do is modify the comment links in your template so that they look similar to this:

    <a href="http://yourdomain.com/path/to/page/with/comments/#vanilla_comments" vanilla-identifier="embed-test">Comments</a>

    Using this advice, it now works.
    But why does it fetch BLOG_NAME » BLOG_POST as the title, and the same thing with a thumbnail of the blog's logo and a meta link as the content? Shouldn't it fetch the post name and content?

  • x00 MVP
    edited May 2012

    That was their design. I'm not a fan of it, but it fetches whatever the title element is. So what you could do is make sure the title of the post is exactly that.

    You can also override the FetchPageInfo function: create conf/bootstrap.before.php and put

    <?php if (!defined('APPLICATION')) exit();
    
    if (!function_exists('FetchPageInfo')) {
       /**
        * Examines the page at $Url for title, description & images. Be sure to check the resultant array for any Exceptions that occurred while retrieving the page. 
        * @param string $Url The url to examine.
        * @param integer $Timeout How long to allow for this request. Default Garden.SocketTimeout or 1, 0 to never timeout. Default is 0.
        * @return array an array containing Url, Title, Description, Images (array) and Exception (if there were problems retrieving the page).
        */
       function FetchPageInfo($Url, $Timeout = 0) {
          $PageInfo = array(
             'Url' => $Url,
             'Title' => '',
             'Description' => '',
             'Images' => array(),
             'Exception' => FALSE
          );
          try {
             $PageHtml = ProxyRequest($Url, $Timeout, TRUE);
             $Dom = new DOMDocument();
             @$Dom->loadHTML($PageHtml);
             // Page Title
             $TitleNodes = $Dom->getElementsByTagName('title');
             $PageInfo['Title'] = $TitleNodes->length > 0 ? $TitleNodes->item(0)->nodeValue : '';
    
             /*
              * Do some string manipulation here, e.g. strip a "BLOG_NAME » " prefix.
              * Note that stripos() takes the haystack as its first argument:
              *
              * $Pos = stripos($PageInfo['Title'], '» ');
              * if ($Pos !== false)
              *    $PageInfo['Title'] = substr($PageInfo['Title'], $Pos + strlen('» '));
              */
     
             // Page Description
             $MetaNodes = $Dom->getElementsByTagName('meta');
             foreach($MetaNodes as $MetaNode) {
                if (strtolower($MetaNode->getAttribute('name')) == 'description')
                   $PageInfo['Description'] = $MetaNode->getAttribute('content');
             }
             // Keep looking for page description?
             if ($PageInfo['Description'] == '') {
                $PNodes = $Dom->getElementsByTagName('p');
                foreach($PNodes as $PNode) {
                   $PVal = $PNode->nodeValue;
                   if (strlen($PVal) > 90) {
                      $PageInfo['Description'] = $PVal;
                      break;
                   }
                }
             }
             if (strlen($PageInfo['Description']) > 400)
                $PageInfo['Description'] = SliceString($PageInfo['Description'], 400);
    
             // Page Images (collect all sources; the first 10 are sized and sorted below)
             $Images = array();
             $ImageNodes = $Dom->getElementsByTagName('img');
             $i = 0;
             foreach ($ImageNodes as $ImageNode) {
                $Images[] = AbsoluteSource($ImageNode->getAttribute('src'), $Url);
             }
    
             // Sort by size, biggest one first
             $ImageSort = array();
             // Only look at first 10 images (speed!)
             $i = 0;
             foreach ($Images as $Image) {
                $i++;
                if ($i > 10)
                   break;
    
                list($Width, $Height, $Type, $Attributes) = getimagesize($Image);
                $Diag = (int)floor(sqrt(($Width*$Width) + ($Height*$Height)));
                if (!array_key_exists($Diag, $ImageSort))
                   $ImageSort[$Diag] = $Image;
             }
             krsort($ImageSort);
             $PageInfo['Images'] = array_values($ImageSort);
          } catch (Exception $ex) {
             $PageInfo['Exception'] = $ex;
          }
          return $PageInfo;
       }
    }
    ?>
    

    Do string manipulation as appropriate.
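For the BLOG_NAME » BLOG_POST case specifically, the manipulation might look like this (a sketch; the "» " separator is an assumption about what your theme puts in the title element):

```php
<?php
// Strip a leading "BLOG_NAME » " prefix from a scraped title.
// The "» " separator is whatever your WP theme uses in <title>.
$Title = 'My Blog » Hello World';
$Pos = stripos($Title, '» ');
if ($Pos !== false) {
    $Title = substr($Title, $Pos + strlen('» '));
}
echo $Title . "\n";
```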

    grep is your friend.
