Avoiding IFRAMES via PHP and cURL

A current project requires integration with a certain third party that provides a “Web service” to allow data integration into member websites. Unfortunately for me, this service revolves around plopping an IFRAME into your page where you’d like the data to appear. Not great.

In an ideal world, we’d be able to pull said content via (the real) AJAX, but the browser’s same-origin policy blocks cross-domain requests, so that’s not a possibility. All is not lost, however. Enter cURL.

A brief introduction to cURL

The PHP manual describes cURL as:

PHP supports libcurl, a library created by Daniel Stenberg, that allows you to connect and communicate to many different types of servers with many different types of protocols. libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. libcurl also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading (this can also be done with PHP’s ftp extension), HTTP form based upload, proxies, cookies, and user+password authentication.

These functions have been added in PHP 4.0.2.

In summary, cURL allows you to have PHP fetch a page for you to do with what you will.
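To make that concrete, here is a minimal sketch of a fetch using PHP's cURL extension. The function name fetch_page() is my own invention for illustration, and it assumes the cURL extension is installed; it leaves out redirects, cookies, and everything else a real integration would likely need.

```php
<?php
// Hypothetical helper for illustration: fetches a URL and returns the
// response body as a string, or null when the transfer fails.
function fetch_page(string $url, int $timeout = 5): ?string
{
    $ch = curl_init($url);

    // Return the body from curl_exec() instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Give up if connecting, or the whole transfer, takes too long.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);

    $body = curl_exec($ch);

    // curl_exec() returns false on failure (DNS error, refused connection, timeout).
    return ($body === false) ? null : $body;
}
```

Calling fetch_page() with the address of a page hands back its HTML as a string, which is the raw material for everything that follows.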

Setting up cURL

There’s a bit of a learning curve when using cURL, so you’ll want to review the cURL section of the PHP manual. If you’re looking to set something up quick and dirty, this is the function I’ve come to use (borrowed, not my own):

function get_url( $url,  $javascript_loop = 0, $timeout = 5 )
{
    $url = str_replace( "&amp;", "&", urldecode(trim($url)) );

    $cookie = tempnam ("/tmp", "CURLCOOKIE");
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_ENCODING, "" );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
    curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );    # skips certificate verification for https urls (insecure; set to true where a CA bundle is available)
    curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
    curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
    $content = curl_exec( $ch );
    $response = curl_getinfo( $ch );
    curl_close ( $ch );

    if ($response['http_code'] == 301 || $response['http_code'] == 302)
    {
        ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");

        if ( $headers = get_headers($response['url']) )
        {
            foreach( $headers as $value )
            {
                if ( substr( strtolower($value), 0, 9 ) == "location:" )
                    return get_url( trim( substr( $value, 9 ) ), $javascript_loop, $timeout );
            }
        }
    }

    if (    ( preg_match("/>[[:space:]]+window\.location\.replace\('(.*)'\)/i", $content, $value) || preg_match("/>[[:space:]]+window\.location\=\"(.*)\"/i", $content, $value) ) &&
            $javascript_loop < 5
    )
    {
        return get_url( $value[1], $javascript_loop+1, $timeout );
    }
    else
    {
        return array( $content, $response );
    }
}

This function takes a URL and returns an array. The first element is the content of the fetched page, and the second is the array of transfer details from curl_getinfo() (the HTTP status code, the final URL, and so on).
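Since the function hands back whatever cURL produced, it's worth checking the result before using it. The helper below is a hypothetical addition of mine, not part of the function above; it assumes the two-element array shape just described and treats only a 2xx status with a non-empty body as usable.

```php
<?php
// Hypothetical helper: decides whether the array returned by get_url()
// contains a response worth displaying.
function response_is_usable(array $result): bool
{
    list($content, $info) = $result;

    // curl_exec() yields false when the transfer itself failed.
    if ($content === false || $content === '') {
        return false;
    }

    // 'http_code' is one of the keys curl_getinfo() provides.
    $status = isset($info['http_code']) ? (int) $info['http_code'] : 0;

    return $status >= 200 && $status < 300;
}
```

With this in place, a failed request can fall back to a placeholder message instead of echoing an error page into your layout.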

Replacing an IFRAME with cURL

The particular service I'm working with uses GET variables to filter the data presented. I can literally use the same URL string in my cURL function and work with the data straight away. For example:

$service_url  = $service_base_url;
$service_url .= "&var1=X";
$service_url .= "&var2=Y";
$service_url .= "&api_key=" . $service_api_key;

$request_results = get_url($service_url);

preg_match("/<body.*\/body>/s", $request_results[0], $pagecontent);

$pagecontent = $pagecontent[0];

// Strip the opening and closing body tags, including any attributes on <body>
$pagecontent = preg_replace('/<\/?body[^>]*>/i', '', $pagecontent);

// I'd like to resize the images...
$pattern = '/<\s*img\b[^>]*?\ssrc\s*=\s*["\']?([^"\'\s>]*)/i';
$replacement = '<img src="' . $imgpath . '/phpthumb/phpThumb.php?src=' . '$1' . '&w=160&h=110&zc=1';
$pagecontent = preg_replace($pattern, $replacement, $pagecontent);
	
echo $pagecontent;

Here I first build the request URL (the GET variables will change based on a few things), then pass the final URL to my get_url() function. That's a great start, but of course the cURL request returns a full HTML document (including the head), which we don't need. A quick preg_match pulls out the body element of the document, and the opening and closing body tags are then stripped out as well.
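Those two steps, building the URL and pulling out the body, can also be sketched as small helpers. Both function names are hypothetical, and the example.com address in the usage note is only a placeholder for the real service URL. The sketch assumes the service accepts standard URL-encoded GET parameters and returns a single body element.

```php
<?php
// Hypothetical helper: appends GET parameters to a base URL, URL-encoding
// each value. Uses '?' or '&' depending on whether the base already has a
// query string.
function build_service_url(string $base_url, array $params): string
{
    $separator = (strpos($base_url, '?') === false) ? '?' : '&';

    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return $base_url . $separator . http_build_query($params, '', '&');
}

// Hypothetical helper: returns the markup between <body ...> and </body>,
// or null when the document has no body element. Attributes on the body
// tag are tolerated.
function extract_body(string $html): ?string
{
    if (preg_match('/<body\b[^>]*>(.*)<\/body\s*>/is', $html, $matches)) {
        return trim($matches[1]);
    }

    return null;
}
```

For example, build_service_url('http://example.com/feed?id=1', array('var1' => 'X')) gives http://example.com/feed?id=1&var1=X, and extract_body() on the fetched document gives just the inner markup. Letting http_build_query() do the encoding means values with spaces or ampersands won't break the request.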

That leaves you with the remote page as it would have appeared in the IFRAME itself. You can write applicable CSS and do what you will with the markup. You can even go a step further and continue to refine the markup returned. In my case, I'd like to resize the returned images to fit the design I'm trying to implement. I've come to use phpThumb for all of my resizing needs, and a quick preg_replace rewrites each img src so the image is served through phpThumb at the size the design calls for.
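One thing a plain preg_replace can't do is URL-encode the original image address before it is handed to the thumbnail script. A sketch of a variation using preg_replace_callback follows. The function name and its parameters are hypothetical; it assumes src attributes are quoted and that the thumbnail script takes src, w, h, and zc as query parameters, as in the snippet above.

```php
<?php
// Hypothetical helper: rewrites the src of every <img> tag so the image is
// requested through a thumbnail script. Only quoted src attributes are
// handled; other attributes on the tag are left untouched.
function route_images_through_thumbnailer(string $html, string $thumb_script, int $width, int $height): string
{
    // Group 1: everything up to and including 'src=', group 2: the quote
    // character, group 3: the original image URL.
    $pattern = '/(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2/is';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($thumb_script, $width, $height) {
            // URL-encode the original address along with the size options.
            $query = http_build_query(
                array(
                    'src' => html_entity_decode($m[3]),
                    'w'   => $width,
                    'h'   => $height,
                    'zc'  => 1,
                ),
                '',
                '&'
            );

            // Escape the finished URL for use inside an HTML attribute.
            $new_src = htmlspecialchars($thumb_script . '?' . $query, ENT_QUOTES);

            return $m[1] . $m[2] . $new_src . $m[2];
        },
        $html
    );
}
```

Called as route_images_through_thumbnailer($pagecontent, $imgpath . '/phpthumb/phpThumb.php', 160, 110), it should produce the same kind of output as the preg_replace version, with the difference that image addresses containing spaces or ampersands survive the trip.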

Keep in mind the terms of service

I'm currently waiting to hear back from the third party, in an effort to follow their terms of service. Since the official documentation revolves around the inclusion of an IFRAME, I'd like to make sure that this alternative method is acceptable before I put the remaining hours into customizing the output.