
Retrieve Object Graph and Meta Tags From a Website With PHP

For a number of articles we have scheduled, we're required to obtain metadata from the head of an HTML document. The Open Graph (og:) tags, in particular, are used by the likes of Twitter and Facebook to render the preview text and image associated with links that are shared. The code on this page essentially provides the same functionality by retrieving all the name and property meta tags and returning them in an array.

Once the data is obtained, it's important to cache the result (we use Simple Cache) to avoid making repeated, lengthy, and unnecessary requests to the destination website. The cache should have a reasonable expiry time to cater for changing head content.
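
To illustrate the idea of caching with an expiry time, here is a minimal file-based sketch. It is not the Simple Cache API: the function names example_cache_get() and example_cache_set() are hypothetical, and storing files in the system temp directory is an assumption made to keep the example self-contained.

```php
<?php
/* Hypothetical helpers for illustration only; these are not the Simple Cache API */

/* Build the cache file path for a key (assumes the system temp directory is writable) */
function example_cache_file($key) {
  return sys_get_temp_dir() . '/example_' . md5($key) . '.cache';
}

/* Return cached data if it exists and is no older than $expiry seconds, otherwise false */
function example_cache_get($key, $expiry) {
  $file = example_cache_file($key);
  clearstatcache(true, $file);
  if (!is_file($file)) return false;
  if ((time() - filemtime($file)) > $expiry) return false;
  $data = @file_get_contents($file);
  return ($data === false) ? false : unserialize($data);
}

/* Store data against a key; returns true on success */
function example_cache_set($key, $data) {
  return (file_put_contents(example_cache_file($key), serialize($data), LOCK_EX) !== false);
}

/* Usage: store an array, then read it back with a one hour expiry */
$key = 'bm_meta_example';
example_cache_set($key, array('og:title' => 'Example title'));
$cached = example_cache_get($key, 3600);
echo ($cached !== false) ? $cached['og:title'] : 'Cache miss';
```

A shorter expiry picks up changes to the remote head content sooner, at the cost of more requests to the destination website.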

The Result

To return an array of all name and property tags of our sadly neglected Flight website, use the following:

<?php
/* Usage: Print array */
$url = 'http://www.flight.org/asiana-214-boeing-777-autoflight-speed-protection';
echo '<pre>' . print_r(beliefmedia_meta_data($url), true) . '</pre>';

To return a specific name or property value, use the following:

<?php
/* Single value */
echo 'Image URL is ' . beliefmedia_meta_data($url, 'og:image');

Returned array will look something like this.

Array
(
    [viewport] => width=device-width
    [generator] => WordPress 4.7.5
    [twitter:card] => summary_large_image
    [twitter:site] => @flightorg
    [twitter:creator] => @martykhoury
    [twitter:title] => Asiana 214 - Boeing 777 Autoflight Speed Protection...
    [twitter:description] => I was asked recently to write for an internal newsletter to provide some Boeing 777 specific information to non-777 pilots on the role of the 777 Autopilot Flight Director System (AFDS) and Autothrott
    [twitter:image] => http://www.flight.org/wp-content/uploads/2015/06/asiana-214-crash-600.jpg
    [og:image] => http://www.flight.org/wp-content/uploads/2015/06/asiana-214-crash-600.jpg
    [og:image:type] => image/jpeg
    [og:image:width] => 600
    [og:image:height] => 300
    [og:url] => http://www.flight.org/asiana-214-boeing-777-autoflight-speed-protection
    [og:description] => I was asked recently to write for an internal newsletter to provide some Boeing 777 specific information to non-777 pilots on the role of the 777 Autopilot Flight Director System (AFDS) and Autothrottle in the Asiana 214 accident. The following article is based on that contribution. With the publica
    [og:title] => Asiana 214 - Boeing 777 Autoflight Speed Protection
)

The second example prints the og:image value (the featured image URL) as a string.
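
As a rough illustration of how a returned array might be turned into the kind of link preview described earlier, the sketch below picks a title, description, and image with simple fallbacks. The helper names example_first_value() and example_link_preview(), the fallback order, and the HTML markup are all assumptions made for this example; they are not taken from how any particular service builds its previews.

```php
<?php
/* Hypothetical helper: return the first non-empty value found for a list of candidate keys */
function example_first_value(array $meta, array $keys, $default = '') {
  foreach ($keys as $key) {
    if (isset($meta[$key]) && trim($meta[$key]) !== '') return trim($meta[$key]);
  }
  return $default;
}

/* Hypothetical helper: build a simple HTML preview from a meta array */
function example_link_preview(array $meta, $url) {

  /* Assumed fallback order: og: tags first, then twitter: tags */
  $title = example_first_value($meta, array('og:title', 'twitter:title'), $url);
  $text  = example_first_value($meta, array('og:description', 'twitter:description', 'description'));
  $image = example_first_value($meta, array('og:image', 'twitter:image'));

  /* Escape everything, since the values come from a remote website */
  $html = '<div class="preview">';
  if ($image !== '') $html .= '<img src="' . htmlspecialchars($image, ENT_QUOTES) . '" alt="">';
  $html .= '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . htmlspecialchars($title, ENT_QUOTES) . '</a>';
  if ($text !== '') $html .= '<p>' . htmlspecialchars($text, ENT_QUOTES) . '</p>';

  return $html . '</div>';
}

/* Usage with a hand-written array in the same shape as the returned array shown above */
$meta = array(
  'twitter:title' => 'Asiana 214 - Boeing 777 Autoflight Speed Protection...',
  'og:image' => 'http://www.flight.org/wp-content/uploads/2015/06/asiana-214-crash-600.jpg',
  'og:title' => 'Asiana 214 - Boeing 777 Autoflight Speed Protection'
);
echo example_link_preview($meta, 'http://www.flight.org/asiana-214-boeing-777-autoflight-speed-protection');
```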

The PHP Code

If you're going to try this out, it's best to download it to avoid copy errors. Usage requires Simple Cache.

<?php
/*
 Retrieve Object Graph and Meta Tags From a Website With PHP
 http://www.beliefmedia.com/retrieve-object-meta-tags
 Requires Simple Cache: http://www.beliefmedia.com/simple-php-cache
*/

function beliefmedia_meta_data($url, $type = '', $args = array()) {

  $atts = array(
    'cache' => 1814400
  );

  $atts['url'] = $url;
  if ($type != '') $atts['type'] = $type;

  /* Merge $args with $atts */
  $atts = (empty($args)) ? $atts : array_merge($atts, $args);

  /* Transient */
  $transient = 'bm_meta_' . md5(serialize($atts));
  $result = beliefmedia_get_transient($transient, $atts['cache']);

  if ($result !== false) {

    $return = $result;

  } else {

    /* Request. Consider CURL */
    $data = @file_get_contents($atts['url']);

    /* Check the request before touching $data, otherwise a failed request goes unnoticed */
    if ($data !== false) {

      /* Only parse between head tags (the full document is used if no head is found) */
      if (preg_match('/<head(\s[^>]*)?>(.*?)<\/head>/si', $data, $head)) $data = $head[0];

      /* Get all tags */
      preg_match_all('/<[\s]*meta[\s]*(name|property)="?' . '([^>"]*)"?[\s]*' . 'content="?([^>"]*)"?[\s]*[\/]?[\s]*>/si', $data, $match);

      /* Create array/key combo (empty array if no tags were found) */
      $return = array();
      foreach ($match[2] as $i => $key) {
        $return[trim($key)] = trim($match[3][$i]);
      }

      /* Set temp transient */
      beliefmedia_set_transient($transient, $return);

    } else {

      /* Does an older version exist? */
      $return = beliefmedia_get_transient_data($transient);

    }

  }

  /* If data, return it (or the single requested value), otherwise return false */
  if ($return === false) return false;
  if ($type == '') return $return;
  return (isset($return[$type])) ? $return[$type] : false;
}
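
The regular expression in the function expects the name or property attribute to appear immediately before content, so tags written in another order are skipped. As a hedged alternative, the sketch below parses the markup with PHP's DOMDocument instead. The function name example_parse_meta() is hypothetical, it is not part of the original function, and character encoding handling is deliberately left out.

```php
<?php
/* Hypothetical alternative parser for illustration; not part of the original function */
function example_parse_meta($html) {

  $tags = array();
  if (!is_string($html) || trim($html) === '') return $tags;

  /* Collect libxml warnings internally, since real-world markup is rarely perfect */
  $previous = libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  foreach ($doc->getElementsByTagName('meta') as $meta) {

    /* Prefer the property attribute, fall back to name */
    $key = trim($meta->getAttribute('property'));
    if ($key === '') $key = trim($meta->getAttribute('name'));

    /* Skip tags (charset, http-equiv) that have no usable key or no content */
    if ($key === '' || !$meta->hasAttribute('content')) continue;

    $tags[$key] = trim($meta->getAttribute('content'));
  }

  return $tags;
}

/* Usage: attribute order and quote style do not matter here */
$html = '<html><head>'
  . '<meta content="Example title" property="og:title" />'
  . "<meta name='viewport' content='width=device-width'>"
  . '</head><body></body></html>';
print_r(example_parse_meta($html));
```

The trade-off is a dependency on the DOM extension and a full parse of the document, where the regular expression only needs the head.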

Considerations

  • PHP provides a built-in get_meta_tags() function, but it only returns tags that use a name attribute, so property-based tags such as og:image are missing from its results.

This get_meta_tags() example:

<?php
$url = 'http://www.flight.org/asiana-214-boeing-777-autoflight-speed-protection';
echo '<pre>' . print_r(get_meta_tags($url), true) . '</pre>';

Returns:

Array
(
    [viewport] => width=device-width
    [generator] => WordPress 4.7.5
    [twitter:card] => summary_large_image
    [twitter:site] => @flightorg
    [twitter:creator] => @martykhoury
    [twitter:title] => Asiana 214 - Boeing 777 Autoflight Speed Protection...
    [twitter:description] => I was asked recently to write for an internal newsletter to provide some Boeing 777 specific information to non-777 pilots on the role of the 777 Autopilot Flight Director System (AFDS) and Autothrott
    [twitter:image] => http://www.flight.org/wp-content/uploads/2015/06/asiana-214-crash-600.jpg
)
  • Google, Facebook, and others tend to shy away from OG tags for the purpose of a title and description. This is because once a post garners likes, a title change can associate an individual with content that doesn't reflect their initial action.
  • Returning a specific value was an afterthought and lacks any kind of error checking. We wrote this function to return an array, and that's its primary purpose.
  • We use an extended version of this code to build preview pages for our truncating services such as shor.tt and fat.ly.
  • Some data you might want from a website isn't included in meta tags. For example, the title. To extract the title from an HTML page, the following function applies.
<?php
/*
 Retrieve Title From a Website With PHP
 http://www.beliefmedia.com/retrieve-object-meta-tags
 Requires Simple Cache: http://www.beliefmedia.com/simple-php-cache
*/

function beliefmedia_get_title($url) {

  $title = '';

  /* Get data */
  $html = @file_get_contents($url);

  if ( ($html !== false) && (strlen($html) > 0) ) {

    /* Find title match (the s modifier allows a title that spans several lines) */
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $match)) {

      /* Remove breaks and white space */
      $title = trim(preg_replace('/\s+/', ' ', $match[1]));
    }
  }

  /* Return the title, or false if the request failed or no title was found */
  return ($title != '') ? $title : false;
}

/* Usage */
$url = 'http://www.flight.org/';
echo beliefmedia_get_title($url);
  • When we use the primary function, we extract the title tag and add it to the resulting meta array.
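
The extended function itself isn't shown here, so the following is only a sketch of how a title could be added to a meta array. The helper name example_add_title() and the array key 'title' are assumptions, and it works on an HTML string that has already been fetched rather than making its own request.

```php
<?php
/* Hypothetical helper: add the <title> text of an HTML string to a meta array */
function example_add_title(array $meta, $html) {

  if (is_string($html) && preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $match)) {

    /* Remove breaks and white space, as in the title function above */
    $text = trim(preg_replace('/\s+/', ' ', $match[1]));

    /* The 'title' key is an assumed name; an empty title is not added */
    if ($text !== '') $meta['title'] = $text;
  }

  return $meta;
}

/* Usage */
$html = "<html><head><title>\n  Flight and Aviation \n</title></head></html>";
$meta = example_add_title(array('og:type' => 'website'), $html);
print_r($meta);
```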

Download

The WordPress plugin (which requires the Simple Cache plugin) exists to support a number of our other plugins. By itself it does very little.


Title: Object Graph and Meta Tags (PHP)
Description: Retrieve Object Graph and Meta Tags From a Website With PHP. WP plugin supports some of our other plugins.
  Download • Version 0.2, 1.3K, zip, Category: PHP Code & Snippets
WordPress Plugins (General), (2.7K)    
