Australia's Leading Digital Marketing Experts. T. 1300 235 433  |  Aggregation Enquiries Welcome

Create Keyword Tags from Text With PHP

We wrote this function back in 2009 and it has since found its way onto hundreds - if not thousands - of other websites. The function was initially written to extract suggested keywords from a string of text that the user could optionally apply to a post. We've not modified it since migrating it from Internoetics, so we'll likely update it soon.

Creating the Keywords/Tags

Given the text body, we wanted to accomplish the following:

  • Create candidate keywords from every word in the text.
  • Keep a 'blacklist' of $commonWords that are removed from the returned keyword array.
  • Optionally compare the extracted keywords (the $words array) with an array of permitted keywords, so that only words that are also in the $allowedWords array are returned.
  • Restrict keyword output to words of at least 'n' characters in length.
  • Limit keywords to those that appear a minimum of 'n' times in the submitted text.
  • Specify how many keywords, in total, are returned.
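
As a rough illustration of the first two points, the sketch below tokenises a sentence and strips a small blacklist. The three-word $commonWords list and the sample sentence are placeholders made up for this example; the real list runs to several hundred words.

```php
<?php
// Illustrative only: a tiny stand-in for the real $commonWords list.
$commonWords = array('the', 'on', 'is');

$text = "The bleed system on the 787 is the anti-ice system.";

// Replace everything except letters, digits and spaces, then collapse whitespace.
$clean = preg_replace('/[^\p{L}0-9 ]/', ' ', $text);
$clean = trim(preg_replace('/\s+/', ' ', $clean));

// Every remaining word is a candidate keyword.
$words = explode(' ', $clean);

// Drop blacklisted words, ignoring case ("The" matches "the").
$words = array_values(array_udiff($words, $commonWords, 'strcasecmp'));

echo implode(', ', $words); // bleed, system, 787, anti, ice, system
```

Note that the hyphen in "anti-ice" is treated as a separator, so "anti" and "ice" are counted as separate words.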

Example

Take, for example, the following block of text (extracted from another blog post):

Many systems that traditionally had a reliance on the pneumatic system have been transitioned to the electrical architecture. They include engine start, API start, wing ice protection, hydraulic pumps and cabin pressurisation. The only remaining bleed system on the 787 is the anti-ice system for the engine inlets. In fact, Boeing claims that the move to electrical systems has reduced the load on engines (from pneumatic hungry systems) by up to 35 percent (not unlike today's electrically power flight simulators that use 20% of the electricity consumed by the older hydraulically actuated flight sims).

Usage:

echo beliefmedia_keywords($text, 3, 2, false, 8, true);

Output: ice, pneumatic, engine, electrical

The extracted keywords aren't ideal... but they are a good starting point for 'suggested' tags that the end user can refine.

The PHP Function

You should download the PHP function below. The $commonWords array is several hundred words in length so it wasn't practical reproducing it in this post.

<?php
/*
 Create Keyword Tags from Text With PHP
 http://www.beliefmedia.com/create-keywords
*/

function beliefmedia_keywords($string, $min_word_length = 3, $min_word_occurrence = 2, $as_array = false, $max_words = 8, $restrict = false) {

   /* Sort callback: highest count first. Guarded so that calling
      beliefmedia_keywords() more than once does not redeclare it. */
   if (!function_exists('keyword_count_sort')) {
      function keyword_count_sort($first, $sec) {
         return $sec[1] - $first[1];
      }
   }

   /* Replace anything that is not a letter, digit or space, then collapse whitespace */
   $string = preg_replace('/[^\p{L}0-9 ]/', ' ', $string);
   $string = trim(preg_replace('/\s+/', ' ', $string));

   $words = explode(' ', $string);

   /*
    Only compare to common words if $restrict is set to false
    Tags are returned based on any word in text
    If we don't restrict tag usage, we'll remove common words from array
   */

   if ($restrict === false) {
      $commonWords = array('a','able','about','above','abroad','according','accordingly','across','actually','adj','after','afterwards','again','against','ago','ahead','ain\'t','all','allow','allows','almost','alone','along','alongside','already','also','although','always','am','amid','amidst','among','amongst','an','and','another','any','anybody','anyhow','anyone','anything','anyway','anyways','anywhere','apart','appear','appreciate','appropriate','are','aren\'t','around','as','a\'s','aside','ask','asking','associated','at','available','away','awfully',
         'b','back','backward','backwards','be','became','because','become','becomes','becoming','been','before','beforehand','begin','behind','being','believe','below','beside','besides','best','better','between','beyond','both','brief','but','by',
         'c','came','can','cannot','cant','can\'t','caption','cause','causes','certain','certainly','changes','clearly','c\'mon','co','co.','com','come','comes','concerning','consequently','consider','considering','contain','containing','contains','corresponding','could','couldn\'t','course','c\'s','currently',
         'd','dare','daren\'t','definitely','described','despite','did','didn\'t','different','directly','do','does','doesn\'t','doing','done','don\'t','down','downwards','during',
         'e','each','edu','eg','eight','eighty','either','else','elsewhere','end','ending','enough','entirely','especially','et','etc','even','ever','evermore','every','everybody','everyone','everything','everywhere','ex','exactly','example','except',
         'f','fairly','far','farther','few','fewer','fifth','first','five','followed','following','follows','for','forever','former','formerly','forth','forward','found','four','from','further','furthermore',
         'g','get','gets','getting','given','gives','go','goes','going','gone','got','gotten','greetings',
         'h','had','hadn\'t','half','happens','hardly','has','hasn\'t','have','haven\'t','having','he','he\'d','he\'ll','hello','help','hence','her','here','hereafter','hereby','herein','here\'s','hereupon','hers','herself','he\'s','hi','him','himself','his','hither','home','hopefully','how','howbeit','however','hundred',
         'i','i\'d','ie','if','ignored','i\'ll','i\'m','immediate','in','inasmuch','inc','inc.','indeed','indicate','indicated','indicates','inner','inside','insofar','instead','into','inward','is','isn\'t','it','it\'d','it\'ll','its','it\'s','itself','i\'ve',
         'j','just','k','keep','keeps','kept','know','known','knows',
         'l','last','lately','later','latter','latterly','least','less','lest','let','let\'s','like','liked','likely','likewise','little','look','looking','looks','low','lower','ltd',
         'm','made','mainly','make','makes','many','may','maybe','mayn\'t','me','mean','meantime','meanwhile','merely','might','mightn\'t','mine','minus','miss','more','moreover','most','mostly','mr','mrs','much','must','mustn\'t','my','myself',
         'n','name','namely','nd','near','nearly','necessary','need','needn\'t','needs','neither','never','neverf','neverless','nevertheless','new','next','nine','ninety','no','nobody','non','none','nonetheless','noone','no-one','nor','normally','not','nothing','notwithstanding','novel','now','nowhere',
         'o','obviously','of','off','often','oh','ok','okay','old','on','once','one','ones','one\'s','only','onto','opposite','or','other','others','otherwise','ought','oughtn\'t','our','ours','ourselves','out','outside','over','overall','own',
         'p','particular','particularly','past','per','perhaps','placed','please','plus','possible','presumably','probably','provided','provides',
         'q','que','quite','qv',
         'r','rather','rd','re','really','reasonably','recent','recently','regarding','regardless','regards','relatively','respectively','right','round',
         's','said','same','saw','say','saying','says','second','secondly','see','seeing','seem','seemed','seeming','seems','seen','self','selves','sensible','sent','serious','seriously','seven','several','shall','shan\'t','she','she\'d','she\'ll','she\'s','should','shouldn\'t','since','six','so','some','somebody','someday','somehow','someone','something','sometime','sometimes','somewhat','somewhere','soon','sorry','specified','specify','specifying','still','sub','such','sup','sure',
         't','take','taken','taking','tell','tends','th','than','thank','thanks','thanx','that','that\'ll','thats','that\'s','that\'ve','the','their','theirs','them','themselves','then','thence','there','thereafter','thereby','there\'d','therefore','therein','there\'ll','there\'re','theres','there\'s','thereupon','there\'ve','these','they','they\'d','they\'ll','they\'re','they\'ve','thing','things','think','third','thirty','this','thorough','thoroughly','those','though','three','through','throughout','thru','thus','till','to','together','too','took','toward','towards','tried','tries','truly','try','trying','t\'s','twice','two',
         'u','un','under','underneath','undoing','unfortunately','unless','unlike','unlikely','until','unto','up','upon','upwards','us','use','used','useful','uses','using','usually',
         'v','value','various','versus','very','via','viz','vs',
         'w','want','wants','was','wasn\'t','way','we','we\'d','welcome','well','we\'ll','went','were','we\'re','weren\'t','we\'ve','what','whatever','what\'ll','what\'s','what\'ve','when','whence','whenever','where','whereafter','whereas','whereby','wherein','where\'s','whereupon','wherever','whether','which','whichever','while','whilst','whither','who','who\'d','whoever','whole','who\'ll','whom','whomever','who\'s','whose','why','will','willing','wish','with','within','without','wonder','won\'t','would','wouldn\'t',
         'x','y','yes','yet','you','you\'d','you\'ll','your','you\'re','yours','yourself','yourselves','you\'ve','z','zero');
      $words = array_udiff($words, $commonWords, 'strcasecmp');
   }

   /* Restrict Keywords based on values in the $allowedWords array */
   if ($restrict !== false) {
      $allowedWords = array('engine','boeing','electrical','pneumatic','ice');
      $words = array_uintersect($words, $allowedWords, 'strcasecmp');
   }

   $keywords = array();

   /* Count occurrences of each word that meets the minimum length */
   while (($c_word = array_shift($words)) !== null) {

      if (strlen($c_word) < $min_word_length) continue;
      $c_word = strtolower($c_word);

      if (array_key_exists($c_word, $keywords)) $keywords[$c_word][1]++;
      else $keywords[$c_word] = array($c_word, 1);
   }

   usort($keywords, 'keyword_count_sort');
   $final_keywords = array();

   foreach ($keywords as $keyword_det) {
      if ($keyword_det[1] < $min_word_occurrence) break;
      array_push($final_keywords, $keyword_det[0]);
   }

   $final_keywords = array_slice($final_keywords, 0, $max_words);

   return $as_array ? $final_keywords : implode(', ', $final_keywords);
}

Usage is as follows:

<?php
/* Usage */
$string = "Many systems that traditionally had a reliance on the pneumatic system have been transitioned to the electrical architecture. They include engine start, API start, wing ice protection, hydraulic pumps and cabin pressurisation. The only remaining bleed system on the 787 is the anti-ice system for the engine inlets. In fact, Boeing claims that the move to electrical systems has reduced the load on engines (from pneumatic hungry systems) by up to 35 percent (not unlike today's electrically power flight simulators that use 20% of the electricity consumed by the older hydraulically actuated flight sims).";
/* Arguments: $string, $min_word_length, $min_word_occurrence, $as_array, $max_words, $restrict */
echo beliefmedia_keywords($string, 3, 2, false, 8, true);

Notes on Usage

In the above example, if $restrict were set to false rather than true, the tags returned would be system, systems, engine, start, ice. This is because we're only omitting the $commonWords from the result (and evaluating every other word for consideration). The result is less accurate than comparing against a preferred keyword array.

The most accurate results are obtained from refining the $allowedWords array and including as many subject-specific words as possible to cover all preferred tags.
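
The function above hard-codes $allowedWords. One way to make the list easier to refine is to pass it in from outside. The sketch below assumes a hypothetical helper, restrict_to_allowed(), which is not part of the downloadable function, and an example word list that you would replace with your own subject terms.

```php
<?php
// Hypothetical helper (not part of the original function): keep only the
// words that appear in a caller-supplied list, ignoring case.
function restrict_to_allowed(array $words, array $allowedWords) {
    return array_values(array_uintersect($words, $allowedWords, 'strcasecmp'));
}

// Example list for an aviation blog; extend it with your own subject terms.
$allowedWords = array('engine', 'boeing', 'electrical', 'pneumatic', 'ice', 'hydraulic');

$words = array('Boeing', 'claims', 'electrical', 'systems', 'reduced', 'engine', 'load');

print_r(restrict_to_allowed($words, $allowedWords));
// Keeps Boeing, electrical and engine; everything else is discarded.
```

Because the comparison uses strcasecmp, "Boeing" in the text matches "boeing" in the list.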

$min_word_length sets the minimum length of a word to be considered. In our case, any word shorter than 3 characters is ignored.
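
A short sketch of how the length filter interacts with the punctuation stripping may help here. The sample sentence is made up for illustration; the two regular expressions are the ones used in the function.

```php
<?php
$min_word_length = 3;

// Punctuation (including apostrophes and hyphens) becomes a space,
// so "Don't" splits into "Don" and "t", and "anti-ice" into "anti" and "ice".
$clean = preg_replace('/[^\p{L}0-9 ]/', ' ', "Don't skip the anti-ice check on an A320");
$words = explode(' ', trim(preg_replace('/\s+/', ' ', $clean)));

// Keep only words that are at least $min_word_length characters long.
$kept = array_values(array_filter($words, function ($word) use ($min_word_length) {
    return strlen($word) >= $min_word_length;
}));

echo implode(', ', $kept); // Don, skip, the, anti, ice, check, A320
```

The stray "t" is dropped by the length check, but "Don" survives it. Also note that strlen() counts bytes, so accented characters in UTF-8 text count for more than one.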

$min_word_occurrence determines how many times a word must appear in the text before it is considered for inclusion in the returned keywords.
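
The function tallies words in a loop. As an alternative sketch of the same idea (not the code used above), PHP's built-in array_count_values() can do the counting, with array_filter() applying the threshold. The word list is a made-up sample.

```php
<?php
$min_word_occurrence = 2;

$words = array('system', 'engine', 'ice', 'system', 'wing', 'engine', 'system');

// Tally each word: system => 3, engine => 2, ice => 1, wing => 1
$counts = array_count_values($words);

// Discard words that appear fewer than $min_word_occurrence times.
$frequent = array_filter($counts, function ($count) use ($min_word_occurrence) {
    return $count >= $min_word_occurrence;
});

print_r($frequent); // system => 3, engine => 2
```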

$as_array specifies whether the keywords are returned as an array (true) or as a comma-separated string (false).

$max_words determines the maximum number of words to return in the keyword string.
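
When several words share the same count, the order in which they come back depends on how PHP's sort treats ties, which is why your output order may differ from ours. The sketch below, which assumes PHP 7 or later for the <=> operator and uses made-up counts, shows one way to rank by count, break ties alphabetically, and then apply the $max_words limit.

```php
<?php
$max_words = 3;

// Word => occurrence count, as produced by a counting step.
$counts = array('pneumatic' => 2, 'system' => 3, 'ice' => 2, 'engine' => 2);

// Sort by count (highest first); break ties alphabetically so the
// result does not depend on the sort implementation.
uksort($counts, function ($a, $b) use ($counts) {
    return array($counts[$b], $a) <=> array($counts[$a], $b);
});

// Keep at most $max_words keywords.
$top = array_slice(array_keys($counts), 0, $max_words);

echo implode(', ', $top); // system, engine, ice
```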

Download

The second half of this article (previously shared on Internoetics) will be published soon.


Title: Create Keyword Tags from Text With PHP
Description: Create Keyword Tags from Text With PHP.
  Download • Version 0.2, 3.0K, zip, Category: PHP Code & Snippets
