
Embed a Google Ngram Viewer In WordPress With Shortcode


An n-gram is a contiguous sequence of n items taken from a sample of text or speech, and n-gram analysis counts how often a given sequence is found across a body of texts. The items might be phonemes, prefixes, letters, words, or phrases. Because Google has set out to scan as many books as it can, it provides one of the most comprehensive sources of historical reference in which to search for n-gram patterns. The Google Books Ngram Viewer will output a graph that represents the use of a particular phrase in its scanned texts through time. More than one word or phrase may be used to generate a color-coded comparison showing how language has evolved or changed.

The Google Books Ngram Viewer refers to the text you’re searching as the “corpus”, and their tool can segregate searches by language or any number of limiting search criteria. This article will show you how to embed Google’s N-gram viewer into your WordPress post or page with shortcode.

Note: Because multiple Ngrams on the same page tend to screw with formatting and compromise load time, we’ve used screenshots for some examples.

The Result

For the first basic search we’ll generate a graph of references to the words ‘cat’ and ‘dog’ since 1880. The shortcode is [ngram start="1880" end="2018" phrase="cat,dog" casesensitive="0"]. The result:

Pictured: An Ngram graph showing a comparison of cat and dog in Google’s corpus between 1880 and 2018. It’s a case-insensitive search, so right-click on a graph line to expand the breakdown of word references.

An issue I’ve always struggled with is the battle between lazy American English and the Internationalised version, and this dissonance has impacted upon how I reference aircraft in aviation discussion papers and publications etc. Comparing the terms ‘aircraft’, ‘aeroplane’, and ‘airplane’ (the latter being the generally accepted US vernacular) returns the following chart (screenshot):

Google Books Ngram Inflection

I’ll always refrain from using anything other than ‘aircraft’, and the graph tends to support its widespread use. The shortcode used to generate the above chart was [ngram start="1880" end="2018" phrase="aircraft,airplane,aeroplane"]. Searching only British English shows ‘aeroplane’ in use far more often than ‘airplane’. Note the lack of references to any term prior to around 1900, before the aviation industry started to emerge. Also of interest is that the term ‘airplane’ only emerged as the primary term in the US after the end of the First World War.

In all our examples we’re conducting a mix of case-insensitive and case-sensitive searches. You can perform a case-sensitive search by using casesensitive="1" in your shortcode.
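To make the shortcode syntax used in these examples concrete, here is a minimal Python sketch of how a string such as [ngram start="1880" ...] breaks down into attribute names and values. WordPress does this parsing itself before a shortcode handler runs, so this is only an illustration; the function name parse_shortcode is hypothetical and not part of the download.

```python
import re

# Matches name="value" pairs such as start="1880" or phrase="cat,dog".
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcode(text, tag="ngram"):
    """Return the attributes of a single shortcode as a dict of strings.

    Hypothetical helper for illustration only. Returns None when the
    text is not a shortcode with the expected tag.
    """
    pattern = r'\[' + re.escape(tag) + r'\b([^\]]*)\]'
    match = re.fullmatch(pattern, text.strip())
    if match is None:
        return None
    return dict(ATTR_RE.findall(match.group(1)))


attrs = parse_shortcode(
    '[ngram start="1880" end="2018" phrase="cat,dog" casesensitive="0"]'
)
# The phrase attribute is comma delimited, so split it to get each term.
phrases = attrs["phrase"].split(",")
print(attrs)
print(phrases)
```

Every value arrives as a string, so a handler would still need to convert start, end and casesensitive to the types it expects.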

Wildcard Search

When you substitute a word with a * (asterisk), the Ngram Viewer will display the top ten substitutions. For instance, to find the most popular words following “full of”, search for “full of *”. Shortcode of [ngram start="1880" end="2018" phrase="full of *"] returns the following (screenshot):

Google Books Ngram Wildcard Search

You’ll play with this feature for hours! It has applications in the marketing arena… but it might also be used as quasi-scientific evidence of interest-based trends (not unlike Google Trends).

Inflection Search

An inflection is the modification of a word to represent various grammatical categories such as aspect, case, gender, mood, number, person, tense and voice. You can search for them by appending _INF to an ngram. For instance, searching “book_INF a hotel” will display results for “book”, “booked”, “books”, and “booking”. The shortcode of [ngram start="1920" end="2018" phrase="book_INF a hotel"] returns (screenshot):

Google Books Ngram Inflection

This kind of comparison tends to support the use of common or localised vernacular, and might be useful when preparing copy for an industry or region that may be a little foreign.

Part-of-speech Tags

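You may restrict results to a particular word usage by appending a part-of-speech tag to a word. For instance, tackle_VERB and tackle_NOUN are counted separately. A full list of tags is reproduced below (source: Google).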

Google Books Ngram Tags

Ngram Compositions

The Ngram Viewer provides five operators that you can use to combine ngrams: +, -, /, *, and :. Their use is as follows (screenshot, Source: Google).

Google Books Ngram Compositions

WordPress Shortcode

Copy and paste the WordPress function into your theme's functions.php file or, if you sensibly have one installed, your custom functions plugin.

If you require shortcode to work in a sidebar widget, you'll have to enable the functionality with a filter (typically by hooking do_shortcode onto the widget_text filter). If you're using our custom functions plugin, you'll have that feature enabled by default.

Shortcode Attributes

The following attributes are available. Reference should be made to Google’s documentation for more information.

phrase

The phrase is the comma delimited string of words or phrases that you would like to search and/or compare.

height

The height of the Ngram iframe. If a height is set as false (height="0") we’ll use 1.8 as a height ratio.

width

The width of the Ngram iframe.

start

The start date for results. The year 1800 seems to be about the earliest for which accurate results can be obtained.

end

The end date for results. For example, start="1840" end="1915".

corpus

The corpus to search. Results differ between corpora: Frankenstein doesn’t appear in Russian books, so if you search in the Russian corpus you’ll see a flatline. You can choose the corpus via the attribute, or for a single phrase with the : operator, e.g., Frankenstein:eng_2012. Source: Google.

smoothing

Often trends become more apparent when data is viewed as a moving average. A smoothing of 1 means that the data shown for 1950 will be an average of the raw count for 1950 plus 1 value on either side: (“count for 1949” + “count for 1950” + “count for 1951”), divided by 3. So a smoothing of 10 means that 21 values will be averaged: 10 on either side, plus the target value in the center of them. At the left and right edges of the graph, fewer values are averaged. With a smoothing of 3, the leftmost value (pretend it’s the year 1950) will be calculated as (“count for 1950” + “count for 1951” + “count for 1952” + “count for 1953”), divided by 4. A smoothing of 0 means no smoothing at all: just raw data. Source: Google.
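The description above can be sketched in a few lines of Python. This is an illustration of the moving average as described, not Google's own code: the function name smooth is hypothetical, and the counts are made-up numbers used only to show the arithmetic.

```python
def smooth(counts, smoothing):
    """Centred moving average whose window shrinks at the edges.

    counts    -- list of raw yearly values, oldest first
    smoothing -- number of values to include on either side
    """
    if smoothing < 0:
        raise ValueError("smoothing must be zero or greater")
    smoothed = []
    for i in range(len(counts)):
        # Clamp the window so it never runs past either end of the data.
        lo = max(0, i - smoothing)
        hi = min(len(counts), i + smoothing + 1)
        window = counts[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up counts for five consecutive years (illustration only).
raw = [3, 6, 9, 12, 15]
print(smooth(raw, 0))  # no smoothing: the raw values as floats
print(smooth(raw, 1))  # each value averaged with one neighbour per side
```

With a smoothing of 1, the middle value is (6 + 9 + 12) / 3, while the first value has no left-hand neighbour and is averaged over two values only, which matches the edge behaviour described above.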

casesensitive

Using casesensitive="0" performs a case-insensitive search. It is case-SenSItiVe by default.
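To show how the attributes above might fit together, here is a rough Python sketch that turns them into an iframe tag. It is not the code from the download. The function name ngram_iframe, the default values, the interactive_chart URL and its query parameter names (content, year_start, year_end, corpus, smoothing, case_insensitive) are all assumptions based on the links the Ngram Viewer has generated, and Google may change them. The reading of the 1.8 height ratio as width divided by 1.8 is also an assumption.

```python
from html import escape
from urllib.parse import urlencode

# Assumed endpoint; check the embed code Google currently provides.
BASE_URL = "https://books.google.com/ngrams/interactive_chart"


def ngram_iframe(phrase, start=1800, end=2018, corpus=None,
                 smoothing=3, casesensitive=True, width=900, height=0):
    """Build an iframe tag for a comma delimited phrase (hypothetical)."""
    params = {
        "content": phrase,
        "year_start": start,
        "year_end": end,
        "smoothing": smoothing,
    }
    if corpus is not None:
        params["corpus"] = corpus
    if not casesensitive:
        # Assumed flag name and value for a case-insensitive search.
        params["case_insensitive"] = "on"
    if not height:
        # Assumption: a height of 0 falls back to width / 1.8.
        height = round(width / 1.8)
    src = BASE_URL + "?" + urlencode(params)
    # Escape the URL so the ampersands are valid inside an HTML attribute.
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(src), width, height
    )


print(ngram_iframe("cat,dog", start=1880, end=2018, casesensitive=False))
```

Building the query string with urlencode keeps phrases containing spaces, commas or asterisks safe to place in a URL.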

Considerations

  • The Google Books Ngram viewer page is the most appropriate location to get more information. They show a number of examples that demonstrate how the API might be used.
  • All the data is released under a Creative Commons Attribution 3.0 Unported license and is available for download. It’s likely we’ll build our own API to drive some of our own ideas.
  • If you would like more information on the Ngram in general, Wikipedia is a good start.
  • A musical Ngram viewer is also available, and its data can be downloaded as well.
  • Using multiple ngrams on the one page often returns erroneous results (which is why we’ve used screenshots). In cases where the service is experiencing downtime the graph will return a flatline.

Download

Title: Embed Google Ngram Viewer In WordPress With Shortcode
Description: Embed Google Ngram Viewer In WordPress With Shortcode. Includes basic options to alter Ngram appearance.
Download: Shortcode (V0.1) | Plugin Page

Shortt URL for this post: http://shor.tt/I6e
