An Ngram (or n-gram) is a contiguous sequence of n items taken from text or speech, and Ngram analysis counts how often such a pattern is found in various texts. That pattern might include phonemes, prefixes, phrases, or letters. Given Google has pledged to scan every book ever written, it provides one of the most comprehensive sources of historical reference in which to search for N-gram patterns. Google Books Ngram Viewer will output a graph that represents the use of a particular phrase in its scanned texts through time. More than one word or phrase may be used to generate a color-coded comparison showing how language has evolved or changed.
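To make the idea concrete, here's a minimal Python sketch (it is not part of the WordPress code in this article) that counts word pairs in a single sentence. The sentence and the function name `ngrams` are made up for illustration, and Google's own tokenisation is certainly more sophisticated than a plain split on spaces.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample text, split naively on whitespace
text = "the cat sat on the mat and the cat slept"
tokens = text.split()

# Count how often each 2-gram (bigram) occurs
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(1))
```

Changing the second argument to 1 or 3 counts single words or three-word phrases instead.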
The Google Books Ngram Viewer refers to the text you're searching as the "corpus", and the tool can restrict searches by language or any number of other limiting criteria. This article will show you how to embed Google's N-gram viewer into your WordPress post or page with shortcode.
Note: Because multiple Ngrams on the same page tend to break formatting and slow load time, we've used screenshots for some examples.
For the first basic search we'll generate a graph of references to the words 'cat' and 'dog' since 1880. Shortcode of
[ngram start="1880" end="2018" phrase="cat,dog" casesensitive="0"]. The result:
Pictured: An Ngram graph showing a comparison of cat and dog in Google's corpus between 1880 and 2018. It's a case-insensitive search, so right-click on the graph line to expand the breakdown of word references.
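For those curious how shortcode attributes might translate into a request to the viewer, here's a rough Python sketch. It is not the WordPress function used on this site. The query parameter names (`content`, `year_start`, `year_end`, `case_insensitive`) and the `/ngrams/graph` path are assumptions based on URLs the Ngram Viewer has been seen to produce, and the fallback years are hypothetical defaults.

```python
import re
from urllib.parse import urlencode


def parse_shortcode(shortcode):
    """Extract key="value" attribute pairs from a [ngram ...] shortcode."""
    return dict(re.findall(r'(\w+)="([^"]*)"', shortcode))


def ngram_url(attrs):
    """Build a viewer URL; the parameter names are assumptions, not a documented API."""
    params = {
        "content": attrs.get("phrase", ""),
        "year_start": attrs.get("start", "1800"),  # hypothetical default
        "year_end": attrs.get("end", "2018"),      # hypothetical default
    }
    # casesensitive="0" requests a case-insensitive search
    if attrs.get("casesensitive", "1") == "0":
        params["case_insensitive"] = "true"
    return "https://books.google.com/ngrams/graph?" + urlencode(params)


attrs = parse_shortcode(
    '[ngram start="1880" end="2018" phrase="cat,dog" casesensitive="0"]'
)
url = ngram_url(attrs)
print(url)
```

The comma in the phrase is percent-encoded by `urlencode`, so each comma-separated term still arrives as its own series.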
An issue I've always struggled with is the battle between lazy American English and the Internationalised version, and this dissonance has impacted upon how I reference aircraft in aviation discussion papers and publications etc. Comparing the terms 'aircraft', 'aeroplane', and 'airplane' (the latter being the generally accepted US vernacular) returns the following chart (screenshot):
I'll always refrain from using anything other than 'aircraft', and the graph tends to support its widespread use. Shortcode used to generate the above chart was as follows:
[ngram start="1880" end="2018" phrase="aircraft,airplane,aeroplane"]. Searching only British English shows 'aeroplane' in use far more often than 'airplane'. Note the lack of references to any term prior to around 1900, before the aviation industry started to emerge. Also of interest is that the term 'airplane' only emerged as the primary term in the US after the cessation of the First World War.
Across our examples we're conducting a mix of case-insensitive and case-sensitive searches. You can perform a case-sensitive search by using
casesensitive="1" in your shortcode.
When you substitute a word with a * (asterisk), the Ngram Viewer will display the top ten substitutions. For instance, to find the most popular words following "full of", search for "full of *". Shortcode of
[ngram start="1880" end="2018" phrase="full of *"] returns the following (screenshot):
You'll play with this feature for hours! It has applications in the marketing arena... but it might also be used as quasi-scientific evidence of interest-based trends (not unlike Google Trends).
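If you're wondering what a wildcard search amounts to, the following Python sketch ranks the phrases that match a pattern by how often they occur. The counts are invented purely for illustration and the function name `top_substitutions` is hypothetical; it only mimics the idea and says nothing about how Google implements it.

```python
from collections import Counter

# Hypothetical trigram counts, invented purely for illustration
trigram_counts = Counter({
    ("full", "of", "life"): 120,
    ("full", "of", "hope"): 95,
    ("full", "of", "water"): 60,
    ("cup", "of", "tea"): 200,
})


def top_substitutions(counts, pattern, limit=10):
    """Rank the ngrams matching a pattern in which "*" stands for any one word."""
    def matches(ngram):
        return len(ngram) == len(pattern) and all(
            p == "*" or p == w for p, w in zip(pattern, ngram)
        )

    matching = Counter({ng: c for ng, c in counts.items() if matches(ng)})
    return matching.most_common(limit)


top = top_substitutions(trigram_counts, ("full", "of", "*"))
for ngram, count in top:
    print(" ".join(ngram), count)
```

The default `limit` of 10 mirrors the top ten substitutions the viewer displays.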
An inflection is the modification of a word to represent various grammatical categories such as aspect, case, gender, mood, number, person, tense and voice. You can search for them by appending _INF to an ngram. For instance, searching "book_INF a hotel" will display results for "book", "booked", "books", and "booking". The shortcode of
[ngram start="1920" end="2018" phrase="book_INF a hotel"] returns (screenshot):
This kind of comparison tends to support the use of common or localised vernacular, and might be useful when preparing copy for an industry or region that may be a little foreign.
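As a rough picture of what the _INF tag does, this Python sketch expands a tagged phrase into one phrase per inflected form. The lookup table is hypothetical and holds only the forms listed above for "book"; the viewer works out inflections itself, so treat this as an illustration rather than a description of its internals.

```python
# Hypothetical inflection table; the viewer derives these forms itself
INFLECTIONS = {"book": ["book", "booked", "books", "booking"]}


def expand_inflections(phrase, table=INFLECTIONS):
    """Expand the first word tagged _INF into one phrase per inflected form."""
    words = phrase.split()
    for i, word in enumerate(words):
        if word.endswith("_INF"):
            base = word[:-len("_INF")]
            # Fall back to the bare word when the table has no entry for it
            forms = table.get(base, [base])
            return [" ".join(words[:i] + [form] + words[i + 1:]) for form in forms]
    return [phrase]


expanded = expand_inflections("book_INF a hotel")
for phrase in expanded:
    print(phrase)
```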
You may restrict results by how a word is used (its part of speech) by appending a tag such as _NOUN or _VERB to the word. A full list of tags is reproduced below (source: Google).
The Ngram Viewer provides five operators that you can use to combine ngrams: +, -, /, *, and :. Their use is as follows (screenshot, Source: Google).
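Google's documentation describes + as summing two series and / as dividing one by another. The Python sketch below shows what that means year by year. The frequencies are invented for illustration and the helper `combine` is hypothetical; it covers only those two arithmetic operators and doesn't attempt the others.

```python
# Hypothetical yearly frequencies, invented purely for illustration
cat = {1900: 0.0020, 1950: 0.0025, 2000: 0.0030}
dog = {1900: 0.0030, 1950: 0.0040, 2000: 0.0050}


def combine(left, right, op):
    """Apply an arithmetic operator year by year to two frequency series."""
    return {year: op(left[year], right[year]) for year in left if year in right}


total = combine(cat, dog, lambda a, b: a + b)  # comparable to (cat + dog)
ratio = combine(cat, dog, lambda a, b: a / b)  # comparable to (cat / dog)

for year in sorted(total):
    print(year, round(total[year], 4), round(ratio[year], 3))
```

Division is handy when you care about one term relative to another rather than its raw frequency.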
Copy and paste the WordPress function into your theme's
functions.php file or, if you sensibly have one installed, your custom functions plugin.
The following attributes are available. Reference should be made to Google's documentation for more information.
height="0") we'll use 1.8 as a height ratio.
casesensitive="0" will disregard case sensitivity. It is case-SenSItiVe by default.
- The Google Books Ngram viewer page is the most appropriate location to get more information. They show a number of examples that demonstrate how the API might be used.
- All the data is released under a Creative Commons Attribution 3.0 Unported license. All data is available for download here. It's likely we'll build our own API to drive some of our own ideas.
- If you would like more information on the Ngram in general, Wikipedia is a good start.
- A musical Ngram viewer is available here. Data is also available for download.
- Using multiple ngrams on the one page often returns erroneous results (which is why we've used screenshots). In cases where the service is experiencing downtime the graph will return a flatline.