
Using the Matrix API to Check for Duplication, AI, or Plagiarism

The fuel that feeds Search Engines is content that fulfils Google's expectation of Expertise, Experience, Authoritativeness, and Trust (abbreviated by Google as EEAT), and this can only be achieved by producing original and relevant material that keeps users engaged. It's not rocket science. We've borrowed Google's EEAT attributes as part of our own Magic Lantern model and expanded it to include Expertise, Experience, Education, Entertainment, and Authoritativeness (EEEEAT, or E+AT), and these are the broad components that underpin our funnel-based 'escalation of commitment' that delivers consumer 'Trust'. Failing to deliver on any Lantern attribute, or compromising any attribute, works in the same way with Google as it does with your users... and duplicate blog content is near the top of the list.

This article looks briefly at duplicate content, and how to search for duplicate or similar content when measured against the broader 'Financial Web' (via our Matrix API).

Best Practice Citations and Attribution: In the course of querying Matrix for various examples while creating this article, we identified duplicate blog content at a scale that was unexpected. As such, we've scheduled a short article titled "Content Attribution and Citing Sources in Article Content". That article details how to cite, reference, source, and assign attribution to nested elements in article content. It also discusses duplicate content a little more and references some of Google's guidelines.

Site SEO: Financial websites are already swimming against the current when it comes to exposure (by way of their "Your Money or Your Life" mantra), so blog content should be a creative outlet that allows you to expose yourself, your creativity, your brand, and your personality in a unique way... and duplicate content is contrary to this primary content objective. As a significant component of Yabber, we've talked a lot about Search Engine Optimisation (SEO) in the past, and a pending article details advances made in our own SEO presence. This article was intended to avoid diving deep into the semantics of SEO, so any SEO information should be considered cursory.

The Matrix API

The Matrix API is essentially a search engine for mortgage broker websites. We also index real-estate, buyers agent, and financial planner websites, but the service was created for brokers, and it's the financial sector where the data is most complete. The service indexes every bit of available page information, measures each page against every other page and against major search engines, and returns an index of site and/or page performance for every known broker in the country, essentially mapping what we call the 'Financial Web'. Supported by a free industry plugin that a few hundred brokers are using in some capacity, we're able to parse statistics from websites or digital products that aren't ours for the purpose of a direct comparison.

When we say that our mortgage broker website converts more clients than competitors, it's not hyperbole, marketing fluff, or rhetoric - it's backed by solid objective, real-world data, and Matrix plays a significant part in this resolution. In fact, our site converts 12X more clients than the site offered by what might be considered our closest competitor.

Duplicate Content

If all you publish is duplicate content, Google and other Search engines will easily recognise the lack of any expertise or authoritativeness in your content and ultimately resolve that you either can't be trusted, or that your website isn't worthy of ranking with authority.

Duplicate content isn't necessarily bad, because some content needs to be reproduced by businesses in its entirety, such as press releases, cash rate board meetings, property listings, lender data, and so on. When this duplicate content is used, there's a key indicator that we feed to Search engines that lets them know we're not the original source. This is done by a snippet of code called the 'canonical URL': a single URL that references the original publisher, assigns search value to that other page, and indicates that our page shouldn't be evaluated for uniqueness. If this instruction isn't present on your page, you run the risk of having your site penalised or, even worse, when the duplicate nature of your blog content is overwhelming, having your website deindexed entirely. Possibly worse still is when you try to rewrite components of a duplicate article in your own voice but fail to inject sufficient differentiation, causing your content to be identified as plagiarism.
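As a rough illustration of the canonical check described above, the following sketch reads the canonical link element from a page's HTML and reports whether it is missing, points back at the page itself, or points at another publisher. It uses only the Python standard library. The function and class names are ours for illustration and are not part of Matrix.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonicals.append(attr.get("href", "").strip())


def canonical_status(page_url, html):
    """Return 'missing', 'multiple', 'self', or 'external' for a page."""
    finder = CanonicalFinder()
    finder.feed(html)
    # Resolve relative hrefs against the page URL and ignore empty ones
    found = [urljoin(page_url, href) for href in finder.canonicals if href]
    if not found:
        return "missing"
    if len(set(found)) > 1:
        return "multiple"
    # Trailing slashes are ignored for this comparison (a simplification)
    return "self" if found[0].rstrip("/") == page_url.rstrip("/") else "external"
```

A syndicated article that carries the correct tag should report 'external'. A result of 'self' or 'missing' on a copied article is the situation this section warns about.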

First Come, First Served: Google will typically index the first article it finds as the 'source' article, and anything else is a copy. As more articles reference the canonical URL the article dynamic will change, but if you're not including the required tag you're in for a world of SEO hurt.

This brings me to managed article services. First, we've had an article distribution service for a number of years where we've often posted in excess of 100 articles per month to client websites. What differentiates our service from others is that we strictly apply a canonical URL (and other markup) with every article, assigning value back to a hidden non-public archive. Areas of the standard broker website framework that are duplicate in nature, such as the Streets Module with its 20-million pages, and other search facilities such as BSB numbers and schools, also include a parent canonical URL.

There are also issues associated with your own content finding its way elsewhere (content theft), and we've identified numerous cases where SEO agencies lift articles and modify them only slightly so as to appear 'mildly' different to the original (but significantly similar to content algorithms). Matrix identifies these infractions.

In this article we'll look at how to evaluate your page against another, or against any page listed in the Matrix database. Given we've indexed thousands of broker websites with hundreds of thousands of pages, we'll likely identify malfeasance or crappy practice if it exists. We use the system daily to see who has stolen our content... and it happens all the time.

Article Services Suck

Our 'experimentation' with article services dates back to around 2007, and whatever method we've worked with, none of them align with best practice. Despite providing an article service ourselves that fully complies with SEO best practice, it is not a service I recommend. Any business that wants to elevate its organic SEO traffic and SERP position, or its local business profile, needs to manufacture original, engaging, entertaining, educational, and authoritative content of all types that exposes its expertise and authoritativeness. It's really that simple... and difficult.

An early decision has to be made whether you're focused on SEO or fluffy article content that works contrary to good search rankings, and it's the former that I recommend. If you are using our article program or another, you need to be given a clear guarantee that a canonical URL will be present in the duplicate article that references the source... and you need to understand that a site full of duplicate content will objectively and significantly decrease search visibility. You also need to understand that a site full of content that Search resolves as duplicate or plagiarism runs the risk of being delisted completely. Anybody that has article content delivered to their website will tell you that it hasn't had a noticeable impact on their digital success, so what is the real purpose of the content if it doesn't work?

There was a time when we'd only post an article to a client website when they'd posted a certain number themselves, so the properly referenced 'generic' articles only accounted for a smaller percentage of overall content, but the problem with this is that Yabber rarely posted anything because businesses rarely crafted their own original content. We had to make the decision to hand complete control to clients over what type of content was posted, and how often new articles were delivered, but the net result is that most businesses are compromising on the more lucrative organic traffic in favour of 'filler' articles that aren't indexed.

We're coming off what is a 10-week article hiatus where we paused our own article module for a massive update (relating to the social posts accompanying the shared content). As we prepare to kick the program back into gear, we'll be talking to individual clients about the inevitable consequences of the program.

An Article Service: One rather popular distribution service writes articles that are excellent, but their tech is quite poor, and their commitment to the canonical URL is appalling. We contacted the group a few years ago and offered them our system so that best practice might be applied. They declined. Instead, they use the Yoast SEO Plugin by default, a clumsy solution that adds unnecessary weight to your codebase, and as we'll show shortly, the canonical URL is seldom used. I'm not sure that brokers understand the negative Search consequences when these services are used (again, including our own).

Article Duplicates and Similarity

Google, Bing, and others don't publish article similarity percentages publicly, but the following bands tend to apply for best SEO considerations.

  • Above 80%. Duplicate content risk (may not rank well, and the content or website runs the risk of delisting).
  • Around 50% – 70%. This is acceptable for different audiences, provided there's unique value. It suggests that one article was highly influenced by the other.
  • Less than 50% similarity. While clearly influenced by another article, any score below around 50% is considered 'safely unique' by most engines and plagiarism checkers.
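The bands above can be expressed as a small lookup, sketched here in Python. The labels are our own shorthand, and because the list leaves the range between 70% and 80% undescribed, this sketch assumes it belongs with the 50% to 70% band.

```python
def seo_band(similarity_percent):
    """Map a similarity percentage (0-100) to the rough bands listed above."""
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be between 0 and 100")
    if similarity_percent > 80:
        return "duplicate risk"
    if similarity_percent >= 50:
        # Assumption: the undescribed 70-80% gap is grouped with 50-70%
        return "heavily influenced"
    return "safely unique"
```

For example, a score of 98.1848 falls into 'duplicate risk', while a score of 20 is 'safely unique'.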

If we had two businesses from the same industry that aren't known to each other create content on the same topic and influenced by the same source material, we might expect the following overlap:

  • 5% – 30%. Typical Range. Most independently-written articles fall here when influenced by the same idea or generic topic (where lots of typical language and vernacular is used). If lender data is reproduced, or company-provided bulleted points are provided, the score may be higher for shorter articles.
  • 30% – 50%. Higher Overlap. Common industry phrasing, similar outlines, jargon, etc.
  • Less than 15% similarity. Unique content. Generally very opinionated, stylistic, or very creative.

Again, anything over 50% overlap suggests that one article heavily influenced the other, and a canonical URL should be considered in those cases where the source is heavily referenced or copied. Additionally, appropriate content citations should be used to ensure similar content is suitably referenced.

For industries that aren't connected, or for those with a specific writing style, the overlap is significantly lower.

How is Similarity Calculated?

Our system uses a number of methods to compare one page against another. We use Shingles/Simhashing, Jaccard Similarity, Cosine Similarity (TF-IDF), and Semantic similarity algorithms to determine a highly accurate and reliable score. We use similar methods to extract website and page keywords.
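To give a feel for two of the methods named above, here is a minimal sketch of word-shingle Jaccard similarity and a cosine similarity over raw term counts. It is not the Matrix implementation: it omits Simhash, the IDF weighting, and the semantic model, and the three-word shingle size is an assumption chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, k=3):
    """Set of overlapping k-word shingles."""
    words = tokens(text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two shingle sets, as a percentage."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 100.0
    return 100.0 * len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity of raw term-frequency vectors, as a percentage."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return 100.0 * dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Shingle-based scores punish reordering and light paraphrasing less than you might expect, which is one reason a few changed words rarely move an article out of the duplicate range.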

Matrix Comparison Endpoint

The Matrix API compare endpoint is used when you want to compare one article against another for similarity. The Similarity (or similar) endpoint returns the same data and measures against all known websites, so the compare endpoint is rarely used for broader comparisons.

A request is made to matrix/compare/url_1/url_2/matrix.json with the API key as a parameter. The values of url_1 and url_2 are always returned via the page request, and usually with the get request when the page is first requested. You may also request all pages indexed on a site via browse or list, but these facilities are outside the scope of this article.
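A minimal sketch of building that request URL follows. The host name and the name of the API key parameter are not given in this article, so BASE_URL and KEY_PARAM below are placeholders only, and the values passed for url_1 and url_2 are whatever identifiers the page request returned.

```python
from urllib.parse import quote, urlencode

# Hypothetical values: the article does not give the API host or the
# name of the API key parameter.
BASE_URL = "https://api.example.com"
KEY_PARAM = "apikey"


def compare_url(url_1, url_2, api_key):
    """Build a request URL for matrix/compare/url_1/url_2/matrix.json."""
    # Percent-encode each path segment so any value is safe inside the path
    parts = [quote(str(p), safe="") for p in (url_1, url_2)]
    path = "/".join(["matrix", "compare", *parts, "matrix.json"])
    return f"{BASE_URL}/{path}?{urlencode({KEY_PARAM: api_key})}"
```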

Array
(
    [status] => 200
    [code] => 200
    [data] => Array
        (
            [compare] => Array
                (
                    [similarity_percent] => 100%
                    [similarity] => 100
                    [url_id_source] => 0
                    [url_1] => Array
                        (
                            [canonical] => https://xxxxxxxxxx.com.au/blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                            [canonical_url_id] => 3
                            [url] => https://xxxxxxxxxx.com.au/blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                            [url_id] => 34
                        )

                    [url_2] => Array
                        (
                            [canonical] => https://yyyyyyyyyy.com.au/our-blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                            [canonical_url_id] => 35
                            [url] => https://www.xxxxxxxxxx.com.au/our-blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                            [url_id] => 35
                        )

                )

        )

    [message] => Array
        (
            [0] => Success
        )

)

Neither article has a valid canonical URL. The first page (url_1) has two canonical URLs listed, but this is only because they've referenced an invalid canonical URL in their HTML. The www and non-www versions of a website are not the same, and only one should be used. Search will penalise the first site for its invalid 'endless loop' canonical, which is a common mistake. Both sites will be recognised as 100% duplicate content (indicated via the similarity key), and both websites will likely be penalised. If all their articles are listed in this manner (and they are), they both run the risk of Search prison.

The url_id_source key of '0' indicates that no source was found via canonical attribution.
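The checks described above can be sketched against the compare response once it has been decoded into Python dictionaries. The key names are those shown in the response. The issue labels, and the reading that a canonical_url_id differing from url_id means the canonical resolves to some other indexed URL, are our assumptions.

```python
from urllib.parse import urlsplit


def _bare(host):
    """Strip a leading 'www.' so www and non-www hosts can be compared."""
    return host[4:] if host.startswith("www.") else host


def canonical_issues(page):
    """List problems with one url_1/url_2 entry of a compare response."""
    issues = []
    url_host = urlsplit(page["url"]).hostname or ""
    canon_host = urlsplit(page["canonical"]).hostname or ""
    if url_host != canon_host:
        if _bare(url_host) == _bare(canon_host):
            issues.append("www/non-www mismatch")
        else:
            issues.append("canonical on another host")
    if page["canonical_url_id"] != page["url_id"]:
        # Assumption: differing ids mean the canonical resolved elsewhere
        issues.append("canonical resolves to a different url_id")
    return issues


def has_attributed_source(compare):
    """A url_id_source of 0 means no source was found via canonical attribution."""
    return int(compare["url_id_source"]) != 0
```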

Matrix Similarity Endpoint

We'll use the first page from above and check against all similar articles. The endpoint of matrix/similar/url_1/matrix.json with the API Key attribute returns the following (in total, 128 returned results, but we'll show just 10).

Array
(
    [status] => 200
    [code] => 200
    [url_id_source] => 42
    [data] => Array
        (
            [0] => Array
                (
                    [article_id] => 16
                    [common] => 595
                    [similarity] => 100.0000
                    [domain] => xxxxx.com.au
                    [domain_id] => 16
                    [url_id] => 55
                    [url] => https://xxxxx.com.au/blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical] => https://xxxxx.com.au/blog/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 55
                    [title] => Low cost renos to help keep your home cosy this autumn - xxxxx
                )

            [1] => Array
                (
                    [article_id] => 5
                    [common] => 595
                    [similarity] => 100.0000
                    [domain] => xxxxx.com.au
                    [domain_id] => 5
                    [url_id] => 36
                    [url] => https://xxxxx.com.au/post/252/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical] => https://xxxxx.com.au/post/252/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical_url_id] => 36
                    [title] => Low cost renos to help keep your home cosy this autumn
                )

            [2] => Array
                (
                    [article_id] => 13
                    [common] => 595
                    [similarity] => 100.0000
                    [domain] => xxxxx.com.au
                    [domain_id] => 13
                    [url_id] => 51
                    [url] => https://www.xxxxx.com.au/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical] => https://www.xxxxx.com.au/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 13
                    [title] => Low cost renos to help keep your home cosy this autumn - xxxxx
                )

            [3] => Array
                (
                    [article_id] => 11
                    [common] => 595
                    [similarity] => 100.0000
                    [domain] => xxxxx.com.au
                    [domain_id] => 11
                    [url_id] => 48
                    [url] => https://www.xxxxx.com.au/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical] => https://www.xxxxx.com.au/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 48
                    [title] => Low cost renos to help keep your home cosy this autumn
                )

            [4] => Array
                (
                    [article_id] => 9
                    [common] => 595
                    [similarity] => 98.1848
                    [domain] => xxxxx.com
                    [domain_id] => 9
                    [url_id] => 43
                    [url] => https://xxxxx.com/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical] => https://xxxxx/2025/03/12/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 44
                    [title] => Low cost renos to help keep your home cosy this autumn - Adelaide's Home, Business and Corporate Finance Specialist Brokers
                )

            [5] => Array
                (
                    [article_id] => 14
                    [common] => 557
                    [similarity] => 86.6252
                    [domain] => xxxxx.com.au
                    [domain_id] => 14
                    [url_id] => 52
                    [url] => https://www.xxxxx.com.au/posts/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical] => https://www.xxxxx.com.au/posts/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical_url_id] => 14
                    [title] => xxxxx
                )

            [6] => Array
                (
                    [article_id] => 17
                    [common] => 542
                    [similarity] => 85.2201
                    [domain] => xxxxx.com.au
                    [domain_id] => 17
                    [url_id] => 56
                    [url] => https://www.xxxxx.com.au/post/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical] => https://www.xxxxx.com.au/post/low-cost-renos-to-help-keep-your-home-cosy-this-autumn
                    [canonical_url_id] => 56
                    [title] => Low cost renos to help keep your home cosy this autumn
                )

            [7] => Array
                (
                    [article_id] => 12
                    [common] => 557
                    [similarity] => 84.1390
                    [domain] => xxxxx.com.au
                    [domain_id] => 12
                    [url_id] => 49
                    [url] => https://xxxxx.com.au/blog/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical] => https://xxxxx.com.au/blog/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 49
                    [title] => Low cost renos to help keep your home cosy this autumn
                )

            [8] => Array
                (
                    [article_id] => 6
                    [common] => 557
                    [similarity] => 82.8869
                    [domain] => xxxxx.com.au
                    [domain_id] => 6
                    [url_id] => 38
                    [url] => https://xxxxx.com.au/blog/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical] => https://xxxxx.com.au/blog/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 38
                    [title] => Low cost renos to help keep your home cosy this autumn
                )

            [----- SNIP -----]

            [127] => Array
                (
                    [article_id] => 822
                    [common] => 497
                    [similarity] => 82.8333
                    [domain] => xxxxx.com.au
                    [domain_id] => 803
                    [url_id] => 4411
                    [url] => https://xxxxx.com.au/low-cost-renos-to-help-keep-your-home-cosy-this-autumn/
                    [canonical] => https://xxxxx/2025/03/12/low-cost-renos-to-keep-your-home-cosy-this-autumn/
                    [canonical_url_id] => 42
                    [title] => Low cost renos to help keep your home cosy this autumn | xxxxx
                )

        )

    [message] => Array
        (
            [0] => Success
        )

)

Result Limit: An API parameter of limit will limit the number of evaluated articles. Given the large number of duplicate articles published to aggregator-supplied websites, you may request up to 1000 results in a single query. Defaults to 100.
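A small sketch of preparing the query parameters with that limit applied follows. Only the limit parameter, its default of 100, and its ceiling of 1000 come from this article. The key parameter name is a placeholder, and clamping on the client side is our choice, since the article doesn't say how the API responds to a larger value.

```python
DEFAULT_LIMIT = 100   # default stated in the article
MAX_LIMIT = 1000      # maximum stated in the article


def similar_params(api_key, limit=None, key_param="apikey"):
    """Query parameters for matrix/similar/url_1/matrix.json.

    key_param is a hypothetical name; only 'limit' is named in the article.
    """
    if limit is None:
        limit = DEFAULT_LIMIT
    limit = int(limit)
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Clamp rather than fail so an oversized request still returns data
    return {key_param: api_key, "limit": min(limit, MAX_LIMIT)}
```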

The source canonical URL was attributed in about 7% of all duplicate articles and resolved to be a source url_id of 42 (itself a value that can be queried). However, 93% of all articles had a self-pointing canonical, and in 29% of all these cases the canonical itself was an invalid internal URL. This is a mind-blowing and reckless violation of best-practice SEO, and the company behind distributing this article is responsible for the decimation of SEO best practice for those participating businesses. Those websites that have an invalid canonical URL will likely have their site identified as plagiaristic, and they all run the risk of removal from Search entirely.
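A breakdown of this kind can be tallied from a decoded similar response along the following lines. The three categories are a simplification of ours: the sketch has no way to tell whether a canonical is a valid URL, so it will not reproduce the 29% figure, and anything that is neither source-attributed nor self-pointing is simply counted as 'other'.

```python
from collections import Counter


def canonical_breakdown(response):
    """Percentage of results per canonical category.

    'source': canonical_url_id equals the response's url_id_source
    'self':   canonical_url_id equals the result's own url_id
    'other':  anything else (for example a canonical that resolves elsewhere)
    """
    source_id = response["url_id_source"]
    counts = Counter()
    for item in response["data"]:
        if source_id and item["canonical_url_id"] == source_id:
            counts["source"] += 1
        elif item["canonical_url_id"] == item["url_id"]:
            counts["self"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}
```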

All articles indexed above have a score higher than 80%, meaning that all the content will be recognised as a copy or plagiarism.

See the problem?

Any business caught up in the content marketing hype needs to apply a serious risk-benefit analysis to determine if their efforts are working contrary to the expected outcome. Changing the title, a few words here and there, or adding a paragraph or two, will rarely make a marked difference.

Modifying Article Content with AI

We have always said that any content you choose to rewrite should be clearly distinguishable from the original, and if you are altering content, we suggest our own clients run the compare tools in Yabber to ensure sufficient differentiation exists. Anything with a score of greater than 20 should be modified.

A couple of the articles returned via the example Matrix response above were modified only slightly with AI, and the foul stench of an LLM was evident by way of the language, Unicode characters, and formatting. Search engines will penalise AI content, and it won't contribute in any way to those coveted principles of E+AT. A bad rewrite can only be considered a poor attempt at plagiarism, and it will damage your SEO presence and brand reputation.

Social Media and the Insights API

Duplicate content is punished in a similar way on popular social media services. We built an Insights API that interrogates and consumes information from most social services, resolves the content to a business, and creates a similar Social Rank. The same OCR-enabled and AI-supported system ingests advertising for the purpose of evaluating uniqueness and compliance.

Conclusion

Search success isn't earned by chance — it's engineered through careful content strategy, technical compliance, and a respect for the mechanisms that underpin how search engines evaluate authority and trust. The prevalence of duplicate content across the financial web, often unknowingly published through well-meaning article services, poses a significant risk to long-term SEO viability. If your business is relying on duplicated content — whether sourced from a service, a lender, or some syndicated feed — you’re not just stalling growth. You’re actively sabotaging your visibility.

Google’s algorithm doesn’t reward laziness. It rewards originality, authority, and genuine value. Every time you publish the same content as dozens of others, you're telling Google, "We have nothing unique to say." And Google responds in kind — by pushing you into digital obscurity.

Our Matrix platform exposes this duplication for exactly what it is: a systemic failure of content strategy across the industry. It quantifies it, measures it, and makes it impossible to ignore.

If you’re paying for content that isn’t exclusive to you, you’re paying to become invisible. If you're publishing without understanding the impact of duplication, you’re flying blind.

The tools we've built, such as Matrix and its suite of comparison endpoints, were designed to shine a spotlight on this problem by providing objective evidence of content overlap. They make it painfully clear that content duplication isn’t a minor issue — it's a widespread failure of basic SEO hygiene.

If you're relying on syndicated or duplicated articles without proper canonical attribution, you're not building trust with users or search engines — you're undermining it. And if you're paying for that content? You're paying for your own decline in visibility.

The fix isn’t complicated: write better, create video, write original, and write often. Create original content. Back it with data. Own your expertise. Or get left behind.

At the heart of it all, content must serve people before it serves platforms. When you craft truly original, engaging, educational, and entertaining material that represents your unique experience and authority, you build equity in your domain, trust in your brand, and visibility in search.

Featured Image: The APA (Australian) Building, 49 Elizabeth Street, on the corner of Flinders Lane in Melbourne, Victoria, c1900. The APA Building was a skyscraper in Melbourne, Victoria, Australia; at 12 storeys and 53m to the tip of its corner spire, it became Australia's tallest commercial building at the time of its completion in mid 1890 (and remained so for decades). Originally known as the Australian Building (and also known as the Australian Property Investment Co or API Building), it was demolished in 1980 to make way for a five storey concrete and glass office building.
