nopCommerce SEO sitemap 2020 + How to avoid 100% CPU usage
Wednesday, July 1, 2020
Problem
One of our customers recently became aware that his site had been hacked and some of the product pages didn't respond.
They took the following steps to rectify the problem:
- Completely rebuilt the site
- Tried using Redis and an IIS web farm to improve performance
This solved the issue with product pages not opening, but CPU usage on the live website stayed at 100%, causing the site to load slowly. Analyzing the issue, we discovered that Google bots crawling the site extremely hard were the cause (when we blocked them via the firewall, CPU usage dropped and the site became responsive again). Since we obviously want the site to rank, we needed Google to crawl the site in a more intelligent way.
Overview
The website has ~100 million pages, ~25 million products, and 4 languages. It is one of the biggest nopCommerce websites, and it uses the Solr Search Plugin to support this number of products.
Solution
We decided to help Google understand the last modification date of each product URL and to generate the sitemap dynamically once per day. Since the site already used the Solr search plugin, we could easily generate product sitemaps dynamically. Our extension also had to regenerate the sitemaps every day and update them in the sitemap index file.
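A minimal sketch of that idea in C#, assuming a hypothetical ProductEntry record fed from the Solr index (the real extension's types and queries differ):

using System;
using System.Xml;

// Placeholder for whatever the Solr index returns per product.
public record ProductEntry(string Url, DateTime ModifiedOnUtc);

public static class SitemapWriter
{
    // Writes one product sitemap file with a per-URL <lastmod> value.
    public static void Write(string path, ProductEntry[] products)
    {
        var settings = new XmlWriterSettings { Indent = true };
        using var writer = XmlWriter.Create(path, settings);

        writer.WriteStartDocument();
        writer.WriteStartElement("urlset", "http://www.sitemaps.org/schemas/sitemap/0.9");

        foreach (var product in products)
        {
            writer.WriteStartElement("url");
            writer.WriteElementString("loc", product.Url);
            // Date-only form of the W3C datetime accepted by the sitemap protocol
            writer.WriteElementString("lastmod", product.ModifiedOnUtc.ToString("yyyy-MM-dd"));
            writer.WriteEndElement();
        }

        writer.WriteEndElement();
        writer.WriteEndDocument();
    }
}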
To create a well-structured sitemap index and sitemaps, the app also has to respect the following limits (a chunking sketch follows the list):
Limits
- 50,000 sitemaps per sitemap index file
- 500 sitemap index files per site (a Google Search Console limit)
- 50,000 URLs per sitemap
- 50 MB (uncompressed) per sitemap
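A minimal chunking sketch under these limits (the per-entry byte estimate is a rough assumption, not the extension's actual accounting):

using System.Collections.Generic;

public static class SitemapChunker
{
    private const int MaxUrlsPerSitemap = 50_000;
    private const long MaxBytesPerSitemap = 50L * 1024 * 1024;

    // Splits a large URL stream into sitemap-sized chunks: at most
    // 50,000 entries per file, stopping early if the estimated file
    // size approaches 50 MB.
    public static IEnumerable<List<string>> Chunk(IEnumerable<string> urls)
    {
        var current = new List<string>();
        long estimatedBytes = 0;

        foreach (var url in urls)
        {
            // Rough per-entry estimate: URL length plus XML tag overhead
            long entryBytes = url.Length + 100;

            if (current.Count >= MaxUrlsPerSitemap ||
                estimatedBytes + entryBytes > MaxBytesPerSitemap)
            {
                yield return current;
                current = new List<string>();
                estimatedBytes = 0;
            }

            current.Add(url);
            estimatedBytes += entryBytes;
        }

        if (current.Count > 0)
            yield return current;
    }
}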
Google Search Console shows SEO specialists how many URLs from each sitemap are indexed, which helps with monitoring indexation.
Sitemap index
Here is the template of our sitemap index with two files:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.yourstore.com/sitemap1.xml</loc>
<lastmod>2020-01-05T12:00:00-02:00</lastmod>
</sitemap>
<sitemap>
<loc>https://www.yourstore.com/sitemap2.xml</loc>
<lastmod>2020-01-04T12:00:00-02:00</lastmod>
</sitemap>
</sitemapindex>
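A sketch of how such an index could be rebuilt daily with LINQ to XML; the tuple input stands in for whatever the regeneration job actually tracks:

using System;
using System.Linq;
using System.Xml.Linq;

public static class SitemapIndexWriter
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Builds the sitemap index from (file URL, last regeneration time) pairs.
    public static void Write(string path, (string Loc, DateTimeOffset LastMod)[] sitemaps)
    {
        var doc = new XDocument(
            new XDeclaration("1.0", "UTF-8", null),
            new XElement(Ns + "sitemapindex",
                sitemaps.Select(s =>
                    new XElement(Ns + "sitemap",
                        new XElement(Ns + "loc", s.Loc),
                        // Matches the offset datetime form used in the template above
                        new XElement(Ns + "lastmod", s.LastMod.ToString("yyyy-MM-ddTHH:mm:sszzz"))))));

        doc.Save(path);
    }
}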
Alternate languages support
Our site has 4 languages, so we link the language versions with the "hreflang" attribute. This helps search engines detect the language of each product page and show the right version on the search results page of a specific country or region.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>http://www.yourstore.com/en/product1</loc>
<lastmod>2017-10-20T17:30:00-02:00</lastmod>
<xhtml:link
rel="alternate" hreflang="en-us"
href="http://www.yourstore.com/en/product1"
/>
<xhtml:link
rel="alternate" hreflang="de"
href="http://www.yourstore.com/de/product1"
/>
<xhtml:link
rel="alternate" hreflang="ru"
href="http://www.yourstore.com/ru/product1"
/>
</url>
</urlset>
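A sketch of emitting these alternates with LINQ to XML; the language/URL pairs are placeholders for the store's real per-language SEO slugs:

using System.Xml.Linq;

public static class HreflangWriter
{
    private static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";
    private static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Builds one <url> element with its xhtml:link alternates. The root
    // <urlset> must declare both namespaces, as in the XML above.
    public static XElement BuildUrlElement(string loc, (string Hreflang, string Href)[] alternates)
    {
        var url = new XElement(Sm + "url", new XElement(Sm + "loc", loc));

        foreach (var (hreflang, href) in alternates)
        {
            url.Add(new XElement(Xhtml + "link",
                new XAttribute("rel", "alternate"),
                new XAttribute("hreflang", hreflang),
                new XAttribute("href", href)));
        }

        return url;
    }
}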
The app also follows these additional sitemap rules:
Additional Sitemap Rules
- UTF-8 encoding
- Entity escaping
| Character | Symbol | Escape code |
|---|---|---|
| Ampersand | & | &amp; |
| Double quote | " | &quot; |
| Single quote | ' | &apos; |
| Less than | < | &lt; |
| Greater than | > | &gt; |
- Escaping of other non-ASCII characters. For example, the URL https://www.yourstore.com/päge1 requires escaping of the character "ä":
https://www.yourstore.com/p%C3%A4ge1
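In .NET, both rules come mostly for free: XmlWriter/XDocument apply the entity escaping from the table automatically, and the Uri class percent-encodes non-ASCII characters, as this small check shows:

using System;

class EscapeDemo
{
    static void Main()
    {
        var uri = new Uri("https://www.yourstore.com/päge1");
        // AbsoluteUri returns the percent-encoded form:
        // https://www.yourstore.com/p%C3%A4ge1
        Console.WriteLine(uri.AbsoluteUri);
    }
}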
Robots.txt
The last step of the sitemap implementation was to notify every search engine about our changes. We added the following line to robots.txt:
Sitemap: https://yourstore.com/sitemap-index.xml
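Besides robots.txt, the sitemap URL could also be submitted directly through the ping endpoint Google offered at the time; a minimal sketch:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class SitemapPing
{
    static async Task Main()
    {
        using var client = new HttpClient();

        // Tell Google the sitemap index has changed
        var sitemapUrl = Uri.EscapeDataString("https://yourstore.com/sitemap-index.xml");
        var response = await client.GetAsync($"https://www.google.com/ping?sitemap={sitemapUrl}");

        Console.WriteLine($"Google ping: {(int)response.StatusCode}");
    }
}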
Results
- CPU usage dropped to 16-30%
- Our marketers called us to ask what we had done with the site and why our PageRank had increased so significantly :)