What is a sitemap?
A sitemap is a list of information about all the pages, videos, images and other files on a website. This information is used by web crawlers and search engines to understand the structure of a website and the relationship between files and pages on it.
There are two main types of sitemaps - HTML sitemaps and XML sitemaps.
XML sitemaps are primarily aimed at search engines and their bots and web crawlers, giving them a better overview of your website structure. This enables the crawlers to easily access all pages on your website as well as all relevant info about them. This is typically the URL of the page, the date it was last updated and any language variants of the page. Knowing the structure of your website lets search engines crawl it intelligently and helps make sure that all pages on your website are in their search index.
HTML sitemaps are normally made in an easy-to-read format and structure and are used to help users find what they are looking for. An HTML sitemap does not need to include all of your URLs/pages and should be made to help a user find what they are looking for if they become lost on your website. Many websites create an HTML sitemap to help users find their most important pages.
- What is an XML sitemap?
- What is an HTML sitemap?
- XML vs. HTML sitemaps
- Creating an XML sitemap in Umbraco
What is an XML sitemap?
The Sitemaps protocol was introduced by Google and is supported by most major search engines, such as Bing, Yahoo and Ask. In an XML sitemap it is possible to add extra info to a URL that helps crawlers optimize how they go through your website.
This normally includes a time and date for the last time the page was modified, but could also include additional information such as the change frequency and the relative priority of a page compared to other pages.
These factors help search engines prioritize which pages to crawl and how frequently they need to crawl them again. If you have pages on your website that are important and change frequently, these attributes can be used to signal that they should be crawled more often, so the newest version is represented in the search index. They are hints rather than commands, and each search engine decides for itself how much weight to give them.
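To make the attributes described above concrete, here is a minimal Python sketch that builds an XML sitemap with a URL, last modified date, change frequency and priority per page. The function name build_sitemap, the page dictionaries and the example.com URLs are assumptions made for illustration; the element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and only written when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only to show the call.
example_pages = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "https://www.example.com/about"},
]
sitemap_xml = build_sitemap(example_pages)
```

Only loc is required for each URL, which is why the sketch skips the other elements when a page does not provide them.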
Having an XML sitemap and submitting it to the major search engines is great for SEO and a common best practice among marketers. Having a sitemap, and submitting it to Google, is a practice that can only benefit your site and is never something you’ll be penalized for (confirmed by Google).
If you are using a lot of images and videos, you can create and submit specific sitemaps for these as well. This can help improve visibility in image and video searches, while also making it possible to submit more information about these files than you can about pages. A video sitemap can include video running time, category and age appropriateness rating, while an image sitemap can include subject matter, type and license.
Do I need an XML Sitemap?
If all of your pages are properly linked, search engine crawlers will usually be able to discover all of your pages without a sitemap. But by adding additional information in a sitemap you help their crawlers increase their efficiency and thereby help them discover changes faster than they otherwise would. Search engine crawlers do not crawl every single page on your website every time they visit your website. If you don’t provide information on which pages are the most important to crawl, it will often take time for the changes to be discovered.
While all websites should have a sitemap, websites that meet one of these 4 criteria will see significant improvements:
Websites that are really large
Every website has a limited crawl budget, and with very large websites that means it can take a long time before the crawler reaches newly updated or created URLs. By providing a last modified date in a sitemap, you help crawlers notice newly updated pages. If your sitemap is larger than 50 MB (uncompressed) or contains more than 50,000 URLs, Google will not accept it. In that case you will have to split your sitemap into multiple sitemaps and submit them individually. Alternatively, you can create a sitemap index file that links to the individual sitemaps.
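The splitting step can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete generator: the function names chunk_urls and build_sitemap_index and the example.com file names are hypothetical, and the sketch only enforces the 50,000 URL limit, not the file size limit. The sitemapindex, sitemap and loc elements come from the Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # URL limit per sitemap file


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index XML string that links to each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with 120,001 pages, which needs three sitemap files.
all_urls = [f"https://www.example.com/page-{i}" for i in range(120_001)]
chunks = chunk_urls(all_urls)
child_sitemaps = [
    f"https://www.example.com/sitemap-{n}.xml"
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would then be written out as its own sitemap file, and only the index file needs to be submitted.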
Your website has a lot of pages that are isolated or not well linked
If parts of your website are not properly linked to from the rest of the site, there’s a chance that search engines will overlook these pages and not put them in their index. This can be the case if you have old archives of content or orphan pages on your website.
Your website is new and/or has few external backlinks
Search engine crawlers discover content on the internet by following links from one page to another. If your website has few external backlinks from other websites, your website might not be discovered at all by the search engines.
Your website uses rich media content in search engines
If your website appears in Google News or relies on other rich media content in search engines, such as images or video, then the additional information in your sitemap can be used to enhance how your content is presented.
Submitting an XML sitemap to Google
If you want to submit your XML sitemap to Google you can do it in 3 different ways. The recommended method is the first one - submitting through Google Search Console - but if that does not work for you, then you can choose one of the other two.
Submit sitemap through Google Search Console
If you have already verified your website through Google Search Console, it is straightforward to submit your sitemap. Once you’re logged in to your Search Console account and have chosen your website, navigate to “Sitemaps”, which is found in the left-hand menu under “Index”.
Once on the page you need to enter the sitemap URL and click “Submit”. That’s it - your sitemap will now be verified and if the format is correct it will update with a status of “Success”.
Submit sitemap by using robots.txt file
If you do not want to use Google Search Console, you can also make your sitemap discoverable by adding it to your robots.txt file. To do that, specify the full URL of your sitemap by adding a line like the following anywhere in the robots.txt (www.example.com is a placeholder for your own domain):
Sitemap: https://www.example.com/sitemap.xml
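If your robots.txt is generated or maintained by a script, the directive can be added programmatically. The following Python sketch appends a Sitemap line to existing robots.txt content unless the same sitemap is already listed. The function names and the example.com URL are hypothetical, and the sketch assumes the directive name can be matched case-insensitively.

```python
def has_sitemap_directive(robots_txt, sitemap_url):
    """Return True if robots_txt already lists sitemap_url."""
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        # Directive name is compared in lowercase; the URL must match exactly.
        if sep and key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return True
    return False


def add_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url."""
    if has_sitemap_directive(robots_txt, sitemap_url):
        return robots_txt
    body = robots_txt.rstrip("\n")
    # Separate the new line from existing rules with a blank line.
    prefix = body + "\n\n" if body else ""
    return f"{prefix}Sitemap: {sitemap_url}\n"


# Hypothetical existing robots.txt content.
existing = "User-agent: *\nDisallow: /admin/\n"
updated = add_sitemap_directive(existing, "https://www.example.com/sitemap.xml")
```

Calling the function a second time with the same URL leaves the content unchanged, so it is safe to run on every deployment.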
Send an HTTP GET request to “ping” Google
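The last option is to “ping” Google and ask their crawlers to fetch your sitemap. This is done by sending an HTTP GET request to a URL of the following form, where the sitemap parameter holds the full URL of your sitemap (www.example.com is a placeholder):
https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml
Note that Google announced in 2023 that it is retiring this ping endpoint, so check the current Google Search Central documentation before relying on this method.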
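As a small illustration of how such a request URL is put together, the Python sketch below builds the ping URL with the sitemap address properly URL-encoded. It deliberately stops short of sending the request, since whether the endpoint still responds is an assumption you should verify first. The function name build_ping_url and the example.com address are hypothetical.

```python
from urllib.parse import urlencode

# Ping endpoint as described above; its availability is an assumption.
GOOGLE_PING_ENDPOINT = "https://www.google.com/ping"


def build_ping_url(sitemap_url):
    """Return the GET URL that asks Google to fetch sitemap_url."""
    query = urlencode({"sitemap": sitemap_url})
    return f"{GOOGLE_PING_ENDPOINT}?{query}"


ping_url = build_ping_url("https://www.example.com/sitemap.xml")
```

The resulting URL could then be requested with any HTTP client, for example urllib.request.urlopen from the Python standard library.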
What is an HTML sitemap?
An HTML sitemap is, simply put, just another navigation element that helps the user find what they are looking for.
The sitemap is put in an HTML format to make it easier for a user to use and navigate. This should not be seen as a substitute for a search function or navigation elements, but an added help to the user if they become lost on your website.
HTML sitemaps should not be submitted to search engines, but should be included in your website navigation elements. A common place for an HTML sitemap is in a website footer. The HTML sitemap can help surface your most important pages and is especially useful if you have a deep URL structure, where some of your most important content is found deep in the natural website navigation.
XML vs. HTML sitemaps - which should you use?
Luckily you don’t have to choose between the two, since they have two different purposes. When it comes to choosing a sitemap it is perfectly acceptable to choose both. While an XML sitemap is highly recommended, an HTML sitemap is more optional - but still recommended to have.
As explained above, the main difference is who the sitemaps are aimed at and who needs to understand/read the information.
In an XML sitemap you do not need to think about the user, readability and what information is useful to a user. Your only concern is the web crawlers and what information they need to get a better understanding of your website. That means information that is otherwise irrelevant for a user - like change frequency - is very important in an XML sitemap.
You should always have an XML sitemap and submit it to the major search engines. It is usually a core feature in your CMS and only has to be set up once.
An HTML sitemap on the other hand is aimed at a user trying to find something on your website. So does a priority help them find what they need? Probably not.
Instead an HTML sitemap should be a simpler version of your XML sitemap, which is prettier and more user friendly than the long XML list of pages. That means substituting the raw URL with a more descriptive title (also called anchor text) and maybe even adding a description or breadcrumbs to the links.
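To illustrate the idea of replacing raw URLs with descriptive titles and short descriptions, here is a minimal Python sketch that renders a list of pages as an HTML list. The function name render_html_sitemap, the tuple layout and the example pages are assumptions made for this example; a real site would typically produce this markup from its CMS templates.

```python
from html import escape


def render_html_sitemap(links):
    """Render (title, url, description) tuples as an HTML unordered list.

    The title becomes the anchor text; an empty description is skipped.
    """
    items = []
    for title, url, description in links:
        item = f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a>'
        if description:
            item += f" - {escape(description)}"
        items.append(item + "</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical pages, used only to show the call.
html_sitemap = render_html_sitemap([
    ("About us", "/about", "Who we are"),
    ("Pricing & plans", "/pricing", ""),
])
```

Escaping the title, URL and description keeps characters such as & from breaking the markup.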
Creating an XML sitemap in Umbraco
If you are using Umbraco, creating an XML sitemap is fairly easy. While there’s no out-of-the-box XML sitemap generator in Umbraco, you can either use a package or create one by following this guide from the official Umbraco documentation.