What is a sitemap?
A sitemap is a list of information about all the pages, videos, images, and other files on a website. This information is used by web crawlers and search engines to understand the structure of a website and the relationship between files and pages on it.
There are two main types of sitemaps - HTML sitemaps and XML sitemaps.
XML sitemaps are primarily aimed at search engines and their crawlers, giving them a better overview of your website structure. This enables the crawlers to easily access all pages on your website as well as the relevant information about each page: typically the URL, the last updated date, and any language variants of the page. Knowing the structure of your website lets search engines crawl it intelligently and helps make sure that all pages on your website are in their search index. Having an XML sitemap for your website is considered an SEO best practice.
HTML sitemaps are normally made in an easy-to-read format and structure and are used to help users find what they are looking for. An HTML sitemap does not need to include all of your URLs/pages and should be made to help a user find what they are looking for if they become lost on your website. Many create an HTML template to help the user find the most important pages on their website.
Table of contents
- What is an XML sitemap?
- Do I need an XML sitemap?
- What does an XML sitemap look like?
- How to create an XML sitemap
- Why submit an XML sitemap to Google?
- How do I submit an XML sitemap to Google?
- How do I find the sitemap of any website?
- What is an HTML sitemap?
- What does an HTML sitemap look like?
- XML vs. HTML sitemaps
- Creating a sitemap in Umbraco
What is an XML sitemap?
The Sitemaps protocol was introduced by Google but is supported by most major search engines, such as Bing and Yahoo. In an XML sitemap, it is possible to add extra information to a URL that helps crawlers optimize how they go through your website.
This normally includes the date and time the page was last modified, but can also include additional information such as the change frequency and the relative priority of a page compared to other pages. Google no longer uses change frequency and priority. It doesn't hurt to include them, but Google ignores them and relies on the last modified date instead.
Having an XML sitemap and submitting it to the major search engines is great for SEO and a common best practice among marketers. Having a sitemap - and submitting it to Google - is a practice that can only benefit your site and is never something you’ll be penalized for (Confirmed by Google).
If you are using a lot of images and videos you can create and submit specific sitemaps for these as well. This can help improve visibility in image and video searches, while also opening up the possibility to submit further info about these files. A video sitemap can include video running time, category, and age appropriateness rating, while an image sitemap can include subject matter, type, and license.
Do I need an XML Sitemap?
If all of your pages are properly linked, search engine crawlers will usually be able to discover all of your pages without a sitemap. But by adding additional information in a sitemap you help their crawlers increase their efficiency and thereby help them discover changes faster than they otherwise would. Search engine crawlers do not crawl every single page on your website every time they visit your website. If you don’t provide information on which pages are the most important to crawl, it will often take time for the changes to be discovered.
If you want to know more about how to optimize your website for search engine crawling and indexing, we've written a white paper about that.
While all websites should have a sitemap, websites that meet one of these 4 criteria will see the most significant improvements:
Websites that are really large
Every website has a limited crawl budget, and with very large websites that means it can take a long time before the crawler gets to newly updated or created URLs. By providing a last modified date in a sitemap, you make it less likely that crawlers overlook newly updated pages. If your sitemap is larger than 50 MB or has more than 50,000 URLs, Google will not accept it. In that case you will have to split your sitemap into multiple sitemaps and submit them individually. Alternatively, you can make a sitemap index file with links to the individual sitemaps.
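As a rough illustration of the splitting step, here is a minimal Python sketch that cuts a URL list into chunks of at most 50,000 entries and builds a sitemap index pointing at the resulting files. The function names, the example.com URLs, and the sitemap-1.xml naming scheme are assumptions made for this example, and the sketch does not check the 50 MB file size limit.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) linking to each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 pages, which needs three child sitemaps.
pages = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(pages)
child_sitemaps = [
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
```

With an index file like this, you would only need to submit the index itself rather than each child sitemap.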
Your website has a lot of pages that are isolated or not well linked
If you have parts of your website that are not properly linked to on your website, then there’s a chance that search engines will overlook these pages and not put them in their index. This can be the case if you have old archives of content or orphan pages on your website.
This issue can also happen if you're relying on JavaScript to serve your content, like with a JAMstack website. Search engine crawlers are able to crawl content rendered with JavaScript, but it takes longer because they need to do an additional rendering step. Until that step is done, they might not pick up links to certain pages, so those pages are not yet in the index.
Your website is new and/or has few external backlinks
Search engine crawlers discover content on the internet by following links from one page to another. If your website has few external backlinks from other websites, your website might not be discovered at all by the search engines. By submitting a sitemap you give search engines like Google a blueprint of your website and make it much easier for their crawlers to find your pages.
Your website uses rich media content in search engines
If your website is included in Google News or has rich media content, such as images and videos, that appears in search results, then the additional information in your sitemap can be used to enhance how that content is presented.
What does an XML sitemap look like?
An XML sitemap must follow a strict structure if you want search engines like Google to use it. If your sitemap does not follow the rules, it won't be used, and it won't add any value to your website.
There are 3 formats that you can use to create your XML sitemap, which are all supported by Google:
- XML
- RSS, mRSS, and Atom 1.0
- Text
To find complete details on all 3 formats, and how you need to structure them, you should follow the official protocol from sitemaps.org.
Below is a description of the different tags of an XML sitemap, and what an XML sitemap looks like.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
This is an example of the structure of an XML sitemap file. If you want to see what a real sitemap looks like, take a look at the Umbraco sitemap.
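If you would rather generate this structure than write it by hand, the following Python sketch builds the same example with the standard library. It is only an illustration: the build_sitemap function and the dictionary layout of the entries are assumptions for this example, not part of the protocol or of any particular CMS.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap string from dicts with a 'loc' key and optional extra tags."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(root, "url")
        # <loc> is the only mandatory child; the others are written when present.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    ET.indent(root)  # pretty-print; available in Python 3.9+
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
])
```

Using an XML library instead of string concatenation means special characters in URLs, such as &, are escaped for you.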
<?xml version="1.0" encoding="UTF-8"?>
This tag is optional.
It tells the search engines which XML version is used, and the encoding used.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
This tag is mandatory.
It marks the beginning and the end of the file and references the protocol standard it follows. This will always be the same for all websites and sitemaps. As you can see in the example above, the tag must be closed at the very end of the document by adding </urlset>.
<url>
This tag is mandatory.
It is the parent tag of each URL entry, and all tags put underneath it are children of this tag. For every URL you have in the sitemap, you'll need to have one <url> tag that includes the necessary information about the URL. The tags you can add for a URL are these 4 (1 mandatory, 1 recommended, and 2 optional):
<loc>
This tag is mandatory.
This is the URL of the page, and it must be written out exactly as the server returns the URL. There are a few elements to be aware of here, that you need to ensure are correctly implemented:
- Protocol: The URL must start with the protocol, either https:// or http://.
- www. or non-www.: You must use the exact version that your website uses.
- Trailing slash: If your server returns URLs with a trailing slash at the end of a URL, you must include this as well. The example above uses a trailing slash at the end, but your website might not.
- Length: The URL must be less than 2,048 characters. If it's longer, it won't be processed.
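The four rules above can be turned into a quick sanity check before you publish a sitemap. The Python sketch below is one possible way to do it. The loc_problems function and its canonical_host and trailing_slash parameters are hypothetical names for this example, and you would set them to match how your own server returns URLs.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # a <loc> value must be shorter than this


def loc_problems(loc, canonical_host, trailing_slash):
    """Return a list of rule violations for one <loc> value (empty if none)."""
    problems = []
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// protocol")
    if parts.netloc != canonical_host:
        problems.append(f"host should be {canonical_host}")
    # The root path "/" always ends in a slash, so it is skipped here.
    if parts.path not in ("", "/") and parts.path.endswith("/") != trailing_slash:
        problems.append("trailing slash does not match the site's convention")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("URL is 2,048 characters or longer")
    return problems
```

For example, loc_problems("https://example.com/about/", "www.example.com", True) reports only the host mismatch, because the protocol, trailing slash, and length are fine.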
<lastmod>
This tag is optional (but highly recommended).
This tag is used to signal the last time the page was modified/updated. The date has to be in W3C Datetime format. By including it in your sitemap, you'll make it easier for Google and other search engines to see if the version they have in their index is outdated. Google keeps a timestamp of the last time a URL was crawled, and if that is older than the last modified date given in the sitemap, it'll increase the likelihood that Google crawls this page to fetch the latest changes and add them to their index.
Previously, you could influence this by using the next two tags, but Google ignores both of those (as per their Guidelines).
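To make the W3C Datetime format concrete, here is a small Python sketch that formats a last modified value either as a plain date or as a full timestamp with a time zone offset. The function names are assumptions for this example, and needs_recrawl is only a simplified picture of the comparison described above, not Google's actual logic.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or timezone-aware datetime as a W3C Datetime string."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("use a timezone-aware datetime so the offset is explicit")
        return value.isoformat(timespec="seconds")
    return value.isoformat()  # plain date becomes YYYY-MM-DD


def needs_recrawl(lastmod, last_crawled):
    """True when the page changed after the crawler's stored timestamp."""
    return lastmod > last_crawled


day_only = format_lastmod(date(2005, 1, 1))
with_time = format_lastmod(datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc))
```

Both forms are valid, so a date without a time is enough if your CMS does not track the exact time of a change.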
<changefreq>
This tag is optional.
It is used to give search engines an indication of how often the content changes, and thus how often they should crawl the URL. The valid values for it are:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
This tag is not being used by Google anymore and they completely ignore it. This might vary across different search engines, but if you're only planning to submit the sitemap to Google, you shouldn't spend time on this tag.
<priority>
This tag is optional.
This tag is used to give a relative priority compared to other URLs on your website. The valid values go from 0.0 (lowest) to 1.0 (highest). The default priority is 0.5.
Again, it is important to highlight that Google no longer uses this tag, and it won't have any impact on how they crawl your website.
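If you still include these two optional tags for other search engines, it can be worth checking the values before writing them to the file. This is a minimal Python sketch of such a check, and the function names are made up for this example.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # used when a URL has no <priority> tag


def is_valid_changefreq(value):
    """True when the value is one of the seven allowed <changefreq> values."""
    return value in VALID_CHANGEFREQ


def normalize_priority(value=None):
    """Return the priority as a float, falling back to the default of 0.5."""
    if value is None:
        return DEFAULT_PRIORITY
    priority = float(value)
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    return priority
```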
What about multilingual websites?
If your content exists in multiple languages, you can also include an <xhtml:link> tag with the relevant information in your sitemap for each of your URLs. This is an alternative way of doing it, instead of including it as HTML tags on the pages themselves.
You can find more information about it in Google's official documentation on localized versions.
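As a rough sketch of what this looks like in practice, the Python example below writes one <url> entry per language version and gives each entry an <xhtml:link> for every variant, including the page itself, which is how we understand Google's guidelines. The build_multilingual_sitemap function and the example.com URLs are assumptions for illustration.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize the sitemap namespace as the default and XHTML with the xhtml: prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_multilingual_sitemap(variants):
    """`variants` maps a language code to the URL of that language version."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in variants.values():
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Every entry lists all language versions, including itself.
        for lang, href in variants.items():
            ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
    return ET.tostring(root, encoding="unicode")


multilingual_xml = build_multilingual_sitemap({
    "en": "https://www.example.com/en/",
    "de": "https://www.example.com/de/",
})
```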
How to create an XML sitemap
Creating an XML sitemap is possible with any SEO-friendly content management system that you might use. The exact steps to creating the sitemap vary depending on the system you use. The most common solution to creating an XML sitemap for your website is by using an extension for your CMS.
If you are not using a CMS for your website, or you don't want to use an extension, it is possible to develop your own as well.
No matter which solution you choose, it is often considered a one-off task to create a sitemap. Once a sitemap has been created, it is important that it automatically gets updated, so all changes you make to existing pages, and any new page you create, are reflected in the sitemap. This is standard functionality in extensions for creating sitemaps and will often work without you having to do any additional configuration.
If you want to have an idea of how it works, check out this step-by-step guide to creating an XML sitemap in Umbraco.
Where should I place my sitemap?
The simple answer is to always put it at the root of your website (e.g. http://www.example.com/sitemap.xml). The file location matters, because the URLs listed in the sitemap have to start with the same path as the sitemap.
In other words, if your sitemap is placed in a subfolder (e.g. http://www.example.com/subfolder/sitemap.xml) you can only add URLs that are part of the http://www.example.com/subfolder/ path. If you submit URLs that are in a different folder (http://www.example.com/other-folder/), on a subdomain (http://subdomain.example.com/subfolder/), or on a different protocol (https://www.example.com/subfolder/), it won't work.
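The location rule can be expressed as a short check. The Python sketch below compares the protocol, host, and folder of a page URL with those of the sitemap. The in_sitemap_scope function is a hypothetical helper written for this example.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """True when page_url shares the sitemap's protocol, host, and folder."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Folder of the sitemap, e.g. "/subfolder/" for "/subfolder/sitemap.xml".
    folder = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        sitemap.scheme == page.scheme
        and sitemap.netloc == page.netloc
        and page.path.startswith(folder)
    )
```

A sitemap at the root has the folder "/", so every path on the same protocol and host is in scope, which is why the root is the safest place for it.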
There are no requirements for the file name or file extension (.xml), so feel free to call it what you want, as long as it is accessible and can be submitted to the search engines.
Why should I submit an XML sitemap to Google?
Submitting a sitemap to Google is not strictly necessary for your website to be found, as Google's crawlers are usually pretty good at discovering new pages, images, and videos. But why leave it solely in Google's hands, and hope they find your new content, when you can help instead?
By submitting an XML sitemap to Google, you'll make it much easier for Google's crawlers to find all the content on your website. On top of helping Google find all of your content, it's also a good way to get notified of any errors on your page, that might get picked up by Google. Once you have submitted your sitemap in Google Search Console, you will be able to access the Sitemap Coverage Report, which shows you errors and warnings for your sitemap. These could be URLs that result in a server error (5xx), Not found (404), or soft 404 HTTP status code. By getting these reports you'll be able to avoid these issues hurting your performance in the organic search results.
How do I submit an XML sitemap to Google?
If you want to submit your XML sitemap to Google you can do it in 3 different ways. The recommended method is the first one - submitting through Google Search Console - but if that does not work for you, then you can choose one of the other two.
Submit an XML sitemap through Google Search Console
If you have already verified your website through Google Search Console, it is straightforward to submit your sitemap. Once you’re logged in to your Search Console account and have chosen your website, you’ll need to navigate to “Sitemaps”, which is found in the left-hand menu under “Index”.
Once on the page, you need to enter the sitemap URL and click “Submit”. That’s it - your sitemap will now be verified and if the format is correct it will update with a status of “Success”.
Submit an XML sitemap by using the robots.txt file
If you do not want to use the Google Search Console, then you can also submit your sitemap by adding it to your robots.txt file. To do that you need to specify the path to your sitemap by adding the following line anywhere in the robots.txt:
Sitemap: https://yourwebsite.com/sitemaplocation.xml
If you want to see what it looks like on a live website, you can take a look at umbraco.com/robots.txt.
Send an HTTP GET request to “ping” Google
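The last option is to “ping” Google and ask their crawlers to fetch your sitemap. Be aware that Google announced in 2023 that it was retiring this ping endpoint, so this method may no longer work and the two methods above are the safer choice. The ping was done by sending an HTTP GET request: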
https://www.google.com/ping?sitemap=https://yourwebsite.com/sitemaplocation.xml
How do I find the sitemap of any website?
Did you just start a new job, and want to see if your sitemap looks okay? Or maybe you work at an agency, and just got a new client on board?
Whatever the reason, there are different ways for you to find the sitemap. Since a sitemap doesn't have a fixed position like a robots.txt file, there are no guarantees that you can find it by using the tips below.
That said, sitemaps are fairly standardized, which is why it is usually possible to locate them using one of the 6 ways shown below. Some websites might choose to hide them to avoid competitors looking at their sitemap, but since there are no inherent security risks in a sitemap, it's rarely something that website owners spend time and resources on.
Let's go through the 6 ways that you can find a sitemap. The first 2 ways require that you have access to the website, while the last 4 are more generic ways to find a sitemap of any website.
Check if it has been submitted to Google Search Console
If you have access to a website, the first way to find the sitemap would be to check if it has been submitted to Google Search Console already.
Note that you must have access to the Google Search Console property, which might require you to verify it first. This can be done in several different ways, but if you have access to the website it is usually fairly straightforward (you can read about the options here).
Once you have access, you can navigate to the "Sitemaps" section that is found under "Index" in the left-hand menu. If you see anything under "Submitted sitemaps", you can click on it, and an "Open Sitemap" link will appear in the top right corner. This will take you to the URL of the sitemap.
Check in the CMS backend
If the sitemap was not submitted in Google Search Console, the next step is to have a look in the backend of the website's content management system.
Where exactly to find it depends a lot on your CMS, and how it is structured. In the screenshot, you see an example of how it looks when you search for it in an Umbraco installation.
If you can't find it by searching for it, take a look at the different settings and plugins/extensions that you're using. Since it is typically used to improve SEO, you'll usually find it among other SEO-related settings.
Check the most common sitemap locations
While the two first ways required that you had access to the website, the next few tips do not.
Since sitemaps are typically standardized, this way of finding it is to simply try out some of the most common locations. While there's no guarantee that the sitemap is found there, it's a fast way to check if it's found at some of the most commonly used locations.
Here's a list of the most common sitemap locations:
Common sitemap locations
- /sitemap/
- /sitemap
- /sitemap.xml
- /sitemap1.xml
- /sitemap_index.xml
- /sitemap-index.xml
- /sitemapindex.xml
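Trying these locations by hand gets tedious, so here is a small Python sketch that builds the candidate URLs and checks them one at a time. The function names are assumptions for this example. Note that some servers do not answer HEAD requests, in which case a missing result does not prove that there is no sitemap.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

COMMON_SITEMAP_PATHS = [
    "/sitemap/", "/sitemap", "/sitemap.xml", "/sitemap1.xml",
    "/sitemap_index.xml", "/sitemap-index.xml", "/sitemapindex.xml",
]


def candidate_sitemap_urls(site_root):
    """Combine a site root such as https://example.com with each common path."""
    return [urljoin(site_root, path) for path in COMMON_SITEMAP_PATHS]


def first_reachable(urls, timeout=5):
    """Return the first URL that answers a HEAD request with HTTP 200, else None."""
    for url in urls:
        try:
            with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
                if response.status == 200:
                    return url
        except OSError:  # covers HTTP errors, connection errors, and timeouts
            continue
    return None
```

You could call first_reachable(candidate_sitemap_urls("https://example.com")) to get the first location that responds, or None if none of them do.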
Check the robots.txt
It is often recommended to include a link to your sitemap in the robots.txt file for your website.
Luckily, there are strict rules for the placement of this file, so you will always be able to find it on any website that has one (not all websites have one).
To find the robots.txt file for any website, all you have to do is go to the /robots.txt path, and you can see if it has a link to the sitemap. That's the case for our website, where you can find a link to our sitemap when you go to https://umbraco.com/robots.txt.
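If you want to pull the sitemap links out of a robots.txt file automatically, Python's standard library can read them. The sketch below parses robots.txt content that is already in memory. The sitemaps_from_robots function and the sample file content are assumptions for this example, and on a real website you would first download the file from the /robots.txt path.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() (Python 3.8+) returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content used only for illustration.
example_robots = """\
User-agent: *
Disallow: /private/
Sitemap: https://yourwebsite.com/sitemaplocation.xml
"""
found = sitemaps_from_robots(example_robots)
```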
Use Google Search Operators
If you haven't had any luck finding the sitemap yet, it's time to take it to Google Search and use some search operators to see if you can find it.
An important note here is that a sitemap will only show up if it's indexable by Google and has been added to their search index. This is often not the case, as many sitemaps are set to noindex, follow or simply haven't been found by Google. An example is our sitemap, which you wouldn't be able to find using the search operators below.
Here's a list of different search operators you can try to use in different combinations (e.g. site:example.com filetype:xml in one combined search):
Google Search Operators
- site:example.com
- filetype:xml or filetype:txt
- ext:xml or ext:txt
- inurl:sitemap
Look for other sitemap types (RSS, mRSS, Atom 1.0, or Text)
Still no luck finding the sitemap? Then you can give it one last shot by looking for different types of sitemaps.
Remember that XML is not the only sitemap format, and the website might be using a different one. If that's the case, look at the list of common sitemap locations, and combine it with different extensions.
Here's a list of possible paths you can look at:
Other types of sitemaps
- /sitemap.txt
- /sitemap1.txt
- /sitemap_index.txt
- /sitemap-index.txt
- /sitemapindex.txt
- /rss/
- /rss.xml
- /atom.xml
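Once you have found a file at one of these paths, you may want to confirm which format it actually is. The Python sketch below guesses the format from the content by looking at the XML root element, and falls back to treating the file as a text sitemap when every line is a URL. The detect_sitemap_format function and its return labels are assumptions for this example, and it expects the file content as a string.

```python
from xml.etree import ElementTree as ET


def detect_sitemap_format(content):
    """Guess the sitemap format from the file's text content."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # Not XML: treat it as a text sitemap if every non-empty line is a URL.
        lines = [line.strip() for line in content.splitlines() if line.strip()]
        if lines and all(line.startswith(("http://", "https://")) for line in lines):
            return "text"
        return "unknown"
    # Strip the "{namespace}" prefix that ElementTree adds to tag names.
    tag = root.tag.rsplit("}", 1)[-1]
    return {
        "urlset": "xml sitemap",
        "sitemapindex": "xml sitemap index",
        "rss": "rss",
        "feed": "atom",
    }.get(tag, "unknown")
```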
What is an HTML sitemap?
An HTML sitemap is, simply put, another navigation element that helps the user find what they are looking for.
The sitemap is put in an HTML format to make it easier for a user to use and navigate. This should not be seen as a substitute for a search function or navigation elements, but as an added help to the user if they become lost on your website.
HTML sitemaps should not be submitted to search engines but should be included in your website navigation elements. A common place for an HTML sitemap is in a website footer. The HTML sitemap can help surface your most important pages and is especially useful if you have a deep URL structure, where some of your most important content is found deep in the natural website navigation.
What does an HTML sitemap look like?
An HTML sitemap looks the same as an XML sitemap in terms of content, but in a more readable format. Instead of presenting the URLs of a website in XML format, it is done in HTML to make it more user-friendly and readable.
An HTML sitemap has the same purpose as an XML sitemap - creating an overview of URLs on a website - but is aimed at actual human users instead of search engine crawlers. This difference is clear in how it's formatted, as the HTML sitemap is much cleaner and easier to read for a user than the XML format.
The purpose of having an HTML sitemap is to help users find pages they might otherwise have a hard time finding.
Today, it is no longer common to see websites use an HTML sitemap. Instead, most websites make use of extensive navigation items like the main navigation menu, sub-navigation menus, footers, and internal search engines. The closest you'll often get to an HTML sitemap on modern websites is overview pages on various sections of websites.
An example is the overview page we have created for our Knowledge Base. This is listed alphabetically and gives you links to all the topics we cover in a simple HTML format.
XML vs. HTML sitemaps - which should I use?
Luckily you don’t have to choose between the two, since they have two different purposes. When it comes to choosing a sitemap it is perfectly acceptable to choose both. While an XML sitemap is highly recommended, an HTML sitemap is more optional - but still recommended to have.
As explained above, the main difference is who the sitemaps are aimed at and who needs to understand/read the information.
In an XML sitemap, you do not need to think about the user, readability, and what information is useful to a user. Your only concern is the web crawlers and what information they need to get a better understanding of your website. That means information that is otherwise irrelevant for a user - like the last modified date - is very important in an XML sitemap.
You should always have an XML sitemap and submit it to the major search engines. It is usually a core feature in your CMS and only has to be set up once.
An HTML sitemap on the other hand is aimed at a user trying to find something on your website. So does a priority help them find what they need? Probably not.
Instead, an HTML sitemap should be a simpler version of your XML sitemap, which is prettier and more user-friendly than the long XML list of pages. That means substituting the raw URL with a more descriptive title (also called anchor text) and maybe even adding a description or breadcrumbs to the links.
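To show the idea of swapping raw URLs for descriptive titles, here is a minimal Python sketch that turns a list of pages into a simple HTML list. It is only an illustration of the output. The render_html_sitemap function and the sample pages are made up for this example, and in a CMS you would normally produce this with the system's own templates.

```python
from html import escape


def render_html_sitemap(pages):
    """Render (title, url, description) tuples as a simple HTML list of links."""
    items = []
    for title, url, description in pages:
        # The title becomes the anchor text instead of the raw URL.
        link = f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
        items.append(f"  <li>{link}: {escape(description)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


html_sitemap = render_html_sitemap([
    ("About us", "https://www.example.com/about/", "Who we are"),
    ("Pricing & plans", "https://www.example.com/pricing/", "What it costs"),
])
```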
Creating an XML sitemap in Umbraco
If you are using Umbraco, creating an XML sitemap is fairly easy. While there’s no out-of-the-box XML sitemap generator with Umbraco, you can either choose to use a package or create one following this guide from the official Umbraco documentation.
Creating an HTML sitemap in Umbraco
If you want to create an HTML sitemap in Umbraco, that is also possible. How to do so depends on the requirements you have and how you want to style it.
Generally, you will need to create a new template for your HTML sitemap along with a new Document Type. Once you have those, you need to add the proper Razor code to it (there are many different ways to do this) and choose a layout to have proper styling.
Once you've done those things, you will be able to create a new page that can work as the HTML sitemap for your Umbraco site.