What is a sitemap?

A sitemap is a list of information about all the pages, videos, images, and other files on a website. This information is used by web crawlers and search engines to understand the structure of a website and the relationship between files and pages on it.

There are two main types of sitemaps: HTML sitemaps and XML sitemaps.

XML sitemaps are aimed primarily at search engines and their crawlers, and give them a better overview of your website's structure. This lets the crawlers easily reach all pages on your website along with the relevant information about each page: typically its URL, the date it was last updated, and any language variants. Knowing the structure of your website helps search engines crawl it intelligently and makes it more likely that all of your pages end up in their search index.

HTML sitemaps are normally made in an easy-to-read format and structure and are used to help users find what they are looking for. An HTML sitemap does not need to include all of your URLs/pages; its job is to help users who become lost on your website. Many website owners create an HTML template that points users to the most important pages on their website.

What is an XML sitemap?

The Sitemaps protocol was introduced by Google but is supported by most major search engines such as Bing, Yahoo, and Ask. In an XML sitemap, you can add extra information to each URL that helps crawlers optimize how they go through your website.

This normally includes the date and time the page was last modified, but it can also include additional information such as the change frequency and the priority of a page relative to other pages. Google no longer uses the change frequency and priority. It doesn't hurt to include them, but Google completely ignores them. Instead, Google relies on the last modified date, provided it is consistently accurate.

Having an XML sitemap and submitting it to the major search engines is great for SEO and a common best practice among marketers. It can only benefit your site, and it is never something you’ll be penalized for (confirmed by Google).

If you are using a lot of images and videos you can create and submit specific sitemaps for these as well. This can help improve visibility in image and video searches, while also opening up the possibility to submit further info about these files. A video sitemap can include video running time, category, and age appropriateness rating, while an image sitemap can include subject matter, type, and license.


Do I need an XML Sitemap?

If all of your pages are properly linked, search engine crawlers will usually be able to discover all of your pages without a sitemap. But by adding additional information in a sitemap you help their crawlers increase their efficiency and thereby help them discover changes faster than they otherwise would. Search engine crawlers do not crawl every single page on your website every time they visit your website. If you don’t provide information on which pages are the most important to crawl, it will often take time for the changes to be discovered.

While all websites should have a sitemap, websites that meet one of these 4 criteria will see the most significant improvements:

 

Websites that are really large

Every website has a limited crawl budget, and on very large websites it can take a long time before the crawler reaches newly updated or created URLs. By providing a last modified date in a sitemap, you make it less likely that the crawlers overlook newly updated pages. A single sitemap is limited to 50 MB (uncompressed) and 50,000 URLs; if yours exceeds either limit, Google will not accept it. In that case you will have to split your sitemap into multiple sitemaps and submit them individually. Alternatively, you can create a sitemap index file that links to the individual sitemaps.
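As a rough illustration of the splitting step, here is a minimal Python sketch that chunks a URL list and builds a sitemap index. The page URLs and the sitemap-1.xml style file names are made up for the example, and the sketch only enforces the URL count, not the file size limit.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # URL limit per sitemap file


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 pages.
urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical file names for the child sitemaps, one per chunk.
child_sitemaps = [
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would then be written to its own sitemap file, and only the index needs to be submitted.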

 

Your website has a lot of pages that are isolated or not well linked

If you have parts of your website that are not properly linked to on your website, then there’s a chance that search engines will overlook these pages and not put them in their index. This can be the case if you have old archives of content or orphan pages on your website.

 

Your website is new and/or has few external backlinks

Search engine crawlers discover content on the internet by following links from one page to another. If your website has few external backlinks from other websites, your website might not be discovered at all by the search engines.

 

Your website has rich media content or appears in Google News

If your website has a lot of rich media content (such as videos and images) or is shown in Google News, then the additional information in your sitemap can be used to enhance how your content appears in search.

What should a sitemap look like?

A sitemap must follow a strict structure if you want search engines like Google to use it. If it doesn't follow the rules, it won't be used, and it won't add any value to your website.

There are 3 formats that you can use to create your sitemap, which are all supported by Google:

  1. XML
  2. RSS, mRSS, and Atom 1.0
  3. Text

To find complete details on all 3 formats, and how you need to structure them, you should follow the official protocol from sitemaps.org.

Below is an example of what an XML sitemap looks like, followed by a description of its tags.

 

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

This is an example of the structure of an XML sitemap file. If you want to see what a real sitemap looks like, take a look at the Umbraco sitemap.
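If you would rather generate this structure than write it by hand, it can be sketched in a few lines of Python with the standard library. The function name and the sample pages are assumptions made for this example; your CMS or a sitemap package will normally do this for you.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> document from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # Characters such as & are escaped automatically when serialized.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only to show the output shape.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2005-01-01"),
    ("http://www.example.com/catalog?item=1&sort=asc", None),
])
```

The optional changefreq and priority tags are left out here on purpose, since Google ignores them.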

 

<?xml version="1.0" encoding="UTF-8"?>

This tag is optional.

It tells the search engines which XML version and character encoding the file uses. Sitemap files must be UTF-8 encoded.

 

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

This tag is mandatory.

It marks the beginning and the end of the file and references the protocol standard it follows. This will always be the same for all websites and sitemaps.

 

<url>

This tag is mandatory.

It is the parent tag of each URL entry, and all tags put underneath it are children of this tag.

 

<loc>

This tag is mandatory.

This is the URL of the page, and it must be written out exactly as the server returns the URL. There are a few elements to be aware of here that you need to make sure are implemented correctly:

  1. Protocol: The URL must start with the protocol, either https:// or http://.
  2. www. or non-www.: You must use the exact version that your website uses.
  3. Trailing slash: If your server returns URLs with a trailing slash at the end of a URL, you must include this as well. The example above uses a trailing slash at the end, but your website might not.
  4. Length: The URL must be less than 2,048 characters. If it's longer, it won't be processed.
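Three of the four checks above can be automated before a URL goes into the sitemap. The Python sketch below is one possible way to do it; loc_problems and the canonical_host parameter are names invented for this example. The trailing-slash rule is not covered, because it depends on what your server actually returns.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # a <loc> value must be shorter than this


def loc_problems(loc, canonical_host):
    """Return a list of human-readable problems with a <loc> value."""
    problems = []
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// protocol")
    if parts.hostname != canonical_host:
        # Catches www. vs non-www. mismatches.
        problems.append(f"host should be {canonical_host}")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("URL is 2,048 characters or longer")
    return problems


# Hypothetical site whose canonical host is www.example.com.
ok = loc_problems("https://www.example.com/about/", "www.example.com")
wrong_host = loc_problems("https://example.com/about/", "www.example.com")
```

An empty list means no problems were found by these particular checks.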

 

<lastmod>

This tag is optional (but highly recommended).

This tag signals the last time the page was modified or updated. The date has to be in W3C Datetime format, for example 2005-01-01 or 2005-01-01T12:00:00+00:00. By including it in your sitemap, you'll make it easier for Google and other search engines to see if the version they have in their index is outdated. Google keeps a timestamp of the last time a URL was crawled, and if that is older than the last modified date given in the sitemap, it'll increase the likelihood that Google crawls this page to fetch the latest changes and add them to its index.

Previously, you could also influence this by using the next two tags, but Google ignores both of them (as per its guidelines).
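Producing a valid lastmod value is mostly a matter of formatting. As a small sketch, assuming your CMS gives you a Python date or a timezone-aware datetime for each page (lastmod_value is a name made up for this example):

```python
from datetime import date, datetime, timezone


def lastmod_value(moment):
    """Format a date or timezone-aware datetime as a W3C Datetime string."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(moment, datetime):
        if moment.tzinfo is None:
            raise ValueError("use a timezone-aware datetime")
        return moment.isoformat(timespec="seconds")
    return moment.isoformat()  # plain dates become YYYY-MM-DD


day_only = lastmod_value(date(2005, 1, 1))
with_time = lastmod_value(datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc))
```

Here day_only is "2005-01-01" and with_time is "2024-03-05T14:30:00+00:00". Naive datetimes are rejected because a time without a timezone offset is ambiguous.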

 

<changefreq>

This tag is optional.

It is used to give search engines an indication of how often the content changes, and thus how often they should crawl the URL. The valid values for it are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

As mentioned, Google no longer uses this tag and ignores it completely. Other search engines might treat it differently, but if you only plan to submit the sitemap to Google, you shouldn't spend time on this tag.

 

<priority>

This tag is optional.

This tag is used to give a relative priority compared to other URLs on your website. The valid values go from 0.0 (lowest) to 1.0 (highest). The default priority is 0.5.

Again, it is important to highlight that Google no longer uses this tag.

 

What about multilingual websites?

If your content exists in multiple languages, you can also include an <xhtml:link> tag for each language version of each of your URLs in your sitemap. This is an alternative to declaring the language versions with HTML tags on the pages themselves.

You can find more information about it in Google's official documentation on localized versions.
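To give an idea of what this looks like in practice, here is a Python sketch that writes one <url> entry per language version, each listing all of its alternates. The language codes and URLs are placeholders, and build_localized_sitemap is a name invented for this example, so check the output against Google's documentation before relying on it.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize sitemap tags without a prefix and xhtml tags as xhtml:...
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_localized_sitemap(variants):
    """variants maps a language code to the URL of that language version."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in variants.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every entry lists all language versions, including itself.
        for lang, href in variants.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=lang,
                href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page that exists in English and German.
localized_xml = build_localized_sitemap({
    "en": "https://www.example.com/en/",
    "de": "https://www.example.com/de/",
})
```

Listing every version under every entry keeps the annotations reciprocal, so each language version points to all the others.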

Where should I place my sitemap?

The simple answer is to always put it at the root of your website (e.g. http://www.example.com/sitemap.xml). The file location matters, because the URLs submitted have to start with the same path as the sitemap.

In other words, if your sitemap is placed in a subfolder (e.g. http://www.example.com/subfolder/sitemap.xml), you can only add URLs that are part of the http://www.example.com/subfolder/ path. URLs that are in a different folder (http://www.example.com/other-folder/), on a subdomain (http://subdomain.example.com/subfolder/), or on a different protocol (https://www.example.com/subfolder/) won't work.
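The path rule is easy to get wrong, so here is a small Python sketch of the check, using the example URLs from the text. The function name in_sitemap_scope is made up for this illustration.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """True if page_url shares the sitemap's protocol, host, and folder path."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Everything up to and including the last "/" of the sitemap's path.
    base_path = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        sitemap.scheme == page.scheme
        and sitemap.netloc == page.netloc
        and page.path.startswith(base_path)
    )


subfolder_sitemap = "http://www.example.com/subfolder/sitemap.xml"
allowed = in_sitemap_scope(subfolder_sitemap, "http://www.example.com/subfolder/page.html")
other_folder = in_sitemap_scope(subfolder_sitemap, "http://www.example.com/other-folder/")
```

Here allowed is True and other_folder is False. With the sitemap at the root, the base path is simply "/", which is why the root location covers the whole website.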

There are no requirements for the file name or file extension (.xml), so feel free to call it what you want, as long as it is accessible and can be submitted to the search engines.

Why should I submit an XML sitemap to Google?

Submitting a sitemap to Google is not strictly necessary for your website to be found, as Google's crawlers are usually pretty good at discovering new pages, images, and videos. But why leave it in Google's hands, and hope they find your new content, when you can help instead?

By submitting an XML sitemap to Google, you'll make it much easier for Google's crawlers to find all the content on your website. It's also a good way to get notified of any errors on your website that Google picks up. Once you have submitted your sitemap in Google Search Console, you will be able to access the Sitemap Coverage Report, which shows you errors and warnings for your sitemap. These could be URLs that return a server error (5xx), a Not found (404) status code, or a soft 404. With these reports, you can fix such issues before they hurt your performance in the organic search results.

How do I submit an XML sitemap to Google?

If you want to submit your XML sitemap to Google you can do it in 3 different ways. The recommended method is the first one - submitting through Google Search Console - but if that does not work for you, then you can choose one of the other two.

 

Submit sitemap through Google Search Console

If you have already verified your website through Google Search Console, it is straightforward to submit your sitemap. Once you’re logged in to your Search Console account and have chosen your website, navigate to “Sitemaps”, which is found in the left-hand menu under “Index”.

Once on the page, you need to enter the sitemap URL and click “Submit”. That’s it - your sitemap will now be verified and if the format is correct it will update with a status of “Success”.


Submit sitemap by using robots.txt file

If you do not want to use the Google Search Console, then you can also submit your sitemap by adding it to your robots.txt file. To do that you need to specify the path to your sitemap by adding the following line anywhere in the robots.txt:

Sitemap: https://yourwebsite.com/sitemaplocation.xml

If you want to see what it looks like on a live website, you can take a look at umbraco.com/robots.txt.

 

Send an HTTP GET request to “ping” Google

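The last option has historically been to “ping” Google and ask its crawlers to fetch your sitemap by sending an HTTP GET request:

https://www.google.com/ping?sitemap=https://yourwebsite.com/sitemaplocation.xml

Google announced in June 2023 that it was deprecating this ping endpoint, so do not rely on this method; use one of the first two instead.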

How do I find the sitemap of any website?

Did you just start a new job, and want to see if your sitemap looks okay? Or maybe you work at an agency, and just got a new client onboard?

Whatever the reason, there are different ways to find the sitemap if you don't know where it is. Since a sitemap doesn't have a fixed location like a robots.txt file, there is no guarantee that the tips below will find it.

Sitemaps are a fairly standardized thing though, which is why it is usually possible to locate them using one of the 6 ways shown below. Some websites might choose to hide them to avoid competitors looking at their sitemap, but since there are no inherent security risks in a sitemap, it's rarely something that website owners spend time and resources on.

Let's go through the 6 ways that you can find a sitemap. The first 2 ways require that you have access to the website, while the last 4 are more generic ways to find a sitemap of any website.


Check if it has been submitted in Google Search Console

If you have access to a website, the first way to find the sitemap would be to check if it has been submitted to Google Search Console already.

Note that you must have access to the Google Search Console property, which might require you to verify it first. This can be done in several different ways, but if you have access to the website it is usually fairly straightforward (Google's Search Console help describes the options).

Once you have access, you can navigate to the "Sitemaps" section that is found under "Index" in the left-hand menu. If you see anything under "Submitted sitemaps", you can click on it, and an "Open Sitemap" link will appear in the top right corner. This will take you to the URL of the sitemap.

[Screenshot: searching for "Sitemap" in the Umbraco backoffice]

Check in the CMS backend

If the sitemap was not submitted in Google Search Console, the next step is to have a look in the backend of the website's content management system.

Where exactly to find it depends a lot on your CMS, and how it is structured. In the screenshot, you see an example of how it looks when you search for it in an Umbraco installation. 

If you can't find it by searching for it, take a look at the different settings and plugins/extensions that you're using. Since it is typically used to improve SEO, you'll usually find it among other SEO-related settings.

Check the most common sitemap locations

While the first two ways require access to the website, the next few tips do not.

Since sitemaps are fairly standardized, a fast way to look for one is to simply try some of the most commonly used locations. There's no guarantee that the sitemap is found there, but it only takes a moment to check.

Here's a list of the most common sitemap locations:

Common sitemap locations

  • /sitemap/
  • /sitemap
  • /sitemap.xml
  • /sitemap1.xml
  • /sitemap_index.xml
  • /sitemap-index.xml
  • /sitemapindex.xml
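If you check websites regularly, the list above can be turned into full URLs with a few lines of Python. This is only a sketch: the example domain is a placeholder, and actually requesting each URL is left out.

```python
from urllib.parse import urljoin

COMMON_SITEMAP_PATHS = [
    "/sitemap/",
    "/sitemap",
    "/sitemap.xml",
    "/sitemap1.xml",
    "/sitemap_index.xml",
    "/sitemap-index.xml",
    "/sitemapindex.xml",
]


def candidate_sitemap_urls(site_root, paths=COMMON_SITEMAP_PATHS):
    """Combine a site's root URL with each commonly used sitemap path."""
    return [urljoin(site_root, path) for path in paths]


# Placeholder domain; replace it with the website you are checking.
candidates = candidate_sitemap_urls("https://www.example.com")
```

You can then open each candidate in a browser, or request it with a tool of your choice, and see which one responds.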

Check the robots.txt

It is often recommended to include a link to your sitemap in the robots.txt file for your website. 

And luckily, there are strict rules for the placement of this file, so you will always be able to find it for any website that has one (not all websites do).

To find the robots.txt file for any website, all you have to do is go to the /robots.txt path, and you can see if it has a link to the sitemap. That's the case for our website, where you can find a link to our sitemap when you go to https://umbraco.com/robots.txt.
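Reading the sitemap link out of a robots.txt file can also be done in code. Python's standard library ships a robots.txt parser with a site_maps() method (Python 3.8 or newer); the sketch below feeds it a made-up robots.txt as text, so no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content, using the placeholder URL from this article.
sample = """User-agent: *
Disallow: /private/
Sitemap: https://yourwebsite.com/sitemaplocation.xml
"""
found = sitemaps_from_robots(sample)
```

For a live website, the same parser can fetch the file itself: call set_url() with the /robots.txt address, then read(), and then site_maps().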

 

Use Google Search Operators

If you haven't had any luck finding the sitemap yet, it's time to take it to Google Search and use some search operators to see if you can find it.

An important note here is that a sitemap will only show up if it's indexable by Google and has been added to their search index. This is typically not the case, as most sitemaps are either set to noindex, follow or simply haven't been found by Google. An example is our own sitemap, which you wouldn't be able to find using the search operators below.

Here's a list of different search operators you can try to use in different combinations (e.g. site:example.com filetype:xml in one combined search):

Google Search Operators

  • site:example.com
  • filetype:xml or filetype:txt
  • ext:xml or ext:txt
  • inurl:sitemap

Look for other sitemap types (RSS, mRSS, Atom 1.0, or Text)

Still no luck finding the sitemap? Then you can give it one last shot, by looking at different types of sitemaps.

Remember that an XML sitemap is not the only format, and the website might be using a different one. If that's the case, look at the list of common sitemap locations and combine it with different extensions.

Here's a list of possible paths you can look at:

Other types of sitemaps

  • /sitemap.txt
  • /sitemap1.txt
  • /sitemap_index.txt
  • /sitemap-index.txt
  • /sitemapindex.txt
  • /rss/
  • /rss.xml
  • /atom.xml

What is an HTML sitemap?

Simply put, an HTML sitemap is just another navigation element that helps the user find what they are looking for.

The sitemap is put in an HTML format to make it easier for a user to use and navigate. This should not be seen as a substitute for a search function or navigation elements, but an added help to the user if they become lost on your website.

HTML sitemaps should not be submitted to search engines but should be included in your website navigation elements. A common place for an HTML sitemap is the website footer. The HTML sitemap can help surface your most important pages and is especially useful if you have a deep URL structure, where some of your most important content is found deep in the natural website navigation.


XML vs. HTML sitemaps - which should I use?

Luckily, you don’t have to choose between the two, since they serve different purposes; it is perfectly acceptable to have both. While an XML sitemap is highly recommended, an HTML sitemap is optional but still worth having.

As explained above, the main difference is who the sitemaps are aimed at and who needs to understand/read the information.

In an XML sitemap, you do not need to think about the user, readability, and what information is useful to a user. Your only concern is the web crawlers and what information they need to get a better understanding of your website. That means information that is otherwise irrelevant for a user, like the last modified date, is very important in an XML sitemap.

You should always have an XML sitemap and submit it to the major search engines. It is usually a core feature in your CMS and only has to be set up once.

An HTML sitemap on the other hand is aimed at a user trying to find something on your website. So does a priority help them find what they need? Probably not.

Instead, an HTML sitemap should be a simpler version of your XML sitemap, which is prettier and more user-friendly than the long XML list of pages. That means substituting the raw URL with a more descriptive title (also called anchor text) and maybe even adding a description or breadcrumbs to the links.
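The idea of swapping raw URLs for descriptive anchor text can be sketched as follows. This is a generic Python illustration with made-up page titles, not how Umbraco renders templates; in practice your CMS template would produce the markup.

```python
from html import escape


def build_html_sitemap(pages):
    """Render (title, url) pairs as an unordered list of links."""
    items = [
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in pages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical pages; the title becomes the anchor text of each link.
html_sitemap = build_html_sitemap([
    ("About us", "/about/"),
    ("Pricing & plans", "/pricing/"),
])
```

Titles and URLs are escaped so that characters such as & don't break the resulting HTML.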

Creating an XML sitemap in Umbraco

If you are using Umbraco, creating an XML sitemap is fairly easy. While there’s no out-of-the-box XML sitemap generator in Umbraco, you can either use a package or create one by following the guide in the official Umbraco documentation.

 

Creating an HTML sitemap in Umbraco

If you want to create an HTML sitemap in Umbraco, that is also possible. How to do so depends on your requirements and how you want to style it.

Generally, you will need to create a new template for your HTML sitemap along with a new Document Type. Once you have those, you need to add the proper Razor code to it (there are many different ways to do this) and choose a layout to have proper styling. 

Once you've done those things, you will be able to create a new page that can work as the HTML sitemap for your Umbraco site.
