XML Sitemaps - a guide for SEOs

Written by Tiago Silva. Updated on 17 October 2023

A sitemap is a document with a list of relevant links from a website. This article will focus on XML sitemaps (eXtensible Markup Language), the most popular sitemap protocol used for search engine optimisation.

This article is part of our Google Search Console tutorials and training section. Make sure to check the others out.

What is an XML Sitemap?

An XML sitemap contains a list of all the URLs on a website that you want a search engine to crawl regularly. It is a complementary tool that helps search engines find pages and crawl them more efficiently. Think of a sitemap as a roadmap, with metadata (like the last updated date), that points search engines to your essential pages.

A single sitemap file must be UTF-8 encoded and can contain no more than 50,000 URLs or 50MB uncompressed, whichever limit is reached first.

Sitemaps can be compressed with gzip, but the 50MB limit applies to the uncompressed file. These limits prevent servers from becoming overwhelmed. If a sitemap hits either limit, split it into several sitemaps and list them in an XML sitemap index.
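To make these limits concrete, here is a minimal Python sketch of how a generator might split a long URL list into sitemap-sized chunks. It assumes you already have the URLs in a list; the function names, constants and domain.com URLs are hypothetical, and in practice your CMS or sitemap tool usually does this for you.

```python
from typing import Iterable, Iterator, List

# Per-file limits from the sitemaps protocol: at most 50,000 URLs and
# at most 50MB (52,428,800 bytes) uncompressed. Gzip doesn't raise the limit.
MAX_URLS_PER_SITEMAP = 50_000
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024


def chunk_urls(urls: Iterable[str], max_urls: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield groups of at most max_urls URLs; each group becomes one sitemap file."""
    chunk: List[str] = []
    for url in urls:
        chunk.append(url)
        if len(chunk) == max_urls:
            yield chunk
            chunk = []
    if chunk:  # leftover URLs go into a final, smaller sitemap
        yield chunk


def within_size_limit(xml_bytes: bytes) -> bool:
    """Check a rendered (uncompressed) sitemap against the 50MB limit."""
    return len(xml_bytes) <= MAX_UNCOMPRESSED_BYTES


# Hypothetical catalogue of 120,001 product URLs.
urls = [f"https://www.domain.com/product-{i}" for i in range(120_001)]
sizes = [len(chunk) for chunk in chunk_urls(urls)]
print(sizes)  # [50000, 50000, 20001]
```

Each chunk would then be written to its own file and listed in a sitemap index. The size check is kept separate from the URL count because a file with very long URLs could hit 50MB before it reaches 50,000 entries.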

Why are XML Sitemaps Important?

XML sitemaps can be considered the unsung heroes of the SEO world. They are often overlooked, but they are incredibly vital for a well-optimised website. I have been working on optimising websites for nearly a decade now, and I cannot stress enough how important a comprehensive, up-to-date XML sitemap can be.

Why are XML sitemaps important? Think of them as a roadmap for search engine crawlers like Googlebot. They make it easier for these crawlers to understand the structure of your website. Without a sitemap, crawlers may miss out on some of your newer pages that are more difficult to find.

Another advantage of XML sitemaps is that they let you prioritise certain pages over others. For example, if you have an ecommerce website with a vast product catalogue, carefully structuring the XML sitemap and prioritising high-margin products could influence how often those pages are crawled and updated in a search engine's index.

Do you Need an XML Sitemap?

Whilst it is possible to run a website without an XML sitemap, I would strongly recommend against it. Through years of optimising hundreds of websites, I have found that an XML sitemap is essentially an insurance policy for ensuring that search engines fully understand your site's structure. If you have a large, complex, or rapidly changing site, a sitemap is almost a necessity!

A Quick Note on HTML Sitemaps

HTML sitemaps and XML sitemaps serve distinct functions and cater to different audiences. Although both are concerned with a website's navigational structure, an HTML sitemap is a user-facing page that lists important website pages, functioning as a table of contents for human visitors. An XML sitemap is intended only for search engine crawlers, facilitating the efficient indexing of a website's content.

Example of an HTML sitemap.

Image Credit: Elegant Themes

Having both types of sitemaps offers a few advantages. An HTML sitemap enhances user experience as it aids site navigation, which potentially reduces bounce rates and may improve engagement metrics. Whilst these factors may not directly affect search engine rankings, they contribute to a website's overall performance and health.

On the other hand, an XML sitemap plays a critical technical role by guiding search engine crawlers. Many content management systems automatically generate XML sitemaps, but manual optimisation is recommended if you have the capacity to do this, especially for larger or more complex websites.

What is an XML Sitemap Index?

An XML sitemap index is a file containing a list of multiple sitemaps. The same limits apply: a sitemap index can list no more than 50,000 sitemaps and must be no larger than 50MB uncompressed.

It's also possible to have multiple XML sitemap indexes and compress them using gzip format.

Sitemap Effect on SEO

As I wrote above, sitemaps are important but not required for SEO.

Here is what we know about a sitemap's impact on SEO: using XML sitemaps won't guarantee indexing or good rankings, but they do offer these benefits:

  • Indicate to Google which pages you consider important and their canonical version
  • Tell Google when a page has been updated

You can also filter the index coverage reports in Google Search Console by sitemap. This lets you identify crawling issues in specific parts of a site, provided your sitemaps are split along those lines.

What Websites Should Use Sitemaps?

According to Google, these are the types of websites that most need a sitemap:

  • Large websites;
  • Websites with an extensive archive without a proper internal link structure;
  • New websites with few external links;
  • Websites with rich media (video and images) or that appear in Google News.

According to the same Google help doc, these are the cases where a website might not need a sitemap:

  • Small websites (about 500 pages or fewer);
  • Websites with good internal linking;
  • Websites without rich media or news pages.

Note: A sitemap isn't a replacement for a good internal link structure.

How to Create a Sitemap

Creating an XML sitemap file is now easier than ever, as most CMSs and website builders have dynamic sitemaps built-in.

Creating a Sitemap Using a WordPress Plugin

Go to the WordPress plugin repository and look for a sitemap plugin. Most SEO plugins have this feature, so I'll use Rank Math in this example.

Install the plugin, go to the dashboard, and activate the sitemap feature to create an XML sitemap using Rank Math.

You can end the process here or go to "sitemap settings" to customise it with the different content types on the site, like images.

It's simple to make a sitemap in WordPress this way.

Screenshot of the Rank Math WordPress plugin

Creating a Sitemap Using a Sitemap Generator

There are many sitemap generators, like Screaming Frog or SureOak, but I will show you how to build one using XML-Sitemaps.com:

Step 1: Go to the homepage of XML-Sitemaps, enter your website URL, and click Start.

XML-Sitemaps, a tool to generate an XML sitemap file for your site.

Step 2: The tool crawls your website and, after a while, returns the sitemap details. You can look at the preview or download the file straight away.

Sitemap generated by XML-Sitemaps.

Step 3: Upload this XML file to your server. How you do this depends mostly on your server type.

Tip: Check the file with a sitemap validator after creating it with a generator like the one in this example.
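If you want a quick sanity check before reaching for a full validator, the Python sketch below uses the standard library's xml.etree.ElementTree to confirm that a file is well-formed XML, that its root is a urlset element in the sitemap 0.9 namespace, that it lists no more than 50,000 URLs, and that every url entry has a loc. It is a shortcut under those assumptions, not schema (XSD) validation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def basic_sitemap_check(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    # ElementTree reports namespaced tags as "{namespace}localname".
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return ["root element is not <urlset> in the sitemap 0.9 namespace"]

    problems: List[str] = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(urls) > 50_000:
        problems.append(f"too many URLs: {len(urls)}")
    for i, url in enumerate(urls, start=1):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
    return problems


sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.domain.com/</loc></url>
  <url><lastmod>2023-10-17</lastmod></url>
</urlset>"""
print(basic_sitemap_check(sample))  # ['<url> #2 has no <loc>']
```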

XML Sitemap Structure

A valid XML sitemap must follow the protocol and use the correct schema. It must also contain the required tags and can include some optional ones.

Sitemaps are primarily made for robots and search engine crawlers, and here is what one looks like:

Screenshot of a sitemap.xml file.

The hierarchy of required elements for a sitemap is the following:

  • XML header;
  • URLset;
  • URL;
  • Loc.

Let's now explain what each of these elements is.

XML Header

The first line in a sitemap is the XML header.

<?xml version="1.0" encoding="utf-8"?>

This header declares the XML version (1.0 in this example) and the character encoding (UTF-8). XML sitemaps must be UTF-8 encoded.

URLset

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

The urlset tag references the sitemap protocol standard used for the document (0.9 in this example) through its xmlns attribute.

The urlset tag is used as a pair: the opening tag goes right after the header, and the closing </urlset> tag goes at the end of the document, after all the URLs and their optional elements.

URLs

The url tag holds the entry for each URL you want crawlers to use. It's recommended to list only the canonical versions of the links.

The url tag is required by the protocol and is the parent tag for every tag mentioned next in this list (loc, lastmod, changefreq, priority).

<url>
  <loc>https://www.moneysavingheroes.co.uk/peacocks</loc>
  <lastmod>2020-11-26</lastmod>
</url>

Loc

The loc (aka location) tag is the last of the three mandatory tags in a sitemap, after urlset and url. This tag contains the URL of the page.

The URL in the loc tag must start with the protocol (http or https), end with a trailing slash if your web server requires it, and be less than 2,048 characters long.
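The loc rules can be checked in a few lines. This hypothetical Python helper tests the protocol prefix and the length limit. It doesn't check for a trailing slash, because under the sitemaps protocol one is only needed if your web server requires it. The second helper escapes special characters, since the protocol requires values such as & to be entity-escaped (&amp;) inside the XML; that escaping rule comes from the protocol rather than from the text above.

```python
from typing import List
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

MAX_LOC_LENGTH = 2048  # <loc> values must be shorter than this


def check_loc(url: str) -> List[str]:
    """Return the problems found with a <loc> value (empty list if none)."""
    problems = []
    if urlsplit(url).scheme not in ("http", "https"):
        problems.append("must start with http:// or https://")
    if len(url) >= MAX_LOC_LENGTH:
        problems.append("must be less than 2,048 characters")
    return problems


def loc_element(url: str) -> str:
    """Render a <loc> tag with &, <, >, single and double quotes entity-escaped."""
    return "<loc>" + escape(url, {"'": "&apos;", '"': "&quot;"}) + "</loc>"


print(check_loc("ftp://www.domain.com/file"))  # ['must start with http:// or https://']
print(loc_element("https://www.domain.com/search?q=a&page=2"))
```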

Optional URL Elements

Lastmod is the URL's last modification date and must use W3C Datetime format.
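In Python, the standard datetime module can produce W3C Datetime values directly, either as a plain date or as a full timestamp with a timezone offset. This is a minimal sketch with hypothetical helper names; it assumes you track modification times as timezone-aware datetimes.

```python
from datetime import date, datetime, timezone


def w3c_date(d: date) -> str:
    """Date-only W3C Datetime value (YYYY-MM-DD)."""
    return d.strftime("%Y-%m-%d")


def w3c_datetime(dt: datetime) -> str:
    """Full W3C Datetime value to the second, with an explicit offset."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the offset is explicit")
    return dt.replace(microsecond=0).isoformat()


print(w3c_date(date(2020, 11, 26)))  # 2020-11-26
print(w3c_datetime(datetime(2022, 4, 19, 11, 54, 44, tzinfo=timezone.utc)))  # 2022-04-19T11:54:44+00:00
```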

Changefreq indicates how frequently a page is likely to change. Accepted values are always, hourly, daily, weekly, monthly, yearly, and never.

Priority is a value between 0.0 and 1.0 that tells search engines how important the website owner considers a page relative to other URLs on the same site. The default value is 0.5.
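Putting the required and optional elements together, here is a minimal Python sketch that builds a single sitemap with the standard library's ElementTree. The SitemapEntry class and build_sitemap function are hypothetical names; the sketch assumes lastmod values are already W3C Datetime strings, and it rejects changefreq and priority values outside the accepted ranges listed above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


@dataclass
class SitemapEntry:
    loc: str                          # required
    lastmod: Optional[str] = None     # optional, W3C Datetime string
    changefreq: Optional[str] = None  # optional, one of CHANGEFREQ_VALUES
    priority: Optional[float] = None  # optional, 0.0 to 1.0


def build_sitemap(entries: List[SitemapEntry]) -> bytes:
    """Return a UTF-8 sitemap: XML header, <urlset>, then one <url> per entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry.loc
        if entry.lastmod is not None:
            ET.SubElement(url, "lastmod").text = entry.lastmod
        if entry.changefreq is not None:
            if entry.changefreq not in CHANGEFREQ_VALUES:
                raise ValueError(f"invalid changefreq: {entry.changefreq}")
            ET.SubElement(url, "changefreq").text = entry.changefreq
        if entry.priority is not None:
            if not 0.0 <= entry.priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = str(entry.priority)
    # tostring escapes characters such as &, and xml_declaration=True adds the XML header.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    SitemapEntry("https://www.moneysavingheroes.co.uk/peacocks", lastmod="2020-11-26"),
])
print(xml_bytes.decode("utf-8"))
```

Building the document with an XML library rather than string concatenation means the urlset and url tags are always closed in pairs and special characters are escaped for you.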

Sitemap Index File Structure

An XML sitemap index must have the following elements:

  • XML header: informs the XML standard used and character encoding;
  • Sitemapindex: the element containing all the sitemaps in the file with the standard used, similar to what URLset represents in a single XML sitemap;
  • Sitemap: this tag contains the information from a single XML sitemap, like location and lastmod;
  • Loc: identifies the location of the XML sitemap.

Lastmod is an optional element of the XML sitemap index.

It is also important to note that a sitemap index can only list sitemaps on the same site as the index file itself, so an index on www.domain.com can't reference sitemaps on a subdomain such as blog.domain.com.

Screenshot of an example sitemap index file.

XML Sitemap Index Structure Example

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.domain.com/sitemap-pages.xml</loc>
    <lastmod>2022-04-19T11:54:44.774Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.domain.com/sitemap-posts.xml</loc>
    <lastmod>2022-04-25T18:42:55.769Z</lastmod>
  </sitemap>
</sitemapindex>
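The same standard-library approach can generate an index like the one above. In this hypothetical Python sketch, build_sitemap_index takes (loc, lastmod) pairs, enforces the 50,000-sitemap limit, and refuses sitemaps hosted on a different host from the index file, following the same-site rule for sitemap indexes. The sitemap_index.xml location is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50_000


def build_sitemap_index(index_url: str, sitemaps: List[Tuple[str, Optional[str]]]) -> bytes:
    """Build a <sitemapindex> document from (loc, lastmod) pairs; lastmod may be None."""
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a sitemap index can list at most 50,000 sitemaps")
    index_host = urlsplit(index_url).netloc
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        # Only sitemaps on the same host as the index file are allowed.
        if urlsplit(loc).netloc != index_host:
            raise ValueError(f"{loc} is not hosted on {index_host}")
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in an index
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index(
    "https://www.domain.com/sitemap_index.xml",
    [
        ("https://www.domain.com/sitemap-pages.xml", "2022-04-19T11:54:44.774Z"),
        ("https://www.domain.com/sitemap-posts.xml", "2022-04-25T18:42:55.769Z"),
    ],
)
print(index_xml.decode("utf-8"))
```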

Guidelines and Best Practices

Now, let's focus on the best practices for sitemaps:

  • Consistent URLs: Crawlers will use the exact URL in the sitemap, so make sure you use the same protocol (HTTP or HTTPS) and the same subdomain or root domain for every URL. For example, don't mix www and non-www versions of URLs.
  • Only canonical URLs: If a page has more than one version, only use the canonical version in the sitemap. For example, don't add ecommerce product variation URLs.
  • Use sitemap indexes: If your site has more than 50,000 URLs (or more than 50MB of uncompressed sitemap data), or has more than one sitemap, use a sitemap index file.
  • Hreflang: Use hreflang annotations (xhtml:link elements in the sitemap) when there are alternate language versions of a URL.
  • lastmod: Include lastmod in the sitemap and make it consistent with the page's last updated date. Use dynamic sitemaps to do this.
    • Dynamic sitemaps update when new content is published or updated and keep the sitemap current.
  • Reference the sitemap in robots.txt: Adding a Sitemap: line with your sitemap's location to the robots.txt file helps every crawler find it.
  • Use sitemap extensions: For websites with rich media, consider using Video sitemap, Image sitemap, and Google News sitemap.
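To reference a sitemap in robots.txt, add a line made of Sitemap: followed by the sitemap's absolute URL; the directive applies regardless of which User-agent group it sits near. As a quick check, Python's standard urllib.robotparser can read those lines back (site_maps() requires Python 3.8 or later). The robots.txt content and domain below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt pointing crawlers at a sitemap index.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

Sitemap: https://www.domain.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap: URLs found, or None if there are none.
print(parser.site_maps())  # ['https://www.domain.com/sitemap_index.xml']
```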

Submit Sitemaps to Google Search Console

The last recommendation is to submit the sitemaps in Google Search Console.

This process is straightforward:

  1. In the left sidebar, go to "sitemaps";
  2. Enter the sitemap URL below "Add a new sitemap" and press submit.

Google Search Console accepts both sitemaps and sitemap indexes.

How to submit an XML sitemap file to Google Search Console.

Frequently Asked Questions

What types of sitemap formats does Google support?

Google supports the following sitemap formats:

  • XML;
  • RSS, mRSS, and Atom 1.0;
  • Text.

What does Google pay attention to within a sitemap?

Google primarily pays attention to the URLs and, in some situations, the lastmod values. Its documentation explicitly says that Google ignores the priority and changefreq values in a sitemap.

On the same documentation page, Google says that they will use the <lastmod> value "if it's consistently and verifiably (for example, by comparing to the last modification of the page) accurate."

Further, John Mueller wrote in 2017, "The URL + last modification date is what we care about for web search." In 2015, John also said, "Priority and change frequency doesn't really play that much of a role with sitemaps anymore."

It's important to mention that a URL's position in the sitemap doesn't matter because Google doesn't crawl pages in order of appearance.

What are some common sitemap file locations?

The most common sitemap location is in the root directory: Domain.com/sitemap.xml.

You can put a sitemap anywhere on the site, but it can only include URLs in its own directory and the directories below it. This is why putting sitemaps in the root directory is recommended, like in the example above.

Placing a sitemap in a sub-folder like Domain.com/blog/sitemap.xml means it can only include URLs in the blog directory (Domain.com/blog/), leaving out all the URLs of the root directory like Domain.com/about or Domain.com/service-1.
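The directory rule can be expressed as a small check. The Python sketch below, with a hypothetical function name, follows the sitemaps.org protocol: a URL belongs in a sitemap only if it uses the same protocol and host as the sitemap file and sits in the sitemap's directory or below. It reuses the Domain.com examples, written in lowercase with https.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url may be listed in the sitemap located at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Same protocol and host as the sitemap file.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # Directory holding the sitemap, e.g. "/blog/" for /blog/sitemap.xml.
    directory = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(directory)


print(in_sitemap_scope("https://domain.com/blog/sitemap.xml", "https://domain.com/blog/post-1"))  # True
print(in_sitemap_scope("https://domain.com/blog/sitemap.xml", "https://domain.com/about"))  # False
```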

Robots.txt files usually mention the sitemap location, helping crawlers find them.

Do you need a sitemap (or sitemaps) to rank on Google?

No, you don't need a sitemap to rank on Google, and sitemaps aren't a ranking factor. But it's still better to use one, because sitemaps can help and don't hurt.

In John Mueller's opinion, sitemaps are "a minimal baseline for any serious website".

Google says the following in their documentation: "If your site's pages are properly linked, Google can usually discover most of your site. Proper linking means that all pages you deem important can be reached through navigation[...]Even so, a sitemap can improve the crawling of larger or more complex sites or more specialised files."

Using a sitemap is your decision, as sitemaps don't guarantee indexing. But they help search engines use their crawl budget more efficiently. So, even if they aren't mandatory, they are helpful.

In my SEO career, I have often found that XML sitemaps are the unsung heroes of website optimisation. They serve as a roadmap to guide search engines through your site, ensuring that crawlers don't miss any of your important pages. While they may not directly affect your rankings, they facilitate efficient crawling and indexing, which is crucial for SEO performance.

If you're running a large, complex, or frequently updated website, not having an XML sitemap is like setting off on a road trip without a map. You are likely to miss some turns!

Given how simple they are to create and maintain, especially with today's CMS platforms and tools, there's no good reason to go without one. So do yourself and your website a favour: create an XML sitemap, submit it to GSC, and keep it updated.

Want to make better use of your search console data? Find hidden optimisation opportunities? Or make SEO testing easier for your business? Give SEOTesting a try! We're currently running a 14-day free trial, with no credit card required for sign-up. So sign up today and see how SEOTesting can improve your SEO.