XML Sitemaps - a guide for SEOs

Written by Tiago Silva. Updated on 30 May 2023

A sitemap is a document with a list of relevant links from a website. This article will focus on XML sitemaps (eXtensible Markup Language) as they are the most popular sitemap protocol used for search engine optimization.

This article is part of our Google Search Console tutorials and training section. Make sure to check out the others.


What is an XML sitemap?

An XML sitemap is a document containing a list of all the URLs from a website that you want a search engine to regularly crawl. They are a complementary tool used to help search engines find pages and crawl them more efficiently. You can think of a sitemap as a roadmap with metadata (like the last updated date) pointing search engines to important pages.

A single sitemap file must be UTF-8 encoded and can contain at most 50,000 URLs or 50MB of uncompressed data, whichever limit is reached first.

Sitemaps can be compressed with gzip, but the 50MB limit applies to the uncompressed file. These limits prevent servers from becoming overwhelmed. If a sitemap hits either limit, you can split it into several sitemaps and list them in an XML sitemap index.
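As a rough illustration of the URL limit, here is a minimal Python sketch that splits a list of URLs into groups small enough for one sitemap file each. The function name and the example.com URLs are made up for this example, and the sketch only checks the URL count, not the 50MB size limit.

```python
from typing import Iterator, List

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemap protocol


def split_into_sitemaps(urls: List[str], limit: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each small enough for one sitemap file."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


# Hypothetical site with 120,000 URLs: it needs three sitemap files.
urls = [f"https://www.example.com/page-{i}" for i in range(120_000)]
chunks = list(split_into_sitemaps(urls))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
```

Each group would then become its own sitemap file, and a sitemap index would list all of them.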

What is an XML sitemap Index?

An XML sitemap index is a file containing a list of multiple sitemaps. The limits are the same: a sitemap index cannot list more than 50,000 sitemaps or be larger than 50MB.

It’s also possible to have multiple XML sitemap indexes and compress them using gzip format.
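Compression works the same way for a sitemap and for a sitemap index. The sketch below uses Python's standard gzip module and checks the size before compressing, since the 50MB limit (52,428,800 bytes) applies to the uncompressed file. The function name and the `limit` parameter are assumptions for illustration.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # 50MB, measured before compression


def compress_sitemap(xml_bytes: bytes, limit: int = MAX_UNCOMPRESSED_BYTES) -> bytes:
    """Gzip a sitemap, refusing files that are too large once uncompressed."""
    if len(xml_bytes) > limit:
        raise ValueError("sitemap is too large uncompressed; split it first")
    return gzip.compress(xml_bytes)


# Tiny placeholder document, only to show the round trip.
packed = compress_sitemap(b"<urlset/>")
print(gzip.decompress(packed))  # b'<urlset/>'
```

The compressed file is usually saved with a .xml.gz extension, for example sitemap.xml.gz.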

Sitemap effect on SEO

As I wrote above, sitemaps are important, but not required for SEO.

Here is what we know about sitemaps' impact on SEO:

So while using XML sitemaps won’t guarantee indexing or good rankings, these are the benefits of using them:

  • Indicate to Google which pages you consider important and their canonical version
  • Tell Google when a page has been updated

You can also check the index coverage reports by sitemap. This allows you to identify crawling issues for specific parts of a site if your sitemaps are separated in such a way.

What websites should use sitemaps?

According to Google, these are the types of websites most in need of a sitemap:

  • Large websites;
  • Websites with a large archive without a proper internal link structure;
  • New websites with few external links;
  • Websites with a lot of rich media (video and images) or showing in Google News.

Using the same Google help doc, these are the cases where a website might not need a sitemap:

  • Small websites (with fewer than 500 pages);
  • Websites with good internal linking;
  • Websites without rich media or news pages.

Note: A sitemap isn't a replacement for a good internal link structure.

How to create a sitemap

Creating an XML sitemap file is now easier than ever, as most CMSs and website builders have dynamic sitemaps built in.

Creating a sitemap using a WordPress plugin

Go to the WordPress plugin repository and look for a sitemap plugin. Most SEO plugins have this feature, so I'll use Rank Math in this example.

Install the plugin, go to the dashboard, and activate the sitemap feature to create an XML sitemap using Rank Math.

You can end the process here or go to "sitemap settings" to customize it with the different content types on the site, like images.

It's simple to make a sitemap in WordPress this way.

Screenshot of RankMath WordPress plugin

Creating a sitemap using a sitemap generator

There are many sitemap generators, like Screaming Frog or SureOak, but I will show you how to build one using XML-Sitemaps.com:

Step 1: Go to the homepage of XML-Sitemaps, enter your website URL, and click start. 

XML-Sitemaps, a tool to generate an XML sitemap file for your site

Step 2: It will start crawling your website. After a bit, it returns the sitemap details. Look at the preview, or download the file straight away. 

Sitemap generated by XML-Sitemaps

Step 3: Upload this XML file to your server. How you do this will depend mostly on your server type.

Tip: Use a sitemap validator after creating the XML file through a generator like in this example.

XML Sitemap Structure

A valid XML sitemap must follow the protocol and use the correct schema. In addition, it should have the required elements and, where useful, some optional ones.

Sitemaps are primarily made for robots and search engine crawlers. Here is what one looks like:

Screenshot of a sitemap.xml file.

The hierarchy of required parts of a sitemap is the following:

  • XML header;
  • URLset;
  • URL;
  • Loc.

Let's now explain what each of these parts is.

XML Header

The first line in a sitemap is the XML header.

<?xml version="1.0" encoding="utf-8"?>

This header declares the XML version (1.0 in this example) and the character encoding (UTF-8). XML sitemaps must be UTF-8 encoded.

URLset

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

The urlset element wraps every URL in the file, and its xmlns attribute references the sitemap protocol standard used for the document (0.9 in this example).

The urlset tag is used as a pair: the opening tag goes right after the header, and the closing tag goes at the end of the document, after all URLs and their optional elements.

URLs

The url tag wraps each URL you want crawlers to visit. It's recommended to list only the canonical versions of your links.

The url tag is a required element in the protocol and is the parent tag for every tag mentioned next in this list (loc, lastmod, changefreq, priority).

<url>
  <loc>https://www.moneysavingheroes.co.uk/peacocks</loc>
  <lastmod>2020-11-26</lastmod>
</url>
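To see how these tags fit together, here is a minimal Python sketch that reads a sitemap with the standard library and lists each URL with its lastmod value. The `read_entries` helper and the `sm` prefix are names chosen for this example. The prefix is needed because every tag inside urlset belongs to the sitemap namespace.

```python
import xml.etree.ElementTree as ET

# Map a short prefix (our choice) to the namespace declared on <urlset>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

sitemap_xml = """<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.moneysavingheroes.co.uk/peacocks</loc>
    <lastmod>2020-11-26</lastmod>
  </url>
</urlset>"""


def read_entries(xml_text):
    """Return a list of (loc, lastmod) pairs; lastmod is None when absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


print(read_entries(sitemap_xml))
# [('https://www.moneysavingheroes.co.uk/peacocks', '2020-11-26')]
```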

Loc

The loc (short for location) tag is the last of the three mandatory elements in a sitemap (urlset, url, and loc). It holds the URL of the page.

The URL in the loc tag must start with the protocol (HTTPS or HTTP), end with a trailing slash if your web server requires it, and be less than 2,048 characters long.
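The loc rules are easy to check before publishing a sitemap. The following is a minimal sketch, not an official validator: the `loc_problems` function is a made-up name, and it only tests the protocol, the host name, and the length.

```python
from typing import List
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # a loc value must be shorter than this


def loc_problems(loc: str) -> List[str]:
    """Return a list of problems found in a loc value (empty if none)."""
    problems = []
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https"):
        problems.append("must start with http:// or https://")
    if not parts.netloc:
        problems.append("missing host name")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("must be shorter than 2,048 characters")
    return problems


print(loc_problems("https://www.moneysavingheroes.co.uk/peacocks"))  # []
print(loc_problems("www.example.com/page"))
# ['must start with http:// or https://', 'missing host name']
```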

Optional URL elements

Lastmod is the URL's last modification date and must use W3C Datetime format.

Changefreq is how often you expect a page to change. Accepted values are always, hourly, daily, weekly, monthly, yearly, and never.

Priority values can be between 0.0 and 1.0 and show search engines how important a page is for the website owner.
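If you are not using a CMS plugin, a sitemap with loc and lastmod can be generated with a few lines of Python. This is a minimal sketch under some assumptions: `build_sitemap` is a made-up helper, the example.com URLs are placeholders, and changefreq and priority are left out to keep it short. The lastmod value is written in W3C Datetime format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as bytes.

    `pages` is an iterable of (loc, modified) pairs, where `modified` is a
    timezone-aware datetime, or None when the date is unknown.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if modified is not None:
            if modified.tzinfo is None:
                raise ValueError("lastmod needs a timezone-aware datetime")
            # W3C Datetime, e.g. 2023-05-30T12:00:00+00:00
            stamp = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
            ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://www.example.com/", datetime(2023, 5, 30, 12, 0, tzinfo=timezone.utc)),
    ("https://www.example.com/about", None),
])
print(xml_bytes.decode("utf-8"))
```

ElementTree escapes characters such as & in URLs automatically, which keeps the output valid XML.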

Sitemap Index Files Structure

An XML sitemap index must have the following elements:

  • XML header: declares the XML version used and the character encoding;
  • Sitemapindex: the element containing all the sitemaps in the file with the standard used, similar to what urlset represents in a single XML sitemap;
  • Sitemap: this tag contains the information from a single XML sitemap, like location and lastmod;
  • Loc: identifies the location of the XML sitemap.

Lastmod is an optional element of the XML sitemap index.

It's also important to mention that this file can only list sitemaps on the same site as the sitemap index, so it can't reference sitemaps hosted on other subdomains.

Screenshot of an example sitemap index file

XML Sitemap Index structure example

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://www.domain.com/sitemap-pages.xml</loc>
        <lastmod>2022-04-19T11:54:44.774Z</lastmod>
    </sitemap>
    <sitemap>
        <loc>https://www.domain.com/sitemap-posts.xml</loc>
        <lastmod>2022-04-25T18:42:55.769Z</lastmod>
    </sitemap>
</sitemapindex>
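The same-site rule for sitemap indexes can be checked automatically. Below is a minimal Python sketch that reads a sitemap index and reports any listed sitemap hosted on a different host than the index itself. The `off_site_sitemaps` function and the blog.domain.com entry are hypothetical and only serve to illustrate the check.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

index_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://www.domain.com/sitemap-pages.xml</loc>
    </sitemap>
    <sitemap>
        <loc>https://blog.domain.com/sitemap.xml</loc>
    </sitemap>
</sitemapindex>"""


def off_site_sitemaps(index_url, xml_bytes):
    """Return the listed sitemap URLs whose host differs from the index's host."""
    index_host = urlsplit(index_url).netloc.lower()
    root = ET.fromstring(xml_bytes)
    locs = [(loc.text or "").strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]
    return [loc for loc in locs if urlsplit(loc).netloc.lower() != index_host]


print(off_site_sitemaps("https://www.domain.com/sitemap_index.xml", index_xml))
# ['https://blog.domain.com/sitemap.xml']
```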

Guidelines and best practices

Now let's focus on the best practices for sitemaps:

  • Consistent URLs: crawlers will use the exact URL in the sitemap, so make sure you use the same protocol (HTTP or HTTPS) and the same sub-domain/root domain for each URL. For example, don't mix www and non-www versions of URLs;
  • Only canonical URLs: if a page has more than one version, only use the canonical version in the sitemap. For example, don’t add ecommerce product variation URLs;
  • Use sitemap Indexes: if your site has more than 50,000 URLs or has more than one sitemap, use a sitemap Index file;
  • Use hreflang when there are alternate language versions of a URL;
  • Keep lastmod accurate: include lastmod in the sitemap and make it consistent with the page's last updated date (dynamic sitemaps do this for you);
  • Use dynamic sitemaps: they update whenever content is published or updated, which keeps the sitemap current;
  • Reference the sitemap in robots.txt: stating the sitemap location in the robots.txt file helps every crawler find it;
  • Use sitemap extensions: for websites with lots of rich media, consider using Video sitemap, Image sitemap, and Google News sitemap.
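To illustrate the robots.txt recommendation, here is a minimal Python sketch that reads the Sitemap lines from a robots.txt file using the standard library (the `site_maps()` method needs Python 3.8 or newer). The robots.txt content, the example.com URL, and the helper name are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a sitemap reference.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap_index.xml
"""


def sitemaps_in_robots(text):
    """Return the sitemap URLs declared in a robots.txt file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser.site_maps() or []


print(sitemaps_in_robots(robots_txt))
# ['https://www.example.com/sitemap_index.xml']
```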

Submit sitemap on Google Search Console

The last recommendation is to submit the sitemaps in Google Search Console.

This process is straightforward:

  1. In the left sidebar, go to "sitemaps";
  2. Enter the sitemap URL below "Add a new sitemap" and press submit.

Google Search Console accepts both sitemaps and sitemap Indexes.

How to submit an XML sitemap file to Google Search Console.

FAQs

What types of sitemap formats does Google support?

Google supports the following sitemap formats:

  • XML;
  • RSS, mRSS, and Atom 1.0;
  • Text.

What does Google pay attention to in a sitemap?

Google pays attention primarily to the URLs and, in some situations, to lastmod. Their documentation explicitly says that Google doesn't consider priority in a sitemap.

On the same documentation page, Google says that they will use the <lastmod> value "if it's consistently and verifiably (for example by comparing to the last modification of the page) accurate."

Further, John Mueller wrote in 2017 that "The URL + last modification date is what we care about for websearch." In 2015, John also said that "Priority and change frequency doesn’t really play that much of a role with sitemaps anymore".

It's important to mention that a URL’s position in the sitemap doesn’t matter because Google doesn't crawl pages in order of appearance.

Common locations for sitemaps (where to find sitemaps)?

The most common sitemap location is in the root directory, for example: Domain.com/sitemap.xml.

You can put a sitemap anywhere on the site, but it will only affect the descendant directories. This is why it's recommended to put sitemaps in the root directory like in the example above.

Putting a sitemap in a sub-folder like Domain.com/blog/sitemap.xml will only affect URLs in the blog directory (Domain.com/blog/), leaving out all the URLs of the root directory like Domain.com/about or Domain.com/service-1.
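The directory rule described above can be sketched in a few lines of Python. This is a simplified illustration: `in_sitemap_scope` is a made-up helper that only compares the protocol, the host, and the directory that holds the sitemap.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url sits in the sitemap's directory or below it."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/blog/sitemap.xml" -> "/blog/", "/sitemap.xml" -> "/"
    directory = sitemap.path.rsplit("/", 1)[0] + "/"
    return page.path.startswith(directory)


blog_sitemap = "https://domain.com/blog/sitemap.xml"
print(in_sitemap_scope(blog_sitemap, "https://domain.com/blog/post-1"))  # True
print(in_sitemap_scope(blog_sitemap, "https://domain.com/about"))        # False

root_sitemap = "https://domain.com/sitemap.xml"
print(in_sitemap_scope(root_sitemap, "https://domain.com/about"))        # True
```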

Robots.txt files usually mention the sitemap location, helping crawlers find them.

Do you need a sitemap to rank on Google?

No, you don't need a sitemap to rank on Google, and sitemaps aren't a ranking factor. But it’s still better to use one, because a sitemap can't hurt and helps crawlers find your pages.

In John Mueller's opinion, sitemaps are "a minimal baseline for any serious website".

Google says the following in their documentation: "If your site's pages are properly linked, Google can usually discover most of your site. Proper linking means that all pages that you deem important can be reached through some form of navigation [...] Even so, a sitemap can improve the crawling of larger or more complex sites, or more specialized files."

Using a sitemap is your decision as they don't guarantee indexing. But sitemaps help search engines be more efficient with crawl budget. So, even if they aren't mandatory, they are helpful.