Written by Tiago Silva. Updated on 30 May 2023
A sitemap is a document with a list of relevant links from a website. This article will focus on XML sitemaps (eXtensible Markup Language) as they are the most popular sitemap protocol used for search engine optimization.
This article is part of our Google Search Console tutorials and training section. Make sure to check out the others.
An XML sitemap is a document containing a list of all the URLs from a website that you want a search engine to regularly crawl. They are a complementary tool used to help search engines find pages and crawl them more efficiently. You can think of a sitemap as a roadmap with metadata (like the last updated date) pointing search engines to important pages.
A single sitemap file must be UTF-8 encoded, contain no more than 50,000 URLs, and be no larger than 50MB uncompressed. Whichever limit is reached first applies.
Sitemaps can be compressed with gzip. These limits prevent servers from becoming overwhelmed. If a sitemap hits either limit, you can split it into several sitemaps and list them in an XML sitemap index.
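As a rough illustration of these limits, here is a minimal Python sketch that counts the URLs in a sitemap, measures its uncompressed size, and gzips it. The function name within_limits and the sample sitemap are made up for this example. It only checks the two limits and is not a full validator.

```python
import gzip
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file

# Namespace prefix used by ElementTree for tags in the sitemap protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def within_limits(xml_bytes: bytes) -> bool:
    """Return True if the uncompressed sitemap respects both limits."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{SITEMAP_NS}url"))
    return url_count <= MAX_URLS and len(xml_bytes) <= MAX_BYTES


# Hypothetical one-URL sitemap used only for this example.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://www.example.com/</loc></url>"
    b"</urlset>"
)

ok = within_limits(sample)

# The limits apply to the uncompressed file, so check first and compress after.
compressed = gzip.compress(sample)
```

Checking before compressing matters because a gzipped file can look small while the uncompressed sitemap is still over the limit.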
An XML sitemap index is a file containing a list of multiple sitemaps. The limitations are the same: a sitemap index cannot list more than 50,000 sitemaps and must not exceed 50MB.
It’s also possible to have multiple XML sitemap indexes and compress them using gzip format.
As I wrote above, sitemaps are important, but not required for SEO.
Here is what we know about sitemaps' impact on SEO:
So while using XML sitemaps won’t guarantee indexing or good rankings, these are the benefits of using them:
You can also check the index coverage reports by sitemap. This allows you to identify crawling issues for specific parts of a site if your sitemaps are separated in such a way.
According to Google, these are the types of websites in more need of a sitemap:
Using the same Google help doc, these are the cases where a website might not need a sitemap:
Note: A sitemap isn't a replacement for a good internal link structure.
Creating an XML sitemap file is now easier than ever, as most CMSs and website builders have dynamic sitemaps built in.
Go to the WordPress plugin repository and look for a sitemap plugin. Most SEO plugins have this feature, so I'll use Rank Math in this example.
Install the plugin, go to the dashboard, and activate the sitemap feature to create an XML sitemap using Rank Math.
You can end the process here or go to "sitemap settings" to customize it with the different content types on the site, like images.
It's simple to make a sitemap in WordPress this way.
There are many sitemap generators, like Screaming Frog or SureOak, but I will show you how to build one using XML-Sitemaps.com:
Step 1: Go to the homepage of XML-Sitemaps, enter your website URL, and click start.
Step 2: It will start crawling your website. After a bit, it returns the sitemap details. Look at the preview, or download the file straight away.
Step 3: Upload this XML file to your server. How you do this last step will depend mostly on your server type.
Tip: Use a sitemap validator after creating the XML file through a generator like in this example.
A valid XML sitemap must follow the protocol and use the correct schema. In addition, it should include the required tags and can include some optional ones.
Sitemaps are primarily made for robots and search engine crawlers, and here is what one looks like:
The hierarchy of required tags for a sitemap is the following:
Let's now explain what each of these tags is.
The first line in a sitemap is the XML header.
<?xml version="1.0" encoding="utf-8"?>
This header declares the XML version (1.0 in this example) and the character encoding (UTF-8). XML sitemaps must be UTF-8 encoded.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
The urlset tag declares, through its xmlns value, the sitemap protocol standard used for the document (0.9 in this example).
The urlset tag is used as a pair: the opening tag comes right after the header, and the closing tag comes at the end of the document, after all URLs and their optional tags.
The url tag wraps each URL you want crawlers to use. It's recommended to list only the canonical versions of the links.
The url tag is required by the protocol and is the parent tag for every tag mentioned next in this list (loc, lastmod, changefreq, priority).
<url>
  <loc>https://www.moneysavingheroes.co.uk/peacocks</loc>
  <lastmod>2020-11-26</lastmod>
</url>
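To make the nesting concrete, the following Python sketch wraps the entry above in a header and a urlset pair and reads the loc and lastmod values back. The function name read_entries is an assumption for this example. Note that the tags live in the sitemap namespace, so lookups need the namespace mapping.

```python
import xml.etree.ElementTree as ET

# Prefix "sm" is an arbitrary local alias for the sitemap namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.moneysavingheroes.co.uk/peacocks</loc>
    <lastmod>2020-11-26</lastmod>
  </url>
</urlset>"""


def read_entries(xml_text: str) -> list:
    """Return a list of (loc, lastmod) tuples, one per url tag."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)  # None if absent
        entries.append((loc, lastmod))
    return entries


entries = read_entries(SITEMAP)
```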
The loc (aka location) tag is the last of the 3 mandatory tags in a sitemap. It holds the URL of the page.
The URL in the loc tag must start with the protocol (i.e., HTTPS or HTTP), end with a trailing slash if your web server requires it, and be less than 2,048 characters long.
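A quick way to check the protocol and length rules is sketched below in Python. The function name is_valid_loc is hypothetical. It does not check the trailing slash, since that depends on how your server serves the URL.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # the URL must be shorter than this


def is_valid_loc(loc: str) -> bool:
    """Check that a loc value starts with http(s) and stays under the length limit."""
    parts = urlsplit(loc)
    has_protocol = parts.scheme in ("http", "https") and bool(parts.netloc)
    return has_protocol and len(loc) < MAX_LOC_LENGTH


good = is_valid_loc("https://www.moneysavingheroes.co.uk/peacocks")
missing_protocol = is_valid_loc("www.example.com/page")
```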
Lastmod is the URL's last modification date and must use W3C Datetime format.
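W3C Datetime accepts either a plain date (YYYY-MM-DD) or a full timestamp with a time zone. As a small sketch, Python's standard library can produce both forms. The helper names lastmod_date and lastmod_datetime are made up for this example.

```python
from datetime import date, datetime, timezone


def lastmod_date(d: date) -> str:
    """Date-only form, for example 2020-11-26."""
    return d.isoformat()


def lastmod_datetime(dt: datetime) -> str:
    """Full form with a time zone offset; naive datetimes are rejected."""
    if dt.tzinfo is None:
        raise ValueError("lastmod timestamps need a time zone")
    return dt.isoformat(timespec="seconds")


short_form = lastmod_date(date(2020, 11, 26))
long_form = lastmod_datetime(datetime(2022, 4, 19, 11, 54, 44, tzinfo=timezone.utc))
```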
Changefreq is the expected frequency of changes likely to happen to a page. Accepted values are always, hourly, daily, weekly, monthly, yearly, and never.
Priority values can be between 0.0 and 1.0 and show search engines how important a page is for the website owner.
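Putting the tags together, here is a minimal Python sketch that builds a sitemap and rejects changefreq or priority values outside the accepted ranges. The function build_sitemap, the dictionary keys, and the example.com URL are assumptions for illustration, not part of any plugin or tool mentioned in this article.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(pages) -> bytes:
    """Build a sitemap from dicts with 'loc' and optional 'lastmod', 'changefreq', 'priority'."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        if "lastmod" in page:
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        if "changefreq" in page:
            if page["changefreq"] not in CHANGEFREQ_VALUES:
                raise ValueError(f"invalid changefreq: {page['changefreq']}")
            ET.SubElement(url, "changefreq").text = page["changefreq"]
        if "priority" in page:
            priority = float(page["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = str(priority)
    # Returns UTF-8 bytes that start with the XML header.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    {
        "loc": "https://www.example.com/",
        "lastmod": "2020-11-26",
        "changefreq": "weekly",
        "priority": "0.8",
    }
])
```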
An XML sitemap index must have the following elements:
Lastmod is an optional element of the XML sitemap index.
It's also important to mention that this file can only list sitemaps on the same site as the sitemap index, so it won't be valid for sitemaps hosted on subdomains.
XML Sitemap Index structure example
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.domain.com/sitemap-pages.xml</loc>
    <lastmod>2022-04-19T11:54:44.774Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.domain.com/sitemap-posts.xml</loc>
    <lastmod>2022-04-25T18:42:55.769Z</lastmod>
  </sitemap>
</sitemapindex>
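For a large site, the same structure can be generated rather than written by hand. The Python sketch below splits a URL list into groups of at most 50,000 and builds an index that points at one sitemap per group. The helper names and the sitemap-1.xml style file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into groups that each fit in one sitemap."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locs, lastmod=None) -> bytes:
    """Build a sitemap index listing the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_XMLNS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in the index
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120,000 made-up URLs need three sitemaps: 50,000 + 50,000 + 20,000.
urls = [f"https://www.domain.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
sitemap_locs = [
    f"https://www.domain.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_index(sitemap_locs, lastmod="2022-04-25")
```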
Now let's focus on the best practices for sitemaps:
The last recommendation is to submit the sitemaps in Google Search Console.
This process is straightforward:
Google Search Console accepts both sitemaps and sitemap Indexes.
Google supports the following sitemap formats:
Google pays attention primarily to URLs and, in some situations, to lastmod. Their documentation explicitly says that Google doesn't consider priority in a sitemap.
On the same documentation page, Google says that they will use the <lastmod> value "if it's consistently and verifiably (for example by comparing to the last modification of the page) accurate."
Further, John Mueller wrote in 2017 that "The URL + last modification date is what we care about for websearch." In 2015, John also said that "Priority and change frequency doesn’t really play that much of a role with sitemaps anymore".
It's important to mention that a URL’s position in the sitemap doesn’t matter because Google doesn't crawl pages in order of appearance.
The most common sitemap location is in the root directory, for example: Domain.com/sitemap.xml.
You can put a sitemap anywhere on the site, but it will only affect the descendant directories. This is why it's recommended to put sitemaps in the root directory like in the example above.
Putting a sitemap in a sub-folder like Domain.com/blog/sitemap.xml will only affect URLs in the blog directory (Domain.com/blog/), leaving out all the URLs of the root directory like Domain.com/about or Domain.com/service-1.
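The directory rule can be expressed as a simple prefix check. This Python sketch assumes the sitemap and the page must share the same protocol and host. The function name in_sitemap_scope and the https://domain.com URLs are illustrative.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url sits in the sitemap's directory or below it."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory containing the sitemap, e.g. "/blog/" for "/blog/sitemap.xml".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    page_path = page.path or "/"
    same_site = (sitemap.scheme, sitemap.netloc) == (page.scheme, page.netloc)
    return same_site and page_path.startswith(base_dir)


blog_post = in_sitemap_scope("https://domain.com/blog/sitemap.xml", "https://domain.com/blog/post-1")
about_page = in_sitemap_scope("https://domain.com/blog/sitemap.xml", "https://domain.com/about")
from_root = in_sitemap_scope("https://domain.com/sitemap.xml", "https://domain.com/about")
```

Here blog_post and from_root are True while about_page is False, which is the situation described above.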
Robots.txt files usually mention the sitemap location, helping crawlers find them.
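In robots.txt, the location is given on a line that starts with Sitemap: followed by the full URL. As a small sketch, Python's standard urllib.robotparser module (Python 3.8 or later) can read those lines back. The robots.txt content below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if no Sitemap line exists.
sitemaps = parser.site_maps()
```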
No, you don't need a sitemap to rank on Google, and they aren't a ranking factor. But it’s better to use sitemaps because they don't hurt.
In John Mueller's opinion, sitemaps are "a minimal baseline for any serious website".
Google says the following in their documentation: "If your site's pages are properly linked, Google can usually discover most of your site. Proper linking means that all pages that you deem important can be reached through some form of navigation[...]Even so, a sitemap can improve the crawling of larger or more complex sites, or more specialized files."
Using a sitemap is your decision as they don't guarantee indexing. But sitemaps help search engines be more efficient with crawl budget. So, even if they aren't mandatory, they are helpful.