Tested on: Jul 29, 2020 at 7:49 AM
Page is eligible for rich results
All structured data on the page can generate rich results.
Detected items
How-to
How to Download and Crawl XML Sitemaps Using Screaming Frog
1 warning
Missing field "supply" (optional)
type
HowTo
name
How to Download and Crawl XML Sitemaps Using Screaming Frog
description
Learn how to download and crawl XML sitemaps to avoid submitting dirty sitemaps to the search engines. View the step-by-step tutorial now.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps.jpg
height
524
width
293
tool
type
HowToTool
name
Screaming Frog Spider
tool
type
HowToTool
name
Microsoft Excel
tool
type
HowToTool
name
Text Editor
step
type
HowToStep
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step1
name
Download the XML Sitemap(s)
itemListElement
type
HowToDirection
text
Enter the URL of your XML sitemap, or of the sitemap index file, in your browser. A sitemap index file lists the URLs of all of your XML sitemaps (needed when you use more than one because of sitemap size limitations). If you are using a sitemap index file, you will need to download each XML sitemap separately. You can then either crawl each one separately or combine the URLs into one master text file. After the sitemap loads in your browser, click “File”, then “Save As”, and save the file to your hard drive.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps-download-xml.jpg
height
422
width
324
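The difference between a sitemap index file and a regular XML sitemap described in this step can be sketched in Python. This is a minimal illustration, not part of the original tutorial: the function name `read_sitemap` is hypothetical, and it assumes the file follows the standard sitemaps.org 0.9 namespace and has already been downloaded into a string.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be the one in use).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def read_sitemap(xml_text):
    """Return (kind, locs): kind is 'index' or 'urlset', locs the <loc> values."""
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "index", "sitemap"   # entries point at other sitemaps
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"      # entries point at pages
    else:
        raise ValueError(f"unexpected root element: {root.tag}")

    locs = []
    for entry in root.findall(SITEMAP_NS + child):
        loc = entry.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return kind, locs
```

If `kind` comes back as `"index"`, each returned location is itself a sitemap that would need to be downloaded and read in turn, which mirrors the "download each XML sitemap separately" instruction above.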
step
type
HowToStep
name
Import the Sitemap into Excel
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step2
itemListElement
type
HowToDirection
text
Next, you’ll need a plain list of URLs to crawl from the sitemap. To get one, I recommend using the Import XML functionality in the Developer tab in Excel. Click Import and select the sitemap file you just downloaded. Once you confirm the file, Excel will show a dialog box about the XML schema. Just click OK. Excel will then ask you where to place the data. Leave the default option and click OK. You should now see a table containing the URLs from your XML sitemap. And yes, you might already see some problems in the list.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps-import-excel.jpg
height
411
width
202
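The step above notes that problems may already be visible in the imported list. The tutorial does not say which problems, so the following is only a sketch under assumptions: it treats duplicates, non-absolute URLs, and URLs on an unexpected host as "problems", and the names `flag_problems` and `expected_host` are hypothetical.

```python
from urllib.parse import urlsplit


def flag_problems(urls, expected_host):
    """Return (url, reason) pairs for entries that look suspicious.

    The three checks are illustrative assumptions, not a complete audit.
    expected_host should be lowercase, e.g. "www.example.com".
    """
    seen = set()
    problems = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((url, "not an absolute http(s) URL"))
        elif parts.hostname != expected_host:
            problems.append((url, "unexpected host"))
        elif url in seen:
            problems.append((url, "duplicate"))
        seen.add(url)
    return problems
```

A check like this only catches issues visible in the URL strings themselves. Response-level problems such as 404s still require the crawl described in the later steps.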
step
type
HowToStep
name
Copy the URLs to a Text File
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step3
itemListElement
type
HowToDirection
text
I mentioned earlier that Screaming Frog will only crawl text files with a list of URLs in them. To create one, copy all of the URLs from column A in your spreadsheet. Then fire up your text editor of choice (mine is Textpad) and paste the URLs. Make sure you delete the first row, which contains the heading for the column. Save that file to your computer.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps-text-file.jpg
height
494
width
231
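The copy, paste, and delete-the-heading routine in this step can be approximated in a few lines of Python. This is a sketch only: `to_url_list` and `save_url_list` are hypothetical helper names, and the input is assumed to be the cells of column A as a list of strings with the heading in the first position.

```python
def to_url_list(column_cells, has_header=True):
    """Drop the heading row and any blank cells, trimming whitespace."""
    rows = column_cells[1:] if has_header else column_cells
    return [cell.strip() for cell in rows if cell and cell.strip()]


def save_url_list(urls, path):
    """Write one URL per line, the plain-text layout the step describes."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(urls) + "\n")
```

For example, `save_url_list(to_url_list(cells), "sitemap-urls.txt")` would produce a file with one URL per line and no heading, where `cells` and the file name are placeholders.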
step
type
HowToStep
name
Unleash the Frog
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step4
itemListElement
type
HowToDirection
text
Now we’re ready to crawl the URLs in the text file you just created. Fire up Screaming Frog and click the Mode tab. Select List, which enables you to load a text file containing a series of URLs.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps-sf-mode.jpg
height
400
width
205
step
type
HowToStep
name
Load the Text File and Start the Crawl
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step5
itemListElement
type
HowToDirection
text
Once you select List Mode, click the Upload List button and select From a file. Then select the text file you created. Screaming Frog will load the URLs and display them in a window. Once you click OK, the crawl will begin.
image
type
ImageObject
url
https://www.gsqi.com/images/xml-sitemaps-load-sf.jpg
height
536
width
206
step
type
HowToStep
name
Analyze the Crawl
url
https://www.gsqi.com/marketing-blog/dirty-sitemaps-how-to-download-crawl/#step6
itemListElement
type
HowToDirection
text
When the crawl is done, you will have a boatload of data about each URL listed in the XML sitemap. The first place I would start is the Response Codes tab, which displays the header response code for each URL that was crawled. You can also use the filter dropdown to isolate 404s, 500s, 302s, etc. You might be surprised by what you find.
image
type
ImageObject
url
https://www.gsqi.com/images/dirty-sitemaps-analyze-crawl.jpg
height
500
width
263
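The filtering by response code described in this step can also be sketched outside the tool. The example below assumes the crawl results are available as `(url, status_code)` pairs. That data shape is hypothetical and is not Screaming Frog's export format, and `bucket_by_status` is an illustrative name.

```python
from collections import defaultdict


def bucket_by_status(results):
    """Group (url, status_code) pairs into coarse response classes."""
    buckets = defaultdict(list)
    for url, status in results:
        if status == 200:
            label = "ok"
        elif status in (301, 302, 307, 308):
            label = "redirect"
        elif 400 <= status < 500:
            label = "client_error"   # e.g. 404s
        elif 500 <= status < 600:
            label = "server_error"   # e.g. 500s
        else:
            label = "other"
        buckets[label].append(url)
    return dict(buckets)
```

Anything that lands outside the `"ok"` bucket is a candidate for removal from the sitemap, since the point of the exercise is to avoid submitting dirty sitemaps.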
totalTime
PT1H30M
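The totalTime value is an ISO 8601 duration. As a small worked example, the sketch below converts the time-only form used here into minutes. It deliberately handles only the `PT…H…M…S` pattern and not the full ISO 8601 grammar, and `duration_minutes` is a hypothetical helper name.

```python
import re

# Matches time-only durations such as PT1H30M, PT45M, or PT2H (assumed subset).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_minutes(value):
    """Convert a PT#H#M#S duration string to a number of minutes."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes + seconds / 60
```

Under these assumptions, `duration_minutes("PT1H30M")` works out to 1 × 60 + 30 = 90 minutes, so the how-to is declared as taking an hour and a half.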