Many of our clients offer whitepaper PDFs on their sites to generate leads, so they don’t want people to find those PDFs through a Google search. They want to collect a visitor’s email address before giving access to the whitepaper.
The easiest way to hide a PDF uploaded to WordPress from search engines, or to noindex it, is to do the following:
- Install and activate the Yoast WordPress SEO plugin
- Upload the PDF to the media library
- Edit the PDF in the media library. How you reach the Edit link depends on whether the media library is in grid view or list view:
In grid view, click on the PDF and then click Edit More Details.
In list view, click Edit on the PDF.
- In the Yoast SEO settings for the media item, click the gear icon and set “Meta robots index” to noindex. This will make sure the file (not just the media attachment page) is not indexed by search engines. Ideally, you should change this setting when you upload a new PDF. If the PDF already exists, it has probably already been indexed by Google, and it may take some time for search engines to recrawl your site and drop it from their results.
Update: our client is using a plugin (WP Original Media Path) that uploads all media to a subdomain, https://static.domain.com, so we couldn’t use the Yoast plugin, which only works on https://domain.com.
Therefore, I added an X-Robots-Tag header in the .htaccess file to hide the PDF:
# Send a noindex header for this one PDF (requires mod_headers)
<FilesMatch "Bestwhitepaperever\.pdf$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>
Why use the X-Robots-Tag header instead of robots.txt:
A robots.txt rule does not prevent your page or file from being listed in search results.
What it does is stop the bot from crawling your page, but if a third party links to your PDF file from their website, the file can still be listed.
If you stop the bot from crawling your page using robots.txt, it will never get the chance to see the X-Robots-Tag: noindex response header. Therefore, never disallow a page in robots.txt if you rely on the X-Robots-Tag header.
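To make the last point concrete, here is a minimal Python sketch that checks whether a robots.txt would block a crawler from fetching the PDF, and therefore from ever seeing the header. It uses the standard library's `urllib.robotparser`. The robots.txt contents, the `/wp-content/uploads/` path, and the helper name `crawler_can_fetch` are assumptions made up for illustration, not the client's actual setup.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_fetch(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that disallows the uploads folder: the crawler
# never requests the PDF, so it never sees the X-Robots-Tag header.
blocking_rules = """User-agent: *
Disallow: /wp-content/uploads/
"""

# Hypothetical robots.txt that leaves the uploads folder crawlable.
open_rules = """User-agent: *
Disallow: /wp-admin/
"""

# Hypothetical URL; the real upload path depends on the site.
pdf_url = "https://static.domain.com/wp-content/uploads/Bestwhitepaperever.pdf"

print(crawler_can_fetch(blocking_rules, pdf_url))  # False: noindex header is never seen
print(crawler_can_fetch(open_rules, pdf_url))      # True: crawler can read the header
```

If the first case applies to your site, remove the Disallow rule for the PDF so the noindex header can take effect.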
I then used the Web Developer add-on in Firefox to check the response headers for this line:
x-robots-tag: noindex, noarchive, nosnippet
X-Firefox-Spdy: h2
accept-ranges: bytes
content-length: 1147940
content-type: application/pdf
date: Wed, 31 Jan 2018 14:20:39 GMT
etag: "118424-55f7ff5fc6f00"
host-header: 192fc2e7e50945beb8231a492d6a8024
last-modified: Mon, 04 Dec 2017 09:01:16 GMT
server: nginx
x-proxy-cache: HIT
x-robots-tag: noindex, noarchive, nosnippet

200 OK
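If you would rather script this check than read headers in the browser, something like the following Python sketch could work. It sends a HEAD request with the standard library and looks for a noindex directive in any X-Robots-Tag header. The function names are my own, and the parsing is deliberately simple: it assumes a plain comma-separated value like the one above and ignores the optional per-bot prefix form (for example `googlebot: noindex`).

```python
import urllib.request
from typing import Iterable


def robots_directives(header_value: str) -> set[str]:
    """Split an X-Robots-Tag value into lowercase directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def is_noindexed(headers: Iterable[tuple[str, str]]) -> bool:
    """Return True if any X-Robots-Tag header contains noindex (or none)."""
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            directives = robots_directives(value)
            # "none" is shorthand for "noindex, nofollow"
            if "noindex" in directives or "none" in directives:
                return True
    return False


def fetch_headers(url: str) -> list[tuple[str, str]]:
    """Fetch only the response headers for url using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.items()


# A few of the headers from the response above, as (name, value) pairs.
sample_headers = [
    ("content-type", "application/pdf"),
    ("server", "nginx"),
    ("x-robots-tag", "noindex, noarchive, nosnippet"),
]
print(is_noindexed(sample_headers))  # True
```

Against a live site you would call `is_noindexed(fetch_headers(url))` with the full URL of the PDF.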