IntelTechniques Guides

Archive Site Removal Guide

Anything posted to the internet stays online forever. Well, sort of. Services such as Archive.org and Archive.Today aim to preserve all websites, which can be an unwelcome feature for those who wish to erase their past. Fortunately, we still have some privacy options which can be exercised to remove undesired content associated with the domains which we own.

Archive.org Removal

If your website(s) appear on Archive.org, you may want to eliminate any sensitive historical details. This could include an old family photos site or a self-hosted blog which has not aged well. Archive.org now ignores robots.txt and "NOARCHIVE" tags. Complete ALL of the following steps for the best removal results, then consider the prevention option in the next section to avoid new exposure.

Search your domain at https://archive.org/
Document any domains which display sensitive content.
Add the following to a robots.txt file on your site.
(Create a new file if one is not already present.)

Create a file called verify.txt at the root of your site.
Add the following text and save.

Generate an email from an address at the target domain.
Direct the email to [email protected].
Create a Subject of "Domain Removal".
Insert the following text, modifying for your needs.

Wait 24-48 hours for a response.
If challenged, provide receipt of domain purchase.
If required, provide receipt of domain renewal(s).
Send PDFs, never screen captures.
After removal confirmation, search to confirm.
The following is the desired result.
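The confirmation search in the final steps can also be scripted. The sketch below is an illustration, not part of the original procedure. It assumes the public Wayback Machine availability endpoint at https://archive.org/wayback/available, which reports the closest stored snapshot for a URL. The function names and the example.com placeholder are hypothetical. An empty result only means that no closest snapshot was reported, so a manual search is still worthwhile.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(domain: str) -> str:
    """Return the availability lookup URL for a domain or page."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": domain})


def has_snapshot(payload: dict) -> bool:
    """Return True if the decoded JSON response reports an available snapshot."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def check_domain(domain: str, timeout: float = 15.0) -> bool:
    """Query the endpoint over the network and report whether a snapshot exists."""
    with urllib.request.urlopen(build_query(domain), timeout=timeout) as response:
        return has_snapshot(json.load(response))
```

Calling check_domain("example.com") with your own domain in place of the placeholder should return False once removal is complete.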

Archive.org Prevention

Add a robots.txt file on your site with the following.

Modify your .htaccess file to include the following.

If using Cloudflare, add the following to a WAF rule.

While the robots.txt will be ignored, it can be cited later if the site is published again. The .htaccess modification prevents the Archive.org crawler from accessing any pages stored on your server at that domain. The Cloudflare option blocks any user agent which includes "archive" anywhere within its string.
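To sanity-check these settings before relying on them, a short script can help. The sketch below is a hedged illustration. It assumes the crawler tokens ia_archiver and archive.org_bot, which are not taken from this guide. It uses Python's standard robots.txt parser to check a file's rules, and it mimics the "contains archive" match with a plain substring test. It does not reproduce Cloudflare's rule engine or the .htaccess directives.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens for illustration; verify against your own logs.
ASSUMED_ARCHIVE_AGENTS = ("ia_archiver", "archive.org_bot")


def robots_blocks_agent(robots_text: str, agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `agent` from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, path)


def matches_archive_rule(user_agent: str) -> bool:
    """Mimic a case-insensitive 'user agent contains archive' rule."""
    return "archive" in user_agent.lower()
```

Note that a substring rule this broad also matches unrelated agents which happen to include the word, so review blocked requests after enabling it.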

Archive.Today Removal

Search your domain at https://archive.ph/
Document any captures which display sensitive content.
Some URLs may appear as "https://archive.ph/jCqte".
Click any capture of concern.
Click "report bug or abuse" in the upper-right.
Insert your name.
Insert an email associated with the domain.
Insert any burner VOIP or generic number.
Select an "Abuse Type" of "Copyright".
Send the following message.

Complete Captchas until "Message Sent" appears.
After removal confirmation, search to confirm.
If ignored, submit DMCA to [email protected].

Archive.Today Prevention

Archive.Today does not honor robots.txt or meta tags within HTML. It also does not specify any unique User Agent when cloning a page. The only way to block it is to block its server IP addresses completely. Add the following to your .htaccess file at the root of your domain.

If using Cloudflare, add the following to a WAF rule.

This prevents their current servers from accessing your pages. However, if new servers are added, this could fail. To test, ask archive.ph to capture a nonexistent page on your own domain (the equivalent of inteltechniques.com/fakepage.html). If you receive an error from Archive.Today, you are protected. If the page is captured (even with a 404 error), analyze your web server logs for the IP address which accessed the page, then block it.
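The log analysis step can be sketched in a few lines. The example below is a minimal illustration under stated assumptions. It expects an access log in the common or combined format, with the client IP as the first field. The function names are hypothetical, and the sample addresses come from reserved documentation ranges, not from Archive.Today. The generated directives use Apache 2.4 "Require" syntax as one possible way to block an address. They are not the rules from this guide.

```python
import re
from ipaddress import ip_address

# Common/combined log format: client IP, two fields, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def ips_requesting(log_lines, probe_path):
    """Return unique client IPs, in first-seen order, which requested probe_path."""
    found = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        # Ignore any query string when comparing against the probe page.
        if match.group("path").split("?")[0] != probe_path:
            continue
        try:
            ip = str(ip_address(match.group("ip")))
        except ValueError:
            continue  # first field was a hostname or malformed
        if ip not in found:
            found.append(ip)
    return found


def apache_deny_block(ips):
    """Build an Apache 2.4 block which allows everyone except the listed IPs."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {ip}" for ip in ips]
    lines.append("</RequireAll>")
    return "\n".join(lines)
```

Use a page name which nobody else would request, and exclude your own address from the results, since your own visits to the test page appear in the same log.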

Google/Bing Cache Removal & Prevention

If you do not want your site to be historically collected as a "Cache" file and presented within every Google or Bing search, add the following line within the "head" section of every HTML page on your site. If using WordPress, copy it to the header.php file within your theme. Be sure to enclose the line in "<" and ">" so that it forms a complete tag.

The next time Google or Bing indexes your pages, they should remove any cached copies.
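To confirm that a page actually carries the tag, the page source can be checked with a short script. The sketch below is a hedged illustration. It assumes the tag is a robots meta tag with a "noarchive" directive, in line with the "NOARCHIVE" tags mentioned earlier. The class and function names are hypothetical. It inspects HTML you supply and does not contact Google or Bing.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for token in attributes.get("content", "").split(","):
            self.directives.add(token.strip().lower())


def has_noarchive(html_text: str) -> bool:
    """Return True if the HTML includes a robots meta tag with noarchive."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "noarchive" in finder.directives
```

Running has_noarchive on the saved source of each template or page is a quick way to find files which were missed.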

Archive Content DMCA

You may find content within Archive.org or Archive.Today which violates your copyright. This could be a PDF of your work, a photograph taken and owned by you, or a video which was stolen. If you do not own the domain which hosted this content, then you must rely on a traditional DMCA takedown request. Such content is more likely to exist on Archive.org, as Archive.Today does not capture PDFs, videos, and other media. The following email to [email protected] should assist.
