Archive Site Removal Guide


Anything posted to the internet stays online forever. Well, sort of. Services such as Archive.org and Archive.Today aim to preserve all websites, which can be an unwelcome feature for those who wish to erase their past. Fortunately, we still have some privacy options which can be exercised to remove undesired content associated with the domains which we own.

Archive.org Removal


If your website(s) appear on Archive.org, you may want to eliminate any sensitive historical details. This could include an old family photos site or a self-hosted blog which has not aged well. Archive.org now ignores robots.txt and "NOARCHIVE" tags. Complete ALL of the following steps for best removal results, and consider the prevention option which follows to avoid new exposure.

  • Search your domain at https://archive.org/
  • Document any domains which display sensitive content.
  • Add the following to a robots.txt file on your site.
  • (Create a new file if one is not already present.)

  • User-agent: archive.org_bot
    Disallow: /

  • Create a file called verify.txt at the root of your site.
  • Add the following text and save.

  • please remove from archive.org

  • Generate an email from an address at the target domain.
  • Direct the email to [email protected]
  • Create a Subject of "Domain Removal".
  • Insert the following text, modifying for your needs.

  • I am NAME, owner of DOMAIN. I’m officially requesting the immediate removal of my site from all archive.org products. The "User-agent: archive.org_bot Disallow: /" code present in our robots.txt file is not being honored. It can be seen at:

    https://DOMAIN/robots.txt

    I am requesting removal of DOMAIN from all stored dates, including today, and all days going forward. I have been the sole owner of this domain since inception. I have sent this message from an address hosted at the domain which should be removed. I have also placed a confirmation message at the following link:

    https://DOMAIN/verify.txt

    Thank you for your prompt attention.

    DMCA Notice:

    I am the site owner and sole copyright holder for each of the domains cited above. This letter is official notification under Section 512(c) of the Digital Millennium Copyright Act ("DMCA"), and I seek the removal of the aforementioned infringing material from your servers. Archive.org does not have any right or permission to reproduce, sell or display my websites in any way, shape or form. I am providing this notice in good faith and with the reasonable belief that rights I own are being infringed. Under penalty of perjury I certify that the information contained in the notification is both true and accurate, and I am the copyright owner and therefore have the authority to act on behalf of the owner of the copyright(s) involved. Thank you for your prompt assistance with this matter.

    NAME
    DOMAIN

  • Wait 24-48 hours for a response.
  • If challenged, provide receipt of domain purchase.
  • If required, provide receipt of domain renewal(s).
  • Send PDFs, never screen captures.
  • After removal confirmation, search to confirm.
  • The desired result is that no captures of your domain remain available.
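
The final search step can also be scripted. The following is a minimal Python sketch, assuming the public Wayback Machine availability endpoint at https://archive.org/wayback/available and its usual JSON layout (an "archived_snapshots" object with an optional "closest" entry). The function names are my own and are only illustrative. An empty result suggests that no capture is being served, but it should not replace a manual search.

```python
import json
import sys
import urllib.parse
import urllib.request

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_API = "https://archive.org/wayback/available"


def closest_snapshot(api_response: dict):
    """Return the URL of the closest available capture, or None if none is reported."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_domain(domain: str, timeout: float = 10.0):
    """Query the availability endpoint for a domain (performs a network request)."""
    query = urllib.parse.urlencode({"url": domain})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    # Usage: python check_wayback.py DOMAIN [DOMAIN ...]
    for domain in sys.argv[1:]:
        snapshot = check_domain(domain)
        print(f"{domain}: {snapshot or 'no capture reported'}")
```

Running this against each domain you documented earlier gives a quick before-and-after comparison once the removal is confirmed.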

Archive.org Prevention


  • Add a robots.txt file on your site with the following.

  • User-agent: archive.org_bot
    Disallow: /

  • Modify your .htaccess file to include the following.

  • RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (archive.org_bot) [NC]
    RewriteRule .* - [R=403,L]

  • If using Cloudflare, add the following to a WAF rule.

  • (lower(http.user_agent) contains "archive")

While the robots.txt file will be ignored, it can be cited later if the site is published again. The .htaccess modification prevents the Archive.org crawler from accessing any pages stored on your server at that domain. The Cloudflare option blocks any user agent which includes "archive" anywhere within its string.
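
The difference between these three rules can be illustrated offline. The following Python sketch is only an approximation of how each rule matches, not a test of your live server: it parses the robots.txt text with the standard library, reuses the .htaccess pattern with case-insensitive matching (the [NC] flag), and mirrors the Cloudflare "contains" expression. The function names and the sample user agent strings are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# The robots.txt content from the steps above.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""

# Same pattern as the RewriteCond line; [NC] corresponds to re.IGNORECASE.
# Note that the unescaped dot matches any single character.
HTACCESS_PATTERN = re.compile(r"(archive.org_bot)", re.IGNORECASE)


def robots_blocks(robots_txt: str, agent: str, path: str = "/") -> bool:
    """True if the robots.txt text asks this agent not to fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


def htaccess_blocks(user_agent: str) -> bool:
    """True if the RewriteCond pattern would match this user agent."""
    return HTACCESS_PATTERN.search(user_agent) is not None


def cloudflare_blocks(user_agent: str) -> bool:
    """Mirrors the expression: (lower(http.user_agent) contains "archive")."""
    return "archive" in user_agent.lower()


if __name__ == "__main__":
    # Hypothetical user agent strings, for illustration only.
    for ua in ("archive.org_bot", "ArchiveTeam crawler", "Mozilla/5.0 Firefox"):
        print(ua, robots_blocks(ROBOTS_TXT, ua), htaccess_blocks(ua), cloudflare_blocks(ua))
```

The Cloudflare rule is the broadest of the three, since it matches any agent mentioning "archive", while the .htaccess pattern only matches the specific crawler name.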

Archive.Today Removal


  • Search your domain at https://archive.ph/
  • Document any captures which display sensitive content.
  • Some URLs may appear as "https://archive.ph/jCqte".
  • Click any capture of concern.
  • Click "report bug or abuse" in the upper-right.
  • Insert your name.
  • Insert an email associated with the domain.
  • Insert any burner VOIP or generic number.
  • Select an "Abuse Type" of "Copyright".
  • Send the following message.

  • I am the owner of DOMAIN and respectfully request removal of this capture and all other captures from DOMAIN.

    DMCA Notice:

    I am the site owner and sole copyright holder for each of the domains cited above. This letter is official notification under Section 512(c) of the Digital Millennium Copyright Act ("DMCA"), and I seek the removal of the aforementioned infringing material from your servers. Archive.Today does not have any right or permission to reproduce, sell or display my websites in any way, shape or form. I am providing this notice in good faith and with the reasonable belief that rights I own are being infringed. Under penalty of perjury I certify that the information contained in the notification is both true and accurate, and I am the copyright owner and therefore have the authority to act on behalf of the owner of the copyright(s) involved. Thank you for your prompt assistance with this matter.

  • Complete Captchas until "Message Sent" appears.
  • After removal confirmation, search to confirm.
  • If ignored, submit DMCA to [email protected]

Archive.Today Prevention


Archive.Today does not honor robots.txt or meta tags within HTML. They also do not specify any unique User Agent when cloning a page. The only way to block them is to block their server IP addresses completely. Add the following to your .htaccess file at the root of your domain.

    order allow,deny
    Deny from 198.245.53.182
    Deny from 37.1.213.27
    Deny from 5.188.0.77
    allow from all

If using Cloudflare, add the following to a WAF rule.

    (ip.src eq 198.245.53.182) or (ip.src eq 37.1.213.27) or (ip.src eq 5.188.0.77)

This prevents their current servers from accessing your pages. However, if new servers are added, this could fail. To test your protection, attempt to capture a non-existent page at archive.ph, using a URL similar to inteltechniques.com/fakepage.html but on your own domain. If you receive an error from Archive.Today, you are protected. If the page is captured (even with a 404 error), analyze your web server logs for the IP which accessed the page, then block it.
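
The log analysis can be sketched in Python. This is a minimal example, assuming your server writes the Apache common or combined log format and that the test page was /fakepage.html. The function names are my own, and the sample log lines use reserved documentation IP addresses rather than real Archive.Today servers.

```python
import re

# Start of an Apache common/combined log line:
# client IP, identity, user, [timestamp], "METHOD path protocol"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+)[^"]*"')


def ips_requesting(log_lines, path):
    """Return the distinct client IPs that requested `path`, in first-seen order."""
    seen = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        ip, requested = match.group(1), match.group(2).split("?")[0]
        if requested == path and ip not in seen:
            seen.append(ip)
    return seen


def htaccess_deny_lines(ips):
    """Build the .htaccess lines for the given IPs."""
    return [f"Deny from {ip}" for ip in ips]


def cloudflare_expression(ips):
    """Build the Cloudflare WAF expression for the given IPs."""
    return " or ".join(f"(ip.src eq {ip})" for ip in ips)


if __name__ == "__main__":
    # Hypothetical sample lines; replace with the contents of your access log.
    sample = [
        '203.0.113.5 - - [10/Oct/2022:13:55:36 +0000] "GET /fakepage.html HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [10/Oct/2022:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    new_ips = ips_requesting(sample, "/fakepage.html")
    print("\n".join(htaccess_deny_lines(new_ips)))
    print(cloudflare_expression(new_ips))
```

Any addresses found this way can be appended to the existing .htaccess block or Cloudflare rule.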

Google/Bing Cache Removal & Prevention


If you do not want your site to be historically collected as a "Cache" file and presented within every Google or Bing search, add the following line within the "head" section of every HTML page on your site. If using WordPress, copy it to the header.php file within your theme. Be sure to enclose this line within "<" and ">".

    meta name="robots" content="noarchive"

The next time Google or Bing indexes your pages, they should remove any cached copies.
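
If your site has many pages, it may help to confirm that each one carries the tag. The following is a rough Python sketch using only the standard library. The class and function names are my own, and it only inspects HTML text which you supply (for example, saved copies of your pages).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # The content attribute may hold several comma-separated directives.
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html: str) -> bool:
    """True if the page includes a robots meta tag with the noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    print(has_noarchive(page))
```

Pages which report False are missing the line, which often means a template other than header.php generates them.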

Archive Content DMCA


You may find content within Archive.org or Archive.Today which violates your copyright. This could be a PDF of your work, a photograph taken and owned by you, or a video which was stolen. If you do not own the domain which hosted this content, then you must rely on a traditional DMCA takedown request. Such content is more likely to exist on Archive.org, as Archive.Today does not capture PDFs, videos, and other media. The following email to [email protected] should assist.

    The following content, to which I hold copyright, has been illegally uploaded to your service. Please remove it immediately:

    [link to Archive.org page]

    The following confirms my claim of copyright:

    [link to external proof of ownership page]
