Post by Red » Mon Feb 18, 2013 7:00 am
Dave, "htf" is really easy. There are programs called "web snatchers" or "web walkers" which start by going to each IP address, or each top-level registered domain, and looking at the top (index, home) page. Then they look for any link on that page, visit the linked page, and repeat the process on each page they find.
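For anyone who wants to see the loop spelled out, here is a rough sketch in Python, not how any particular snatcher actually does it. The names LinkExtractor, extract_links, walk and the fetch argument are all made up for illustration; fetch is assumed to be any function that takes a URL and hands back the page's HTML, or None if the page can't be had.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    """Return the absolute URLs linked from one page of HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def walk(start_url, fetch):
    """Visit start_url, then every page reachable from it, once each.

    fetch is a hypothetical callable (assumption): URL in, HTML text out,
    or None when the page is unavailable.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)
        for link in extract_links(url, html):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order
```

To try it without touching the network, pass a dict's get method as fetch, with URLs as keys and HTML strings as values. A real run would swap in something that actually downloads the page.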
You can find freeware that will do this on your computer. Adobe Acrobat used to offer the same option, but they took that out about five years ago. The problem today is that web pages are so bloated, with so much code and so many advertising links, that "snatching" a web site can generate a tremendous amount of data now. The copy of the old forum pages that is on the Wayback Machine runs well over 10GB in size and can take a broadband customer DAYS to download. Possibly a week, running 24x7. Along the way you'll pull down every page from every idiot advertiser who linked their own web sites. "Oopsie".
But the theory is really easy, and the better software will let you adjust how many levels down it goes, etc. That's gotten harder to find.
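The "how many levels down" knob is easy to picture in code. This is only a sketch under my own assumptions: walk_limited, max_depth, same_host_only and get_links are names invented here, and get_links is assumed to be any function that returns the list of URLs linked from a given page. The same-host check is one way to avoid dragging in every advertiser's site.

```python
from collections import deque
from urllib.parse import urlsplit


def walk_limited(start_url, get_links, max_depth=2, same_host_only=True):
    """Breadth-first walk that stops following links below max_depth.

    get_links is a hypothetical callable (assumption): URL in, list of
    linked URLs out. The start page counts as depth 0.
    """
    start_host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # deep enough: keep the page, ignore its links
        for link in get_links(url):
            if link in seen:
                continue
            if same_host_only and urlsplit(link).netloc != start_host:
                continue  # skip off-site links such as advertisers
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

With max_depth=1 you get the home page plus whatever it links to directly. Turning same_host_only off is what lets the download balloon.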
--Red
-- Original owner, 1985 GT-S