How would I go about finding all "/_____.htm" pages within a site, where '_____' would be replaced by the page's filename? For example, say all I knew was "google.com". How would I find every ".htm" page that existed on Google.com, such as "google.com/index.htm", "google.com/games.htm", etc.? Is there a program that can do this for me? Thanks!
There are several web spider and site backup tools that follow the links on each page to build a site map. Aside from those, the only other options I can see are: A) finding a site whose host has turned directory listing on and not set a default document (always fun); B) typing in common names for site map files, which could be filled with goodies; or C) looking at the source of a web page, where you'll often find includes (CSS, IFRAMEs, JS) or HTML comments that can lead you to potential treasure troves. You could also try a Google search for inurl:whatever.com, which brings up a list of everything Google has found and indexed on that site during its extensive crawling. Note that all of these only find pages that are linked or indexed somewhere; a page nothing points to won't turn up.
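To make the "follow the links" idea concrete, here is a rough Python sketch of what such a spider does, using only the standard library. It is not how any particular tool works; the names `find_htm_pages`, `download`, and `max_pages` are made up for this example, and it assumes the pages are plain HTML reachable by ordinary links. It stays on the starting host and records every URL whose path ends in ".htm".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def download(url):
    """Return the page text, or an empty string if the request fails."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def find_htm_pages(start_url, fetch=download, max_pages=200):
    """Breadth-first crawl of one host; returns the sorted .htm URLs seen.

    `fetch` is any function mapping a URL to page text (hypothetical hook,
    handy for testing). `max_pages` caps how many pages are downloaded.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        parser = LinkParser()
        parser.feed(fetch(url))
        for link in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute = urldefrag(urljoin(url, link)).url
            parts = urlparse(absolute)
            if parts.netloc != host or absolute in seen:
                continue  # off-site (or mailto:, javascript:) or already seen
            seen.add(absolute)
            queue.append(absolute)
            if parts.path.lower().endswith(".htm"):
                found.add(absolute)
    return sorted(found)
```

Something like `find_htm_pages("http://www.site.com/")` would then return the list of .htm pages it managed to reach. Keep `max_pages` small and be polite to the server; a real spider would also respect robots.txt, which this sketch does not.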
You can use something like Xenu link checker: http://home.snafu.de/tilman/xenulink.html. It will create a sitemap for the site and check for 404 errors, and it's free. George
inurl:www.site.com will do the trick, where www.site.com is the site you want to search. It only works if the site has been indexed by Googlebot.