3) If a web page has been created in a format other than the usual HTML, such as PDF, Word, Excel, Corel suite, etc., then it may not always be easily read. For instance, a PDF document might only be accessible as a link and not easily found the way an HTML web page is, so such pages also come under the invisible web.
I don't agree with you on this. Any major search engine can read PDF, Word, and Excel files at least, though I am not sure about the Corel suite.

Also, as far as adding keywords is concerned, I don't think adding "database" would do anything other than search for "database" as a word within the documents.

