An Investigation of Documents from the World Wide Web - Paper by Woodruff, Aoki, Brewer, Gauthier, and Rowe describing their analysis of over 2.6 million HTML documents collected by their Inktomi Web crawler. The authors examined many characteristics of these documents, including size, number and types of tags and attributes, file extensions, and links.