Keyword and Title Based Clustering (KTBC): An Easy and Effective Way to Dynamically Cluster Web Documents

Abstract: 

Users of web search engines usually have to sift through a long list of documents returned for each query. With the rapid proliferation of web documents on the internet, fast and effective mining of information from these data sources, which are scattered all over the world, has become a challenge for the Information Retrieval (IR) community. The IR community has explored document clustering as an alternative way of organizing retrieval results, but clustering has yet to be deployed on many search engines. In this research, an effective clustering approach, the Keyword and Title Based Clustering (KTBC) algorithm, is proposed. KTBC is a fast, post-retrieval web document clustering method suitable for use by web search engines. Instead of presenting an extremely large list of documents, the algorithm returns a smaller number of clusters, which helps web users find relevant information more easily. We present the algorithmic methodology along with mathematical and logical analysis, and finally the simulation results of the algorithm.

Year: 
Volume: XXXIV
Issue: 2