Enhancement in Crawling and Searching (Using Extended Weighted Page Rank Algorithm based on VOL)

Ms. Isha Mahajan, Ms. Harjinder Kaur, Dr. Darshan Kumar

Department of Computer Science & Engineering SSIET, Dinanagar - 143531, Distt. Gurdaspur, Punjab (India)
As the World Wide Web grows day by day, the number of web pages worldwide has reached into the billions. Search engines came into existence to make searching easier for users: they are used to find specific information on the WWW. Without search engines, it would be almost impossible to locate anything on the Web unless we already knew a specific URL. Every search engine maintains a central repository, or database, of HTML documents in indexed form. Whenever a user query arrives, the search is performed within that database of indexed web pages. The repository of a search engine cannot hold every page available on the WWW, so it is desirable that only the most relevant and important pages be stored in the database, to increase the efficiency of the search engine. This search engine database is maintained by special software called a “Crawler.” A crawler is software that traverses the web and downloads web pages. Web crawlers are also known as “Web Spiders,” “Robots,” “Internet Bots,” “Agents,” and “Automatic Indexers.” Broad search engines, as well as many more specialized search tools, rely on web crawlers to acquire large collections of pages for indexing and analysis. Since the Web is a distributed, dynamic, and rapidly growing information resource, a crawler cannot download all pages; it can crawl only a fraction of them. A crawler should therefore ensure that the fraction of pages it crawls consists of the most relevant and important ones, not just random pages. The crawler is an important module of a search engine, and its quality directly affects the search quality of the engine. In our work, we propose to improve the crawling of a web crawler so that it crawls only relevant and important pages from the WWW, which will lead to reduced server overheads.
With our proposed architecture, we will also optimize the crawled data by removing the least used or never browsed pages. A crawler needs a huge memory space or database for storing page content and related data; by not storing irrelevant and unimportant pages and by removing never-accessed pages, we will save a lot of memory space, which will eventually speed up queries to the database. In our approach, we propose to use the Extended Weighted Page Rank based on Visits of Links algorithm to sort the search results. This will reduce the search space for users by placing the most visited pages, and the pages on which users spend the most time, at the top of the search results list.
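As a rough illustration of the two ideas described here (dropping never-visited pages and ordering results by visits and time spent), the following Python sketch may help. It is not the Extended Weighted Page Rank based on Visits of Links formula itself, which is not reproduced here; the names `PageStats`, `prune_unvisited`, `rank_by_usage`, and the `time_weight` parameter are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Hypothetical per-page usage record kept alongside the crawled data."""
    url: str
    visits: int              # number of recorded user visits to the page
    reading_seconds: float   # total time users spent reading the page


def prune_unvisited(pages):
    """Return only the pages that were visited at least once."""
    return [p for p in pages if p.visits > 0]


def rank_by_usage(pages, time_weight=0.5):
    """Sort pages by a simple usage score, highest first.

    The score is an assumed linear mix of normalized visit count and
    normalized reading time; it stands in for the real ranking formula.
    """
    if not pages:
        return []
    max_visits = max(p.visits for p in pages) or 1
    max_time = max(p.reading_seconds for p in pages) or 1.0

    def score(p):
        visit_part = p.visits / max_visits
        time_part = p.reading_seconds / max_time
        return (1 - time_weight) * visit_part + time_weight * time_part

    return sorted(pages, key=score, reverse=True)


# Example usage with made-up data.
pages = [
    PageStats("a.example/page", visits=10, reading_seconds=50.0),
    PageStats("b.example/page", visits=0, reading_seconds=0.0),
    PageStats("c.example/page", visits=4, reading_seconds=200.0),
    PageStats("d.example/page", visits=1, reading_seconds=5.0),
]
ranked = rank_by_usage(prune_unvisited(pages))
ranked_urls = [p.url for p in ranked]
```

With equal weighting, a page that users read for a long time can outrank a page with more visits but little reading time, which mirrors the stated aim of surfacing both the most visited pages and those on which users spend the most time.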

Isha Mahajan, “Enhancement in Crawling and Searching (Using Extended Weighted Page Rank Algorithm based on VOL)”, International Journal of Computer Engineering In Research Trends, 4(6), pp. 202-230, June 2017.

Keywords: Web Crawler, Extended Weighted Page Rank based on Visits of Links, Weighted Page Rank, Page Rank, Page Rank based on Visits of Links, Search Engine, Crawling, Bot, Information Retrieval Engine, Page Reading Time, User Attention Time, World Wide Web, Inlinks, Outlinks, Web Information Retrieval, Online Search.

