• Main
  • Information Retrieval Using Web Search Engine (CSCI-572)
  • The Guardian news web search engine
    • Built a search engine over 15,000 news web pages. Results retrieved from Solr are ranked using PageRank and TF-IDF weighting, with features such as spell checking, autocomplete, and snippets.
      Technologies Used: Java, PHP, JavaScript, Python, jQuery, Apache Solr, Lucene, jsoup

    • Code
    • Report
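The TF-IDF weighting mentioned for this project can be illustrated with a minimal sketch. This is not the project's code (Solr and Lucene handle scoring internally with their own formulas); the functions `tf_idf_scores` and `rank` and the sample documents are hypothetical, and the weighting shown is the plain textbook variant.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against the query with a basic TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)  # an empty document matches nothing
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # the term appears in no document
            idf = math.log(n_docs / df[term])
            score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores

def rank(query, docs):
    """Return document indices sorted from highest to lowest score."""
    scores = tf_idf_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Made-up documents, for illustration only.
docs = [
    "election results announced in london",
    "football results from the weekend",
    "london weather forecast",
]
print(rank("london election", docs))
```

A term that occurs in every document gets an IDF of zero here, so common words contribute nothing to the score.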
  • Indexing the Web Using Solr
    • Using a web server, I created a web page with a text box in which a user can enter a query. A program on the web server formats the query and sends it to Solr. Solr processes the query and returns results in JSON format. A program on the web server then reformats the results and presents them to the user as any search engine would. Each result is clickable and opens the actual web page on the internet.
      Technologies Used: PHP, NetworkX library, Solr, Lucene, PageRank

    • Code
    • Report
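The request flow described for this project (format the query, send it to Solr, reformat the JSON reply) might look roughly like the sketch below. The project itself used PHP; this Python version is only an illustration. The host, the core name `news`, and the field names `title` and `url` are assumptions, and the response is a hand-written sample rather than real Solr output.

```python
import json
from urllib.parse import urlencode

# Assumed values for illustration: a local Solr instance and a core named "news".
SOLR_SELECT_URL = "http://localhost:8983/solr/news/select"

def build_solr_url(user_query, rows=10):
    """Format the user's query as a Solr select request URL."""
    params = {"q": user_query, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)

def format_results(solr_json):
    """Turn Solr's JSON response into (title, url) pairs for display."""
    docs = json.loads(solr_json)["response"]["docs"]
    results = []
    for doc in docs:
        # "title" and "url" are hypothetical field names; they depend on the schema.
        results.append((doc.get("title", "(untitled)"), doc.get("url", "")))
    return results

# A hand-written sample response, standing in for what Solr would return.
sample = json.dumps({
    "response": {
        "numFound": 1,
        "docs": [{"title": "Example headline", "url": "https://example.com/a"}],
    }
})
print(build_solr_url("climate change"))
print(format_results(sample))
```

In a live setup the URL would be fetched over HTTP and each pair rendered as a clickable link.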
  • Creating an Inverted Index Using a Hadoop Cluster
    • Built an inverted index of the words occurring in a set of web pages with MapReduce, gaining hands-on experience with GCP App Engine.

    • Code (Unigram)
    • Code (Bigram)
    • Inverted Index (Unigram)
    • Inverted Index (Bigram)
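The idea behind the unigram and bigram indexes can be sketched in plain Python as a map step that emits (term, document) pairs and a reduce step that groups them. This runs in a single process and is not the Hadoop job from the project; `map_phase`, `reduce_phase`, `build_index`, and the sample documents are hypothetical names and data.

```python
from collections import defaultdict

def map_phase(doc_id, text, n=1):
    """Emit (term, doc_id) pairs; n=1 gives unigrams, n=2 gives bigrams."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), doc_id

def reduce_phase(pairs):
    """Group pairs by term and count occurrences per document."""
    index = defaultdict(lambda: defaultdict(int))
    for term, doc_id in pairs:
        index[term][doc_id] += 1
    return {term: dict(postings) for term, postings in index.items()}

def build_index(docs, n=1):
    """Run the map step over every document, then reduce all emitted pairs."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text, n))
    return reduce_phase(pairs)

# Made-up documents, for illustration only.
docs = {"d1": "the cat sat", "d2": "the cat ran the race"}
unigrams = build_index(docs, n=1)
bigrams = build_index(docs, n=2)
print(unigrams["the"])
print(bigrams["the cat"])
```

On a real cluster the framework performs the grouping between the two steps, so the reducer receives all pairs for one term together.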
  • Web Crawler
    • A simple web crawler that measures aspects of a crawl, studies its characteristics, downloads the crawled web pages, and gathers web page metadata, all from pre-selected news websites.

    • Code
    • Report
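A crawl that stays on one site and records simple statistics can be sketched as follows. This is an illustration under stated assumptions, not the project's crawler: `crawl`, `LinkExtractor`, and the statistic names are hypothetical, and `fetch` is a caller-supplied function so that the example runs against in-memory pages instead of the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl restricted to the seed's host.

    `fetch` returns the HTML for a URL, or None if the page could not be fetched.
    """
    host = urlparse(seed).netloc
    queue, seen = deque([seed]), {seed}
    stats = {"fetched": 0, "failed": 0, "links_found": 0, "off_site": 0}
    while queue and stats["fetched"] + stats["failed"] < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            stats["failed"] += 1
            continue
        stats["fetched"] += 1
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            stats["links_found"] += 1
            if urlparse(link).netloc != host:
                stats["off_site"] += 1  # counted but not followed
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return stats

# In-memory pages stand in for real HTTP responses.
PAGES = {
    "https://news.example/": '<a href="/a">A</a> <a href="https://other.example/">X</a>',
    "https://news.example/a": '<a href="/">Home</a> <a href="/missing">M</a>',
}
print(crawl("https://news.example/", PAGES.get))
```

A real crawler would also need politeness delays, robots.txt handling, and content-type checks, all of which are omitted here.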
  • Comparing Search Engine Results
    • This exercise compares the search results from Google and Bing, the two leading US search engines. Many search engine comparison studies have been done. All of them use samples of data, some small and some large, so no general conclusions can be drawn. Still, it is instructive to see how the two search engines match up, even on a small data set. I issued a set of queries to both engines and evaluated the returned results for relevance. Such studies do not seek to answer the ultimate question of which search engine is “best”.

    • Report
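One way to quantify how two result lists match up is to count the shared URLs and compute Spearman's rank correlation over them. The description above does not say which measure the report used, so this choice is an assumption, and `compare_rankings` and the sample lists are hypothetical. The sketch assumes neither list contains duplicate URLs.

```python
def compare_rankings(list_a, list_b):
    """Compare two ranked result lists.

    Returns (number of shared URLs, Spearman's rho over the shared URLs).
    rho is None when fewer than two URLs are shared.
    """
    in_b = set(list_b)
    shared_a = [u for u in list_a if u in in_b]   # shared URLs in list_a's order
    in_a = set(shared_a)
    shared_b = [u for u in list_b if u in in_a]   # shared URLs in list_b's order
    n = len(shared_a)
    if n < 2:
        return n, None
    # Re-rank within the shared set so that rho stays between -1 and 1.
    rank_b = {u: r for r, u in enumerate(shared_b)}
    d_squared = sum((r - rank_b[u]) ** 2 for r, u in enumerate(shared_a))
    rho = 1 - (6 * d_squared) / (n * (n * n - 1))
    return n, rho

# Made-up result lists, for illustration only.
google = ["a", "b", "c", "d"]
bing = ["b", "a", "e", "c"]
print(compare_rankings(google, bing))
```

A rho of 1 means the shared results appear in the same order in both lists, and -1 means the order is fully reversed.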

© 2019 Sumit Parwal. All rights reserved.