Points: 75
In this assignment, you will run an experiment to study the effect of relevance feedback on the recall,
precision, and Mean Average Precision (MAP) values of an IR system. The IR system will use a vector space
model with cosine similarity (tf-idf weighting). You will run the study on the TIME dataset, provided
along with the assignment.
Part A: Cosine Similarity and Rocchio's Algorithm [40pts]
• You will implement a cosine similarity measure with tf-idf weighting. Your index should contain
the information that you will need to calculate the cosine similarity measure, such as tf and idf
values. You may reuse code from the previous assignments as needed.
• Implement the Rocchio algorithm for query refinement. Your system should display results and
then prompt the user to provide positive and negative feedback. Use α = 1, β = 0.75, and γ = 0.15
as parameters for the Rocchio algorithm (a rough sketch of both pieces follows this list).
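As a rough guide for Part A, the sketch below shows one way the cosine similarity and the Rocchio update could look if tf-idf vectors are stored as Python dictionaries mapping terms to weights. The names used here (cosine_similarity, rocchio, query_vec, doc_vec) are only illustrative and are not part of the provided skeleton code; adapt the idea to the functions you are asked to implement in index.py.

    import math
    from collections import defaultdict

    def cosine_similarity(query_vec, doc_vec):
        # Both vectors are dicts mapping term -> tf-idf weight.
        dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
        q_norm = math.sqrt(sum(w * w for w in query_vec.values()))
        d_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
        return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0

    def rocchio(query_vec, relevant_vecs, nonrelevant_vecs,
                alpha=1.0, beta=0.75, gamma=0.15):
        # New query = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant).
        new_query = defaultdict(float)
        for t, w in query_vec.items():
            new_query[t] += alpha * w
        for vec in relevant_vecs:
            for t, w in vec.items():
                new_query[t] += beta * w / len(relevant_vecs)
        for vec in nonrelevant_vecs:
            for t, w in vec.items():
                new_query[t] -= gamma * w / len(nonrelevant_vecs)
        # Terms whose weight drops to zero or below are usually dropped from the refined query.
        return {t: w for t, w in new_query.items() if w > 0}

The refined query returned by rocchio can then be scored against every document with cosine_similarity to produce the next ranked list.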
Part B: Experimental study [35pts]
• Run your system for at least 3 queries from the test bed. Pick queries that have 5 or more
relevant documents (see the TIME.REL file). For each query, you will perform a series of 5 relevance
feedback iterations and plot the change in precision, recall, and MAP.
• You will prepare a report on the experimental study where you will provide at least the
following details for each query:
o Query text and ID (provided in the testbed)
o Precision, recall, and MAP values of the query
o IDs of the documents given as positive and negative feedback for the query during each
iteration of the Rocchio algorithm
o For each iteration of the Rocchio algorithm, the terms of the new query and their
weights.
• Your report will have 3 plots (precision vs. Rocchio iteration, recall vs. Rocchio iteration, and MAP
vs. Rocchio iteration) that depict the progressive change in the performance values
over the iterations of the Rocchio algorithm, for the three queries.
• Also discuss any query drift that you may observe in your results.
• Note: The queries provided in the testbed have varying numbers of relevant documents (see the
TIME.REL file). This can be a problem when calculating the performance values if k is kept
constant during retrieval. For the experimental study, assume that the number of relevant
documents is provided to the system along with the query. In other words, the value of k will
change with the query (see the evaluation sketch after this list).
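To make the variable-k evaluation concrete, the following sketch assumes each query comes with its TIME.REL judgments and that k is set to the number of relevant documents for that query. The function name precision_recall_ap and its arguments are illustrative, not part of the skeleton code.

    def precision_recall_ap(ranked_doc_ids, relevant_ids, k):
        # Score the top-k retrieved documents against the relevance judgments.
        retrieved = ranked_doc_ids[:k]
        relevant = set(relevant_ids)
        hits = [doc_id in relevant for doc_id in retrieved]
        precision = sum(hits) / k if k else 0.0
        recall = sum(hits) / len(relevant) if relevant else 0.0
        # Average precision: mean of precision@i over each rank i that holds a relevant doc.
        rel_seen, ap_sum = 0, 0.0
        for i, hit in enumerate(hits, start=1):
            if hit:
                rel_seen += 1
                ap_sum += rel_seen / i
        avg_precision = ap_sum / len(relevant) if relevant else 0.0
        return precision, recall, avg_precision

MAP for a run is then the mean of the average precision values over the queries evaluated; for a single query, the average precision itself is what changes across Rocchio iterations.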
Extra Credit: Pseudo Relevance Feedback [20pts]
One of the challenges of user feedback is that the user may not be willing to provide feedback. In such
cases, pseudo relevance feedback can be used. You will compare the performance of your user-feedback-based
system from Part A against a pseudo feedback mechanism, where the top 3 results of the system
are considered to be relevant. Run this system with the same queries from your experimental study in
Part B. Compare its recall, precision, and MAP values against the system that uses user feedback.
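A pseudo feedback iteration can reuse the same Rocchio update, simply treating the top 3 retrieved documents as relevant and supplying no negative feedback. The sketch below assumes the rocchio helper shown under Part A and a doc_vectors dictionary keyed by document ID; both names are illustrative, not part of the skeleton code.

    def pseudo_feedback_iteration(query_vec, doc_vectors, ranked_doc_ids, top_n=3):
        # Treat the top_n ranked documents as relevant; no non-relevant set is used.
        pseudo_relevant = [doc_vectors[doc_id] for doc_id in ranked_doc_ids[:top_n]]
        return rocchio(query_vec, pseudo_relevant, [])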
Other instructions:
• Comment your code appropriately.
• You may reuse code from earlier assignments.
Attachments:
1. Collection of documents
2. Skeleton code – implement the functions in the code. Use additional functions as needed.
3. Sample output
Submission:
Submit the following files on Blackboard as a .zip file.
1. index.py
2. Report.pdf: with performance analysis and other discussions.
3. Output.txt: containing the output generated by your code for the three queries. This will be used to
test your code.