Question 1: (6: 4+2)
Try to find a query of the form [query-term-1, query-term-2] (without quotes) that,
on Google, produces at least one result that contains only one of the two terms. That is, try
to find an example where Google does not interpret a two-term query as a conjunction. (If you have
difficulty finding an appropriate query, try one that produces only a few hits, say,
(i) Take a screenshot of the first page of Google results (or more if you want to) and label
each result with 2 (both terms occur on the page), 1 (one term occurs on the page) or
0 (neither term occurs on the page).
Based on this evidence, does Google interpret all queries as a Boolean
Question 2: (16: 8+4+4)
Recall and Precision are two important evaluation metrics that we use to analyse a set of
unranked results. Precision and Recall consider the differences between the set of
documents retrieved for a given query and the set of documents that are relevant to the
A) Compute Recall, Precision and Precision@k for the following retrievals against
Q1, Q2 and Q3
Query   Set         Document ids
Q1      Relevant    1, 14, 17, 23, 24, 33, 54, 55, 59, 74, 101, 103
        Retrieved   2, 5, 7, 23, 33, 50, 55, 59, 77, 98, 99, 101, 103, 110, 120
Q2      Relevant    14, 19, 25, 27, 30, 39, 42, 63, 769, 790, 1563
        Retrieved   14, 21, 25, 26, 27, 38, 42, 63, 569, 769, 790, 1565, 1589
Q3      Relevant    8, 11, 32, 54, 67, 69, 78, 79, 91, 99, 111, 122
        Retrieved   11, 13, 17, 19, 21, 32, 77, 79,
Q4      Relevant    4, 26, 38, 63, 569, 769, 790, 1565, 1589
        Retrieved   14, 21, 25, 26, 27, 38, 63, 88, 769,
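As a sanity check on the definitions, the set-based metrics can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed solution: the function names are hypothetical, the retrieved list is assumed to be in rank order (the table does not say so), and the cutoff k = 5 is an arbitrary illustrative choice. Only the Q1 row is used, since it is complete.

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for one query."""
    relevant_set, retrieved_set = set(relevant), set(retrieved)
    hits = len(relevant_set & retrieved_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(relevant, retrieved, k):
    """Fraction of the top-k retrieved documents that are relevant.

    Assumes `retrieved` is listed in rank order (an assumption).
    """
    relevant_set = set(relevant)
    return sum(1 for doc in retrieved[:k] if doc in relevant_set) / k


# Q1 data copied from the table.
q1_relevant = [1, 14, 17, 23, 24, 33, 54, 55, 59, 74, 101, 103]
q1_retrieved = [2, 5, 7, 23, 33, 50, 55, 59, 77, 98, 99, 101, 103, 110, 120]

p, r = precision_recall(q1_relevant, q1_retrieved)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
print(f"P@5={precision_at_k(q1_relevant, q1_retrieved, 5):.2f}")  # P@5=0.40
```

Six of the 15 retrieved documents are relevant (23, 33, 55, 59, 101, 103), out of 12 relevant documents in total, which gives the two values printed above.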
B) Recall and Precision are often discussed together because they focus on complementary
information. If precision is important, then we do not want to see any non-relevant
documents. That is, whatever is retrieved should be relevant. If recall is important, we
want to see all the relevant documents, even if it requires sifting through some
non-relevant ones. Provide and justify two information-seeking tasks where precision
may be significantly more important than recall. Similarly, provide and justify two
information-seeking tasks where recall may be more important than precision. [Don't
forget to justify your choices: the justification will be graded, not the particular choices].
C) The trade-off between Recall and Precision may be user-specific, i.e. some users may be
more interested in precision than recall and vice versa. How can the search engine try to guess,
without asking, whether a user cares more about precision than recall, or vice versa?
Think of the different ways users interact with a search engine and be creative!
Question 3: (6: 3×3)
(a) Consider three collections C1, C2 and C3 that have 500, 15,000 and
documents respectively. We have added all documents in C1 to C2 and C3. Which
collection is likely to have more new terms added to its vocabulary (C1, C2 or C3) and
why? [Heaps' Law]
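Heaps' Law estimates vocabulary size as V = k * n^beta for a collection of n tokens, with beta < 1. A rough sketch of what that implies for adding the same amount of text to collections of different sizes is below. Everything numeric here is an assumption for illustration: the constants k and beta, the average of 200 tokens per document, and the size of the larger collection (the third collection size is not legible in the question). The law also treats the added text as fresh text, ignoring any vocabulary overlap.

```python
def heaps_vocabulary(n_tokens, k=44.0, beta=0.49):
    """Heaps' Law estimate V = k * n^beta (k and beta are assumed values)."""
    return k * n_tokens ** beta


def new_terms(existing_tokens, added_tokens, k=44.0, beta=0.49):
    """Estimated vocabulary growth when `added_tokens` are appended."""
    grown = heaps_vocabulary(existing_tokens + added_tokens, k, beta)
    return grown - heaps_vocabulary(existing_tokens, k, beta)


TOKENS_PER_DOC = 200          # hypothetical average document length
added = 500 * TOKENS_PER_DOC  # the 500 documents being added

# 15,000 documents versus a hypothetical, much larger collection.
growth_smaller = new_terms(15_000 * TOKENS_PER_DOC, added)
growth_larger = new_terms(150_000 * TOKENS_PER_DOC, added)

print(f"smaller collection gains about {growth_smaller:.0f} terms")
print(f"larger collection gains about {growth_larger:.0f} terms")
```

Because n^beta is concave for beta < 1, the same amount of added text yields fewer new terms the larger the existing collection already is.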
(b) Calculate the tf-idf for the documents below.
a. D1: Sweets Potatoes are Sweet
b. D2: Sweet Oranges are bitter and Sweet
c. D3: I have sweet Apple, Sweet Orange, Sweet Potatoes
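One way to organise a tf-idf calculation is sketched below. It is only one of several common variants: it assumes raw term frequency and idf = log10(N / df), with a deliberately simple tokenizer and no stemming. The function names and the three-document toy corpus are hypothetical and are not the documents from the question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes weight = tf * log10(N / df); log-scaled tf or smoothed idf
    are equally common choices.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus, for illustration only.
toy_docs = ["red apple red fruit", "green apple fruit", "green pear pear fruit"]
toy_weights = tf_idf(toy_docs)
for doc, w in zip(toy_docs, toy_weights):
    print(doc, {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in every document (here "fruit") gets idf = log10(1) = 0, so it contributes nothing regardless of how often it appears.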
Question 4: (10: 5×2)
Doc-id  home  for  sale
1       39    11   32
2       19    19   3
3       19    20   1
4       12    20   14
(homes OR for OR sale OR in OR Geelong OR Melbourne)
(homes AND for AND sale AND in AND Geelong OR Melbourne)
Suppose these are issued to a search engine that uses the ranked Boolean retrieval model.
Assume, for simplicity, only four documents in the collection (with doc ids 1-4).
Answer the following questions. The above table gives the number of times each query term
occurs in each document.
(i) Compute the document scores and the ranking associated with the query (homes OR
for OR sale OR in OR Geelong OR Melbourne).
(ii) How is the ranking produced probably sub-optimal and why does this happen?
(iii) Compute the document scores and the ranking associated with the query (homes AND
for AND sale AND in AND Geelong OR Melbourne).
(iv) How is the ranking produced probably sub-optimal and why does this happen?
(v) How would you extend the Boolean retrieval model to handle AND NOT constraints
(e.g., homes AND NOT Geelong)? Your proposed solution should give a higher score
to documents that contain fewer occurrences of the term to the right of the AND NOT
(e.g., Geelong). Please be as mathematical as possible. In other words, saying: "I would
reduce the score for documents that contain the word to the right of AND NOT." is too
(vi) Using the index, what would be the Boolean retrieval model scores given to documents
1-4 by your proposed scoring method for the query "homes AND NOT Geelong"?
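The mechanics of ranked Boolean scoring can be sketched as follows. The scoring rules are an assumption, not something the question states: AND takes the minimum of the term counts and OR takes their sum (some formulations use the maximum for OR instead). The sketch is restricted to the three terms whose counts appear in the table, since counts for "in", "Geelong" and "Melbourne" are not given there. The function names are hypothetical.

```python
# Term counts copied from the Question 4 table (doc id -> term -> count).
# The column label "home" is taken verbatim from the table header.
counts = {
    1: {"home": 39, "for": 11, "sale": 32},
    2: {"home": 19, "for": 19, "sale": 3},
    3: {"home": 19, "for": 20, "sale": 1},
    4: {"home": 12, "for": 20, "sale": 14},
}


def score_or(doc_counts, terms):
    """Assumed OR rule: sum of the term counts."""
    return sum(doc_counts.get(t, 0) for t in terms)


def score_and(doc_counts, terms):
    """Assumed AND rule: minimum of the term counts."""
    return min(doc_counts.get(t, 0) for t in terms)


def rank(scorer, terms):
    """Score every document and sort by descending score."""
    scored = [(doc, scorer(c, terms)) for doc, c in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


terms = ["home", "for", "sale"]
print(rank(score_or, terms))   # [(1, 82), (4, 46), (2, 41), (3, 40)]
print(rank(score_and, terms))  # [(4, 12), (1, 11), (2, 3), (3, 1)]
```

Under these assumed rules the two operators already disagree on the top document: the sum rewards document 1 for its many occurrences of a single term, while the minimum favours document 4, whose counts are more balanced.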
Question 5: (12: 4×3)
Doc1: A book is considered a good book that makes the reader feels better.
Doc2: I love reading good books to feel better.
Doc3: One can feel better after reading Tom's recent book.
Query-1: I love books that are good
Query-2: reading good books make you feel better
Stop Word Dictionary = [is, can, after, a, to, I, the, about, that]
Explain the similarity scores of both Query-1 and Query-2 using TF-IDF.
How would the result change if TF-IDF is used instead of TF as Query?
Which do you prefer, using TF or TF-IDF as Query? (Support your claim using F
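For the similarity scores, a common choice is cosine similarity between a query vector and a document vector. The sketch below assumes that measure (the question does not name one) and uses raw term frequencies after removing the stop words listed above. The same `cosine` function works unchanged on tf-idf weight dictionaries. Function names and the toy strings are hypothetical, not the documents from the question.

```python
import math
import re
from collections import Counter

# Stop words from the question, lower-cased for matching.
STOP_WORDS = {"is", "can", "after", "a", "to", "i", "the", "about", "that"}


def tf_vector(text):
    """Term-frequency vector after lower-casing and stop-word removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy strings, for illustration only.
doc_vec = tf_vector("the cat sat")
print(cosine(tf_vector("cat sat"), doc_vec))  # close to 1: same terms after stop-word removal
print(cosine(tf_vector("dog"), doc_vec))      # 0.0: no shared terms
```

Switching from TF to TF-IDF only changes the weights stored in the vectors; terms that occur in every document then contribute nothing to the dot product.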