$30
Homework 4: Structured prediction
CS 4650/7650
Natural Language Understanding
Instructions
• This homework has two parts: Q1 and Q2 are written questions and Q3 is a programming
assignment with some written questions. Each part needs to be submitted as follows:
– Submit the answers to the written questions as a pdf file on Canvas for the
assignment corresponding to Homework 4 Written. This should contain the
written answers to Q1 and Q2. Name the pdf file as LastName FirstName.pdf.
We recommend students type answers with LaTeX or word processors for this
part. A scanned handwritten copy would also be accepted, try to be clear as
much as possible. No credit may be given to unreadable handwriting.
– The programming assignment requires you to work on boilerplate code and text
data, which are provided in a zip file on Canvas. Submit the code to the programming
assignment in a zip that contains (1) POS tagging.ipynb, which will contain
both the code and the written answers to Q3, and (2) a PDF version of the
notebook POS tagging.pdf. You can convert the notebook to a PDF by clicking
“File” −→ “Download as” −→ “PDF.”
This submission is to be made on Canvas for the assignment corresponding to
Homework 4 Programming. Name the zip file as LastName FirstName.zip.
– You can also submit output files to Gradescope “HW4 POS Tagging” to check
your models’ performance. The uploaded file test labels.txt should contain the
predicted tags from your POS tagger. You can see your score for Q3 programming
questions as soon as you submit it.
• For the written questions, write out all steps required to find the solutions so that
partial credit may be awarded.
• We generally encourage collaboration with other students. You may discuss the questions
and potential directions for solving them with another student. However, you need to
write your own solutions and code separately, and not as a group activity. Please list
the students with whom you collaborated.
1 of 4
Homework 4: Structured prediction
Deadline: Mar 4
th, 3:00pm
CS 4650/7650
Natural Language Understanding
1. Suppose we are tagging a sentence “They run programs” with two types of tags: N
(noun) and V (verb). The initial probabilities, transition probabilities and emission
probabilities computed by an HMM model are shown below.
N V
π 0.8 0.2
Table 1: Initial probabilities
N V
N 0.4 0.6
V 0.8 0.2
Table 2: Transition probabilities
they run programs
N 0.6 0.2 0.2
V 0 0.6 0.4
Table 3: Emission probabilities
(a) Given the HMM model above, compute the probability of the given sentence
“They run programs” by using the Forward AND Backward Algorithm. Please
show all the steps of calculations. [6 pts]
(b) Tag the sentence “They run programs” by using the Viterbi Algorithm. Please
show all the steps of calculations. [4 pts]
Solution. Solution goes here. ?
2. Suppose we are parsing a sentence “submit the homework through Canvas” with a
Probability Context Free Grammar (PCFG) in Chomsky Normal Form. We give the
PCFG as follows.
(a) Parse the sentence “submit the homework through Canvas” by using the CKY
Algorithm. You need to show the probabilities of all possible parses and choose
the parse with the highest probability. Please show all the steps of calculations.
[8 pts]
(b) Compute the probability of the given sentence “submit the homework through
Canvas” by using the CKY Algorithm. Please show all the steps of calculations.
[2 pts]
Solution. Solution goes here. ?
2 of 4
Homework 4: Structured prediction
Deadline: Mar 4
th, 3:00pm
CS 4650/7650
Natural Language Understanding
Rule Probability
S −→ NP VP 0.8
S −→ submit 0.01
S −→ Verb NP 0.05
S −→ VP PP 0.03
NP −→ Canvas 0.16
NP −→ Gradescope 0.04
NP −→ Det Nominal 0.6
Nominal −→ Nominal Noun 0.2
Nominal −→ Nominal PP 0.5
Nominal −→ homework 0.15
VP −→ submit 0.1
VP −→ contain 0.04
VP −→ Verb NP 0.5
VP −→ VP PP 0.3
PP −→ Prep NP 1.0
Det −→ the 0.6
Prep −→ through 0.2
Verb −→ submit 0.5
Table 4: Initial probabilities
3. For the programming task, you will implement several models for part-of-speech tagging.
We will use the LSTM as the basic architecture for the model and try several modifications
to improve performance. To do this, you will complete the Python code started in
POS tagging.ipynb, available on Canvas along with the train and test data (train.txt
and test.txt) in POS tagging.zip. For all the written questions to Q3 (accuracy
reporting and error analysis), please provide your answers in the space provided inside
the notebook.
NOTE: for each model that you test, you will produce a file test labels.txt that
you can upload to Gradescope to evaluate your model’s performance.
(a) To begin, implement an LSTM tagger by completing the skeleton code provided
in BasicPOSTagger in the notebook. The model will use word embeddings to
represent the sequence of words in the sentence.
For this problem, implement the forward pass and the training procedure for the
tagger. After training for 30 epochs, the model should achieve at least 80% on
the held-out data.
In addition, compute the top-10 most frequent types of errors that the model made
in labeling the data in test.txt (e.g. if the model labeled NN as VB 10 times) and
report these errors in the written answer. Report the results in a table containing
the top-10 mistakes, with the following columns: model tag type, ground truth
tag type, error frequency, up to 5 example words that were tagged incorrectly.
What kinds of errors did the model make and why do you think it made them?
3 of 4
Homework 4: Structured prediction
Deadline: Mar 4
th, 3:00pm
CS 4650/7650
Natural Language Understanding
[5 pts]
(b) Word-level information is useful for part-of-speech tagging, but what about character
level information? For instance, the present tense of English verbs is marked with
the suffix -ing, which can be captured by a sequential character model.
For this problem, implement the character-level model by completing the skeleton
code provided in CharPOSTagger in the notebook. Unlike part (a), this model will
use character embeddings. After training for 30 epochs, the model should achieve
at least 80% on the held-out data.
In addition, compare the top-10 types of errors made by the character-level model
with the errors made by the word-level model from part (a) and report these errors
in the written answer. Report the error results in the same format as before. What
kinds of errors does the character-level model make as compared to the original
model, and why do you think it made them? [10 pts]
(c) The previous models considered only a single direction, i.e. feeding the information
from left to right (starting at word i = 1 and ending at word i = N). For the final
model, you will implement a bidirectional LSTM that uses information from both
left-to-right and right-to-left to represent dependencies between adjacent words
in both directions. In order to decide if the word “call” is VB or NN, the model can
use information from the following words to disambiguate (e.g. “call waiting” vs.
“call him”).
For this problem, implement the bidirectional model by completing the skeleton
code provided in BiLSTMPOSTagger in the notebook. Like the model in part (a),
this will use word embeddings. After training for 30 epochs, the model should
achieve at least 90% on the held-out data.
In addition, implement one of the three modifications listed in the notebook
to boost your score: (a) different word embeddings to initialize the model (one
different type), (b) different hyperparameters to initialize the model (five different
combinations), (c) different model architecture on top of the BiLSTM (one different
architecture). Explain in the written answer which modifications you have made,
why you chose a certain modification over another (e.g. Glove over word2vec)
and the resulting accuracy of the modified model.
Lastly, compare the top-10 errors made by this modified model with the errors
made by the model from part (a). Report the error results in the same format as
before. If you tested multiple different hyperparameters, compute the errors for
the model with the highest accuracy on the validation data. What errors does the
original model make as compared to the modified model, and why do you think
it made them? [10 pts]
4 of 4