Starting from:

$30

CS143 - Written Assignment 1

CS143 - Written Assignment 1

This assignment covers regular languages, finite automata and lexical analysis. You may discuss
this assignment with other students and work on the problems together. However, your write-up
should be your own individual work, and you should indicate in your submission who you worked
with, if applicable. Assignments can be submitted electronically throught Gradescope as a PDF
by 11:59 PM PDT. A LATEX template for writing your solutions is available on the course website.
There is a post on Piazza describing how to create the finite automata diagrams.
1. Write regular expressions for the following languages over the alphabet Σ = {0, 1}:
(a) The set of all strings where no two consecutive characters are the same.
(b) The set of all strings representing a binary number that is a power of 2. Allow for leading
zeros e.g. 001000.
(c) The set of all strings containing at least one of 1110 or 1011.
2. Draw the DFAs for each of the languages from Question 1.
3. Using the techniques covered in class, transform the following NFAs with -transitions over
the given alphabet Σ into DFAs. Note that a DFA must have a transition defined for every
state and symbol pair, whereas a NFA need not. You must take this fact into account for
your transformations. Hint: Is there a subset of states the NFA transitions to when fed a
symbol for which the set of current states has no explicit transition?
(a) Σ = {a, b, c}
s0 s2
s1 s3
a
a
a
c
b
 
(b) Σ = {a, b, c}
s0 s1
b
a
a, c
1
(c) Σ = {a, b}
s0 s1 s2 s3
a
a, b
b
a
a, b
a
b
a, b
a
4. Let L be the language over Σ = a, b, c where the following holds: w is in L if at most one
character has more than one occurrence in w.
Examples of Strings in L: a, baaaaaac, 
Examples of Strings not in L: ccbb, abcab
Draw an NFA for L.
5. Consider the following tokens and their associated regular expressions, given as a flex scanner
specification:
%%
(01|10) printf (" snake ")
0(01) *1 printf (" badger ")
(1010*1|0101*0) printf (" mushroom ")
Give an input to this scanner such that the output string is (badger11mushroom2
)
4 snake3
,
where A
i denotes A repeated i times. (And, of course, the parentheses are not part of the
output.) You may use similar shorthand notation in your answer.
6. Recall from the lecture that, when using regular expressions to scan an input, we resolve
conflicts by taking the largest possible match at any point. That is, if we have the following
flex scanner specification:
%%
do { return T_Do ; }
[A - Za - z_ ][ A - Za - z0 -9 _ ]* { return T_Identifier ; }
and we see the input string “dot”, we will match the second rule and emit T Identifier for
the whole string, not T Do.
However, it is possible to have a set of regular expressions for which we can tokenize a
particular string, but for which taking the largest possible match will fail to break the input
into tokens. Give an example of a set of regular expressions and an input string such that:
a) the string can be broken into substrings, where each substring matches one of the regular
expressions, b) our usual lexer algorithm, taking the largest match at every step, will fail to
break the string in a way in which each piece matches one of the regular expressions. Explain
how the string can be tokenized and why taking the largest match won’t work in this case.
2

More products