Lab 7: Exercises in Regular Expressions

CS202: PROGRAMMING PARADIGMS & PRAGMATICS

 Aim: The goal of this lab is to get hands-on experience with using Regular Expressions.
 Let’s get started!
a. Create a directory structure to hold your work for this course and all the subsequent labs:
 Suggestion: CS202/Lab7
b. Write scripts / code to implement regular expressions for the following exercises in Perl!
c. For the first two exercises below (Truck.pl and Gene.pl), the program should take a string as input and display either
“ACCEPTED” or “REJECTED”.
 Exercises
o You are in the market to buy a red pick-up truck, and you wish to develop an automated web searching
program (a spider) to search daily through various online newsgroups and classified ad websites to find text
containing the word red and the phrase pick-up truck close to each other, followed by a price. Specifically,
you should match the word red and the phrase (pickup/pick-up/pick up) truck separated by at most two
other words. The pick-up truck phrase may appear before or after the word red. A price must appear
somewhere in the text after both of them. Sample text strings that should be
accepted / rejected by the RE are given below: (Truck.pl)
ACCEPT:
 red pickup truck $5000
 red pickup truck $5,000
 red pickup truck $1,234.56
 red pick-up truck $5000
 red pick up truck $5000
 red toyota pick-up truck $5000
 red toyota 1993 pick-up truck $5000
 blah blah red toyota 1993 pick-up truck blah blah $5000 blah
 pickup truck red $5000
 pick-up truck 1993 toyota red $5000
 blah blah blah pick-up truck toyota 1993 red blah blah blah $5000
 desperate: red 1993 toyota pickup truck for sale. $2,000 o.b.o.
 toy pickup truck - cherry red: $12.
 red red pickup pickup truck truck $5000.
REJECT:
 Red
 Truck
 pickup truck
 red pickup truck
 red $5000
 pickup truck $5000
 red truck $5000
 $5000 red pickup truck
 blue pickup truck $5000
 red car $5000
 red toyota 1993 pick-up truck
 red 1993 toyota automatic pick-up truck $5000
 fred's pick-up truck sold for $5000
 pick-up trucks by fred: $5000
 reddy for sale pickup truck: $5000
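One possible shape for Truck.pl is sketched below. It is a starting point under stated assumptions, not a reference solution: matching is case-sensitive, a "word" is a run of \w characters, and a price is a dollar sign followed by digits with optional comma groups and cents. The names classify, $TRUCK, $GAP and $PRICE are illustrative, and reading the input strings from the command line is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "pickup", "pick-up" or "pick up", followed by "truck".
my $TRUCK = qr/pick[-\s]?up\s+truck/;
# Separator between the two phrases: non-word characters plus at most two whole words.
my $GAP   = qr/\W+(?:\w+\W+){0,2}/;
# Assumed price format: $5000, $5,000 or $1,234.56.
my $PRICE = qr/\$\d+(?:,\d{3})*(?:\.\d{2})?/;

# Returns 'ACCEPTED' if the text has "red" near the truck phrase, then a price.
sub classify {
    my ($text) = @_;
    my $matched = $text =~ /
        \b (?: red $GAP $TRUCK | $TRUCK $GAP red ) \b   # either order, whole words only
        .* $PRICE                                        # a price somewhere later
    /xs;
    return $matched ? 'ACCEPTED' : 'REJECTED';
}

# Classify every string given on the command line.
print classify($_), "\n" for @ARGV;
```

Run as, for example, perl Truck.pl 'red pickup truck $5000' (single quotes stop the shell from expanding $5000). The \b anchors are what keep "fred", "reddy" and "trucks" from matching, and because the price is only looked for after the red/truck pair, "$5000 red pickup truck" is rejected.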
o DNA sequences are written in a simple four-letter alphabet with the symbols {A,C,G,T}. Three consecutive
letters are known as a codon, so ACT and TCG are both codons. A gene is a sequence of at least three codons
that starts with an ATG codon and ends with a TAA, TAG, or TGA codon. You need to develop a regular
expression that will match strings that contain a gene. Sample DNA sequences that should be
accepted/rejected as genes are given below: (Gene.pl)
ACCEPT:
 ATGCCCTAA
 ATGCCCTAG
 ATGCCCTGA
 CATGCCCTAA
 CATGCCCTAG
 CATGCCCTGA
 CATGCCCTAAC
 CATGCCCTAGC
 CATGCCCTGAT
 TCATGCCCTGACC
 TTATGCCCGGGTGACC
 AAACTCATGCCCGGGCCCTGACCTTAA
 ATGATGATGTAA
 ATGAAAAACAAGAATTAA
 ATGACAACCACGACTTAA
 ATGAGAAGCAGGAGTTAA
 ATGATAATCATGATTTAA
 ATGCAACACCAGCATTAA
 ATGCCACCCCCGCCTTAA
 ATGCGACGCCGGCGTTAA
 ATGCTACTCCTGCTTTAA
 ATGGAAGACGAGGATTAA
 ATGGCAGCCGCGGCTTAA
 ATGGGAGGCGGGGGTTAA
 ATGGTAGTCGTGGTTTAA
 ATGTACTATTCATCCTCGTCTTGCTGGTGTTTATTCTTGTTTTAA
REJECT:
 GATTACA
 ATGTAA
 ATGTAG
 ATGTGA
 ATGCCCCTAG
 ATGCCCCCTAG
 CCCATGCCCCTAGCCC
 CCCATGCCCCCTAGCCC
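The gene definition above translates fairly directly into a pattern; a minimal sketch for Gene.pl follows. It assumes upper-case input, that the codons between the start and stop codons may be any triplet of A, C, G, T, and that input strings arrive as command-line arguments. The names classify and $GENE are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ATG, then one or more whole codons, then a stop codon.
# Start + at least one middle codon + stop gives the required three codons,
# and consuming three letters at a time keeps the stop codon in frame with ATG.
my $GENE = qr/ATG(?:[ACGT]{3})+(?:TAA|TAG|TGA)/;

# Returns 'ACCEPTED' if the sequence contains a gene anywhere inside it.
sub classify {
    my ($dna) = @_;
    return $dna =~ $GENE ? 'ACCEPTED' : 'REJECTED';
}

# Classify every sequence given on the command line.
print classify($_), "\n" for @ARGV;
```

The pattern is deliberately left unanchored because the task is to find strings that contain a gene. ATGTAA is rejected because nothing sits between the start and stop codons, and ATGCCCCTAG is rejected because its TAG is not aligned to a codon boundary counted from ATG.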
o Tokenization is the task of extracting tokens from the input text. The definition of ‘token’ depends on the
application, but in most cases complete words count as tokens; sometimes punctuation marks do as well.
Write a simple tokenizer that, given an input text and a set of delimiting characters, outputs one word per line by
replacing each run of delimiting characters with a newline. (Token.pl)
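The replacement step described above can be sketched as follows. This is one possible reading of the task, not a prescribed interface: it assumes the delimiting characters are passed as the first command-line argument and the text arrives on standard input. The name tokenize is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every run of delimiter characters in $text with a single newline.
sub tokenize {
    my ($text, $delims) = @_;
    return $text if $delims eq '';      # nothing to split on
    my $class = quotemeta $delims;      # make characters such as ] ^ - \ literal
    $text =~ s/[$class]+/\n/g;          # one newline per run of delimiters
    $text =~ s/\A\n|\n\z//g;            # drop an empty first or last line
    return $text;
}

# Hypothetical usage: perl Token.pl ' ,.!?' < input.txt
if (@ARGV) {
    my $delims = shift @ARGV;
    local $/;                           # read all of STDIN at once
    my $text = <STDIN> // '';
    print tokenize($text, $delims), "\n";
}
```

quotemeta is used so that any character the user supplies is treated literally inside the character class. The + quantifier is what collapses a string of adjacent delimiters, such as ", " or "!  ", into a single line break.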
 Submitting your work:
o All source files and class files as one tar-gzipped archive.
 When unzipped, it should create a directory with your ID. Example: 2008CSB1001 (NO
OTHER FORMAT IS ACCEPTABLE!!! Case sensitive!!!)
 Should include: Truck.pl, Gene.pl, Token.pl, and README file
 Negative marks for any problems/errors in running your programs
o If any aspects of the tasks are confusing, make an assumption and state it clearly in your README
o README file should also have instructions on how to use/run your program!
o Submit/Upload it to Google Classroom
 Marks Allocation: Truck [5 points], Gene [5 points], Token [3 points], README [2 points]
