APS105 Page 1 of 3
APS 105 — Computer Fundamentals
Lab 4: Functions and Arrays
In this lab you will write a program to solve a search problem (similar to one that arises in the
analysis of DNA) and learn skills for finding errors in your programs (called
“debugging”).
Preparation
Read through this entire document carefully and do the work to create the programs that are described
below. You are encouraged to ask for assistance from your lab/tutorial TAs.
Notes:
• In the sample output examples that follow:
o The text <enter> stands for the user pressing the enter key on the keyboard. On some
keyboards, the enter key may be labeled return.
• Throughout this lab, there is a single space after the colon (:) in each output line.
Searching DNA
The DNA that exists in living cells is a code that describes basic structures that are built by cellular
“machines”. A DNA sequence consists of just four nucleotide bases: guanine (referred
to as G), adenine (A), thymine (T), and cytosine (C). The human genome consists of a sequence
of roughly three billion of these bases. One of the relatively recent things done in genetic research
was to determine the sequence (AG-GTCGATT etc.) of the entire human genome. To do that, part of
the process was to match smaller sequences to larger sequences to see where they fit. Subsequent to
that, scientists have looked for patterns and relationships between different parts of the sequence, and
between historical sequences and present-day sequences of plants and animals.
A core algorithm at the root of all of this is searching for a smaller sequence within a larger one, and
finding all places that match, or match closely. One way that approximate matches are found is by
allowing some elements of the search sequence to match any of the four possible bases (G, A, T or C). Any
element of the search sequence that has that property is called a wildcard (from the notion of wild
cards in card-playing games, if you’re familiar with those).
You are to write a C program that searches through an array containing a sequence of numbers. These
numbers are restricted to be 1, 2, 3 or 4 (which we will use instead of the letters A, G, T and C),
and the sequence ends with the number zero. That array should be declared and initialized exactly as
follows in your program:
int DNA[] = {1, 2, 4, 2, 2, 2, 1, 4, 4, 2, 3, 4, 4, 4, 2, 1, 4,
1, 3, 3, 2, 1, 3, 2, 1, 1, 2, 2, 2, 3, 4, 1, 3, 1, 2, 1, 4,
4, 4, 1, 1, 3, 1, 4, 2, 4, 4, 1, 4, 4, 1, 4, 4, 4, 4, 1, 1,
2, 3, 3, 3, 3, 4, 4, 3, 2, 3, 2, 3, 4, 3, 3, 4, 4, 1, 3, 3,
2, 1, 2, 3, 1, 2, 1, 3, 3, 2, 1, 4, 1, 4, 3, 4, 4, 4, 1, 2,
1, 3, 2, 0};
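Because the array ends with a zero, its length does not need to be stored separately: a loop can simply walk forward until it reaches the zero. The following is a minimal sketch of that idea. The function name sequenceLength is only an illustration and is not required by the lab.

```c
/* Hypothetical helper (name chosen for illustration only).
 * Counts the elements that come before the terminating zero. */
int sequenceLength(const int seq[]) {
    int n = 0;
    while (seq[n] != 0) {   /* stop at the zero that marks the end */
        n++;
    }
    return n;
}
```

Called as sequenceLength(DNA), this would return the number of 1 to 4 values in the array, not counting the final zero.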
Your program should repeatedly ask the user for two things: the length of a search sequence, and the
search sequence itself. The program should then search through the array “DNA” to find the starting
element subscript (index) of all possible matching sequences. Each element of the search sequence may
take on one of five values: 1, 2, 3, 4 or 5. The value 5 is a wildcard, i.e. it matches any of
1, 2, 3 or 4.
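One possible way to organize the search is sketched below: try every starting index in the array, and at each one compare the search sequence element by element, treating 5 as a match for anything. The names WILDCARD, matchesAt and findMatches, and the choice to collect the indices in an output array, are assumptions made for this illustration. The sketch also assumes the search length is greater than zero, and it does not print anything, so the required output format is left to you.

```c
#define WILDCARD 5   /* the value that matches any of 1, 2, 3 or 4 */

/* Hypothetical helper: returns 1 if the search sequence matches dna
 * beginning at index start, and 0 otherwise. */
int matchesAt(const int dna[], int start, const int search[], int searchLen) {
    for (int i = 0; i < searchLen; i++) {
        int base = dna[start + i];
        if (base == 0) {
            return 0;   /* reached the terminating zero: sequence does not fit */
        }
        if (search[i] != WILDCARD && search[i] != base) {
            return 0;   /* a non-wildcard element differs */
        }
    }
    return 1;
}

/* Hypothetical helper: stores the starting index of each match in out
 * (at most maxOut of them) and returns how many were stored. */
int findMatches(const int dna[], const int search[], int searchLen,
                int out[], int maxOut) {
    int count = 0;
    for (int start = 0; dna[start] != 0; start++) {
        if (count < maxOut && matchesAt(dna, start, search, searchLen)) {
            out[count] = start;
            count++;
        }
    }
    return count;
}
```

For example, searching the short array {1, 2, 4, 2, 2, 0} for the sequence 2 5 gives the starting indices 1 and 3: index 4 also holds a 2, but the element after it is the terminating zero, so the wildcard has nothing to match there.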
The program should terminate when the length of the input sequence is zero or less. If the input search
sequence contains a character other than 1 to 5, the program should report the error and terminate, as
shown in the examples below. You can assume that only numbers will be input, i.e. no non-numeric
characters.
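The validity check described above can be sketched as a small function that scans the search sequence once it has been read in. The name isValidSearch and the use of 1 and 0 as return values are assumptions made for this illustration. How the error is reported is left to the examples in this handout.

```c
/* Hypothetical helper: returns 1 if every element of the search
 * sequence is between 1 and 5 inclusive, and 0 otherwise. */
int isValidSearch(const int search[], int length) {
    for (int i = 0; i < length; i++) {
        if (search[i] < 1 || search[i] > 5) {
            return 0;   /* found a value outside 1..5 */
        }
    }
    return 1;
}
```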
Your program must make use of at least two functions outside of the main function - one to read in the
search string from the user, and one to do the actual searching.
In the sample output examples that follow, the user input appears after the colon character (‘:’) and is
followed by the enter key. (Unlike in previous labs, the input is not displayed in bold.) Note that there is a
single space after the colon (:) in each output line.
Notes:
In the above example runs, you can see that the search sequence is input as a series of one-digit
numbers, but because there are no spaces between them, they would look like a multi-digit number if
you used the format specifier %d in scanf. To have scanf read only a single digit at a time into an
integer, use the format specifier %1d. Notice the extra ‘1’: it is a field width, which tells scanf to
consume at most one digit from the input stream for that conversion.
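The effect of the width in %1d can be seen in the short sketch below. To keep it easy to try without typing input, it uses sscanf, which reads from a string but treats format specifiers the same way scanf does. The function name parseDigits is an assumption made for this illustration, and the %n specifier is used only to find out how many characters each call consumed.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max single digits from text,
 * one digit per array element, and returns how many were read. */
int parseDigits(const char *text, int out[], int max) {
    int count = 0;
    int consumed = 0;
    /* %1d converts at most one digit; %n records how far sscanf got */
    while (count < max &&
           sscanf(text, "%1d%n", &out[count], &consumed) == 1) {
        text += consumed;   /* continue after the digit just read */
        count++;
    }
    return count;
}
```

Given the text 1254, this fills the array with 1, 2, 5 and 4 and returns 4, whereas a single %d conversion on the same text would produce the one number 1254.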
Good Luck!