Lab 3: Word Histogram

Reading: Deitel & Deitel, Chapters 2-5

Please watch this video first:
www.sciencechannel.com/tv-shows/through-the-wormhole/videos/through-the-wormhole-information-theory/

In this lab you will develop a software system to collect data from a portion of the book War of the Worlds by H.G. Wells to verify the experiments performed by Dr. Lawrence. Ultimately, Dr. Lawrence is trying to apply information theory to see if the chatter of dolphins constitutes a complex language. You will create a software system that analyzes English text by creating word frequency histograms.

Some general program specifications are as follows:

1. You must develop a C++ class library to work with the main.cc file provided to you in Appendix A. Your library will be used to create a word frequency histogram from a text file, provided to you on t-square, called ProcessedWOW.dat. This file contains modified text for War of the Worlds; as you will see, it contains only lowercase letters and no punctuation marks.
2. Your program will create two output files that contain the histogram sorted in two different fashions: one sorted alphabetically and one sorted by frequency.
3. You must create at least TWO C++ classes, which are discussed in the next section.
4. We are limited to data structures that we have talked about so far; therefore, you can only use static C++ arrays to store your word frequency histograms. A strategy for doing this is discussed later in this lab.
5. You must have three specific files to manage your system. The header file histogram.h contains the class interfaces for all your classes, the source file histogram.cc contains the implementations of the member functions of all your C++ classes, and the source file main.cc contains the main function that is provided to you in Appendix A.

The objectives of this lab are to give you practice:

1. Using basic C++ classes;
2. Creating basic arrays of user-defined objects;
3. Creating constructors and overloading constructors;
4. Using and creating set and get member functions in C++ classes;
5. Using C++ string objects; and
6. Using basic text file I/O objects and operators.

This lab has been tested using the g++ compiler on deepthought.cc.gatech.edu. Please see Appendix B for your turn-in options.

C++ Class Requirements

I would like you to create at least two separate classes for your program. The specifications are below.

WordUnit
This class will be called WordUnit. Its data members are a string that contains a single word of text and an integer that contains the number of times that the word appears in a given book. You can choose your own data member names.

WordHistogram
This class will be called WordHistogram, and it will contain the entire word histogram for the book that you are analyzing. A primary data member will be an array of WordUnits. We have not discussed dynamic arrays at this point, so I would like you to use a static, fixed-size array with a large number of entries. For this exercise, assume that you have 10,000 elements in your array. In addition, you will need an integer variable that contains the number of elements that you are currently storing in this array. Furthermore, I would like you to have a string that contains the name of the text file that contains your book. You may need other member functions, but as you will see in Appendix A, you will definitely need to create the following member functions in the WordHistogram class.

void makeHistogram()
This member function belongs to the WordHistogram class. It creates the histogram from the text file whose name was passed to the constructor. You will need to open the text file and populate the static array of WordUnits. Because we have not discussed dynamic arrays yet, please use the array over-allocation strategy illustrated in Figure 1. You will need to keep track of how much of the array is being used as you build the histogram.
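The two classes and the over-allocation bookkeeping described above might be sketched as follows. This is a minimal sketch, not the interface required by Appendix A: the class name HistogramSketch, the member names, and the helpers addWord, addAll, size, and countOf are all assumptions made for illustration. It reads from any input stream so that it can be tried on a short string before being pointed at a file.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the sketch on short text
#include <string>

// One histogram entry: a word and the number of times it has been seen.
// Member names are illustrative; the lab lets you choose your own.
class WordUnit {
public:
    WordUnit() : word_(""), count_(0) {}
    WordUnit(const std::string& word, int count) : word_(word), count_(count) {}

    void setWord(const std::string& word) { word_ = word; }
    void setCount(int count) { count_ = count; }
    std::string getWord() const { return word_; }
    int getCount() const { return count_; }

private:
    std::string word_;
    int count_;
};

// Hypothetical fixed-capacity histogram using over-allocation:
// units_ always has kMaxWords slots, but only the first size_ are in use.
class HistogramSketch {
public:
    static const int kMaxWords = 10000;  // capacity stated in the lab

    HistogramSketch() : size_(0) {}

    // Count one occurrence of word. Returns false only when the word is
    // new and the array is already full.
    bool addWord(const std::string& word) {
        for (int i = 0; i < size_; ++i) {
            if (units_[i].getWord() == word) {
                units_[i].setCount(units_[i].getCount() + 1);
                return true;
            }
        }
        if (size_ >= kMaxWords) return false;
        units_[size_] = WordUnit(word, 1);  // first unused slot
        ++size_;
        return true;
    }

    // Read whitespace-delimited words from any input stream.
    void addAll(std::istream& in) {
        std::string word;
        while (in >> word) addWord(word);
    }

    // Number of array slots currently in use.
    int size() const { return size_; }

    // Frequency of word, or 0 if it has not been seen.
    int countOf(const std::string& word) const {
        for (int i = 0; i < size_; ++i) {
            if (units_[i].getWord() == word) return units_[i].getCount();
        }
        return 0;
    }

private:
    WordUnit units_[kMaxWords];
    int size_;
};
```

For example, feeding the text "the cat and the hat and the bat" through addAll from a std::istringstream leaves size() at 5 and countOf("the") at 3. The linear search in addWord is the simplest choice that fits a plain array; it is not fast, and it is only one of several ways to meet the requirement.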
Figure 1: Illustration of using an array with a fixed size to contain a list with an initially unknown length.
[Figure: Example of Array Over-Allocation Strategy. An array with max_size = 11 holds three used elements ("a" 5, "the" 12, "he" 3), so size = 3. The remaining eight unused elements hold empty strings and zero counts.]

void sortAlphaHistogram()
This member function belongs to the WordHistogram class. It sorts the WordUnit array in alphabetical order according to the word in each WordUnit. This arrangement of the array will be useful if you want to quickly find the frequency of a given word.
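One way to sort only the used portion of an over-allocated array is a hand-written selection sort, sketched below. The struct Entry and the function sortAlpha are hypothetical stand-ins (the lab's WordUnit would use its own get and set functions), and the choice of selection sort is an assumption; any correct sort would do.

```cpp
#include <string>

// Hypothetical stand-in for the lab's WordUnit, reduced to public fields.
struct Entry {
    std::string word;
    int count;
};

// Selection sort of the first `size` entries into alphabetical order.
// Slots at index >= size are unused and are never touched.
void sortAlpha(Entry entries[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        int smallest = i;
        for (int j = i + 1; j < size; ++j) {
            if (entries[j].word < entries[smallest].word) smallest = j;
        }
        if (smallest != i) {
            // Swap whole entries so each word keeps its own count.
            Entry tmp = entries[i];
            entries[i] = entries[smallest];
            entries[smallest] = tmp;
        }
    }
}
```

Applied to the three used elements from Figure 1 in an 11-slot array, this leaves the order "a", "he", "the" with counts 5, 3, 12, and the eight unused slots unchanged. The same loop structure with a different comparison would produce a different ordering.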


void sortFreqHistogram()
When this member function is called, the WordUnit array will be sorted by frequency. This will be useful when making a plot of the histogram to see if a slope of -1 in a log-log plot describes the data.

void exportHistogram(string filename)
This member function in WordHistogram will create an output file that contains the word histogram. The argument of this function is a string that contains the name of the output file. For the file that is alphabetized by word, the format of the output should look something like the following:

    a 937
    abandoned 3
    abandoning 1
    abart 1
    ability 1
    ablaze 1
    able 8
    aboard 1
    about 112
    above 16

Please note that the word and its frequency are separated by a single space.

C++ String Objects

In this lab, you will need to manipulate strings in a very basic way. For completeness, I have included in Appendix E a list of member functions for C++ string objects. You can use any of these you like, but I believe that you may find the following operators for C++ string objects more useful.

string1 == string2
The equality operator can be used to compare two strings. If ALL characters are the same, this operation returns true; otherwise, it returns false.

string1 = string1 + string2;
Both the assignment (=) operator and the addition (+) operator can be used with strings. When used with strings, the + operator concatenates the two string operands.

string1 > string2
The greater-than (>) and less-than (<) operators can both be used with C++ string objects. One string is greater than another when it appears later in an alphabetized list. For example, "cat" is greater than "apple" because "cat" is listed after "apple" in an alphabetized list.
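The string operators described above can be tried out with two small helpers. The function names comesBefore and joinWithSpace are hypothetical, chosen only for this sketch.

```cpp
#include <string>

// True when a would be listed before b in an alphabetized list.
// The comparison is by character code, so it matches dictionary order
// only when both strings use the same letter case.
bool comesBefore(const std::string& a, const std::string& b) {
    return a < b;
}

// Join two strings with a single space using the + and = operators.
std::string joinWithSpace(const std::string& a, const std::string& b) {
    std::string result = a;
    result = result + " " + b;
    return result;
}
```

Here comesBefore("apple", "cat") is true and joinWithSpace("about", "112") yields "about 112". Because the comparison uses character codes, an uppercase word such as "Zebra" compares as less than "apple"; this does not matter for ProcessedWOW.dat, which contains only lowercase letters.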


string1[number]
You can also use the indexing operator [] to access each character in the string. This is similar to accessing an array of characters with zero-based indexing.

Input and Output Text Files

For this lab, you will do some basic manipulation of text files. In this section, I will show you how to instantiate an input or output text file object. In addition, you can use the stream insertion (<<) and stream extraction (>>) operators to send data to an output file or receive data from an input file, respectively.

Preprocessor Directive
You will need the following include statement in your header file, which allows you to use the C++ standard library for file I/O.

    #include <fstream>

Instantiating Output File Objects
To instantiate an output file object that you can use to manipulate your output text file, you will need something like the following.

    std::ofstream YourOutputFileObject("outputfile.dat", std::ios::out);

The "outputfile.dat" name is arbitrary; the file will be created in the directory in which you execute your program. The std::ios::out flag designates that you are creating an output file; any existing file with the same name will be overwritten.

Instantiating Input File Objects
To instantiate an input file object from which you can read data, you can do the following:

    std::ifstream YourInputFileObject("inputfile.dat", std::ios::in);

As in the output file example, the "inputfile.dat" name is arbitrary and specifies the name of the file in the local directory from which you would like to read. As you will see in the main file, you can use this file object with the ! operator to check whether the file was opened successfully.

Insertion Stream Operator
Just as with the cout object, you can use the << operator to write data to an output file. You use the object name in place of cout in the following way.

    YourOutputFileObject << "Hello File!" << std::endl;

Extraction Stream Operator
Likewise, just as with the cin object, you can use the >> operator to read data from an input file. You can use the object name in the following way.

    YourInputFileObject >> string1;

This command reads a single string, delimited by white space, from the input file. For example, if the input file has the following text:

    hello from professor Snape


The above statement would store "hello" in string1 with no spaces. Furthermore, the operation itself evaluates to true when a string is successfully read from the input file. This lets you embed the statement in a while loop condition to access all the strings in a file sequentially. For example, the following while loop will continue until all the strings have been read into the program one at a time.

    while (YourInputFileObject >> string1) {
        // manipulate value in string1
    }
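The file objects and operators from this section can be combined into a small round trip: write a line of text to a file, then read it back one string at a time. This is a sketch under stated assumptions; the function names writeSample, countWords, and firstWord are hypothetical, and the caller supplies whatever file name it likes.

```cpp
#include <fstream>
#include <string>

// Write one line of whitespace-separated words to a text file.
// Returns false if the file could not be opened for writing.
bool writeSample(const std::string& filename) {
    std::ofstream out(filename.c_str(), std::ios::out);
    if (!out) return false;
    out << "hello from professor Snape" << std::endl;
    return true;
}

// Count the whitespace-delimited strings in a text file.
// Returns -1 if the file could not be opened for reading.
int countWords(const std::string& filename) {
    std::ifstream in(filename.c_str(), std::ios::in);
    if (!in) return -1;
    int count = 0;
    std::string word;
    while (in >> word) ++count;  // loop ends when no more strings can be read
    return count;
}

// Return the first string in the file, or "" if the file cannot be
// opened or contains no strings.
std::string firstWord(const std::string& filename) {
    std::ifstream in(filename.c_str(), std::ios::in);
    std::string word;
    if (in >> word) return word;
    return "";
}
```

After writeSample succeeds on a given file name, countWords on the same name returns 4 and firstWord returns "hello". Both file objects close their files automatically when they go out of scope at the end of each function.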
