$30
Lab 5 – Large Number Arithmetic ECE 375
ECE 375: Computer Organization and
Assembly Language Programming
Lab 5 – Large Number Arithmetic
SECTION OVERVIEW
Complete the following objectives:
• Understand and use arithmetic/ALU instructions.
• Manipulate and handle large (> 8 bits) numbers.
• Create and handle functions and subroutines.
• Verify the correctness of large number arithmetic functions via simulation.
PRELAB
To complete this prelab, you may find it useful to look at the AVR Starter Guide
and the AVR Instruction Set Manual. If you consult any online sources to help
answer the prelab questions, you must list them as references in your prelab.
1. For this lab, you will be asked to perform arithmetic operations on numbers
that are larger than 8 bits. To be successful at this, you will need to understand and utilize many of the various arithmetic operations supported by
the AVR 8-bit instruction set. List and describe all of the addition, subtraction, and multiplication instructions (i.e. ADC, SUBI, FMUL, etc.) available in
AVR’s 8-bit instruction set.
2. Write pseudocode for an 8-bit AVR function that will take two 16-bit numbers (from data memory addresses $0111:$0110 and $0121:$0120), add
them together, and then store the 16-bit result (in data memory addresses
$0101:$0100). (Note: The syntax “$0111:$0110” is meant to specify that the
function will expect little-endian data, where the highest byte of a multi-byte
value is stored in the highest address of its range of addresses.)
3. Write pseudocode for an 8-bit AVR function that will take the 16-bit number
in $0111:$0110, subtract it from the 16-bit number in $0121:$0120, and
then store the 16-bit result into $0101:$0100.
BACKGROUND
Arithmetic calculations like addition and subtraction are fundamental operations
in many computer programs. Most programming languages support several different data types that can be used to perform arithmetic calculations. As an
8-bit microcontroller, the ATmega128 primarily uses 8-bit registers and has several different instructions available to perform basic arithmetic operations on
8-bit values. Some examples of instructions that add or subtract 8-bit values
contained in registers are:
ADD R0, R1 ; R0 <- R0 + R1
ADC R0, R1 ; R0 <- R0 + R1 + C
SUB R0, R1 ; R0 <- R0 - R1
If we want to perform arithmetic operations on values that are too large to
represent with only 8 bits, but still use the same 8-bit microcontroller for these
large number operations, then we need to develop a procedure for manipulating
multiple 8-bit registers to produce the correct result.
Multi-byte Addition
This example demonstrates how 8-bit operations can be used to perform an
addition of two 16-bit numbers. (The layout of this example should look familiar;
it is meant to look like the way you would usually write out an addition by hand.)
Possible Carry-out of R0 + R2 Addition -> 1
R1 R0
Possible Carry-out of R1 + R3 Addition -> 1 R3 R2
+ -----------
R4 R3 R2
Initially, one of the 16-bit values is located in registers R1:R0 and the other
value is in R3:R2. First, we add the 8-bit values in R0 and R2 together, and save
the result in R2. Next, we add the contents of R1 and R3 together, account for
a possible carry-out bit from the previous operation, and then save this result in
R3. Finally, if the result of the second addition generates a carry-out bit, then
that bit is stored into a third result register: R4.
Why is an entire third result register necessary? Even though our 16-bit addition can result in at most a 17-bit result, the ATmega128’s registers and data
memory words have an intrinsic size of 8 bits. As a consequence, we must handle
our 17-bit result as if it’s a 24-bit result, even though the most significant byte
only has a value of either 1 or 0 (depending on if there was a carry-out).
©2021 Oregon State University Winter 2021 Page 1
Lab 5 – Large Number Arithmetic ECE 375
Multi-byte Subtraction
Subtracting one 16-bit number from another 16-bit number is similar to the 16-
bit addition. First, subtract the low byte of the second number from the low byte
of the first number. Then, subtract the high byte of the second number from the
high byte of the first number, accounting for a possible borrow that may have
occurred during the low byte subtraction.
When performing 16-bit signed subtraction, the result sometimes requires a
17th bit in order to get the result’s sign correct. For this lab, we will just deal with
the simpler unsigned subtraction, and we will also assume that the subtraction
result will be positive (i.e., the first operand’s magnitude will be greater than
the second operand’s magnitude). Therefore, our 16-bit subtraction result
can be contained in just two result registers, unlike the 16-bit addition.
Multiplication
The AVR 8-bit instruction set contains a special instruction for performing unsigned multiplication: MUL. This instruction multiplies two 8-bit registers, and
stores the (up to) 16-bit result in registers R1:R0. This instruction is a fast and
efficient way to multiply two 8-bit numbers. Unfortunately, multiplying numbers
larger than 8 bits wide isn’t as simple as just using the MUL instruction.
The easiest way to understand how to multiply large binary numbers is to
visualize the “pencil & paper method”, which you were likely taught when you
first learned how to multiply multi-digit decimal numbers. This method is also
known as the sum-of-products technique. The following diagram illustrates using
this typical method of multiplication for decimal numbers:
24
* 76
------
24 (4 * 6 = 24)
12_ (2 * 6 = 12, but aligned with the tens’ place)
28_ (4 * 7 = 28, but aligned with the tens’ place)
+ 14__ (2 * 7 = 14, but aligned with the hundreds’ place)
------
1824
This method multiplies single decimal digits by single decimal digits, and then
sums all of the partial products to get the final result. This same technique can
be used for multiplying large binary numbers. Since there is an instruction that
implements an 8-bit multiplication, MUL, we can partition our large numbers into
8-bit (1 byte) portions, perform a series of one byte by one byte multiplications,
and sum the partial products as we did before. The diagram below shows this
sum-of-products method used to multiply two 16-bit numbers (A2:A1 and B2:B1):
A2 A1
* B2 B1
-----------------
H11 L11 (A1 * B1 = H11:L11)
H21 L21 ___ (A2 * B1 = H21:L21, but properly aligned)
H12 L12 ___ (A1 * B2 = H12:L12, but properly aligned)
+ H22 L22 ___ ___ (A2 * B2 = H22:L22, but properly aligned)
-----------------
P4 P3 P2 P1
The first thing you should notice is that the result of multiplying two 16-bit
(2 byte) numbers yields an (up to) 32-bit (4 byte) result. In general, when
multiplying two binary numbers, you will need to allocate enough room for a
result that can be twice the size of the operands.
H and L signify the high and low result bytes of each 8-bit multiplication, and
the numbers after H and L indicate which bytes of the 16-bit operands were used
for that 8-bit multiplication. For example, L21 represents the low result byte of
16-bit partial product produced by the A2 * B1 multiplication.
Finally, it is worth noticing that the four result bytes are described by the
following expressions:
P1 = L11
P2 = H11 + L21 + L12
P3 = H21 + H12 + L22 + any carries from P2 additions
P4 = H22 + any carries from P3 additions
PROCEDURE
First, you need to implement three different large number arithmetic functions:
ADD16 (a 16-bit addition), SUB16 (a 16-bit subtraction), and MUL24 (a 24-bit
multiplication). A pre-written MUL16 function has been provided in the skeleton
file; you should use it as the basis for MUL24 (or complete the challenge instead,
which has you implement an alternate approach to multiplication).
Each of these functions should read its input operands from data memory, and
also store its result in data memory. The skeleton file provides test values for
each of these functions, which you must use to demonstrate that your functions
©2021 Oregon State University Winter 2021 Page 2
Lab 5 – Large Number Arithmetic ECE 375
are working correctly. You have to declare these test values in program memory
and then have your program load them into data memory. Entering test values
directly into the simulators’s memory is not allowed. In general, memory organization is left up to you in this lab, so come up with a system that makes
it easy for you to debug your functions and quickly verify that your answers are
correct.
After completing and testing ADD16, SUB16, and MUL24, you need to write
a fourth function named COMPOUND. COMPOUND uses the first three functions to
evaluate the expression ((D − E) + F)
2
, where D, E, and F are all unsigned 16-
bit numbers. The test values you must use for D, E, and F are provided in the
skeleton file. COMPOUND should be written to automatically provide the correct
inputs to ADD16, SUB16, and MUL24, so that you can run COMPOUND all at once
without pausing the simulator to enter values at each step. Please note that
the D, E, and F values are different than the values you used to individually
verify ADD16, SUB16, and MUL24.
This is a simulation-based lab; to demonstrate that your four functions work
correctly, have the TA observe the results of ADD16, SUB16, MUL24, and COMPOUND
and confirm that everything is working correctly. You have to understand using
memory properly. TAs will check your functions with 4 break points (ADD16,
SUB16, MUL24, and COMPOUND).
STUDY QUESTIONS / REPORT
A full lab write-up is required for this lab. When writing your report, be sure
to include a summary that details what you did and why, and explains
any problems you may have encountered. Your write-up and code must be
submitted by the beginning of next week’s lab. Remember, NO LATE WORK
IS ACCEPTED.
Study Questions
1. Although we dealt with unsigned numbers in this lab, the ATmega128 microcontroller also has some features which are important for performing signed
arithmetic. What does the V flag in the status register indicate? Give an
example (in binary) of two 8-bit values that will cause the V flag to be set
when they are added together.
2. In the skeleton file for this lab, the .BYTE directive was used to allocate
some data memory locations for MUL16’s input operands and result. What
are some benefits of using this directive to organize your data memory, rather
than just declaring some address constants using the .EQU directive?
CHALLENGE
To receive challenge credit for this lab, you must implement a 24-bit multiplication using the following shift-and-add technique, instead of using the sum-ofproducts technique described in the main part of the lab handout. If you complete the challenge section, you do not need to implement the sum-of-products
multiply, but you do still need to implement ADD16, SUB16, and COMPOUND.
Although the sum-of-products technique is easier for humans to use when performing multiplication, it is not the most efficient way of multiplying binary
numbers, especially very large (e.g., 1024-bit) numbers. So, a different technique
can be used, one that takes into account the inherent properties of a base-2
number system. This technique is known as shift-and-add multiplication.
In basic mathematics terminology, multiplication has two operands, the multiplicand (i.e., the operand to be multiplied) and the multiplier. When two decimal
numbers are multiplied, the multiplication is carried out by essentially adding
the multiplicand to itself over and over again a certain number of times; this
number is specified by the multiplier.
This may seem fairly obvious, until you consider the analogous operation in
binary. In binary, the bits of the multiplier operand are used to determine whether
the multiplicand is added to itself or not. The example diagrams below will
demonstrate both decimal and binary shift-and-add multiplications.
Example: Decimal shift-and-add
23 <- Multiplicand
* 16 <- Multiplier
-----
138 (6 * 23 = 23 + 23 + 23 + 23 + 23 + 23 = 138)
+ 230 (1 * 23 = 23, but then shift left once to get 230)
-----
368 <- Product
In the above example, the decimal digit in the ones’ place of the multiplier is
6, and therefore 6 instances of the multiplicand are added together to yield the
value 138. The decimal digit in the tens’ place of the multiplier is 1, which yields
a value of 23 that then needs to be shifted left by one decimal digit to ultimately
©2021 Oregon State University Winter 2021 Page 3
Lab 5 – Large Number Arithmetic ECE 375
yield 230. Adding the two partial products together results in our product, 368.
The following expression explains why this works:
23*16 = (23*6) + (23*10)
Example: Binary shift-and-add
Binary(4-bit): 1011 <- Multiplicand
* 1101 <- Multiplier
Multiplier ---------
(LSB) 1 0000 1011 <- 1 * 1011 = 1011, then shift left 0 times
0 + 0000 0000 <- 0 * 1011 = 0000, then shift left 1 time
---------
0000 1011 <- Add results together
1 + 0010 1100 <- 1 * 1011 = 1011, then shift left 2 times
---------
0011 0111 <- Add results together
(MSB) 1 + 0101 1000 <- 1 * 1011 = 1011, then shift left 3 times
---------
1000 1111 <- Add results together for final product
In this example, we’ve performed the same shift-and-add technique as before,
but in a binary configuration that uses 4-bit values. The basic principle is that
we shift through the multiply. When a 1 is received, we add the multiplicand to
the low bit of the result, if a zero is received we do nothing. We then shift the
entire result one bit to the left, thus essentially multiplying the result by 2 and
repeat the process. The following binary expression explains why this works:
1011 * 1101 = (1011 * 0001) + (1011 * 0000) + (1011 * 0100) + (1011 * 1000)
To make things even clearer, the right-hand side of the above binary expression
is equivalent to:
1011 * 1101 = (1011 << 0) + (0000 << 1) + (1011 << 2) + (1011 << 3)
Although this method is sound and easily understandable, there is a more
efficient method. Assume you are running on a 4-bit system where the registers
are 4 bits wide. The binary method shown above would require 4 registers. A
better approach would be to combine the lower result register with the multiplier
register, so that you can shift all the registers at once and only use 3 registers
instead of 4. In order perform the multiplication in this more space-efficient
way, you must shift everything to the right (instead of to the left), and also
rotate through the carry flag. If the carry flag is set after a rotation, then add
the multiplicand to the upper result register, and proceed to the next rotation;
otherwise, skip the addition and simply rotate again. Keep in mind that is very
important to rotate to the right, rather than just shift, so that whatever is in the
carry flag will be shifted into the result MSB, and the result LSB will be shifted
out into the carry flag. The following example illustrates this more efficient
method:
Example: Binary shift-and-add (more efficient)
1011 <- Multiplicand
* 1101 <- Multiplier
Carry ---------
0000 1101 <- Load Multiplier into low register
1 0000 0110 <- Rotate right through carry
1011 <- Carry is set, so add multiplicand
---------
0 1011 0110 <- Result of addition
0 0101 1011 <- Rotate right through carry
---- <- Don’t add since carry is 0
---------
0 0101 1011 <- Result thus far
1 0010 1101 <- Rotate right through carry
1011 <- Add multiplicand since carry is set
---------
0 1101 1101 <- Result of addition
1 0110 1110 <- Rotate right through carry
1011 <- Add multiplicand since carry is set
---------
1 0001 1110 <- Result of addition
0 1000 1111 <- Result after final shift, note that a ’1’
was shifted in, because it was the carry
that was set from the last addition
As you can see, this approach to the shift-and-add technique can be easily
implemented with a simple for loop, with a specific number of iterations that
depends on the size of the data. In this example, we used 4-bit numbers, so we
looped 4 times. Inside the loop, there is a simple rotate right, and an addition
©2021 Oregon State University Winter 2021 Page 4
Lab 5 – Large Number Arithmetic ECE 375
depending on the status of the carry bit after the rotation. This loop implementation of shift-and-add can be easily used for binary numbers of any width
with minimal effort, and is actually used internally for multiplication in most
microarchitectures.
©2021 Oregon State University Winter 2021 Page 5