Lab 6: Linear Regression

Follow ALL instructions, otherwise you may lose points. In this lab, you will be finding the best fit line using two methods. You will need to use numpy, pandas, and matplotlib for this lab.
Background (least squares regression):
Least squares regression is a popular method for finding the line of best fit. Although I wanted to go over how to do it in class, we don't have time, so I'll do my best to explain it through these words and examples on this paper. :(
The goal is to calculate the slope (m) and y-intercept (b) in the equation of the line:
   y = mx + b
The steps to compute the line of best fit for N ordered pairs:
1. For each point (x, y), calculate x² and xy
2. Find ∑x, ∑y, ∑x², ∑xy
3. Calculate the slope (N is the number of ordered pairs):

   m = (N∑xy − ∑x∑y) / (N∑x² − (∑x)²)

   m = ((20)(16718.5006) − (505.748847)(507.922204)) / ((20)(16655.073) − (505.748847)²) = 1.00219
4. Calculate the y-intercept:

   b = (∑y − m∑x) / N

   b = (507.922204 − 1.00219(505.748847)) / 20 = 0.05327206

5. Make our equation y = mx + b:

   y = 1.00219x + 0.05327206
The graph is shown below (I used Excel, not Python). The line of best fit is graphed, and so are the points that we used to find the line of best fit.
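To make the arithmetic above concrete, here is a small Python check (not part of the lab starter code) that plugs the summary sums quoted above for the 20 points into the slope and intercept formulas; the variable names are just for illustration.

    # Summary sums quoted in the worked example above (N = 20 points).
    N      = 20
    sum_x  = 505.748847
    sum_y  = 507.922204
    sum_x2 = 16655.073
    sum_xy = 16718.5006

    # Slope and y-intercept from the algebraic least squares formulas.
    m = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / N

    print(round(m, 5), round(b, 4))   # roughly 1.00219 and 0.0533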
So, now that you’ve seen the algebraic method, let’s see the linear algebra method!
The setup is based on this matrix equation:

   y = X [m]
         [b]
y is an n×1 matrix of y-coordinates.
X is an n×2 matrix where the first column holds the x-coordinates. The second column is all 1's, for matrix multiplication purposes.
To find the slope (m) and the y-intercept (b), use:

   [m] = (X^T X)^(-1) X^T y
   [b]
Let’s use the same points as last time to find the best fit line with this method.
Note: X (not x) is a matrix and it looks like this:

   X = [ x1  1 ]
       [ x2  1 ]
       [ ...   ]
       [ xn  1 ]

The first column has all of the x's (like in the previous example). The second column is full of 1's; this is for the y-intercept.
The y is the same as in the last example.
The calculations are as follows:

   X^T X = [ 16655    505.749 ]
           [ 505.75   20      ]

   (X^T X)^(-1) = [  0.00025867   −0.006541   ]
                  [ −0.006541      0.21540568 ]

   X^T y = [ 16718.5006 ]
           [ 507.922204 ]

   [m] = (X^T X)^(-1) X^T y = [ 1.0022 ]
   [b]                        [ 0.0533 ]
And we get the same results! :)
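For reference, the same matrix computation can be done in a few lines of numpy. This is only a sketch with made-up data points (not the 20 points from the worked example), showing the (X^T X)^(-1) X^T y formula directly.

    import numpy as np

    # Made-up example data; substitute the real x and y values in the lab.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.0, 2.9, 4.2])

    X = np.column_stack((x, np.ones(len(x))))    # first column x, second column all 1's
    m, b = np.linalg.inv(X.T @ X) @ (X.T @ y)    # (X^T X)^(-1) X^T y

    print(m, b)                                  # slope and y-intercept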
Task:
1. Take a close look at the lin_reg.py file. There are four empty functions: least_sq(file_name), mat_least_sq(file_name), predict(file_name, x), and plot_reg(file_name, using_matrix). Read through all of their descriptions carefully. Remember, you will lose points if you do not follow the instructions. We are using a grading script.
Summary of function tasks
least_sq(file_name):
Given the csv file_name, find the slope and y-intercept of the data using algebraic least squares (the first linear regression presented). You need to return the slope and y-intercept IN THAT ORDER. Round the slope and y-intercept to four decimal places.
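A rough sketch of one way to write least_sq is below. It assumes the csv has columns named 'x' and 'y'; check the actual column names in data.csv and the docstring in lin_reg.py, since those details are not specified here.

    import pandas as pd

    def least_sq(file_name):
        # Assumed column names 'x' and 'y'; adjust to match the real csv.
        data = pd.read_csv(file_name)
        x = data['x']
        y = data['y']
        n = len(data)
        # Algebraic least squares: m = (N*Sxy - Sx*Sy) / (N*Sx2 - (Sx)^2)
        m = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
        b = (y.sum() - m * x.sum()) / n
        # Round only at the very end; slope first, then y-intercept.
        return round(m, 4), round(b, 4)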
mat_least_sq(file_name):
Given the csv file_name, find the slope and y-intercept of the data using linear algebraic least squares with matrices (the second linear regression presented). You need to return the slope and y-intercept IN THAT ORDER. Round the slope and y-intercept to four decimal places.
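Here is a comparable sketch for mat_least_sq, under the same 'x'/'y' column-name assumption, built from the helper functions listed near the end of this handout.

    import numpy as np
    import pandas as pd

    def mat_least_sq(file_name):
        # Assumed column names 'x' and 'y'; adjust to match the real csv.
        data = pd.read_csv(file_name)
        x = data['x'].to_numpy()
        y = data['y'].to_numpy()
        X = np.column_stack((x, np.ones(len(x))))    # [x | 1]
        m, b = np.linalg.inv(X.T @ X) @ (X.T @ y)    # (X^T X)^(-1) X^T y
        return round(m, 4), round(b, 4)              # slope, then y-intercept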
predict(file_name, x):
Given the csv file_name and an input value x, predict what the output would be using the equation that is derived from mat_least_sq(). This means that you should be calling mat_least_sq() in this function. Round the predicted output to four decimal places before returning the value.
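Since predict just plugs x into y = mx + b with the coefficients from mat_least_sq, a sketch can be very short:

    def predict(file_name, x):
        # Reuse the matrix-based slope and y-intercept (as the description requires).
        m, b = mat_least_sq(file_name)
        return round(m * x + b, 4)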
plot_reg(file_name, using_matrix):
Given the csv file_name and an indicator of which linear regression method to use (using_matrix), output a graph of the data points and the line of best fit (a rough sketch follows this list).
• If using_matrix=False, then you should be plotting your results from least_sq. You should be using red for everything in the graph, with X markers for the data points.
• If using_matrix=True, then you should be plotting your results from mat_least_sq. You can use any color but the default blue and red. You can use any data point marker except for the default dot and X.
plot_reg() should not return anything. Your graphs should also contain the following:
• Labeled x axis
• Labeled y axis
• Graph title
• Legend (see example for details)
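The sketch below shows one possible shape for plot_reg under the same 'x'/'y' column assumption; the green color and '^' marker for the matrix branch are just placeholders, since any non-default color other than red/blue and any marker other than the dot and X are allowed.

    import matplotlib.pyplot as plt
    import pandas as pd

    def plot_reg(file_name, using_matrix):
        data = pd.read_csv(file_name)
        x = data['x']
        y = data['y']
        if using_matrix:
            m, b = mat_least_sq(file_name)
            color, marker, method = 'green', '^', 'matrix least squares'
        else:
            m, b = least_sq(file_name)
            color, marker, method = 'red', 'X', 'algebraic least squares'
        # Data points.
        plt.scatter(x, y, color=color, marker=marker, label='data points')
        # Best fit line through the smallest and largest x (see the hint in the notes below).
        x_ends = [x.min(), x.max()]
        plt.plot(x_ends, [m * v + b for v in x_ends], color=color, label='best fit line')
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Linear regression (' + method + ')')
        plt.legend()
        plt.show()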
Some important notes:
• For consistency's sake, do not round until the very end, meaning you should not round anything until you return your answers.
• Hint: to plot the best fit line, find the smallest and largest x-coordinates. Plug these x-coordinates into the linear equation and plot them.
• If you want to create extra functions/methods to assist you, feel free to do so. However, we will only be testing the functions that are originally in the file.
• If you use any library's linear regression or least squares function, you will get an automatic zero. You must implement this on your own!
2. Your job is to implement all four of these functions so that they pass all test cases. We provide one csv file for you to test on (data.csv), but we will be using other data sets and csv files to check if your work is correct.
3. By running the test case provided (data.csv), you should get the following results:
Note: your “matrix using least squares” graph may have different colors and
markers from mine.
In NO CASE should your graphs have the dot marker or the blue color shown
above!
4. If you feel confident in your program so far, run your program after changing the test case's csv_file from "data.csv" to "data2.csv".
5. Take screenshots of the two graphs you obtain (one from using algebraic least squares and the other from matrix least squares). Put these two screenshots in a pdf or word file. You will be submitting this with your py and txt files.
6. After completing these functions, comment out the test cases (or delete them), or else the grading script will pick them up and mark your program as incorrect. Ensure that you have commented out or deleted ALL print statements. You risk losing points if your file prints anything.
7. Convert your lin_reg.py file to a .txt file. Submit your lin_reg.py file, your .txt file, AND YOUR PDF on BeachBoard. Do NOT submit them in a compressed folder. IN TOTAL, YOU SHOULD BE SUBMITTING THREE FILES!
Some helpful functions
Function name: what it does
round(x, y): rounds the value x to y decimal places. Example: round(1.23456, 3) => 1.235
matrix_name.T: transposes the matrix.
np.ones(num): creates a vector full of ones; there will be num ones. Example: np.ones(3) => [1, 1, 1]
np.column_stack((col1, col2)): concatenates two 1d numpy arrays to make a 2d numpy array. Example: if x = [1, 2, 3] and b = [1, 1, 1], then np.column_stack((x, b)) => [[1, 1], [2, 1], [3, 1]]
np.linalg.inv(mat_name): finds the inverse of the matrix mat_name.
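Put together, the entries above are enough to build the X matrix and evaluate the formula; here is a tiny, self-contained demo with made-up numbers.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    X = np.column_stack((x, np.ones(3)))   # [[1., 1.], [2., 1.], [3., 1.]]
    print(X.T)                             # transpose of X (2 x 3)
    print(np.linalg.inv(X.T @ X))          # inverse of the 2 x 2 matrix X^T X
    print(round(1.23456, 3))               # 1.235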
Grading rubric:
To achieve any points, your submission must have the following. Anything missing from this list will result in an automatic zero. NO EXCEPTIONS!
• Submit everything: py file, txt file, and pdf file
• Program has no errors (infinite loops, syntax errors, logical errors, etc.) that terminate the program
Please note that if you change the function headers or if you do not return the proper outputs according to the function requirements, you risk losing all points for those test cases.
Points | Requirement
5 | Submission is correct. All three files are part of the submission (py file, txt file, and pdf file). All or nothing.
4 | Graphs from pdf file (testing data2.csv) are correct. 2 points each.
16 | Implemented least_sq correctly (four other cases not including data.csv and data2.csv).
16 | Implemented mat_least_sq correctly (four other cases not including data.csv and data2.csv).
8 | Implemented predict correctly (four other cases not including data.csv and data2.csv).
8 | Implemented plot_reg correctly. Remember that least_sq and mat_least_sq should be called here. (Four other cases not including data.csv and data2.csv.)
8 | Graphs have proper x-axis labels, y-axis labels, titles, and legends (1 point each).
5 | Passes original test case (test cases on python file have been commented out too). All or nothing.
TOTAL: 70
