CptS 350 Homework #1



3.
(A) An algorithm M could help scientists develop new medical drugs to fight tough diseases by performing
computational analysis on protein structures. The structures could be stored in a database, and structural analysis over
that database could find proteins with similar functions. All of this information could then be used to create new drugs
by taking advantage of the similar protein functions that the database reveals. Computational analysis is much less
expensive than traditional experimental techniques.
(B) Storing the algorithm input in memory as an image file, such as a JPEG, would be the most beneficial approach, as
images would be the easiest form for the algorithm to determine similarities from.
(C)
Definition 1: A similarity metric based on the two protein molecules’ amino acid sequences, computing the total match
rate between the sequences.
Pros: Quick and inexpensive, with low cost in both computation and time; can be used to develop structure for both
Cons: Does not provide functional similarity; also does not provide the quickest way to determine structural similarity
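A minimal sketch of how the match rate in Definition 1 could be computed is given here. It assumes the sequences are written as one-letter amino acid strings and compared position by position with no alignment step; the function name match_rate and the sample strings are made up for illustration.

```python
def match_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two amino acid sequences agree.

    Assumption: positions are compared one-to-one with no alignment step;
    positions past the end of the shorter sequence count as mismatches.
    """
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / longest


if __name__ == "__main__":
    # Made-up one-letter amino acid strings, for illustration only.
    print(match_rate("MKTAYIAK", "MKTAYLAK"))
```

A real comparison would likely align the sequences first, so this only shows the idea of a match rate.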
Definition 2: A similarity metric based on the two protein molecules’ 3D structures, compared visually through
representative pictures of each molecule to develop a match rate.
Pros: Can be used with machine learning to develop predictors; can quickly match structural similarity, which also helps
with functional similarity
Cons: Requires a database of known proteins to begin with
Algorithm 1: M = P0 + P1 * x1
This algorithm follows the linear regression model of machine learning, in which M represents the value used for
prediction, P0 is the bias coefficient, and P1 is the coefficient of x1, which represents the width and height columns.
This allows the use of statistics to estimate the coefficients but requires that statistical properties of the data, such as
standard deviations and correlations, be calculated beforehand. This algorithm benefits from pictures, as it can transform
them into a noise map that is used to create predictors of other protein structures from an initial sample.
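As a rough illustration of estimating the coefficients of M = P0 + P1 * x1 from statistics, the sketch below computes the standard deviations and the correlation first and then derives P1 and P0 from them. The function names fit_line and predict and the sample numbers are assumptions made for this example, and the input is assumed to have at least two distinct x and y values.

```python
from math import sqrt


def fit_line(xs, ys):
    """Estimate P0 and P1 in M = P0 + P1 * x1 from summary statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard deviations and correlation are calculated beforehand.
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    corr = cov / (std_x * std_y)
    p1 = corr * std_y / std_x  # coefficient of x1
    p0 = mean_y - p1 * mean_x  # bias coefficient
    return p0, p1


def predict(p0, p1, x1):
    """Evaluate M = P0 + P1 * x1 for one input value."""
    return p0 + p1 * x1


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    p0, p1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(p0, p1, predict(p0, p1, 5))
```

How the picture data would be turned into the x1 column is left out here, since that step is not specified.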
Algorithm 2: M = a - (gamma)(nabla)f(a)
This algorithm follows stochastic gradient descent for linear regression, in which M is the model’s next step in the
learning process, a is the current step in the learning process, gamma is the learning rate, and (nabla)f(a) is the gradient
of f at a, so that -(nabla)f(a) is the direction of steepest descent. Unlike batch gradient descent, which updates once per
pass over the whole training set, this approach updates the model with each training example, allowing the model to be
adjusted more often during its learning course. This matches well with the manner in which protein structures are
matched, allowing more accurate prediction of protein structures and their transforms. The storage of JPEGs is again
used in the same manner as in the first algorithm for machine learning.
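The update rule of Algorithm 2 can be sketched for the linear model of Algorithm 1 as follows. This is only an illustration under stated assumptions: f is taken to be half the squared prediction error on one training example, the function name sgd_linear_regression and the default values for gamma, epochs, and seed are made up, and the sample numbers are not real protein data.

```python
import random


def sgd_linear_regression(xs, ys, gamma=0.05, epochs=2000, seed=0):
    """Fit M = P0 + P1 * x1 with one-example-at-a-time gradient steps."""
    rng = random.Random(seed)
    p0, p1 = 0.0, 0.0  # current step a = (P0, P1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training examples in random order
        for i in order:
            error = (p0 + p1 * xs[i]) - ys[i]
            # Gradient of f = 0.5 * error**2 for this single example.
            grad_p0 = error
            grad_p1 = error * xs[i]
            # Next step = current step - gamma * gradient.
            p0 -= gamma * grad_p0
            p1 -= gamma * grad_p1
    return p0, p1


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(sgd_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))
```

The value of gamma matters: if it is too large for the scale of x1 the steps can diverge, so it is treated here as a tunable assumption.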
