2 min read

Gradient Descent

I am taking the Machine Learning course on Coursera being taught by Andrew Ng. It is turning out to be useful so far, and he has presented the material clearly. It’s a nice introduction to the Machine Learning/Computer Science language, since I come from a statistics background.

I learned about gradient descent today for simple linear regression. The following is my code in R and I compare it to the lm function in base R.

I am using the Prostate dataset from the lasso2 package. The model I am fitting is:

$$ lpsa = \beta_0 + \beta_1 \times lcavol $$

#prostate cancer data set
library(lasso2)
## R Package to solve regression problems while imposing
## 	 an L1 constraint on the parameters. Based on S-plus Release 2.1
## Copyright (C) 1998, 1999
## Justin Lokhorst   
## Berwin A. Turlach 
## Bill Venables     
## 
## Copyright (C) 2002
## Martin Maechler 
data(Prostate)

# hypothesis
hypothesis <- function(x, theta0,theta1){
    h <- theta0 + theta1*x
    return(h)
}

# Jacobian
deriv <- function(x,y,theta0,theta1){
    dt0 <- (length(x))^(-1)* sum((hypothesis(x,theta0,theta1)-y))
    dt1 <- (length(x))^(-1)* t(x) %*% (hypothesis(x,theta0,theta1)-y)
    return(c(dt0,dt1))
}

theta <- c(0,0)
alpha <- 0.5
X <- Prostate$lcavol
Y <- Prostate$lpsa
i=1
theta.star <- deriv(Prostate$lcavol,Prostate$lpsa,theta[1],theta[2])
# set convergence threshold
threshold <- 1e-7
# logical to check if threshold has been achieved
continue=TRUE

while (continue){
    theta[1] <- theta.star[1] - alpha*deriv(x=X,y=Y,theta.star[1],theta.star[2])[1]
    theta[2] <- theta.star[2] - alpha*deriv(x=X,y=Y,theta.star[1],theta.star[2])[2]
    continue <- (abs((theta.star-theta)[1])>threshold & abs((theta.star-theta)[2])>threshold)
    theta.star[1] <- theta[1]
    theta.star[2] <- theta[2]
    i=i+1
}

# number of iterations
i
## [1] 214
# beta0 and beta1
theta.star
## [1] 1.5072975 0.7193205
# compare to lm
fit <- lm(lpsa~lcavol, data=Prostate)
summary(fit)
## 
## Call:
## lm(formula = lpsa ~ lcavol, data = Prostate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.67624 -0.41648  0.09859  0.50709  1.89672 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.50730    0.12194   12.36   <2e-16 ***
## lcavol       0.71932    0.06819   10.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7875 on 95 degrees of freedom
## Multiple R-squared:  0.5394,	Adjusted R-squared:  0.5346 
## F-statistic: 111.3 on 1 and 95 DF,  p-value: < 2.2e-16