Newton's Method

The problem I will discuss in this post is finding the zeros, or roots, of a function. This is a problem most algebra students are familiar with. Given a function $f(x)$ find all $x$ such that $f(x)=0$. This is a simple problem for polynomials with powers less than 5. Most people are familiar with the quadratic formula:

$$ \dfrac{-b\pm \sqrt{b^2-4ac}}{2a} $$

For an polynomial of the form $ax^2+bx+c$ we can find all the roots using this formula. Similar formulas exist for cubic and quartic functions, but interestingly no such formula exists for quintic or higher order polynomials. See Fred Akalin's post for an awesome explanation for why quintic polynomials are unsolvable.

So how do we find the roots of these functions? Enter the Newton-Raphson method. As you may recall, the derivative of a function at a point $x$ is the rate at which the function is changing at that point. An intuitive example of this is distance and velocity. Velocity is the rate of change in distance. The derivative can also be defined as the slope of the tangent line at a given point. I was going to try to explain what a tangent line is, but I think it is easier with pictures. Let's talk about the tangent line of $f(x)=x^2-1$ at the point $(2,3)$.

from IPython.display import HTML, display
import pandas as pd
import numpy as np
import bokeh.plotting as bp
from bokeh.models import Label

def show(p, filename='plot.html'):
    bp.output_file(filename)
    bp.save(p)
    display(HTML(filename))


def f(x):
    return (x)**2-1

def tangent(x):
    #y-y0=m(x-x0) => y=mx-mx0+y0
    # m = 2*2 = 4
    return 4*x +(3-4*2)

x = np.linspace(-5,5,100)
p = bp.figure(title="f(x)=x\u00b2-1",
              x_axis_label="x",
              y_axis_label="f(x)")
p.line(x, f(x))
p.line(x[50:], tangent(x)[50:], color='red')
label = Label(x=5/4,y=-.5,text="<- x-intercept")
p.add_layout(label)
show(p)

Bokeh Plot

The tangent line just touches the function at the given point. If you zoom in on the point $(2,3)$, you will see that the tangent line is a decent approximation of the function in a neighborhood around the point, but it only touches at that point (Actually if you zoom in far enough you'll see that the lines don't touch due to rounding error, but hey it is close enough). That is because the rate of change (the slope) of the function at that point matches the slope of the line.

So what does this have to do with finding roots? What do you notice about the x-intercept of the red line? It gets closer to a root, $(1,0)$, of the function. We can repeat this process using x-intercept as our new guess for the root. As we continue to iterate, we will converge on the root. There are exceptions, where this method will not converge, but I won't cover them here.

So, we need a few things before we get to the algorithm. First, we need to be able to calculate the derivative of a function to get the slope of the tangent line. The derivative of a function is defined as

$$ \lim_{h\rightarrow 0}{\dfrac{f(x+h)-f(x)}{h}} $$

Limits are tricky on computers. We can only approximate them, and care should be taken to account for edge cases where the derivative may not exist, etc. But here we are looking at nice continuous, differentiable functions. So the following function will work nicely

def f_prime(f, x, h=0.000000001):
    return (f(x+h) - f(x))/h

This function takes another function $f$ and a point $x$ as inputs. (along with an optional argument $h$ which defaults to something close to 0). It evaulates the limit with an $h$ that approaches 0. If we want the approximation to be more exact we can make $h$ smaller. Bear in mind eventually you will get an error if you keep reducing $h$, since the expression will overflow.

Next, we need the equation of the tangent line. You may remember the standard form for the equation of a line

$$ y=mx+b $$

There is another way to represent a line called the point-slope form. Given the slope $m$ and a point $(x_0,y_0)$

$$ (y-y_0)=m(x-x_0) $$

Rearranging this gives us the equation of the line in standard form.

$$ y = mx+(y_0-mx_0) $$

By evaluating the function and the derivative at the point $(x_0)$, we get $y_0$ and $m$ respectively. This is enough information to get the equation of the tangent line. We then update our guess, by finding the x-intercept. This is done by setting $y=0$ in the above equation and solving for $x$

$$\begin{align} 0 &= mx+(y_0-mx_0)\ -mx &= y_0-mx_0 \ x &= -\dfrac{y_0-mx_0}{m} \ x &= \dfrac{mx_0}{m}-\dfrac{y_0}{m} \ x &= x_0-\dfrac{y_0}{m} \ \end{align}$$

This looks a lot like a recursive sequence, and it is :D We can write the sequence as follows

$$x_{n+1}=x_n-\dfrac{f(x_n)}{f'(x_n)}$$

I know I changed things up, but it is the same as before. $f(x_0)=y_0$ and $f'(x_0)=m$ or the slope of the tangent line at $x_0$.

Now we are finally ready to look at the algorithm for the Newton-Raphson method. I should note that this method is actually quite different from the ones originally proposed by Newton and Raphson. They used algebraic methods as opposed to calculus. Just a fun fact.

Okay, so here it is Newton's Method for finding roots:

MAX_DEPTH = 1000
def newtons_method(f, x0, tol=1e-20, depth=0):
    if np.absolute(f(x0)) < tol:
        return x0
    elif depth >= MAX_DEPTH:
        return "Failed to converge"
    else:
        return newtons_method(f, x0-(f(x0)/f_prime(f,x0)), depth=depth+1)

We input a function f, and an initial guess x0. The tolerance and depth arguments are to check for convergance. We recursively iterate using our equation from above as the updated guess. So let's test it out. We know the roots of $x^2-1$ are 1 and -1.

newtons_method(f,2)

1.0

Great! How do we find the other root? Well, in a nutshell, we guess and check.

newtons_method(f,-25)

-1.0

We use different starting points, until we find all of the roots. This aspect of the algorithm creates the concept of basins of attraction, meaning certain ranges of x will converge to one root as opposed to another. I was going to make graphs of this, but I want to get this published, so maybe in a later post.

This algorithm was my first introduction to mathematical programming while I was an undergrad studying math, and it is what set me on the course to study computer science in graduate school. So I enjoyed remembering this little algorithm, and I hope you enjoyed reading about it.