Analysis of Algorithms

Recall, that an algorithm is a method for solving a problem or accomplishing some task expressed as a sequence of steps that are suitable for execution by a computer.

We are interested in designing "good" algorithms -- that is to say, algorithms that don't take forever to run, and don't completely exhaust the resources of our machines, specifically with regard to memory.

With regard to the first consideration, the running time of an algorithm gives us a means to compare two algorithms to decide which is faster. However, things are more involved than just running the algorithm once or twice and timing how long it takes to finish. First, we note that running time typically increases with the problem size (typically, either the size of the input or value of some required argument). Second, running time is affected by the hardware and software environment in which the algorithm is executed.

As such, we focus on the relationship between the running time and the input size. In particular, we are interested in how fast the running time is growing as the input size is increasing -- as a faster growth rate means that we will run into long running times more quickly and -- if these running times become long enough -- may not be able to deal at all with problems past a certain size. More generally, understanding this relationship enables one to make reasonable predictions for how long a program will run before it finishes.

One way to analyze an algorithm is to do so experimentally, using the scientific method. Recall, with the scientific method one formulates some hypothesis consistent with initial observations, uses that hypothesis to make falsifiable predictions, and then verifies those predictions with experiments.

To apply this to running times, we:

  1. Write a program that implements the algorithm.
  2. Run the program with inputs of various sizes and compositions, using a method like System.currentTimeMillis() to get an accurate measure of the actual running times.
  3. Plot the results, and formulate a hypothesis for how quickly the running time increases as a function of the problem size.
  4. Make predictions for the running times associated with other inputs, and verify these by again running and timing the program.

Analyzing algorithms experimentally, of course, has its own set of problems:

A better way to analyze an algorithm is to realize that the running time of a program is determined primarily by two factors:

To help with the quantification of the above, we first make a definition: A primitive operation is defined to be an operation that corresponds to a low-level, basic computation with a constant execution time. Examples of primitive operations include declaring a variable, assigning a value to a variable, deciding if one integer is larger than another, accessing the element of an array at some index $i$, finding the length of an array, etc.

Then, to analyze an algorithm, we simply determine the frequency of primitive operations, and characterize this as a function of the input size, $n$. This function is referred to as the Cost Function, and is denoted by $C(n)$. In finding $C(n)$, we have the benefit of taking into account all possible inputs, and our result is independent of any hardware/software environment.

Given that an algorithm may run faster on some inputs than it does on others (even with the same input size), we have different ways of counting the frequency of whichever primitive operations we seek to analyze. Specifically, we can look at:

Let's take a look at some examples of finding running-time cost functions for some primitive operations of interest...

Example 1

sum = 0.0;
for (int i = 0; i < n; i++) {
   sum += array[i];  // <----- primitive operation of interest

Ignoring the update to the loop variable $i$, we note the marked statement above is executed $n$ times, so the cost function is $C(n) = n$.

Example 2

sum = 0.0;
for (int i = 0; i < n; i += 2) {
    sum += array[i];    // <----- primitive operation of interest

Here however, since the counter variable $i$ is going up by $2$ with each pass through the loop, the number of times the marked statement above gets executed this time is halved -- leading to $C(n) = n/2$.

Example 3

for (int i = 0; i < n; i++) {
   for (int j = 0; j < n; j++) {
      int x = i*j;    // <----- primitive operation of interest            
      sum += x;       // <----- primitive operation of interest

When nested loops are encountered, notice that the two primitive operations each get executed $n^2$ times, yielding $C(n) = 2n^2$.

Example 4

for (int i = 0; i < n; i++) {
   for (int j = i; j < n; j++) {
      sum += i*j;      // <----- primitive operation of interest           

Here, the cost function is a little more complicated to calculate. Note that the primitive operation is executed $$n + (n-1) + (n-2) + \cdots + 3 + 2 + 1 \quad \textrm{times}$$ However, an identity from algebra will help consolidate this into $\displaystyle{C(n) = \frac{n(n+1)}{2}}$.

The cost function $C(n)$ can often be a complicated and lengthy mathematical expression. However, when the problem-size is large some terms of the cost function will be relatively negligible when compared to others. There are two ways we can take this factor into consideration to simplify things.

One way is to focus on how the cost function for an algorithm can be "bounded" by a simpler function or functions. There are three primary notations that can be used to describe this type of bounding:

The other way to simplify things upon realizing that when the problem size is large some of the terms of the cost function will be relatively negligible when compared to others is to simply "throw away" these negligible terms! This leads to yet another notation defined below.

Big Oh, Big Omega, Big Theta, and Tilde are all examples of asymptotic notations. Regardless of which asymptotic notation one uses, it is important to understand that the common functions one encounters in such analyses can be ranked according to their growth rate (as shown in the table below -- with higher growth rates appearing closer to the bottom of the table):

logarithmic$\textrm{log } n$
linearithmic$n\,\textrm{log } n$

Below are some additional examples of running-time analysis for different primitive operations of interest seen in various code snippets -- including two involving logarithmic or linearithmic growth:

Example 5

double product = 1.0;
for (int i = 1; i <= n; i *= 2) { 
   product *= i;     // <----- primitive operation of interest                 

Here, if we suppose that $n = 2^m$, then we have the primitive operation executed when $i = 1, 2, 4, 8, ..., 2^m$, and thus $m+1$ times. Since $m = \log_2 n$, we have a logarithmic cost function $C(n) = \log_2 n + 1 \sim \log_2 n$.

Note that we can also say that $C(n)$ is $O(\textrm{log } n)$. If the switch to $\log n$ from $\log_2 n$ was confusing, note that changing the base requires only a multiplication by a constant, per the logarithm rule from high school algebra shown below. $$\log_b a = \frac{\log_x a}{\log_x b}$$

Example 6

for (int i=1; i<=N; ++i){
   for (int j=1; j<N; j*=2){
      sum++;     // <----- primitive operation of interest                

In this example, the inner loop contributes a logarithmic cost, while the other loop requires this be done $n$ times. Thus, we have a linearithmic cost function, $C(n) \sim n \log n$.

Example 7

int n = a.length;
int count = 0;
for (int i = 0; i < n; i++)
   for (int j = i+1; j < n; j++)
      for (int k = j+1; k < n; k++)
         if (a[i] + a[j] + a[k] == 0) // <----- primitive operation of interest

Note that the ordered triple (i,j,k) eventually works its way through every possible increasing sequence of three values taken from the $n$ integers $0$ to $n-1$. As such, the condition in the if statement gets evaluated ${}_n C_3 = n(n-1)(n-2)/6$ times. Thus, the related cost function is $\sim n^3/6$.