Linked list primer

A ways back now I wrote this tutorial about linked lists, primarily using C and C++ for examples. I hope it helps you understand more about linked lists ! Before you start reading, you should be able to read and understand C and/or C++ (especially all the pointer notation in these languages), and you should have a decent knowledge of how pointers work.

Why learn about linked lists ?

Suppose you are given a programming task that involves storing a list of integer x–coordinates to be used in drawing a picture. "Aha!" you might think, "I'll use an array to store all those values." And indeed, an array is one solution to the task. Arrays store lists of data, so you might just go ahead and declare your data structure as :

const int size = 128;
int xcoords[size];

Now, all would be well and good. But what if I came along in your program and tried to add 129 points to your data structure? You might have built-in dumb-user-devices to prevent me from doing such a diabolical deed, or your program might overwrite some of the data that's already in the array, or it might do something altogether different (and completely undesirable). You would start getting bug reports, and then nasty emails would come your way complaining that users want to be able to add 129 points. You really need to face it : you have to be able to dynamically add data to the list. You could simply reserve enough space for 100000 coordinates, but that will eat up memory quickly, and you know that a user will eventually find a way to break a program with such a hard-coded restriction. Even by declaring your array (in C++) as :

int size = 128;
// possibly change the value of size here ...
int *xcoords = new int[size];

or as :

int getNumXCoords(void);
// ... later, inside a function call somewhere ...
int *xcoords = new int[getNumXCoords()];

The user will have to know ahead of time how many points she is planning on using. There must be a better way.

Thankfully, there are several better ways, and one of them is a linked list.

What is a linked list ?

A linked list is a very useful data structure ; it is very much like an array, but it can dynamically grow and shrink to fit the number of data points that are entered in a program. Linked lists can be implemented in any computer language that is capable of using pointers (or any other analogous data type), but the code here uses C and sometimes C++.

List nodes

http://media.leifjohnson.net/img/linkedlist/snode-labels.png

Basically, a non-null list consists of several nodes arranged in a particular linear order, just like an array. Unlike an array, though, each node consists of, in its simplest form, a data structure that has both a data element and a pointer.

http://media.leifjohnson.net/img/linkedlist/snode-arrows.png

The data item holds the node's data element (like the x-coordinate), and the pointer (often called the next pointer) points to, or holds the address of, the next node in the list. Sometimes, as shown in the diagram, the data item is itself a pointer, but I'll stick with the example case of the data being integers. To make a list of arrays of floats, for example, just change the type of the data element in the linked list nodes.

There is one convention I'd like to clear up before continuing, and that is the null pointer convention. Pointers can either point to something by holding some useful data value, or they can point to nothing by holding a special value, often called NULL or NUL or just plain 0. Different folks have different ways of representing null pointers in diagrams. (Usually, it's a slash, or ground, or something else null-looking.)

http://media.leifjohnson.net/img/linkedlist/snode-ground.png

I prefer the electrical ground symbol (three parallel lines) ; therefore, that is the notation that I will use here. Before going any further, draw a few list nodes just to get a feel for using your pencil or crayons or whatever you've got near your computer.

The list structure

Here's the conceptual diagram of a small linked list that holds integers as data :

http://media.leifjohnson.net/img/linkedlist/slist.png

Notice three things about the list :

  • The pointer at the end of the list is null.
  • Each node's next pointer (except for the last node) points to another unique node in the list, making the list a linear graph.
  • The first node of the list is pointed to by a pointer named top. (It could also be called start, go, fred, or xcoords ...) What is the type of the top pointer (e.g. is it a pointer to float or a char, or something else) ?

Thus a linked list is a linear data structure that is a lot like an array. The tricky part is that linked lists are a little bit more complex to implement than arrays. I'll give a small start, and then I will provide some programming ideas to help the linked list concept along.

Linked list code starter

To write linked list code, you first need to create the nodes. This is an easy enough process using the class concept in C++. A simple example might be :

class IntList {
public:
  IntList *next; // the pointer to the next node
  int data;      // the node's data element
};

In the C programming language, which does not use objects per se, we might write the following instead (note the striking similarity ...) :

typedef struct _IntListNode IntList;
struct _IntListNode {
  IntList *next;
  int data;
};

So, using our new class (or struct), we could build the list above by writing in C, for example :

IntList *one   = malloc(sizeof(IntList));
IntList *two   = malloc(sizeof(IntList));
IntList *three = malloc(sizeof(IntList));
IntList *four  = malloc(sizeof(IntList));

IntList *top = one;

one->data   = 128;  one->next   = two;
two->data   = 5;    two->next   = three;
three->data = 1979; three->next = four;
four->data  = -7;   four->next  = NULL;

Now, you might think this is a little bit repetitive, and it certainly is. There are much better ways to handle list formation. My current favorite is this C++ code :

class IntList {
public:
  // constructor initializes both data members
  IntList(int d = 0, IntList *n = 0)
  : data(d), next(n) { }

  // destructor recursively deletes whole list
  ~IntList() { if(next) delete next; }

private:
  IntList *next; // pointer to next node
  int data;      // node's data element
};

The constructor basically initializes the class instance's data elements. The code after the colon on the constructor's line assigns d to the class instance's data element, and n to the class instance's next element.

If you're not sure how the destructor destroys the entire list, draw out a list with your crayon, and then pretend that you called the first node's destructor. Notice that when you delete the first node by calling its destructor, the next pointer gets deleted as well. See what happens to the node that follows the first one (and the node that follows the second one, and so on).

To illustrate how this has changed things, let's build the list again using the new class constructor, this time in C++.

// default argument here makes next null
IntList *top = new
IntList(-7);

top = new IntList(1979, top);
top = new IntList(5, top);
top = new IntList(128, top);

What a difference ! Of course, you can use this in a loop to automate the process even further.

How do I get data out of the list ?

There are two basic ways to traverse a list : recursively and iteratively. The iterative way makes use of a loop that has a fixed termination point. Recursion relies on nested function calls to traverse the list. Here is an example of using iteration to traverse the list and print out the data elements :

// suppose buildList() returns the list we made above.
IntList *top = buildList();

// do other things before you need the list again ...

for (IntList *t = top; t; t = t->next)
  cout << "element data is " << t->data << endl;

Note that top could even be null when the for loop starts ; the code would still work.

For the curious, here is an example of a function in C that would traverse a linked list recursively :

void
printListRecursive(IntList *l)
{
  if (! l) return;
  printf("element data is %d\n", l->data);
  printListRecursive(l->next);
}

Traversing the list recursively, like doing anything else recursively, may seem a bit obfuscated until you become familiar with the process of recursion. It is very important to remember to put a terminating condition in a recursive function : we don't want ten million cats in hats running around here !

Once you've seen the code for doing one thing to each list element (like printing them, for example), it's pretty easy to change it to do something else to each list element. For instance, you could count the elements in the list by doing the following :

IntList *top = buildList();

// do other things before you need the list again ...

int numElem = 0;
for (IntList *t = top; t; t = t->next) numElem++;

That was iterative ; the recursive function might look like this :

int
countListRecursive(IntList *l)
{
  if (! l) return 0;
  return 1 + countListRecursive(l->next);
}

Extending data retrieval

Now that you know how to do things to each item in the list sequentially, it might seem like a logical next step to use this same process to do nothing to each element in the list sequentially until you find a particular element that you need. This is just the way that one goes about, for example, finding an element in the list or counting only specific elements in the list. Of course, there are many other tasks that you might do to elements in a list that fall into this category, but searching is one of the most common.

So here's an example of how you might traverse the list iteratively (in C) to find a specific element. The function returns NULL if the number we're looking for isn't in the list.

IntList *
find(int needle, IntList *haystack)
{
  IntList *t;

  for(t = haystack; t; t = t->next)
    if (t->data == needle) return t;

  return NULL;
}

Another example of how you might use this functionality would be to extract only those elements from a list that satisfy certain criteria, building up a list of such elements that you find. You might want to find all the prime numbers in a given list of integers, for example. Here's some code in C++ that uses recursion to do that.

IntList *
primesFromListRecursive(IntList *start, IntList *filtered)
{
  if (! start) return filtered;
  if (isPrime(start->data)) {
    IntList *newF = new IntList(start->data, filtered);
    return primesFromListRecursive(start->next, newF);
  } else {
    return primesFromListRecursive(start->next, filtered);
  }
}

List manipulation: deleting, reversing, and randomizing

Another step you might want to take with this neato list concept is to change the list once you've built it up. There are several ways to change a list, but they are on different levels of complexity. So far, we've addressed the simplest levels of complexity, with building an unsorted list and traversing the list to do something to each element. Now let's take a look at how you might delete an element.

Basics to cover in linked list programming

Whenever you are dealing with linked lists, it's important to address three types of elements in the list :

  • the beginning
  • the end
  • a general element that's between the middle and the end

Sounds pretty easy, and the above three cases are enough to keep yourself out of hot water when programming a linked list. If you've covered the case of the beginning of the list, covered the case of the end of the list, and covered an arbitrary case in the middle of the list, you're set !

Deleting

http://media.leifjohnson.net/img/linkedlist/slist.png

So how does this apply to deleting ? Well, let's take a look at the list from above and think of how to delete the beginning, the end, and the middle of the list.

http://media.leifjohnson.net/img/linkedlist/delete-first.png

The beginning is pretty easy : we would need to keep a temp pointer to the beginning (so we don't lose track of it), reroute the top pointer to the value of the first element's next pointer, and then set the first element's next pointer to NULL. Then we'll be safe to delete that node without destroying the rest of the list, and thanks to the rerouted top pointer, we know where the new start of the list will be.

http://media.leifjohnson.net/img/linkedlist/delete-last.png

The end is even easier, once you see how the beginning works : once we find the end of the list, we keep a temp pointer to that element, set the previous node's next pointer to NULL (because it will become the end of the list), and delete the last node.

http://media.leifjohnson.net/img/linkedlist/delete-middle.png

Finally, deleting a node from the middle : this is a little more complex, but it uses the same process as before. We again need to keep a temp pointer to the node we're trying to get rid of. Then, like the case for the beginning of the list, we reroute the pointer that points to the current node to the value of the current node's next pointer. After we set the current node's next to NULL, we'll be set to delete the node.

That covers our three cases. Now the question is how to unify the three into one compact function ... it becomes more clear when you notice that in all three cases, we take the pointer that points to the node we want to delete and then reroute it to the node after the current node. There are two somewhat special cases: in the beginning of the list, the previous pointer is the pointer that keeps track of the start of the list, and for the end of the list, the node after the one we want to delete is effectively NULL. So here is one way to delete a node from a linked list, iteratively in C :

IntList *
delete(IntList *l, integer value)
{
  IntList *ret = l, *cur = l, *prev = l;

  /* advance the current node pointer, trailing behind
     the previous node pointer, until we find the end
     of the list or the value we want. */
  while (cur && (cur->data != value)) {
    prev = cur;
    cur = cur->next;
  }

  /* if we found a node with the value, delete it. */
  if (cur && (cur->data == value)) {
    if (cur == prev) ret = cur->next;
    else prev->next = cur->next;

    cur->next = NULL; free(cur);
  }

  return ret;
}

There are some things to keep in mind about this particular implementation. First, it's not pretty. Next, this function deletes the first node it encounters in the list that has a data value of value. There are ways to delete nodes based on a pointer to a particular node. (Say, for instance, you found a pointer to the node using a search function, then passed the pointer to the delete function.) In addition, this function deletes a maximum of one node from the list. There are ways to delete all nodes with a specified data element.

More information

A great intro page is The Code Project linked list tutorial. It discusses use of the CList class included with the Microsoft Foundation Classes (MFC).

For those of you looking for a quick code fix, check the 10 minute question about linked lists from http://devx.com.

Created and published