Arrays


Scientific programs very often deal with multiple data items possessing common characteristics. In such cases, it is often convenient to place the data items in question into an array, so that they all share a common name (e.g., x). The individual data items can be either integers or floating-point numbers. However, they all must be of the same data type.

In C, an element of an array (i.e., an individual data item) is referred to by specifying the array name followed by one or more subscripts, with each subscript enclosed in square brackets. All subscripts must be nonnegative integers. Thus, in an n-element array called x, the array elements are x[0], x[1], ..., x[n-1]. Note that the first element of the array is x[0] and not x[1], as it is in some other programming languages (e.g., Fortran).

The number of subscripts determines the dimensionality of an array. For example, x[i] refers to an element of a one-dimensional array, x. Similarly, y[i][j] refers to an element of a two-dimensional array, y, etc.

Arrays are declared in much the same manner as ordinary variables, except that each array name must be accompanied by a size specification (which specifies the number of elements). For a one-dimensional array, the size is specified by a positive integer constant, enclosed in square brackets. The generalization for multi-dimensional arrays is fairly obvious. Several valid array declarations are shown below:

int j[100];
double x[20];
double y[10][20];

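Thus, j is a 100-element integer array, x is a 20-element floating-point array, and y is a 10x20 floating-point array. Note that variable size array declarations, e.g.,

double a[n];

where n is an integer variable, are illegal in ANSI C (C89/C90). The later C99 standard does permit such variable-length arrays for local (automatic) arrays, but C11 made support for them optional, so a portable program should not rely on them.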

It is sometimes convenient to define an array size in terms of a symbolic constant, rather than a fixed integer quantity. This makes it easier to modify a program that utilizes an array, since all references to the maximum array size can be altered by simply changing the value of the symbolic constant. This approach is used in many of the example programs employed in this course.

Like an ordinary variable, an array can be either local or global in extent, depending on whether the associated array declaration lies inside the body of one of the functions which constitute the program, or outside all of them. Both local and global arrays can be initialized via their declaration statements.¹⁰ For instance,

int j[5] = {1, 3, 5, 7, 9};

declares j to be a 5-element integer array whose elements have the initial values j[0]=1, j[1]=3, etc.

C provides no operations which act on entire arrays at once. Thus, if x and y are similar arrays (i.e., the same data type, dimensionality, and size) then assignment operations, comparison operations, etc., involving these two arrays must be carried out on an element-by-element basis. This is usually accomplished within a loop (or within nested loops, for multi-dimensional arrays).

The program listed below is a simple illustration of the use of arrays in C. The program reads a list of numbers, entered by the user, into a one-dimensional array, list, and then calculates the average of these numbers. The program also calculates and outputs the deviation of each number from the average.

/* average.c */
/*
  Program to calculate the average of n numbers and then
  compute the deviation of each number from the average

  Code adapted from "Programming with C", 2nd Edition, Byron Gottfried,
  Schaum's Outline Series, (McGraw-Hill, New York NY, 1996)
*/

#include <stdio.h>
#include <stdlib.h>

#define NMAX 100                   /* Maximum number of values which can be averaged */

int main(void)
{
  int n, count;
  double avg, d, sum = 0.;
  double list[NMAX];               /* Array holding the values to be averaged */

  /* Read in value for n, checking that an integer was actually entered */
  printf("\nHow many numbers do you want to average? ");
  if (scanf("%d", &n) != 1)
   {
    printf("\nError: could not read n\n");
    exit(1);
   }

  /* Check that n is not too large or too small */
  if ((n > NMAX) || (n <= 0))
   {
    printf("\nError: invalid value for n\n");
    exit(1);
   }

  /* Read in the numbers and calculate their sum */
  for (count = 0; count < n; ++count)
   {
    printf("i = %d  x = ", count + 1);
    if (scanf("%lf", &list[count]) != 1)
     {
      printf("\nError: could not read x\n");
      exit(1);
     }
    sum += list[count];
   }

  /* Calculate and display the average */
  avg = sum / (double) n;
  printf("\nThe average is %5.2f\n\n", avg);

  /* Calculate and display the deviations about the average */
  for (count = 0; count < n; ++count)
   {
    d = list[count] - avg;
    printf("i = %d  x = %5.2f  d = %5.2f\n", count + 1, list[count], d);
   }
  return 0;
}

Note the use of the symbolic constant NMAX to specify the size of the array list, and, hence, the maximum number of values which can be averaged. Typical output from the above program looks like this:

How many numbers do you want to average? 5
i = 1  x = 4.6
i = 2  x = -2.3
i = 3  x = 8.7
i = 4  x = 0.12
i = 5  x = -2.7
 
The average is  1.68
 
i = 1  x =  4.60  d =  2.92
i = 2  x = -2.30  d = -3.98
i = 3  x =  8.70  d =  7.02
i = 4  x =  0.12  d = -1.56
i = 5  x = -2.70  d = -4.38

It is important to realize that an array name in C is essentially a pointer to the first element in that array.¹¹ (Strictly speaking, in most expressions an array name is automatically converted into a pointer to its first element; the main exceptions are when it is the operand of sizeof or of the address operator &.) Thus, if x is a one-dimensional array then the address of the first array element can be expressed as either &x[0] or simply x. Moreover, the address of the second array element can be written as either &x[1] or (x+1). In general, the address of the (i+1)th array element can be expressed as either &x[i] or (x+i). Incidentally, it should be understood that (x+i) is a rather special type of expression, since x represents an address, whereas i represents an integer quantity. The expression (x+i) actually specifies the address of the array element which lies i elements (not i bytes) beyond the first array element: the compiler automatically scales i by the size of a single element (C, of course, stores all elements of an array both contiguously and in order in the computer memory). Hence, (x+i) is a symbolic representation of an address, rather than an ordinary arithmetic expression.

Since &x[i] and (x+i) both represent the address of the (i+1)th element of the array x, it follows that x[i] and *(x+i) must both represent the contents of that address (i.e., the value of the (i+1)th element). In fact, the latter two terms are completely interchangeable in C programs.

For the moment, let us concentrate on one-dimensional arrays. An entire array can be passed to a function as an argument. To achieve this, the array name must appear by itself, without brackets or subscripts, as an argument within the function call. The corresponding argument in the function definition must be declared as an array. In order to do this, the array name is written followed by an empty pair of square brackets. The size of the array is not specified, so the function has no way of determining how many elements the array contains; this number is usually passed as a separate integer argument. In a function prototype, an array argument is specified by following the data type of the argument by an empty pair of square brackets.

Since, as we have seen, an array name is essentially a pointer, when an array is passed to a function what the function actually receives is the address of its first element. In other words, the array is effectively passed by reference, and not by value. Hence, if any of the array elements are altered within the function then these alterations are recognized in the calling portion of the program. Likewise, if an array (rather than an individual array element) appears in the argument list of a scanf() function then it should not be preceded by the address operator (&), since an array name already is an address. The reason why arrays in C are always passed by reference is fairly obvious. In order to pass an array by value, it would be necessary to copy the value of every element. On the other hand, to pass an array by reference it is only necessary to pass the address of the first element. Clearly, for large arrays, passing by reference is far more efficient than passing by value.

The program listed below is yet another version of printfact.c, albeit a far more efficient one than any of those listed previously. In this version, the factorials of all the integers from 0 to 20 are calculated in one fell swoop, by the function factorial(), using the recursion relation

$$ (n+1)! = (n+1)\,n! \tag{1} $$

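The factorials are stored as elements of the array fact[], which is declared in the main part of the program and passed as an argument to factorial(). Since arrays are passed by reference, the values which factorial() writes into fact[] are available to the main part of the program once the function returns.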

/* printfact5.c */
/*
  Program to print factorials of all integers
  between 0 and 20
*/

#include <stdio.h>

/* Function prototype for factorial() */
void factorial(double []);  

int main() 
{
  int j;
  double fact[21];           // Declaration of array fact[]

  /* Calculate factorials */
  factorial(fact);

  /* Output results */
  for (j = 0; j <= 20; ++j) 
    printf("j = %3d    factorial(j) = %12.3e\n", j, fact[j]);

  return 0;
}

//%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

void factorial(double fact[]) 
{
  /*
    Function to calculate factorials of all integers
    between 0 and 20 (in form of floating-point
    numbers) via recursion formula

    (n+1)! = (n+1) n!

    Factorials returned in array fact[0..20]
  */

  int count;

  fact[0] = 1.;               // Set 0! = 1

  /* Calculate 1! through 20! via recursion */
  for (count = 0; count < 20; ++count) 
    fact[count+1] = (double)(count+1) * fact[count];

  return;
}

The output from the above program is identical to that from printfact.c.

It is important to realize that there is no array bound checking in C. If an array x is declared to have 100 elements then the compiler will reserve 100 contiguous, appropriately sized, slots in computer memory on its behalf. The contents of these slots can be accessed via expressions of the form x[i], where the integer i should lie in the range 0 to 99. As we have seen, the compiler interprets x[i] to mean the contents of the memory slot which is i slots along from the beginning of the array. Unfortunately, expressions such as x[100] or x[1000] are interpreted in exactly the same manner, so the executable simply accesses memory slots which lie off the end of the block reserved for x, and the running program raises no complaint. Accessing array elements which do not exist is nevertheless an error (the C standard describes its consequences as undefined behavior). Exactly what happens is very difficult to predict: the program may crash, it may produce absurdly incorrect output, it may produce plausible but incorrect output, or it may even produce correct output. It all depends on exactly what information is stored in the memory locations surrounding the block of memory reserved for x. This type of error can be extremely difficult to debug, since it may not be immediately apparent that something has gone wrong when the program is executed. It is, therefore, the programmer's responsibility to ensure that all references to array elements lie within the declared bounds of the associated arrays.

Let us now discuss multi-dimensional arrays in more detail. The elements of a multi-dimensional array are stored contiguously in a block of computer memory. In scanning across this block, from its start to its end, the order of storage is such that the last subscript of the array varies most rapidly whilst the first varies least rapidly. For instance, the elements of the two-dimensional array x[2][2] are stored in the order: x[0][0], x[0][1], x[1][0], x[1][1]. The elements of a multi-dimensional array can only be addressed if the program is explicitly told the size of the array in its second, third, etc. dimensions. For example, if y is declared as double y[10][20], then locating y[i][j] requires skipping over i complete rows of 20 elements each, so that this element lies 20*i + j slots from the start of the array; the number of rows, 10, never enters the calculation. It is, therefore, not surprising to learn that when a multi-dimensional array is passed to a function, as an argument, then the associated argument declaration within the function definition must include explicit size declarations in all of the subscript positions except the first. The same is true for a multi-dimensional array argument appearing in a function prototype.


Richard Fitzpatrick 2006-03-29