Usually when we analyse the running time of an algorithm we count the number of CPU cycles it requires. However, this assumes that reading from and writing to memory is as cheap as the computation performed by the CPU. In reality accessing memory *is* cheap, but if the problem size (or the generated data) is so massive that it cannot fit into memory, we have to write to external storage. The properties of this external storage are that even though we have a lot of it (almost unbounded), it is orders of magnitude slower than RAM. So if we count both external storage accesses and CPU cycles, the CPU cycles contribute almost nothing to the total. Thus for very big problems it makes sense to analyse algorithms in terms of the number of external storage accesses they perform and completely disregard CPU cycles. This is what is called the I/O model, and we will see how this different way of counting influences our algorithmic design.

To define these different ways of analyzing an algorithm more formally, let us first consider how a simplified computer is structured. The general structure is shown in fig. 1: two caches close to the CPU, followed by Random Access Memory (RAM), and furthest away the disk storage. RAM is much faster than disk storage but expensive, which is why it is limited in size. Disk storage, on the other hand, is slow but cheap, which is why we have a lot of it. As indicated by its distance from the CPU, the disk is much slower to access.

There are two simplifying models we will consider to approximate how things are in reality. The first, called the RAM model, assumes that everything fits in memory (formally, that we have unbounded memory).

This hypothetical computer makes two assumptions:

- Each memory access and simple operation (e.g. arithmetic, boolean operators, function calls) takes exactly 1 time step.
- The memory is unbounded.

The cost of an algorithm is the number of time steps it uses.

The I/O model, in contrast, counts only the read/write operations to disk.

The model assumes:

- Problem size is $N$.
- Limited memory of size $M$.
- Infinite disk space.
- Reads/writes transfer a consecutive block of $B$ cells between memory and disk.
- Computation can only be done on cells in memory.

The cost is only the number of I/O operations (as a function of $N$, $M$ and $B$), so computations *are free*.
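As a toy illustration of this cost measure, consider scanning $N$ consecutive cells on disk: only the block transfers count, no matter how much CPU work is done per cell. A minimal sketch (function name is illustrative):

```python
import math

def scan_io_cost(n, b):
    """I/Os needed to read n consecutive cells from disk in blocks of size b."""
    return math.ceil(n / b)

# Reading 1,000,000 cells with block size 1024 costs 977 I/Os,
# regardless of the computation performed on each cell.
print(scan_io_cost(1_000_000, 1024))  # → 977
```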

Even though the I/O model is still an oversimplification of the actual architecture, it has been shown to predict the asymptotic running time accurately in many cases.

We will consider how to obtain efficient sorting in the I/O model.

One sorting algorithm that can be adapted to be efficient in the I/O model is mergesort, specifically a $k$-way mergesort [1].

The objective is to merge a number of sorted arrays together. Say we want to keep one block for each array in memory at any given time; then we can only merge $k$ sorted arrays if $k \le M/B - 1$ using a multi-array merge, since the $k$ input blocks plus one output block must fit in the $M/B$ blocks of memory.

The $k$-way merge proceeds as follows. We merge the usual way by maintaining a pointer for each array, starting at the first index. We repeatedly take the minimum value over the pointers and increment the pointer at the minimum value. The difference is that if a pointer overflows the currently loaded block for its array, the next block of that array is loaded instead. Additionally, when the output in memory reaches a size of $B$ it is written to disk and removed from memory.
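The procedure above can be sketched in Python. This is a toy simulation under loose assumptions: the "disk" is just Python lists, block loads are slices, and all names are illustrative:

```python
import heapq

def kway_merge(arrays, B):
    """Merge k sorted arrays keeping only one block of size B per array
    (plus one output block) 'in memory' at a time, mimicking the I/O
    pattern described in the text."""
    k = len(arrays)
    blocks = [arr[:B] for arr in arrays]       # currently loaded block per array
    block_start = [0] * k                      # disk offset of the loaded block
    ptr = [0] * k                              # pointer within the loaded block
    out, out_block = [], []
    heap = [(blocks[i][0], i) for i in range(k) if blocks[i]]
    heapq.heapify(heap)
    while heap:
        val, i = heapq.heappop(heap)           # minimum over all pointers
        out_block.append(val)
        if len(out_block) == B:                # output block full: "write to disk"
            out.extend(out_block)
            out_block = []
        ptr[i] += 1
        if ptr[i] == len(blocks[i]):           # pointer overflows the loaded block
            block_start[i] += len(blocks[i])   # load the next block of this array
            blocks[i] = arrays[i][block_start[i]:block_start[i] + B]
            ptr[i] = 0
        if ptr[i] < len(blocks[i]):
            heapq.heappush(heap, (blocks[i][ptr[i]], i))
    out.extend(out_block)                      # flush the final partial block
    return out
```

A heap over the per-array minima replaces the linear scan for the minimum; this changes only CPU work, which the I/O model does not charge for.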

The argument for correctness is similar to the normal $k$-way merge, but we also have to consider whether enough memory is available. Fortunately, we can keep one block of size $B$ per array in memory, since the $k$ input blocks plus the output block take up $(k+1)B \le M$ in total.

Here we use that we have divided the input arrays and the output array into chunks of size $B$. If the arrays have total length $N$, they are divided into $N/B$ blocks and the output is written using $N/B$ blocks, so $O(N/B)$ I/O operations are spent.

*Figure* 2: $k$-way merge. The figure indicates, for each array, the blocks that have already been in memory, the block that is currently in memory, and the current pointer for the array.

The merging step requires already sorted arrays. Observe that we can sort $M$ cells at a time efficiently, since computations are free when the cells fit in memory.

So to turn an unsorted array of length $N$ into $N/M$ sorted arrays (each of length $M$), first load in $M$ cells using $M/B$ blocks and sort them using some in-memory sorting algorithm. Do this for all subarrays.

Each subarray takes $M/B$ I/O reads and $M/B$ I/O writes, and there are $N/M$ subarrays, so the number of I/O operations is $O(N/B)$.
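This run-formation step is short enough to sketch directly (the function name is illustrative; any in-memory sort works since computation is free):

```python
def form_runs(data, M):
    """Split an unsorted array into ceil(len(data)/M) runs of length at most M,
    sorting each run entirely in memory."""
    return [sorted(data[i:i + M]) for i in range(0, len(data), M)]

# Two sorted runs of length M = 4:
runs = form_runs([5, 2, 9, 1, 7, 3, 8, 6], M=4)
print(runs)  # → [[1, 2, 5, 9], [3, 6, 7, 8]]
```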

Now, the idea is that we can combine the sorting and merging step to get an efficient mergesort in the I/O model.

Apply sec. 2.2 to get $N/M$ sorted arrays of length $M$. Now, iteratively merge bundles of $k$ arrays so that the number of arrays at each level is reduced by a factor of $k$.

To figure out the complexity of the entire computational tree (illustrated in fig. 3), we analyse how many I/O operations each layer uses and how many layers we have.

*How many I/O operations are used on each level?* We count the level index from the top. There are $k^i$ arrays in total at level $i$, each having size $N/k^i$, thus taking $O\!\left(\frac{N}{k^i B}\right)$ operations each using sec. 2.1. In total $k^i \cdot O\!\left(\frac{N}{k^i B}\right) = O(N/B)$ per level.

*How many levels are needed?* The number of arrays at level $i$ is $k^i$. When this is equal to $N/M$ we can apply sec. 2.2. Solving $k^i = N/M$ for $i$ yields:

$$i = \log_k \frac{N}{M}$$

So we spend $O(N/B)$ on each of the $\log_k (N/M)$ levels and apply sec. 2.2 once on the final level, taking $O(N/B)$. With $k = \Theta(M/B)$, the number of I/O operations is thus

$$O\!\left(\frac{N}{B} \log_{M/B} \frac{N}{M}\right)$$

which is actually a slightly tighter bound than the $O\!\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ suggested in the article [1], since $N/M \le N/B$.
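To get a feel for the bound, a quick numeric check of the leading term (parameter values chosen arbitrarily):

```python
import math

def mergesort_io_bound(N, M, B):
    """Leading term of the I/O cost, (N/B) * log_{M/B}(N/M), up to constants."""
    return (N / B) * math.log(N / M, M / B)

# N = 2**30 cells, memory M = 2**20 cells, block size B = 2**10:
# log_{M/B}(N/M) = log_{2**10}(2**10) = 1 level, each scanning N/B = 2**20 blocks.
print(mergesort_io_bound(2**30, 2**20, 2**10))  # → 1048576.0
```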

*Figure* 3: Mergesort that sorts $N/M$ arrays in the bottom layer using sec. 2.2 and then merges bundles of $k$ sorted arrays recursively until only one sorted array of length $N$ is left.

We have introduced the I/O model and shown an efficient mergesort algorithm for the case where $M$ and $B$ are known. It has a very similar running time to the standard mergesort's $O(N \log N)$, but it differs by depending on $M$ and $B$. Specifically it makes $O\!\left(\frac{N}{B} \log_{M/B} \frac{N}{M}\right)$ I/O operations.

Next we will explore what we can do if we do not know $M$ and $B$.

[1] A. Aggarwal and J. S. Vitter, “The input/output complexity of sorting and related problems,” *Commun. ACM*, vol. 31, no. 9, pp. 1116–1127, Sep. 1988 [Online]. Available: http://doi.acm.org/10.1145/48529.48535