
ArrayBuffers, Amortized Analysis

CSE-250 Fall 2022 - Section B

Sept 19, 2022

Textbook: Ch. 6.4

Announcements

  • PA 1 due tonight!
  • WA 1 posted (Due Weds, Sept 28)
Abstract Data Type (ADT)
The interface to a data structure (What)
Data Structure
The implementation of one or more ADTs (How)

Types of Collections

Iterable
Any collection of items.
Seq
A collection of items arranged in a specific order.
IndexedSeq
Like Seq, but $O(1)$ access to individual items by index.
Set
A collection of unique items.
Map
A collection of items identified by a key.
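As a quick illustration (a sketch, not from the slides), here is how these ADTs show up in the standard Scala collections; the specific values are made up:

    val items: Iterable[Int]   = List(1, 2, 3)             // any collection of items
    val seq: Seq[Int]          = List(3, 1, 2)             // items in a specific order
    val idx: IndexedSeq[Int]   = Vector(3, 1, 2)           // O(1) access by index
    val set: Set[Int]          = Set(1, 2, 2, 3)           // duplicates collapse: Set(1, 2, 3)
    val map: Map[String, Int]  = Map("a" -> 1, "b" -> 2)   // items identified by a key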

Types of Collections

mutable.Seq
Like Seq, but can be changed.
mutable.Buffer
Like mutable.Seq, but "efficient" append.
Queue
Like mutable.Seq, but "efficient" append and remove first.
Stack
Like mutable.Seq, but "efficient" prepend and remove first.
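A sketch of the "efficient" operations each ADT promises, using the standard scala.collection.mutable classes (Scala 2.13 names):

    import scala.collection.mutable

    val buf = mutable.Buffer(1, 2, 3)
    buf.append(4)                  // "efficient" append

    val q = mutable.Queue(1, 2, 3)
    q.enqueue(4)                   // "efficient" append
    val first = q.dequeue()        // "efficient" remove first

    val stack = mutable.Stack(1, 2, 3)
    stack.push(0)                  // "efficient" prepend
    val top = stack.pop()          // "efficient" remove first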

The mutable.Seq ADT

apply(idx: Int): A
Get the element (of type A) at position idx.
iterator: Iterator[A]
Get access to all elements in the seq, in order, once.
length: Int
Count the number of elements in the seq.
insert(idx: Int, elem: A): Unit
Insert an element at position idx with value elem.
remove(idx: Int): A
Remove the element at position idx and return the removed value.
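The same interface written out as a Scala trait. This is a sketch for this course (the trait name MySeq is invented here); the real scala.collection.mutable.Seq differs in details, e.g. insert/remove live on Buffer:

    trait MySeq[A]
    {
      def apply(idx: Int): A                 // element at position idx
      def iterator: Iterator[A]              // all elements, in order, once
      def length: Int                        // number of elements
      def insert(idx: Int, elem: A): Unit    // insert elem at position idx
      def remove(idx: Int): A                // remove and return element at idx
    }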

Array[T] : Seq[T]

An Array of $n$ items of type T:

  • size: 4 bytes for $\texttt{sizeof}(\texttt{data})$ (optional).
  • bytesPerElement: 4 bytes for $\texttt{sizeof}(\texttt{T})$ (optional).
  • data: $\texttt{size} \times \texttt{bytesPerElement}$ bytes of memory.

Challenge: Operations that modify the array size require copying the array.

Solution: Reserve extra space in the array!
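To make this concrete (a sketch using Array.copyOf, which also appears in the ArrayBuffer code below): any growth allocates a new array and copies every element, so we copy once into a larger array and keep the spare slots for later appends.

    val full = Array(1, 2, 3)
    /* Growing means allocating a new array and copying every element... */
    val bigger = Array.copyOf(full, full.length * 2)
    /* ...but the spare slots can absorb several future appends for free. */
    bigger(3) = 4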

ArrayBuffer[T] : Buffer[T] ( : Seq[T] )

An ArrayBuffer of type T:

  • size: 4 bytes for $\texttt{sizeof}(\texttt{data})$ (optional).
  • bytesPerElement: 4 bytes for $\texttt{sizeof}(\texttt{T})$ (optional).
  • used: 4 bytes for the number of fields currently in use.
  • data: $\texttt{size} \times \texttt{bytesPerElement}$ bytes of memory.

ArrayBuffer


  class ArrayBuffer[T] extends Buffer[T]
  {
    var used = 0
    var data = Array.fill[Option[T]](INITIAL_SIZE) { None }

    def length = used

    def apply(i: Int): T =
    {
      if(i < 0 || i >= used){ throw new IndexOutOfBoundsException(i) }
      return data(i).get
    }

    /* ... */
  }
  

What the heck is Option[T]?

Option[T]


    val x = functionThatCanReturnNull()
    x.frobulate()
  

java.lang.NullPointerException (in production)

Option[T]


    val x = functionThatCanReturnNull()
    if(x == null) { /* handle this case */ }
    else { x.frobulate() }
  

Problem: It's easy to miss this test
(and bring down a million-dollar server)!

Option[T]


    val x = functionThatReturnsOption()
    x.frobulate()
  

error: value frobulate is not a member of Option[MyClass]
At compile time.

Option[T]

Some(x)
value.isDefined == true
A valid value. Access with value.get
None
value.isEmpty == true
Analogous to null. No value
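A small usage sketch (standard Option API methods):

    val maybe: Option[String] = Some("hello")

    /* Pattern matching forces you to handle both cases */
    maybe match {
      case Some(s) => println(s.length)
      case None    => println("no value")
    }

    /* Or supply a default instead of risking a .get on None */
    val length = maybe.map { _.length }.getOrElse(0)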

Bonus: an Option[T] can be treated as a Seq[T] of 0 or 1 elements (e.g., via .toSeq)

Digression over!

ArrayBuffer.remove(i)


    def remove(target: Int): T =
    {
      /* Sanity-check inputs */
      if(target < 0 || target >= used){
        throw new IndexOutOfBoundsException(target) }
      /* Save the removed value so we can return it */
      val removed = data(target).get
      /* Shift elements left */
      for(i <- target until (used-1)){
        data(i) = data(i+1)
      }
      /* Update metadata */
      data(used-1) = None
      used -= 1
      return removed
    }
  

What is the complexity?

$O(\texttt{data.size})$ (i.e., $O(n)$) or $\Theta(\texttt{used}-\texttt{target})$

$$T_{remove}(n) = \begin{cases} 1 & \textbf{if } target = used-1\\ 2 & \textbf{if } target = used-2\\ 3 & \textbf{if } target = used-3\\ ... & ...\\ n-1 & \textbf{if } target = 0 \end{cases}$$

$T_{remove}(n)$ is $O(n)$ and $\Omega(1)$
(these bounds are "tight")

We usually parameterize runtime complexity by the data structure's size, but we can also measure runtime in terms of other parameters (e.g., used and target).

ArrayBuffer.append(elem)


    def append(elem: T): Unit =
    {
      if(used == data.size){ /* 🙁 case */
        /* assume newLength > data.size, but pick it later */
        val newData = Array.copyOf(original = data, newLength = ???)
        /* Array.copyOf doesn't init the new elements, so we have to */
        for(i <- data.size until newData.size){ newData(i) = None }
        /* Swap in the bigger array */
        data = newData
      }
      /* Append element, update metadata */
      data(used) = Some(elem)
      used += 1
    }
  

What is the complexity?

$O(\texttt{data.size})$ (i.e., $O(n)$) ... but ...

ArrayBuffer.append(elem)

$$T_{append}(n) = \begin{cases} n & \textbf{if } \texttt{used} = n \text{ // 🙁 case}\\ 1 & \textbf{otherwise} \text{ // 😃 case} \end{cases}$$

$T_{append}(n)$ is $O(n)$ and $\Omega(1)$
(these bounds are also "tight", so no $\Theta$-bound)

How often do we hit the 🙁 case?

newLength = data.size + 1

For $n$ appends into an empty buffer...

While $\texttt{used} \leq \texttt{INITIAL_SIZE}$: $\sum_{i = 0}^{\texttt{IS}} \Theta(1)$

And after: $\sum_{i = \texttt{IS}+1}^{n} \Theta(i)$

Total for $n$ insertions: $\Theta(n^2)$
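To see why (a worked step, treating $\texttt{IS}$ as a constant): every append past the initial capacity copies the whole array, so the second sum works out to

$$\sum_{i = \texttt{IS}+1}^{n} \Theta(i) = \Theta\left(\frac{n(n+1)}{2} - \frac{\texttt{IS}(\texttt{IS}+1)}{2}\right) = \Theta(n^2)$$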

newLength = data.size + 10

For $n$ appends into an empty buffer...

While $\texttt{used} \leq \texttt{INITIAL_SIZE}$: $\sum_{i = 0}^{\texttt{IS}} \Theta(1)$

And after: $$\sum_{i = \texttt{IS}+1}^{n} \begin{cases} \Theta(i) & \textbf{if } i \equiv \texttt{IS} \pmod{10}\\ \Theta(1) & \textbf{otherwise} \end{cases}$$

newLength = data.size + 10

... or ... $$ \left(\sum_{i = \texttt{IS}+1}^{n} \Theta(1)\right) + \left(\sum_{j = 0}^{\frac{(n - \texttt{IS}+1)}{10}} \Theta((\texttt{IS}+1+j)\cdot 10) \right) $$

Total for $n$ insertions: $\Theta(n^2)$

newLength = data.size × 2

For $n$ appends into an empty buffer...

While $\texttt{used} \leq \texttt{INITIAL_SIZE}$: $\sum_{i = 0}^{\texttt{IS}} \Theta(1)$

And after... $$\sum_{i = IS+1}^{n} \begin{cases} \Theta(i) & \textbf{if } i = \texttt{IS} \cdot 2^k \textbf{ (for any $k \in \mathbb N$)}\\ \Theta(1) & \textbf{otherwise} \end{cases}$$

newLength = data.size × 2

How many times do we resize ("boxes") over $n$ inserts? $\Theta(\log(n))$

How much work for box $j$ (the resize to size $\texttt{IS} \cdot 2^j$, plus the cheap appends before it)?

$\Theta(\texttt{IS} \cdot 2^j) + \sum_{i = 1}^{\texttt{IS} \cdot 2^j}\Theta(1) = \Theta(2^j)$

How much work for $n$ inserts? $$\sum_{j = 0}^{\Theta(\log(n))}\Theta(2^j)$$

Total for $n$ insertions: $\Theta(n)$
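To see why this is linear (a worked step): the geometric series is dominated by its largest term,

$$\sum_{j = 0}^{\log_2(n)}\Theta(2^j) = \Theta\left(2^{\log_2(n)+1}\right) = \Theta(n)$$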

Amortized Runtime

append(elem) is $O(n)$

$n$ calls to append(elem) are $O(n)$

The cost of $n$ calls is guaranteed.
(It would be nice if we had a name for this...)

Amortized Runtime

If $n$ calls to a function take $O(T(n))$...

We say the Amortized Runtime is $O\left(\frac{T(n)}{n}\right)$

e.g., the amortized runtime of append is $O(\frac{n}{n}) = O(1)$

(even though the worst-case runtime is $O(n)$)
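As a sanity check (an illustrative sketch, not code from the lecture), we can count how many element copies each growth rule performs over $n$ appends; the initial capacity of 16 is made up:

    /* Count total elements copied over n appends, for a given growth rule. */
    def copyWork(n: Int, grow: Int => Int): Long =
    {
      var capacity = 16       /* hypothetical INITIAL_SIZE */
      var used = 0
      var copied = 0L
      for(_ <- 1 to n){
        if(used == capacity){
          copied += used      /* a resize copies every element currently stored */
          capacity = grow(capacity)
        }
        used += 1
      }
      copied
    }

    println(copyWork(100000, _ + 1))   // ≈ n²/2 copies (quadratic)
    println(copyWork(100000, _ + 10))  // ≈ n²/20 copies (still quadratic)
    println(copyWork(100000, _ * 2))   // ≤ 2n copies (linear)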

Next time...

  • Linked Lists
  • Iterators
  • Access-by-reference