Mastering Swift's Development Blog

Follow Us On Twitter
  • Jon Hoffman

Implementing Copy-On-Write

Updated: Jul 14


Apple says that we should prefer value types (structs) over reference types (classes). While there are number advantages with using value types there are also performance considerations when using them for large data structures. As you are probably aware, when you pass an instance of a value type you are actually passing a copy of the instance and not the instance itself. There can be quite a bit of performance overhead in this if we are passing large data structures within our code. For example, if we had a data structure that contains one million elements, then every time that structure is passed within our code, all one million elements must be copied to the new data structure.


In order to get around this with Swifts built in collection types (Arrays, Sets and Dictionaries), Apple implements a feature known as Copy-On-Write, or COW for short. COW lets Swift avoid making a new copy of the data structure, when it is passed within our code, until a change is actually made to the structure.


COW is a very powerful feature that can greatly increase the performance of our code however we do not get this feature by default with our custom value types, we need to implement it ourselves. This article will show you how to implement COW with your custom value types.


In order to show you how to implement COW, we will build a custom queue type that implements COW. We will start off by creating a backend storage queue named BackendQueue that our main queue will use as its storage backend. This type will be implemented as a reference (Class) type. Here is our code for the BackendQueue type:



fileprivate class BackendQueue<T> {
    private var items = [T]()
    
    public init() {}
    private init(_ items: [T]) {
        self.items = items
    }
    
    public func enqueue(item: T) {
        items.append(item)
    }
    
    public func dequeue() -> T? {
        if items.count > 0 {
            return items.remove(at: 0)
        } else {
            return nil
        }
    }
    
    public func count() -> Int {
        return items.count
    }
    
    public func copy() -> BackendQueue<T> {
        return BackendQueue<T>(items)
    }
    
}


This is a very basic queue type but there are a couple things to take notice of here:

  • We used Generics to ensure our data structure can store instances of any data type. If you are unsure how or why we use Generics, please see my post on Generics here

  • We used the access control modifier of FilePrivate for the class itself. This restricts access to this type to only code defined within the same source file. We will implement our main Queue type in the same source file so we can use this type as our backend data store while the rest of our code will be unable to use it.

  • We created a private initializer that can be used within an instance of this type to create a copy of itself.

  • We create a copy() function which uses the private initializer to create a copy of itself.


Now that we have our backend storage type, let’s look at how we would use it to create a Queue which implements COW. We will start off by creating a basic queue type that uses our BackendQueue type as it’s storage type. We will implement this with a value type (struct) and also uses Generics like we did in the BackendQueue type.



public struct Queue<T> {
    private var internalQueue = BackendQueue<T>()

    public mutating func enqueue(item: T) {
        internalQueue.enqueue(item: item)
    }
    public mutating func dequeue() -> T? {
        return internalQueue.dequeue()
    }
    public func count() -> Int {
        return internalQueue.count()
    }  
}


This is a very basic queue that simply uses the enqueue(), dequeue() and count() methods of the BackendQueue type to perform its functionality. Before we start to implement Copy-On-Write with our queue type we need to look at a global function that Swift provides. This function is named isKnownUniquelyReferenced(_ object: inout T?) -> Bool which will return true if the object has only a single strong reference otherwise it will return false. We will use this function anytime we make a change to the internal data structure to see if there are multiple references to the BackendQueue instance and if so, we will make a copy prior to changing the data structure. Let’s start off by creating a method that will use the isKnownUniquelyReferenced() function to see if there are multiple references and if so, make a copy of the BackendQueue instance. We will name the method checkUniquelyReferencedQueue() and make it private so it can only be accessed internally.



mutating private func checkUniquelyReferencedInternalQueue() {
        if !isKnownUniquelyReferenced(&internalQueue) {
            internalQueue = internalQueue.copy()
            print("Making a copy of internalQueue")
        } else {
            print("Not making a copy of internalQueue")
        }
    }


In this method, if it is determined that there are multiple references to the BackendQueue instance, then we make a new copy to use internally. We also put in some debug statements so we can see what is going on when we test the functionality.


Now we will need to call this function from both our enqueue() and dequeue() methods prior to making a change to the internal instance of the BackendQueue data structure. Lets change these two methods to look like this:



    public mutating func enqueue(item: T) {
        checkUniquelyReferencedInternalQueue()
        internalQueue.enqueue(item: item)
    }
    public mutating func dequeue() -> T? {
        checkUniquelyReferencedInternalQueue()
        return internalQueue.dequeue()
    }


It is very important that we call the checkUniquelyReferencedQueue() method prior to making changes to the internal data structure. Also note that we did not change the count() method because it does not make any change to the internal data structure therefore there is no reason make a copy.


Finally let’s add a method that we can use for testing purposes to see if we have a unique reference to the internal data structure. You would not need this method in any production code but in order to see what is going on we may want to use this to test our queue. We will name this method uniquelyReferenced() and it will return a Bool type signifying if we have a unique instance or not.



    mutating public func uniquelyReferenced() -> Bool {
        return isKnownUniquelyReferenced(&internalQueue)
    }


Now let’s see how this works using the following code:



var queue1 = Queue<String>()
print("Adding 'One'")
queue1.enqueue(item: "One")

var queue2 = queue1

print("-------------")
print("Adding 'Two'")
queue2.enqueue(item: "Two")

print("-------------")
print("Adding 'Three'")
queue2.enqueue(item: "Three")

print("-------------")
print("Dequeue called")
if let value = queue2.dequeue() {
    print(value)
}


Before we run this, lets walk through what is going on. We start off by creating an instance of the Queue type named queue1. We then add the string “One” to the queue. At this point we have one instance of the queue type and one instance of the backend data structure.


Next we make a copy of the queue1 Queue instance and name that copy queue2. At this point we have two separate instances of the Queue type named queue1 and queue2 and both instances point to the same instance of the data structure therefore they are effectively sharing the data.


Now we add the string “Two” to the queue2 however before we add that string to the data (change the data), we call the checkUniquelyReferencedQueue() method from the enqueue() method. This method will check to see if the data structure has multiple references pointing to it, which it does therefore prior to adding the String “Two” it makes a unique copy for itself for queue2 to use. At this point we have two instances of the queue type and each instance references its own unique instance of the backend data structure.


When we add the “Three” string and also dequeue the first value from the queue, we do not make new copies of the data store because this instance of the Queue type already has a unique instance. Now try to run the code and see what the output is telling you.


There are a number of ways that we could have structured this example differently like having a nested class within our structure however my goal for this post was to present the concepts behind implementing copy on write to make it as easy to understand as possible. I hope I achieved this.


If you are creating your own custom data structures, and suspect that you may have large data sets, I would recommend implementing copy-on-write to avoid making unnecessary copies of the data structure. This will improve the performance of your code.

masteringSwift.jpeg

Mastering Swift 5.3

The sixth edition of this bestselling book, updated to cover through version 5.3 of the Swift programming language

Amazon

Packt

pop.jpeg

Protocol Oriented Programming

Embrace the Protocol-Oriented design paradigm, for better code maintainability and increased performance, with Swift.

Amazon

Packt