Swift string characters in Unicode, UTF-16 and UTF-8 representation

A Swift string provides several properties that expose its contents in different Unicode representations. The three main ones are unicodeScalars, utf16 and utf8. unicodeScalars returns a collection of Unicode scalar values, while utf16 and utf8 return collections of code units.

In this post, we will learn how to get the Unicode scalar, UTF-16 and UTF-8 values of a Swift string.

unicodeScalars :

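The unicodeScalars property returns a collection of the string's Unicode scalar values. A Unicode scalar value is a 21-bit number that identifies any Unicode code point except the surrogate code points, i.e. a value in the range U+0000 to U+D7FF or U+E000 to U+10FFFF. A single visible character can be made up of one or more scalars.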

For the below example program :

let givenString = "Hello World 🌎"

// value is the numeric (UInt32) form of each Unicode scalar
print(givenString.unicodeScalars.map { $0.value })

It prints :

[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 32, 127758]
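In the output above, every character happens to map to exactly one scalar. That is not always the case. The following minimal sketch uses a hypothetical string named decomposed (not part of the original example) that spells "é" as the letter "e" followed by the combining acute accent U+0301, to show one character made of two scalars:

```swift
// "e" followed by the combining acute accent U+0301; displayed as a single "é".
// The name `decomposed` is only for illustration.
let decomposed = "e\u{301}"

// count works on characters (grapheme clusters)
print(decomposed.count)                             // 1

// unicodeScalars exposes the two underlying scalars
print(decomposed.unicodeScalars.count)              // 2
print(decomposed.unicodeScalars.map { $0.value })   // [101, 769]
```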

UTF-16 :

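The utf16 property returns a collection of UTF-16 code units. Each code unit is 16 bits wide. A scalar up to U+FFFF takes one code unit, and a scalar above U+FFFF is encoded as two code units called a surrogate pair. This is why the emoji in the example below produces two values.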

let givenString = "Hello World 🌎"

print(Array(givenString.utf16))

Output :

[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 32, 55356, 57102]
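The last two values, 55356 and 57102, are the surrogate pair for the emoji. As a rough sketch of where they come from, the standard UTF-16 arithmetic can be applied by hand to the scalar value 127758 (U+1F30E). The names scalar, offset, high and low are only for illustration:

```swift
// Scalar value of 🌎 (U+1F30E), taken from the unicodeScalars output
let scalar: UInt32 = 127758

// UTF-16 encodes scalars above U+FFFF by subtracting 0x10000
// and splitting the remaining 20 bits into two 10-bit halves
let offset = scalar - 0x10000
let high = UInt16(0xD800 + (offset >> 10))    // top 10 bits
let low = UInt16(0xDC00 + (offset & 0x3FF))   // bottom 10 bits

print([high, low])                            // [55356, 57102]

// Same result as asking the string for its UTF-16 view
print(Array("🌎".utf16) == [high, low])       // true
```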

UTF-8 :

Similar to utf16, the utf8 property returns a collection of UTF-8 code units. Each code unit is 8 bits wide, and one scalar takes between one and four code units. ASCII characters take one code unit each, and the emoji in the example below takes four.

let givenString = "Hello World 🌎"

print(Array(givenString.utf8))

Output :

[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 32, 240, 159, 140, 142]
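To put the three views side by side, the short sketch below prints the length of the same string in each of them, and then rebuilds the string from its UTF-8 bytes using String(decoding:as:). The names sample, bytes and rebuilt are only for illustration:

```swift
let sample = "Hello World 🌎"

print(sample.count)                  // 13 characters
print(sample.unicodeScalars.count)   // 13 Unicode scalars
print(sample.utf16.count)            // 14 UTF-16 code units (emoji takes 2)
print(sample.utf8.count)             // 16 UTF-8 code units (emoji takes 4)

// The code units can be turned back into a string
let bytes = Array(sample.utf8)
let rebuilt = String(decoding: bytes, as: UTF8.self)
print(rebuilt == sample)             // true
```

Which view to use depends on what the values are needed for: utf8 for byte-oriented storage or network data, utf16 when working with APIs that index by UTF-16 code units, and unicodeScalars when the actual code points matter.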