Strings

In Julia, as in other programming languages, a string is a sequence of zero or more characters and can be created using double quotes.

julia> str = "Hello, world."
"Hello, world."

Strings are immutable and, therefore, cannot be changed after creation. However, it is simple to create a new string from parts of existing strings. Individual characters of a string can be accessed via square brackets and indices (the same syntax as for arrays).

julia> str[1] # returns the first character
'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)

The return type, in this case, is a Char.

julia> typeof(str[1])
Char

A Char value represents a single character. It is just a 32-bit primitive type with a special literal representation and appropriate arithmetic behaviour. Chars can be created using single quotes.

julia> 'x'
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)

It is also possible to convert characters to a numeric value representing a Unicode code point and vice versa.

julia> Int('x')
120

julia> Char(120)
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
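
The arithmetic behaviour mentioned above can be illustrated with a minimal sketch. The variable names are chosen here for illustration only and do not come from the text.

```julia
# Character arithmetic works on the underlying Unicode code points.
next_char = 'a' + 1     # adding an integer to a Char gives a Char
distance  = 'z' - 'a'   # subtracting two Chars gives an integer
is_before = 'a' < 'b'   # Chars are ordered by their code points

println(next_char)      # b
println(distance)       # 25
println(is_before)      # true
```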

Substrings from the existing string can be extracted via square brackets. The indexing syntax is similar to the one for arrays.

julia> str[1:5] # returns the first five characters
"Hello"

julia> str[[1,2,5,6]]
"Heo,"

We used the range 1:5 to access the first five elements of the string (further details on ranges are given in the section on arrays). Be aware that the expressions str[k] and str[k:k] do not give the same results.

julia> str[1] # returns the first character as Char
'H': ASCII/Unicode U+0048 (category Lu: Letter, uppercase)

julia> str[1:1] # returns the first character as String
"H"

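Two details of indexing are worth a quick sketch. The keyword end stands for the last index, and string indices count bytes of the UTF-8 encoding rather than characters, so they coincide with character positions only for ASCII text. The strings greeting and greek below are assumptions made for this example.

```julia
greeting = "Hello, world."
last_char = greeting[end]      # '.', the last character
tail = greeting[8:end]         # "world.", from index 8 to the end

# A non-ASCII string: each of these Greek letters takes two bytes in UTF-8.
greek = "αβγ"
n_chars = length(greek)              # 3 characters
n_bytes = ncodeunits(greek)          # 6 bytes
valid = collect(eachindex(greek))    # [1, 3, 5], the valid indices
prefix = first(greek, 2)             # "αβ", counts characters, not bytes
```

Here greek[2] would throw an error, because index 2 points into the middle of a character. Functions such as first, last and eachindex avoid this problem.
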
When using strings, we have to pay attention to the following characters with special meaning: \, " and $. To use them as regular characters, they need to be escaped with a backslash (\). For example, an unescaped double quote (") would end the string prematurely, forcing the rest to be interpreted as Julia code. When strings are built from untrusted input, this kind of mistake is the basis of an attack known as code injection.

julia> str1 = "This is how a string is created: \"string\"."
"This is how a string is created: \"string\"."

Similarly, the dollar sign is reserved for string interpolation (it will be explained soon). If we want to use it as a character, we have to use a backslash too.

julia> str2 = "\$\$\$ dollars everywhere \$\$\$"
"\$\$\$ dollars everywhere \$\$\$"

julia> "The $ will be fine."
ERROR: syntax: invalid interpolation syntax: "$ "

No, it won't: used this way, the dollar sign makes Julia throw an error. Strings can be printed with the print function or with the println function, which also adds a new line at the end of the output.

julia> println(str1)
This is how a string is created: "string".

julia> println(str2)
$$$ dollars everywhere $$$
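
The difference between the two printing functions can be checked without looking at the terminal. The sketch below uses sprint, which calls a printing function and returns what it would have written as a string. The variable names are illustrative.

```julia
# sprint captures the output of a printing function as a String.
without_newline = sprint(print, "Hello")     # "Hello"
with_newline    = sprint(println, "Hello")   # "Hello\n"

# Both functions accept several arguments and print them one after another.
combined = sprint(print, "a = ", 1, '!')     # "a = 1!"
```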

There is one exception to using quotes inside a string: quotes without backslashes can be used in multi-line strings. Multi-line strings can be created using triple quotes syntax as follows:

julia> mstr = """
       This is how a string is created: "string".
       """
"This is how a string is created: \"string\".\n"

julia> print(mstr)
This is how a string is created: "string".

This syntax is usually used for docstrings of functions. A multi-line string keeps its form when printed in the REPL; the indentation common to all lines, including the line with the closing quotes, is removed.

julia> str = """
             Hello,
             world.
           """
"  Hello,\n  world.\n"

julia> print(str)
  Hello,
  world.
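
The handling of indentation in triple-quoted strings can be sketched as follows. The names aligned and shifted are illustrative; the behaviour shown is that the indentation shared by all lines, including the line with the closing quotes, is stripped.

```julia
# Closing quotes at the same level as the least indented line:
# the common four spaces are removed.
aligned = """
    first
      second
    """

# Closing quotes indented less than the text:
# only four spaces are removed, so two spaces remain on each line.
shifted = """
      first
      second
    """

print(aligned)
print(shifted)
```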

Exercise:

Create a string with the following text

Quotation is the repetition or copy of someone else's statement or thoughts.
Quotation marks are punctuation marks used in text to indicate a quotation.
Both of these words are sometimes abbreviated as "quote(s)".

and print it into the REPL. The printed string should look the same as the text above, i.e., each sentence should be on a separate line. Use an indent of length 4 for each sentence.

Solution:

There are two basic ways to get the right result. The first is to use a multi-line string and write the message in the correct form.

julia> str = """
           Quotation is the repetition or copy of someone else's statement or thoughts.
           Quotation marks are punctuation marks used in text to indicate a quotation.
           Both of these words are sometimes abbreviated as "quote(s)".
       """;

julia> println(str)
    Quotation is the repetition or copy of someone else's statement or thoughts.
    Quotation marks are punctuation marks used in text to indicate a quotation.
    Both of these words are sometimes abbreviated as "quote(s)".

We do not have to add backslashes to escape quotation marks in the text. The second way is to use a regular string and the new line symbol \n. In this case, it is necessary to use backslashes to escape quotation marks. Also, we have to add four spaces before each sentence to get a proper indentation.

julia> str = "    Quotation is the repetition or copy of someone else's statement or thoughts.\n    Quotation marks are punctuation marks used in text to indicate a quotation.\n    Both of these words are sometimes abbreviated as \"quote(s)\".";

julia> println(str)
    Quotation is the repetition or copy of someone else's statement or thoughts.
    Quotation marks are punctuation marks used in text to indicate a quotation.
    Both of these words are sometimes abbreviated as "quote(s)".

String concatenation and interpolation

One of the most common operations on strings is their concatenation. It can be done using the string function that accepts any number of input arguments and converts them to a single string.

julia> string("Hello,", " world")
"Hello, world"

Note that it is possible to concatenate strings with numbers and other types that can be converted to strings.

julia> a = 1.123
1.123

julia> string("The variable a is of type ", typeof(a), " and its value is ", a)
"The variable a is of type Float64 and its value is 1.123"

In general, it is not possible to perform mathematical operations on strings, even if the strings look like numbers. However, there are two exceptions. The * operator performs string concatenation.

julia> "Hello," * " world"
"Hello, world"

Unlike the string function, which works for other types, this approach can only be applied to strings and characters. The second exception is the ^ operator, which performs repetition.

julia> "Hello"^3
"HelloHelloHello"

The example above is equivalent to calling the repeat function.

julia> repeat("Hello", 3)
"HelloHelloHello"

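As a brief sketch of what the * operator accepts: characters can be appended directly, while other values such as numbers have to be converted with string first. The variable names below are illustrative.

```julia
with_char = "Hello" * '!'             # a Char can be concatenated as well
with_num  = "Total: " * string(42)    # a number must be converted first

# Multiplying a string by a number directly is not defined and throws an error.
failed = try
    "Total: " * 42
    false
catch err
    err isa MethodError
end
```
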
Using the string function to concatenate strings can be cumbersome due to long expressions. To simplify string construction, Julia allows interpolation into string literals with the $ symbol.

julia> a = 1.123
1.123

julia> string("The variable a is of type ", typeof(a), " and its value is ", a)
"The variable a is of type Float64 and its value is 1.123"

julia> "The variable a is of type $(typeof(a)), and its value is $(a)"
"The variable a is of type Float64, and its value is 1.123"

We use parentheses to delimit expressions that should be interpolated into a string. They are not mandatory for a plain variable name, but they can prevent mistakes. In the example below, we can see different results with and without parentheses.

julia> "$typeof(a)"
"typeof(a)"

julia> "$(typeof(a))"
"Float64"

In the case without parentheses, only the function name is interpolated into the string. In the second case, the expression typeof(a) is interpolated into the string literal. It is more apparent when we declare a variable myfunc that refers to the typeof function:

julia> myfunc = typeof
typeof (built-in function)

julia> "$myfunc(a)"
"typeof(a)"

julia> "$(myfunc(a))"
"Float64"

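Parentheses also matter when the interpolated value is an arithmetic expression or is followed directly by other text. A minimal sketch, with price and unit as made-up variables:

```julia
price = 3
unit = "kg"

total  = "Total: $(2 * price)"         # the expression is evaluated, then inserted
label  = "Weight: $(price) $(unit)"    # several values in one literal
plural = "$(unit)s"                    # parentheses keep the trailing s out of the name

println(total)    # Total: 6
println(label)    # Weight: 3 kg
println(plural)   # kgs
```

Without the parentheses in the last line, Julia would look for a variable named units instead of unit.
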
Both concatenation and string interpolation call string to convert objects into string form. Most non-AbstractString objects are converted to strings closely corresponding to how they are entered as literal expressions.

julia> v = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> "vector: $v"
"vector: [1, 2, 3]"

julia> t = (1,2,3)
(1, 2, 3)

julia> "tuple: $(t)"
"tuple: (1, 2, 3)"

Exercise:

Print the following message for a given vector

"<vec> is a vector of length <len> with elements of type <type>"

where <vec> is the string representation of the given vector, <len> is the actual length of the given vector, and <type> is the type of its elements. Use the following two vectors.

a = [1,2,3]
b = [:a, :b, :c, :d]

Hint: use the length and eltype functions.

Solution:

We will show two ways to solve this exercise. The first way is to use the string function in combination with the length function to get the length of the vector, and the eltype function to get the type of its elements.

julia> a = [1,2,3];

julia> str = string(a, " is a vector of length ", length(a), " with elements of type ", eltype(a));

julia> println(str)
[1, 2, 3] is a vector of length 3 with elements of type Int64

The second way is to use string interpolation.

julia> b = [:a, :b, :c, :d];

julia> str = "$(b) is a vector of length $(length(b)) with elements of type $(eltype(b))";

julia> println(str)
[:a, :b, :c, :d] is a vector of length 4 with elements of type Symbol
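
The interpolated message can also be wrapped in a small helper so that it works for any vector. The function name describe is hypothetical and is not part of Julia.

```julia
# Hypothetical helper: builds the message from the exercise for any vector.
describe(v) = "$(v) is a vector of length $(length(v)) with elements of type $(eltype(v))"

println(describe([1, 2, 3]))
println(describe([:a, :b, :c, :d]))
```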

Useful functions

The join function concatenates the elements of a collection into a single string. It also supports a custom separator and a different separator before the last element.

julia> join(["apples", "bananas", "pineapples"], ", ", " and ")
"apples, bananas and pineapples"

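A few more calls may help to see how the arguments of join interact. This is a small sketch with illustrative variable names; the elements do not have to be strings, since each one is converted when it is printed.

```julia
plain   = join(["a", "b", "c"])                        # no separator: "abc"
dashed  = join(1:3, "-")                               # numbers are converted: "1-2-3"
two     = join(["apples", "bananas"], ", ", " and ")   # only the last separator is used

println(plain)
println(dashed)
println(two)      # apples and bananas
```
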
In many cases, it is necessary to split a given string according to some conditions. In such cases, the split function can be used.

julia> str = "JuliaLang is a pretty cool language!"
"JuliaLang is a pretty cool language!"

julia> split(str)
6-element Vector{SubString{String}}:
 "JuliaLang"
 "is"
 "a"
 "pretty"
 "cool"
 "language!"

By default, the function splits the given string based on whitespace characters. This can be changed by defining a delimiter.

julia> split(str, " a ")
2-element Vector{SubString{String}}:
 "JuliaLang is"
 "pretty cool language!"

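The split function also has the keyword arguments limit and keepempty. The sketch below uses a made-up comma-separated string to show their effect, and converts the resulting SubString values to regular strings with String.

```julia
fields   = split("a,b,,c", ",")                     # ["a", "b", "", "c"]
nonempty = split("a,b,,c", ","; keepempty=false)    # ["a", "b", "c"]
limited  = split("a,b,c", ","; limit=2)             # ["a", "b,c"]

# split returns SubString values that share memory with the original string;
# broadcasting String over them creates independent copies.
owned = String.(nonempty)
```
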
Julia also provides multiple functions that can be used to find specific characters or substrings in a given string. The contains function checks whether a string contains a specific substring or character. Similarly, the occursin function determines whether the specified string or character occurs in the given string. These two functions differ only in the order of arguments.

julia> contains("JuliaLang is pretty cool!", "Julia")
true

julia> occursin("Julia", "JuliaLang is pretty cool!")
true

Another useful function is endswith, which checks if the given string ends with the given substring or character. It can be used, for example, to check that a file name has the expected extension.

julia> endswith("figure.png", "png")
true
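
When checking file extensions, it may be safer to include the dot in the suffix, since a name without any extension can still end with the same letters. The sketch below also shows the counterpart startswith and the splitext function. The file names are made up for this example.

```julia
filename = "figure.png"

has_png   = endswith(filename, ".png")    # true
has_fig   = startswith(filename, "fig")   # true
loose     = endswith("notapng", "png")    # true, although there is no extension
strict    = endswith("notapng", ".png")   # false

# splitext separates the extension from the rest of the name.
name, ext = splitext(filename)            # ("figure", ".png")
```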

Sometimes, it is necessary to find indices of characters in the string based on some conditions. For such cases, Julia provides several find functions.

julia> str = "JuliaLang is a pretty cool language!"
"JuliaLang is a pretty cool language!"

julia> findall(isequal('a'), str)
5-element Vector{Int64}:
  5
  7
 14
 29
 33

julia> findfirst(isequal('a'), str)
5

julia> findlast(isequal('a'), str)
33

The first argument isequal('a') creates a function that checks if its argument equals the character a.
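
The find functions also accept a substring as a pattern, in which case they return a range of indices, and they return nothing when there is no match. A minimal sketch, using the same sentence under the illustrative name sentence:

```julia
sentence = "JuliaLang is a pretty cool language!"

word_range = findfirst("Lang", sentence)            # 6:9, the range covering "Lang"
word       = sentence[word_range]                   # "Lang"
next_a     = findnext(isequal('a'), sentence, 6)    # 7, the first 'a' at or after index 6
no_match   = findfirst(isequal('z'), sentence)      # nothing, there is no 'z'
```

Because a failed search returns nothing, the result should be checked, for example with isnothing, before it is used as an index.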

As we said before, strings are immutable and cannot be changed. However, we can easily create new strings. The replace function returns a new string in which every occurrence of a given substring is replaced with something else:

julia> replace("Sherlock Holmes", "e" => "ee")
"Sheerlock Holmees"

It is also possible to apply a function to every matched substring using the replace function. The following example shows how to change all e letters in the given string to uppercase.

julia> replace("Sherlock Holmes", "e" => uppercase)
"ShErlock HolmEs"

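The number of replacements can be limited with the count keyword, and the pattern can be a character as well as a string. In this sketch the variable names are illustrative, and the last line confirms that the original string is left untouched.

```julia
original = "Sherlock Holmes"

first_only = replace(original, "e" => "ee"; count=1)   # only the first match is replaced
by_char    = replace(original, 'e' => 'E')             # a Char pattern and a Char replacement

println(first_only)   # Sheerlock Holmes
println(by_char)      # ShErlock HolmEs
println(original)     # Sherlock Holmes
```
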
Longer substrings can be replaced in the same way:

julia> replace("Sherlock Holmes", "Holmes" => "Homeless")
"Sherlock Homeless"

Exercise:

Use the split function to split the following string

"Julia!"

into a vector of single-character strings.

Hint: we can say that an empty string "" separates the characters in the string.

Solution:

To split a string into single-character strings, we can use the split function with an empty string ("") as a delimiter.

julia> split("Julia!", "")
6-element Vector{SubString{String}}:
 "J"
 "u"
 "l"
 "i"
 "a"
 "!"