Storing Keys with Associated Values in Hash Maps
The last of our common collections is the hash map. The type HashMap<K, V> stores a mapping of keys of type K to values of type V using a hashing function, which determines how it places these keys and values into memory. Many programming languages support this kind of data structure, but they often use a different name, such as hash, map, object, hash table, dictionary, or associative array, just to name a few.
Hash maps are useful when you want to look up data not by using an index, as you can with vectors, but by using a key that can be of any type. For example, in a game, you could keep track of each team's score in a hash map in which each key is a team's name and the values are each team's score. Given a team name, you can retrieve its score.
We'll go over the basic API of hash maps in this section, but many more goodies are hiding in the functions defined on HashMap<K, V> by the standard library. As always, check the standard library documentation for more information.
Creating a New Hash Map
One way to create an empty hash map is to use new and to add elements with insert. In the following example, we're keeping track of the scores of two teams whose names are Blue and Yellow. The Blue team starts with 10 points, and the Yellow team starts with 50:
import std.collections.HashMap
fn main() {
var scores: HashMap<String, Int> = HashMap.new()
scores.insert("Blue".toString(), 10)
scores.insert("Yellow".toString(), 50)
}
Note that we need to first import the HashMap from the collections portion of the standard library. Of our three common collections, this one is the least often used, so it's not included in the features brought into scope automatically in the prelude. Hash maps also have less support from the standard library; there's no built-in macro to construct them, for example.
Just like vectors, hash maps store their data on the heap. This HashMap has keys of type String and values of type Int. Like vectors, hash maps are homogeneous: All of the keys must have the same type, and all of the values must have the same type.
Rust comparison: Oxide uses dot notation (HashMap.new()) instead of path notation (HashMap::new()), and imports use dot notation (std.collections.HashMap) instead of std::collections::HashMap.
// Rust use std::collections::HashMap; fn main() { let mut scores: HashMap<String, i32> = HashMap::new(); scores.insert(String::from("Blue"), 10); scores.insert(String::from("Yellow"), 50); }
Accessing Values in a Hash Map
We can get a value out of the hash map by providing its key to the get method:
import std.collections.HashMap
fn main() {
var scores: HashMap<String, Int> = HashMap.new()
scores.insert("Blue".toString(), 10)
scores.insert("Yellow".toString(), 50)
let teamName = "Blue".toString()
let score: Int = scores.get(&teamName).copied().unwrapOr(0)
println!("Blue team score: \(score)")
}
Here, score will have the value that's associated with the Blue team, and the result will be 10. The get method returns an Option<&V> (or (&V)? in Oxide terms); if there's no value for that key in the hash map, get will return null. This program handles the Option by calling copied to get an Option<Int> rather than an Option<&Int>, then unwrapOr to set score to zero if scores doesn't have an entry for the key.
We can iterate over each key-value pair in a hash map in a similar manner as we do with vectors, using a for loop:
import std.collections.HashMap
fn main() {
var scores: HashMap<String, Int> = HashMap.new()
scores.insert("Blue".toString(), 10)
scores.insert("Yellow".toString(), 50)
for (key, value) in &scores {
println!("\(key): \(value)")
}
}
This code will print each pair in an arbitrary order:
Yellow: 50
Blue: 10
Rust comparison: Oxide uses camelCase for method names: unwrapOr instead of unwrap_or.
#![allow(unused)] fn main() { // Rust let score = scores.get(&team_name).copied().unwrap_or(0); }
Hash Maps and Ownership
For types that implement the Copy trait, like Int, the values are copied into the hash map. For owned values like String, the values will be moved and the hash map will be the owner of those values:
import std.collections.HashMap
fn main() {
let fieldName = "Favorite color".toString()
let fieldValue = "Blue".toString()
var map: HashMap<String, String> = HashMap.new()
map.insert(fieldName, fieldValue)
// fieldName and fieldValue are invalid at this point!
// println!("\(fieldName)") // Error: value has been moved
}
We aren't able to use the variables fieldName and fieldValue after they've been moved into the hash map with the call to insert.
If we insert references to values into the hash map, the values won't be moved into the hash map. The values that the references point to must be valid for at least as long as the hash map is valid. We'll talk more about these issues in the chapter on lifetimes.
Updating a Hash Map
Although the number of key and value pairs is growable, each unique key can only have one value associated with it at a time (but not vice versa: for example, both the Blue team and the Yellow team could have the value 10 stored in the scores hash map).
When you want to change the data in a hash map, you have to decide how to handle the case when a key already has a value assigned. You could replace the old value with the new value, completely disregarding the old value. You could keep the old value and ignore the new value, only adding the new value if the key doesn't already have a value. Or you could combine the old value and the new value. Let's look at how to do each of these!
Overwriting a Value
If we insert a key and a value into a hash map and then insert that same key with a different value, the value associated with that key will be replaced. Even though the following code calls insert twice, the hash map will only contain one key-value pair because we're inserting the value for the Blue team's key both times:
import std.collections.HashMap
fn main() {
var scores: HashMap<String, Int> = HashMap.new()
scores.insert("Blue".toString(), 10)
scores.insert("Blue".toString(), 25)
println!("\(scores)") // Prints: {"Blue": 25}
}
This code will print {"Blue": 25}. The original value of 10 has been overwritten.
Adding a Key and Value Only If a Key Isn't Present
It's common to check whether a particular key already exists in the hash map with a value and then to take the following actions: If the key does exist in the hash map, the existing value should remain the way it is; if the key doesn't exist, insert it and a value for it.
Hash maps have a special API for this called entry that takes the key you want to check as a parameter. The return value of the entry method is an enum called Entry that represents a value that might or might not exist. Let's say we want to check whether the key for the Yellow team has a value associated with it. If it doesn't, we want to insert the value 50, and the same for the Blue team:
import std.collections.HashMap
fn main() {
var scores: HashMap<String, Int> = HashMap.new()
scores.insert("Blue".toString(), 10)
scores.entry("Yellow".toString()).orInsert(50)
scores.entry("Blue".toString()).orInsert(50)
println!("\(scores)")
}
The orInsert method on Entry is defined to return a mutable reference to the value for the corresponding Entry key if that key exists, and if not, it inserts the parameter as the new value for this key and returns a mutable reference to the new value. This technique is much cleaner than writing the logic ourselves and, in addition, plays more nicely with the borrow checker.
Running this code will print {"Yellow": 50, "Blue": 10}. The first call to entry will insert the key for the Yellow team with the value 50 because the Yellow team doesn't have a value already. The second call to entry will not change the hash map, because the Blue team already has the value 10.
Rust comparison: Oxide uses camelCase: orInsert instead of or_insert.
#![allow(unused)] fn main() { // Rust scores.entry(String::from("Yellow")).or_insert(50); }
Updating a Value Based on the Old Value
Another common use case for hash maps is to look up a key's value and then update it based on the old value. For instance, the following code counts how many times each word appears in some text. We use a hash map with the words as keys and increment the value to keep track of how many times we've seen that word. If it's the first time we've seen a word, we'll first insert the value 0:
import std.collections.HashMap
fn main() {
let text = "hello world wonderful world"
var map: HashMap<String, Int> = HashMap.new()
for word in text.splitWhitespace() {
let count = map.entry(word.toString()).orInsert(0)
*count += 1
}
println!("\(map)")
}
This code will print {"world": 2, "hello": 1, "wonderful": 1}. You might see the same key-value pairs printed in a different order: Recall that iterating over a hash map happens in an arbitrary order.
The splitWhitespace method returns an iterator over subslices, separated by whitespace, of the value in text. The orInsert method returns a mutable reference (&mut V) to the value for the specified key. Here, we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*). The mutable reference goes out of scope at the end of the for loop, so all of these changes are safe and allowed by the borrowing rules.
Rust comparison: Oxide uses splitWhitespace in camelCase instead of split_whitespace.
#![allow(unused)] fn main() { // Rust for word in text.split_whitespace() { let count = map.entry(String::from(word)).or_insert(0); *count += 1; } }
Hashing Functions
By default, HashMap uses a hashing function called SipHash that can provide resistance to denial-of-service (DoS) attacks involving hash tables. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it. If you profile your code and find that the default hash function is too slow for your purposes, you can switch to another function by specifying a different hasher. A hasher is a type that implements the BuildHasher trait. We'll talk about traits and how to implement them in a later chapter. You don't necessarily have to implement your own hasher from scratch; crates.io has libraries shared by other users that provide hashers implementing many common hashing algorithms.
A Complete Example
Let's put together a more complete example that demonstrates working with hash maps. This program tracks player scores in a game:
import std.collections.HashMap
public struct GameScores {
scores: HashMap<String, Int>,
}
extension GameScores {
public static fn new(): GameScores {
GameScores { scores: HashMap.new() }
}
public mutating fn addPlayer(name: String) {
self.scores.entry(name).orInsert(0)
}
public mutating fn addPoints(name: &str, points: Int) {
if let Some(score) = self.scores.getMut(&name.toString()) {
*score += points
}
}
public fn getScore(name: &str): Int {
self.scores.get(&name.toString()).copied().unwrapOr(0)
}
public fn displayScores() {
println!("Current Scores:")
for (name, score) in &self.scores {
println!(" \(name): \(score)")
}
}
}
fn main() {
var game = GameScores.new()
game.addPlayer("Alice".toString())
game.addPlayer("Bob".toString())
game.addPoints("Alice", 10)
game.addPoints("Alice", 5)
game.addPoints("Bob", 20)
game.displayScores()
let aliceScore = game.getScore("Alice")
println!("Alice's final score: \(aliceScore)")
}
This example demonstrates:
- Creating and initializing a
HashMap - Using the
entryAPI for safe insertions - Iterating over key-value pairs
- Getting mutable references to update values
- Working with hash maps inside a struct
Summary
Vectors, strings, and hash maps will provide a large amount of functionality necessary in programs when you need to store, access, and modify data. Here are some exercises you should now be equipped to solve:
-
Given a list of integers, use a vector and return the median (when sorted, the value in the middle position) and mode (the value that occurs most often; a hash map will be helpful here) of the list.
-
Convert strings to pig latin. The first consonant of each word is moved to the end of the word and -ay is added, so first becomes irst-fay. Words that start with a vowel have -hay added to the end instead (apple becomes apple-hay). Keep in mind the details about UTF-8 encoding!
-
Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company; for example, "Add Sally to Engineering" or "Add Amir to Sales." Then, let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.
The standard library API documentation describes methods that vectors, strings, and hash maps have that will be helpful for these exercises!
We're getting into more complex programs in which operations can fail, so it's a perfect time to discuss error handling. We'll do that next!