Key-Value Pairs and Unique Sets: Dictionaries & HashSets

Welcome back, Pathfinder! You have a solid grasp of how to manage ordered sequences of data using arrays and List<T>. These are fantastic tools, but they have a limitation: to find an item, you either need to know its index or loop through the collection to search for it.

What if you have thousands of items, and you need to retrieve a specific one instantly? Imagine having to flip through every page of a dictionary to find a word, instead of just looking it up directly by the word itself. That's the problem we're solving today.

Let's explore two powerful collection types, Dictionary<TKey, TValue> and HashSet<T>, that are optimized for incredibly fast data lookups and managing unique items.

[Image: Tommy and Gina looking at five small, cute robots lined up on a desk in ascending order, each showing a number on its display]

The Dictionary – Your Data's Index

A Dictionary<TKey, TValue> is one of the most useful collection types in C#. It stores a collection of key-value pairs. The magic is in the key: each key in the dictionary must be unique, and it serves as a direct lookup index to its corresponding value.

The analogy to a real-world dictionary fits well:

  • The Key (TKey): The word you look up (e.g., "Automation"). It must be unique.
  • The Value (TValue): The definition you find (e.g., "The technique of making a process or a system operate automatically.").

Because of how dictionaries are structured internally (using a hash table, which you don't need to worry about for now), retrieving a value by its key takes roughly the same short time whether the collection holds ten items or millions. This makes them ideal for any scenario where you need quick access to data based on a unique identifier.

Dictionary<TKey, TValue>: A collection that represents a set of key-value pairs. It is optimized for retrieving a value when you know its corresponding key.
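To make the difference concrete, here is a minimal sketch that finds the same piece of data twice: once by scanning a List<T>, and once by key in a dictionary. The error codes and descriptions are made-up sample data, not part of any real system.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data: error codes and their descriptions.
var errorList = new List<(string Code, string Description)>
{
    ("E100", "Timeout"),
    ("E200", "Element not found"),
    ("E300", "Assertion failed")
};

// List approach: check the items one by one until the code matches.
string? fromList = null;
foreach (var entry in errorList)
{
    if (entry.Code == "E300")
    {
        fromList = entry.Description;
        break;
    }
}

// Dictionary approach: the key leads straight to the value, no scan needed.
var errorLookup = new Dictionary<string, string>
{
    { "E100", "Timeout" },
    { "E200", "Element not found" },
    { "E300", "Assertion failed" }
};
string fromDictionary = errorLookup["E300"];

Console.WriteLine(fromList);       // Assertion failed
Console.WriteLine(fromDictionary); // Assertion failed
```

With three items the two approaches feel identical. The gap shows up as the collection grows, because the loop has more items to check while the keyed lookup does not.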

Working with Dictionaries

Let's explore how to declare, initialize, and interact with Dictionary<TKey, TValue>, ensuring your data operations are both performant and intuitive.

Declaration and Initialization

You declare a dictionary by specifying the data types for both its key and its value. A common use case is storing application configuration, where the key is a string and the value is also a string.

using System.Collections.Generic;
 
// Create an empty dictionary to store environment URLs
var environmentUrls = new Dictionary<string, string>();

You can also initialize a dictionary with values using collection initializer syntax:

var testUsers = new Dictionary<string, string>
{
    { "admin_user", "password123" },
    { "standard_user", "password456" }
};  
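C# also offers a second initializer style that uses square brackets. The sketch below assumes a hypothetical dictionary of timeout settings. The practical difference is that the { key, value } form calls Add() for each entry, so a repeated key throws at runtime, while the [key] = value form assigns through the indexer, so a repeated key simply overwrites the earlier value.

```csharp
using System;
using System.Collections.Generic;

// Index initializer syntax: each line assigns through the indexer.
// The setting names and values are hypothetical.
var timeoutsInSeconds = new Dictionary<string, int>
{
    ["page_load"] = 30,
    ["element_wait"] = 10,
    ["page_load"] = 45 // Same key again: the later value replaces the earlier one
};

Console.WriteLine(timeoutsInSeconds.Count);        // 2
Console.WriteLine(timeoutsInSeconds["page_load"]); // 45
```

Repeating a key like this is shown only to illustrate the behavior. In real test data a repeated key is usually a mistake, which is one reason the stricter { key, value } form can be the safer default.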

Adding and Accessing Items

There are two primary ways to add or update items:

  • .Add(key, value) method: This adds a new key-value pair. It throws an ArgumentException if the key already exists.
    environmentUrls.Add("QA", "https://qa.mytestapp.com");
  • Indexer [key]: Using square brackets is more flexible. If the key exists, it updates the value. If the key does not exist, it creates a new key-value pair.
    environmentUrls["Staging"] = "https://staging.mytestapp.com"; // Adds a new entry
    environmentUrls["QA"] = "https://qa-new.mytestapp.com"; // Updates the existing entry
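Here is a small sketch of what happens when a key is added twice. It also shows TryAdd(), which reports the outcome through its return value instead of throwing. This assumes a runtime where TryAdd() is available (.NET Core 2.0 or later), and the "qa-other" URL is a made-up value for illustration.

```csharp
using System;
using System.Collections.Generic;

var environmentUrls = new Dictionary<string, string>();
environmentUrls.Add("QA", "https://qa.mytestapp.com");

// Add() with a key that already exists throws ArgumentException.
bool duplicateRejected = false;
try
{
    environmentUrls.Add("QA", "https://qa-other.mytestapp.com");
}
catch (ArgumentException)
{
    duplicateRejected = true;
}

// TryAdd() returns false for an existing key and leaves the old value untouched.
bool addedStaging = environmentUrls.TryAdd("Staging", "https://staging.mytestapp.com");
bool addedQaAgain = environmentUrls.TryAdd("QA", "https://qa-other.mytestapp.com");

Console.WriteLine(duplicateRejected);     // True
Console.WriteLine(addedStaging);          // True
Console.WriteLine(addedQaAgain);          // False
Console.WriteLine(environmentUrls["QA"]); // https://qa.mytestapp.com
```

A rough rule of thumb: use Add() when a duplicate key means something went wrong and you want to hear about it, the indexer when overwriting is intended, and TryAdd() when you want to keep the first value and carry on.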

To access a value, use the indexer with the key. Be careful: if you try to read a key that doesn't exist, the indexer throws a KeyNotFoundException!

string qaUrl = environmentUrls["QA"]; // Retrieves the value
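To see the failure case for yourself, the following sketch reads a key that was never added and catches the resulting exception. The dictionary contents are the same sample values used above, and the message text is just for illustration.

```csharp
using System;
using System.Collections.Generic;

var environmentUrls = new Dictionary<string, string>
{
    { "QA", "https://qa.mytestapp.com" }
};

string message;
try
{
    // "Production" was never added, so the indexer throws here.
    string prodUrl = environmentUrls["Production"];
    message = $"Found: {prodUrl}";
}
catch (KeyNotFoundException)
{
    message = "No entry for 'Production'.";
}

Console.WriteLine(message); // No entry for 'Production'.
```

Catching the exception works, but it is a clumsy way to handle a situation you can check for up front.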

Safely Checking and Retrieving Data

To avoid exceptions, it's best practice to check if a key exists before trying to access it.

  • .ContainsKey(key): This method returns true or false, and is the simplest way to check for a key's existence.
    if (testUsers.ContainsKey("guest_user"))
    {
        // ... do something
    }           
  • .TryGetValue(key, out value): This is usually the preferred way to get a value, because it checks for the key and retrieves the value in a single lookup. If it finds the key, it puts the value into the out variable and returns true. If it doesn't, it returns false and doesn't throw an exception.
    if (environmentUrls.TryGetValue("Production", out string? prodUrl))
    {
        Console.WriteLine($"Production URL found: {prodUrl}");
    }
    else
    {
        Console.WriteLine("Production URL not configured.");
    }           
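One common way to use TryGetValue() is to wrap it in a small helper that falls back to a default when the key is missing. The sketch below is one possible shape for that. GetUrlOrFallback is a hypothetical helper name, and "http://localhost" is an assumed fallback value.

```csharp
using System;
using System.Collections.Generic;

var environmentUrls = new Dictionary<string, string>
{
    { "QA", "https://qa.mytestapp.com" }
};

// Hypothetical helper: returns the configured URL, or the fallback when the key is missing.
static string GetUrlOrFallback(Dictionary<string, string> urls, string environment, string fallback)
{
    // A single lookup both checks for the key and fetches the value.
    return urls.TryGetValue(environment, out string? url) ? url : fallback;
}

string qaUrl = GetUrlOrFallback(environmentUrls, "QA", "http://localhost");
string prodUrl = GetUrlOrFallback(environmentUrls, "Production", "http://localhost");

Console.WriteLine(qaUrl);   // https://qa.mytestapp.com
Console.WriteLine(prodUrl); // http://localhost
```

Compare this with calling ContainsKey() and then the indexer: that pattern is just as safe, but it searches for the key twice.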

Removing Items and Iterating

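You can remove an item with .Remove(key), which returns true if the key was found and removed, and false otherwise. To iterate over a dictionary, use a foreach loop; each element is a KeyValuePair<TKey, TValue> that exposes .Key and .Value.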

foreach (KeyValuePair<string, string> user in testUsers)
{
    Console.WriteLine($"Username: {user.Key}, Password: {user.Value}");
}   
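Here is a short sketch of removal and a few other ways to iterate. The "guest_user" entry is an extra made-up account, and the deconstruction syntax in the first loop assumes .NET Core 2.0 or later. Since a dictionary does not promise any particular enumeration order, the loops deliberately avoid depending on one.

```csharp
using System;
using System.Collections.Generic;

var testUsers = new Dictionary<string, string>
{
    { "admin_user", "password123" },
    { "standard_user", "password456" },
    { "guest_user", "password789" } // hypothetical extra entry
};

// Remove() returns true when the key was present, false when it was not.
bool removedGuest = testUsers.Remove("guest_user"); // true
bool removedAgain = testUsers.Remove("guest_user"); // false, and no exception

// Deconstruct each pair directly into two named variables.
foreach (var (username, password) in testUsers)
{
    Console.WriteLine($"{username} has a {password.Length}-character password");
}

// Iterate over only the keys (use .Values for only the values).
foreach (string name in testUsers.Keys)
{
    Console.WriteLine($"Known user: {name}");
}

Console.WriteLine(testUsers.Count); // 2
```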

Dictionaries are the perfect tool whenever you need to associate one piece of data with another.

The HashSet – A Bag of Unique Items

What if you just need to store a collection of items, but you want to guarantee that there are no duplicates? And what if your most common operation is to ask, "Is this item in my collection?" For these scenarios, a List<T> can be inefficient, as checking for existence requires scanning the list. The perfect tool for this job is the HashSet<T>.

A HashSet<T> is an unordered collection that contains no duplicate elements. Its main strength is a Contains() method that stays fast no matter how many items the set holds.

For instance, managing processed orders efficiently is crucial to avoid duplicates and quickly verify which orders have already been handled. A HashSet<T> is ideal for this scenario because it guarantees unique entries and provides instant lookups, making order management seamless.

using System.Collections.Generic;
 
var processedOrderIds = new HashSet<string>();
 
// Adding items
processedOrderIds.Add("Order-A123");
processedOrderIds.Add("Order-B456");
 
// Trying to add a duplicate - this will do nothing, and Add() returns false.
processedOrderIds.Add("Order-A123");
 
Console.WriteLine(processedOrderIds.Count); // Output: 2
 
// The superpower: incredibly fast checking for existence
if (processedOrderIds.Contains("Order-B456"))
{
    Console.WriteLine("Order-B456 has already been processed.");
}

HashSet<T>: A collection that contains a set of unique values. It is optimized for high-performance set operations, especially checking for item existence.

Because it's optimized for this "contains" check and uniqueness, a HashSet<T> does not maintain the order of its elements.
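Since Add() returns false when the item is already in the set, you can use that return value to spot duplicates as they arrive. The following is a minimal sketch with a made-up batch of incoming order IDs.

```csharp
using System;
using System.Collections.Generic;

var processedOrderIds = new HashSet<string>();
var duplicateOrderIds = new List<string>();

// Hypothetical incoming batch that contains one repeated order ID.
string[] incomingOrderIds = { "Order-A123", "Order-B456", "Order-A123", "Order-C789" };

foreach (string orderId in incomingOrderIds)
{
    // Add() returns false when the ID is already in the set.
    if (!processedOrderIds.Add(orderId))
    {
        duplicateOrderIds.Add(orderId);
    }
}

Console.WriteLine(processedOrderIds.Count);              // 3
Console.WriteLine(string.Join(", ", duplicateOrderIds)); // Order-A123
```

This avoids a separate Contains() call before each Add(), because the single Add() call already tells you whether the item was new.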

Set Operations with HashSet

Because HashSet<T> represents a mathematical set, it provides high-performance methods for standard set operations, like finding the union or intersection of two collections. This can be very handy for comparing groups of data.

var smokeTestBrowsers = new HashSet<string> { "Chrome", "Firefox" };
var fullRegressionBrowsers = new HashSet<string> { "Chrome", "Firefox", "Edge", "Safari" };
 
// IntersectWith: Modifies a set to contain only elements that are in both collections.
var commonBrowsers = new HashSet<string>(fullRegressionBrowsers);
commonBrowsers.IntersectWith(smokeTestBrowsers);
// commonBrowsers now contains {"Chrome", "Firefox"}
 
// ExceptWith: Modifies a set to remove all elements that are in another collection.
var nonSmokeBrowsers = new HashSet<string>(fullRegressionBrowsers);
nonSmokeBrowsers.ExceptWith(smokeTestBrowsers);
// nonSmokeBrowsers now contains {"Edge", "Safari"}

Other methods like UnionWith (combine all unique items) and IsSubsetOf are also available for more complex data comparisons.
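As a quick sketch of those two methods, the example below reuses the browser sets from above and adds a hypothetical mobileBrowsers set for illustration. Like IntersectWith and ExceptWith, UnionWith modifies the set it is called on, so the example copies the original first.

```csharp
using System;
using System.Collections.Generic;

var smokeTestBrowsers = new HashSet<string> { "Chrome", "Firefox" };
var fullRegressionBrowsers = new HashSet<string> { "Chrome", "Firefox", "Edge", "Safari" };
var mobileBrowsers = new HashSet<string> { "Safari", "Samsung Internet" }; // hypothetical

// IsSubsetOf: true when every element of this set also appears in the other collection.
bool smokeIsCovered = smokeTestBrowsers.IsSubsetOf(fullRegressionBrowsers);
bool mobileIsCovered = mobileBrowsers.IsSubsetOf(fullRegressionBrowsers);

// UnionWith: adds every element from the other collection that is not already present.
var allBrowsers = new HashSet<string>(fullRegressionBrowsers);
allBrowsers.UnionWith(mobileBrowsers);

Console.WriteLine(smokeIsCovered);    // True
Console.WriteLine(mobileIsCovered);   // False ("Samsung Internet" is missing)
Console.WriteLine(allBrowsers.Count); // 5 ("Safari" is counted once)
```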

Use Cases in Test Automation

Both Dictionaries and HashSets are extremely valuable tools in a test automation engineer's toolkit.

Dictionary Use Cases

  • Test Data Management: A dictionary is perfect for storing test data where you can look up a full user object by a simple test user ID (e.g., Dictionary<string, UserAccount>).
  • Configuration Settings: Store application settings like URLs, usernames, or passwords for different environments, looked up by a key like "QA_URL" or "STAGING_PASSWORD".
  • Mapping Expected to Actual Results: You can map an input value to an expected output value, making it easy to look up the correct expected result in a data-driven test.
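The last use case above might look something like the sketch below. Classify is a hypothetical stand-in for whatever code you are testing, and the status codes are sample inputs. The dictionary maps each input to the result you expect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical function under test: classifies an HTTP status code.
static string Classify(int statusCode) => statusCode switch
{
    >= 200 and < 300 => "Success",
    >= 400 and < 500 => "Client error",
    >= 500 and < 600 => "Server error",
    _ => "Other"
};

// Each key is a test input, each value is the expected result for it.
var expectedByInput = new Dictionary<int, string>
{
    { 200, "Success" },
    { 404, "Client error" },
    { 503, "Server error" }
};

int failures = 0;
foreach (var (input, expected) in expectedByInput)
{
    string actual = Classify(input);
    if (actual != expected)
    {
        failures++;
        Console.WriteLine($"Input {input}: expected '{expected}' but got '{actual}'");
    }
}

Console.WriteLine($"Failures: {failures}"); // Failures: 0
```

Adding a new test case is then a one-line change to the dictionary, with no changes to the loop.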

HashSet Use Cases

  • Verifying Uniqueness: After getting a list of product IDs from an API response, you can add them all to a HashSet. If the HashSet's final Count is less than the original list's Count, you know you have duplicate IDs.
  • Checking for Presence of Elements: You can get all the links from a navigation bar on a webpage, put their text into a HashSet, and then quickly check if all your expected links (e.g., "Home", "About", "Contact") are present using multiple .Contains() calls.
  • Comparing Collections: Using set operations like ExceptWith is a very efficient way to find the difference between two collections, for example, to verify that applying a filter to a list of items correctly removed the expected items.
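The three use cases above can be sketched roughly as follows. All of the product IDs, link texts, and item names are made-up sample data. For the presence check, the sketch uses IsSupersetOf as a one-call alternative to several separate Contains() calls.

```csharp
using System;
using System.Collections.Generic;

// 1. Verifying uniqueness: a smaller set Count means the list had duplicates.
var productIds = new List<int> { 101, 102, 103, 102 };
var uniqueProductIds = new HashSet<int>(productIds);
bool hasDuplicates = uniqueProductIds.Count < productIds.Count;

// 2. Checking for presence: are all expected links among the actual links?
var actualLinks = new HashSet<string> { "Home", "About", "Contact", "Blog" };
string[] expectedLinks = { "Home", "About", "Contact" };
bool allLinksPresent = actualLinks.IsSupersetOf(expectedLinks);

// 3. Comparing collections: which items did the filter remove?
var beforeFilter = new HashSet<string> { "Laptop", "Phone", "Tablet" };
var afterFilter = new HashSet<string> { "Laptop" };
var removedItems = new HashSet<string>(beforeFilter);
removedItems.ExceptWith(afterFilter);
bool removedAsExpected = removedItems.SetEquals(new[] { "Phone", "Tablet" });

Console.WriteLine(hasDuplicates);     // True
Console.WriteLine(allLinksPresent);   // True
Console.WriteLine(removedAsExpected); // True
```

SetEquals compares contents only and ignores order, which suits a HashSet<T> because the set itself makes no ordering promise.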

Choosing the Right Tool

Think about your primary need:

  • Need an ordered list that can have duplicates? Use List<T>.
  • Need to look up a value instantly based on a unique identifier? Reach for a Dictionary<TKey, TValue>.
  • Need to store a unique set of items and quickly check if something exists within that set? A HashSet<T> is your best bet.

Choosing the right data structure for the job not only makes your code more readable but can also significantly improve its performance.

Key Takeaways

  • Dictionary<TKey, TValue> is a powerful collection for storing key-value pairs, offering extremely fast value lookups using a unique key.
  • When a key might be missing, use .ContainsKey() or, preferably, .TryGetValue() so that the lookup doesn't throw an exception.
  • A HashSet<T> is a high-performance collection designed to store a set of unique elements.
  • The primary strength of a HashSet<T> is its incredibly fast Contains() method for checking if an item exists in the set.
  • Both Dictionaries and HashSets are fundamental tools for managing test data, configuration, and performing complex state verifications in test automation.

What's Next?

You've now added two powerful data structures to your C# toolkit! Being able to choose between a List, Dictionary, or HashSet is a key skill for writing efficient code. While these cover most scenarios, there are other specialized collections designed to enforce a specific processing order. Next up, we'll explore Stacks and Queues and the unique testing scenarios where they are the perfect tool for the job.