I recently encountered a task at work involving reading and writing XML, which led me to explore the performance implications of different approaches. I’d like to share my findings with you.
For our demonstration, we’ll use a sample XML text from W3Schools:
| |
Our goal is to read the values of all <name> elements and store them in a List<string>. Let’s implement this using several different methods.
Using XmlDocument
XmlDocument represents the traditional, DOM-based approach. It can be used in a couple of ways: one is by using methods like GetElementsByTagName to traverse the document tree, and the other is by using XPath expressions to select nodes directly. Let’s look at the first method:
| |
Now, let’s try it with an XPath expression:
| |
Using System.Xml.Linq
System.Xml.Linq (XDocument) offers a more modern and developer-friendly API. Here’s how to implement our task with it:
| |
While XDocument also supports XPath, its fluent API is already so concise that using XPath often isn’t necessary for simpler queries.
Using XmlReader
XmlReader is a forward-only, stream-based parser. While its API is more complex to use, it is extremely efficient in terms of speed and memory.
| |
Using Regular Expressions (Regex)
Since our task is relatively simple and the XML structure is predictable, we can also use regular expressions.
| |
Using Traditional String Methods
Finally, we can fall back on basic string manipulation methods.
| |
Spoiler alert: this method is surprisingly inefficient, much slower than the other approaches. This leads us to our secret weapon: Span<T>.
Using Span<T>
Span<T>, introduced in C# 7.2, allows for highly efficient, allocation-free memory manipulation.
| |
Performance Test
We used BenchmarkDotNet to test the performance of these methods. Here are the results:
| Case | Mean | Min | Max | Allocated Bytes |
|---|---|---|---|---|
XmlDocument | 10.65 µs | 10.40 µs | 10.79 µs | 17,608 |
XPath | 5.74 µs | 5.72 µs | 5.76 µs | 18,912 |
LINQ | 3.95 µs | 3.94 µs | 3.97 µs | 15,054 |
XmlReader | 3.04 µs | 3.02 µs | 3.06 µs | 11,776 |
Regex | 2.13 µs | 2.11 µs | 2.14 µs | 2,632 |
StringOps | 37.16 µs | 37.08 µs | 37.32 µs | 448 |
SpanOps | 134.48 ns | 133.89 ns | 135.11 ns | 448 |
So, were you impressed by the power of Span<T>? It’s in a league of its own, reaching nanosecond-level performance. From these results, we can draw a few conclusions:
- If the content we need to extract isn’t complex and the structure is consistent, we can use Regex for a very fast and low-allocation solution.
- When the task requires robust XML parsing, we should rely on dedicated XML APIs. Their performance ranking is:
XmlReader>XDocument(LINQ) >XmlDocument. - From a practical standpoint,
XDocumentis often the best choice. It’s significantly faster than the traditionalXmlDocumentand not much slower thanXmlReader, but with a much more pleasant and productive API, making it the optimal choice for most scenarios. - Using
Span<T>can lead to massive performance gains, especially when your logic involves frequent string operations likeIndexOfand slicing (Substring). It’s the clear winner for performance-critical scenarios that can be solved with string manipulation.
