Checking XML for Semantic Equivalence in C#

I was writing a bit of code for a small project that generates some XML to pass to another application. To test this functionality, I needed to compare the XML generated by my API against some hard-coded XML. I started off with this:

var expectedXml = @"<add><doc><field name=""test"">testvalue</field></doc></add>";

var actualXml = MyAPI.DoSomeStuff().GenerateXml();

Assert.Equal(expectedXml, actualXml);

But I quickly found out that this wasn’t going to scale. Once the XML got large, the single-line string ran far off to the right and made my tests read horribly. So I did this:

var expectedXml = @"<add>
                        <doc>
                            <field name=""test"">testvalue</field>
                        </doc>
                    </add>";

var actualXml = MyAPI.DoSomeStuff().GenerateXml();

Assert.Equal(expectedXml, actualXml);

The problem was that now the XML wasn’t equivalent. Well, it is semantically equivalent; it just isn’t equal in a string comparison. The reason is that all of the extra white space and EOL characters break the comparison. You might be thinking, well, just strip out the white space and EOL characters. It ain’t that easy. What happens when that white space is inside the text of an XML element? At that point it is part of the data and becomes meaningful for comparison purposes.
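To make that concrete, here is a minimal sketch of the naive approach and how it goes wrong. The `StripWhitespace` helper and the `NaiveXmlCompare` class are hypothetical names made up for illustration; they are not part of my API or of the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveXmlCompare
{
    // Hypothetical helper: removes every whitespace character, including
    // whitespace that is part of an element's text content.
    public static string StripWhitespace(string xml)
    {
        return Regex.Replace(xml, @"\s+", "");
    }

    public static void Main()
    {
        var a = "<doc><field name=\"test\">test value</field></doc>";
        var b = "<doc>\n  <field name=\"test\">testvalue</field>\n</doc>";

        // The documents carry different text ("test value" vs "testvalue"),
        // yet the stripped strings are identical, so this prints True.
        Console.WriteLine(StripWhitespace(a) == StripWhitespace(b));
    }
}
```

The comparison passes even though the two fields hold different values, which is exactly the kind of false positive a test should never give you.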

I didn’t want to write my own comparison code (who wants to write that?), so I started hunting around. Since I was already using the .NET 3.5 LINQ to XML classes (XElement and friends), I started looking there first. I came across a little static method on the XNode class called DeepEquals, and guess what, it does exactly what I want. It compares a node and all of its child nodes for semantic equivalence. I’m sure that there are probably a few gotchas in there for me, but after preliminary tests, it appears to work perfectly.
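Here is a small sketch of that behavior, using the same XML as above. The `DeepEqualsDemo` class and `SameXml` method are names invented for this example; the only framework calls are `XElement.Parse` and `XNode.DeepEquals`. It relies on `XElement.Parse(string)` not preserving whitespace-only text between elements by default.

```csharp
using System;
using System.Xml.Linq;

public static class DeepEqualsDemo
{
    // Hypothetical helper: parses both strings and compares the resulting trees.
    public static bool SameXml(string left, string right)
    {
        return XNode.DeepEquals(XElement.Parse(left), XElement.Parse(right));
    }

    public static void Main()
    {
        var compact = "<add><doc><field name=\"test\">testvalue</field></doc></add>";
        var indented = @"<add>
    <doc>
        <field name=""test"">testvalue</field>
    </doc>
</add>";
        var different = "<add><doc><field name=\"test\">test value</field></doc></add>";

        // Only the formatting between elements differs.
        Console.WriteLine(SameXml(compact, indented));   // True
        // The text inside the field element differs.
        Console.WriteLine(SameXml(compact, different));  // False
    }
}
```

Indentation between tags is ignored, while the space inside the field's text still counts as a difference.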

I created a little method to do my XML asserts for me:

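// Requires: using System; using System.Xml.Linq; plus the test framework's Assert.
private void AssertEqualXml(string expectedXml, string actualXml)
{
    // Parse both strings and compare the trees instead of the raw text.
    // Assert.True matches the xUnit-style Assert.Equal used in the tests above.
    Assert.True(XNode.DeepEquals(XElement.Parse(expectedXml), XElement.Parse(actualXml)),
        String.Format("{0} \n does not equal \n{1}", actualXml, expectedXml));
}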

There you have it. It loads the expected and actual XML into XElements and then calls “DeepEquals” on them. Now I can write the XML in my tests in the most readable fashion and not worry about how the strings are going to compare.


7 comments

  1. Nice…I was unaware of this.

    thanks
    Sunit

  2. You may want to be aware that the DeepEquals function has a couple of bugs where it returns false on semantically equal trees.

    http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=400469

  3. @Adam Thanks! Yeah, I knew there were a few quirks, thanks for the link.

  4. Hey Justin;

    Any reason you didn’t use XML Canonicalization? It’s designed to address just this situation.

    Just curious.

  5. @beefarino Do you know of an easy way to get a canonical XML format in C#? I’m virtually positive it isn’t in the framework, and I haven’t used any good libraries that will do it. I honestly haven’t spent too much time looking for libraries to do it.

  6. Meh. It’s there, but the implementation classes are internal to System.Security. You can’t get the canonical representation directly, but you can get a hash value of the canonical XML.

    Pardon my hackish code…

    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;
    using System.Security.Cryptography.Xml;
    using NUnit.Framework;

    [Test]
    public void C14NExample()
    {
        var hash = new List<byte[]>();
        var inputs = new string[]
        {
            // The first two differ only in attribute order; the third has a different attribute.
            "<test><sample value='1' anotherValue='abcdef' /></test>",
            "<test><sample anotherValue='abcdef' value='1'/></test>",
            "<test><sample value='1' thisOneIsNotLikeTheOthers='true' /></test>"
        };

        foreach (var input in inputs)
        {
            using (var stream = new MemoryStream())
            {
                using (var writer = new StreamWriter(stream))
                {
                    // Write the XML into the stream and rewind it for the transform.
                    writer.Write(input);
                    writer.Flush();

                    stream.Position = 0;

                    // Canonicalize (comments excluded) and hash the canonical form.
                    XmlDsigC14NTransform xfrm = new XmlDsigC14NTransform(false);
                    xfrm.LoadInput(stream);

                    hash.Add(xfrm.GetDigestedOutput(new SHA1Managed()));
                }
            }
        }

        CollectionAssert.AreEqual(hash[0], hash[1]);
        CollectionAssert.AreNotEqual(hash[0], hash[2]);
        CollectionAssert.AreNotEqual(hash[1], hash[2]);
    }

  7. @beefarino Hmm, interesting. Thanks for sharing!
