Friday 9 December 2011

Overloading the meaning of configuration settings is a smell

Is there anything wrong with this?

if (settings["mysetting"] != string.Empty) 
{ 
    DoSomethingWith(settings["mysetting"]);
}

The meaning of the configuration setting "mysetting" is overloaded. One of its possible values is special, and is used to turn some behaviour on or off. It’s like overloading the return value of a counting method to handle error codes as well as real values; like returning –1 on failure to count, and the real count if successful.
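As a sketch of that return-value smell (the method and its names here are hypothetical):

// Hypothetical example: the return value carries both a real count and an
// error code, so every caller has to branch on the special value.
public static int CountRecords(string file)
{
    if (!System.IO.File.Exists(file))
        return -1; // "failed to count" overloaded onto the count

    return System.IO.File.ReadAllLines(file).Length;
}

// var count = CountRecords(path);
// if (count == -1) { /* handle failure */ } else { /* use count */ }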

Notionally, although the empty string is a valid string, it may not be a valid value for this particular setting, in which case you might argue that it’s reasonable to take one of the invalid values and overload it. However, that hides the fact that there’s a logical difference between turning a behaviour on or off and configuring that behaviour. Generally it leads to branching in the code, which makes it more complex.

That’s apart from the fact that the empty string might actually be a valid value one day.

Instead, it makes sense to think about why you need to turn the behaviour on and off, rather than just configuring it. We hit this problem recently when we wanted to add some configuration that allowed us to replace a specific string with another string[*]. Specifically we wanted to override the name of a callback numerical solver in a task definition. Originally it was using “NewtonRaphson”, and we wanted to use “SomeNAGMethod”.

I mulled over two approaches with a friend: first, hardcoding the new value and adding a configurable bool flag to turn it on or off; second, a single configurable string value for which string.Empty turns the behaviour off, and any other value actually configures it.

<add key="FixSolverMethod" value="false"/>
...
settings["FixSolverMethod"] = "true";  // to override

or

<add key="SolverMethod" value=""/> 
... 
settings["SolverMethod"] = "SomeNAGMethod"; // to override

Both were tied to the idea that this was behaviour that needed to be turned off and on, and as a result implied the need for decision making at the point of use, as well as actual use of the value.

if (settings["FixSolverMethod"] == "true") FixSolverMethod(); 
    // which then uses a hardcoded method name

or

if (settings["SolverMethod"] != string.Empty) 
    OverrideSolverMethod(settings["SolverMethod"]);

I even toyed with the idea of configurable configuration items:

<add key="SolverMethod" value="SomeNAGMethod" enabled="false"/>

but that way madness lay.

It felt wrong to me, and I couldn’t figure out why until I spotted this smell; that the meaning of the value was being overloaded. I realised then that I could simply extend the notion of what we were doing. Instead of turning the behaviour on or off, I could just leave it on and have the default value for the configurable setting match the existing value.

// at initialization – gives us a default
<add key="SolverMethod" value="NewtonRaphson"/>  
...
// some time later, when we want to reconfigure. alternatively we could
// just set the new method name in the xml above ...
settings["SolverMethod"] = "SomeNAGMethod";  
...
// finally, just use it
UpdateSolverMethod(settings["SolverMethod"]); 

In this case no branching is needed – the solver method name is always overwritten. At worst it’s a wasted operation, as it’s overwritten with the string that’s already there.

Once you get to that point you start thinking about the code more generally. Is this the right place to be doing this? If we’re having to fix this up in every case, can’t we do it where these definitions are created? That led us to explore parts of the codebase that we didn’t have direct experience of, and what did we find? That we had control over the task definitions directly – we were creating them, but in a part of the code neither of us knew about. We could keep the configuration setting we’d created, but apply it at source, logically and consistently, to configure the tasks instead of patching them up downstream.
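As a rough sketch of applying the setting at source (the TaskDefinition type and CreateTaskDefinition method are hypothetical stand-ins for the real creation code):

// Hypothetical sketch: configure the task where it is created, rather than
// patching it up downstream. The default in the config file is "NewtonRaphson";
// overriding the setting reconfigures every task from this single point.
public TaskDefinition CreateTaskDefinition(IDictionary<string, string> settings)
{
    return new TaskDefinition
    {
        SolverMethod = settings["SolverMethod"]
    };
}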

Overloading configuration settings is a code smell. It’s worth thinking – can I set a default that matches the existing behaviour, and apply it all the time? And if I can, does that mean it’s better done somewhere else?

[*] Ah, the glamour of corporate development!

  

Friday 2 December 2011

Out (and ref) parameters in Actions, Funcs, delegates and lambdas

I ran into a couple of wrinkles I wasn’t aware of while coding up a facade for the ThreadPool for tests today, along the lines of the Path, Directory etc facades I’ve done before.

I wanted to insert a replacement ThreadPool so I could test the GetAvailableThreads method, which takes two out parameters.

You can’t quite specify the lambda as you might expect for the method implementation. This initial attempt, for example, doesn’t work:

public static class ThreadPool
{
    public static Action<int, int> GetAvailableThreadsImpl =
        System.Threading.ThreadPool.GetAvailableThreads;

    public static void GetAvailableThreads(out int workerThreads, out int ioThreads)
    {
        GetAvailableThreadsImpl(out workerThreads, out ioThreads);
    }
}

It doesn't work because the out modifiers are part of the method's signature: a method taking two out ints doesn't match Action<int, int>, and since by-reference types can't be used as generic type arguments there's no Action or Func that can represent it. You can work around it by just creating your own delegate type:

public delegate void GetAvailableThreadsDel(out int x, out int y);
 
public static class ThreadPool
{
    public static GetAvailableThreadsDel GetAvailableThreadsImpl =
        System.Threading.ThreadPool.GetAvailableThreads;

    public static void GetAvailableThreads(out int workerThreads, out int ioThreads)
    {
        GetAvailableThreadsImpl(out workerThreads, out ioThreads);
    }
}

Then follows a small wrinkle when defining a replacement behaviour in your test. You can, of course, create your own method with the right signature:

[Test]
public void Blah()
{
    ThreadPool.GetAvailableThreadsImpl = GetAvailableThreadsStub;
}

private int callCount = 0;
private void GetAvailableThreadsStub(out int x, out int y)
{
    if (callCount == 0)
    {
        x = 5;
        y = 5;
    }
    else
    {
        x = 10;
        y = 10;
    }

    callCount++;
}

It turns out you can define out parameters in a lambda, however -- but you must also explicitly state the parameter types, unlike normal, because lambda parameters that carry out or ref modifiers can't be implicitly typed:

[Test]
public void Blah()
{
    int callCount = 0;
    ThreadPool.GetAvailableThreadsImpl = (out int x, out int y) =>
        {
            if (callCount == 0)
            {
                x = 5;
                y = 5;
            }
            else
            {
                x = 10;
                y = 10;
            }

            callCount++;
        };
}

Small wrinkles, but worth remembering.

Friday 25 November 2011

Wrapping locking structures in tests

Over the last week I’ve been writing some locking code around a task queue, and wondering generally how to test whether locks are taken.

The straightforward thing to do is to take the locking calls as a seam and introduce a proxy, behind which I can add my own behaviour. I haven’t thought hard about the practicality of what follows – it may be supremely impractical – but for interest’s sake I thought I’d see where it went.

I’ve been working with ReaderWriterLockSlim a fair amount, so that was a natural place to start. I added a hook that meant I could introduce a new implementation for EnterWriteLock. There are a couple of wrinkles over the work I did with static classes recently – in this case I’m explicitly proxying a class instance, so I can’t rely on static hooks into external code, and default them with a static method group.

Neither can I specify a lambda, as it would rely on calling methods on an instance that wasn’t available at compile time, leading to errors like “cannot access non-static member in static context”. This just means I need to resolve any default implementation at runtime.

namespace Threading.Abstractions
{
    public class ReaderWriterLockSlim
    {
        private readonly System.Threading.ReaderWriterLockSlim _lock = 
            new System.Threading.ReaderWriterLockSlim();

        public ReaderWriterLockSlim()
        {
            EnterWriteLockImpl = (Action)(() => _lock.EnterWriteLock());
        }

        public void EnterWriteLock()
        {
            EnterWriteLockImpl();
        }

        public Action EnterWriteLockImpl;
    }
}

Then in a test I might replace the EnterWriteLock implementation:

[TestFixture]
public class Tests
{
    [Test]
    public void SimpleLockTest()
    {
        var locker = new Threading.Abstractions.ReaderWriterLockSlim();

        var count = 0;
        locker.EnterWriteLockImpl = (Action)(() => count++);

        locker.EnterWriteLock();

        Assert.AreEqual(1, count);
    }
}

So, that's good, I can tell if the lock has been taken or not. I can take it a little bit further though. One thing I might want to do is log which locks had been taken when – something that might notionally be useful even in production code. If I add an OnEnterWriteLock hook then I can log an entry if needed:

namespace Threading.Abstractions
{
    public class ReaderWriterLockSlim
    {
        private readonly System.Threading.ReaderWriterLockSlim _lock = 
            new System.Threading.ReaderWriterLockSlim();

        public ReaderWriterLockSlim()
        {
            EnterWriteLockImpl = (Action)(() => _lock.EnterWriteLock());
        }

        public Action OnEnterWriteLock;

        public Action AfterEnterWriteLock;

        public void EnterWriteLock()
        {
            if(OnEnterWriteLock != null) OnEnterWriteLock();

            EnterWriteLockImpl();

            if (AfterEnterWriteLock != null) AfterEnterWriteLock();
        }

        public Action EnterWriteLockImpl;
    }
}

After that I can add any logging that I might want to – I could append part of the stack trace for example; the name of the object that’s being called; or any number of things that help us track how it’s used:

[TestFixture]
public class Tests
{
    [Test]
    public void LockTest()
    {
        var locker = new Threading.Abstractions.ReaderWriterLockSlim();

        locker.OnEnterWriteLock = 
            (Action) (() => Console.WriteLine("Entering write lock") );
        locker.AfterEnterWriteLock = 
            (Action)(() => Console.WriteLine("Entered write lock"));

        locker.EnterWriteLock();
    }
}

Another interesting thing to try might be to actually create a timeline of all locks being taken and released. If I create a simple lock log:

namespace Threading.Abstractions
{
    public static class LockHistory
    {
        private static object locker = new object();

        public readonly static 
            System.Collections.Generic.List<string> Entries = 
                new System.Collections.Generic.List<string>();

        public static void Add(string lockEvent)
        {
            lock (locker) Entries.Add(lockEvent);
        }
    }
}

I can then add a log entry every time a lock is taken or released, and play that log back at the end of the test.

[TestFixture]
public class Tests
{
    [Test]
    public void LockTest()
    {
        var locker = new Threading.Abstractions.ReaderWriterLockSlim();

        locker.OnEnterWriteLock = 
            (Action) (() => Threading.Abstractions.LockHistory.Add("Entering Write Lock") );
        locker.AfterEnterWriteLock = 
            (Action)(() => Threading.Abstractions.LockHistory.Add("Entered Write Lock"));
        
        locker.EnterWriteLock();

        foreach (var entry in Threading.Abstractions.LockHistory.Entries)
        {
            Console.WriteLine(entry);
        }
    }
}

The log history could then be parsed or analysed in some way. All good fun, although, as I say -- I'm not sure how practical it might be.
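If it were worth taking further, a test could assert on the recorded sequence directly rather than just printing it -- a minimal sketch, reusing the LockHistory class above:

// Minimal sketch: assert that the expected lock events were recorded in order.
// Assumes both hooks add entries to LockHistory, as in the test above.
// (In practice Entries would also need clearing between tests.)
var expected = new[] { "Entering Write Lock", "Entered Write Lock" };

CollectionAssert.AreEqual(
    expected,
    Threading.Abstractions.LockHistory.Entries);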

Wednesday 16 November 2011

The distinctiveness of is, be, has; add

Tripped up today in a discussion of a code change in which I thought an interface implementation was being proposed, where in fact the proposal was to serve up an instance implementing the new interface.

We were discussing an existing class that acted somewhat like a registry, with which the population of instances to be served up was defined at construction time. We needed it to delay resolution of one of the types it was serving up until relatively late – at least, well after construction of the registry.

The phrase that tripped me up was along the lines of: “Add a new interface to the registry, named ISomeFactoryInterface”.

I interpreted this to mean that the registry should implement the factory interface, and had a hard time persuading my counterpart that this was a bad idea.

Of course, it was hard because he was using it to mean has, not is. He thought the registry should be serving up instances of this new interface. That was actually the position I came to too, so in the end we converged anyway.
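In code the two readings are quite different -- a sketch, with hypothetical names standing in for the real registry:

public interface ISomeFactoryInterface { /* ... */ }

// Reading 1 ("is"): the registry itself implements the new interface.
public class RegistryThatIs : ISomeFactoryInterface { /* ... */ }

// Reading 2 ("has"): the registry serves up an instance that implements the
// interface -- which is what was actually meant.
public class RegistryThatHas
{
    private readonly ISomeFactoryInterface _factory;

    public RegistryThatHas(ISomeFactoryInterface factory) { _factory = factory; }

    public ISomeFactoryInterface Resolve() { return _factory; }
}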

Interestingly, if he’d used the word has explicitly I might still have misinterpreted it to start with. In normal English, saying that something “has a certain interface” might imply that it implements that interface.

It reminded me that in the absence of actual code being written down, we have to be really very careful about how we use certain words.

Tuesday 1 November 2011

Comment on Mark Seemann’s post on Abstract Factory and Service Locator

Mark Seemann has a post that I’ve been puzzling over, because the patterns he presents didn’t make sense to me. It’s about a year old, but the issues seem to come up again and again. I thought I’d hash over it for my own benefit.

He’s summarizing the differences between Abstract Factory and Service Locator, saying they’re easily confused but are actually easy to differentiate.

He shows a condensed Abstract Factory interface:

public interface IFactory<T>
{
    T Create(object context);
}

and states that Abstract Factory is a generic type that can only return instances of one type -- the type given by the generic parameter.

To me that’s not an Abstract Factory. The original Design Patterns book states that the pattern’s intent is creating families of related or dependent objects; a single factory type can therefore create instances of multiple types. Mark’s factory can only create one type, dictated by the closed type of the factory.

A real abstract factory interface might look more like:

public interface IEcosystem
{
    IHumanoid CreateHumanoid();

    ICreature CreateCreature();
}

This does look a little like a Service Locator, in that each IEcosystem-derived type can return instances of multiple types. A Service Locator also doesn’t need to be generic – it might have a distinct method per service type returned.

He also posits a simplified ServiceLocator interface:

public interface IServiceLocator
{
    T Create<T>(object context);
}

(again, slightly condensed by me) that has a method named Create. Again, this interface doesn’t look right. A Service Locator doesn't handle instantiation for you, it simply hands you references to single already-instantiated services; it's a structural pattern. I'd have expected a method like T Locate<T>(object context).
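Something with this shape, say (my sketch, not Mark's):

public interface IServiceLocator
{
    // Hands back an already-registered service instance; it doesn't create one.
    T Locate<T>(object context);
}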

Specific instances of Service Locator might be created -- if you relax the requirement that it's a singleton -- that are then populated with distinct sets of services for different domains, but typically you wouldn't need multiple Service Locator types.

For me the key difference between Abstract Factory and Service Locator is that the first is a creational pattern, and the second a structural pattern.

The Factory exists in single instances of multiple distinct types, one for each family of related classes. The Locator is traditionally a singleton at global scope, with which single instances of each type supported are registered; it also supports types that are contextually related but not necessarily as closely related as those supported by the Factory.

Sunday 16 October 2011

Wiping the state clean: static stateful classes in NUnit tests

I’m automating some tests that are currently run manually in a batch script that spins up the same executable multiple times, with varying options. As always, I’d like to get the tests started without modifying existing code, and preferably without adding any new code.

Unfortunately the application poses a few problems. The entry class has a curiously-implemented singleton behaviour to ensure that only one is running in a process. It also keeps a static ref to a set of command line options, but doesn’t give the client any way of clearing those options to restart processing. I also need to inject some behaviour that would allow the test to sense that the correct thing had been done. The class looks a bit like this:

namespace AppDomainInNUnit
{
    internal class BaseApplication
    {
        public static BaseApplication Instance = null;

        public BaseApplication()
        {
            if(Instance == null)
            {
                Instance = this;
            }
            else
            {
                throw new InvalidOperationException("ctor fail");
            }
        }
    }

    class Application : BaseApplication
    {
    {
        static int Main(string[] args)
        {
            var app = new Application();
            return app.Run();
        }

        int Run()
        {
            _options.Add("option", "added");
            foreach(var option in _options.Keys)
                Console.WriteLine("{0} {1}", option, _options[option]);
            return 0;
        }

        public static Action<string> PostFinalStatus = s => Console.WriteLine(s);

        private static readonly Dictionary<string, string> _options = new Dictionary<string, string>();
    }
}

I could add methods to the existing code to clear the options and to handle the singleton behaviour, but that means changes. In principle there’s no need, as I can instantiate the classes I need in a separate AppDomain, and unload that AppDomain (and dump any static state) when I’m done.

To get this under test I need to do several things. First, I need to create a facade for the Application class that can be marshalled across AppDomain boundaries. Second, I need to consistently create and unload AppDomains in NUnit tests – I want to unload at the end of each test to dump the class’ static state. Finally, I want to inject some behaviour for the PostFinalStatus delegate that will allow the test to sense the state posted by the class.

So, first I need a facade for Application. This is partly because I need to create a class that can be marshalled across AppDomain boundaries. To use marshal-by-reference semantics, and have the instance execute in the new AppDomain, it’ll need to inherit MarshalByRefObject. Unfortunately I need to break the no-changes constraint, and add an internal hook to allow the facade to kick the Application class off.

The TestMain bootstrap method is basically a copy of the existing Main( … ):

        internal static int TestMain(string[] args)
        {
            return Main(args);
        }

Then I have a simple Facade:

    public class Facade : MarshalByRefObject
    {
        public Facade()
        {
            Console.WriteLine("Facade default ctor in {0}", 
                        Thread.GetDomain().FriendlyName);
        }

        public void Run(string[] args)
        {
            Console.WriteLine("Calling to {0}.", 
                        Thread.GetDomain().FriendlyName);
            
            Application.TestMain(args);
        }
    }

Creating an AppDomain in an NUnit test is fairly straightforward, although it confused me for a while until I realised that the tests are run under a different application – in my case either the JetBrains TestRunner, or nunit-console. In either case the code is not actually available under the executable’s application base directory. The solution is to explicitly set the application base directory for the new test AppDomain:

        [Test]
        public void Test()
        {
            var callingDomain = Thread.GetDomain();
            var callingDomainName = callingDomain.FriendlyName;
            
            Console.WriteLine("{0}\n{1}\n{2}", 
                callingDomainName, 
                callingDomain.SetupInformation.ApplicationBase, 
                callingDomain.SetupInformation.PrivateBinPath);

            var domain = AppDomain.CreateDomain("test-domain", null, null);
            
            Console.WriteLine("{0}\n{1}\n{2}", 
                domain.FriendlyName, 
                domain.SetupInformation.ApplicationBase, 
                domain.SetupInformation.PrivateBinPath);
            
            AppDomain.Unload(domain);

            var setup = new AppDomainSetup() { 
                ApplicationBase = callingDomain.SetupInformation.ApplicationBase };
            
            domain = AppDomain.CreateDomain("test-domain", null, setup);
            
            Console.WriteLine("{0}\n{1}\n{2}", 
                domain.FriendlyName, 
                domain.SetupInformation.ApplicationBase, 
                domain.SetupInformation.PrivateBinPath);

            var assembly = Assembly.GetExecutingAssembly().CodeBase;

            var facade = domain.CreateInstanceFromAndUnwrap(
                assembly, 
                "AppDomainInNUnit.Facade") as Facade;
            
            facade.Run(new [] { "some", "args" });

            AppDomain.Unload(domain);
        }

There’s a lot of cruft in there, but the intent is to show which AppDomain the class is actually being instantiated in, for clarity.

Finally, I want my tests to be informed of the Application’s final status when it posts it. This is tricky; I’ve been able to do it by passing in a delegate that sets some property data on the test AppDomain, then querying the AppDomain property data during the test assert stage. First the facade needs a method that allows me to set the post behaviour for Application:

        public void SetPostBehaviour(Action<string> behaviour)
        {
            Application.PostFinalStatus = behaviour;
        }

Then I need to pass a delegate with the right behaviour in during the test, and assert on the value of that data before unloading the test AppDomain:

            ...
            var assembly = Assembly.GetExecutingAssembly().CodeBase;

            var facade = domain.CreateInstanceFromAndUnwrap(
                assembly, 
                "AppDomainInNUnit.Facade") as Facade;
            
            var posted = false;
            facade.SetPostBehaviour(s => Thread.GetDomain().SetData("posted", true));
            
            facade.Run(new [] { "some", "args" });

            Assert.IsTrue((bool)domain.GetData("posted"));
            ...

So, it's possible to work around these static classes without too many intrusive changes to the existing code. Generally:

  1. To instantiate and execute a class in another AppDomain the class should derive from MarshalByRefObject (see Richter, CLR via C#). You’ll get a proxy to the instance in the new domain.
  2. Don’t tag the class with Serializable instead of deriving from MarshalByRefObject – you’ll get a copy of the instance rather than a proxy, and calls on it will still run in the original domain.
  3. You can, but shouldn’t, invoke static methods or fields on your class that derives from MarshalByRefObject. It won’t give you the effect you expect: all static calls are made in the calling domain, not the separate domain. To make calls on static methods you need to create a facade or adapter that can be marshalled across the domain boundary, and have that make the static calls for you.
  4. In the various CreateInstance… methods on AppDomain remember to use a fully qualified type name for the class you want to instantiate. This generally isn’t clear from examples like that in Richter, who dump all their classes in the global namespace (tsk).
  5. If you’re creating an AppDomain in an NUnit test, remember that you might need to set the ApplicationBase for the domain to the directory holding your test code.
  6. I wanted to get AppDomain.ExecuteAssembly(string, string[]) working, which would mean that I wouldn’t need a TestMain hook in the existing code. I’ve not managed to get it working yet, though. In any event, I’d still need a facade that I could marshal across the boundary in order to set the status post behaviour.

Saturday 8 October 2011

Preventing inadvertent changes to static classes

This is a followup to these posts, where I developed a way of opening up a seam along filesystem access in C#.

The last post highlighted a problem with the approach: when behaviour is changed by injecting a new delegate it is changed for all subsequent users of the class.

This is a small problem in tests – you just need to make sure that you reset your static class state in test setup and tear down.

In a production system however it could be a much larger problem. A developer could change the behaviour of the class without realising the consequences, and change the behaviour of the system in an unexpected way. Equally, they could deliberately subvert the behaviour of the system – but if they’re going to do that, they’d probably find a better way of doing it.

You could avoid this by treating this technique as only a short term measure to open a seam while refactoring legacy code, and rapidly refactor again to something more robust, perhaps using a factory or IOC container to inject different implementations for test or production code. This is entirely reasonable; the technique gets very quick and easy results, and you could definitely use it while characterizing legacy code and rapidly refactor away from it.

Alternatively you could make it somewhat lockable, and have it warn if code tries to change behaviour without unlocking it first. I’m musing now, because I tend to think about this technique in the sense I’ve just described. In principle though you could guard against careless changes in behaviour by having the class “locked” by default, and only allowing changes to behaviour after first unlocking the class.

Tangentially this raises a question – is a physical lock a form of security by obfuscation and awkwardness?

To give an example, consider this test code for a notional File class:

    [TestFixture]
    public class FileTests
    {
        [SetUp]
        public void SetUp()
        {
            File.Unlock();
            File.CopyImpl = null;
            File.Lock();
        }

        [Test]
        [ExpectedException("System.InvalidOperationException")]
        public void File_ShouldWarn_IfBehaviourChangesWhenLocked()
        {
            File.CopyImpl = null;
        }

        [Test]
        public void File_ShouldAllow_BehaviourChangeWhenUnlocked()
        {
            File.Unlock();
            File.CopyImpl = (src, dst, ovrt) => { };
        }
    }

I don't see the lock as a strong lock -- just a flag to make a developer stop and think about what they're doing, and to allow a mechanism to raise an alert if it's done without thought. An implementation that matches these tests could look like:

    public static class File
    {
        private static Action<string, string, bool> _copyImpl = System.IO.File.Copy;

        private static bool _locked = true;

        public static Action<string, string, bool> CopyImpl
        {
            get
            {
                return _copyImpl;
            }
            set
            {
                if(_locked)
                {
                    throw new InvalidOperationException("IO.Abstractions.File is locked to disallow behaviour changes");
                }
                _copyImpl = value;
            }
        }

        public static void Unlock()
        {
            Trace.WriteLine("IO.Abstractions.File now unlocked for behaviour changes");
            _locked = false;
        }

        public static void Lock()
        {
            Trace.WriteLine("IO.Abstractions.File now locked for behaviour changes");
            _locked = true;            
        }

        public static void Copy(string source, string destination, bool overwrite)
        {
            CopyImpl(source, destination, overwrite);
        }
    }

The underlying implementation is the same as I presented before, with a couple of changes. There are now obvious Lock and Unlock methods; I'm using these to just set or unset an internal flag. I'm wrapping the point of injection of new behaviour for CopyImpl in a property, and using the set method to check the locking status. If the class is locked, it'll throw an exception to indicate that you're not meant to be changing it. The class is locked by default.

Exactly how useful this is in the long term I'm not sure. It's tempting to keep the Abstractions in place, as the class structure is simpler than the normal interface-and-factory style approach that you might use out of choice. It's possible that an IOC container might be abused in the same way as this technique, by a developer who wants a specific local behaviour but doesn't understand that their changes might have global consequences.

And I've not even touched on any possible threading problems.

Friday 7 October 2011

Wrinkles with overloaded methods when using static classes in tests

This is another followup to these posts.

When mimicking the interface of an existing class you may need to provide behaviour for a set of overloaded methods. At that point you hit a subtlety of the way we’re injecting behaviour – you can’t overload on the delegate type and retain the same delegate name.

So, ideally for some replacement File class I’d like to do this, but unfortunately I can't:

public static class File
{
    public static Action<string, string> CopyImpl = System.IO.File.Copy;

    public static Action<string, string, bool> CopyImpl = System.IO.File.Copy;

    public static void Copy(string source, string destination)
    {
        CopyImpl(source, destination);
    }

    public static void Copy(string source, string destination, bool overwrite)
    {
        CopyImpl(source, destination, overwrite);
    }
}

There are a couple of alternatives. I could rename the second delegate to something like "CopyWithOverwriteImpl", which feels very clunky:

public static class File
{
    public static Action<string, string> CopyImpl = System.IO.File.Copy;

    public static 
        Action<string, string, bool> CopyWithOverwriteImpl = System.IO.File.Copy;

    public static void Copy(string source, string destination)
    {
        CopyImpl(source, destination);
    }

    public static void Copy(string source, string destination, bool overwrite)
    {
        CopyWithOverwriteImpl(source, destination, overwrite);
    }
}

Alternatively I could implement both in terms of the operation with the broader signature.

public static class File
{
    public static Action<string, string, bool> CopyImpl = System.IO.File.Copy;

    public static void Copy(string source, string destination)
    {
        CopyImpl(source, destination, false);
    }

    public static void Copy(string source, string destination, bool overwrite)
    {
        CopyImpl(source, destination, overwrite);
    }
}

I prefer the latter approach, because the code seems simpler. However, it glosses over some detail -- what if our intent is truly to mimic System.IO.File, and the behaviour of System.IO.File.Copy changes to allow overwriting? Admittedly, this particular change would have quite widespread ramifications, but it might happen with other classes.

Realistically I think there are two steps here: the first opens up a seam in legacy code -- in which case you want to replicate the existing System.IO.File behaviour. As long as you write the method to replicate this behaviour at the time you open the seam, then you're ok.

The next step comes when you start using these classes by default in new code. In that case, you get to choose the behaviour you want; you don't need to care about the behaviour of a particular overload of System.IO.File.Copy.

The latter approach seems best to me.

Using static classes in tests

This is a followup from my previous post on mocking out static classes and injecting behaviour.

The disadvantage to injecting test-specific behaviour into a static class is that if you’re not careful you can leak behaviour from test to test. This certainly happens in NUnit, in which a static class is initialized once for the test session – not once per test – so any behaviour you inject persists across tests.

Take these pseudo-tests for example:

[Test]
public void FirstTest()
{
    IO.Abstractions.Path.CombineImpl = (path1, path2) => "some\\default\\path";

    Assert.AreEqual("some\\default\\path", IO.Abstractions.Path.Combine("some", "path"));
}

[Test]
public void SecondTest()
{
    // assume we're using the default implementation -- System.IO.Path.Combine

    Assert.AreEqual("some\\path", IO.Abstractions.Path.Combine("some", "path"));
}

FirstTest will always succeed. If FirstTest runs first, then SecondTest will fail; if SecondTest runs first, it will succeed. The problem is that FirstTest sets the behaviour for Path.Combine and SecondTest doesn't override it.

The trick is to ensure you reset the behaviour in test setup and teardown. You can just reset to default behaviour in either:

[SetUp]
public void SetUp()
{
    IO.Abstractions.Path.CombineImpl = System.IO.Path.Combine;
}

or you might want to set the behaviour to null, and just set the defaults on teardown:

[SetUp]
public void SetUp()
{
    IO.Abstractions.Path.CombineImpl = null;
}

[TearDown]
public void TearDown()
{
    IO.Abstractions.Path.CombineImpl = System.IO.Path.Combine;
}

This latter approach has the advantage that -- if you null out all possible behaviour across the static classes used -- you immediately get to sense which methods are being hit in any given test, and where in the code under test it's being hit from.

Sunday 25 September 2011

Injecting behaviour for static method calls (filesystem mocking redux)

This is a direct evolution of my last post, which dealt with decisions faced in testing code that hits the filesystem. The solution I posted was a bit clunky, and inconsistent; this is better, and works generally if you’re faced with a static method whose behaviour you want to change but can’t, immediately.

A lot of the following process has been informed by Michael Feathers’ “Working Effectively With Legacy Code”. Go read it if you haven’t.

So, imagine you have this idealized legacy code:

using System.IO;
using System.Xml.Linq;

namespace AbstractionExamples
{
    public static class TradeFileLoader
    {
        public static ITrade Load(string rootFolder, string file)
        {
            ITrade trade = null;
            var filePath = Path.Combine(rootFolder, file + ".trade");
            if (File.Exists(filePath))
            {
                trade = Trade.Load(filePath);
            }
            return trade;
        }
    }

    public class Trade : ITrade
    {
        public static ITrade Load(string file)
        {
            var xml = XDocument.Load(file);
            return new Trade(xml);
        }

        public Trade(XDocument xml) { /* ... */ }
    }
}

I want to get it under test, changing as little of the existing code as possible in the process.

The first thing to look at is the use of Path, and then File, as they can both be treated the same way. Path and File are static, and I don’t have the code; this means I can't just derive from them. What I want to do is create a class that provides a pass-through to the existing IO code, without modifying any parameters or behaviour in flight by default.

So, I mimic the existing Path class in a new namespace. In the new class I provide delegates for desired behaviour that defaults to the original System.IO behaviour. First I write a few simple tests like:

    [TestFixture]
    public class Tests
    {
        [Test]
        public void PathTest()
        {
            var path = @"root\file";

            var actual = IO.Abstractions.Path.Combine("root", "file");

            Assert.AreEqual(path, actual);
        }
    }

and then the new classes:

using System;
using System.IO;

namespace IO.Abstractions
{
    public static class Path
    {
        public static Func<string, string, string> CombineImpl = System.IO.Path.Combine;

        public static string Combine(string first, string second)
        {
            return CombineImpl(first, second);
        }
    }

    public static class File
    {
        public static Func<string, bool> ExistsImpl = System.IO.File.Exists;

        public static bool Exists(string file)
        {
            return ExistsImpl(file);
        }
    }
}

Now I can introduce these new classes along this seam by just swapping the namespace:

//using System.IO;  // no longer needed
using System.Xml.Linq;
using IO.Abstractions;

namespace AbstractionExamples
{
    public static class TradeFileLoader
    {
        public static ITrade Load(string rootFolder, string file)
        {
            ITrade trade = null;
            var filePath = Path.Combine(rootFolder, file + ".trade");
            if (File.Exists(filePath))
            {
                trade = Trade.Load(filePath);
            }
            return trade;
        }
    }

    public class Trade : ITrade
    {
        public static ITrade Load(string file)
        {
            var xml = XDocument.Load(file);
            return new Trade(xml);
        }

        public Trade(XDocument xml) { /* ... */ }
    }
}

The Trade-related changes are slightly more complicated – because we want to replace XDocument. XDocument is not a static class, and we do actually use instances of it.

If I want to avoid all changes to the existing code then I’ll need to turn my as-yet unwritten IO.Abstractions.XDocument into a proxy for System.Xml.Linq.XDocument. Unfortunately System.Xml.Linq.XDocument doesn’t implement an interface directly; fortunately it isn’t sealed, so I can derive from it. Even better, System.Xml.Linq.XDocument also provides a copy constructor so I don’t need to wrap an instance and present the correct methods myself. I just need to derive from System.Xml.Linq.XDocument, and construct using the copy constructor.

namespace Xml.Abstractions
{
    public class XDocument : System.Xml.Linq.XDocument
    {
        public XDocument(System.Xml.Linq.XDocument baseDocument) : base(baseDocument)
        {
        }

        public static Func<string, XDocument> LoadImpl = 
            f => new XDocument(System.Xml.Linq.XDocument.Load(f));

        public new static XDocument Load(string file)
        {
            return LoadImpl(file);
        }
    }
}

I’m a little concerned with the amount of work going on in construction here; I’ll have to take a look at how the copy constructor behaves.

Finally I can introduce the new XDocument along the seam in TradeFileLoader, so we end up with exactly what we started with, except for the changed namespace directives:

//using System.IO;
//using System.Xml.Linq;
using IO.Abstractions;
using Xml.Abstractions;

namespace AbstractionExamples
{
    public static class TradeFileLoader
    {
        public static ITrade Load(string rootFolder, string file)
        {
            ITrade trade = null;
            var filePath = Path.Combine(rootFolder, file + ".trade");
            if (File.Exists(filePath))
            {
                trade = Trade.Load(filePath);
            }
            return trade;
        }
    }

    public class Trade : ITrade
    {
        public static ITrade Load(string file)
        {
            var xml = XDocument.Load(file);
            return new Trade(xml); 
        }

        public Trade(XDocument xml) { /* ... */ } 
    }
}

So, the process I've gone through when faced with these is:

  1. Trace static calls until you run out of code you own (e.g. to Path.Combine, XDocument.Load).
  2. Create an abstracted class in a new namespace that mimics the existing static interface.
  3. Provide the new class with delegated behaviour that defaults to passing through to the original.
  4. (And naturally you'll have tests for this new code to verify that).
  5. If the class you’re overriding is not static, derive from the original class and hope that there’s a copy constructor, or be ready for potentially lots more work.
  6. Swap in new behaviour on the seam by just replacing old namespace directives (using System.IO, &c) with the new.

Finally, if I want to test TradeFileLoader I just need to inject my desired test behaviour into Path, File and XDocument -- after adding a method to ITrade and Trade to return the root element of the trade XML. It turns out that the arrange, act, assert flow looks quite natural:

namespace AbstractionExamples.Tests
{
    [TestFixture]
    public class Tests
    {
        [Test]
        public void ReturnsValidTrade_WhenGivenAValidFile()
        {
            File.ExistsImpl = 
                f => { if (f.Equals(@"root\file.trade")) return true; return false; };
            XDocument.LoadImpl = 
                f =>
                {
                    var doc = new XDocument(
                        System.Xml.Linq.XDocument.Parse("<root></root>"));
                    return doc;
                };

            var trade = TradeFileLoader.Load("root", "file");

            Assert.AreEqual("root", trade.Root.LocalName);
        }
    }
}

Tuesday 20 September 2011

Hitting the filesystem; mock it out, or abstract as a Store?

In legacy code it's entirely possible that code directly touches the filesystem, and there's no seam at which to modify behaviour. Assuming you want to get the code under test, you need to decide whether you're going to abstract out the filesystem calls directly, or introduce a higher-level abstraction that isolates filesystem operations altogether. Typically I'd look to isolate the filesystem at a high level with some sort of Store that abstracts access to the underlying persistence technology.

Over a long period either approach might lead to code with both solutions present.

Chris Oldwood and I have been discussing whether to start by introducing a Store, or with abstracting out the filesystem. To me the decision should be based on limiting the number of changes required to meet your goal; and limiting the number of goals.

Introducing a Store meets three goals: one, to abstract access to the technology in which you persist entities; two, to simplify the existing code; and three, to give you a seam along which to inject new behaviour.

On the other hand abstracting out the filesystem really only provides a seam.

It’s a fine point of difference. In principle both introduce a layer of abstraction under which you could swap new technologies. However, to use a Store you change the semantics of the interface, which involves changes in the existing code. If you abstract access to the filesystem then the existing code can remain unchanged – although filesystem semantics don’t map well to other technologies.

So using a Store involves more changes to existing code, and is also meeting more goals than you might immediately need – do you need to abstract the technology used at this point in time?

Take this code for example:

using System.IO;

...

public ITrade DecorateTrade(string rootFolder, string decorationType, ITrade trade)
{
    ITrade decoratedTrade = null;
    var file = String.Format("{0}.Decoration.dat", decorationType);
    file = Path.Combine(rootFolder, file);

    if(File.Exists(file))
    {
        decoratedTrade = Trade.LoadFromFile(file, trade);
    }
    
    return decoratedTrade;
}

Here DecorateTrade knows a lot about the filesystem – how to create a Path, what a file is and how to check that a file exists, and that it needs a filename to load from. We want to test this code but are wary because it touches the filesystem.

We can refactor by introducing a Store that abstracts access to the underlying entities. An example would be:

public class DecorationFileStore : IDecorationStore
{
    public DecorationFileStore(string rootFolder)
    {
        _rootFolder = rootFolder;
    }

    // Interface method
    public ITrade Decorate(string decoration, ITrade trade) { ... }

    private bool DecorationExists(string decorationType) { ... }

    private string _rootFolder;
}

This tends to force some changes on the existing code, however, which becomes:

//using System.IO; // no longer needed

...

public ITrade DecorateTrade(string rootFolder, string decorationType, ITrade trade)
{
    var store = new DecorationFileStore(rootFolder);

    return store.Decorate(decorationType, trade);
}

As you can see the original code must change to meet the new semantics of the Store interface. It's not bad -- it's just that as well as creating a new class, you've changed the existing code. The new code is actually much simpler; in some sense we’ve swapped local complexity for coarser type complexity. It could be simpler still, as the Store would become a member of the class, and wouldn't need to be instantiated as a local instance. Still, if your intent is to characterize the behaviour of the class you’ll want to make as few changes as possible.

Swapping local complexity for type system complexity like this might also lead to a proliferation of Store types, each operating as an abstraction of the filesystem, which may be unnecessarily complicated.

Alternatively then we could abstract the filesystem operations. Take these two new classes:

namespace IO.Abstractions
{
    public class File : IFile
    {
        public bool Exists(string path)
        {
            return System.IO.File.Exists(path);
        }    
    }

    public class Path
    {
        public string Combine(string first, string second)
        {
            return System.IO.Path.Combine(first, second);
        }
    }
}

These are intended to be direct passthroughs to the filesystem. By decorating the original class with new properties,

public IFile File { get; set; }

public Path Path { get; set; }

and by replacing references to System.IO with IO.Abstractions, we introduce a seam in the existing code without having to make any changes in that code. Instead of calling the static methods on System.IO classes, we call instance methods on IO.Abstractions classes through local properties. This means that we can now write tests for the system without changing the existing code.

// using System.IO; // no longer needed
using IO.Abstractions;

...

public IFile File { get; set; }

public Path Path { get; set; }

public ITrade DecorateTrade(string rootFolder, string decorationType, ITrade trade)
{
    ITrade decoratedTrade = null;
    var file = String.Format("{0}.Decoration.dat", decorationType);
    file = Path.Combine(rootFolder, file);

    if(File.Exists(file))
    {
        decoratedTrade = Trade.LoadFromFile(file, trade);
    }
    
    return decoratedTrade;
}

That is, almost -- the story's not quite finished. We still have a call on Trade.LoadFromFile(...) which hits the filesystem -- and we don't have any seam within Trade to use to isolate the filesystem ops. In theory Trade probably shouldn't know about the filesystem at all, but typically we want to minimize the number of changes needed to get under test.

In this case I tend to heal around this dependency. There's a number of ways to do it -- extracting the access as a public method and inheriting to create a test proxy class works, if your class isn't sealed or static. If it is sealed then you can introduce a seam by wrapping the access in another type, or by publishing a delegate that the existing code will call; either gives you a seam at which point you can override behaviour.
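As a sketch of the delegate option applied to Trade (the LoadFromFileImpl name is mine, and the signature is assumed from the call in DecorateTrade):

public class Trade : ITrade
{
    // Published seam: defaults to the real file-loading behaviour, but a test
    // can swap in its own delegate without touching the callers.
    public static Func<string, ITrade, ITrade> LoadFromFileImpl =
        (file, trade) => { /* original file-reading code */ return trade; };

    public static ITrade LoadFromFile(string file, ITrade trade)
    {
        return LoadFromFileImpl(file, trade);
    }
}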

Friday 16 September 2011

Rhino.Mocks and manual fakes

Habitually, I use Rhino Mocks. Chris Oldwood and I were discussing fakes and mocks, and he commented that he strongly disliked the Rhino syntax. He much preferred writing manual stubs with delegates for test-specific behaviour because the syntax was clearer and more portable.

He may have a point.

To compare, I knocked up an example test; it just checks the number of times that a method is called on some mocked component:

[Test]        
public void UsingRhinoMocks()        
{
    var mockComponent = MockRepository.GenerateMock<IComponent>();
    
    int expect = 10;            
    mockComponent.Expect(x => x.Operation(null))
        .IgnoreArguments()                
        .Return(new Result())
        .Repeat.Times(expect);                        

    var svc = new Service(mockComponent);            
    svc.ServiceMethod(expect);             

    mockComponent.VerifyAllExpectations();        
}

It’s brief, but somewhat hard to parse. In contrast, here’s the same test as Chris would have it:

[Test]        
public void UsingAManualFake()        
{            
    var fakeComponent = new FakeComponent();             

    int expect = 10;            
    bool called = false;            
    int calls = 0;            
    fakeComponent.Operation = () =>              
        {                  
            called = true;                  
            calls++;                  
            return new Result();              
        };             
    
    var svc = new Service(fakeComponent);            
    svc.ServiceMethod(expect);             

    Assert.IsTrue(called);            
    Assert.AreEqual(expect, calls);        
}

The points here are that the specific tests are clear – you don’t need to reparse the expectations by eye to see what the result should be; and that the setting of expectations is clearer – it looks like a coherent body of code, rather than the notionally fluent style used by Rhino.Mocks. Clearly I’ve left out the definition of the FakeComponent class, among other things – and there lies the principal problem. Fakes can become brittle, especially when shared between different test sets. You have to maintain that class as well as any callsites in the tests as the interface changes.
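For reference, FakeComponent is mostly boilerplate over the interface -- something like this, assuming an IComponent with a single Operation(string) method returning a Result:

// Sketch of a manual fake: expose the behaviour as a delegate the test can set,
// and implement the interface method by calling it.
public class FakeComponent : IComponent
{
    public Func<Result> Operation { get; set; }

    Result IComponent.Operation(string input)
    {
        return Operation();
    }
}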

I took the point about clarity and thought: most of the FakeComponent is actually boilerplate on top of the interface. It’d be easy to dynamically instantiate something using our own dynamic proxy.

But then, Rhino.Mocks is basically creating a dynamic proxy for us.

It turns out we can use Rhino.Mocks to do the same thing, with a helper method to instantiate the mock instance:

private static T GetMock<T,A>(Action<T> method, A del) where T : class        
{   
    // yuck ... but we don't have generic delegate constraints!         
    var action = del as Delegate;              
    
    var mockComponent = MockRepository.GenerateStub<T>();            
    mockComponent.Stub(method).IgnoreArguments().Do(action);            

    return mockComponent;        
}

and then the modified test looks like this, with specific asserts at the end, and a coherent non-fluent code body:

         
[Test]        
public void UsingRhinoMocksLikeAFake()        
{            
    int expect = 10;            
    bool called = false;            
    int calls = 0;            
    var method = new Action<IComponent>(x => x.Operation(null));
    var action = new Func<string, Result>(
        (x) => {
            called = true;
            calls++;
            return new Result();
         });             

    var mockComponent = GetMock(method, action);

    var svc = new Service(mockComponent);            
    var ret = svc.ServiceMethod(expect);             

    Assert.IsTrue(called);            
    Assert.AreEqual(expect, calls);        
}

One wrinkle is that now we need to explicitly define the method that will be called on IComponent, and also the behaviour we want to see when the method is called; it’s like a union of the Rhino and Fake approaches. It’s not quite as clear – but it’s only a line longer.

Tuesday 13 September 2011

Collections of anonymous types

A friend at work today wondered if we could make a list of anonymously-typed objects. It feels like we should be able to -- judging by LINQ -- but there didn't seem to be a mechanism for doing it directly. However, you can create a seed array and generate a list from that:

using System;
using System.Linq;

namespace ListOfAnonymous
{
    class Program
    {
        static void Main(string[] args)
        {
            var seedArray = new[] { new { Foo = "dingo", Bar = "bingo" } };
            var theList = seedArray.ToList();
            theList.Add(new { Foo = "ringo", Bar = "zingo" });
            theList.ForEach(x => Console.WriteLine(x.Foo));
            
            Console.ReadKey();
        }
    }
}

Better, you get compile-time checking of the instance you're adding to the new list -- this doesn't work:

            theList.Add(new { Foo = "ringo" });    // fail ....

And using a similar extension method you can create a dictionary, choosing a certain property as the key:

            var theDict = seedArray.ToDictionary(x => x.Foo);
            theDict["ringo"] = new {Foo = "ringo", Bar = "zingo"};
            theDict.Keys.ToList().ForEach(x => Console.WriteLine(theDict[x]));

after doing which you end up with a dictionary mapping your key property to instances of the anonymous type. This is a bit fiddly -- having to specify the keyname twice -- but if you have an anonymous instance already created then you can just use the right key property explicitly.
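For example, continuing the sketch above:

            // Using the instance's own key property avoids repeating the literal key.
            var item = new { Foo = "jingo", Bar = "pingo" };
            theDict[item.Foo] = item;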

Wednesday 24 August 2011

Nice LINQ Path.Combine trick

It’s a simple thing – but it’s a nice thing, instead of multiple explicit calls to Path.Combine:

using System;
using System.Linq;

namespace PathAggregation
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(
		AggregatePath("some", "interesting", "path")
            );
            
	    Console.ReadKey();
        }

        static string AggregatePath(params string[] pathParts)
        {
            return pathParts.Aggregate(System.IO.Path.Combine);
        }
    }
}

Wednesday 17 August 2011

System.BadImageFormatException with NUnit on 64bit Windows

This wasted a couple of hours until inspiration struck. In this case it was very useful to have a standalone app that ran the code through its paces without running unit tests.

I’d been developing tests for an x86-targeted library. The tests were fine on our 32bit workstations, but broke when we ran on our 64 bit build machine.

System.BadImageFormatException : Could not load file or assembly ‘XXX, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. An attempt was made to load a program with an incorrect format. 

Tests run on dev workstations went fine.

peverify for the assembly indicated that it was fine.

A simple app that just called LoadLibrary for the assembly indicated it was fine.

A sophisticated proprietary test app that ran the code through its paces behaved as expected.

Build machine was 64 bit OS, compared to 32 bit workstations. I assumed – because of the above – that the library was fine, so I couldn’t figure out why the test was failing. I couldn’t hit the right search on Google for it until it struck me that of course – it was just the nunit tests that were failing. Searching for:

nunit badimageformatexception 64 bit

immediately brought up a wealth of information. On 64 bit machines you need to run the specific x86-targeted nunit apps to test x86-targeted assemblies.
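In practice that means launching the x86-specific console runner rather than the default one -- for NUnit 2.x something along these lines (the assembly name here is just a placeholder):

nunit-console-x86.exe MyTests.dll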

http://stackoverflow.com/questions/208985/nunit-exe-cannot-work-on-vista-64bits-if-x86-build

Tuesday 16 August 2011

Today’s subtle waste of time

I hit a wrinkle with a TeamCity build today – the build was just hanging, unexplained.

TC was actually quite useful, showing that the build agent was running a script which had then spun up (basically)

rundll.exe shell32,openas_rundll some_file.ico

and it took me a little while rooting around to work out that something was opening the “Open as … “ dialog for the file some_file.ico. No wonder the build was hanging – there was no-one to acknowledge the dialog box.

I could not figure out why it was being spun up in the first place. The build was running a dos script that was simply zipping up a set of files, some_file.ico being one of them.

7za.exe <some args>^
second_file.dll^
third_file.dll^ 
some_file.ico^
some_other_file.dll^
...

That all seemed fine – and yet it’s almost as if some_file.ico was being opened, not appended to the zip that 7za was creating.

The trick lies in the line before. See it?

The build script running it had ‘^ ‘ at the end of a line instead of ‘^’.

(Frustratingly 7za actually thinks it has finished at that point and writes out “Everything is Ok”.)

Tuesday 9 August 2011

Tools for diagnosing library load problems

Today I had one of those days where you engage in an extended exercise of hide and seek, trying to work out why it works on our machines but not on yours. Yes, I was hunting dependencies for a software deployment.

As is often the case, today the dependencies were hidden underneath some third party code and that third party code was using LoadLibrary to load dependencies at runtime.

Furthermore we wipe out and reset the path – to include certain third party libraries – with a cmd we use to open our VS solution. If you happen to forget that for a moment then tracking down dependencies becomes that little bit harder.

Now, in general it's hard to give a one-size-fits-all process for tracking down library dependencies (yes, Tess had the right idea with her sets of debugging tutorials). On the other hand, a couple of tools didn't spring to mind as quickly as they should have, and some were unknown to members of the team I'm on, so I thought I'd write myself a list of tools useful in diagnosing these problems.

Visual Studio: In debug, use the Immediate Window to run

System.Environment.GetEnvironmentVariable("PATH")

to work out whether your path is being managed unexpectedly, and

System.IO.Directory.GetCurrentDirectory()

to work out where you are.

You can drag a module/dll onto VS and open up the manifest as well.
Fuslogvw: Helps you monitor managed assembly binding – and tells you where the runtime is trying to load assemblies from.
Depends: Dependency Walker determines the (native) dependency tree of a module, and tells you if anything's missing. If there's an hourglass next to a dependency it means it's late loaded – and may not be a problem.
Windbg: If you run an executable in windbg (Open Executable, 'g' to go) it'll tell you which modules it loads, and where from.

To debug LoadLibrary fails:

x MyModule!*LoadLib*

locates symbols in MyModule that reference LoadLib.

bu kernel32!LoadLibraryExW ".echo LoadLibraryExW for ->; du dwo(@esp+4); g"

sets a breakpoint for LoadLibrary calls and displays the name of the library that’s meant to be loaded.

These examples are drawn from this page of common windbg commands, which is a good one to have bookmarked.
Process Explorer: Can show you which modules have been loaded by a still-running process, and from where it loaded them.

Frequently Depends.exe says that ieshims.dll and wer.dll are missing – if I recall correctly this is associated with an upgrade to Internet Explorer 8, and can basically be discounted in most situations (google search for ieshims and wer.dll should clarify this).

If Depends.exe shows MSVC*{80,90,100}.dll as missing then you're missing a Visual C++ runtime redistributable. If you're missing one of the DebugCRT versions then you'll probably need to rebuild, as those are only available on machines with VS installed. Remember that you shouldn't assume that these redistributables are installed on machines in a cluster – you might need to package them up alongside your app code to make sure they get out there at deployment time.

Thursday 4 August 2011

Getting to github

Remember – if you have a git branch you’ve been committing to locally then all you have to do to get it to github – assuming you’ve already registered with github etc – is:

1. Log on to github on the web and create a suitable repository – like Alembic.Tools.

2. Add a remote, and push:

git remote add origin git@github.com:timbarrass/Alembic.Tools.git
git push origin master

Thursday 28 July 2011

Azure: pretty expensive for web hosting

I can almost hear the response now: No! Really? Anyhow …

I thought I’d take a look at prices for hosting a small website on random vanilla web hosts, and for the same on Azure and Amazon.

Random web hosts appear to sit around £2.49 a month for a small website, at least to start with.

An Azure Extra Small instance costs $0.05 per deployed hour – or a whopping $36 a month. You get an order of magnitude more storage at 20GB, but still .. I’m unlikely to need it.

Interestingly, I could also dump the whole site into Azure storage, at $0.15 per GB per month plus transaction fees. I don't know for sure, but I'm pretty sure it's not pro-rated for usage of less than a GB.

For comparison an Amazon Micro instance costs $0.03 an hour, or about $22 a month.
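
(For the arithmetic, assuming roughly 720 hours in a month: $0.05 × 720 ≈ $36, and $0.03 × 720 ≈ $21.60.)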

Or, I guess I could host something at home from a dynamic IP, which probably fits somewhere in the middle in terms of cost.

At any rate, at 10 times the cost Azure doesn’t look like a rational option.

Wednesday 27 July 2011

Constructing a task occupancy time series from discrete task start and end times

When finding my way around a new distributed system I often want to see a chart of the number of tasks running on a system at the same time but only have discrete task start and end time data. Even if time series data already exists, it’s often not broken down in a meaningful way – say by task type, or final status.

So: how do I build a time series like this from a set of discrete start and end times?

I tackle this in three steps. First, I build and sort lists of start and end times. Secondly, I process these two lists to get a task event dataset: every start takes a running count of tasks up, and every end takes it down. Finally I process that event dataset to build a time series based on discrete time points.

Building sorted lists of task start and end times is pretty straightforward. I typically dump start and end time data from the database, which might look like this:

2011-07-19 04:02:45.0730000,2011-07-19 04:03:21.6000000
2011-07-19 04:02:45.0730000,2011-07-19 04:03:44.7030000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.6670000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7200000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:58.4030000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7100000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7630000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000

and then use the excellent FileHelpers library to load them in. I make some assumptions about the data – that the earliest start is earlier than the first end point, at least.
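
A minimal FileHelpers record for a dump like that might look something like this – a sketch only; the class and field names are mine, and the date format is an assumption based on the timestamps above:

using System;
using FileHelpers;

// One row of the "start,end" dump shown above
[DelimitedRecord(",")]
public class TaskRecord
{
    [FieldConverter(ConverterKind.Date, "yyyy-MM-dd HH:mm:ss.fffffff")]
    public DateTime start;

    [FieldConverter(ConverterKind.Date, "yyyy-MM-dd HH:mm:ss.fffffff")]
    public DateTime end;
}

// ... then, to read the dump:
var engine = new FileHelperEngine<TaskRecord>();
var tasks = engine.ReadFile("tasks.csv");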

After that, sort the lists in place in ascending time order.

var starts = new List<DateTime>();
var ends = new List<DateTime>();
foreach(var task in tasks)
{
    starts.Add(task.start);
    ends.Add(task.end);
}

// DateTime is IComparable, so compare directly -- converting the tick
// difference to an int risks overflowing for large time differences
starts.Sort((a, b) => a.CompareTo(b));
ends.Sort((a, b) => a.CompareTo(b));

Merging the two lists to create an event dataset is a little trickier. I do it by processing either the start or the end list in order at any time, swapping between them when the timestamps in one jump ahead of the timestamps in the other. I keep a running count of tasks (the occupancy) – every start takes it up, and every end takes it down.

int currentIndex = 0;
int otherIndex = 0;
int runningCount = 0;
var currentList = starts;   // start with the starts list -- the earliest event is a start
var otherList = ends;
var incdec = 1;             // +1 while processing starts, -1 while processing ends

var timestamp = new List<DateTime>();
var value = new List<int>();

while (currentIndex < currentList.Count - 1)
{
    runningCount += incdec;
    timestamp.Add(currentList[currentIndex]);
    value.Add(runningCount);

    currentIndex++;

    // if the other list's next event is now earlier, swap lists (and the
    // direction of the running count) and carry on processing that one
    if (currentList[currentIndex] > otherList[otherIndex])
    {
        var tempIndex = currentIndex;
        currentIndex = otherIndex;
        otherIndex = tempIndex;

        var tempList = currentList;
        currentList = otherList;
        otherList = tempList;

        incdec *= -1;
    }
}

This generates a relative time series, with offsets from the number of tasks running at the start of the dataset. To constrain this I’d need to work out what that initial number of tasks is. Generally I’d count the number of tasks that started but didn’t end before my time series starts.
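
A quick way of getting that initial count from the same task records, sketched here with a made-up window start:

// Tasks already running at the start of the window: started before it,
// but not yet ended (needs System.Linq).
var windowStart = new DateTime(2011, 7, 18, 22, 0, 0);
var initialCount = tasks.Count(t => t.start < windowStart && t.end >= windowStart);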

Finally, create a discrete time series with evenly spaced points. I just generate a list of timestamp points and for each take the nearest previous recorded occupancy value. Generally I'll also take a highwater mark, as otherwise you lose a feel for any peaky behaviour between points.

var index = 0;
var dt = new TimeSpan(0, 5, 0);
var current = timestamp[0];

while (current < timestamp[timestamp.Count - 2])
{
	var next = current.AddTicks(dt.Ticks);

	var highwater = 0;
	while (index < timestamp.Count - 1 
		&& timestamp[index + 1] < next)
	{
		index++;
		if (highwater < value[index]) 
			highwater = value[index];
	}

	Console.WriteLine(current + "\t" + value[index] + "\t" + highwater);
	current = next;
}

You’ll notice this is quite loose; I’m not really worrying about interpolating to get values at exact time series points, and so on.

And that’s basically it. When building your series you can fold in extra information – about the task type for example – and use that to show how the system treats different task types.

Tuesday 26 July 2011

Fakes and mocks, and swapping between them

While working on some existing tests – they were building some chunky classes that I needed for some new tests – it took me a while to spot that 'program to an interface' applies just as much in tests as it does in real code, and that any objects used – and more particularly reused – in tests should be instantiated and configured in test setup, not in each test.

But then I thought – the whole way we do testing brings this on us. To test, we need to a) provide instances that implement the interfaces that the classes under test use, and b) provide instances whose behaviour we can modify and assert on arbitrarily.

The second point means that it gets hard to combine different sorts of fake, mock and stub in shared non-trivial structures. As soon as you share, you're typically constrained to using common interfaces in your shared structure – the same ones required by the classes under test. That immediately breaks all the tests, because every test needs to know about some decorating test interface in order to modify and assert behaviour.

It seems that even if you have a Builder-like mechanism that fully configures instances of either fakes or mocks or whatever for you – you still need something to assert that the desired behaviour was seen.

Sometimes, you might want to peek under the hood

So, another day, another thing unlearned.

Today, I faced a method that wasn’t under test. A private method. A big private method that was buried under a couple of layers of indirection. I could see at least one refactoring that would make it easier to manage – but first I wanted to get it under test.

The test I wrote is straightforward but does pretty much what I decided I wasn’t going to do in the last post: inject stuff to peek under the hood. I wanted to get a handle on the implementation.

Difference? This is legacy code. I want to characterise the behaviour in a test. Enough behaviour to make sure that any refactoring I do isn’t changing that behaviour. I want enough points of reference so that when I refactor the assertions I make on currently private behaviour become assertions on public behaviour.

Hmm: determining whether you should use existing implementation behaviour as a way of guiding your refactoring is probably worth a discussion in itself.

Anyhow: in my previous post I was talking about new code. Code where your tests are driving the development, and you don’t need to build up monster private methods, and you can test behaviour using public interfaces, happy in the knowledge that you can ignore any private implementation details.

In the meantime, I think I’m going to carry on examining implementation behaviour if it gives me some confidence in refactoring chunky legacy code.

Wednesday 13 July 2011

TDD tests aren’t unit tests; don’t be tempted to peek under the hood

Realise that simply focus-testing methods in isolation is wrong. The problem is, you probably don't realise you're doing it.

First of all, the assumption of isolation is a fallacy – you can’t just call a method; typically you have to call a constructor as well, at least. Kevlin Henney expresses this well.

Second, it encourages you to add backdoors to implementation so that you can bring an instance up to testable state:

public class CuriousList
{
    private IList<string> _underlying;

    public CuriousList(IList<string> underlying)
    {
        // Here the natural relationship between CuriousList and _underlying
        // is of composition, not aggregation. It's an implementation detail.
        // There's no need for me to be injecting it, and it exposes internal
        // state ...
        _underlying = underlying;
    }

    public void Add(string newString)
    {
        _underlying.Add(newString);
    }

    public void RuinListElements()
    {
        for(int i = _underlying.Count - 1; i >= 0; i--)
        {
            _underlying[i] = i.ToString();
        }
    }
}

When it comes to testing RuinListElements(), I might be happy to inject the underlying collection just so I could add some reasonable state to act on. Like this:

    
[TestFixture]
public class CuriousListTests
{
    [Test]
    public void CuriousList_AfterAnElementIsAdded_CanRuinIt()
    {
        var u = new List<string>() { "Hello" };

        var curiousList = new CuriousList(u);

        curiousList.RuinListElements();

        Assert.AreEqual("0", u[0]);
    }
}

Tempting as it is for tests, it leaves a big backdoor into what your object considers private state. Even if you add a helpful comment saying “this injection ctor to be used for testing only” it’s asking for trouble; you might mislead yourself by using it to reconstruct state that can’t be arrived at by your public methods. Chris Oldwood talks about something similar.

Avoid using injection for compositional relationships; only use it for aggregation. 

Instead, use public methods that bring the object neatly into the state you need in order to test what you want to test. Don’t be concerned that those same public methods are being tested in other tests. You need all your tests to be passing in any case; you don’t gain much by uncoupling individual tests.

Afterwards, use public methods to test the object’s state. Don’t peek under the hood to see what happened. Naturally you can query the state of any mocks you used; that’s an aggregational relationship and what they’re designed for.
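
By way of illustration – and assuming CuriousList exposed some public read access, say an indexer, instead of taking the underlying list in its constructor (it doesn't above; that's the change I'd make) – the test might look more like this:

[Test]
public void CuriousList_AfterAnElementIsAdded_CanRuinIt()
{
    var curiousList = new CuriousList();   // hypothetical default constructor

    curiousList.Add("Hello");              // public methods set up the state ...
    curiousList.RuinListElements();

    Assert.AreEqual("0", curiousList[0]);  // ... and a hypothetical public indexer checks it
}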

Remember that you might well end up with tests that map to single functions – especially for functions that don't change state, or that move the object from no-state to some-state. The tests will appear to be the same as those produced by simply focus-testing methods, but the intent is quite different.

All this frees you up to think about testing behaviour, not testing methods. A specific behaviour of your class will often, if not always, be an aggregate of method calls.

Tuesday 5 July 2011

Comparing SQLServer schema -- using Visual Studio

This is so darned useful I just had to make a note to remind myself of it. One of the trickiest things to automate is database continuous integration. In many cases you want to check that a from-scratch deployment of your DB looks the same as a DB created by incrementally applying changes to some production DB. One way to do that is to do both, and compare schema. It turns out that Visual Studio can be of great help to you in doing that.

Friday 4 March 2011

Comparing fluent interfaces for guard clauses [part 4]

In the last post on fluent guard checks I showed an interface that aggregates state, removing the trapdoor effect associated with multiple guards. It works, but might in some environments be an unnecessary contributor to memory pressure.

Performance-tweaked fluent chained guards

Rick Brewster came up with an alternative syntax that only instantiates a state object on the first fail. Otherwise, it just passes null. Again, the guard checks are extension methods that hang off the state object – even (I didn’t realise this worked) if the state object is null:

Validate.Begin()
    .IsNotNull(message, "message")
    .SomeOtherCondition(message, "message")
    .Check();

The state object looks like (in this case, a simple equivalent – Rick’s is more sophisticated):

public class Validation
{
    private readonly List<Exception> _exceptions = new List<Exception>(1);

    public List<Exception> Exceptions
    {
        get { return _exceptions; }
    }
}

Validate.Begin() looks like:

public static Validation Begin()
{
    return null;
}

An example guard check looks like:

public static Validation IsNotNull(
    this Validation state, 
    object value, 
    string name)
{
    if(value == null)
    {
        if(state == null)
        {
            state = new Validation();
        }
        state.Exceptions
            .Add(new ArgumentNullException(name));
    }
    return state;
}

Check() just checks for any exceptions, and throws a ValidationException if there are any, which wraps the Exception list -- just like in part 3.
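
A minimal sketch of that Check() – mine, not Rick's – which also has to cope with the null state that means no guard failed:

public static void Check(this Validation state)
{
    // a null state means nothing failed, so there's nothing to do
    if (state != null && state.Exceptions.Count > 0)
    {
        throw new ValidationException(state.Exceptions);
    }
}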

So, because the argument value and name aren't in the state object they need to be explicitly passed into each guard check; the check code is also more complicated because it does more than one thing.

This makes the syntax a little less clear to me, with a would-be-good-if-it-were-redundant Validate.Begin() always starting a guard chain, and with argument value and name sometimes appearing repeatedly.

Roundup

So, the decision about whether to avoid creating an unnecessary state object determines to some extent how clear that syntax is. Deciding whether to break fast or accumulate guard state just means that you might need to have something to terminate the chain.

Whether you really need to avoid the state object creation is really app-specific; you’d need to think and test. I’ve hit the need for chained guard checks a number of times though, so I’d definitely look to implement that. My ideal solution is somewhere between the two.

Also interesting: www.bikeshed.org.

Comparing fluent interfaces for guard clauses [part 3]

A fluent interface helps simplify the expression of multiple guard checks, but doesn’t avoid the trapdoor effect in which the first guard fail causes a break – because none of the guard state is propagated from check to check.

Aggregating state

To avoid the trapdoor problem, aggregate the guard result state as it's passed down the chain, then break if needed in some chain terminator.

The state object could just contain a list of exceptions to add to:

public class GuardState
{
    private readonly List<Exception> _exceptions = new List<Exception>(1);

    public List<Exception> Exceptions
    {
        get { return _exceptions; }
    }

    public object Value { get; set; }
    public string Name { get; set; }
}

Typical guards would then just add an exception to the list and pass it on:

public static GuardState IsNotNull(
    this GuardState state)
{
    if(state.Value == null)
    {
        state.Exceptions
            .Add(new ArgumentNullException(state.Name));
    }
    return state;
}

Finally, we need to terminate the guard chain and take action:

public static void Check(this GuardState state)
{
    if(state.Exceptions.Count > 0)
    {
        var ex = new GuardStateException(state.Exceptions);
        throw ex;
    }
}
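
Putting it together – with the Argument(…) entry point from part 2, below – a chain looks something like this:

Guard.Argument(message, "message")
    .IsNotNull()
    .SomeOtherCondition()
    .Check();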

This means you can now gather all useful state in one go before throwing, which then means you have a chance of solving multiple problems before releasing a fix. This is a rough draft – for example you might want to only instantiate the exception list on the first fail to limit memory usage – but it’s usable.

If it’s used in a high throughput environment memory pressure might be important, even though the guard state objects wouldn’t make it out of gen0. Rick Brewster came up with a syntax that sacrifices a little clarity to tweak the memory use of his guard interface – which I’ll talk about briefly in the next post.

Comparing fluent interfaces for guard clauses [part 2]

So Microsoft provide a Guard class that simplifies guard checks. It has a simple interface, but is a bit restricted and can look a bit clunky if you need to do multiple checks -- assuming you've extended the Guard class:

Guard.ArgumentNotNull(message, "message");
Guard.Contains("XYZ", message, "message");
// ...

We can simplify these chains a lot by using a fluent interface.

Fluently chaining guards

A fluent interface can be used to chain checks. It’d look something like this:

Guard.Argument(message, "message")
    .IsNotNull()
    .SomeOtherCondition();
Guard.Argument(count, "count")
    .IsPositive();

The guard checks are extension methods hanging off the underlying state object that’s passed from check to check. The state object is populated by Argument(…), and holds the argument value and name:

public class GuardState
{
    public object Value { get; set; }
    public string Name { get; set; }
}

and an example guard check looks like:

public static void IsNotNull(this GuardState state)
{
    if(state.Value == null) 
    { 
        throw new ArgumentNullException(state.Name);
    }
}
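
The Argument(…) entry point itself isn't shown above; a minimal version of it would be something like:

public static class Guard
{
    public static GuardState Argument(object value, string name)
    {
        // package the argument value and name up for the chained checks
        return new GuardState { Value = value, Name = name };
    }
}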

So with this technique I could simplify chained guard checks, but the code would break on the first guard fail. CuttingEdge.Conditions is a freely-available library that does much the same thing, in, I’d guess, the same way.

In some sense there’s not much difference between using this, and just using repeated calls to specific IsNotNull(value, “name”)  methods. To make it more useful, we’d need to start aggregating state and avoiding the guard trapdoor effect.

Comparing fluent interfaces for guard clauses [part 1]

Urgh, not again:

public void SomeMethod(string message)
{
    if (message == null)
    {
        throw new ArgumentNullException("message");
    }
}

Guard clauses are clunky. They’re also like a series of trap doors – you break on the first guard, patch and release only to find that you break on the next.

For the clunkiness, we could Extract Method, making things clearer:

public void SomeMethod(string message)
{
    IsNotNull(message, "message");
}

public void IsNotNull(string argumentValue, string argumentName)
{
    if (argumentValue == null)
    {
        throw new ArgumentNullException(argumentName);
    }
}

Eventually we might collect a few and gather them under a static class, as they're not state-dependent:

Guard.IsNotNull(message, "message");

And helpfully, there is a lightweight simplifying Guard class in the Microsoft.Practices.Unity app block: the Guard static class provides a series of methods for basic guard checks.

Guard.ArgumentNotNull(message, "message");

The Guard class only provides a limited range of checks – although it’s partial so could be extended. The interface is pretty simple though, and provides a good starting point.

However, if you need to make a series of guard checks the interface gets cluttered with a series of Guard references and reused arguments. That can be made a bit clearer with a simple fluent interface.

Thursday 24 February 2011

A simple paged IList, arrays of doubles and the LOH, part 3

In these posts I looked at creating a paged IList after finding that large arrays of doubles are stored on the LOH rather than the normal heap, and gave a simple first-cut implementation. Here I thought I’d compare the add-item behaviour with a comparable List<double>.

[Chart: just-adding-amortized-perf-results-1 – add performance of PagedList<double> vs List<double>]

There’s a lot of data here but there’s a couple of things I want to draw out.

Where the page size is large compared to the array size, performance is comparable to List<double>. Conversely, small page sizes are really not suited to large arrays, even though the PagedList has an advantage in that it doesn’t need to recopy all the data, as a List would.

What I want is for this chart to be flat, with similar performance for all PagedList total sizes. I want to knock out that big peak on the left hand side and make it perform more consistently.

One way to handle this is to follow List's lead and allow the amount of capacity added to vary, doubling in size every time it fills, achieving additions in constant amortized time. PagedList can do the same by doubling the pageSize every time the last page is filled.

The fundamental change comes in the AddPage method, where I double the size of _pageSize after adding each new page.

private void AddPage()
{
    var t = new T[_pageSize];
    _underlyingList.Add(t);
    _currentPage++;
    _currentPageHighwatermark = 0;
    CurrentPage = t;
    _pageSize *= 2;
}

The page size now defaults to 4, and a new constructor lets the PagedList be instantiated with a different initial page size.

private const int DefaultPageSize = 4;
private int _pageSize = DefaultPageSize;

public PagedList(int initialPageSize)
{
    _pageSize = initialPageSize;
    _underlyingList = new List<T[]>();
    _underlyingList.Add(new T[_pageSize]);
    _currentPage = _underlyingList.Count - 1;
    CurrentPage = _underlyingList[_currentPage];
}

Various consequences then ripple through the code, leading to changes in Count() and the indexer this[], which needs to map a global index to a page location and page index. I’ve left this as a pretty simple implementation for now:

private PageLocation PageLocationForIndex(int index)
{
    var l = new PageLocation();
    l.desiredPage = 0;

    int lower = 0;
    int upper = 0;
    foreach(var page in _underlyingList)
    {
        upper = lower + page.Length;
        if (index >= lower && index < upper)
        {
            break;
        }
        l.desiredPage++;
        lower = upper;
    }

    l.desiredPageIndex = index - lower;

    return l;
}

where PageLocation is a simple struct holding the ints desiredPage and desiredPageIndex.
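
The indexer then just delegates to it – a sketch, ignoring bounds checking:

public T this[int index]
{
    get
    {
        var l = PageLocationForIndex(index);
        return _underlyingList[l.desiredPage][l.desiredPageIndex];
    }
    set
    {
        var l = PageLocationForIndex(index);
        _underlyingList[l.desiredPage][l.desiredPageIndex] = value;
    }
}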

Looking at the performance then:

[Chart: just-adding-amortized-perf-results-final-debug-release – amortized add performance, debug and release builds]

That’s significantly better. Even in debug the peak is knocked down from ~10 to ~2 times, and in a release build it’s more comparable to List<double>.

However, I’m a little concerned that the relative times are increasing with array size for the PagedList<double>, which might indicate some O(n) or worse behaviour that needs looking at.

Also, I’m no longer meeting my main requirement; the PagedList<double> will, at some stage, store a double array larger than 1000 elements. I could fix that by capping – but I’ll look at that in a later post.

Wednesday 16 February 2011

A simple paged IList, arrays of doubles and the LOH part 2

So, first of all it might be interesting to check that large double arrays are put on the LOH, in case the internet lies. Bring on CLR Profiler to check allocations.

Tangentially: >=1000? Not >1024?

So, if I allocate a number of double arrays of size = 100, everything is allocated on the normal heap (in these CLR Profiler diagrams, System.Double[] allocations are in red):

[Image: small-double-arrays-objects-by-address]

… and with array size = 1000 everything is allocated on the Large Object Heap:

[Image: 1000-double-arrays-objects-by-address]

Finally, I can show that PagedList<double>s of total size 1000 but with a page size of 100 use space on the normal heap, not on the LOH:

[Image: paged-100pagesize-1000arraysize-oba]

Not sure about all those System.Double[][] – that might be something to look into; but with the allocations briefly confirmed, I want to do some very brief performance comparisons in the next post.

Friday 28 January 2011

Mocks and dependencies

On rationalizing class dependencies -- John Sonmez is exploring a number of these ideas at the moment, and is building a useful list of patterns for dealing with common scenarios.

Thursday 27 January 2011

First, check your dependencies; avoid breaking encapsulation

Recently I was bogged down attempting to mock out half the world when testing a method, only to find that a little thought would have made it irrelevant.

You don’t have to mock to isolate – sometimes you just need to understand your dependencies better.

This example is a case in point. I went down a couple of blind alleys because I didn’t consider the coupling between the method under test and other classes. It’s this one:

public void MethodUnderTest()
{
    UtilityClassProvider.Get<UtilityClass>().SomeUsefulMethod();
}

In this abstracted example the problem's pretty easy to spot. I started mocking a UtilityClassProvider and a UtilityClass, which was all rather more involved than necessary because of the way UtilityClassProvider instantiates a class, and because UtilityClass was sealed and couldn't be unsealed. Rhino Mocks complains if you attempt to mock a sealed class, and I couldn't extract an interface and use that, as UtilityClassProvider needs an instantiatable type to create – and so on.

But then -- UtilityClassProvider looks like it’s meant to be a builder. Surely it could be refactored into one, standardising the code a bit too. Refactoring UtilityClassProvider to a builder pattern would be worthwhile, even if it did mean the refactoring started to affect more than the class under test.

Probably best to limit the refactoring to begin with.

I looked at it and realised that this method didn't depend on UtilityClassProvider at all. At worst, it relies on a UtilityClass, and at best only on the effects of SomeUsefulMethod. The encapsulation-breaking of UtilityClassProvider.Get<UtilityClass>().SomeUsefulMethod() is simply ugly, and let me lead myself up the garden path.

So, in this case I extracted an IUtilityClass property with public accessors, and extracted an IUtilityClass interface from UtilityClass. This shifts the emphasis for creating (and injecting) the IUtilityClass instance to the calling code.

public sealed class UtilityClass : IUtilityClass
{
    public void SomeUsefulMethod()
    {
    }
}

public interface IUtilityClass
{
    void SomeUsefulMethod();
}

public class ClassUnderTest
{
    public IUtilityClass UtilityClass { get; set; }

    public void MethodUnderTest()
    {
        UtilityClass.SomeUsefulMethod();
    }
}

I could have gone a step further and removed the dependency on UtilityClass, and just extracted a method that proxied SomeUsefulMethod instead – but it would have required a call to an IUtilityClass anyway, and going this far removes the more awkward problems associated with the dependency on UtilityClassProvider.

With either solution I’ve removed direct dependencies from the method under test, meaning that the method is a) simpler and b) easier to isolate. The extracted method(s) represent logic that might be better provided by another object or structure – but that’s another step. I might recover a UtilityClassProvider from a container like Unity, for example.

Finally, now I can test the method either with a mock IUtilityClass or an actual UtilityClass, if the class is amenable to it (in this simple example it is; in real life it wasn’t, so mocking helped). I can then inject the mocked IUtilityClass, call MethodUnderTest and voila.

[TestFixture]
public class Tests
{
    [Test]
    public void Test()
    {
        var mockUtilityClass = MockRepository
            .GenerateMock<IUtilityClass>();
        mockUtilityClass
            .Expect(o => o.SomeUsefulMethod());
        
        var cut = new ClassUnderTest();
        cut.UtilityClass = mockUtilityClass;

        cut.MethodUnderTest();

        mockUtilityClass.VerifyAllExpectations();
    }
}

So, the moral is: if you find yourself refactoring half the world just so you can then mock it – check your dependencies first, as you might not be dependent on half of what you think you are.