Thursday 28 July 2011

Azure: pretty expensive for web hosting

I can almost hear the response now: No! Really? Anyhow …

I thought I’d take a look at prices for hosting a small website on random vanilla web hosts, and for the same on Azure and Amazon.

Random web hosts appear to sit around £2.49 a month for a small website, at least to start with.

An Azure Extra Small instance costs $0.05 per deployed hour – or a whopping $36 a month. You get an order of magnitude more storage at 20GB, but still … I’m unlikely to need it.

Interestingly, I could also dump the whole site into Azure storage, at $0.15 per GB per month plus transaction fees. I don’t know for sure, but I’m pretty sure it isn’t pro-rated for usage under a GB.

For comparison an Amazon Micro instance costs $0.03 an hour, or about $22 a month.
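The monthly figures above are just back-of-envelope arithmetic, assuming a ~730-hour month:

```csharp
using System;

public static class HostingCosts
{
    public const double HoursPerMonth = 730; // 24 * 365 / 12, near enough

    public static double MonthlyCost(double dollarsPerHour)
    {
        return dollarsPerHour * HoursPerMonth;
    }

    public static void Main()
    {
        Console.WriteLine("Azure Extra Small: ${0:F2}", MonthlyCost(0.05)); // ~$36.50
        Console.WriteLine("Amazon Micro:      ${0:F2}", MonthlyCost(0.03)); // ~$21.90
    }
}
```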

Or, I guess I could host something at home from a dynamic IP, which probably fits somewhere in the middle in terms of cost.

At any rate, at 10 times the cost Azure doesn’t look like a rational option.

Wednesday 27 July 2011

Constructing a task occupancy time series from discrete task start and end times

When finding my way around a new distributed system I often want to see a chart of the number of tasks running on a system at the same time but only have discrete task start and end time data. Even if time series data already exists, it’s often not broken down in a meaningful way – say by task type, or final status.

So: how do I build a time series like this from a set of discrete start and end times?

I tackle this in three steps. First, I build and sort lists of start and end times. Secondly, I process these two lists to get a task event dataset: every start takes a running count of tasks up, and every end takes it down. Finally I process that event dataset to build a time series based on discrete time points.

Building sorted lists of task start and end times is pretty straightforward. I typically dump start and end time data from the database, which might look like this:

2011-07-19 04:02:45.0730000,2011-07-19 04:03:21.6000000
2011-07-19 04:02:45.0730000,2011-07-19 04:03:44.7030000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.6670000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7200000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:58.4030000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7100000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000
2011-07-18 22:58:52.1800000,2011-07-18 22:58:52.7630000
2011-07-18 22:58:53.3500000,2011-07-18 22:58:57.7700000

and then use the excellent FileHelpers library to load them in. I make some assumptions about the data – at the very least, that the earliest start comes before the earliest end.
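FileHelpers handles this declaratively with a [DelimitedRecord] class; if you want to see the moving parts, a dependency-free equivalent is only a few lines. TaskRecord and its field names are mine – match them to whatever your dump actually contains:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class TaskRecord
{
    public DateTime start;
    public DateTime end;
}

public static class TaskLoader
{
    // Matches the SQL Server datetime2 dump format above,
    // e.g. 2011-07-19 04:02:45.0730000
    const string Format = "yyyy-MM-dd HH:mm:ss.fffffff";

    public static List<TaskRecord> Load(IEnumerable<string> lines)
    {
        var tasks = new List<TaskRecord>();
        foreach (var line in lines)
        {
            var parts = line.Split(',');
            tasks.Add(new TaskRecord
            {
                start = DateTime.ParseExact(parts[0], Format, CultureInfo.InvariantCulture),
                end = DateTime.ParseExact(parts[1], Format, CultureInfo.InvariantCulture)
            });
        }
        return tasks;
    }
}
```

Then `var tasks = TaskLoader.Load(File.ReadLines("tasks.csv"));` gives the same list of tasks the loop below iterates over.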

After that, sort the lists in place in ascending time order.

var starts = new List<DateTime>();
var ends = new List<DateTime>();
foreach(var task in tasks)
{
	starts.Add(task.start);
	ends.Add(task.end);
}

// DateTime implements IComparable<DateTime>, so the default sort is
// ascending by time; a hand-rolled tick-difference comparator risks
// Int32 overflow for widely spaced timestamps.
starts.Sort();
ends.Sort();

Merging the two lists to create an event dataset is a little trickier. I do it by processing either the start or the end list in order at any time, swapping between them when the timestamps in one jump ahead of the timestamps in the other. I keep a running count of tasks (the occupancy) – every start takes it up, and every end takes it down.

int currentIndex = 0;
int otherIndex = 0;
int runningCount = 0;
var currentList = starts;
var otherList = ends;
var incdec = 1;

var timestamp = new List<DateTime>();
var value = new List<int>();

while (currentIndex < currentList.Count)
{
	runningCount += incdec;
	timestamp.Add(currentList[currentIndex]);
	value.Add(runningCount);

	currentIndex++;
	// Swap to the other list when its next timestamp comes first,
	// or when the current list is exhausted.
	if (currentIndex >= currentList.Count
		|| (otherIndex < otherList.Count
			&& currentList[currentIndex] > otherList[otherIndex]))
	{
		var tempIndex = currentIndex;
		currentIndex = otherIndex;
		otherIndex = tempIndex;
		var tempList = currentList;
		currentList = otherList;
		otherList = tempList;

		incdec *= -1;
	}
}

This generates a relative time series: offsets from however many tasks were running at the start of the dataset. To turn it into an absolute occupancy I’d need to work out that initial number of tasks – generally by counting the tasks that started, but hadn’t ended, before my time series begins.
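Counting those in-flight tasks is straightforward, as long as it’s done on the paired per-task times – after the start and end lists are sorted independently, the pairing is lost. A sketch, where seriesStart is whatever timestamp the series begins at:

```csharp
using System;
using System.Collections.Generic;

public static class Occupancy
{
    // Count tasks already running at seriesStart: started before it,
    // still unfinished at it. Use the paired per-task times, not the
    // independently sorted lists.
    public static int InitialCount(IList<DateTime> taskStarts,
                                   IList<DateTime> taskEnds,
                                   DateTime seriesStart)
    {
        int count = 0;
        for (int i = 0; i < taskStarts.Count; i++)
        {
            if (taskStarts[i] < seriesStart && taskEnds[i] >= seriesStart)
                count++;
        }
        return count;
    }
}
```

Seeding runningCount with this value before replaying the events turns the relative series into an absolute one.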

Finally, create a discrete time series with evenly spaced points. I just generate a list of timestamp points, and for each take the most recent recorded occupancy value. Generally I’ll also take a high-water mark, as otherwise you lose any feel for peaky behaviour between points.

var index = 0;
var dt = new TimeSpan(0, 5, 0);
var current = timestamp[0];

while (current < timestamp[timestamp.Count - 1])
{
	var next = current.AddTicks(dt.Ticks);

	// Seed the high-water mark with the value carried in from the
	// previous interval, so quiet intervals don't report zero.
	var highwater = value[index];
	while (index < timestamp.Count - 1 
		&& timestamp[index + 1] < next)
	{
		index++;
		if (highwater < value[index]) 
			highwater = value[index];
	}

	Console.WriteLine(current + "\t" + value[index] + "\t" + highwater);
	current = next;
}

You’ll notice this is quite loose; I’m not really worrying about interpolating to get values at exact time series points, and so on.

And that’s basically it. When building your series you can fold in extra information – about the task type for example – and use that to show how the system treats different task types.
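For example, if the dump also carries a task type column (the Type field here is hypothetical), splitting the data per type is just a GroupBy before running the steps above on each bucket:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TypedTask
{
    public DateTime start;
    public DateTime end;
    public string Type; // hypothetical extra column from the dump
}

public static class SeriesSplitter
{
    // One bucket of tasks per type; feed each bucket through the
    // sort/merge/discretise steps independently to get one series per type.
    public static Dictionary<string, List<TypedTask>> ByType(IEnumerable<TypedTask> tasks)
    {
        return tasks.GroupBy(t => t.Type)
                    .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```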

Tuesday 26 July 2011

Fakes and mocks, and swapping between them

While working on some existing tests – they were building some chunky classes that I needed for some new tests – it took me a while to spot that “program to an interface” applies just as much in tests as in real code, and that any objects used – and more particularly reused – in tests should be instantiated and configured in test setup, not in each test.

But then I thought – the whole way we do testing brings this on us. To test, we need to a) provide instances that implement the interfaces the classes under test use, and b) provide instances whose behaviour we can modify and assert on arbitrarily.

The second point means it gets hard to combine different sorts of fake, mock and stub in shared non-trivial structures. As soon as you share, you’re typically constrained to using common interfaces in your shared structure – the same ones required by the classes under test. That immediately breaks all tests, because every test needs to know about some decorating test interface in order to modify and assert behaviour.

It seems that even if you have a Builder-like mechanism that fully configures instances of fakes or mocks or whatever for you, you still need something to assert that the desired behaviour was seen.

Sometimes, you might want to peek under the hood

So, another day, another thing unlearned.

Today, I faced a method that wasn’t under test. A private method. A big private method that was buried under a couple of layers of indirection. I could see at least one refactoring that would make it easier to manage – but first I wanted to get it under test.

The test I wrote is straightforward but does pretty much what I decided I wasn’t going to do in the last post: inject stuff to peek under the hood. I wanted to get a handle on the implementation.

Difference? This is legacy code. I want to characterise the behaviour in a test – enough behaviour to make sure that any refactoring I do isn’t changing it. I want enough points of reference so that when I refactor, the assertions I make on currently-private behaviour become assertions on public behaviour.

Hmm: determining whether you should use existing implementation behaviour as a way of guiding your refactoring is probably worth a discussion in itself.

Anyhow: in my previous post I was talking about new code. Code where your tests are driving the development, and you don’t need to build up monster private methods, and you can test behaviour using public interfaces, happy in the knowledge that you can ignore any private implementation details.

In the meantime, I think I’m going to carry on examining implementation behaviour if it gives me some confidence in refactoring chunky legacy code.

Wednesday 13 July 2011

TDD tests aren’t unit tests; don’t be tempted to peek under the hood

Simply focus-testing methods in isolation is wrong. The problem is, you probably don’t realise you’re doing it.

First of all, the assumption of isolation is a fallacy – you can’t just call a method; typically you have to call a constructor as well, at least. Kevlin Henney expresses this well.

Second, it encourages you to add backdoors to implementation so that you can bring an instance up to testable state:

public class CuriousList
{
    private IList<string> _underlying;

    public CuriousList(IList<string> underlying)
    {
        // Here the natural relationship between CuriousList and _underlying
        // is of composition, not aggregation. It's an implementation detail.
        // There's no need for me to be injecting it, and it exposes internal
        // state ...
        _underlying = underlying;
    }

    public void Add(string newString)
    {
        _underlying.Add(newString);
    }

    public void RuinListElements()
    {
        for(int i = _underlying.Count - 1; i >= 0; i--)
        {
            _underlying[i] = i.ToString();
        }
    }
}

When it comes to testing RuinListElements(), I might be happy to inject the underlying collection just so I could add some reasonable state to act on. Like this:

[TestFixture]
public class CuriousListTests
{
    [Test]
    public void CuriousList_AfterAnElementIsAdded_CanRuinIt()
    {
        var u = new List<string>() { "Hello" };

        var curiousList = new CuriousList(u);

        curiousList.RuinListElements();

        Assert.AreEqual("0", u[0]);
    }
}

Tempting as it is for tests, it leaves a big backdoor into what your object considers private state. Even if you add a helpful comment saying “this injection ctor to be used for testing only” it’s asking for trouble; you might mislead yourself by using it to reconstruct state that can’t be arrived at by your public methods. Chris Oldwood talks about something similar.

Avoid using injection for compositional relationships; only use it for aggregation. 

Instead, use public methods that bring the object neatly into the state you need in order to test what you want to test. Don’t be concerned that those same public methods are being tested in other tests. You need all your tests to be passing in any case; you don’t gain much by uncoupling individual tests.

Afterwards, use public methods to test the object’s state. Don’t peek under the hood to see what happened. Naturally you can query the state of any mocks you used; that’s an aggregational relationship and what they’re designed for.
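Applied to CuriousList, that might look like this: the composition stays private, the injection ctor goes, and a public indexer – my addition, and it only works if it’s a legitimate part of the interface rather than a test backdoor – gives tests something to assert on:

```csharp
using System;
using System.Collections.Generic;

public class CuriousList
{
    // Composition: the underlying list is an implementation detail again.
    private readonly List<string> _underlying = new List<string>();

    // Public query - part of the real interface, not a test-only hook.
    public string this[int i]
    {
        get { return _underlying[i]; }
    }

    public void Add(string newString)
    {
        _underlying.Add(newString);
    }

    public void RuinListElements()
    {
        for (int i = _underlying.Count - 1; i >= 0; i--)
        {
            _underlying[i] = i.ToString();
        }
    }
}
```

The test then builds its state through Add() and asserts through the indexer – no injected list, nothing to peek at.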

Remember that you might well end up with tests that map to single functions – especially for functions that don’t change state, or that move the object from no-state to some-state. The tests will appear to be the same as those produced by simply focus-testing methods, but the intent is quite different.

All this frees you up to think about testing behaviour, not testing methods. A specific behaviour of your class will often, if not always, be an aggregate of method calls.

Tuesday 5 July 2011

Comparing SQL Server schemas -- using Visual Studio

This is so darned useful I just had to make a note to remind myself of it. One of the trickiest things to automate is database continuous integration. In many cases you want to check that a from-scratch deployment of your DB looks the same as a DB created by incrementally applying changes to some production DB. One way to do that is to do both, and compare schemas. It turns out that Visual Studio can be a great help in doing that.