December 31, 2004
TDD Pattern: Do not cross boundaries
(Note: This was originally drafted on 11/18/2003! I am finishing up old drafts for the new year)
TDD is fundamentally about writing tests (or 'checked examples' as Brian Marick calls them) before you write the code. One of the benefits of this is that the resultant code is decoupled from the rest of the system. This becomes particularly important when the code is near a boundary (e.g. it accesses a database, a queue, another system, or even an ordinary class if that class is "outside" the area you're trying to work with or are responsible for).
However, this benefit doesn't just happen; you have to want to find a way to structure the code so that it can be tested without resorting to resources beyond the test.
I get asked about how to do this a lot when working with people learning TDD, so I thought I'd write a bit about how you can meaningfully test at these boundaries.
Code written test-last (or test-not-at-all) tends to have this kind of dependency pattern:
a.methodOne creates a reference to class b and calls b.methodTwo which creates and calls c.methodThree -- which then accesses some external system:
a.1 -> b.2 -> c.3 -> external system.
However, TDD'd code tends to look like this:
methodA <- passed real object b <- constructed with fake object standing in for c
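To make the second shape concrete, here is a minimal Java sketch. All of the names (ExternalGateway, MiddleService, TopLevel, FakeGateway) are hypothetical stand-ins for "c", "b" and "a" above, not code from any real system:

```java
// "c": the only type that would touch the external system.
interface ExternalGateway {
    String fetch(String key);
}

// "b": a real object, but it is constructed with its gateway
// instead of creating one itself.
class MiddleService {
    private final ExternalGateway gateway;

    MiddleService(ExternalGateway gateway) {
        this.gateway = gateway;
    }

    String describe(String key) {
        return key + "=" + gateway.fetch(key);
    }
}

// "a": is passed a real b rather than creating it.
class TopLevel {
    String report(MiddleService service, String key) {
        return "[" + service.describe(key) + "]";
    }
}

// Test double standing in for "c"; returns a canned value
// and never crosses the boundary.
class FakeGateway implements ExternalGateway {
    public String fetch(String key) {
        return "canned";
    }
}
```

A test can then wire it up as new TopLevel().report(new MiddleService(new FakeGateway()), "k") and check the returned string, with no external system involved.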
It's often difficult for people new to TDD to find ways to avoid having to use the actual external system (e.g. they want to use a test database). The difficulty usually manifests as:
"I can't see any way to write this code without creating the reference to the accessing object -- which won't work without the other system there."
or:
"I can see that I could mock or otherwise stub the interface, but then the test becomes meaningless."
In either case, the code they envision (and sometimes have already written) has this form (in this case a database call written in Universal Pseudo Java (UPJ)):
void UpdateCustomer(data) {
db = new Database("SystemName")
cmd = db.CreateCommand("ModifyCustomer")
cmd.SetParam("name", data.Name)
cmd.SetParam("address", data.Address)
...
cmd.Execute()
}
And so they try to write a test like this:
void TestUpdateCustomer(data) {
data.Name = "John Smith"
data.Address = "123 Main St." //new address
obj.UpdateCustomer(data)
results = cmd.Execute("SELECT * FROM Customer WHERE name = 'John Smith'")
AssertEquals("123 Main St.", results["address"]);
}
And the first thing they say is: "How can I do that without the other system?" And they proceed to write setup code that populates a copy of the database, and generally create a test maintenance nightmare.
Now I know what you're thinking: "Yeah, Bill this is old hat (even back when I wrote this in 2003). You're going to say that they should stub out the database call, and populate a 'mock database' with some test data."
The problem with this approach is that you end up spending a lot of time creating and maintaining a simulated database. Give it the ability to set data, query data, and respond to different queries, and pretty soon you are writing your own In Memory DB.
Using a real IMDB is an option that some use, but I generally prefer a different solution -- and thus a different design (remember TDD is about design, that's why the tests go first).
The real goal of our code is to call a stored procedure on a database -- not to make sure that the database gets updated (that presumably is the job of the stored procedure). So let's start with a test that is concerned with what we really want to test: that our code correctly configures and calls the SP:
void UpdateCustomerCallsAppropriateStoredProcedureWithProperParameters() {
Customer data = new Customer()
data.Name = "John Smith"
data.Address = "123 Main St." //new address
FakeDatabase db = new FakeDatabase() //FakeDatabase implements IDatabase
obj.UpdateCustomer(db, data)
Assert(db.LastCommand != null)
Assert(db.LastCommand.Type == StoredProcedure)
Assert(db.LastCommand.Text == "ModifyCustomer")
Assert(db.LastCommand.Params["name"] == "John Smith")
Assert(db.LastCommand.Params["address"] == "123 Main St.")
}
(Note: The long ugly name serves as documentation. In fact, run a tool like TestDox on your tests and the name will be documentation).
Now our Update method takes a reference to its data accessor instead of creating it (this is the familiar IOC/Dependency Injection pattern).
This is a small change in our design, but a huge change in our test. Now that the test can control what db the call is made on, we can dispense with looking at the results of our database call, and focus our attention on what our code is actually doing.
The benefit of this small change is that our Fake Database doesn't need much logic at all now; it just needs to provide a way to get at the last Command, and override the 'Execute' method so that LastCommand is null if it's not called:
class FakeDatabase : IDatabase {
Command LastCommand { return _cmd }
void Execute(cmd) {
_cmd = cmd
}
}
This is much easier to work with than a "Mock" that manages data. (A similar approach can do away with the need for an Object Mother that creates a whole structure of domain objects that you wish to test against.)
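For anyone who wants something that compiles, here is one way the UPJ above might look in plain Java. It is only a sketch: the post never shows Command, IDatabase or Customer in full, so their shapes here (and the CustomerUpdater class that holds the method) are my assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Assumed shape: a command is a type, a text (the stored procedure name)
// and a bag of named parameters.
class Command {
    final String type;
    final String text;
    final Map<String, String> params = new LinkedHashMap<>();

    Command(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

// The boundary: the only thing the code under test knows about the database.
interface IDatabase {
    void execute(Command cmd);
}

class Customer {
    String name;
    String address;
}

class CustomerUpdater {
    // The database is passed in rather than constructed here.
    void updateCustomer(IDatabase db, Customer data) {
        Command cmd = new Command("StoredProcedure", "ModifyCustomer");
        cmd.params.put("name", data.name);
        cmd.params.put("address", data.address);
        db.execute(cmd);
    }
}

// The fake holds no data at all: it only remembers the last command.
class FakeDatabase implements IDatabase {
    Command lastCommand; // stays null until execute is called

    public void execute(Command cmd) {
        lastCommand = cmd;
    }
}
```

The test then builds a Customer, calls updateCustomer with a FakeDatabase, and asserts on lastCommand, exactly as in the UPJ test earlier.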
Incidentally, I've been playing a bit fast and loose with Fake vs Mock here. I tend to favor the view that Mocks are fancy Fakes that make internal assertions. Others make further distinctions between stubs, fakes and Mocks. For the purposes of this example, we have a fake object that allows us to query state.
Other mocking approaches allow us to query behavior. Our test could also look something like this:
void UpdateCustomerCallsAppropriateStoredProcedureWithProperParameters() {
Customer data = new Customer()
data.Name = "John Smith"
data.Address = "123 Main St." //new address
FakeDatabase db = new FakeDatabase() //FakeDatabase implements IDatabase
db.Expect("SetCommand", data.Name)
db.Expect("SetCommand", data.Address)
db.Expect("Execute")
obj.UpdateCustomer(db, data)
db.Verify()
}
Now we've described our expectations in terms of what UpdateCustomer will do rather than describing the resulting state of those actions.
There are libraries available for some languages that allow you to generate mocks that let you query behavior like this (e.g. JMock and NMock both do this).
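If you don't have such a library handy, the idea can be sketched by hand. The following is a deliberately tiny, hand-rolled version in Java; it is not the JMock or NMock API, and the names (MockDatabase, expect, verify, and the two-method IDatabase) are assumptions made up for this illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Assumed boundary interface for this sketch only.
interface IDatabase {
    void setParam(String name, String value);
    void execute(String procedure);
}

// Records the calls it expects and the calls it actually receives,
// then compares the two lists (order included) on verify().
class MockDatabase implements IDatabase {
    private final List<String> expected = new ArrayList<>();
    private final List<String> actual = new ArrayList<>();

    void expect(String call) {
        expected.add(call);
    }

    public void setParam(String name, String value) {
        actual.add("setParam(" + name + "," + value + ")");
    }

    public void execute(String procedure) {
        actual.add("execute(" + procedure + ")");
    }

    void verify() {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}

class CustomerUpdater {
    void updateCustomer(IDatabase db, String name, String address) {
        db.setParam("name", name);
        db.setParam("address", address);
        db.execute("ModifyCustomer");
    }
}
```

A test states its expectations first (for example db.expect("execute(ModifyCustomer)")), runs updateCustomer, and then calls db.verify(), which throws if the recorded calls differ from the expected ones. Real mocking libraries do far more (argument matchers, ordering rules, better messages), but the shape of the test is the same.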
Regardless of which mocking approach/philosophy/religion you decide on (in practice I use both), you can use either to test your code without having to cross a boundary to sense the results.
Database access code is just one kind of boundary coding problem, but the above pattern generally holds -- you can either mock the external facing interface (as above) or wrap it, if it doesn't provide a convenient interface (Michael's book has some great examples of how to do this).
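As a rough illustration of the "wrap it" option, suppose the boundary is a vendor class that only exposes static methods, so there is no interface to fake. Everything in this Java sketch is hypothetical (VendorQueue, MessageSink, VendorQueueSink, RecordingSink, OrderNotifier); it just shows the shape of a thin wrapper:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vendor API: static-only, so it cannot be substituted directly.
final class VendorQueue {
    static void push(String queueName, String payload) {
        throw new IllegalStateException("no real queue available here");
    }
}

// The interface we wish the vendor had given us.
interface MessageSink {
    void send(String payload);
}

// Thin production wrapper: no logic, just delegation across the boundary.
class VendorQueueSink implements MessageSink {
    private final String queueName;

    VendorQueueSink(String queueName) {
        this.queueName = queueName;
    }

    public void send(String payload) {
        VendorQueue.push(queueName, payload);
    }
}

// Test double: remembers what was sent instead of sending it.
class RecordingSink implements MessageSink {
    final List<String> sent = new ArrayList<>();

    public void send(String payload) {
        sent.add(payload);
    }
}

// Code under test depends only on the wrapper interface.
class OrderNotifier {
    private final MessageSink sink;

    OrderNotifier(MessageSink sink) {
        this.sink = sink;
    }

    void orderPlaced(int orderId) {
        sink.send("order-placed:" + orderId);
    }
}
```

The wrapper is kept so thin that leaving it untested in isolation costs little, while OrderNotifier can be tested entirely against RecordingSink.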
A remaining objection you might have: "OK, I see how that might work -- but I don't want my caller to have to pass in the wrapper reference, I don't want my callers to have the ability to make that decision." It's not an uncommon requirement, and you have several options.
For example, you can wrap this call with an external call, then pass the reference to your 'real' method. Your clients then call the wrapping call:
void UpdateCustomer(data) {
UpdateCustomerImpl(data, new RealDatabase("SomeSystem"))
}
You then test the Impl method just as we tested UpdateCustomer above, by passing it a fake. Check out the earlier Dependency Injection link for other ideas and patterns (and Michael's book -- I am not getting a kickback, it's just that I am reading it right now, and it really is relevant, which is why I chose this entry to de-draft first).
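In Java, one way to sketch this is a public method that picks the real database and a package-private method that takes the database as a parameter. The class names and the single-method IDatabase below are assumptions for illustration, and the RealDatabase here is a placeholder that only throws:

```java
import java.util.ArrayList;
import java.util.List;

// Assumed boundary interface for this sketch only.
interface IDatabase {
    void execute(String procedure);
}

// Placeholder for the real thing; an actual implementation
// would cross the boundary inside execute.
class RealDatabase implements IDatabase {
    private final String systemName;

    RealDatabase(String systemName) {
        this.systemName = systemName;
    }

    public void execute(String procedure) {
        throw new UnsupportedOperationException("would connect to " + systemName);
    }
}

class CustomerService {
    // What clients call: they cannot choose the database.
    public void updateCustomer(String name) {
        updateCustomerImpl(name, new RealDatabase("SomeSystem"));
    }

    // Package-private: visible to tests in the same package, which pass a fake.
    void updateCustomerImpl(String name, IDatabase db) {
        db.execute("ModifyCustomer:" + name);
    }
}

// Test double used to exercise the Impl method.
class RecordingDatabase implements IDatabase {
    final List<String> executed = new ArrayList<>();

    public void execute(String procedure) {
        executed.add(procedure);
    }
}
```

The public method is left as a one-liner with nothing in it worth unit testing; all the behavior lives in the Impl method, which a test reaches with a RecordingDatabase.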
In short, being sensitive to boundaries (at whatever level of abstraction) can help us identify those times when we might want to start looking for ways to stub/fake/mock/simulate. Changing our thinking from testing the result of the call, to finding out whether the call is correctly formatted/executed makes testing in isolation easier. The result is a piece of code tested without crossing the boundary -- and not coincidentally code that's highly decoupled, cohesive, and programmed to an interface. Another example of how the goals of TDD encourage us to write code that is well-designed.
Posted by wcaputo at December 31, 2004 05:29 PM
Nice. You explained very clearly points I've been trying to explain to a client for a month. I've sent the link to them, hope they read it.
How do we pound home the point that TDD is about design not validation?!?
Posted by: Curtis Cooley at January 5, 2005 11:03 AM
I find that my effectiveness in teaching others is inversely proportional to the amount of time I spend working with them, and directly proportional to their own desire to learn.
I can only (somewhat) control the former. They control the latter. No one controls the outcome.
TDD is a practice. To learn a practice you have to practice it.
Or as Thich Nhat Hanh says: "If you want to garden, you have to bend down and touch the soil. Gardening is a practice, not an idea."
So, I practice TDD, and I invite others to do so. That's how I 'pound' home the point that TDD is about design.
Posted by: Bill at January 8, 2005 12:38 PM
Great post Bill. I like the discussion on questions people have while learning. Parallels my experience fairly closely.
Posted by: Wayne Allen at February 17, 2005 05:17 PM
Thanks Wayne. This has been coming up a lot recently in my next engagements too. Funny how patterns emerge in lots of different places at the same time.
Posted by: bill at February 23, 2005 01:38 PM
Nice post Bill. You described exactly my experience with TDD and databases. Hopefully (for me) I came to the same conclusion :)
Anyway, for some simple applications, I don't dislike the use of an in-memory database (like HSQLDB).
Isn't one of the Agile mantras: "do the simple thing..."?
Posted by: Marco Trincardi at March 8, 2005 12:29 PM
Hi Marco, good to hear from you again!
I have nothing against IMDBs, especially for integration-level testing. But even for so-called TDD-style unit tests, I wouldn't necessarily rule it out if it wasn't painful. However, IME (which granted is mostly with larger systems) this approach may be easier initially, but it creates a tight coupling in the data layer that I find harder to work with after a few dozen tests. Perhaps more importantly, I find the above-mentioned approach simpler to do and to get started with than using any DB, so there's that too.
As for the mantra "do the simple thing" (and I have a problem with it being a mantra for this very reason) remember that it includes the second clause "that could possibly work" and is meant as a reminder not to overengineer, not as a license for us to do what is merely easy, simple-minded or otherwise inappropriate.
In this context however, I actually find the use of a real database the harder thing, because of how quickly I have to start setting up and using dummy data sets to test code that is merely formatting and executing calls on that data.
I did come to this position gradually however (i.e. gradually reducing the number of cases where I would look to a data access test first) because I kept seeing teams get burned by leaning on the database (in memory or otherwise) and then having a very hard time with data layer testing and refactoring as a result. YMMV.
Posted by: bill at March 17, 2005 02:45 PM