Why a Baseline Is Harder Than It Looks Here

A household baseline looks like a solved problem: draw a sample, write a questionnaire, send out enumerators, clean the data. In a stable setting with a recent census and reliable address frames, that is roughly true. In a federal context where the last full enumeration is old, settlement patterns shift with displacement, and some districts cannot be visited on a fixed schedule, every one of those steps needs a deliberate decision behind it.

The cost of getting it wrong is not abstract. A baseline sets the denominator against which a whole programme will later be judged. If the sample frame is quietly biased toward the accessible parts of a region, every midline and endline comparison inherits that bias — and no amount of careful analysis at the end can undo a flawed frame at the start.

“The honest test of a baseline is not how clean the final dataset looks. It is whether someone two years from now can trust the comparison you set up today.”

What works is treating the sampling design, the field protocol and the indicator definitions as one connected problem, decided up front and documented, rather than three separate tasks handed to different people.

Building a Sample Frame You Can Defend

With no single reliable address register, we built the frame in layers: settlement and enumeration-area data cross-checked against recent satellite imagery and local knowledge, then a probability-proportional-to-size selection of clusters, then a listing exercise inside each selected cluster before any interviews began.

Each step was anchored to a specific safeguard:

  • Cluster selection was documented with its random seed and selection probabilities, so the frame can be reconstructed and audited later.
  • In-cluster listing happened as a separate visit before interviewing, so households were not selected by enumerators on the day — the single biggest source of convenience bias.
  • Replacement rules were written down in advance: which substitutions were allowed, and which would instead be recorded as non-response rather than quietly swapped.
  • Inaccessible clusters were flagged and reported, not silently dropped — so the coverage limits of the baseline are visible in the final report.

That last point matters most. A baseline that is honest about where it could not go is more useful than one that hides the gap.

Survey instrument and field data
Field data work always precedes analysis. You cannot write a credible indicator until you know, concretely, how the question behind it was asked and who was able to answer it.

What GPS-Enabled Enumeration Actually Buys You

Moving from paper to GPS-enabled digital collection is often sold as a speed gain. The real value is verifiability. Three drafting moves made the difference on this study:

1. Time-and-Place Stamps

Each interview carried a timestamp and a location. That let supervisors check, the same evening, that interviews happened where and when the protocol said they should — catching problems while there was still time to re-visit, not after demobilisation.

2. Built-in Validation

Logic checks and range limits were built into the instrument, so the most common data-entry errors were caught at the point of collection rather than weeks later in cleaning. The dataset that came back was closer to analysis-ready than any paper exercise we have run.

3. A Defensible Audit Trail

When a client or an evaluator later asks “how do you know this figure is real?”, the answer is a documented chain from question wording to enumerator to timestamp to record. That trail is what holds up under scrutiny — and it is exactly what an access-constrained context makes hardest to produce by hand.

What We Are Deliberately Not Claiming

To be explicit: a baseline measures a starting point. It does not, on its own, prove that any later change was caused by a programme — that is the job of the evaluation design that follows, and we keep the two clearly separated. Where a cluster could not be reached, we report it as a coverage limit rather than smoothing over it.

That posture — rigorous on method, honest about limits — is what makes the data usable. A programme team is better served by a baseline that names its gaps than by one that presents a tidy national figure the underlying sample cannot actually support.

What Comes Next

The dataset from this baseline feeds directly into the results framework and indicator definitions for the programme it supports, and into the design of the midline that will follow. Where the same regions appear in later assignments, the frame and listing work done here is reused — so each study makes the next one faster and more comparable.

If you are a Somali institution or a partner planning a study in a similar context and would like the longer methodological note, get in touch via the contact page.