Pilot studies

Pilot studies are small-scale versions of the planned main experiment. They can help test the environment or variables and fine-tune the final experimental design. Pilot studies are especially useful before very large experiments, where avoidable errors would be expensive. They can also provide some insight when a full-scale experiment would be too complex or costly.

Pilot studies can provide proof-of-principle for an experimental approach at much lower expense than the full project. They can also establish the appropriate range of values for certain variables, which reduces the number of trials required in the full experiment.


Variables are factors present in an experiment. Often, they are the factors that we are primarily interested in, e.g. testing a food variable against an age variable. Because experiments take place in the real world, many confounding variables can present themselves, more or less clearly. Confounding variables are variables whose effects overlap, so that it is not obvious whether a result is due to one variable or another. For example, in diets that contain both fat and sugar, these components confound each other in a study that wants to look only at the effect of fat or of sugar.

In order to separate confounding variables from each other, a technique called randomised block design is used: participants are grouped into blocks, and the experiment is then carried out independently within each block. For example, participants can be split by weight, diet, sex, height, etc. so that any findings cannot be attributed to these pre-split factors.
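The blocking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant records, the "sex" blocking factor, the treatment names and the function name randomised_block_assignment are all hypothetical, and a real study would choose its blocks from the confounders it expects.

```python
import random
from collections import defaultdict


def randomised_block_assignment(participants, block_key, treatments, seed=0):
    """Group participants into blocks, then randomise treatments within each block."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable

    # Step 1: form the blocks from the pre-split factor (e.g. sex).
    blocks = defaultdict(list)
    for person in participants:
        blocks[person[block_key]].append(person)

    # Step 2: shuffle each block and deal the treatments out in turn,
    # so every block receives a balanced mix of treatments.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for position, person in enumerate(shuffled):
            assignment[person["id"]] = treatments[position % len(treatments)]
    return assignment


# Hypothetical participants: four in each block.
participants = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(8)]
assignment = randomised_block_assignment(participants, "sex", ["control", "test"])
```

Because treatments are balanced inside every block, a difference between the control and test groups cannot simply reflect a difference in the blocking factor.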

Variables differ in the type of data they produce. Discrete variables take separate, distinct values with nothing in between, e.g. colours, whole numbers of things, blood groups. Continuous variables can take any value along a connected range, e.g. height, width, solute concentration.

As such, the data derived can be qualitative (green, blood group B), quantitative (1.55 m, 65 nm, 50 nM NaCl) or ranked (low, moderate, high intensity).

Different data can be displayed and analysed in different ways, e.g. graphically or statistically. Representations include bar charts, scatter plots, images, photos and tables. The type of data determines which methods can be used to represent and analyse it.

Experimental design

Things to consider when developing an experimental design are the controls and the dependent and independent variables. Controls are experiments that establish a baseline, or use a default, standard or neutral state, to ensure the actual test experiment does not merely produce background results, or results that are not due to the tested variable.

Dependent variables are the target variables - these are observed against the independent variable to check what is happening. The independent variable is a known variable whose values are set by the experimenter, e.g. time, distance, concentration. The dependent variable is what is measured against it, e.g. colour, substrate breakdown, growth.

As such, the independent variable is controlled and set up by the experimenter to be able to test and observe the progress of the dependent variable against it.

Simple experiments use one independent variable only. These are often carried out in vitro, i.e. in the lab using simple reagents like chemicals or cells. The downside to these easier, more straightforward experiments is that their findings may not be as applicable to a wider real-life context as those of in vivo experiments, which are carried out in living organisms.

Complex experiments, such as those in vivo, use multifactorial experimental designs. They involve more than one independent variable, as well as more confounding variables.

In observational studies, the experimenter does not set an independent variable, as the groups already exist in the wild. These studies can reveal correlations between variables, but are less useful for revealing causation.


Positive and negative controls are used for experiments. Controls ensure that the outcome of the experiment is what it seems to be.

Positive controls give a reference point for what the result would look like if it worked, while negative controls give a reference point for what the result would look like if it didn't work.

A positive control for PCR might be a PCR reaction identical to the one we are running as an experiment, but instead of the test template DNA we add a different template DNA that we know will work based on previous data. If the experiment fails but the positive control works, we can be sure that the PCR reaction was set up correctly and that the issue lies with the test template DNA.

A negative control for PCR requires a little less sophistication, and might involve using the same PCR reaction while omitting template DNA altogether. If our experiment with the test template DNA seems to have worked, but the result looks the same as the negative control, then we can be sure that it has not actually worked, and that the result has another cause, e.g. contamination, background signal or the PCR ingredients themselves.
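The reasoning behind the two PCR controls can be written out as a small decision function. This is a minimal sketch, not a laboratory protocol: the function name interpret_pcr and the idea of reducing each reaction to "product seen or not" are assumptions made for illustration. The case where the positive control itself fails is not covered in the text above and is included here as an assumption.

```python
def interpret_pcr(test_product, positive_product, negative_product):
    """Interpret a PCR run from whether each reaction produced a product.

    Each argument is True if a product was seen in that reaction.
    """
    if negative_product:
        # The no-template reaction should be empty; any product here points to
        # contamination, background signal or the PCR ingredients themselves.
        return "untrustworthy: negative control shows a product"
    if not positive_product:
        # Assumption beyond the text: if the known-good template fails,
        # the reaction mix or conditions are suspect.
        return "untrustworthy: positive control failed"
    if test_product:
        return "test template amplified"
    # Reaction worked for the known template, so the test template is the issue.
    return "reaction worked, but the test template did not amplify"


# Experiment failed while the positive control worked:
print(interpret_pcr(test_product=False, positive_product=True, negative_product=False))
```

Checking the negative control first reflects the point made above: a result that looks the same as the negative control cannot be attributed to the test template.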


When sampling things from the field, it is key to ensure representative sampling. This depends on the variability in the population. The more variable a population, the larger the sample would have to be in order to be representative.

The mean of a sample, and the variation around it, should closely match those of the actual population. It is impractical to measure every member of a population, so sampling is necessary.
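One way to see why a more variable population needs a larger sample is through the standard error of the mean, which is the standard deviation divided by the square root of the sample size. The sketch below rearranges that relationship to estimate a sample size. Using the standard error as the yardstick for "representative" is an assumption made here for illustration, and the function name and numbers are hypothetical.

```python
import math


def required_sample_size(population_sd, target_standard_error):
    """Smallest n for which population_sd / sqrt(n) <= target_standard_error."""
    if target_standard_error <= 0:
        raise ValueError("target_standard_error must be positive")
    return math.ceil((population_sd / target_standard_error) ** 2)


# Same target precision for the sample mean, two hypothetical populations.
low_variability = required_sample_size(population_sd=2.0, target_standard_error=0.5)
high_variability = required_sample_size(population_sd=4.0, target_standard_error=0.5)
print(low_variability, high_variability)  # 16 64
```

Doubling the variability quadruples the sample needed for the same standard error, which is why highly variable populations are expensive to sample well.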

Random sampling involves an equal probability of any member of the population being chosen. Systematic sampling involves sampling that takes place at intervals e.g. using transects. Stratified sampling involves splitting the population into categories and then sampling from the categories. This is useful when these categories already exist, or there is a reason why they might be confounding variables unless split, as seen in randomised block design.
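The three sampling strategies can be compared side by side in a short Python sketch. The population of numbered quadrats, the "shade"/"open" categories and all function names are hypothetical and chosen only for illustration.

```python
import random


def random_sample(population, n, rng):
    """Every member has an equal chance of being chosen."""
    return rng.sample(population, n)


def systematic_sample(population, step, start=0):
    """Take every step-th member, as along a transect."""
    return population[start::step]


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into categories, then sample randomly within each."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


rng = random.Random(1)          # fixed seed so the illustration is repeatable
quadrats = list(range(20))      # hypothetical quadrat numbers along a field


def habitat(quadrat):
    # Hypothetical rule: the first ten quadrats are shaded, the rest are open.
    return "shade" if quadrat < 10 else "open"


simple = random_sample(quadrats, 4, rng)
transect = systematic_sample(quadrats, step=5)
by_habitat = stratified_sample(quadrats, habitat, 2, rng)
```

Stratified sampling guarantees that both habitats are represented, whereas a small simple random sample could by chance fall entirely in one of them.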

Ensuring reliability

Reliability refers to the extent to which the same results would be obtained if an experiment were repeated independently. Data collected must be reliable. If variation is found in data, it is critical to know whether this is because of the data itself, or due to the measuring technique, equipment, human handling, etc.

Repeated measurements have two key properties: precision and accuracy. Precision refers to how closely repeated measurements agree with each other, while accuracy refers to how close the data are to the actual "real" value they are supposed to represent.

Data can be very precise but off the mark of the actual things that are happening in the experiment. Data can also be accurate in terms of representing what is happening in the experiment, but lack precision.

Very precise but inaccurate data reflect a systematic error, e.g. a broken instrument that no longer measures its variable but still outputs precise data points referring to a non-existent phenomenon.

Accurate but imprecise data reflect a random error where the data do represent the real phenomenon that is happening, but the measurements have random errors that impact precision.

Means are used to pool these variable data points, so that random errors tend to cancel out and the result better reflects the overall measured phenomenon.
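The difference between systematic and random error can be made concrete by summarising repeated measurements with two numbers: the offset of their mean from the true value (accuracy) and their spread (precision). The measurements and the true value below are made up for illustration, and treating the sample standard deviation as the measure of precision is an assumption of this sketch.

```python
from statistics import mean, stdev


def summarise(measurements, true_value):
    """Return the offset of the mean from the true value and the spread."""
    return {
        "bias": mean(measurements) - true_value,   # accuracy: closer to 0 is better
        "spread": stdev(measurements),             # precision: smaller is better
    }


TRUE_VALUE = 10.0  # hypothetical "real" value

# Tightly clustered but offset: precise, not accurate (systematic error).
systematic = summarise([12.1, 12.0, 11.9, 12.0], TRUE_VALUE)

# Scattered around the true value: accurate, not precise (random error).
random_error = summarise([9.0, 11.0, 10.5, 9.5], TRUE_VALUE)
```

In the second data set the individual readings are scattered, yet their mean lands on the true value, which is why averaging helps with random error. Averaging does nothing for the first data set, because every reading shares the same offset.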

Reliability can be checked by repeating whole experiments. Reliable experiments and data should play out the same way as they did originally, while unreliable data could even be impossible to reproduce. Many technically complex, expensive experiments, e.g. Large Hadron Collider experiments, are unlikely to be repeated independently for some time, may be out of reach for certain groups or countries, or may never be repeated at all.
