One aspect of my job I never really expected to be involved with is study design. Early in my career I worked with publicly available data, so I had virtually no insight into how the experiments had been run, or any experience of what might actually happen whilst working in a lab.
The nice thing about being in a group that generates a lot of data is the opportunity to be involved at the conception of an idea and to have an input into how the project proceeds. I can imagine this may make some data analysts rather jealous, as they often get handed a dataset and a question to answer with no obvious link between the two, or a technical flaw that scuppers the proposed analysis.
There is no such thing as the perfect experiment. There are so many variables that may influence the outcome, either grossly or subtly: the quality of the sample going in, temperature, batch of reagents, the individual(s) doing the experiment, day of the week, time of day; the list is endless. In larger studies you will inevitably need to perform the experiment multiple times over days, weeks or months. This will lead to batch effects. I don't like the word batch, as I think it is used very loosely to cover a range of different factors. Broadly it means a group of samples that have something in common which may make their data more similar to each other than to samples included in other batches. Often this means they were processed at the same time (think about a batch of cakes) and refers to technical factors relating to the experiment.
The challenge is to organise your samples prior to the experimental procedure so that these technical variations do not influence the statistics you want to do. If you are doing a case-control study, you want to randomly allocate each sample so that every batch contains a mix of both groups. What you don't want is for the cases to be processed as one group and all the controls to be processed together, as then you can't be sure whether the differences you see are due to the disease or to the fact that the experiment was run by two different people.
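As a minimal sketch of this kind of randomisation, the snippet below shuffles samples and then deals them round-robin into batches so that cases and controls end up mixed in each one. The sample labels, group names and function name are all illustrative, not part of any real study.

```python
import random

def randomize_to_batches(samples, batch_size, seed=42):
    """Randomly allocate (sample_id, group) tuples to processing
    batches so that each batch contains a mix of cases and controls.
    Illustrative sketch only; names are hypothetical."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    # Stable-sort by group after shuffling, then deal round-robin:
    # this spreads each group evenly across batches while keeping
    # the within-group order random.
    shuffled.sort(key=lambda s: s[1])
    n_batches = -(-len(samples) // batch_size)  # ceiling division
    batches = [[] for _ in range(n_batches)]
    for i, sample in enumerate(shuffled):
        batches[i % n_batches].append(sample)
    return batches

# 12 cases and 12 controls, allocated to batches of 8
samples = [(f"S{i}", "case" if i < 12 else "control") for i in range(24)]
batches = randomize_to_batches(samples, batch_size=8)
for b in batches:
    groups = [g for _, g in b]
    print(groups.count("case"), groups.count("control"))  # prints "4 4" per batch
```

Round-robin dealing after a stratified shuffle is one simple way to balance groups; in practice you would extend the sort key to cover any other factor (plate, day, operator) you want balanced across batches.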
There are times when you want to make sure that what you are comparing comes from the same batch. For example, we do a lot of work with the discordant twin design. Here we are looking at the differences between the two members of a twin pair, so we want to be sure that those differences are not an artifact of the pair having been processed two months apart.
While I have no desire to go into the lab to run any experiments, I have learnt a lot through day-to-day interaction with the colleagues who generated the data. That knowledge can really help when it comes to processing the data. Comparing notes with the person who ran the experiment, to work out why something doesn't look the way you were expecting, invariably gives you confidence once it is resolved. This is the kind of interaction I always wanted out of a job. I enjoy bringing my skills and having responsibility for certain parts of a project whilst others with different skill sets are responsible for something else.
There is enough data out there that, as a bioinformatician, I don't have to work with a group who generate data. However, I would strongly recommend spending some time in that environment, as it is always beneficial to understand a bit more about how and where your data came from.