
Questions and more questions

Within my direct team I am the only mathematician. The rest of my colleagues come from biological backgrounds and have spent much of their careers in the lab generating data. While they have lots of experience in analysing this data, they have fully embraced the addition of a statistical mind to expand their skill set. One of the benefits of this is that I get to work across the group and am involved in a wide variety of projects.

The level of my involvement varies: getting stuck in and doing some of the analysis, explaining particular methodologies, making suggestions, or providing a sounding board for other people’s ideas. Our office is a very open, social environment where we can discuss problems and ask questions as and when they arise.

Having the confidence to ask questions is very important. If you work in academia, it is presumed that you are very intelligent and therefore know everything about everything. This can make it a daunting environment for a student, as you feel that any question you ask may inadvertently expose your weaknesses. However, not asking for help when you need it is a weakness in itself and will only hold you back.

The breakthrough for me came when biologists started asking me maths questions. It made me realise that we all had different skill sets and, most importantly, that we were here to learn from each other; in return, the maths student could ask the biologist biology questions! What you start to realise is that everyone has gaps in their knowledge; they may just be hidden behind a good poker face.

I really enjoy sharing my knowledge and the challenge of trying to explain a concept clearly. It also gives me confidence that I do know what I’m doing when someone goes away understanding something that previously boggled them. However, as the resident statistician, I initially felt a lot of responsibility to answer every question about statistics completely, correctly and succinctly. What’s more, I also felt that I should be able to answer any question posed. But as with asking questions, you shouldn’t be ashamed to admit you don’t know something when answering them too. Many of my answers are prefixed with ‘I am not an expert in this, but if it was me I would …’ Sometimes my offering is that I know where to find the answer (using every bioinformatician’s best friend, the internet) and can then help explain what it means.

It can be reassuring when someone else acknowledges that they are not 100% sure about something, as it helps remove any unrealistic expectations of perfection. On top of that – and as I have to constantly remind myself – it wouldn’t be science, and we wouldn’t be here doing this job if we knew all the answers…

Bioinfo what?

So I thought I’d spend some time explaining a little more about the field of Bioinformatics.

If you look it up on Wikipedia, you will discover that it is where statistics, programming and biology meet. You may then be wondering when this would happen. Although maths has been relevant to biology ever since the subject’s first outing, the need for dedicated mathematicians or programmers is much more recent.

It has mainly arisen as technology has improved to produce ever-increasing amounts of data, and more complex data at that. Sequencing of the first human genome started in 1990 and finished in 2003; these days, the equivalent data can be generated and analysed in around a day. What’s more, we can now generate data not just on the genome, but on the epigenome, transcriptome, metabolome and proteome, collectively known as the ‘omics. What they all have in common is lots of data points (generally hundreds of thousands), each representing a different part of your DNA or of the resulting chemical molecules.

It would be completely implausible to try to analyse each data point one by one with pen and paper. Therefore, some knowledge of programming is needed to manage the data and implement analyses efficiently. The role of statistics is to ensure that the data is analysed appropriately and that results are not chance findings, while the biologist is needed to run the experiment and interpret the results. This is a simplification of how these skills come together, and there is huge variety in the types of project requiring a bioinformatician.
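To make the point about chance findings concrete: when hundreds of thousands of data points are each tested, some will look “significant” purely by chance. The sketch below is a hedged illustration in Python with NumPy (not the R used day-to-day in this blog) on simulated data where nothing is truly going on. The Bonferroni correction it shows is just one standard safeguard, chosen here for simplicity; it is not necessarily what any particular project would use, and the numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: 100,000 measured points (e.g. positions in the genome),
# none of which truly differs between groups. Under this null hypothesis,
# p-values are uniformly distributed between 0 and 1.
rng = np.random.default_rng(seed=1)
n_tests = 100_000
p_values = rng.uniform(0.0, 1.0, size=n_tests)

alpha = 0.05

# Naive approach: call anything with p < 0.05 "significant".
naive_hits = int(np.sum(p_values < alpha))

# Bonferroni correction: divide the significance threshold by the number of
# tests, so the chance of ANY false positive across all tests stays near alpha.
bonferroni_threshold = alpha / n_tests
bonferroni_hits = int(np.sum(p_values < bonferroni_threshold))

print(f"naive 'discoveries': {naive_hits}")  # roughly 5% of 100,000
print(f"Bonferroni threshold: {bonferroni_threshold:.1e}")
print(f"Bonferroni 'discoveries': {bonferroni_hits}")
```

Without a correction, thousands of points would be flagged even though every one of them is noise; with it, essentially none are.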

Bioinformatics is now a field in its own right. I am not aware of any undergraduate courses in it in the UK, although many bioscience departments are starting to offer modules in it. Often the first chance you would have to study it is at master’s level. Courses will accept biology, mathematics/statistics or computer science graduates, but in my experience the vast majority of the intake have studied biology or related disciplines. I mainly put this down to no one telling mathematics or computer science undergraduates that this is an option, as these courses are predominantly based in bioscience or medical schools. It may also be that biologists realise that, to remain competitive in the jobs market, they need some of these skills (particularly if they want a career in genetics). I have seen non-biologists really struggle on these courses at first, as it is a steep learning curve from GCSE or A level biology, with lots of new vocabulary, concepts and mechanisms to get your head around. It can be demoralising and seem like a daunting task, but when it comes to the analysis side you will find the tables turned, with everyone looking at you wishing they could do what you can, so it is worth being patient and sticking with it.

So if this appeals, you are probably wondering which of these subjects you should study at undergraduate level. Well, as all of them can lead to the same outcome, it has to be a choice based on where you think your strengths lie and what you will enjoy and remain motivated to study for three or more years. What I would say is: look for opportunities to broaden your skill set across the three domains. Can you do a computer science module, or learn some programming as part of your final-year project in your Maths degree? Does your department offer a bioinformatics or mathematical modelling module in your biology degree? Can you develop some software to help a field biologist collect and store their data? My break came when I spent 10 weeks doing a computational biology project in Edinburgh during the summer of my Maths degree. This was my first chance to learn to program and to learn about the data biologists were working with.

The reality is that most bioinformaticians have a particular strength, and positions call for different combinations of skills. You may not be the whizziest programmer but have a good analytical mind for deciding which statistical approaches should be used. You may not know a huge amount about what the data represents, but you do know how to store it efficiently and securely, or how to set up and manage high-performance computing systems.

Whichever way you approach it, you will have many opportunities to work on different projects with different teams all over the world. Almost all industries are increasingly reliant on data and informaticians to stay ahead, so if you decide that biology isn’t for you, there are many other opportunities out there for these skill sets. It is worth thinking about!

Taking maths beyond the classroom.

In this blog post I want to share my thoughts on the differences between Maths taught in the classroom and Maths used in professional life. What often appeals about Maths is the routine of applying a clear set of instructions. Regardless of the context, this remains a large part of any mathematician’s career. The big difference once you have left the classroom is what comes before and after.

Generally, in the classroom you are taught a particular statistical test: its assumptions, how to apply it and how to interpret the output. Then, when it comes to the end-of-year exam, the question specifically asks you to perform said test, often on data generated (by a computer) to give a particular answer.

Now, once you are employed as a statistician (or in any role where statistics forms part of the job description), the question no longer guides you in exactly what to do. More likely, you will be given a data set and some premise of what you are required to extract from it. The level of detail in your task is highly variable and likely depends on the statistical ability of the person asking. The less they know, the more vague or far-fetched the question, whereas a fellow statistician is likely to set out a clear hypothesis, having already worked through much of the thought process you would have gone through.

Before you can actually start any number crunching, you need to deduce what the hypothesis is. Then you need to decide whether it is actually testable with the data you have. If you think the data can’t answer the question at hand, you may have to adjust the question and present your superior with what you can establish, sometimes leading to a protracted negotiation until you are both happy. Once you have finalised the hypothesis, you can then think about which statistical test to use and how.

What I think is missing in the classroom is this thought process of deciding what procedure to use and when. In my experience, I was always explicitly told what I was going to need to do. As a result, I remember stressful moments at university when friends on other courses would ask for advice on what statistics to use in their dissertations, and I would grapple with what I knew to try to advise them. Since my degree, I have had to learn how to answer these questions for my own work, but also to help colleagues with their projects. It can be challenging to convert their biological question into the underlying hypothesis and the mathematical concept by which it may be represented.

Experience is the key, but communication is what is going to get you through. Being able to decompose their question into the relevant parts (if they are asking you for help, they have probably overcomplicated it) allows you to start thinking in terms more familiar to you. Keep asking them questions with the aim of getting them to refine their question into a testable hypothesis. While there may be moments of utter confusion or complete miscommunication, these interactions are good for both parties and can lead to novel ideas that neither would have come to on their own.

The other main difference I want to discuss is that the way Maths is taught can be quite limited. This is partly down to what has been decided should be on the curriculum, which is true of all subjects, but also to the structured style of exams. This means you can only do what you are directly asked to, and once you get to the required answer, that’s the end. There is no opportunity to show off additional skills or explore further, the way you can with an English or History assignment. Within my role, I am given a lot of freedom to explore datasets beyond their primary purpose. I can generate and test additional hypotheses and try out more advanced or new routines. This creativity really helps improve my skill set and gives me confidence in adopting areas of statistics I have not encountered before.

The reality is, I have probably learnt more about how Maths really works outside of my initial education and training. You can never discount the value of experience. I would advocate, therefore, that more Maths assignments and assessments adopt a more flexible framework. We should give students a chance to follow a project through from design to completion, rewarding the thought process as much as the ability to compute the answer. The skill most employers value from Maths is problem solving, so how can we really teach that if, when we set the question, we also tell students how we want it answered?

Simplicity is the key.

So, for all the budding mathematicians out there, I want to share more details of the statistical tools I use day-to-day.

The first thing to say is that the pen and squared paper are long gone. Here, showing your working involves creating a document containing the series of commands or functions you have run in your statistical package of choice. Generally, I am interested in seeing whether there is a relationship between two measures. I would say that the methodology I use most is regression, perhaps more familiar to you as fitting a line to data. We routinely have to deal with a range of confounders (additional factors, such as gender or age, that may induce an association between the two variables of interest), and linear models have the flexibility to account for these.
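To show why a linear model’s ability to handle confounders matters, here is a minimal sketch on simulated data. It is written in Python with NumPy rather than R (in R the adjusted model would be a one-line `lm(y ~ x + age, data = ...)` call); the variables `x`, `y` and `age`, the effect sizes and the helper `fit_linear_model` are all hypothetical, made up for illustration. Age drives both measures, so they look related until age is included in the model.

```python
import numpy as np

# Simulated, hypothetical data: age is a confounder that drives both the
# exposure (x) and the outcome (y). The outcome does NOT depend on x directly.
rng = np.random.default_rng(seed=42)
n = 1_000
age = rng.uniform(20, 80, size=n)
x = 0.5 * age + rng.normal(0, 5, size=n)  # exposure rises with age
y = 2.0 * age + rng.normal(0, 5, size=n)  # outcome rises with age only


def fit_linear_model(outcome, *predictors):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


# Unadjusted model, y ~ x: age is ignored, so x picks up its effect.
naive_slope = fit_linear_model(y, x)[1]

# Adjusted model, y ~ x + age: including the confounder removes the
# spurious association between x and y.
adjusted_slope = fit_linear_model(y, x, age)[1]

print(f"slope of x, unadjusted: {naive_slope:.2f}")  # far from zero
print(f"slope of x, adjusted for age: {adjusted_slope:.2f}")  # close to zero
```

The unadjusted slope suggests a strong relationship that is entirely due to age; once age is in the model, the slope for `x` collapses towards zero, which is the simple, interpretable kind of result the rest of this post argues for.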

There is often an expectation that, as we deal with complex data, we must use super-complicated mathematical formulae to cope. This isn’t always necessary (at least not initially), so why make it harder than it needs to be? Keeping the analysis simple makes the interpretation easier. Implementing a more advanced test (likely accompanied by an impressive name such as ‘dynamic time warping’) may give you a great sense of achievement. But this is often short-lived, lasting only until someone asks, “So what does this mean?” and you try to translate the underlying hypothesis into a biological concept.

The most important factor is to keep in mind the scenarios a statistical test is designed for and to recognise its limitations. Unfortunately, many biological measures (genetics being a particularly good example) flout common statistical assumptions. This can be the biggest challenge, as often an appropriate test does not exist, so you have to get creative to see how far you can stretch the one you are using. However, I think this is where statisticians need to think a little more like other scientists, who routinely accept that no approach is perfect and every experiment has limitations; the key is to acknowledge them.


Oh I’m rubbish at maths.

Few people look back at school and list Maths lessons as a high point. There is also a fairly automatic tendency to classify ourselves as either good or bad at maths, with the majority of people assigning themselves to the second category.

Part of the problem is that we associate Maths with being put on the spot. In school we had to recite times tables or answer quick-fire mental arithmetic questions, with the pressure of then finding out whether we were right or wrong. In adult life, we have to navigate a complicated set of rates and fees during an interview with the bank, or calculate a 10% tip while the waiter or waitress watches.

What attracts a lot of people to Maths is that there is always a right answer. But this can be a double-edged sword, as there are plenty of wrong answers too. These days you often also get credit for showing how you got to the answer, although you are still expected to provide this in the pressurised environment of exam conditions: within a time limit and with no help from anyone else, textbooks or notes.

In my mind being good at Maths is not the same as reciting Pi to 14 decimal places, or listing the first 20 square numbers. Maths, like all sciences, is about understanding concepts and applying routines. There is no rule that you must hold all this information in your head at one time.

As I admitted in my previous post, I use the internet almost constantly to support my work. Often I like to check that the test I have in mind is appropriate and that I can remember how to do it correctly; other times I want to try something new and need a step-by-step breakdown of how to implement it. None of this takes away from my mathematical ability.

So if you don’t have the best memory or don’t respond well to exam situations, don’t let this cloud your judgement. If you enjoy Maths but didn’t do so well in your exams, don’t write yourself off. Yes, it may take a bit of effort, but there are plenty of great resources out there to help you. And, as with most things, if you repeat it a few times, it often starts to stick.


It’s all out there.

My job is essentially an office job; I spend most of every day sitting in front of my computer. The reality is that I do most of my maths with the help of statistical programming packages. That is not to say you won’t find scraps of paper with handwritten algebraic derivations littered around my desk – it just helps me think!

Predominantly, I work with one called R, which is free to download. Programming is an important part of my job and a natural progression for anyone mathematically minded, as it is essentially based on logic, and you get the same sense of satisfaction from creating a working computer program as you do from solving an equation. I would strongly encourage anyone interested in a career in statistics to take a look at the tools out there, as it may put you one step ahead in the jobs market.

Most of my colleagues and I are self-taught programmers. Initially, small things can be incredibly frustrating; what really flummoxed me early on was working out how to read my data from an Excel spreadsheet into my R session. But this should not deter you, as your ability accumulates quickly once you have made the initial breakthrough. Further, these skills are so transferable (once you understand the principles of programming in one language, picking up a second, third, fourth and so on is much easier) and so valued by employers that it’s worth the early pain, as it can open up so many alternative careers.

There is so much advice and so many tutorials online; one I would recommend is which is a great starting point for beginners. There is no reason why anyone can’t give them a go, as all the material is accessible and FREE. Google is an essential resource for any programmer: it’s often quicker than looking up functions or commands in reference books and can save you a lot of time when debugging errors. ‘Have you Googled it?’ is a common retort when someone is presented with a never-before-seen error message. The challenge is sometimes knowing what to search for, as the terminology may not be obvious, particularly if you don’t have any formal training, but you will pick it up. It can also be helpful to know others are struggling as well. Stumbling across forums where people are publicly declaring that they have hit the same wall as you reassures you that you are not completely inept and are on the right track. Remember, we learn more from the mistakes we make than from our successes, which is a good thing, as you will get lots of errors in your programming career.

Who am I?

I am a mathematician.

It’s not my job title, nor do I work in a Mathematics department but that is what I am.

If you wanted me to be more specific, I would say I am a statistician. And I work in a biological field, so maybe biostatistician would be more accurate. But I use computer programming to do my maths, so that makes me a bioinformatician, and hey presto, we’ve reached my actual job title. Underneath it all, though, I am a mathematician. That is my fundamental skill set, but I have always seen it as just that: a toolkit designed to be applied to a range of fields, in fact anything you fancy (energy output, retail, population demographics, economic trends, sports performance, elections, …). I chose genetics and now work in the School of Medicine amongst predominantly biologists.

Through this blog, I will discuss what it is like to continue doing maths beyond the classroom and hopefully encourage a few future mathematicians to stick with it.