
Diversity brings value; how might we get there?

Below is a copy of a post I wrote for the Software Sustainability Institute.

This post summarises a discussion with Lawrence Hudson, Roberto Murcio, Penny Andrew and Robin Long as part of the Fellow Selection Day 2017.

The question of how to improve diversity is broad and vague enough to induce silence in a group at first but, true to its name, it eventually prompts a wide-ranging discussion. Sometimes the task is divided up to target particular under-represented groups, because developing a scheme that improves diversity in general quickly becomes a minefield: what opens the door to some parts of society can simultaneously close it to others. Hackathon events are a common and successful way of attracting young people to computer science; however, if they take place over the weekend and are marketed as providing beer and pizza for sustenance, you start to exclude anyone with caring responsibilities or discourage anyone who doesn’t drink.

Before we can think about trying to improve diversity, it is helpful to consider what exactly we mean by it and what the benefits are. It is easy to see how a varied workforce can lead to a larger pool of ideas, skills, and experience, as well as a more harmonious environment where differences are embraced, minimising direct comparisons and competition between colleagues. It can also broaden outreach, whether by exposing your product or brand, attracting new audiences or inspiring the next generation. Diversity is often quantified in demographic terms (gender, age, religion, sexual orientation etc.); however, in a working environment it should also include background and previous experience. Your team may be culturally diverse, but if you all have the same degree from the same institution and were trained in the same school of thought, where are the new ideas going to come from?

To improve diversity, it is important to recognise where the variety is lost. Was there a great selection of applicants from different backgrounds that got filtered out before the interview stage? If you can identify which factors caused potentially diverse new team members to be excluded from consideration, you can use them to formulate more open criteria. For example, in a university, ranking candidates on their number of publications tends to favour men over women; focusing on quality over quantity may prevent this bias. With this in mind, developing a range of metrics of equal merit, rather than focusing on a single criterion, will also favour a broader range of applicants. This may mean moving away from the standard job description template, which requires some time and effort on the part of the employer, but using a structure that allows potential employees to be creative in how they meet the criteria creates opportunities for those with less traditional career paths.

Institutions can play their part by celebrating and promoting successes at all levels, as focusing purely on the achievements of the most senior employees often reinforces existing typecasts. In academia, there is a lot of truth in the stereotype of professors as white, male and middle-aged, so covering only the publications, media appearances and grant money brought in by these individuals may deter anyone who could never, physiologically, fit this demographic. Alternatively, publicising both the work (software developed, new recruits, promotions) and the personal achievements (charity events, sporting triumphs or bake sales) of all members of staff starts to showcase the variety underlying the workforce and may inspire a broader range of applicants. Active involvement in the wider community raises an organisation’s profile and generates positive feelings towards it. Having a creative recruitment and outreach strategy, with roles such as community managers, public engagement officers and more tailored positions such as artists in residence, can promote a welcoming environment and reach previously untapped employment streams.

Employers need to be open and flexible to new ways of working in order to appeal to a more varied pool of applicants. While many employers recognise the value of diversity and would always welcome a broad range of applicants to choose from, when it comes to the final decision it can take a brave individual to select a candidate who differs from their usual employee. Pressure to hit the ground running creates barriers for individuals who have great potential but need a little more training or time to adjust to a new environment.

With increasing variety in backgrounds, training opportunities and career paths, the diversity we know will benefit us is continually expanding in the working population. A more open, flexible recruitment strategy will create opportunities for both employers and employees who are looking for a change. However, diversity cannot be enforced. For its benefits to be realised, it needs to be an organic experience in which the individuals involved recognise its value.


Most bioinformatician posts are based in universities and often require a PhD.

I did my PhD because it helped me get the job I wanted, and I see PhDs as the graduate training scheme for working in scientific or medical research.

I didn’t really know what a PhD was until I did mine, so I admit I went into it slightly clueless and picked up what it was I was supposed to be doing as I went along. This blog post should help anyone thinking about a PhD but not 100% sure what exactly that means.

Firstly, some technical points:

  • The end point of your PhD is a written thesis (~50,000-80,000 words) of original research conducted by you, which you have to defend (that is, convince the examiners that you did the work and understand it) in front of two examiners at a viva.
  • A PhD can be started straight after an undergraduate degree, but if your degree wasn’t in a relevant subject, you didn’t do as well as you should have, or you are not 100% sure whether three or more years in a research environment is for you, a Masters program in between may be appropriate.
  • You will be assigned to a supervisor, or more likely multiple supervisors, who will guide you and are responsible for ensuring you get your PhD.

Every PhD is a unique experience; however, there are many commonalities. They are designed to be challenging, primarily educationally but also personally. The idea is to study something novel, so as you follow the road not previously traveled, it is inevitable that there will be problems or challenges along the way where the answer is not obvious. For some problems there may be no existing solution (and part of your research is to develop one); for others there may be multiple paths and you have to decide which one to take.

So why do it? It is a great addition to your CV, even if you don’t see yourself staying in academia. You are recognized as an expert in your (perhaps niche) field of study, and have demonstrated the ability to manage and complete a project over a specific period of time. It is perhaps underappreciated how hard a PhD can be to finish: while the broader research project may go on, the PhD is finite and has a clearly defined end goal. Depending on your personalities, it can be either the student or the supervisor who struggles to distinguish between the end of the PhD and the end of the research project. Ultimately, as the student you have to have the tenacity to put in the work to meet the requirements and achieve the degree.

Essentially, a PhD should be seen as an opportunity. You are a student (and in the UK you are paid a stipend to support your living costs, not a salary), so you should take advantage of it by learning as many skills and going on as many courses as possible, even if they are not directly relevant (think of it as personal development), and generally by making the most of the opportunities presented. You should also get the chance to present your work at conferences outside your day-to-day environment, so depending on a) your consumables budget and b) the reach of your research, you may get some chances to travel all over the world. As an informatician, my only real expense was a computer, so the rest of my budget enabled me to do a lot of travelling compared with fellow students who had expensive experiments to fund. There are lots of funding opportunities for PhD students to travel, so even if your scheme doesn’t have much money available for this, you should still be able to identify other sources of money to help.

Communication skills are very important and will inevitably be developed throughout your time as a PhD student. You will need to be able to communicate effectively with your supervisor and colleagues, to put together your thesis, to present your research internally and externally as a talk or a poster and, finally, to explain and answer questions about your work in your viva. You shouldn’t be afraid to disagree or follow your own intuition, but it helps if you can explain why.

Ultimately you need to be self-motivated, resourceful, and open to new experiences. You will learn a lot about your area of study, yourself and how research/academia works. It can be highly rewarding and set you up with a range of skills applicable to many careers.

If you would like to read more, take a look at this blog post which may be particularly relevant if you are based in the US.


Dealing with unknowns.

Science is all about dealing with unknowns.

There are the big unknowns: ‘Can we eradicate cancer?’, ‘Why do we forget things as we get older?’, ‘Can we grow replacement organs?’. Then there are the niggling day-to-day unknowns. These are the ones that tend to cause the most anxiety, perhaps because we never expect to completely answer the big questions and are simply looking to add to the body of knowledge.

Pretty much all of the day-to-day problems I deal with relate to ‘how’: how are we going to test a particular hypothesis? Once you have data in hand, it is not uncommon for some technicalities or oversights to emerge. We have to accept that the perfect study design is often unobtainable, and instead strive to control for as many of the external factors that may influence the result as possible. Where you couldn’t do so in the way the experiment was conducted, you have a second chance at the analysis stage. This is limited by two things: 1) knowing what all of these possible confounders are, and 2) actually having a measure or proxy for each confounder.

There are two routes for dealing with confounders: either you perform the initial analysis and then see whether the results change when you add further covariates, or you include all the variables from the outset. Personally, I don’t see the point of doing an analysis if you are subsequently going to discount any results that you later find to be a product of some other factor. Of course, this view may reflect my ‘omics background where, given the large number of features tested in every experiment, spurious results are to be expected and the quicker you can discount them the better.
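To make the point about including covariates from the outset concrete, here is a minimal simulated sketch in Python with NumPy. Everything in it is an assumption for illustration: a single hypothetical confounder drives both the exposure and the outcome, the exposure has no true effect at all, and ordinary least squares is the model. Leaving the confounder out produces a sizeable spurious effect; including it pulls the estimate back towards zero.

```python
import numpy as np


def ols_coefficients(design: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for outcome ~ design (design includes an intercept column)."""
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


def simulate(n: int = 10_000, seed: int = 0):
    """Hypothetical data: the confounder influences both exposure and outcome."""
    rng = np.random.default_rng(seed)
    confounder = rng.normal(size=n)                   # e.g. age or batch (assumed)
    exposure = 0.8 * confounder + rng.normal(size=n)  # exposure depends on the confounder
    outcome = 1.5 * confounder + rng.normal(size=n)   # no true exposure effect
    return exposure, outcome, confounder


def exposure_effect(exposure, outcome, covariates=None) -> float:
    """Estimated exposure coefficient, optionally adjusting for extra covariates."""
    columns = [np.ones_like(exposure), exposure]
    if covariates is not None:
        columns.extend(covariates)
    design = np.column_stack(columns)
    return float(ols_coefficients(design, outcome)[1])  # index 1 = exposure


exposure, outcome, confounder = simulate()
unadjusted = exposure_effect(exposure, outcome)
adjusted = exposure_effect(exposure, outcome, covariates=[confounder])
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```

In a real study the confounder would be a measured variable (or a proxy for one) rather than a simulated column, and the choice of model would depend on the outcome.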

Recently I have been working with some data for which we are aware of many possible confounders. Some of these were obvious at the start, and we have the relevant information to include in the analysis. For some of the unknowns, we have calculated estimates from our data using a commonly accepted methodology; however, we are unsure how accurate these are, as there is little empirical evidence to truly assess them, or whether they capture everything they should.

An alternative for high-dimensional data (that is, when you have lots of measurements for each sample) is to use methods that create surrogate variables. These capture the variation present in your dataset that is presumed to reflect the confounders we are concerned about (and perhaps those we haven’t thought of yet). I have always been cautious of such an approach, as I don’t like the idea of not understanding exactly what you are putting into your model. What’s more, there is a possibility that you are removing some of the true effects you are interested in. However, there is the opposing argument: ‘What does it matter? If it prevents false positive results, then that’s the whole point.’
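The sketch below illustrates the general idea behind surrogate variables under heavy simplifying assumptions. It is not the published surrogate variable analysis (SVA) algorithm, nor the one implemented in, for example, the sva Bioconductor package; it simply regresses every feature on the primary variable of interest and takes the top principal components of what is left. The simulated data, the hidden factor and the function name `surrogate_variables` are all hypothetical.

```python
import numpy as np


def surrogate_variables(data: np.ndarray, primary: np.ndarray, n_sv: int = 1) -> np.ndarray:
    """Simplified sketch: top principal components of the residuals left after
    regressing every feature (column of data) on the primary variable."""
    design = np.column_stack([np.ones(len(primary)), primary])
    coefs, *_ = np.linalg.lstsq(design, data, rcond=None)  # one fit per feature
    residuals = data - design @ coefs
    u, _, _ = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_sv]  # samples x n_sv surrogate variables


# Hypothetical data: 200 samples, 1,000 features, an unmeasured factor
# affecting every feature, and a real group effect on the first 50 features.
rng = np.random.default_rng(1)
n_samples, n_features = 200, 1_000
group = rng.integers(0, 2, size=n_samples).astype(float)
hidden = rng.normal(size=n_samples)  # the confounder nobody measured
data = np.outer(hidden, rng.normal(size=n_features))
data[:, :50] += 1.0 * group[:, None]
data += rng.normal(size=(n_samples, n_features))

sv = surrogate_variables(data, group, n_sv=1)
recovered = abs(np.corrcoef(sv[:, 0], hidden)[0, 1])
print(f"|correlation| between surrogate variable and hidden factor: {recovered:.2f}")
```

The returned columns would then be added to each model as covariates. Because they are derived from the data rather than measured, it is hard to say exactly what they represent, and if the hidden factor were correlated with the primary variable, adjusting for it could also absorb some of the true effect.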

At present it is something of an open question which way we should proceed. It is good practice to question your approach and test it until it breaks. Having tried a few ways of doing something, all of which produce slightly different numbers, how do we decide which is correct? Part of the problem is that we don’t know what the right answer is. We can keep trying new things, but how do we know when to stop? Unlike at school, we can’t just turn the textbook upside down and flick to the back pages to mark our effort as right or wrong. Instead we have to think outside the box to come up with additional ways to check the robustness of our result. But this is par for the course: research is all about unknowns. These are the challenges we relish, and maybe eventually we will start to convince ourselves that our result might be true!

Often the gold standard is replication, that is, repeating the analysis and finding the same result in a completely independent sample. Sometimes you might have a second cohort already lined up, so this validation can be done internally and give you confidence in what you are doing. Or you may face a nervous wait to collect more data or for another group to follow up your work.
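As a rough illustration of why replication matters in high-dimensional data, this simulated sketch tests 1,000 features for association with an outcome in a discovery cohort, then re-tests the hits in an independent replication cohort. The cohort sizes, the effect size, the 10 ‘true’ features and the thresholds (|z| > 3.29 for discovery, roughly p < 0.001 two-sided, and |z| > 1.96 in the same direction for replication) are all arbitrary assumptions. Chance hits among the null features are very unlikely to replicate, while the genuine associations should.

```python
import numpy as np

N_FEATURES = 1_000
TRUE_FEATURES = np.arange(10)  # hypothetical: only these features have a real association


def simulate_cohort(n_samples: int, rng: np.random.Generator):
    """Simulated cohort: features x outcome, with a real signal in TRUE_FEATURES only."""
    outcome = rng.normal(size=n_samples)
    features = rng.normal(size=(n_samples, N_FEATURES))
    features[:, TRUE_FEATURES] += 0.4 * outcome[:, None]
    return features, outcome


def association_z(features: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Fisher z-statistic of the Pearson correlation between each feature and the outcome."""
    n = len(outcome)
    f = (features - features.mean(axis=0)) / features.std(axis=0)
    o = (outcome - outcome.mean()) / outcome.std()
    r = f.T @ o / n
    return np.arctanh(r) * np.sqrt(n - 3)


rng = np.random.default_rng(2)
z_discovery = association_z(*simulate_cohort(500, rng))
z_replication = association_z(*simulate_cohort(500, rng))  # completely independent sample

discovered = np.flatnonzero(np.abs(z_discovery) > 3.29)
same_sign = np.sign(z_replication[discovered]) == np.sign(z_discovery[discovered])
replicated = discovered[same_sign & (np.abs(z_replication[discovered]) > 1.96)]

false_discoveries = sorted(set(discovered.tolist()) - set(TRUE_FEATURES.tolist()))
false_replications = sorted(set(replicated.tolist()) - set(TRUE_FEATURES.tolist()))
print(f"discovered: {len(discovered)} ({len(false_discoveries)} spurious), "
      f"replicated: {len(replicated)} ({len(false_replications)} spurious)")
```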

Sometimes though, you just have to go with what you have got. Sharing your work with the research community is a great opportunity for feedback and may prompt a long overdue conversation about the issues at hand. Ultimately, as long as you are clear about exactly what has been done, your findings can be interpreted appropriately.


Questions and more questions

Within my direct team I am the only mathematician. The rest of my colleagues have come from biological backgrounds and have spent a lot of their careers in the lab generating data. While they have lots of experience in analysing this data, they have fully embraced the addition of a statistical mind to expand their skill set. One of the benefits of this is that I get to work across the group and am involved in a wide variety of projects.

The level of my involvement ranges from getting stuck in and doing some of the analysis, to explaining particular methodologies, making suggestions or providing a sounding board for other people’s ideas. Our office is a very open, social environment where we can discuss problems and ask questions as and when they occur.

Having the confidence to ask questions is very important. If you work in academia it is presumed that you are very intelligent and therefore know everything about everything. It can therefore be a daunting environment for a student, as you feel that any question you ask may inadvertently expose your weaknesses. However, not asking for help when you need it is a weakness in itself and will only hold you back.

The breakthrough for me came when biologists started asking me maths questions. It made me realise that we all had different skill sets and, most importantly, that we were here to learn from each other: in return, the maths student could ask the biologists biology questions! What you start to realise is that everyone has gaps in their knowledge; they may just be hidden behind a good poker face.

I really enjoy sharing my knowledge and the challenge of trying to explain a concept clearly. It also gives me confidence that I do know what I’m doing when someone goes away understanding something that previously boggled them. However, as the resident statistician, I initially felt a lot of responsibility to answer every question about statistics completely, correctly and succinctly. What’s more, I felt that I should be able to answer any question posed. But as with asking questions, you shouldn’t be ashamed to admit you don’t know something when answering them. Many of my answers are prefixed with ‘I am not an expert in this, but if it was me I would …’ Sometimes my offering is that I know where to find the answer (using every bioinformatician’s best friend, the internet) and can then help explain what it means.

It can be reassuring when someone else acknowledges that they are not 100% sure about something, as it helps remove any unrealistic expectations of perfection. On top of that – and as I have to constantly remind myself – it wouldn’t be science, and we wouldn’t be here doing this job if we knew all the answers…