Thursday, October 15, 2020

What Should Software Science Learn From the Corona Crisis?

The corona crisis does not only influence people's daily lives, it also influences how people think about science. Suddenly, scientists are present in the news, scientific results shape new laws passed because of the crisis, and the results of scientific studies become part of people's daily conversations. This new popularity of science is actually a good thing.

And there are a number of people who doubt scientific results. Actually, this is not that bad. Science requires doubt. Progress happens because some people do not believe in commonly accepted theories and search for alternative explanations or new interpretations of given phenomena.

What's bad is when people ignore results or invent new theories without having any evidence for them. And what's even worse is that there are people who follow such new theories without even demanding evidence. Such people are fooled too easily, and for others, fraud becomes a profitable business.

A typical reaction to skeptics of the corona crisis is that education would help: if people were better educated, their knowledge of science would help them distinguish between serious interpretations of scientific results and wild guesses based on personal anecdotes. But while this statement is probably true in general, we cannot assume that every discipline provides such a profound body of knowledge.

Taking into account that the scientific foundation of software science is rather weak, it actually makes sense to think the other way around: what can software science learn from the corona crisis? So, why not try to find some "lessons learned" for software science from the ongoing crisis?

It is the numbers that do matter

It sounds stupid to point this out, but the first thing to be learned from the corona crisis is that it is the numbers that matter.

The first and rather obvious number that comes to mind is the death rate. But other numbers, such as infection rates, matter for medicine as well. Numbers matter for other disciplines, too: in economics, for example, monetary aspects such as the costs of the crisis matter. In the end, it is not a single number that matters. Each number plays its role, but many different numbers need to be taken into account in order to get the big picture.

But the important insight is not only that numbers matter. The important insight is that hardly anything other than numbers matters. Even if a person has a strong belief in the effectiveness of some medicine, therapy, or vaccine, it does not imply that such a statement should be taken seriously. Even if someone ignores the huge, negative impact of corona on the economy, it does not imply that this negative impact does not exist. Rhetoric might work strongly on some people. But rhetorical skills hardly change reality.

In the end, the effectiveness of some treatment, or the validity of an argument, requires numbers: we want evidence for statements, not just some famous people's beliefs. Software science should demand evidence. Software science should demand numbers - numbers that matter.

Numbers are dirty

The second lesson to be learned is that numbers are rarely pure. Numbers are dirty. Empiricists are aware that measurements are rarely as pure as people would want them to be. Every measurement comes with measurement error, and every measurement tool has its drawbacks. Although we want measurement tools to be as precise as possible, we have to accept that every one of them has problems.

And empiricists are used to the problem that people who do not accept empirical results discredit the numbers. In the corona crisis, the death rates are discredited. People doubt that the number of deaths is valid - and they have good reasons to doubt the perfection of the reported numbers. Obviously, there is no independent institute that can analyze, for every single case, whether a person died merely with corona or from it. We do not necessarily speak about intentional lies. We speak about cases where it is not clear whether there was a causal relationship between the virus and a person's death. And we have to accept that even corona tests can fail. It is the nature of measurements that there are error rates - and it is the goal to reduce such error rates.
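The point about test error rates can be made concrete with a short sketch. The sensitivity, specificity, and prevalence values below are made-up illustrative numbers, not figures for any real test; the sketch only shows, via Bayes' rule, how even small error rates produce many false positives when the measured phenomenon is rare.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive (Bayes' rule).

    All three parameters are rates between 0 and 1; the values used below
    are illustrative assumptions, not properties of any real test.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A seemingly accurate test (95% sensitivity, 99% specificity) applied to a
# population where only 0.1% are infected:
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.99, prevalence=0.001)
print(f"{ppv:.2%}")  # roughly 9%: most positive results are false positives
```

The numbers themselves are arbitrary; the structural insight is that the quality of a measurement cannot be judged without knowing its error rates and the context in which it is applied.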

We also see that the numbers are attacked on different levels. For example, some people doubt whether the reported death counts are complete, i.e. they argue that there could be additional corona deaths that were intentionally not reported. Others argue that the reported number of infections is too low, because some governments are not interested in reporting high numbers. And even where numbers are accepted, people who are not willing to accept empirical results come up with new interpretations. For example, people argue that a high infection rate is not the result of an ongoing pandemic, but rather the result of a high number of tests. Or that a high death rate is not the result of failing countermeasures, but rather the result of an extremely aggressive virus.

In the end, we have to accept that all numbers have their problems. This does not mean that we should blindly trust all reported numbers. It is important to see how reality is mapped to numbers and to understand the potential problems of that mapping. But just stating "the reported numbers are wrong" is rarely constructive criticism. There is a need to understand how problematic a number is, how measurements could be improved, and so on. It is always necessary to question the relevance of reported numbers. And it is important to identify people who discredit numbers for rather personal reasons and who thereby hinder the process of knowledge gathering.

For software science, the lesson learned is that we should not be too quick to discredit reported numbers. We need to understand the process of data collection (and interpretation), and we need to understand how large the possible errors of certain measurement techniques are. This means that, finally, we need to identify relevant measurements for our discipline and define measurement techniques in order to get valid measures. And we should be cautious with people who discredit numbers for the sake of discrediting numbers.

It is not a single study that matters, it is many of them

Another lesson learned from the corona crisis should be that scientific knowledge usually does not arise from a single study. 

Up to now, there are hundreds and hundreds of studies on corona from the field of medicine. And our knowledge about corona is the result of combining a large number of these studies. This does not mean that every single study is fantastic. There are, in the meantime, a number of studies which are today considered invalid. And there are studies that merely reproduce results that have already been reproduced by others.

But the essential lesson learned is that people in a mature discipline study the same phenomenon over and over again from different perspectives. Different experimental designs, different treatments, different measurements, different measurement methods, etc. -- the knowledge of the field arises from multiple tools and efforts that together yield the big picture.

Such effort is required in software science as well. Instead of celebrating novel ideas in our field, we should give more appreciation to studies that examine given phenomena in depth. We should collect multiple studies on the same phenomena. We should encourage people to study a phenomenon even though some studies on it already exist.

The Need for Education and Demystifying Science

These corona times teach us how necessary it is that people understand non-subjective reasoning and how necessary it is that people distinguish between fact and fiction. Unfortunately, this requires education. It is not enough to argue for or against a statement by adding a phrase such as "scientific studies have shown" to it. Science is not a magical process. Science just means being as non-subjective as possible. Science tries to run, collect, summarize, and interpret studies without any agenda in mind. Education demystifies science, and statements such as "there is a scientific study" start losing their authority -- which is good, because there is a need to understand studies, not only to accept an author's interpretation of a study.

People should doubt the results of studies. But such doubt must not be mere naive scepticism. It requires knowledge about the underlying procedures, and it requires knowledge about the lines of reasoning built upon the collected numbers. The necessary willingness to doubt results also requires the knowledge to recognize valid results. Knowledge about methods teaches us where the limits of doubt are.

Unfortunately, this is probably the biggest issue for software science. Actually, it is not clear whether software science provides its actors with enough knowledge for the mentioned kind of reasoning. There are even reasons to believe that software science education, which is massively influenced by or based upon math, is counterproductive for understanding the results of empirical studies: when you are familiar with proofs by contradiction or with counterexamples that disprove a general statement, it is hard to understand why a single case in an empirical discipline does not destroy a whole theory. When you are used to counterexamples, it is hard to understand why a single person who suffers from COVID-19 for a second time does not automatically falsify an immunity theory.
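The difference between logical and statistical falsification can be illustrated with a small sketch. The 1% reinfection probability below is a made-up illustrative number; the point is only that, under a probabilistic theory, a single counter-instance is not merely compatible with the theory - it is expected.

```python
def prob_at_least_one(n_people, p_reinfection):
    """Probability of observing at least one reinfection among n recovered people,
    assuming independent cases with the given per-person reinfection probability."""
    return 1 - (1 - p_reinfection) ** n_people

# Hypothetical "immunity theory": reinfection probability is at most 1%
# per recovered person (an assumed number for illustration only).
# Among 1,000 recovered people, at least one reinfection is then almost
# certain, so a single observed reinfection does not falsify the theory:
print(prob_at_least_one(1000, 0.01))
```

A mathematician's counterexample refutes a universal statement outright; an empiricist's single contrary observation merely becomes one data point that the statistical theory must be tested against.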

Summary

There is a lot that can be learned from the corona crisis, and software science can learn a lot from it. We as software engineers or software scientists should not just read newspapers today and pretend that the process of knowledge gathering for corona is completely different from what needs to be done in our field.

We should demand numbers. We need to provide such numbers. Our lines of argumentation should rely on numbers. And we need to accept the impurity of numbers - and use education as a weapon against wild speculations and naive scepticism in our field.