Building Trust in AI Part II: Trusting the Algorithm

3 minute read

There is a need to evaluate a machine learning model as a whole before deploying it “in the wild”. Users need to be confident that the model will perform well on real-world data, and according to the metrics they actually care about. Currently, models are typically evaluated using accuracy metrics on an available validation dataset. However, real-world data is often significantly different, and such metrics may not be indicative of the product’s intended goal. Thorough inspection of individual predictions and their explanations before deployment is therefore a worthwhile safeguard. There are also aspects of a machine learning model’s design that can have inbuilt bias. One such source is the bias function of an estimator.

Estimator bias function

The bias of an estimator is defined relative to a measure of central tendency: it is the difference between the estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise the estimator is said to be biased. In statistics, “bias” is an objective property of an estimator, and while not usually a desired property, it is not pejorative, unlike the ordinary English use of the term. Bias can also be measured with respect to the median, rather than the mean (the expected value).
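To make the definition concrete, here is a minimal sketch in plain NumPy (the normal distribution, true variance and sample sizes are my own illustrative choices, not from the original post) estimating the bias of two common variance estimators, one dividing by n and one by n − 1:

import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0           # variance of the underlying normal distribution
n, trials = 5, 100_000   # small samples make the bias easy to see

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))
var_biased = samples.var(axis=1, ddof=0)    # divides by n: the "biased" estimator
var_unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1: Bessel's correction

# Bias = E[estimator] - true value, approximated by averaging over many trials.
print(f"bias with ddof=0 (divide by n):     {var_biased.mean() - true_var:+.3f}")
print(f"bias with ddof=1 (divide by n - 1): {var_unbiased.mean() - true_var:+.3f}")

Running this, the ddof=0 estimate comes out roughly −true_var/n below the true value, while the ddof=1 estimate sits close to zero bias, exactly as the definition predicts.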

The average of a set of data points tells us something about the data as a whole, but it doesn’t tell us about individual data points, so the mean can sometimes be misleading. To give unusually large or small values, also called outliers, less influence on our measure of where the center of the data lies, we can use the median instead; the mode, in turn, is most useful when the sample is large enough for values to repeat. So there can be good reasons to use a statistically biased estimator: most notably, accepting a small bias can significantly reduce variance on small sample sizes, making the use of a bias function a deliberate design choice. We consciously choose a “biased” algorithm in order to mitigate or compensate for other kinds of bias. For example, if one is concerned about the biasing impact of the training data, many algorithms provide smoothing or regularization parameters that reduce the risk of overfitting noisy or anomalous input data.
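Continuing the sketch above (same illustrative setup), we can see why a deliberately biased estimator can be the better choice on small samples: the biased ddof=0 variance estimator trades a little bias for lower variance, and ends up with a smaller mean squared error than the unbiased one:

import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0
n, trials = 5, 100_000

samples = rng.normal(scale=np.sqrt(true_var), size=(trials, n))
for ddof in (0, 1):
    est = samples.var(axis=1, ddof=ddof)
    bias = est.mean() - true_var
    mse = ((est - true_var) ** 2).mean()
    print(f"ddof={ddof}: bias={bias:+.3f}  variance={est.var():.3f}  MSE={mse:.3f}")
# On samples of 5, the biased estimator (ddof=0) wins on MSE:
# its lower variance more than compensates for its bias.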

Bias can also arise from the inappropriate use or deployment of algorithms and autonomous systems. When an algorithm or the resulting model is deployed outside the contexts it was designed for, it will not necessarily perform according to its intended use case or to appropriate standards. For example, suppose the owner of a self-driving car in the United States decides to import their new car into the United Kingdom. Driven in the United Kingdom, the car’s autonomous systems would perform in an unwanted, biased manner, since in the United Kingdom we drive on the left-hand side of the road rather than the right. This biased performance arises from use outside the intended context.

A more subtle example of transfer context bias could arise in moving a healthcare algorithm or autonomous system from a research hospital to a rural clinic. Almost certainly, the system would show significant algorithmic bias relative to a statistical standard, as the transfer context is likely to have quite different characteristics. This statistical bias could also become a moral bias if, say, the autonomous system assumed that the same level of resources was available, and so made morally flawed healthcare resource allocation decisions.

Subtle nuances such as these have turned out to be socio-political and economic tripwires for big businesses when stepped on, because in many of these cases the fallout could have been avoided entirely with greater investment in diversity within the technology field, and among the groups of people actually creating the algorithms. An increase in cultural diversity brings with it an increase in the diversity of thoughts, considerations, perspectives and values. This is incredibly important in today’s hyper-connected world, where businesses, lives, livelihoods and reputations can be dismantled overnight by a YouTube, Facebook or Twitter message. Without diversity, biases can be baked into these algorithms, and they will behave in a prejudiced way, just as a biased person would.

An investment in diversity, however, is only really 50% of the journey; the rest boils down to documentation and standard operating procedures. Documenting the use cases of our machine learning models, and hence their limitations, justifying the statistical methods used in their design, and making these white papers publicly available for appraisal will take us that much closer to building trust in the promise of AI.
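To make transfer context bias concrete, here is a minimal sketch (synthetic data and a deliberately simple nearest-mean classifier, both my own illustrative choices) of how a model that looks accurate on validation data from its training context quietly degrades once the deployment distribution shifts:

import numpy as np

rng = np.random.default_rng(2)

def sample(shift, n=5000):
    """Two-class data; `shift` models a change of deployment context."""
    x = np.concatenate([rng.normal(0.0 + shift, 1.0, n),   # class 0
                        rng.normal(2.0 + shift, 1.0, n)])  # class 1
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return x, y

# "Train": use the midpoint between the class means as the decision threshold.
x_train, y_train = sample(shift=0.0)
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

for shift in (0.0, 1.5):
    x, y = sample(shift)
    accuracy = ((x > threshold) == y).mean()
    print(f"shift={shift}: accuracy={accuracy:.1%}")
# Held-out data from the training context (shift=0.0) scores around 84%;
# the shifted deployment context (shift=1.5) drops to roughly 65%.

Nothing about the model changed between the two runs; only the context did, which is exactly why documenting a model’s intended use cases and limitations matters.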

Hope this helps…