QAL Podcast Adam Smith

Quality In A.I. (with Adam Smith from Dragonfly)

We’re trying out transcribing our podcasts using a software program. Please forgive any typos as the bot isn’t correct 100% of the time.

Audio Transcription:


In the digital reality, evolution over revolution prevails. The QA approaches and techniques that worked yesterday will fail you tomorrow. So free your mind. The automation cyborg has been sent back in time. TED speaker Jonathon Wright’s mission is to help you save the future from bad software.


Hey, welcome to the show. Today I’ve got a very special guest, Adam Smith. I’ve been working with him for probably the last five or six years. He’s a global superstar when it comes to AI. He’s on the ISO committee, and also does some work with the European Commission. He’s also the CTO of Dragonfly. So we’re going to have to find out a little bit more about what is Dragonfly?


Hi Jonathon. Thank you for having me on your show. Love the intro, by the way.


My pleasure. It feels like we’re going to need to be sent back in time to get rid of Coronavirus at this rate, so.


That’s right. So yeah, thanks for introducing me. So yeah, I’m Chief Technology Officer of Dragonfly.

So Dragonfly grew up as a software testing advisory and delivery company. And over the last four or five years, we’ve moved increasingly into artificial intelligence as well as people solutions, finding the best engineers to solve the problem.

So my background is quite varied, as you can imagine. And as you mentioned, I’m also involved in various international committees on the topic of AI, particularly with the focus on testing and quality, and also do a lot of work with the British Computer Society and their special interest group in software testing.


And you were saying you do, with Dragonfly, you’re doing AI solutions. So you must be, you’ve got both sides of the view then. The building, the standards, and the guidelines on ethics and how to successfully deploy and test an AI platform, as well as actually implementing one.

So what kind of started that journey?


That’s a great question. I guess we started developing a product that used machine learning in order to help us to make the right decisions on projects from day to day. So who should fix this problem? Which tests should we run next? And we got really interested, being testing specialists, in how to test this.

And when we started looking at this, I realized that this was actually one of the biggest problems with the artificial intelligence space, the probabilistic nature of the technology making it difficult to prove whether results were correct or not. And this is a common theme if you talk to people who are integrating AI into their systems, is that the two biggest concerns are one, talking to customers and getting them to make decisions about what they need, and two, validating the solution.

So that’s kind of how we ended up getting involved in it. Now we are very involved in it. Lots of committee meetings, lots of work on standardization and lots of clients who have real questions, real problems around how to implement QA in an AI context.


And I guess, you just mentioned briefly on there, about the kind of the cognitive bias. I know you’ve, I’ve been to see one of your speeches before. Could you tell the audience a little bit about what you found in cognitive biases, some of the main themes?


Sure. I guess cognitive biases are one type of bias. They’re essentially biases that are present in humans, and they are performance shortcuts, essentially. Without gathering all the facts and making a decision based on all the information, shortcuts are taken for essentially performance reasons.

And when these cognitive biases start to impact system development, whether it is a case of someone making a decision that isn’t relevant for all users of the system, or records the data we processed, that bias can get manifested in that system forevermore. And that’s compounded by issues relating to data and statistical bias, where for various reasons which might be historical or might be relating to cognitive biases, data sets that are used to train AI systems don’t reflect reality. What that means is those systems then propagate these biases into future processing.

And there’s also another type of bias, which is another form of cognitive bias, which is all about how people react to interacting with systems. So a really good example of this is automation bias, where you’ve got a self-driving car and you may become overly complacent about the ability of the car to, say, avoid pedestrians, and then you may hit a pedestrian. So there’s been a number of well-documented self-driving car accidents where this is exactly what’s happened.

And in a more menial example, this is quite common in systems where things get prepopulated or made slightly easier for a human to do, and then they’re simply approving a result. They very quickly become complacent.

And from a testing point of view, this sort of thing’s really important because you have to understand not only that there may be biases in the system design in the data, but also in actually how it’s used.

So really interesting topic with frankly hundreds and hundreds of different ways that it, that is bias, can manifest in systems.


Absolutely. And I noticed you posted recently around doing BA’s autonomous wheelchairs at JFK. I thought that was quite fascinating in the sense of you combining what is, in essence, Car2X, and people are working, being, establishing autonomous vehicles to kind of smart buildings. So the smart airports probably looking at how they navigate around.

I remember, being, when I was in Silicon Valley, seeing the, I don’t know if you’ve seen the security guards, the little robots that walk around there. And there was this one incident which you might’ve seen in the press, where some kids ran up to it and so, therefore, it had a choice of going left or right. Decided to go right, and then fell into a fountain and drowned, which is a sad story.

But at the same time, it’s interesting, because we always look at the obvious ones, which is a car, autonomous cars driving along. On the left-hand side, there’s someone on a motorbike not wearing a helmet, which is irresponsible. On the right-hand side there’s a car with a “Baby on board” sticker, an SUV, high-end car. Which way does it choose now?

Obviously people always put those kinds of questions in, and we know based on Tesla’s autopilot, it’s not that advanced. It’s not taking that kind of information and doing anything with it; it makes a decision based on the numbers it’s got.

So are you finding, when you’re teaching your AI, that the large amount of data sets you need, with all those possibilities, is the biggest problem?


It is, but it’s also anticipating the risks, much like testing any kind of system, right? You’re trying to establish all the different things that can go wrong, and trying to come up with examples that prove that it doesn’t go wrong in that example.

With AI systems, particularly ones that actuate in the real world, there’s a lot of thinking you have to do around all the things that could happen. Whether it is complex ethical scenarios like the one you’re alluding to there, the trolley problem, or whether it is more menial. What will happen if these two things happen at once, typical examples? But all of these things need to be identified with a critical thinking mindset before you can really set out a testing strategy.

And there are parts of this that are common with existing technology. You can look at some kind of system that has lots of hardware interfaces and say, “Well that’s processing real-world data. It’s just as complex.” But once you combine the amount of data that’s coming in with the speed and the volume of decision making that can occur, and the amount of feedback into the real world, it becomes a really complicated test environment planning problem, in a way.


It’s amazing because this, I know you’ve worked and you’ve kind of managed global automation projects for large investment banks with thousands of projects, and test environments are always one of those complex things. Data’s always a difficult one. We had Hugh Price on the show quite recently, and he was saying that even production only houses a very small subset of all possibilities and journeys through the systems. So being able to model out those ones without any cognitive bias and train your system, every … I notice you’d done a lot of work with Rex Black on the new A4Q AI and software-


That’s right.


Testing course. Are there any tips for listeners, I mean how they could potentially get started and learn more about the importance of AI in testing?


Yeah, absolutely. I mean a really good way, if you have kind of a technical lens on the world, is to do a $10 or 10 Euro introduction to machine learning style course. Because those courses will inevitably cover some of the quality problems. They’ll explain them in a different way to how a quality specialist would explain them, but from a statistical point of view, they’ll really help build that understanding. And that’s what QA specialists need.

It’s first of all, an understanding of all the different risks that can occur at different stages of the life cycle, and second is a list of tools and techniques they can use to mitigate or prove the absence of those particular quality problems.

Now the first one of those, as I say, you can do a bit of machine learning self-study, but also you can read quite a lot about this in the press in terms of things like facial recognition accuracy or ethical issues. And these issues are primarily fairly headline-grabbing things that are in the press, but you can quickly boil them down into more menial examples that you’d be likely to come across in the enterprise testing context.

The second piece of this is the tools and techniques available. There are some specific techniques out there that are designed for AI. So metamorphic testing springs to mind. There are also techniques that are borrowed, if you like, from the medical side of the technical world such as expert panels, ways of assessing whether a system has given the correct output when it isn’t clear what the correct output would actually be.
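As a rough illustration of the metamorphic testing idea mentioned above: instead of asserting an exact expected output, the test asserts a relation that must hold between outputs when the input is transformed in a known way. The sketch below is not taken from the conversation. The tiny nearest-neighbour classifier, the function names, and the toy data are all assumptions made purely for illustration.

```python
import random


def nearest_neighbour_predict(train, query):
    """Return the label of the training row closest to `query` (1-nearest-neighbour)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _closest_features, closest_label = min(
        train, key=lambda row: squared_distance(row[0], query)
    )
    return closest_label


def check_metamorphic_relations(train, queries, seed=0):
    """Return True if predictions survive two transformations that should not matter."""
    rng = random.Random(seed)
    baseline = [nearest_neighbour_predict(train, q) for q in queries]

    # Relation 1: the order of the training rows should not change any prediction.
    shuffled = train[:]
    rng.shuffle(shuffled)
    after_shuffle = [nearest_neighbour_predict(shuffled, q) for q in queries]

    # Relation 2: scaling every feature by the same positive factor keeps the
    # ordering of distances, so predictions should be unchanged.
    scaled_train = [([2.0 * x for x in features], label) for features, label in train]
    scaled_queries = [[2.0 * x for x in q] for q in queries]
    after_scale = [nearest_neighbour_predict(scaled_train, q) for q in scaled_queries]

    return baseline == after_shuffle and baseline == after_scale


# Hypothetical toy data: (features, label) pairs, chosen so no query is equidistant
# from two differently labelled rows.
toy_train = [([0.0, 0.0], "a"), ([1.0, 1.0], "a"), ([5.0, 5.0], "b"), ([6.0, 5.0], "b")]
toy_queries = [[0.2, 0.1], [5.4, 5.2], [2.0, 2.0]]

print(check_metamorphic_relations(toy_train, toy_queries))
```

Neither relation needs a known correct label for any query, which is why this style of test suits systems where the right answer is unclear.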

Within the actual model training and testing process, there’s a huge range of different statistical concepts that are useful to understand. Again, they do require a bit of technical understanding, but those are really, really important because, in a probabilistic system, you cannot usually report a simple pass or fail on some of the core functionality. You have to report in terms of degrees of confidence, in terms of false positives, false negatives, the area under the curve, lots of different metrics that you wouldn’t be familiar with as a testing specialist.
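To make those metrics a little more concrete, here is a minimal sketch of how false positives, false negatives, and the area under the ROC curve can be computed from a set of scored predictions. The function names, the 0.5 threshold, and the six example scores are assumptions chosen for illustration. In practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Count true/false positives and negatives at a given decision threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            counts["tp"] += 1
        elif predicted == 1 and truth == 0:
            counts["fp"] += 1
        elif predicted == 0 and truth == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def roc_auc(y_true, y_score):
    """Area under the ROC curve, computed as the chance that a randomly chosen
    positive example scores higher than a randomly chosen negative one
    (ties count as half)."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = positive) and model scores, invented for this sketch.
example_truth = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(confusion_counts(example_truth, example_scores))
print(round(roc_auc(example_truth, example_scores), 3))
```

The confusion counts depend on the chosen threshold, while the AUC summarises ranking quality across all thresholds, which is one reason a single pass or fail verdict is hard to give.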


Yeah. It sounds like for those people who want to dip their toe in, there are some good recommendations there and also worth I would look on the website as well because I think that’s a great course. That’s using a Jupyter Notebook to do the kind of the AI stuff. Have you found that there’s any kind of other books or resources that maybe help people get to grips with Jupyter?


Oh, I think Jupyter’s pretty intuitive, usually. It misbehaves sometimes, but I find it really useful in a training context because you can step through line by line, run each line one at a time, and if you want to have a go on it, there’s actually an online service. If you do Google Jupyter Notebook, I think it might be called Saturn, which allows you to spin up Jupyter Notebooks without actually installing any software. If you want to try out a bit of Python with some machine learning tools, it is the perfect platform to do it.


Yeah, absolutely. And I also would recommend Kaggle as well. The amount that I learned from just doing the Titanic example was huge because, as you said, you’re not going to get a direct answer out. You’re going to build your confidence up the more and more information, and the further you get … I know some guys over there have got huge scores where they’re able to classify which people died and which people didn’t die on the Titanic.

Fair enough, that’s probably not as useful a real-world example, but could you tell us some of the kind of real-world examples that you’re doing with your AI? It’s NERO, your AI, isn’t it?


It is, yes. But first actually on the Titanic example, there’s a great example of bias built into that, because if you took the survival data of the Titanic and tried to predict the survival rates on the Lusitania, which I think sank two or three years later, you would predict people to survive very incorrectly. And one of the reasons is, people stopped, or people think people stopped, putting women and children first on the lifeboats. Whereas in the Titanic example, that was very much what they did.

So there’s a great example there of bias built into those two ships which show how by taking data from one particular event or scenario and then trying to use it to predict a different event or scenario, you can end up with different results based on immutable characteristics of people, like their gender or their social class.
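The effect described above, a model learned on one event performing poorly on another, can be sketched with a deliberately tiny example. The records below are invented toy numbers, not real passenger data from either ship, and the "majority rule" model and function names are assumptions made only to show the mechanism.

```python
from collections import defaultdict


def fit_majority_rule(records):
    """Learn, for each group, the most common outcome seen in the training records."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [count of outcome 0, count of outcome 1]
    for group, outcome in records:
        tallies[group][outcome] += 1
    return {group: int(counts[1] > counts[0]) for group, counts in tallies.items()}


def accuracy(rule, records):
    """Fraction of records whose outcome matches what the rule predicts for their group."""
    correct = sum(1 for group, outcome in records if rule.get(group) == outcome)
    return correct / len(records)


# Invented toy records of (group, survived) pairs. In event A the outcome depends
# strongly on the group; in event B it does not.
event_a = [("women", 1)] * 7 + [("women", 0)] * 3 + [("men", 1)] * 2 + [("men", 0)] * 8
event_b = [("women", 1)] * 4 + [("women", 0)] * 6 + [("men", 1)] * 4 + [("men", 0)] * 6

rule = fit_majority_rule(event_a)
print(rule)
print(accuracy(rule, event_a))  # accuracy on the event the rule was learned from
print(accuracy(rule, event_b))  # accuracy on a different event
```

The rule looks reasonable when checked against the data it was learned from, and drops to coin-flip accuracy on the second event, because the relationship between group and outcome that it memorised no longer holds.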

But just to, you mentioned NERO there, just to talk more about NERO. So NERO really is about saving time for middle managers on large IT projects. It’s really about taking away the onerous information gathering from multiple systems, the processing of that data, identifying the outliers, identifying the so what, and trying to automate common tasks like deciding what to fix next, deciding which tests to run, and things like that. And as well as having taught us a lot about AI and allowed us to really become the UK leader in terms of AI and software quality, it’s a fantastic tool that we’re enjoying rolling out to our clients at the moment.

But when people realize how they can save that hour off their morning by not having to do menial things like work assignment on a large project, their eyes light up realizing that AI can actually save time at the management layer, not just in Amazon warehouses, stacking shelves and things like that.


And that’s one of the things I was really impressed with when I first got a bit of a demo. I think Dan gave me a talk about NERO and some of the capabilities that it had, and some of the stats that you managed to achieve as far as your accuracy around assigning defects, predicting who to assign the defects to, and also being able to kind of use different data sources, whether it’s [inaudible 00:16:09] or ALLAB, as this wealth of structured data as well as a combination of unstructured data.

Do you find that it takes a bit of time for it to get up and the accuracy to start ramping up once you’ve started taking different data sources from different client sites?


Not unless it’s a new company. There’s usually some history. Normally a company has done a project before and they’ve already got a wealth of structured data that can provide that initial baseline for training. It’s not like when you sign up for a Facebook account and Facebook doesn’t really know anything about you until you fill out your profile and start clicking on things. We can profile the team and the organization immediately based on the last thing they did. That really is the difference between consumer and enterprise machine learning.


Yeah. I think enterprise AI is a really interesting one, because the fact, the data visualization combined with, like you mentioning that the kind of the decision support automation and also analysis, that kind of capabilities give it a kind of a single pane of view so you know the truth based on real data as it’s happening. So you’ve not got a whole stack of, say, PMOs that are harvesting this data and as soon as they get it, it’s instantly out of date.

And also, what I class as a proxy, where they’re just passing the information across. They’re not actually adding any value on top of it. A system like this would save so much time within an organization.




So do you think you’re going to expand it past project management as well?


Yeah, absolutely. We do intend to. We started off in the testing and defect space, and now we’ve expanded to a full turn of SDLC. So we’ve got project management, cost management. We’re integrating with things like [inaudible 00:18:05] and Jenkins now to give more of a dev-ops view.

But our next stage is going to be expanding it to other sectors. So, you can obviously take the paradigm of building software and apply it to lots of different project constructs. So we can apply the same thing to, say, the construction industry. We can also apply it to non-change constructs. And so we’ve got some clients we’re talking to about building more, sort of operational monitoring or business information functionality that gives them insights into whatever their business is, rather than necessarily their change projects.


It’s really fascinating. And I know you do, you split your time between the kind of all these additional kinds of communities of practice, and I know you kind of, you’re key to some of the new standards that are coming through with the ISO.

What recommendations would you have to people to go out and get information, or at least understand what those standards, how they may potentially impact them, and how to kind of reach out to maybe you guys and help them with some of these challenges around testing and also AI?


Right. So I mean I guess first of all, in terms of standards, so I think standards are incredibly important. You hear a lot of people in the community who are not that impressed or don’t rate ISTQB, et cetera, very highly, which is fine, that’s their view. But my view is that everybody needs a common language. It’s crucial to productivity that people are able to move from one project to another and be able to use the same words to describe the same things and the same methods at a level. If everyone meant a different thing about performance testing, it would take a lot longer to get things done on change projects.

Now one of the ways that people learn these things is through training courses, and the way training courses develop their content is usually based on best practice and an international consensus on things like standards.

So I think these things are very important. I think it’s very important that there is a wide range of people involved in them. People that may detract from current standards in quality and testing, my ask to them is to come and contribute, to help us improve standards, and to build out the next level of standards for the next set of technical problems that we’re going to be faced with.

So some of the standards that I refer back to with great regularity are SQuaRE. So I don’t know if you’re familiar with ISO/IEC 25000 series, which is all about language for specifying quality and quality requirements, and this doesn’t currently cater for AI, which it needs to.

So for example, a lot of people talk about AI robustness, and people mean slightly different things by robustness, depending on the angle they’re coming at it from. If you talk to a traditional software tester, they’re more likely to think about that being resilient over time. Whereas in an AI world, whilst it still means that, it also very specifically means things around specific types of data that can cause models to react badly, adversarial attacks and things like that.
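One narrow reading of robustness in the AI sense, sensitivity to small changes in the input data, can be sketched for the simplest possible model. This is an illustrative assumption, not a method described in the conversation. For a linear classifier, changing each feature by at most `epsilon` can move the score by at most `epsilon` times the sum of the absolute weights, so the check below asks whether that worst case could flip the decision. The weights and feature values are hypothetical.

```python
def linear_score(weights, bias, features):
    """Score of a linear classifier; the predicted class is the sign of the score."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_robust_to_perturbation(weights, bias, features, epsilon):
    """True if no change of at most `epsilon` per feature can flip the score's sign."""
    margin = abs(linear_score(weights, bias, features))
    worst_case_shift = epsilon * sum(abs(w) for w in weights)
    return margin > worst_case_shift


# Hypothetical model and input, invented for this sketch.
example_weights = [2.0, -1.0]
example_bias = -0.5
example_input = [1.0, 0.5]

print(linear_score(example_weights, example_bias, example_input))
print(is_robust_to_perturbation(example_weights, example_bias, example_input, 0.1))
print(is_robust_to_perturbation(example_weights, example_bias, example_input, 0.5))
```

Real models are rarely linear, so in practice this kind of question is usually explored empirically, for example by generating perturbed or adversarial inputs and checking how often the output changes.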

So we all need to talk the same language. The only way we can get there is through consensus, and the mechanism that exists in the industry to do that is international standards.

So I’ve been involved, from an AI and from a quality and testing perspective, in that community. One of the things that I’m quite keen on, as I implied there, is improving the ISO 25000 series SQuaRE so that it caters better for AI. There is a group working on a technical report, which is an extension of the other standard I look at a lot, ISO/IEC/IEEE 29119, which is a set of standards that covers software testing.

Now the technical report that’s coming out is I believe part 11 of that standard, and it’s an extension upon it to cover AI. And there’ll probably be a future international standard, which is specifically about how you conform with requirements to verify AI quality.

In terms of how people can get involved in it, I mean this kind of thing is generally done through your national body. So in the UK, we have the British Standards Institute, which decides who inputs to that process. But there are other ways to get involved, such as through a professional association. So as you know, Jonathon, we’re both on the committee for British Computer Society Special Interest Group In Software Testing. So we have the means to review new standards and input into standards development through that forum.

So for anyone that’s a member of that special interest group, they’re able to reach out to us and participate. Anyone that isn’t a member either needs to approach their national standardization body, or find a similar organization that represents their profession that has input.


I think that’s great advice. And I know you’re pretty much, you’ve revitalized the BCS, so you’ve kind of come back with a kind of fresh [inaudible 00:23:28] approach. You’ve really revitalized, it’s got the first event that we’re in this month, which is going to be a virtual event. How excited are you about the new lineup, the new format and may be new chapter for the BCS SIGIST?


So when I joined the testing industry, which was quite a long time ago, one of the first events I went to was a SIGIST event. And it really struck me as being fundamentally different from other events, in that it wasn’t full of people trying to sell me things. It didn’t have a commercial feel. It had a feel of experts sharing knowledge. And a lot of the testing conferences I go to don’t have that feel. There are some excellent exceptions, but I think there is a huge space for the non-commercial associations for this profession. And that’s not just so that we can run events, that’s so that we can do things like participating in the development of university syllabuses, give people like apprentices who are coming up through the modern apprenticeship scheme but probably haven’t yet developed beyond the initial understanding of testing, give them a way that they can get more involved in the community.

It’s also about including more people in the events. Now we had a conversation about doing an online-only event specifically to make sure that people who have accessibility issues, or communities that we weren’t getting to include in the event style, were able to participate. Now we’ve been overtaken by events slightly, and now all events are online, but that’s the sort of thing that we’re able to do as a not-for-profit that is much more difficult for a commercial organization to do.

And finally, as well as participating in standards, I really want to develop the links with academia and Ph.D. level study around the software testing topic. There are several universities in the United Kingdom that do have a specialization in testing that I really want us to start working with as a professional body.

So I guess that’s why I’ve gotten involved in SIGIST, again, if you like, is to drive some of these things that are not necessarily commercial in nature and to provide that not for profit forum for the profession.

In terms of revitalizing it, I think the team that had been running it had been doing a great job of it for many years, but I think some fresh energy was needed, particularly given the style of events has moved on a little bit over recent years.

So we’ve got a whole bunch of people including yourself, Jonathon, who’ve joined with similar views to me. People who want to focus on things like inclusion and accessibility, people who want to develop the industry, or the profession I suppose, and its profile, and people who want to modernize things and get something back.


It’s absolutely excellent work. I’d recommend anybody who’s not checked out the BCS website for a while, go in, type in BCS SIGIST. You can apply easily. It’s actually quite a low cost. We’re trying to get inclusion, also, for kind of university students. I know you work quite closely with universities because you’re in Spain at the moment. One of your R&D locations is tied quite closely to the university, isn’t it?


Yeah, here in Barcelona we work with Barcelona Tech. In the UK we also work with a number of other universities. Our focus here in Spain is more about data science, whereas our focus in the UK is more on the quality and testing side.


I think it’s a great kind of push towards the kind of really changing the way that we, the new generation comes through and it doesn’t just fall into quality and testing. Actually has a curriculum, has the support.

How did you start, did you, was this something you did at university, or now how did you get into testing?


Oh, everyone’s got a story, right, behind that question I think. So I started, actually, I was working at a large financial company doing complex financial administration, and they needed somebody to go and do a UAT of a particular area that I specialized in. And I ended up being shipped to the other end of the country and living in a hotel for the duration of UAT. And as we all know, things don’t go to plan. So it ended up being a year in, it was actually the Norwich Airport hotel, which was exactly the same time as Alan Partridge was getting famous. So I got quite a lot of jokes from my mates.

But at the end of that long UAT phase, the IT department said “You seem pretty good at breaking stuff. Do you want to come and do this full time?” And that’s really when I started taking IT seriously as a career. I’d been coding since I was a kid, but hadn’t really considered the profession as what I wanted to do until this point.

And I think what made the difference for me was I realized that IT is not just about coding. There’s a lot more disciplines, a lot more things involved in that, and actually managing the complexity and the ambiguity is super interesting.

From then on I studied various Open University courses to get myself up to speed in various topics, and 20 years later, consider myself a pretty competent tester and developer.


I can definitely vouch for that, and for the Piccadilly Group. There’s some incredible talent that you guys have got there. And just before we kind of ramp down, is there any chance you could kind of share the best ways to get in touch with you, and also how to kind of find out more about things like NERO and what, and the Piccadilly side of things as well?


Sorry, I lost you there. The best way to get in touch with you?


Yeah, the best way to get in touch with you and also for the listeners to find out more about what you’ve talked about today and also what you’re doing with NERO.


Yeah. Great. So the best way to get in touch with me is probably Twitter. So on Twitter, I’m adamleonsmith, or you can email me at Our website, will tell you about some of the things that we’re doing. It will tell you about NERO. It will also take you to the Piccadilly Group website, which is where we advertise our training, and we are actually doing an AI and software testing course in about three weeks that we’re going to do fully online. And we’re actually just doing it two hours a day, so 8 till 10 in the morning, over a series of sessions so that people can fit it in a bit easier around their job. And of course, I’m on LinkedIn as well.


Awesome. Well, even I might actually sign up for that course as well. So thanks so much, Adam, and it’s been an absolute pleasure. And you’re doing some amazing work, not only pushing the technology forwards in the AI space, but also for the community. So thanks so much again for being on the show.


No problem. Thanks for having me, Jonathon.