Your Data Quality Sucks (with Huw Price from Curiosity Software)

We’re trying out transcribing our podcasts using a software program. Please forgive any typos as the bot isn’t correct 100% of the time.

Audio Transcription:

Intro

In the digital reality, evolution over revolution prevails. The QA approaches and techniques that worked yesterday will fail you tomorrow. So free your mind. The automation cyborg has been sent back in time. TED speaker Jonathon Wright’s mission is to help you save the future from bad software.

Jonathon Wright

This podcast is brought to you by Eggplant. Eggplant helps businesses to test, monitor, and analyze their end-to-end customer experience and continuously improve their business outcomes. Hey, and welcome to the show! How’s it going?

Huw Price

Hey, Jonathon, how are you man? 

Jonathon Wright

Yeah, good, good. 

Huw Price

I mean, everyone’s obviously hiding at home at the moment, or some variation thereof. And I’m no exception. So it’s a good time to talk, I guess.

Jonathon Wright

This is the perfect opportunity for practicing what we preach about collaboration and the ability to work remotely. So what have you been up to? It’s been a while. 

Huw Price

Yeah, I guess it’s sort of more of the same and quite different at the same time. You know, we’re still there in terms of trying to make CI/CD and DevOps more efficient. We’re still trying to improve the requirements documents. We’re still trying to get people to work in parallel and not sequentially, and we’re trying to lop chunks off manual processes just as we did before. So I guess that’s the old world, the same problems. I suppose the new world is that things have changed big time, I would say over the last couple of years, after my six months off, with the advent of containerization. Docker is everywhere. You know, it’s completely changing the way that people think about developing and the ability to do things in parallel. In your space, that means performance testing, spinning up training environments, having quite a complex matrix of different APIs and different versions, and then trying to work out some way of configuring your testing environments to test compatibility across them. You know, it’s become much more complex, even though a lot of the software is just so cool and so easy to use and really, really funky. Unless you have a good structure sitting on top of it, I think it can very quickly get away from you. So that’s what I’ve been up to, I guess.

Jonathon Wright

Yeah, I know. It’s just been those kinds of catastrophic failures again, which we keep on seeing in the news. And I guess looking at the current health crisis and a company like Cerner or some kind of healthcare platform, there’s just so much information there. I’m sure at the moment it’s probably at maximum capacity with GPs sending codes, and maybe there’s some kind of big data lake generating trend information to help predict where the virus is going. You know, do you see people at that level, or do you still see people with spreadsheets and CSV files trying to load data in via overnight ETL batches?

Huw Price

I think nothing has really… Essentially, the last thing that really impressed me was Age of Empires, version 1 or 2. You know, we moved into the RPA space for a while, just trying to automate people’s processes. And the big thing was that people actually didn’t really understand their own processes, or this idea of data flowing, and where the single point of truth is. You and I used to talk a lot about a data lake or some sort of single source of truth. But it really comes down to a dependency map. So if I change something here, what’s the effect somewhere else? I mean, really all of these problems that we see in computer systems are based on that: you don’t know what’s going to happen when you change something. And yet we have all the tech to do it, from Fiddler to Splunk to ALM Octane; all these things, they’re all great. But people don’t really seem to have got this idea of joining them up in a coherent way, so that when you check something in you have your code analyzers, but you might also track some of the API calls and the REST calls and try to build up a usage map. Then, when it comes to making a single-line code change, or a test failing, you should be able to track that all the way across, not just through a single API calling another API, but through to another system that may be far away.

And you know, if you talk about something like healthcare, there are so many systems communicating backward and forward that it’s almost impossible to track your way across. So what if I just change this code or this configuration file? What am I gonna break somewhere else? You know, I still think that’s where the biggest problems lie, and that’s where all the opportunities should come. But it is gonna take people really rolling their sleeves up, trying to put some decent tech in, and trying to have a goal which senior management actually buys into. You know, you find some valiant teams trying to do some good stuff. Richard Jordan at Nationwide is the one I always put up as a good example. That’s great.

But I think management really now needs to think about how do we glue this together? How do we know what’s going on at any one time? Not just a bunch of dashboards. Oh, we’ve done this check-in or we’ve done this. So we’ve hit these agile sprint points. It’s all a bit irrelevant. I think it’s more in terms of the impact of change, knowing potentially what’s going to happen before you do it, as opposed to it’s still very much subjective finger in the air. “What happened last time” kind of concept, you know? 
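The dependency map Huw describes ("if I change something here, what's the effect somewhere else?") can be made concrete with a small sketch. This is only an illustration in Python, not anything taken from the tools mentioned in the conversation. The component names and the `observed_calls` list are invented; in practice the caller/callee pairs would have to be harvested from traces, logs, or code analysis.

```python
from collections import defaultdict, deque

def build_dependents(calls):
    """Invert (caller, callee) pairs into: component -> components that call it."""
    dependents = defaultdict(set)
    for caller, callee in calls:
        dependents[callee].add(caller)
    return dependents

def impacted_by(changed, dependents):
    """Return every component that directly or transitively depends on `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(changed)  # a call cycle can lead back to the starting point
    return seen

# Hypothetical usage map: (caller, callee) pairs observed between systems.
observed_calls = [
    ("booking-ui", "booking-api"),
    ("booking-api", "patient-api"),
    ("billing-batch", "patient-api"),
    ("patient-api", "patient-db"),
    ("reporting", "billing-batch"),
]

dependents = build_dependents(observed_calls)
print(sorted(impacted_by("patient-db", dependents)))
```

A breadth-first walk over the inverted call graph is about the simplest way to answer the "what will I break?" question. The hard part in a real organization is collecting the call pairs across systems, which this sketch simply assumes.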

Jonathon Wright

Yeah, it’s something we keep on seeing. You know, I remember having a conversation with Jerome, the CTO of Micro Focus, and he was kind of saying, “We’ve got 400 products and we’re trying to glue them all together, then put some underlying analytics like Vertica on top of there. And we get stuck because they’ve all got different metadata as far as a schema registry goes.” I remember when we were trying to get this kind of adaptive data store, an Avro schema registry, built to understand the mappings of what each data set meant within each system. And at the end of the month, I was supposed to be going to South Africa to talk about business process management, which I know is something we’ve talked about a lot, around getting visual models, so things like XPDL or business process modeling notation, so that the actual organization could understand the complexity. And I think with things like GraphDB coming in and things like Neo4j, you’ve started to see these models appear. You know, what kind of things are you seeing as far as gluing the bits together?

Huw Price

Yeah, spot on. I mean, you mentioned a couple of products, and we’ve actually used those fundamentally in our new tools, and they are just amazing. So I think there is some definite scope where you could try and make it a bit easier for humans to make some impact assessments. And this really applies even if it’s just chatbots, even if it’s just (I’m not going to say AI, because it’s such a maligned term) some kind of machine learning in terms of what’s happened before, you know, with the provenance of the information that’s in these tracking databases like GraphDB, etc. So I think that’s fine.

But I think the core thing that’s changed with us is really the test data, all the data. You know, we’re doing work with a whole bunch of companies at the moment. And it’s quite interesting, because we’ve created really powerful synthetic data generation engines and we attach a modeling engine to them. The best place to start is just to talk to the users. Almost forget about I.T.; just talk to the users who are maybe trying to implement a new system, maybe trying to implement a change. And I talk to them in business terminology.

Now, the business terminology is really, as you say, the business process models overlaid with business rules. People forget that the business process model is a picture, and a picture has complexity behind it, which needs to be layered in there. But once you’ve had that chat, you can say, “Well, okay, we’re starting to understand your business problems. So let’s start thinking about how we can provide environments for you to start almost exploring.” You know, it’s sort of test-driven development or test-driven understanding, whatever you want to call it. So you put together the logical model, which is the business world, as I say, with the technical model, which is, you know, pushing data into an H.R. interface file for Oracle E-Business or something like that, or a complex set of payroll configurations, something along those lines. Building the widget to attach them is now much, much easier. Before, you needed to create it in an SSIS ETL product and you needed to plumb it in. Now, we can do that in an hour, a couple of hours, maybe four hours for more complex stuff. And that then gives us that attack engine, something to get this stuff in. And then you go off and you talk to the users and you say, “Well, let’s start building different personas, different types of complexity in there.” And the first thing that happens is you’ve moved user acceptance testing almost right up front.

So by being able to hammer data in and also understand the data as it appears in the system, you get the users saying, “Well, actually, that doesn’t make sense, because the reconciliation must come just before the approval,” or whatever the particular business rules are. You might play with some time series as well. You’re actually increasing your understanding of the system, but you’re doing it in a very structured way. You’re creating a common understanding of the data objects or business objects, business data objects, whatever you want to call them. And you’re also building a kind of logical model of the business itself.

Now, this kind of combo is actually really useful for everyone else in the I.T. team because it sort of allows you to think, “Right. So now I’m the guy that’s got to load the data from the old system into the new system.” All of a sudden, you’ve got a solid target system.

You’ve also got pretty much all of the different technical data delivery capability, because it’s already been used as part of the slightly upside-down development process. And then all they’ve got to do is overlay the old data onto the new data. And bingo, that part of the project has now gone much quicker. So doing things slightly backward compared with what was traditionally done is actually a really cool way of working with users much more efficiently and getting them involved earlier. And that, you could say, is user satisfaction, in theory. I think it’s somewhere in the Agile Manifesto. So I think those types of things are a big change.

And I would also say that with the advent of containerization, it really has blown up the way that people operate. This whole idea of Dev, maybe QA1, QA2, QA3, you might have Dev QA 1-10,000, all of which are testing sort of APIs or combinations of APIs or combinations of data or whatever it takes in terms of that configuration management. 

And it’s the management of the manifest of that. It’s a classic manifest of what the combinations are of particular pieces of software and how they tie together, so that you can do so many different things in parallel. And this is driving us massively into automation, but also a kind of RPA controlling the automation, to be able to tie all this stuff together in a much more structured way. You know, I think that really is the opportunity. And some companies we’re talking to are actually doing this really well. And the thing that starts with them is ambition. They’ve spotted it and they’re ambitiously going after it. And that’s fantastic. It’s nice to hear. Previously it was always, “Well, we’ll see how it goes” or “we’re going to change a few processes here and there” or “we’re gonna get this tool in”. And now some of the better companies are just going, “Wow, this is a whole new world. We need to completely rethink the way we’re going to do development”. And it’s not just structuring teams, it’s structuring the technology and also the connectivity between the different tools that you’re using, you know. So I think that really is kind of exciting stuff. Are you seeing the same types of things? How ambitious are you seeing companies being out there, Jonathon?

Jonathon Wright

Yeah, I completely agree. And I know we used to joke about things like data archeology, and this kind of data mining obviously is an incredibly powerful thing. You know, tools like Disco, which we used to work with and integrate with, are incredibly powerful for mining that, especially if you’re trying to move to some kind of target architecture.

Jonathon Wright

I did see a presentation last week, though, and one of the solution architects said, “We shouldn’t always use the latest technology as the default pattern”. And I think one of my concerns with what I’m seeing in this kind of “low code movement” is… the big example I’m going to give you is Disney. And I know you guys have done stuff with them in the past. You know: Disney Plus goes live. Day one, the system goes down, right? You and I both know Disney doesn’t have a data center, right? They don’t have a cloud. They’re not AWS, right? They’re not Amazon. They don’t have the underlying infrastructure. So it’s somebody else’s infrastructure that went down. But they must have had teams doing stuff really well: doing the data builder stuff, the modeling, the data understanding that there are going to be different people from different geo-locations, different people consuming that service, and testing so many possibilities. But on day one, it goes down. Everyone just kind of goes, “Okay. You know, it wasn’t fit for purpose”. And the brand damage is just so big. And I think this idea, what you’re saying about shift up: move it back to the business, understand the domain-specific language, understand the context-driven side of things, so you understand what you are trying to achieve from a business perspective. And then make sure you’ve got the right underlying technology to do the job. You know, that’s kind of where I think it’d be great to see organizations really start understanding the data. And that data quality really is something you’re helping your customers with, isn’t it?

Huw Price

Yeah, absolutely. I think it is a bit of a mind shift. People tend to speak in a mixture of data and transform or business process. So it’s about scaling up in a particular area. Someone arrives, a really good consultant comes in, and it’s going to take them a month to six weeks to even get close to what people are talking about, or to join in sensibly in the discussions that are going on. Whereas if you say, “Well, let’s just break things down a little bit,” you can ask, “Well, what actually is a data object? And what is a data state?” So let’s just break up our requirements. You know, just be a little bit more organized and create glossaries of terms. If you have a term that refers to another term, just put a link underneath it. Then you say, “Well, what’s this?” It’s almost like nouns and verbs, isn’t it? So, okay, we’re going to open an account. Fine. It’s obvious: you have an account, you have an open. But actually, if you look at a lot of the language that people use, they do tend to blur things and mix things in together. So it’s a lot easier to understand if you can break stuff up into these more structured components, which, of course, makes life easier for the technician coming in, because all they have to do is say, “Well, what is one of those? OK. Well, that is fine. That is a set of data sitting in that small structure. And it lives in seven different databases, but it’s the same. What’s the definition of the XML file structure? Okay, fine. It’s this XSD, which is managed by an XBRL structure. Great. Now I understand that.”

Huw Price

So now you know where it is. You know what controls the definition of it. And then you say, “Okay, what can I do with it? Well, I could transform it”. What does transform mean? “Oh, well, I’m gonna go in and I’m gonna close it”. Okay. Well, what does that mean? “So we’re going to apply a transform, which is gonna do the following pieces of activity to it”. Great. So now you understand specifically what’s happening to it. But then you say, well, we should be able to let people know what’s happening and what’s going to invoke that, you know, the upstream and the downstream.

Huw Price

So you start breaking things down into smaller chunks. But of course, you can also build chunks up into something I might call a person. A person actually contains maybe 50 to 100 smaller components. So I think that’s classic data modeling. But I was always a big advocate of the idea that two or three really good people with some good tools can do some really powerful work, and that’s what we’ve found. You know, we’ve seen some companies with six hundred people creating test data and we’ve seen some companies with two or three. In fact, Disney was a really good example. They had about four people and they were just awesome (you know, I think it was Accenture, actually; Jessica Totin was kind of running that team), and they were incredibly efficient in the way that they could drive that out. The reason was that they were quite disciplined. They basically said, “If someone comes to us and they don’t tell us what a data object is and the type of transform, then it’s impossible for us to do our job, so go away.”

Huw Price

So suddenly this small circle of competence caused a sort of whirlpool of other people having to be more structured around them. And I told that story 10 years ago. And I would say it’s probably even more true now, because people are moving more into the idea of data being moved around between almost consumable objects. Think about it, whether that’s a set of APIs to load data in or to transform it. As you well know, companies can just go into Google and say, “Well, I need an AI engine, so let’s find the cheapest one, and I need to pass the data into and out of it,” you know. So I think the structure is changing. And I agree with you about not going flat out for the new tech. But on the other hand, some of the new tech is so easy to use, and it is just much easier, much faster, and more structured to push this stuff into a faster world. But without that ability to understand what the links are between everything, it can quickly become chaotic. And I suppose that’s the bit I’m trying to keep a wary eye out for: that we haven’t just caused more chaos, as you know, with new toys, so to speak.

Jonathon Wright

And yeah, I think that’s the challenge now, if it is so easy to consume. You spin something up in, you know, an environment like OpenStack, and you’re able to be so flexible with the technology. And then what you’ve gotta understand is really the contract testing. So if you’re doing something like a simple REST API that then connects to Kafka, you’ve then got to understand the nuances of consumers and producers, how the data is going to be managed and, like you said, how it’s going to be transformed. It may end up in a GraphDB or something downstream.

Jonathon Wright

So it’s kind of understanding where to start and where the data goes, a bit like a data lifecycle, really. And I know someone jokingly said the other day that there’s this new role of test data engineer in test. A bad acronym. But I understand the value of having a data engineer, and definitely a test data engineer, because there is so much of that in the combinations and the negative scenarios. You know, I heard someone say the other day, “Unhappy Path”, which is the most insane thing I’ve heard for a long time. But there’s also historical data. You know, having a tool that can automate test data is gonna be essential. And my understanding is you’ve just nailed that with a new product that you’ve brought to market.

Huw Price

Yeah, we’ve sort of moved it now into what we’re calling test data automation. I think that’s the new term we’re using, rather than just test data management. We don’t really care how you get the data in. You know, in the old tools days, we just used to push stuff in through the database, and it was quite hard work with all of the relationships and triggers and stuff like that. But actually, if you’ve got an API which gets it in (and if you don’t have an API, you probably should have one), then it may well be easier just to string together a set of APIs, and we automate it out and we parameterize it up. And then we put some modeling on the front of it and say, “Every time we make a request for some data, we’re going to create you the standard 15 personas. And we’re also going to create you one broken set of contracts within there to make sure your validation works”. And that’s standard, you know.

Huw Price

So this ability to break it down, push the data in, and use the new tech is good. And then you use this slightly new way of thinking in terms of how you string it all together.
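The "standard 15 personas plus one broken contract" idea can be sketched in a few lines. What follows is a hedged illustration in Python, not the product Huw is describing. The field names in `REQUIRED_FIELDS`, the persona values, and the `is_valid` check are all invented stand-ins for whatever contract and API a real system would expose.

```python
import copy

# Hypothetical contract: every record must carry these non-empty fields.
REQUIRED_FIELDS = ("name", "date_of_birth", "account_type")
ACCOUNT_TYPES = ("current", "savings", "joint")

def standard_personas(count=15):
    """Build `count` distinct, valid persona records with placeholder values."""
    return [
        {
            "name": f"Persona {i:02d}",
            "date_of_birth": f"{1960 + i}-01-01",
            "account_type": ACCOUNT_TYPES[i % len(ACCOUNT_TYPES)],
        }
        for i in range(1, count + 1)
    ]

def broken_persona(valid):
    """Copy a valid persona and deliberately violate the contract."""
    broken = copy.deepcopy(valid)
    broken["name"] = "Broken contract"
    del broken["date_of_birth"]
    return broken

def is_valid(record):
    """Stand-in for the validation the system under test should perform."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

def fulfil_data_request(count=15):
    """Every request returns the standard personas plus one broken record."""
    personas = standard_personas(count)
    return personas + [broken_persona(personas[0])]

batch = fulfil_data_request()
rejected = [record for record in batch if not is_valid(record)]
print(len(batch), "records requested,", len(rejected), "rejected by validation")
```

In a real pipeline each record would be pushed through the system's own API rather than a local `is_valid`, and the test would check that exactly the deliberately broken record is refused.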

Jonathon Wright

And yeah, I think this is one of the things that’s been massively missing, because if you think about it, everything needs data, every single part of the pipeline. You know, one of the big challenges for performance testing is obviously that you’re going to burn data. So being able to get data on the fly, and maybe reduce some of the preparation time by subsetting and masking, and then getting a crawler or something to fetch the data that you need, you know, those kinds of tools are going to be incredibly powerful. And we see things like SAP going end-of-life, and migrations, and the complexity around these packaged applications. Are you seeing that kind of shift, where people are trying to understand their ETLs better?

Huw Price

Big time. Big time. I’ve sort of got off point slightly, but when we used to look for these test data engineers, we were actually going after people who worked on business intelligence. Because if they’re someone who’s very familiar with business intelligence, they know what the end user wants. They understand the complexity of the relationships. So you pick up someone who’s maybe done a couple of years of business intelligence and say, “Well, tell you what, your job now is completely the reverse of what you’ve just been doing. It’s to actually push the data into the systems, so that you can think about how hard it was for you originally to test that system. We’re now going to convert you to the opposite of that. You’re the person that’s going to provide the data for the BI people to test that the system works correctly, whether that’s big data, whether it’s a Kafka transform”.

So it’s that sort of curiosity about how things get pushed into the system and what the transforms are. It’s actually quite fun and it’s incredibly productive. If you look at, say, most organizations, they probably have documentation on maybe 5 percent of their data, because they’ve probably acquired lots of legacy data as they’ve gone along. But of course, if you want to accelerate against the new businesses, all these great sorts of banks that are appearing, if you really want to move on that, you’ve got to understand the old world and also understand the new world.

And we live in a competitive world. And if you’re not out there, then you’re suddenly gonna be second in line, because you had 2 million customers and suddenly you’re third and fourth and fifth. So for big companies, it’s critical that they manage to shine a light on that 90 percent of their data. They’re not going to understand all of it, but they need to go from 5 to 10 percent, and the extra 5 percent is the bit that’s gonna be important for their new development.

And that includes cleaning up some of the old data, or even just making some tough decisions in terms of “we’re going to categorize it up”. So do they have a good credit history? There’s so much information, and we haven’t got a clue what’s in it. We’re just gonna have to make some broad rules of thumb. And we’re just going to say we’re going to give them a rough rating, because we just don’t have time to go back and go through it. But we need to get this system out the door. We know we’re gonna make some commercial mistakes there. But on the other hand, the commercial imperative of actually getting the product to market is more important.

So the business and the tech really do need to cooperate in spaces like this. You know, there are people that come to work. They do their job in I.T. They write a program, they go home. You know, I just don’t think it’s like that anymore. Most companies are now I.T. companies and they are competing flat out head to head against other companies.

So I think that’s this whole idea of being clever, you know, with business decisions, picking off the right bits of the system that you need to exploit. Get the experts in to understand a bit more, come back up for air. Make a decision. Go back down. Do that a couple of times and then go for it. I think the data is really key in terms of understanding where you are and, you know, what’s going on with your current system, and also when making decisions about how we’re going to get that data into the new API-driven, Internet of Things, whatever-the-hell-it-is piece of technology which is going to drive our chatbot, you know. All of those things are business/tech/business/tech systems.

So you can’t just sit there and look at the business user and say, “I don’t know.” You need to go and explore it. You need to get stuck in, and you need to use some of these cool tools to explore: test-driven development, bang the data in, and look to make sure these reports are working correctly. Make sure you can detect the patterns in the data. Big data is great, but how do you test it? You know, the only way to do that is to create the really, really subtle patterns in the big data yourself to make sure you can detect them. If you can’t detect them, then your business is blind.
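Huw's point about planting subtle patterns in big data so that you can prove your analytics find them lends itself to a short sketch. The example below is an assumption-laden illustration in Python: the pattern (repeated payments just under a threshold), the account names, and the `detect_pattern` rule are all invented, and a real detector would be whatever report or model the business actually relies on.

```python
import random
from collections import Counter

def background_transactions(rng, accounts, count):
    """Ordinary traffic: amounts stay well below the band the detector watches."""
    return [(rng.choice(accounts), round(rng.uniform(1, 5000), 2)) for _ in range(count)]

def plant_pattern(transactions, account, rng, times=6):
    """Hide a known pattern: one account making repeated payments just under 10,000."""
    planted = [(account, round(rng.uniform(9000, 9999), 2)) for _ in range(times)]
    mixed = transactions + planted
    rng.shuffle(mixed)
    return mixed

def detect_pattern(transactions, low=9000, high=10000, min_hits=5):
    """Hypothetical detector: flag accounts with many payments inside the band."""
    hits = Counter(acct for acct, amount in transactions if low <= amount < high)
    return {acct for acct, n in hits.items() if n >= min_hits}

rng = random.Random(42)  # fixed seed so the synthetic data is reproducible
accounts = [f"ACC-{i:03d}" for i in range(200)]
clean = background_transactions(rng, accounts, 50_000)
seeded = plant_pattern(clean, "ACC-007", rng)

print("clean data flags:", detect_pattern(clean))
print("seeded data flags:", detect_pattern(seeded))
```

Because the planted account is known in advance, the test has a definite expected answer: the detector should flag that account in the seeded data and nothing in the clean data. If it cannot, the reports built on it are, in Huw's words, blind.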

Jonathon Wright

And I think of the number of times that we’ve been in to see customers and we’ve gone through the day in the life of that data, right? So that could be from the earliest point, how the customer makes a request, all the way through a system and then back again. You’ve talked quite a lot about shift right. And one of the things I found really fascinating recently, from reading a blog about what you guys were saying, is that there’s probably only 20 percent of those combinations of possibilities in production. So, even if you think, “Well, no one ever uses that type of combination in production”, how do you exhaustively test your system with those unknown unknowns? You know, there are infinite possibilities. You know, it’s a real hard one to focus on. Should you be focusing earlier on, or should you be looking at and interpreting what’s going on in your APM tools, your Splunk and Dynatrace landscapes, to see what types of combinations people are using, compared to a new system which doesn’t have that kind of historical data? You know, how do you understand from a risk-based approach what’s important?

Huw Price

I think that’s absolutely true. Very true. I mean, this all sounds a little bit pie in the sky. But you can take something really simple. I was doing something the other day where there was a data feed coming in from an old system and it needed to be loaded into a system. You know, it’s just great. You know, it happens all the time. You could call it a legacy system; it’s just some kind of data pipe. So we took a look at it and they said, “Well, can you mask it?” Sure, we can mask it. And of course, I’m not a big fan of masking. I think, you know, it’s very easy to find your way back from one piece of information, especially with healthcare data, like the date of a visit. And you can track the person down, to be blunt, especially in an organization with fewer than ten or twenty thousand people, which is most organizations.

Huw Price

So we took this mask and we said, OK, well let’s mask it, great. And I said, well, while we’re at it, why don’t we do something a bit more interesting with it? So what we’re gonna do is we’re going to take a look at the data across the entire spread of your entire history. And we’re going to create all the combinations that exist in your historical data. So this was to do with a set of security accesses.

That was quite interesting. It was quite a complex set of security accesses in terms of reviews, refusals, appeals, and that kind of stuff. So that was fine, so we did that. We beat that up a bit. And then you think, well, actually, why don’t we look at the new system, which is probably more interesting in terms of what types of security profiles are actually allowed in the new system. And let’s mask the file again. But this time we’re going to substitute in all the potential combinations that exist in the new system. So your 5000 people that came over actually had a spread of the different combinations of the different types, basically by doing some modeling, sort of classic modeling.

Now, you could say, well, that was the old system, and this is the new system that doesn’t exist. But in the world where a system does exist, then you do use the shift-right data. So every time we create a file for testing or for masking or for whatever, we apply the shift-right stuff and the model stuff, which is the pure-world idea of what it could potentially be, including any new ideas. You could also skew it and say, well, let’s try and reflect what’s actually going on in production, so that when we’re looking at maybe some performance testing or something like that, we’ve picked up the same subtle characteristics of the production volumes as well. I mean, why not? Creating a couple of hundred million rows is actually pretty easy now. So why don’t we create them with the same subtle variations that production has?

So when you try and run your lookups online, etc., it’s never the actual URLs that are the problem. It’s when it goes down into a three-level join somewhere in a database. I mean, we actually had one today, and the database had less than 200 rows in it. And it went into some kind of lock loop. You’re only going to find that out by using some of those shift-right pieces of information to create the same levels of subtle data that do exist, rather than just, “let’s just take a hundred million customers and hope for the best”, you know?
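The approach Huw outlines over this turn (substitute every allowed combination into the masked file, then skew the remainder toward what production actually looks like) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the two `dimensions`, the `production_weights`, and the `person_id` field are invented, and the real modeling engine is not described in the conversation.

```python
import itertools
import random

def assign_profiles(records, dimensions, production_weights=None, seed=0):
    """Give every record a profile so that each combination appears at least once.

    Remaining records are sampled with optional weights, a stand-in for
    frequencies observed in production ("shift right" data).
    """
    keys = list(dimensions)
    combos = list(itertools.product(*(dimensions[k] for k in keys)))
    if len(records) < len(combos):
        raise ValueError("not enough records to cover every combination")
    rng = random.Random(seed)
    weights = None
    if production_weights:
        # Combinations never seen in production still get a small weight of 1.
        weights = [production_weights.get(combo, 1) for combo in combos]
    extra = rng.choices(combos, weights=weights, k=len(records) - len(combos))
    assigned = combos + extra
    rng.shuffle(assigned)
    return [{**rec, **dict(zip(keys, combo))} for rec, combo in zip(records, assigned)]

# Hypothetical model of the new system's security profiles.
dimensions = {
    "access_level": ["basic", "elevated", "admin"],
    "case_status": ["review", "refusal", "appeal"],
}
# Hypothetical production frequencies; unlisted combinations default to 1.
production_weights = {("basic", "review"): 50, ("admin", "appeal"): 1}

masked_people = [{"person_id": f"P{i:04d}"} for i in range(5000)]
profiled = assign_profiles(masked_people, dimensions, production_weights)

covered = {(p["access_level"], p["case_status"]) for p in profiled}
total = len(dimensions["access_level"]) * len(dimensions["case_status"])
print(len(covered), "of", total, "combinations present")
```

Guaranteeing one of each combination first, and only then sampling by weight, keeps rare profiles in the file even when production data would almost never produce them. That is the point of combining the modeled view with the observed one.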

Jonathon Wright

Yeah. I was having a really interesting challenge when I was there with the Performance Advisory Council last week in France. And one of the guys was saying, well, there could be a lot of shadow systems on new systems (forget AI IT for the time being). But if you think a system’s going to get replaced with a new system, it could be driving your car. It could be an IoT endpoint. Until they roll out the new a/b kind of canary/dark canary kind of launch, you know, they may want to run them side by side and then analyze the data to see if it’s making the same decisions, or whether it has improved accuracy or an improved misclassification rate. You know, I think it’s good. It’s still getting to the point where it’s like data on the fly. You know, you’re having to analyze a lot of different data from lots of different systems. They’re doing all this kind of SLAM / predictive maintenance stuff, which is huge amounts of data. 

You know, it’s going to be really interesting to see how they own the life of the data. How do they own the life of the data for your car? You know, what’s all that black box information you’ve got? What do they do with it? How are they using that data to then optimize it or improve the life of products, you know? To get better quality overall and learn from real endpoints, say. You know, I think this could be really challenging. And also with data security, I guess that’s the other big one. Are you seeing that now with the Californian data privacy stuff?

Huw Price

Actually, Dylan, my son, he works for us now. So keep it in the family, as they say. And he actually lives in California. And he said it will be a little bit like the Europeans in that they’ll just kind of meander along and then suddenly someone will get fined and then the fines will get bigger and then they’ll go, oh, my God, we should have done something about this. 

You know, I mean, probably five years ago we were going on about GDPR. And it was only when the fines started to ramp up that people suddenly started going, oh, my God, this is serious. You have to actually engage your brain on this. You know, this idea of saying, I want to have my data removed from your IT systems, and that includes logs. Do we know where the data is? Do we know if it’s in text files? Do we know if it was in XML files? Was it ever downloaded onto a spreadsheet? You know, did you track that it went down onto a spreadsheet? Is it in any ERP system? Does SAP provide reports on where everyone’s data is? You know, there is just an insane amount of complexity inside this legislation, and quite rightly so. 

I mean, the likes of Cambridge Analytica and Facebook, what they got up to is pretty horrendous for individual freedoms. So I’m a big believer that we should have these rights. So I think that companies cannot get away with this stuff anymore. And they really need to roll their sleeves up and spend a lot more time building these data catalogs, building much more sophisticated systems to be able to simulate user behavior rather than having to rely on real-life data in the development environment, which is absolutely not allowed under GDPR. 

So I think they’ve got to shift. And I think with the California stuff, once we see a couple of fines in the news, in the papers, then things will shift rather rapidly. And I suspect, like most regulation, it’s a lot easier for companies to go to the highest standard, which at the moment is the European Union, which is good. But, you know, they’ve actually become sort of pan regulations now and they’ve sort of moved up to the United Nations. 

And, you know, if you’re a U.S. company in Ohio having to deal with three different regulatory requirements, just go for the highest one. Why not? You know, your customers want it. So unless there’s some specific need, commercially, where you’re going to “cheat the system” by not applying regulation, then I think it’s a bit of a gray area. And it’s pretty easy for companies just to adopt the highest standards, especially as it’s going to take three years to change their systems anyway. So they may as well start now. 

Jonathon Wright

I think that’s really good advice. I was quite surprised actually on the flight. Because of the scares, I got handed a piece of paper which said hand-write out your own personal details and health information, and it said on the printout version 1.6 or whatever, printed out in January. I’m thinking, from a GDPR perspective, what are they going to do with that? Right? Are they going to OCR it and then use Tesseract? Who’s going to be looking at my data? Where’s that data? We’re so used to these primary systems that when things go wrong, we start falling back onto the secondary systems, a pad of paper. You know, it does raise a lot of questions: is the government even GDPR compliant? You know, all these kinds of big challenges around your own personal data. But I think as part of that realism, you need to be able to generate realistic data loads, and you need to be able to visualize where there’s data missing. One of the tools which I always loved that was there, James, is the gapping analyzer. 

Huw Price

Yeah. Yeah. We’ve actually kind of rebuilt that, not in quite the same way. We have gone for a spot diagram, but we’re using the same data pools to be able to compare production and development, and we think of it more as, what am I actually trying to test now? You know, what are the business characteristics that I care about, and show me what I’m missing. And it’s pretty easy then to be able to say, well, those are the gaps. And again, it’s a very, very business type question to ask people. They’ll say, oh, yeah, but we absolutely need cross-currency trades that span three accounting periods because there are all sorts of complexity in that. And if you actually look at the real system, there’s probably none today. There’s probably going to be two next month. But how do you test with that? You know? So these are the things that cause the system to crash, cause all sorts of problems. 
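The "show me what I’m missing" question can be sketched as a simple coverage check. This is an assumption-laden illustration, not how the rebuilt tool works: the characteristics (`currency`, `periods_spanned`), the sample trades, and the helper `find_gaps` are hypothetical. It lists every combination of the business characteristics you care about and reports the ones with no matching row in the test data.

```python
import itertools

def find_gaps(characteristics, rows):
    """Return every combination of characteristic values that has no
    matching row in the given data."""
    keys = list(characteristics)
    wanted = itertools.product(*(characteristics[k] for k in keys))
    present = {tuple(row[k] for k in keys) for row in rows}
    return [dict(zip(keys, combo)) for combo in wanted if combo not in present]

# Hypothetical business characteristics for trades.
characteristics = {
    "currency": ["same", "cross"],
    "periods_spanned": [1, 2, 3],
}

# Stand-in for the trades currently available in the test environment.
test_trades = [
    {"currency": "same", "periods_spanned": 1},
    {"currency": "same", "periods_spanned": 2},
    {"currency": "cross", "periods_spanned": 1},
]

gaps = find_gaps(characteristics, test_trades)
for gap in gaps:
    print("missing:", gap)
```

Here the cross-currency trade spanning three accounting periods shows up as a gap even though production may hold none today, which is the point of asking the question from the business side rather than from the existing data.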

So go after the hard stuff. And actually, that’s sort of a final note. I would say that over the last six months, we have been asked to tackle some of the hardest test data problems that we’ve ever come across. But actually, we haven’t faltered at all. We’ve just carried on using the same analytical method and the same structured approach. And especially as now there’s so much really cool stuff out there that we can use to kind of do some of the legwork for some of this, we really are driving. You know, basically no test data problem, no data problem is too hard. Just chuck it at us. Leave it with us for a couple of days. And if you think of the value to the business of something that they were doing which is causing them, as you mentioned earlier on, brand damage, or producing regulatory requirements, reporting requirements, which are wrong and then having to backtrack and go back to hundreds of companies and ask them to resubmit, and that is with these complex XML, XBRL type documents, these are the types of areas where you absolutely have to get it right. You know, you can maybe make one mistake every couple of years. But if you’re making mistakes all the time, that is just no way to leverage. 

Jonathon Wright

As you know, I got an email today from BA because my South African flight had been canceled. And then the BA breach. You know, there’s been so many organizations that have had data breaches quite recently. You know, it’s so bad for the brand, so damaging. You know, it just can’t happen now. Not in 2020. So great advice there. You know, what is the best way to get in touch with you? And then also, what would you recommend in terms of your website or blogs for people to follow what you’re doing? 

Huw Price

Yeah, I mean, just go to LinkedIn. Just look me up: Huw Price. You know, I’ve been a bit busy. Should post a bit more on that. But if you follow James Walker or me on LinkedIn, that’s probably the easiest. That’s where most people go. 

And you know, from there, what we try and do is play lots of videos. We like technical videos. We always have to have a couple of slides, just to kind of set it up a little bit. But the technical videos are actually quite interesting for people who are gonna be sitting at home for the next three months. Maybe now’s the time to catch up on some of the new thinking and the new tech. 

So contact us on LinkedIn to start with. Just give us an idea of what you’re interested in. We’ll either put together some videos or point you to some existing videos. I think that’s, for me, a good way to go. And then there’s Tom Price as well. He’s worked with you in the past. He’s just such a good thinker and current writer in terms of writing position papers, which I think might be useful for when you’re trying to talk to management. You know, I never got a wide spread of people here, but if you’re trying to change the way management thinks, you need to kind of surround them a little bit with some white papers and some presentations and things like that. 

So some nice high-level ones, but also some detailed ones. Whether it’s blockchain, whatever the hell it is, you need a lot of different pieces of collateral if you’re trying to change what an organization is doing. And often you find valiant people trying to change and they’re just up against, I would say, vested interest. I think those days are probably gone now. It’s more just the sort of big companies finding it very difficult to do anything because they’re kind of overwhelmed. Maybe that’s a better way to put it at the moment. So it’s like, how do you break out of that cycle so you can actually get on with some new stuff, you know? 

Jonathon Wright

Absolutely great advice. And also, I’m guessing they can just request a demo. 

Huw Price

Yeah, absolutely. I mean, to be honest, request a workshop. I mean, the best thing to do is we just have a quick chat. Say hello. Then we’ll say a few things. Tell us what you’re up to. And then we might ask for test plans, bug logs. You know, just give us a presentation of how you do things at the moment. Just talk us through your toolchain and then we just chat for 90 minutes. Let’s just talk it through, and then it’s like everything. You don’t want to change wherever possible. If you’ve got good stuff, keep it. Quite often people focus on symptoms, not the real underlying problem. So that’s what a good workshop should try to cover. You know, where is the real problem, not the symptom. And then also looking for areas where you can stick automation in, which can kind of improve things, you know. You know, with Slack and all these great tools out there now, you should be able to monitor when a release is risky, and that should start popping up three or four days before the release is going out. And there’s about 10 extra metrics that you could be using that pretty much no one knows about, you know. So there are things like that. And I think just doing a little workshop is great. It’s good for everyone. We learn something. And at the end of it, if you buy a little bit of software, great. You know, if we give you some information, that’s fantastic. 

Jonathon Wright

Amazing. You know, infinite wisdom. Make sure you reach out. We’ll put all of the links that we talked about today in the show notes, and reach out for a virtual clinic. 

It sounds like you probably could solve some of those serious problems you’re having with test data quality. And thanks so much for being on the show. 

Huw Price

It’s a pleasure. It’s great to talk with you, as ever.
