QAL Podcast Joe Triccas

The Importance of Shift Left (With Joe Triccas)

Helping developers understand the tester’s mindset.

Audio Transcription:

Intro:

In digital reality, evolution over revolution prevails. The QA approaches and techniques that worked yesterday will fail you tomorrow. So free your mind. The automation cyborg has been sent back in time. TED speaker Jonathon Wright’s mission is to help you save the future from bad software.

Jonathon Wright:

Hey, and welcome to the show. Today I’ve got Joe Triccas, who’s going to talk to us all about shifting left and the importance of getting that testing mindset. So, welcome to the show, Joe.

Joe Triccas:

Hi there. Thanks for having me.

Jonathon Wright:

It’s fantastic to have you, and you’re on day one of your isolation?

Joe Triccas:

Day three. Oh, wait, is it Thursday? It’s day four. We did a trial day on Monday, which just turned into let’s not go back into that building, so yeah, I’ve been on lockdown for four days now.

Jonathon Wright:

And how are you coping with enterprise collaboration and doing stand-ups and just carrying on BAU?

Joe Triccas:

I’ve got to be honest, I think that in a month if this is still ongoing, we’re going to look at that office bill and be like, “What are we spending our money on?” Because it’s relatively seamless, right? I mean, that’s the advantage of being in the software field as well. We have the opportunity to just decide to not go into the office, and yeah, generally speaking, I personally am struggling slightly, but from an organizational health perspective, it’s feeling like, generally speaking, people are coping. 

I’m intrigued, though, as to the impact this is likely to have on overall software quality, which obviously is going to be a key focus of both my role and this talk.

Jonathon Wright:

Absolutely, and I must admit, my favorite film of all time is The Net, and I don’t know if you’ve seen The Net with Sandra Bullock.

Joe Triccas:

I’ve not, no.

Jonathon Wright:

There you go, you’ve got something to do when you’ve got a few days extra. So, this is a ’90s movie, where she’s a software tester, right? She works from home, let’s just put it… She gets the latest build coming through on floppy disk, and what she’s testing, from what I can see, is Doom, which ironically is coming out tomorrow, right? The new Doom game.

Joe Triccas:

Wow.

Jonathon Wright:

But the thing is when I watched that for the first time, I was like, “That’s the job I want,” and that was before I even started. This was kind of me at college and having a bit of a crush on Sandra Bullock, and part of it was, I was like, “God, that’s such a dream job. You stay at home, you test games, you raise bugs.”

She’d found this bug, which actually was a backdoor to money laundering and all this extra money, which is totally wild, but then she ends up, you know, people are following her, trying to kill her, and you know how these kinds of blockbusters go.

Do you feel that that’s your kind of role, that’s kind of happened to you since you started? I know you started out for a utility company back in, what was it? 2008?

Joe Triccas:

Yeah, 2007, 2008 was when I was moved into their IT function. Yeah, so to speak to that as sort of my journey over the last just shy of a decade in software testing, I was a typical, stereotypical almost tester, in that I have to hold my hands up and say I did fall literally into testing, to the extent I was just a bit of a diamond in the rough.

I was in their customer service department, I was a real pain in the IT department’s backside. They were at their point where they were just dipping the toe into what then was termed the eCommerce world, which really it was just having a website. It’s just a SaaS website, basically, that you could log into and manage your meter readings and things. But it was new ground and I was just in the right place at the right time to just naturally surface myself as someone with a technical capability.

So I ended up sort of owning all of their prod bugs and being the business representative and eventually moving in as a trainee tester on their billing system, which was like a, oh, wow, that was a COBOL-based, we’re going old school here, like a ’70s-based system. Actually, now at this stage of my career, I look back at that and I’d argue being that close to the CPU and assembly has given me a much deeper respect for the precision that development engineers need to ensure, and it sort of ensures I’m cognizant, at least, of the number of transformation layers high-level languages go through to get to execute on a CPU. And then you consider the modern paradigm where the CPU’s possibly not even in the same continent as you, and yeah, we’re existing within very complex systems at the moment. So it started basic there, and then, yeah, that would be SSE for you.

Jonathon Wright:

Yeah, I remember a fantastic… So I used to work in Portsmouth for Zurich Insurance.

Joe Triccas:

Oh, really!

Jonathon Wright:

Back in the ’90s. It was one of my first LoadRunner gigs, right? And it was a fascinating place to work and live, but I actually then went down the road to Southampton to another utility company. I forget what their name is. I don’t think it was SSE. But, anyway, we had this really interesting issue, and I don’t know if you’ve come across this in the same industry, but there was this cloud that smelt like gas, and then they had a huge amount of call center numbers coming through and it kept on bringing down the customer service platform because actually, it wasn’t gas, it was just some kind of freak thing that had come over from France and everyone-

Joe Triccas:

I remember that.

Jonathon Wright:

… kept on smelling it.

Joe Triccas:

I remember it. Yeah, I remember that event. So it probably was SSE, because certainly Hampshire in general, from a generation and distribution perspective, is completely managed by SSE’s distribution business, so their sort of emergency service center.

So it was interesting, actually, how my career did sort of go full circle in that I started my career in SSE’s retail arm, adding a testing capability to their billing system, and then over the years moved through various places and ended up at GE Power, where I was responsible for… So they have a product called PowerOn Advantage, which literally, it is the software that runs the power grids, the national grids of entire economies. Yeah, that was amazing. Really a lot of responsibility in that software.

So, yeah, there were points where the software that I was then responsible for, there was an incident, I’m pretty sure it was early last year, where there was an outage across most of the United Kingdom for about 40 minutes. Now, if this outage management software that I was previously responsible for wasn’t in place, that seriously could’ve just ground the entire electricity grid to a halt. And that software is literally the software I used to see the emergency service center representatives using when they were on the phone to people phoning up saying, “Oh my gosh, I can smell gas. What’s going on?” And that would have all the maps of where all the stuff is, without exposing too much about that particular package. So it was quite a romantic idea that I did end up going sort of full circle from their chain of value.

Jonathon Wright:

Well, GE is just a legend. I’ve seen, literally, when we started talking about, you know, everyone started talking about digital transformation, GE Power was literally the go-to company for kind of IoT, for visualization. I remember seeing this amazing data visualization of all of the grid systems all connected in together. It was absolutely… You know, they were the best in class. And it was quite interesting because we were doing the same kind of thing.

I was at Hitachi at the time and we were doing nuclear power stations, so fusion, Horizon, which was based out in Gloucester, which we were actually building nuclear power stations and doing the testing strategy for how you test them. That always worries me when you’re doing things like nuclear power. And then-

Joe Triccas:

Yeah.

Jonathon Wright:

It’s like, how do you put testing around that? And having to do that in kind of a one-week sprint was an interesting one. But yeah, and then we went out to Australia and there was a really interesting one around that because 90% of bushfires, which obviously they’ve just had last year, are caused by downed power lines, so they were wanting to utilize our visualization platform to kind of understand if there were any anomalies on the power grid and then potentially go off and investigate them because obviously there are huge distances between landmasses, and so it has kind of, well, okay, this predictive platform, so things like SLAM, so your predictive maintenance platforms, could kind of obviously predict when there’s potential weather which is going to impact these lines, and it’s so complex, but I don’t think most people see that level of complexity.

Do you find that when you were in the utility area and you were doing this kind of critical systems, your thought process for QA was completely different to maybe what it is now with a smaller startup company?

Joe Triccas:

I’d say it’s a bit of a yes and no response, because in particular where I’m at now, we are dealing with a mass scheduling challenge, so at Airts we provide sort of scheduling software to, at the moment, accountancy firms is what we’re focused on primarily, due to our CEO previously being an accountant, so he knows that industry very thoroughly. But the key idea behind the business was to actually spin out a bunch of artificial intelligence research that our CTO was working on at the University of Strathclyde.

So in terms of complexity, so the power grid runs in a fundamentally linear fashion and they’ll be segregated by various sort of… substations, that’s the word. There you go. [inaudible 00:10:55]. But effectively you can envision it as big serial rings. So from that perspective, there can be visual complexity, but from a physical perspective, it can actually be a bit more simple than the more novel sort of SaaS products that we’re just starting to be able to build.

Airts is a company, I was employee 14, we’re up to 36 now a year later, so we’ve grown fast, and just the way we’re able to leverage novel research in AI means for such a small number of people, we’re able to schedule the work for 100,000 individuals down to a 15-minute granularity. So if you imagine 37 and a half hours a week times 100,000 people, or times four, effectively, that’s the sort of chunks of time, the number of chunks of time we’re dealing with. Then you consider that the whole concept is real-time scheduling, so at any point in time, any change in that overall shape of resource availability means all other factors, like all other existing schedules, need to be considered. 
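Illustrative sketch (not from the episode): the arithmetic Joe describes can be worked through in a few lines, assuming a 37.5-hour week, 15-minute slots and 100,000 people. The function and variable names are hypothetical and are not taken from any Airts code.

```python
from fractions import Fraction


def weekly_slots(hours_per_week, slot_minutes, people):
    """Count the schedulable time slots per week across all people."""
    # Fraction avoids floating-point rounding for values like 37.5 hours.
    slots_per_hour = Fraction(60, slot_minutes)
    slots_per_person = Fraction(hours_per_week) * slots_per_hour
    return int(slots_per_person * people)


# Figures quoted in the conversation; everything else is an assumption.
per_person = weekly_slots("37.5", 15, 1)
total = weekly_slots("37.5", 15, 100_000)
print(per_person, total)
```

Under those assumptions each person contributes 150 slots a week, so the scheduler is reasoning over roughly 15 million slots before any constraints are considered.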

So there’s some of this stuff sort of happening in the energy industry, especially in particular with outage management, because the legislation these days is so hard on, or depending on your country, obviously, I appreciate, I think California and some of the western states in America suffer a little bit from lacking this legislation, but fundamentally in the UK, more than 10,000 people without power for I think it’s about 10 minutes. I can’t remember the exact regulations because I’ve been out of the industry a couple of years, but it’s not a very long amount of time if you were in a traditional model where you needed to identify the issue, get that information to an electrical engineer, have them travel there, resolve it, get everyone back on.

Having some measure of soft AI, some ability to make a safety-conscious decision and empower a human being to click that button with confidence that no one’s going to get vaporized as a result of it is… And that’s effectively what we’re talking about here, right? Like the outage management system could involve clicking a button on a mouse that results in a 144,000-volt switch switching, and anything downstream from that’s in contact is going to have a very bad time. So that person needs to have confidence in the software that they’re able to do that.

But fundamentally, the actual scheduling, in concept you could see it as a scheduling problem that’s like a temporal schedule of where’s the power going to be at what point in time, and are there any humans that should be in contact with it? If not, you’ve got a green button. Click your mouse, I’ll switch the switch and that street has power again. And then obviously the next iteration is removing the human from that loop, which is “the” challenge of the energy industry, I would say, is to have a seamless, global, sustainable national grid or, well, international grid, I’d say, eventually. But that’s me getting all blue sky.

Jonathon Wright:

Oh, it’s definitely not a blue sky. One of the first most scary days of my life was going into the Horizon nuclear power offices and literally sitting down with what was the CTO. I was like, “Well, what are your top five things?” Obviously safety, you know, the normal kind of things popped up, and then he talked about drones. And this was probably five years ago and the idea of drones was a bit out there, and he was like, “Yeah, I need two of your team to be able to do their drone flight certifications.”

Obviously the legislation in the UK means that you can only pilot a drone which is looking into your own land, so you have to fly it kind of backward all the way around. And part of it was exactly that, is they wanted to use near field, they wanted to use IoT so that if they needed to go and look at a system, they would be able to interact with it using a drone, instead of sending a human to go there.

The same thing was that the next item down was no buttons, right? You always go into these power grids and there are just thousands of controls, and the idea was you don’t need that. You know, I remember Homer, that was the great thing about Homer, sat there waiting for that button that he needed to press and then he’d never press it, would he? Part of it is that’s kind of adding a level of human augmentation, which you don’t really need.

But it’s fascinating what you’re talking about because we had a similar kind of a problem. We were doing… I don’t know if you’ve heard of [Statverger 00:15:57] in Germany, but part of what Germany is doing is they’re trying to use just renewables, so we’d had a bit of a spec, which I must admit is probably one of the hardest testing tasks I’ve ever done, which was we want to be able to predict the amount of power that’s going to get used on the grid and also predict the amount of energy that’s going to be generated by renewables. But, of course, how do you do this? And I know GE was the example that we actually used.

So they were using predictive weather information and historical information from AccuWeather from the last five years, or whatever it was, and they were able to kind of understand, for a wind turbine, how much wind would potentially be there, and for a solar panel (we were making both the wind turbines and solar panels), how much electricity it would generate. And then the opposite side was around how much would be taken from the grid, so, okay, football or sporting events have been canceled now, but they could pretty much understand how much was getting pulled.

And then, what they wanted to do, which kind of sounds like what you’re doing at the moment with this scheduling, is they wanted to be able to buy and sell electricity within a 15-minute window, and the whole point of this was to be able to sell a small percentage 24 hours before but actually be able to sell it as people are wanting to be able to pull it. So in California, as you mentioned, things like air conditioning units would negotiate with a broker platform for clean energy at a certain price and then they would pull that based on the fact that they know that they’re going to switch on in 30 minutes so they need to source the power.

To me, it’s crazy, because obviously you can’t have wind turbines in your test lab, so [inaudible 00:17:48] an IoT device, which means you then need to also, you know, you need to think about things like service virtualization, but then service virtualization for 10,000 wind turbines all with different energy signatures coming out and you send a REST request to say, “Stop, power up,” and then you start seeing different outputs based on the amount of wind that’s coming through, to then get those 10,000 different streams of information coming back into the virtual power station, to then emulate that actually by buying and selling in those kinds of 15-minute windows, they’d be able to generate an extra 8% revenue, or whatever it was.
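Illustrative sketch (not from the episode): a minimal in-process version of the service virtualization idea, assuming each turbine can be replaced in the test lab by a stand-in object that accepts "stop" and "power_up" commands and reports output for a given wind speed. The class, the command names, the rated power range and the cubic power curve are all simplifying assumptions, not details of the Hitachi or GE systems discussed; a real setup would sit behind a REST endpoint and model cut-out speeds too.

```python
import random
from dataclasses import dataclass


@dataclass
class VirtualTurbine:
    """Stand-in for a real turbine's API; every field is illustrative."""
    turbine_id: int
    rated_kw: float
    cut_in_ms: float = 3.0   # below this wind speed there is no output
    rated_ms: float = 12.0   # at or above this, output is capped at rated_kw
    running: bool = False

    def handle(self, command: str) -> None:
        # Mimics the "stop" / "power up" requests a test would send.
        if command == "power_up":
            self.running = True
        elif command == "stop":
            self.running = False
        else:
            raise ValueError(f"unknown command: {command}")

    def output_kw(self, wind_ms: float) -> float:
        if not self.running or wind_ms < self.cut_in_ms:
            return 0.0
        if wind_ms >= self.rated_ms:
            return self.rated_kw
        # Simplified cubic ramp between cut-in and rated wind speed.
        span = self.rated_ms ** 3 - self.cut_in_ms ** 3
        return self.rated_kw * (wind_ms ** 3 - self.cut_in_ms ** 3) / span


def build_fleet(n: int, seed: int = 0) -> list[VirtualTurbine]:
    # A seeded generator gives each turbine a different but repeatable rating.
    rng = random.Random(seed)
    return [VirtualTurbine(i, rated_kw=rng.uniform(1500, 3000)) for i in range(n)]


def fleet_output_kw(fleet, wind_ms: float) -> float:
    """Aggregate output, as the virtual power station would see it."""
    return sum(t.output_kw(wind_ms) for t in fleet)


fleet = build_fleet(10_000)
for turbine in fleet:
    turbine.handle("power_up")
print(round(fleet_output_kw(fleet, wind_ms=8.0)))
```

Seeding the fleet keeps test runs repeatable, which matters when the thing under test is a forecasting or trading model rather than the turbines themselves.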

Those models are incredibly complex and probably most people who are out there on the call are thinking, you know, combining your expertise within GE Power and SSE, all your kind of utilities, plus this scheduling capability and then on top of that kind of AI, these challenges are real and they’re also incredibly complex. So how do you go about establishing that kind of QA capability within a startup which has got such big ambitions?

Joe Triccas:

Sure. So, day one, certainly when I started… Where to start, really. The interview, I guess, is where it all starts. When you first meet the team. It was really important for me to discuss… They were recruiting for a software test engineer, which I have basically trended towards the mental model that that’s an anti-pattern, because even the concept of naming test as a distinct function immediately absolves certain upstream functions of responsibility, so this was one of the first sort of messages I was endeavoring, or drums I was endeavoring, to bang to the board at the time when they were sort of looking to take me on.

So, yeah, I ended up getting hired by them and my role ended up being a software quality engineer, and that was really part of trying to just shift… Obviously test and quality assurance both existed in the paradigm of software engineering for the last, like, two decades, but certainly, it just felt like a distinguishing factor, anyway, a way to shift the narrative. Maybe it was more about me in hindsight.

But long story short, the key for me was to pitch to the development engineers a vision of what is the ultimate development engineer, so the absolute, like the 100X, what we sometimes refer to as the unicorn engineer, so the true full-stack capable of having some measure of dev-ops capability and infrastructure capability so they can stand up their environments, write the code, which is the existing sort of job spec, but then also write all of them… or at least contribute to the ancillary sort of projects you’d have on the go to validate the main codebase.

So that was the sort of mantra that I was walking in with was to empower every development engineer to be able to do the whole thing. In theory, they should be able to go home, stand up some VM somewhere, or some app server, build their own app, fully functionally test it, fully integration test it. Obviously that will all take a lot of time. 

So the other caveat, or not a caveat, but the other reality is businesses. You never join a business with a clean, blank slate and a blank check and no prior commitments, so the other challenge was we had so much business already planned, so effectively I ended up focusing independently on building out sort of an automation framework to basically spearhead that and mitigate our need to get hundreds of testers, because as I just described, the complexity of the software to actually manually, functionally test it, it would triple our business in terms of test resource. So, yeah, it’s just not feasible to do that in the short-term, really, or desirable, to be honest, in the long-term, because manual testing as a role is not well suited to human beings.

Jonathon Wright:

[crosstalk 00:22:25] very good at kind of doing testing, not really checking, so you’re a big advocate for this kind of shift-left approach. What do you mean by shift left and how do you go about doing that?

Joe Triccas:

The first step you can take to shift left is to ensure people with a testing capability are involved at the earliest possible stage of the discussion of a change, whatever that change might be if it’s a bug or a ticket or tech debt resolution. Whatever it might be, having someone that has a demonstrable capability in exploring the real fringes and the edge cases of a problem, having them in there at early doors can help. 

But it is important to sort of ensure that’s scoped. It’s very easy to continuously pull potential problems from the ether, so it does need to be scoped within the context of the business risks and the outcomes your business needs. But I’d say that’s the first step is just having that.

Then the next step is to proliferate that mindset so it’s not just a capability within a subset of people, and that can sometimes be the biggest challenge because not everyone wants that skillset, which I would say is a problem in itself. That’s like an institutional or an organizational challenge. 

But from my perspective, I’ve been really lucky because everyone that I work with is super interested, they really admire the ability of that… It’s named the Testing Mindset. When you do the IFAC course in the UK, you stumble across a concept called Professional Skepticism, which I think is the most succinct way of describing that testing mindset. It’s the concept of just assuming every…

I operate under the sort of axiom that every assumption will result in failure, so at any point in the process from what already exists to what we wish to bring into existence, any assumption from A to B will result in some sort of unplanned work, run-over, which inevitably results in a compromise of quality. So that, for me, is the key, is it’s all about over-communication of assumptions. And that is an ego problem, right? So we suddenly stumble into, I’ve got this background as a technical functional tester, and now, actually, I’m effectively providing like an ego check to people to say, “Actually, this is a psychologically safe environment, it’s perfectly fine for you to be honest about the things you’re assuming. Some of them might be fallacies and we can discuss that and we can ensure we’re all operating from the same viewpoint.”

But in my experience, the key to shifting left, setting aside functional automation test frameworks and API test frameworks, all that stuff, is the real squishy, human bit, which is, do I feel safe telling everyone that was my commit? I introduced that bug that caused X damage. When I deliberate on that, it’s most likely due to this assumption that if it had been raised at this point one in the whole process, we may well have mitigated. We also may not have, the point being I would say 95% of issues that make it through at some point are as a result of an assumption.

Jonathon Wright:

Assumptions are really difficult, aren’t they? In the good old days you’d be starting requirements engineering, you’d be understanding that you shouldn’t have should, could and would, and you’d go through and you’d say, “That’s a bit ambiguous.”

So, do you find that you’re capturing things from when people talk about a concept or an idea, are you validating those very quickly and then kind of justifying them, the architecture, justifying the decisions that are made, being involved at every part of that process? Is that something that you feel is kind of the role of a QA engineer?

Joe Triccas:

Yes. To be succinct, yes. I would say certainly from the quality engineering perspective, so that was the second mantra that I was beating my drum to was, my objective was to engineer a process that results in a quality product, so not to say how do we take this existing process and, sure, augment it, keep it fundamentally the same and augment it to result in an increase in quality? To me, that quality is a… You have to start with quality in order to get quality out at the end, and to me the first hurdle is an assumption.

I would say the analogy I used in the [inaudible 00:27:56] at Airts, which I think does resonate quite nicely with this conversation, is modern science: we don’t have to redo the original experiments that validated the presence of a force called gravity. We don’t have to redo that every time we embark on a new experiment that’s at a deeper level of understanding with respect to that concept because we have explored it, we have documented it, we have observed it, and most importantly we have confidence in basic checks that validate it’s still a present variable, so I can just pick up my mouse and drop it on my desk and I know gravity’s still there, right? I don’t need to do all of the subtle, all the crazy experiments that have happened in order to validate that. 

So that is what I would say that the way in order to… It’s safe to have assumptions, as long as you have checks in place that provide you feedback as to any change in that variable. This is where test automation becomes really valuable because you can have assumptions about a feature in your product as long as you can get realistic, valid and believable feedback as to whether your assumptions have been validated.
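Illustrative sketch (not from the episode): one way to read "it's safe to have assumptions as long as you have checks in place" is to write each assumption down next to a cheap automated check and report any that no longer hold. The `Assumption` type, the `slots_in` helper and the 15-minute slot size are hypothetical stand-ins for a real feature under test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assumption:
    statement: str              # the assumption, spelled out in plain words
    check: Callable[[], bool]   # a cheap check that it still holds


def failed_assumptions(assumptions):
    """Return the statements whose checks fail or raise."""
    failures = []
    for assumption in assumptions:
        try:
            ok = assumption.check()
        except Exception:
            # A check that blows up is treated as a broken assumption.
            ok = False
        if not ok:
            failures.append(assumption.statement)
    return failures


# Hypothetical feature under test: time is booked in 15-minute slots.
SLOT_MINUTES = 15


def slots_in(hours):
    return int(hours * 60 / SLOT_MINUTES)


assumptions = [
    Assumption("a one-hour booking uses four slots", lambda: slots_in(1) == 4),
    Assumption("a 37.5-hour week uses 150 slots", lambda: slots_in(37.5) == 150),
]
print(failed_assumptions(assumptions))
```

Run on every change, a list like this plays the role of dropping the mouse on the desk: it does not re-prove the feature, it only confirms the things being relied on are still true.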

Jonathon Wright:

Absolutely, and I know in the previous role you were very familiar with Cloud infrastructure. How do you find building things on the Cloud versus how you used to do things with on-prem? Do you find that it is a lot harder to see when you’ve got things like .NET Core and you can’t access those kinds of messaging layers as easily? Are there more challenges from an automation perspective now that you’ve got the Cloud kind of scale?

Joe Triccas:

My current role at the moment, I also do deliver some support, so I am in the support rota with the developers, and certainly, when you are traversing multiple networks to get into like a Cloud-based network, you do lose some… Certainly, obviously, it just basically increases the latency, like there’s no alternative to that. But just as well, as you say, you can lose… It can sometimes feel like you’re flying blind a bit if you don’t have access to the same logs that you might see, might not be able to remote into the VM, or it might not even be running on a VM, and so it might just be containerized and there’s X number of instances and, yeah. 

I would say that is just part of the challenge of software engineering at this point, and in the same way that logging would be something you’d ensure is written out to a hard drive, it adds a layer of complexity that has to just be addressed as early as possible because it’s so critical to the supportability of an application. There needs to be someone banging the drum of support, and it also does tie nicely into testability, as you mentioned, because you can be interrogating these logs as much as the front-end, as much as the APIs, just to validate that it is operating exactly as you would expect.
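Illustrative sketch (not from the episode): a minimal example of interrogating logs from a test, using Python's standard `logging` module with an in-memory handler. The `reschedule` function, the logger name and the message format are hypothetical; in a containerized deployment the same idea would apply to whatever log sink the platform exposes.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("scheduler_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger
capture = ListHandler()
logger.addHandler(capture)


def reschedule(job_id, slot):
    # Hypothetical operation under test: negative slots are rejected.
    if slot < 0:
        logger.error("reschedule rejected job=%s slot=%s", job_id, slot)
        return False
    logger.info("reschedule accepted job=%s slot=%s", job_id, slot)
    return True


reschedule("J-1", 4)
reschedule("J-2", -1)

# Check the logs as well as the return values.
errors = [r.getMessage() for r in capture.records if r.levelno >= logging.ERROR]
print(errors)
```

Asserting on log content makes supportability a tested property: if the message a support engineer relies on disappears, a check fails before the release does.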

Jonathon Wright:

I think that’s actually really fascinating. I occasionally should slap myself on the wrists and tell myself off for talking about things like ops dev, right? And my view with ops dev is this kind of, well, I really want to see operational staff involved with things like sprint planning, being involved in the end state so they can kind of say, “Okay, well we’ve got to manage this. Will these different platforms work in our own environment? How will we support them? How will we get the information from things like site reliability engineering and stuff like that?”

Now, actually, your background, coming from the initial support side of things, actually is more of a support ops dev support where you’re actually looking at, well, if I find an issue with this in my production and I’m dealing with something like ServiceNow to raise a ticket and then understand the criticality of it and then make it so that where actually it feeds back, I need to have access to the logs. If Prometheus is pulling out information on a container, I need to be able to have access to what’s important from a monitoring perspective, what’s important from a performance perspective. 

It’s been really interesting to see just how many systems have gone down since COVID-19. Yesterday they announced that Ocado is not only scrapping their mobile app, which I was sat there queuing on and they said, “Oh, yeah, but we’ve also got a really fair system which we’re using to actually allow people to wait in a queue,” which we knew they didn’t really do or implement, and then they said “Yeah, the same thing’s happened with the website,” and then they’ve now admitted that actually, technically, the website and the mobile app suck and therefore they’re going to rewrite it and then put it back out.

Now, we know that underlying that is Waitrose, we know that’s John Lewis, we know JLP have done these things at scale before, so why is Ocado different, right? And GoToMeeting, I noticed… [inaudible 00:33:45] three meetings now that I literally can’t get on to GoToMeeting because it’s gone down.

It’s interesting how at times where we need to leverage these kinds of Cloud-based apps, which we know are Cloud and maybe haven’t been architected to scale, and that makes a real challenge because what you’re doing at the moment is that level of complexity like Zoom would do, like if you’re doing [inaudible 00:34:13] to kind of connect into your online calendar. All of these are dependent on other upstream and downstream systems, and if one of those fails, then there’s systemic failure across the entire stack. The ability to quickly do a root cause analysis, pinpoint failure analysis, requires access to what you would use if you were in support. How would you diagnose the problem or the steps you do? Well, I’d go and have a look at the log. I’d go on into the user.

Do you find that insight and that mentality they’ve got has actually helped create something much easier to maintain from a support perspective as well as an operational standpoint?

Joe Triccas:

Yeah, definitely. I’d say building in space for people to literally walk in the shoes of another and be able to empathize with their pain is possibly the most effective way of addressing that gap, so actually building space into the engineering budget so that engineers can go… If you’re in an organization of sort of the mid to large size, focused on software, you’re probably going to have a support function that is your first or second-tier support. I would argue it is a responsibility of development engineers to go and spend some time with those people, to understand the pain that they encounter.

95% of the time you’ll find exactly what you just described. Like there’ll be hidden nuggets of unplanned work that crop up all the time within that function that can just be engineered away if people know about it. Equally, situational awareness is so critical, right? Like having proactive monitoring so that you can actually foresee an issue come in and head it off, or at least do something proactively rather than everyone’s queuing because we actually can’t… 

Yeah, I can’t imagine what the actual technical issue was, but I guarantee it was something silly like they just couldn’t serve enough web sockets or something. 95% of the time it is something minor like that that you just… Unless you’re willing to actually ask the question at day one, you know, we want to build this app. What’s the average usage? Oh, it’s going to be approximately this level based on existing walk-in clients, or whatever, and then what’s the worst-case scenario, which we’re basically now in, it’s that sort of being confident to name that.

It may just well be that someone did and the business just accepted, do you know what, the risk is low given the low probability or frequency of this event, so we’re just consciously not going to engineer quality in that domain. The problem is at this point that really sucks for your reputation, so I think that’s one other element to this is understanding how to communicate risk and, most importantly, risk to your business’s reputation, because so much rests on that in this modern world.

Jonathon Wright:

And the brand damage is not quantifiable. That’s the problem is that part of it is you don’t write, as you’re doing these planning, you look at business value. You can talk about the business value of having a mobile platform, but you don’t really ever sit there with a ticket value of, well, from a brand perspective, if this system goes down for three hours, it’s not only going to cost us X amount of million per hour in loss of revenue, but the brand damage means that we could see maybe a 20, 30% decrease of customer loyalty and retention rate, and that could be something that just ends the company, right?
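
The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration: the function name `outage_cost` and every figure below (revenue lost per hour, annual revenue from existing customers, the retention drop) are hypothetical placeholders, not numbers from the conversation, and a real brand-damage estimate would be far less tidy.

```python
def outage_cost(hours_down, revenue_loss_per_hour,
                annual_customer_revenue, retention_drop):
    """Rough cost of one outage: direct lost revenue plus lost repeat business.

    All inputs are assumptions supplied by the caller; retention_drop is a
    fraction (0.25 means 25% of existing customers do not come back).
    """
    direct_loss = hours_down * revenue_loss_per_hour
    brand_loss = annual_customer_revenue * retention_drop
    return direct_loss, brand_loss, direct_loss + brand_loss


# Hypothetical example: a three-hour outage at 1M per hour, with a 25%
# drop in retention on 200M of yearly revenue from existing customers.
direct, brand, total = outage_cost(
    hours_down=3,
    revenue_loss_per_hour=1_000_000,
    annual_customer_revenue=200_000_000,
    retention_drop=0.25,
)
print(f"direct loss: {direct:,.0f}")
print(f"brand loss:  {brand:,.0f}")
print(f"total:       {total:,.0f}")
```

Even with made-up inputs, putting the brand term on the ticket next to the direct loss makes it visible in planning, which is the gap being described.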

Joe Triccas:

Yeah.

Jonathon Wright:

It’s this kind of day where it’s actually this kind of issues, these edge cases that maybe people did think about from an SRE perspective but they made it low risk, and part of it was maybe they were looking at this kind of fintech, retail tech kind of approach where actually we’ll rely on other people’s APIs, we don’t need sandbox environments, we’ll have a live system, and then they can’t roll back, they’ve got no real big DR strategy there, they’ve not really done any of the chaos engineering aspects. 

Is this the door that opens chaos engineering to 2021? Have we just discovered it?

Joe Triccas:

Well, chaos has turned up, right? Like it’s inevitable, I would argue because some event is going to occur that is… We’re in a globalized world now. We’re sort of existing… We’re still attached to old modes of working. Like we spoke briefly at the beginning about my personal struggle with transitioning to working from home because I’ve just never really needed to do it, except for maybe out-of-hours releases, which were naturally just waiting for your window, bring down the servers, deploy and check stuff. Which is pretty easy to do, but to actually remain focused… 

Yeah, I’d say in terms of chaos engineering, we’re starting now to get to the point of technological reliance where there’s definitely space for more novel sort of chaos engines, whether it’s like some sort of soft AI or machine learning-based tool that you can just say, like, here is a URL, or here is an IP. Introduce chaos through various means.

I know there are some tools within the scope of the sort of Cloud-based solutions. It’ll be very interesting if I could get my hands on something that was a little bit more… basically like a virtual hacker but that can scale. Because this is the challenge, I think, is I’m sure there are white-box tools, actually, that they can start to lean on that, but certainly that would be interesting.

Jonathon Wright:

Well, like when we first talked, what I’m doing today is I’m using Burp Suite, which is a security tool on Kali, which is a security penetration testing platform, and I’m using the spider to transpose a website to get images off it for a customer. So the customer knows I’m on their site, I’m throttling it, so I’ve only got a certain amount of spiders there because I don’t want to DDoS attack it, and I’m extracting those images so we can push them through our TensorFlow to see if it can recognize the images based on what we’ve been doing with identifying different clothing brands and styles.

So, this is a legitimate tool which I’m using, but at the same time, this could be done to the opposite side of things, well… And I know Flood.io, since the COVID-19, have said, “Oh, yeah, free to use our performance platform,” but maybe those are the things. You DDoS a DNS and see what happens. You bring down your Kafka or you bring down a certain amount of nodes on Kube or something, and you kind of go, “Okay, as resilience, it’s able to spin them back up. We’re not losing any information, we’re not losing anything critical,” and I know your background, especially when you did the stuff with Sitekit, this ability to deal with multi-Cloud. So, okay, we’ve got a problem with Azure, we can’t fix it, but we can deploy anywhere. We can put our Kube instance in AWS. We just remove everything and point the DNSs.
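
The experiment described here (take down some nodes, then check that they are spun back up and nothing critical is lost) can be sketched as a toy simulation in Python. This is a minimal sketch under stated assumptions: `ToyCluster` and `run_experiment` are hypothetical names, the cluster is an in-memory stand-in rather than Kubernetes or Kafka, and it does not call any real orchestration API.

```python
import random


class ToyCluster:
    """In-memory stand-in for a cluster; not a real orchestrator."""

    def __init__(self, desired_nodes, replication):
        self.desired_nodes = desired_nodes
        self.replication = replication
        self.nodes = {i: set() for i in range(desired_nodes)}
        self.next_id = desired_nodes

    def write(self, key):
        # Store the key on `replication` distinct nodes.
        for node_id in random.sample(sorted(self.nodes), self.replication):
            self.nodes[node_id].add(key)

    def kill(self, count):
        # Chaos step: remove `count` randomly chosen nodes and their data.
        for node_id in random.sample(sorted(self.nodes), count):
            del self.nodes[node_id]

    def reconcile(self):
        # Recovery step: start empty replacements up to the desired count.
        # (Re-replicating data onto them is deliberately left out.)
        while len(self.nodes) < self.desired_nodes:
            self.nodes[self.next_id] = set()
            self.next_id += 1

    def surviving_keys(self):
        return set().union(*self.nodes.values())


def run_experiment(nodes=5, replication=3, kills=2, key_count=100):
    cluster = ToyCluster(nodes, replication)
    keys = {f"key-{i}" for i in range(key_count)}
    for key in keys:
        cluster.write(key)
    cluster.kill(kills)
    cluster.reconcile()
    return {
        "nodes_restored": len(cluster.nodes) == nodes,
        "keys_lost": len(keys - cluster.surviving_keys()),
    }


result = run_experiment()
print(result)
```

Because each key lives on three distinct nodes and only two are killed, no key can be lost in this setup; raising `kills` to match `replication` is the quickest way to watch the check start failing.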

Do you think this idea of multi-Cloud, and I know people call it mega-Cloud as well, which I don’t know if that’s just because they own a mega drive, but this idea that they’re able to move between Cloud instance, they’re able to move back on-prem if they need to with a reduced set of functionality just so that they can continue to keep the business up.

Joe Triccas:

Absolutely. I think that even the concept, like we get quite wedded to the brand name that we’re sort of paying for access to virtual CPUs, but fundamentally, they are server racks, right? So it is just an offsite server. 

So I’d say from a technical, like the first-principles perspective, there really isn’t much separating the various Cloud. Obviously, they have their own internal capabilities and they can sort of manage different types of traffic, but at the raw CPU level, you’re effectively buying CPU cycles.

So from that perspective, definitely, but [inaudible 00:43:23] you need to explore it because they do have their own requirements in terms of how certain things are configured and deployed. But I do think that’s generally standardizing now, though, which is good.

I’d say that we’re at an interesting crossroads where the sheer level of distributed computing power, like just based on modern smartphones, is already orders of magnitude greater than the fastest supercomputers. I don’t know the exact number, I’ve not sat and done these numbers, so this is just my mind extrapolating from what I’ve seen, but I imagine you don’t have to go far back for the current distributed smartphone network to easily rival the top supercomputers in the world. I’d say five years or so if you’re looking at like the iPhone 11s and the top line Samsung flagship phones, et cetera. If you wired all them up…

The point I’m making is, I’m pretty sure Cloud will be relatively short-lived from a long-term species perspective, in the same sense that if you look at the industrial revolution, all right, it was 100-odd years and we’re just watching the timelines compound in terms of our leaps in technology. This feels like a stopgap to me, to facilitate much more distributed computing.

It’s interesting you mentioned Sitekit, where we were working on a self-sovereign identity platform for a large payments provider, which really was the concept of binding your passport, your driving license, your national insurance number to the face ID scan on your phone, so you no longer need to carry the documents. You can board a plane by unlocking your phone. 

That was really interesting, and it’s happening, and it’s coming, and that’s the circumstance where actually the Cloud computing component, other than some cryptographic transforms, which are just getting more and more efficient as time proceeds, is mostly done on your mobile device, so there’s not even… The level to which that could augment society, compared to its relative Cloud reliance, is actually quite incredible, compared to some of the leading SaaS products that we’re seeing.

I love Cloud. It’s been a huge revolution in terms of the capability it delivers to software engineers, I’d say, the speed at which you can go from having nothing, just an idea, to an implementation running in a different country with your basic hot swaps, your backups, all done just with a click of a button. The way that it’s reduced that, yeah, reduced the latency between an idea and an MVP is awesome. But I would say, looking forwards, I mean I’m talking like a 10-year horizon here, but I’d be really surprised if we’re still building huge data centers that are driving most processing.

Jonathon Wright:

I’ve got to agree with you. One of the predictions I made a couple of years ago was people start bringing things back on-prem. There’s only so much computing power you need, and when I was at Hitachi, HDS made physical kit, right? And that physical tin was UCP, so it was a unified compute platform, which meant that yes, you got 192 cores, physical cores, and maybe 192 gigabytes of RAM, but it was enough to do SAP HANA, it was enough to use for whatever you wanted. You could spin up as many VMs or Dockers or whatever containerization you needed. There’s enough power there. 

Now, the question about giving it to somebody else’s data center is, and the point you’ve just made about the side of things of edge devices, you know, the new iPhone, I think they’re calling it 11 Pro Max, or whatever it is, it’s got six gigs of RAM on it and it’s got quad-core. The Xboxes are pretty much just supercomputers now. Your Tesla outside has X amount of GPUs and CPUs that you could use.

Now I’m not saying move the device [inaudible 00:48:00] to your car, but I’m also not saying why not, because at the end of the day, if I can run Kali on a Raspberry Pi for $40, then I know that my Tesla outside is going to have even more power and more GPU power because it has to do computer vision, it has to recognize where people are stepping out on the road, and it runs on Linux. Is there any reason why I can’t just load up my stack on there, right? And that’s the big question. And then all I need to do is, a bit like Azure do with their big data farms, which are just containers they stick on the back of lorries, is all I need to do is just plug it into the water, or in this case electricity, but electricity, water, ethernet and I’ve got my mobile data center. 

So if edge devices are really getting that powerful, why not? is the big question. And I think this is, you know, we joke, but at the same time this ML on both the Android platform and Apple platform, you know, Windows have been talking about Win ML for a long time, which is 2019 onwards, which is predefined machine learning stuff, which you could go off the easy example with that one, is go into PowerPoint, load PowerPoint, put any image you want, right-click on it, go to Alt Text, it’s already trained to say, “The image you’ve just done is a picture of a kitten sat next to a lake.” That stuff is kind of done to death.

So I do see some really interesting opportunities of, well, how do you make those decisions on architecture? How easy can you switch between it? What are those technical challenges where you’re wanting, like what you’re doing, being able to do 15-minute windows? That’s the kind of criticality now is what kind of IOPS do you need? 

I was talking to somebody last week in France before the quarantine while I was there, and he was saying that he saw this great YouTube video where there’s a guy shouting at a new disk drive and because of the frequency of his voice, it actually made the IOPS drop, so it went slower.

Joe Triccas:

What?

Jonathon Wright:

So it’s a bit like talking to plants. They need oxygen. Are we going to be there wanting to actually have physical [inaudible 00:50:13] so we can shout at it and actually make an impact on some of the testing? 

But it’s been absolutely fantastic chatting to you, and we could literally chat, and I think we should. I think literally come day 25 or something, or 28 days later, we’ll come back and we’ll see you rocking backward and forwards going… All the websites will be gone down, no one will be able to buy food because their apps just didn’t scale in the Cloud. They thought, “Oh, no, it’ll be fine. We’ll use the same architecture, we’ll just stick it in AWS, it’ll be fine. Nobody’ll ever know until COVID-19.” So, anyway.

Joe Triccas:

That would be [crosstalk 00:50:57].

Jonathon Wright:

So, just in kind of close, how’s the best way to get, people who are listening, to get in touch with you, or do you have anything, any links, blogs and stuff that you can share with us, or LinkedIn? What’s the best way to reach out to you?

Joe Triccas:

The best two platforms would likely be Twitter or LinkedIn. You can email me directly, [email protected], why not? So certainly, if people want to get in touch, do feel free.

I will also be cheeky if you don’t mind, and just briefly plug an idea I’m co-founding at this point in time, which I think would be of great interest to you, especially given your exposure to nuclear power plants.

Jonathon Wright:

Bring it on.

Joe Triccas:

It’s a concept called With-U, which is a device and a platform. The device plugs into existing light sockets and provides infrared and visual spectrum, real-time feeds for users of augmented reality and VR devices. So the concept is effectively like a VOIP call, but I’m coining it Presence Over IP, and the idea is effectively you can step into someone’s real-time environment.

So like this podcast, for example, I own [inaudible 00:52:10] and if you open one of these devices installed, I could With-U, which is the sort of term I’m looking to try and coin as this concept, and I would be sat next to you in real-time. I could see your screen, I could see the spiders crawling all over your browser, [inaudible 00:52:27].

Yeah, so I’m actually co-founding that now. We’ve got a prototype that we’re in the process of building with my co-founder, and I just wish I’d done this two months ago so that when this cropped up, everyone just had virtual offices. So I just thought I’d chuck that out there. It’s a very interesting concept and certainly, if anyone’s listening and they find that idea appealing, get in touch.

Jonathon Wright:

Yeah, absolutely. Do you have a website or a Kickstarter page or anything we can promo?

Joe Triccas:

Not going to lie, this is legit stealth mode still. I just thought I’d tell you because it resonates so closely with the discussion about the drones in the nuclear power plant as a use case that I’ve been looking at. We’re in stealth mode to the extent that we want to get a physical prototype built. There are no digital media to consume other than this two-minute mention of it on a podcast. It will be coming, but the key for me is that there’s actually something physical and demonstrable, so I just wanted to plug it at the very end.

Jonathon Wright:

[crosstalk 00:53:30].

Joe Triccas:

And you might see my name floating about.

Jonathon Wright:

I’ll make an intro as well, so when I did my TED talk, the guy before me did a HoloLens TED talk, and it was really interesting because what he did was for the American Football over in the States so the guys could wear a Microsoft HoloLens headset and then do reps. Instead of having everyone on the team training 24 hours a day, throw the ball, let’s see what this formation is, let’s see what happens if someone breaks through there, you could literally do it virtually, so you could have other teammates explain at the same time.

Great demo. I’ll link it to you because he’s a VR, MR guy, and-

Joe Triccas:

[crosstalk 00:54:16].

Jonathon Wright:

… yeah, I think the idea is absolutely fantastic because that coworking aspect of having With Me, as you’re saying, gives you that kind of capability to actually sit down, do some pair programming, look at the same screen, do those interactions that actually we still want, so yeah, fantastic. 

As we’re plugging, my latest web domain that I purchased is COVID.fo. I’m trying to work out what F-O stands for or-

Joe Triccas:

[crosstalk 00:54:46].

Jonathon Wright:

… could potentially be an abbreviation for, but I don’t think it’s a startup company, but I definitely think it’s a cool domain to have. I did previously register Dontassumemygender.com, but that didn’t seem to make as much thing, but Corona.fo definitely might get some traction with the idea of once it’s disappeared, but yeah.

Joe Triccas:

[inaudible 00:55:09] abbreviation of F-O, or for F-O, that succinctly summarizes my attitude towards COVID quite nicely, but…

Jonathon Wright:

Well, free emails for all is what I’m saying, so if you want a free COVID.fo domain-

Joe Triccas:

Oh, okay, yeah.

Jonathon Wright:

… email, I will make sure it redirects to your Gmail for you and anybody else who wants it on the show. So there you go. I’ve just found a use for it. 

Anyway, Joe, it’s been an absolute pleasure. Best of luck with COVID-19, and let’s save the world from making bad decisions about Cloud infrastructure.

Joe Triccas:

Absolutely. It’s been a real pleasure. Thanks for having me.

Jonathon Wright:

Thank you.
