Let’s blame the dev who pressed “Deploy”

@TootSweet@lemmy.world

I do wonder how frequent it is that an individual developer will raise an important issue and be told by management it’s not an issue.

I know of at least one time when that’s happened to me. And other times where it’s just common knowledge that the central bureaucracy is so viscous that there’s no chance of getting such-and-such important thing addressed within the next 15 years. And so no one even bothers to raise the issue.

@aodhsishaj@lemmy.world

Hey man, look, our scrums are supposed to be confidential. Why are you putting me on blast here in public like this?

@aodhsishaj@lemmy.world

Git Blame exists for a reason, and that’s to find the engineer who pushed the bad commit so everyone can work together to fix it.

Blame the Project manager/Middle manager/C-Level exec/Unaware CEO/Greedy Shareholders who allowed for a CI/CD process that doesn’t allow ample time to test and validate changes.

Software needs a union. This shit is getting out of control.

@jordanlund@lemmy.world

Blame the dev who pressed “Deploy” without verifying the config file wasn’t full of 0’s or testing it in Sandbox first.
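
For what it's worth, a check of the kind described here could be as small as the sketch below. It is only an illustration: the names `looks_blank` and `safe_to_deploy` are made up, and nothing in it reflects CrowdStrike's actual pipeline or file format.

```python
from pathlib import Path


def looks_blank(data: bytes) -> bool:
    """Return True if the payload is empty or consists only of zero bytes."""
    return len(data) == 0 or data.count(0) == len(data)


def safe_to_deploy(path: str) -> bool:
    """Hypothetical pre-deploy gate: reject files that are missing or blank."""
    file = Path(path)
    if not file.is_file():
        return False
    # Read the whole file; fine for small config-style payloads.
    return not looks_blank(file.read_bytes())
```

A gate like this only catches the crudest failure (an empty or zero-filled file). It is no substitute for actually loading the file in a sandbox first.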

@aodhsishaj@lemmy.world

That’s not how any of this worked. Also not how working in a large team that develops for thousands of clients works. It wasn’t just one dev that fucked up here.

CrowdStrike Falcon uses a signed boot driver. They don’t want to wait for MS to get around to signing a driver if there’s a zero day they’re trying to patch. So they have an empty driver with null pointers to the meat of a real boot driver. If you fat-finger a reg key, that file, containing only the 9C character, points to another null pointer in a different file, and you end up with a non-bootable system as the whole driver is now empty.

If you don’t understand what I just said here’s some folk that spent good time and effort to explain it.

https://www.youtube.com/watch?v=pCxvyIx922A&t=312s

https://www.youtube.com/watch?v=wAzEJxOo1ts

@Scio@lemmy.world

If capitalism insists on those higher up getting exorbitantly more money than those doing the work, then we have to hold them to the other thing they claim they believe in: that those higher up also deserve all the blame.

It’s a novel concept, I know. Leave the Nobels by the doormat, please.

@aramova@lemmy.world

Wait, are you trying to say that Risk/Reward is an actual thing?

/s (kinda)

@Geyser@lemmy.world

Was there a process in place to prevent the deployment that caused this?

No: blame the higher up

Yes: blame the dev that didn’t follow process

Of course there are other intricacies, like if they did follow a process and perform testing, and this still occurred, but in general…

@aodhsishaj@lemmy.world

How could one dev commit to prod without other devs reviewing the MR? If you’re not protecting your prod branch, that’s a cultural issue. I don’t know where you’ve worked in the past, or where you’re working now, but once it’s N+1 engineers in a code base there need to be code reviews.

@j4k3@lemmy.world

If they didn’t follow a procedure, it is still a culture/management issue that should follow the distribution of wealth 1:1 in the company.

@tabular@lemmy.world

removed by mod

@Blue_Morpho@lemmy.world

"George Kurtz, the CEO of CrowdStrike, used to be a CTO at McAfee, back in 2010 when McAfee had a similar global outage. "

@Life_inst_bad@lemmy.world

Wild theory: could it have been malicious compliance? Maybe the dev got a written notice to do it that way from some incompetent manager.

@AA5B@lemmy.world

While that’s always possible, it’s much more likely that pressures to release quickly and cheaply made someone take a shortcut. It likely happens all the time with no consequences so is “expected” in the name of efficiency, but this time the truck ran over grandma.

@Entropywins@lemmy.world

We have rules against that at my job… literally, if God came down and wrote something out of process, that’ll be a “no, big guy.”

@Cyteseer@lemmy.world

As a counterpoint to this article’s counterpoint: yes, engineers should still be held responsible, as well as management and the systems that support negligent engineering decisions.

They bring up structural engineers and anesthesiologists getting “blame” for a failure, but when catastrophic failures occur, it’s never about blaming a single person; it’s about investigating the root cause of the failure. Software engineers should be held to standards, and the managers above them pressuring unsafe and rapid changes should also be held responsible.

Education for engineers includes classes like ethics, and at least at my school, graduating engineers take oaths to uphold integrity, standards, and obligations to humanity. For a long time, software engineering has been used for integral human and societal tools and systems. If a fuck-up costs human lives, then the entire field needs to be reevaluated and held to that standard and responsibility.

Nine

This is why every junior engineer I’ve mentored is handed a copy of the Sysadmin Code of Ethics on day one, along with a copy of The Practice of System and Network Administration.

We really need a more formal process for having the title of engineer and we really need a guild. LOPSA/USENIX and CWA are from what I can tell the closest to having anything. Because eventually some congress person is going to get visited by the good idea fairy and try to come down on our profession. So it’s up to us to get our house in order before they do.

@EnderMB@lemmy.world

If I’m responsible for the outcome of the business, I want a fair share of the profits of the business.

@TwitchingCheese@lemmy.world

I get that it’s not the point of the article or really an argument being made but this annoys me:

We could blame United or Delta that decided to run EDR software on a machine that was supposed to display flight details at a check-in counter. Sure, it makes sense to run EDR on a mission-critical machine, but on a dumb display of information?

I mean yea that’s like running EDR on your HVAC controllers. Oh no, what’s a hacker going to do, turn off the AC? Try asking Target about that one.

You’ve got displays showing live data and I haven’t seen an army of staff running USB drives to every TV when a flight gets delayed. Those displays have at least some connection into your network, and an unlocked door doesn’t care who it lets in. Sure you can firewall off those machines to only what they need, unless your firewall has a 0-day that lets them bypass it, or the system they pull data from does. Or maybe they just hijack all the displays to show porn for a laugh, or falsified gate and time info to cause chaos for the staff.

Security works in layers because, as clearly shown in this incident, individual systems and people are fallible. “It’s not like I need to secure this” is the attitude that leads to things like our joke of an IoT ecosystem. And to why things like CrowdStrike are even made in the first place.

Let's blame the dev who pressed "Deploy" - Dmitry Kudryavtsev
