When the Slack Channel Gets Archived, but the Service Keeps Running

17 minute read

Vlad A. Ionescu

TL;DR:

We interviewed 100+ companies to understand how they manage engineering governance in the face of dev setup entropy. This post breaks down six patterns we saw — from CI templates to scorecards to DIY platforms — and shows where each one succeeds, fails, or quietly collapses under real-world pressure.

It still runs.

The service, buried deep in the infrastructure, still receives traffic. Still logs errors. Still serves a feature that someone, somewhere, might rely on. No one remembers exactly what it does. The README is out of date. The Slack channel was archived two layoffs ago.

Once, it had a team. A mission. Engineers sat in a pod, under warm fluorescent lights, planning features and joking over bad coffee. They gave the service a clever name. They wrote dashboards. They argued over endpoint naming conventions.

And then time passed.

Reorgs swept through. Teams were renamed, merged, dissolved. People left. Ownership blurred. Documentation drifted.

Today, it’s still deployed — but untended. Not quite dead. Not quite alive. Ask the SRE team, and you’ll hear a sigh: “We think it’s critical, but nobody owns it anymore.”

We heard dozens of stories like this. Different details. Same themes.

A service running in limbo. A checklist ticked, but never read. A vulnerability found, but never fixed.

This is how engineering chaos creeps in. Not through malice — but through entropy. Through drift. Through the thousand tiny decisions that feel right in isolation, but accumulate into a system no one fully understands or controls.

In a previous article, we explored the root causes: microservices, the explosion of tooling, the limits of standardization, and the widening gap between policy and reality.

This time, we’re going deeper into the coping mechanisms — what companies are actually doing to survive. These aren’t best practices. They’re stress responses. Workarounds. Cultural antibodies. Sometimes admirable. Sometimes alarming.

If you’ve ever asked yourself, “How are we still standing?” — this article is for you. In the sections below, we’ve captured the six most common patterns we saw — not as endorsements, but as field notes. What’s working. What’s breaking. And what it really looks like to fight chaos at scale.

1. “Golden Paths” and Common CI/CD Templates

One approach we heard over and over is the push for common CI/CD templates. The idea sounds simple: build a standardized pipeline. Pre-bake security gates, quality checks, compliance steps. Let every team inherit the best practices automatically.

No more chaos. No more gaps. In theory, it’s brilliant. In practice? It’s a lot more complicated.

One engineering leader at a mid-sized tech company told us a story that stuck with us. Their platform team had spent months building a beautiful, standardized CI/CD pipeline — packed with every imaginable check: security scans, compliance validations, code quality verifications, deployment safeguards.

It was everything you could possibly want… except agility.

When one of their app teams needed to move quickly on a proof-of-concept, the standardized CI/CD templates slowed them to a crawl.

The template enforced ten different types of scanning, multiple build artifact signatures, lengthy deployment verification steps — all great for production, but suffocating for a scrappy new project trying to iterate fast.
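To make that weight concrete, here is a minimal sketch of the kind of one-size-fits-all gate runner such a template might call on every build. It is illustrative only: the gate names and commands are assumptions, not the company’s actual pipeline.

```python
# ci_gates.py -- illustrative only: the kind of "golden path" gate runner a
# standardized CI template might invoke on every build. Commands are placeholders.
import subprocess
import sys

# Every gate is mandatory for every project, from core production services to
# throwaway proofs-of-concept.
MANDATORY_GATES = [
    ("sast-scan",            ["echo", "run SAST scanner"]),
    ("dependency-audit",     ["echo", "audit third-party dependencies"]),
    ("container-image-scan", ["echo", "scan container image"]),
    ("license-check",        ["echo", "verify license compliance"]),
    ("unit-tests",           ["echo", "run unit tests"]),
    ("integration-tests",    ["echo", "run integration tests"]),
    ("artifact-signing",     ["echo", "sign build artifacts"]),
    ("deploy-verification",  ["echo", "verify deployment health"]),
]

def main() -> int:
    for name, command in MANDATORY_GATES:
        print(f"[gate] {name}")
        if subprocess.run(command).returncode != 0:
            print(f"[gate] {name} failed; build blocked", file=sys.stderr)
            return 1
    print("[gate] all gates passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Every gate runs for every project, with no notion of a lighter profile for early-stage work.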

The result? That team quietly abandoned the corporate pipeline altogether. They built their own lightweight, custom CI/CD flow on the side, completely outside the platform team’s governance. And because their custom pipelines weren’t compatible with the company’s core Kubernetes cluster, they also had to spin up an entirely separate production environment from scratch.

They weren’t alone, either. Over time, more and more teams opted out, each for slightly different reasons: slowness, inflexibility, incompatible tech stacks.

What was meant to be a unifying standard ended up fracturing the organization even further.

We heard variations of this story at almost every large company we talked to: Rigid templates that worked beautifully in theory but, in reality, couldn’t flex enough to meet the diverse needs of dozens (or hundreds) of different teams.

The key pain points with CI/CD templates:

  • They are incredibly hard to retrofit once chaos is already entrenched.
  • They fit a narrow slice of projects, but real-world engineering is messy.
  • They require almost religious discipline to maintain and evolve — otherwise, templates rot and drift apart just like codebases do.

Templates can be powerful. But when they don’t account for the specific needs of each team, they stop being a solution and become yet another source of resentment and shadow infrastructure.

2. Manual Production Readiness Checklists

Another common strategy we heard about is relying on manual production readiness checklists.

The theory is straightforward: Before every launch (or sometimes on every PR), an engineer walks through a list of best practices.

  • Did we scan for vulnerabilities? ✅
  • Did we run all the tests? ✅
  • Did we validate compliance artifacts? ✅

Check, check, check.

On paper, it feels rigorous. Teams like it because it’s lightweight and flexible. Leaders like it because it feels like accountability without heavy infrastructure. But over time, the cracks start to show.

A pattern we saw a few times, and one that hit especially hard, came from a well-known security vendor. Their process required engineers to manually verify key readiness criteria before every production push. Everything from security posture to data integrity was supposed to be double-checked and signed off. But after enough repetition, the checklist became muscle memory. People stopped really thinking about each item. It became easy, even natural, to just… click through.

And one day, that’s exactly what happened. A very senior engineer, under pressure to push out a fix quickly, ticked off a manual verification step without actually performing the underlying check. No malice. No negligence. Just human imperfection, and an overloaded day.

Unfortunately, that missed step allowed inconsistent data to be shipped to production. This resulted in a massive re-ETL operation to clean up customer data — an expensive, painful computational process — and, worse, dashboard inconsistencies that led to widespread customer complaints.

It wasn’t just a checkbox miss. It was customer pain, operational cost, and real fallout.

We heard variations of this story again and again: Even the best engineers — even the most security-conscious organizations — struggle with the basic truth that humans are imperfect. Especially when they’re busy, tired, or under pressure. Manual checklists work great in theory. But in practice, they degrade over time — becoming rubber stamps that create a false sense of security, until the day it really matters.
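One way to read this failure mode is that the weak link is the manual tick itself, not the checklist’s content. As a minimal sketch, and with the commands, file paths, and thresholds below being assumptions rather than the vendor’s actual process, the same items could be verified by the pipeline instead of signed off by a person:

```python
# readiness_check.py -- illustrative sketch: readiness items verified by the
# pipeline instead of ticked by a human. Commands, paths, and thresholds are assumed.
import json
import subprocess
import sys
from pathlib import Path

def vulnerabilities_clean() -> bool:
    """Run a scanner and fail on any critical finding (placeholder command)."""
    scan = subprocess.run(
        ["echo", '{"critical": 0}'],  # stand-in for a real scanner invocation
        capture_output=True, text=True,
    )
    return json.loads(scan.stdout)["critical"] == 0

def tests_pass() -> bool:
    """Run the test suite (placeholder command)."""
    return subprocess.run(["echo", "running test suite"]).returncode == 0

def compliance_artifacts_present() -> bool:
    """Check that required evidence files exist (placeholder paths)."""
    return all(Path(p).exists() for p in ("sbom.json", "scan-report.json"))

CHECKS = {
    "vulnerability scan": vulnerabilities_clean,
    "test suite": tests_pass,
    "compliance artifacts": compliance_artifacts_present,
}

if __name__ == "__main__":
    failures = [name for name, check in CHECKS.items() if not check()]
    if failures:
        print("release blocked: " + ", ".join(failures), file=sys.stderr)
        sys.exit(1)
    print("all readiness checks passed")
```

A tired engineer can still skip a step, but they can no longer mark it as done without it actually having run.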

3. Scorecards (Typically in IDPs)

In the last few years, a new tool has started gaining real traction: engineering scorecards, often embedded inside Internal Developer Platforms (IDPs) like Backstage, Cortex, OpsLevel, and others.

The promise sounds irresistible: Catalog all your services, assign clear ownership, track engineering maturity automatically. See which teams are thriving and which ones are falling behind — all from a single dashboard.

When we first started hearing about scorecards in our interviews, the excitement was palpable. Finally, a way to bring order to the chaos! Finally, real visibility into engineering practices!

But as more companies rolled out these systems, reality set in.

One engineering leader put it best:

“At first, it sounds like you’re going to have X-ray vision into your entire SDLC. In practice, it’s more like getting a blurry Polaroid.”

Even the scorecard platforms with the richest plugin ecosystems quickly run into a hard truth: Engineering culture is specific. Every company’s engineering process is as unique as a snowflake — shaped by specific needs, regulatory environments, team philosophies, tech stacks, and risk tolerances.

Encoding that nuance into a conventional scorecarding system is impossible. Most scorecards end up measuring only broad, surface-level signals: Does this service have an owner? Is there an on-call rotation? Is the critical vulnerability SLA being met?

Important, yes — but shallow.
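To make “surface-level” concrete, here is roughly what those signals look like when encoded as rules over a service catalog entry. This is a generic sketch with assumed field names, not the rule syntax of Backstage, Cortex, or OpsLevel:

```python
# scorecard.py -- illustrative sketch of typical scorecard rules evaluated
# against a service catalog entry. Field names are assumptions.
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class ServiceEntry:
    name: str
    owner: str | None = None
    oncall_rotation: str | None = None
    open_critical_vulns: list[date] = field(default_factory=list)  # discovery dates

CRITICAL_VULN_SLA = timedelta(days=30)

def score(service: ServiceEntry, today: date) -> dict[str, bool]:
    """The broad signals a scorecard can see: ownership, on-call, vuln SLA."""
    return {
        "has_owner": service.owner is not None,
        "has_oncall_rotation": service.oncall_rotation is not None,
        "critical_vuln_sla_met": all(
            today - found <= CRITICAL_VULN_SLA
            for found in service.open_critical_vulns
        ),
    }

if __name__ == "__main__":
    # A stand-in for the untended service from the intro (name invented);
    # it fails every one of these checks.
    svc = ServiceEntry(name="legacy-billing-sync",
                       open_critical_vulns=[date(2025, 1, 10)])
    print(score(svc, today=date(2025, 3, 1)))
```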

Meanwhile, the deeper layers of engineering quality — the ones buried inside CI/CD pipelines, testing practices, security scans, deployment rigor — are almost invisible.

And worst of all: scorecards operate after the fact. Problems aren’t surfaced early, where developers can fix them easily. They’re discovered weeks or months later, when leadership pulls up a dashboard and asks: “Why are we failing this?” By then, the damage is often done: bad practices have shipped to production, technical debt has compounded, and fixing it means slowing down work that teams thought was already finished.

Scorecards are still a young technology. They’re valuable, especially for driving accountability at the leadership level. But they’re not a silver bullet. They’re a snapshot, not a real-time feedback loop. And when it comes to taming engineering chaos at scale, late, shallow visibility isn’t enough.

4. Relying on Individual Vendor Tools

When it comes to securing the software development lifecycle, most companies aren’t short on tools. If anything, they’re drowning in them.

A moment that crystallized this problem came from a company in a heavily regulated industry — the kind where compliance isn’t optional, and the risk of a breach could mean existential damage.

To stay safe, they invested heavily. Across the organization, they deployed sixteen different security vendors: SAST scanners, dependency auditors, container image validators, SBOM generators, runtime monitors, and more.

At first glance, it sounds like a fortress. Sixteen overlapping layers of defense. But peel back the surface, and the reality was far more chaotic.

A handful of those vendors were owned centrally by a DevSecOps team. The rest? Scattered across the org — bought by individual app teams, integrated however they saw fit, with wildly inconsistent coverage. And when leadership asked the critical questions — “Are we actually covering every service properly?”, “Are new critical vulnerabilities being fixed within 30 days, like our policy says?”, “Which teams are using which vendors, and are they configured correctly?” — the answer was… silence. No one knew.

There was no central dashboard. No inventory of what was installed where. No unified policy enforcement across the patchwork. Just sixteen different vendor consoles, each living in its own world, each only visible to a small subset of teams.

If a new vulnerability popped up in production, finding it wasn’t the problem. It was governing the detection — making sure the right service was protected, making sure the right person was accountable, making sure the vulnerability was remediated on time.

And that governance layer, the most crucial part, simply didn’t exist.

This wasn’t an isolated story. We heard versions of this from multiple companies: An impressive arsenal of specialized tools, but no way to tie them together into a coherent, enforceable, visible system.

Specialized vendors go deep. But without a unifying governance layer, depth alone isn’t enough. The chaos stays.
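For a sense of what that missing layer would have to do, here is a minimal sketch that answers just one of leadership’s questions, whether critical vulnerabilities are fixed within 30 days, across findings pulled from several tools. The normalized shape and field names are assumptions; in practice, each console exports something different:

```python
# vuln_sla.py -- illustrative sketch: normalize findings from several vendor
# tools and answer one governance question. Formats and fields are assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class Finding:
    service: str
    severity: str        # "critical", "high", ...
    found_on: date
    fixed_on: date | None
    source: str          # which vendor console it came from

SLA_DAYS = 30

def sla_breaches(findings: list[Finding], today: date) -> list[Finding]:
    """Critical findings still open (or fixed late) past the 30-day policy."""
    breaches = []
    for f in findings:
        if f.severity != "critical":
            continue
        closed = f.fixed_on or today
        if (closed - f.found_on).days > SLA_DAYS:
            breaches.append(f)
    return breaches

if __name__ == "__main__":
    findings = [
        Finding("payments-api", "critical", date(2025, 1, 5), None, "sast-vendor-a"),
        Finding("payments-api", "high", date(2025, 2, 1), None, "sca-vendor-b"),
        Finding("auth-service", "critical", date(2025, 2, 20), date(2025, 3, 1), "image-scanner-c"),
    ]
    for f in sla_breaches(findings, today=date(2025, 3, 15)):
        print(f"{f.service}: critical finding from {f.source} open past {SLA_DAYS} days")
```

The arithmetic is trivial. The hard part, as these companies found, is getting sixteen separate consoles to feed it complete, correctly attributed data in the first place.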

5. DIY Engineering Standards Tracking

For companies where the cost of failure is measured in billions — like major banks, insurance giants, and critical infrastructure providers — chaos isn’t just inconvenient. It’s an existential threat. So it’s not surprising that some of the most sophisticated organizations we spoke to have chosen a different path: build their own internal system to track and enforce engineering standards across every project.

One story we heard from a leading financial institution was particularly striking. Over a decade ago, they realized that relying on piecemeal vendor tools and manual processes wasn’t going to cut it. Too much was at stake — regulatory scrutiny, customer trust, brand reputation. So they went all-in.

They built a sprawling internal platform (a rough sketch of the per-service record it tracked follows this list):

  • Every service had to register.
  • Every build and deployment was instrumented.
  • SBOMs were tracked.
  • Test coverage was logged.
  • Vulnerabilities were cataloged centrally.
  • Artifacts were signed.
  • Deployment practices, CI configurations, code ownership — everything was collected, analyzed, and surfaced to leadership.
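The heart of such a platform is the per-service record that every build and deployment appends to. The sketch below is a loose approximation of what that record might hold; all field names are assumptions, not the institution’s actual schema:

```python
# service_registry.py -- loose sketch of the kind of per-service record a DIY
# standards-tracking platform might maintain. All field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class ServiceRecord:
    name: str
    owning_team: str
    repo_url: str
    ci_config_path: str
    sbom_uri: str | None = None             # latest SBOM location
    test_coverage_pct: float | None = None  # from the last instrumented build
    open_vulnerabilities: list[str] = field(default_factory=list)
    artifacts_signed: bool = False
    last_deploy_verified: bool = False

# Registration is mandatory: every build reports into the registry, and
# leadership views are generated from these records.
REGISTRY: dict[str, ServiceRecord] = {}

def register(record: ServiceRecord) -> None:
    REGISTRY[record.name] = record

if __name__ == "__main__":
    register(ServiceRecord(
        name="trade-settlement",
        owning_team="clearing-platform",
        repo_url="https://git.example.com/clearing/trade-settlement",
        ci_config_path=".ci/pipeline.yml",
        test_coverage_pct=81.5,
        artifacts_signed=True,
    ))
    print(f"{len(REGISTRY)} service(s) registered")
```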

It was, frankly, awe-inspiring. But then came the caveats.

Building that level of oversight wasn’t just expensive. It took years — more than a decade of iteration, executive sponsorship, and dedicated engineering teams working full-time. Dozens of engineers — not one or two — focused solely on maintaining and evolving the platform.

And even with all that effort, cracks still showed through. One of the leaders we spoke with admitted: “We have a very effective process, but not a very efficient one. Enforcing fixes earlier, during development, is still a huge struggle. Bringing feedback to PRs is where we fall short.”

In other words: They could spot problems. They could pressure teams to fix them. But they couldn’t easily prevent bad practices from landing in the first place, without additional heavy-lift interventions.

This wasn’t a story about failure. It was a story about just how high the bar is — and how, even at the top end, real-time governance across the SDLC remains painfully difficult. DIY systems can work. But they require an astronomical investment — and even then, they leave critical gaps that are still hard to close.

6. Doing Nothing

Not every company has the resources, or the will, to tackle engineering chaos head-on. Sometimes, the strategy is simpler: write down the right policies, install the right tools, and trust that things will work out.

Everything looks good on paper: policies in place, tools deployed, evidence filed.

But behind the scenes, a different picture often emerges. One story that came up again and again involved how companies handle vulnerability reports.

The compliance framework would require that you “catalog known vulnerabilities” for every release. But it wouldn’t necessarily prescribe how those vulnerabilities should be fixed. So teams would dutifully run a scan, generate a long PDF listing all the vulnerabilities, upload it to the compliance evidence system… and move on.

The report was created. The checkbox was ticked. And nobody ever opened that file again.

The tool was technically used. The policy was technically followed. But the actual vulnerabilities — the very issues the policy was meant to protect against — often went unaddressed.

As one security lead half-joked to us: “In our industry, we are world-class at knowing what our vulnerabilities are. We’re just not so great at fixing them.” This isn’t laziness. It’s systemic reality.

When companies have hundreds of microservices, dozens of disconnected tools, and no unified enforcement layer, the scale becomes overwhelming.

And so, little by little, governance turns into compliance theater: Doing just enough to pass an audit. Proving you know about the risks, without proving you’re mitigating them.

The risk isn’t visible day-to-day. Until, of course, the day something slips through — and the gap between “knowing” and “doing” becomes very, very real.

Why None of These Alone Are Enough

Each approach solves part of the problem. No single strategy solves everything. Here are the key needs we compared the six strategies against:

  • Strict enforcement
  • Early feedback (shift-left)
  • Ease of insertion in existing projects
  • Full coverage across teams
  • Depth
  • Leadership visibility
  • Scalability
  • Developer autonomy

CI templates, manual checklists, scorecards in IDPs, individual vendor tools, DIY platforms, and doing nothing each do well on a few of these needs, partially cover others, and fall down on the rest. None covers all of them.

Today, organizations are choosing a combination of strategies, each compensating for the others’ blind spots. Some are barely holding it together. Some are building brilliant internal systems — but at astronomical cost.

Conclusion

If this all sounds familiar — if you’ve seen these patterns, lived these stories, or fought these fires — you’re not alone.

Every strategy we examined solves a slice of the problem. But what’s missing isn’t effort. It’s integration. The real challenge isn’t checklists, scorecards, or even security tools — it’s that they’re disconnected, shallow, or easy to ignore.

The common thread is a lack of end-to-end visibility and enforceability across the software development lifecycle.

In my (biased) opinion, a strong solution would:

  • Instrument the actual places where work happens — CI/CD pipelines and codebases — in a way that is compatible with the diverse nature of today’s tech stacks
  • Collect and centralize rich SDLC metadata across services and teams
  • Enforce engineering standards programmatically, not as dashboards, but as PR checks and release gates
  • Give leadership live visibility into adherence, not just shallow metrics

Not another tool on the side. Not yet another dashboard. Not a spreadsheet. Not a PDF uploaded to a compliance folder. What’s needed is a unified layer that connects policy, practice, and production — and closes the loop before the damage is done, not after.
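To ground the “PR checks and release gates” point, here is a minimal sketch of a policy evaluated against collected SDLC metadata at PR time. The thresholds, field names, and messages are all assumptions, shown only to illustrate the shape of the idea:

```python
# pr_policy_gate.py -- illustrative sketch: engineering standards enforced as a
# PR check over collected SDLC metadata. Fields and thresholds are assumptions.
from dataclasses import dataclass
import sys

@dataclass
class PRMetadata:
    service: str
    has_owner: bool
    sbom_generated: bool
    critical_vulns_open: int
    test_coverage_pct: float
    artifacts_signed: bool

def evaluate(meta: PRMetadata) -> list[str]:
    """Return human-readable policy violations; an empty list means the check passes."""
    violations = []
    if not meta.has_owner:
        violations.append("service has no registered owner")
    if not meta.sbom_generated:
        violations.append("no SBOM generated for this build")
    if meta.critical_vulns_open > 0:
        violations.append(f"{meta.critical_vulns_open} critical vulnerabilities open")
    if meta.test_coverage_pct < 70.0:
        violations.append(f"test coverage {meta.test_coverage_pct:.0f}% below 70% policy")
    if not meta.artifacts_signed:
        violations.append("build artifacts are not signed")
    return violations

if __name__ == "__main__":
    meta = PRMetadata(
        service="legacy-billing-sync",  # invented example service
        has_owner=False,
        sbom_generated=True,
        critical_vulns_open=2,
        test_coverage_pct=54.0,
        artifacts_signed=False,
    )
    problems = evaluate(meta)
    for p in problems:
        print(f"policy violation: {p}", file=sys.stderr)
    sys.exit(1 if problems else 0)
```

The point is placement as much as content: the same policy that would feed a leadership dashboard also blocks the pull request, so the feedback arrives while the change is still cheap to fix.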

Earthly Lunar: Monitoring for your SDLC
Achieve Engineering Excellence with universal SDLC monitoring that works with every tech stack, microservice, and CI pipeline.


Vlad A. Ionescu
Founder of Earthly. Founder of ShiftLeft. Ex Google. Ex VMware. Co-author RabbitMQ Erlang Client.
