Sal Furino
18 hours ago

So happy to have had the opportunity to speak at @devopsdaysnyc yesterday! I gave out 9 tips for folks starting to use SLOs and how to get more value out of the SLOs they already have. #sre #reliability #slos

1 day ago

The difference between what is written here and what is lived by many companies is huge.
Here are a few things that the term SRE means in the outside world in our personal experience:

- Expensive and good at on-call
- Distributed systems consultant
- Platform engineer
- Rebranded ops group member
from *SRE in the Real World: for Xooglers*

Mike Wyer
1 day ago

Up til 2am working on a mysql replication issue.

The combination of factors that led to the replica getting 24 hours behind could probably be summed up as "the system was left in the first configuration that barely worked". Row-based replication choked on batched deletes.

Always good to be reminded that any time you do something unusual, the system can behave in unexpected ways.

And however many prior failure conditions you account for, it can always find one more.

#sre #systems

konst 🇳🇿
1 day ago

Uptime SLOs based on error %s are an easy place to start, but we need to go beyond that as developers.

I used to run a 99.95% uptime SLO system, everything green, we're all proud. But we learned from customers that a <0.01% failure was causing data loss with major real world implications for those affected.

Look at your golden flows, the parts that matter most to your users. You might not get the data from HTTP response codes; new measurements may be needed.

#sre #observability
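
A minimal sketch of the point above, using made-up request counts: an aggregate error-rate SLI can sit comfortably above a 99.95% target while one golden flow fails far more often. The flow names, the counts, and the `success_ratio` helper are hypothetical and not taken from the post.

```python
# Hypothetical per-flow request counts: (total requests, failed requests).
flows = {
    "browse": (9_000_000, 900),
    "checkout": (990_000, 50),
    "export_data": (10_000, 40),  # small golden flow where a failure loses data
}

SLO_TARGET = 0.9995  # the 99.95% target mentioned in the post


def success_ratio(total: int, failed: int) -> float:
    """Fraction of requests that succeeded."""
    return (total - failed) / total


total = sum(t for t, _ in flows.values())
failed = sum(f for _, f in flows.values())
overall = success_ratio(total, failed)

# Evaluate each flow against the same target instead of only the aggregate.
per_flow = {name: success_ratio(t, f) for name, (t, f) in flows.items()}
breaching = [name for name, ratio in per_flow.items() if ratio < SLO_TARGET]

print(f"overall: {overall:.5f} (meets SLO: {overall >= SLO_TARGET})")
for name, ratio in per_flow.items():
    print(f"{name}: {ratio:.5f}")
print("flows below target:", breaching)
```

With these numbers the aggregate stays green, yet the `export_data` flow misses the target even though its failures are well under 0.01% of all traffic.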

Check out my article at APM Digest about the growing complexity of #observability and rising MTTR (mean time to remediation), with interesting stats from the latest #DevOps Pulse survey.
#DevOps #SRE #APM

DevOps Weekly
2 days ago

Introducing a tool for running diagnostic and administrative tools locally on your machine, but with outgoing network connectivity as if they're running in your k8s cluster.


#devops #kubernetes #sre

husimo 🦊
3 days ago

Hello network, @cooptilleuls is looking for a Site Reliability Engineer to expand its three-person #SRE team.

Details are here:

We have three offerings called PtiKube, GroKube and Kustomize (and we think we're hilarious, and we're nice 😄).

As you may have guessed, we do quite a lot of Kube! #k8s

We're a SCOP, a cooperative of 70 employees where decisions are made collectively and democratically :)

(Feel free to boost! <3)

6 days ago

Over my morning coffee, I enjoyed @Ilovelemons #SLOconf talk examining SLO practice through the lens of the book "Seeing Like A State".

I keep seeing that book come up so it's probably time for me to read it... I think there's a lot we can untangle in the greater #SRE discipline using social and political thinking.

Dan's language around using #SLOs as an interface between the company and the team feels really powerful!

Stephen Townshend
1 week ago

Nervous and excited about my big trip to the UK and Singapore, I fly out on Monday.

If you're at #awssummit next week or #srecon APAC the following week, come say hi!

#sre #observability

An mspaint drawing of a map of the world with my flight path.
Sam Rose
1 week ago

#SRE #DevOps poll.

Anomaly detection alerts are…

Stephen Townshend
1 week ago

Unfortunately there is no #SlightReliability episode this week... So as is tradition, I have a haiku for you. #sre

The haiku reads...
You need to see more
In reliability
Than technology
2 weeks ago

Optimizing a software delivery organization holistically is a complex endeavor.

Engaging teams and middle & upper management in #DataDrivenDecisionMaking enables holistic #ContinuousImprovement of a software delivery organization.

More on #InfoQ:

#Metrics #SRE #Agile

2 weeks ago

Event report from the recently concluded #SRE Meetup in Bengaluru

You know, I say in my bio and #intro that I used to work in #tech, but I haven't really actually said very much about what that was aside from comments here and there on other people's toots. (Edit: oh, and boosts are appreciated but not requested of this.)

So, here's a brief summary: I worked on two things primarily in my former career. The first was observability (more traditionally known as monitoring) systems, and the second was realtime stream processing systems. In both cases the systems were BIG. Like, big. No, bigger.

#observability systems processed hundreds of millions of datapoints/second, and the stream processing systems did terabytes/second of data. They were both globally distributed with the core backends running on thousands of machines collecting data from fleets of millions of machines. Because of the scale of them, I used to joke that I didn't count in anything smaller than a petabyte, which was honestly more true than joke.

I did a lot of different things while working on them, from core engineering (I'm even academically published despite never having attended college, which is quite a point of pride for me even years later), to product management for the user interfaces for them, to #SRE type work.

As I find ways to talk about what I did more obliquely, I hope to muse on this stuff more because I still find it really cool to think about even though I've left the industry to make porn instead.

#StreamProcessing #SystemsEngineering

2 weeks ago

We use #Prometheus Remote Write every day so we wrote a post on it

#SRE #Observability

So apparently all I needed to get Incident Management product folks to pay attention to me is to become a customer of one instead of an employee of one. Who knew?

In related news, I just got a vendor signed that took me 3 months of difficult maneuvering, research, creative presentation work, and intense collaboration with others. Thanks to all who helped me, you are also helping me soak in a victory I so desperately needed.

I came here to kick ass and chew gum and tbh I don't like gum.

2 weeks ago

[Job] We're hiring a Site Reliability Engineer (#SRE) on a permanent contract (CDI). We have offices in Lille, Paris, Nantes, Lyon, Oujda and Tunis, but you can also join us remotely.

Job details, salary grid and salary simulator on this page ⬇️

Zon Orti
2 weeks ago

You have StackOverflow,
I have SlackOverflow

#devops #slack #sre

DevOps Weekly
2 weeks ago

Should We Run Databases In Kubernetes? CloudNativePG PostgreSQL


#devops #kubernetes #sre

Lily Cohen
2 weeks ago

Have been busting my ass working on this for the last ~8 months and we're finally live!

If it's broken it ain't me
:kubernetes: :Blobhaj_Innocent:

#Kubernetes #SRE #Max #HBOMax #HBO

Screenshot of an iPhone screen with the ‘Max’, the ‘HBO Max’ replacement, circled in red

@jrlpy Hey at least my piece of it still working great! :kubernetes: 😅 #SRE

Mathieu Poussin
2 weeks ago

Any recommendation on a service to host npm, nuget, pypi, rubygems and rpm private packages ? Something like Nexus or Artifactory but managed.
We found but checking for other options.
#developers #dotnet #nodejs #ruby #python #sre #linux

Matthew Broberg
2 weeks ago

What's the best definition of uptime or availability?

I'm trying to crowdsource others' thoughts in the space and looking for more ideas.

#Reliability #Update #Availability #SRE #PlatformEngineering

Mr Whiskers
2 weeks ago

Just read an insightful article exploring the evolution of software and hardware strategies over the past two decades! It delves into the shift from scaling out to scaling up, highlighting the benefits of leveraging single-node architectures. If you're interested in simplicity, performance, and cost-effectiveness, this is a must-read. Kudos to the author for shedding light on this fascinating topic! #technology #scalingup #sre

2 weeks ago

Emergency procedures used in incident response help stabilize the system in a degraded state. When leveraged properly, they result in faster incident response and become a foundation for further resiliency improvements in your system.

It's certainly more fun to anticipate the ways that the system can fail and recover than inventing measures on the spot.


Lily Cohen
2 weeks ago

FML. Wish me luck! 😅

#HugOps #SRE #HBOMax #Max

Screenshot of push notification that says:
HBO Max becomes Max tomorrow.
Tap to pre-order the Max app and be the first to know when it's available for download.
Adriana Villela
3 weeks ago

Had a blast speaking in my hometown at #KubeHuddle Toronto this week!

#torontoTech #devOps #sre #cloudNative #cncfAmbassadors

Mark Dorset
3 weeks ago

I had Covid-19 a couple of months ago for the first time, and while in my recovery state, I started watching Grey's Anatomy, which my friends have had a lot of fun teasing me about. I'm still going through it now (there are a lot of seasons), and it only just struck me this morning why I might still be guilty-pleasure watching it... The hospital is a lot like IT engineering groups. There are those that show more empathy for the patients (users) than others. There are those that are almost single-mindedly focussed on the deep technical challenges (jargon, processes). There are those that take risks and are celebrated when they succeed. And, there's the need for being on-call and responding with calm in emergency situations. The list goes on... I guess I'm enjoying it as a parallel to the work that I've done for many years and the types of engineers I saw. (Note that I'm not comparing saving lives through tricky brain surgery as equal to ensuring that mail servers are running smoothly). It makes me ponder what each industry could learn from the other. One is that there appears to be a systemic and formalised path for juniors (interns) to be taught and grown to higher skill levels. I've done a lot of mentoring in my IT career, but I don't think it's nearly as common as it should be. #medical #engineering #sre #greysanatomy

Just changed all the constraints in my #sql schema to ON DELETE CASCADE and now I need to hand in my #SRE badge and bury my head in shame.

Stephen Townshend
3 weeks ago

If we bury our heads in the sand of technology we leave ourselves at the mercy of the wolves. #SRE is as much about changing people and organisations as it is about technology. If we don't measure these other things, how do we know if we're succeeding?

An mspaint picture of someone with their head buried in sand, while a pack of wolves stands around them about to eat them.
3 weeks ago


Sageable #Innovation Brief: @PagerDuty #AIOPs is a Powerful New ‘System of Action’ for #DevOps & #SRE (login req'd)

Not a subscriber? Contact @Sageable for a **FREE TRIAL** and to learn about flexible subscription options!

3 weeks ago

I'm a little triggered by the word "emergent" wrt large distributed systems. Indulge me.

"Oh, this system is emergent" just implies "whelp, something is weird and no point investigating BECAUSE GHOSTS"

In my experience, it is never ghosts. Your system is emergent because your predecessor decided to implement split-horizon DNS in 2003. You will discover this after a long, painful journey in which you *will* contemplate the meaning of sentience, just not in the way you initially thought.


SREcon23 EMEA's deadline is getting closer! Please submit your talks and make this an awesome conference! #srecon #sre

Stephen Townshend
3 weeks ago

Struggling to get #SRE traction within engineering teams? Gwen Berry and I share our "reliability benchmarking" approach to start the SRE conversation as part of #SLOconf here:

Stephen Townshend
3 weeks ago

Do you like the mspaint style artwork of Slight Reliability? Now you can peruse it at your leisure on Instagram:

(Trying something new... I will continue to upload new artwork from social posts, events, and the podcast + take photos and videos at events to add here) #sre #observability

mspaint drawing of Stephen sitting in a chair waving
Open Source JobHub
3 weeks ago

Are you looking for open source talent? Check out our quick start guide and get your jobs posted in minutes on #OSJobHub #hiring #OpenSource #tech #jobs #career #developer #Linux #FOSS #SRE #engineer #developer

(illustration of different people in cogs) Looking for open source talent?

If you don't have anything better to do (and even if you do!) Check out #SLOconf23 because it's awesome and I have a talk in it!

It's free, it's cool, it's got all the talks available on YouTube so you don't even have to shitpost in slack (although I'd love the engagement tbh)

My talk is on Motivating SLOs Mathematically and it is DERANGED in the best way possible

#slo #sre #sloconf

Chris Ayers
3 weeks ago

Hi Friends! Like #DevOps, #SRE, or #PlatformEngineering ? Want to share with the community something you've learned? Submit to our local conference, #DevopsDays #TampaBay happening on September 21st at the #Tampa Riverwalk at Armature Works! #cfp

konst 🇳🇿
3 weeks ago

One of my favourite reminders, as much as I struggle to apply it, is:

"That person is bound to do that, you might as well resent a fig tree for secreting juice." – Marcus Aurelius.

Also an old #SRE saying.

4 weeks ago

so is zabbix the go to opensource system monitoring tool now? or is it observium?

#sysadmin #sre #zabbix #observium #opensource #monitoring

Lily Cohen
4 weeks ago

Brand new server, new #introduction!

Hi I'm Lily, a proud #trans #lesbian from #denver, a #mom of a 12 year old, and am #engayged to the love of my life.

This is the :calckey: alt for

I'm a #neurodivergent #WomanInTech working as an #SRE for Warner Bros Discovery, the creator of #FediHost (the platform hosting this and other servers), admin of :mastodon:, and the newly launched :calckey:

For fun I like to #snowboard, #longboard, go #camping, play #videogames, hang out at the lesbian bar, play #guitar, and watch #tiktok in bed w/ my partner.

Alex Papageorgiou
4 weeks ago

How did I miss this? Legit one of the best presentations I've seen in my life: #sre #srecon

Open Source JobHub
1 month ago

Fighting for a healthy Internet for 20+ years, @mozilla has open positions for software and security engineers, marketing managers, SREs, and more. Check out the jobs now on #OSJobHub #jobs #career #engineer #software #security #marketing #finance #SRE #DataScientist

white Mozilla logo on a black background
1 month ago

Abby Bangser shares her journey from a QA to a platform engineer in this episode. We discuss the similarities between testing and infrastructure-related areas like site reliability engineering. In addition, Abby explains observability and shares some sound advice regarding implementing it.
@abangser #SRE #Observability

Open Source JobHub
1 month ago

Think you're not an open source professional? We have technical and non-technical positions at top open source companies. Browse jobs now on #OSJobHub #jobs #career #FOSS #engineer #SoftwareDeveloper #sales #marketing #SRE #DevOps #sysadmin

(illustration of people in different cogs) Open Source is more than tech! Find jobs in sales, marketing, customer support, and more! Open Source JobHub
DevOps Weekly
1 month ago

Made a crossword on common DevOps terminologies..


#aws #devops #kubernetes #sre

1 month ago

when I was a practicing #SRE spending a couple hours spelunking in production observability data was my fave way to expand my system knowledge

Give my Production Scavenger Hunt a try if you've recently adopted a new #o11y tool, joined a new team, or feel the call to explore this afternoon


1 month ago

OH: "[...] It is a micro-service architecture with multiple services"...

🤔 Well, sir, that sounds like a tautology! #SoftwareEngineering #SRE

Garrit 🧳🇯🇵
1 month ago

We're currently building up our #DevOps / #SRE / #Operations team.

Are you up for #AWS, #Kubernetes, #Terraform, #PlatformEngineering or #CICD? Whether Junior or Señor, feel free to get in touch with me!


Open Source JobHub
1 month ago

We're celebrating 1 year of Open Source JobHub! Let us help you cut through the noise and find the job that's right for you. Check out open positions on #OSJobHub #jobs #hiring #career #OpenSource #developer #engineer #SRE #Linux #FOSS #sales #marketing

Celebrating One Year of Open Source JobHub! Find Your Place in the Open Source Ecosystem
Reid D. M.
1 month ago

Alright for config management what've we got that people like besides:
- Puppet
- Salt
- Chef
- Nix/Guix
- Ansible (barely counts)
- CloudInit (barely counts)
- /bin/sh (blessed? cursed? you decide)

#homelab #sre #askmastodon

Mark Carter
1 month ago

👍🏻 New Relic Grok - The first GenAI observability assistant. Get deep insights from heaps of telemetry data using natural language via a chat interface #machinelearning #sre

Kelly Shortridge
1 month ago

I loved the opportunity to share my wisdom with such a thoughtful, engaged audience (and to meet likeminded nerds like @hazelweakly!)

I sincerely believe #SRE and #platform engineering teams are critical for sustaining software #resilience (and #security).

PS if you vibe with my talk and this thread, read my new book:

Kelly Shortridge
1 month ago

The video of my #SREcon talk is live:

It's all about how SREs can align their mental models of a system with reality to sustain software #resilience -- because SREs are a critical mechanism of adaptation in our systems.

If you're an #SRE you're probably not like, waking up thinking, "How will I be the mechanism of adaptation today?" so I wanted to provide some scaffolding around the concept in the talk.

This will be a 🧵of five key takeaways:

Charlie Owen
1 month ago

Want to work alongside me? (you fool if you do). We have an opening for a Principal Architect on our Engineering Enablement leadership team.

If you've got a strategic and very human eye for platform services and infrastructure then please apply!

#Jobs #PlatformEngineering #SRE #DevOps #PeopleFirst #Cloud

Prashanth Mohan
1 month ago

How do large organizations that use public #cloud services (like #aws and #gcp) optimise their costs? Are there off the shelf tools for cost optimisation and cost analysis? I imagine when people reference #finops, it is referencing a #sre function of rearchitecting and performance vs reliability tradeoffs, etc.

Markus Eisele
1 month ago

Why Cloud Zombies Are Destroying the Planet and How You Can Stop Them - Holly Cummins (

2 months ago

It's easier to fix prod than it is to fix trust.

Always remember that.

#SRE #DevOps #FediHire #Tech

Open Source JobHub
2 months ago

Looking to find your place in the open source ecosystem? Check out all jobs in #SoftwareDevelopment #sales #security #marketing #DevOps #LinuxDevelopment and more on #OSJobHub #jobs #career #OpenSource #SRE #cloud #engineer

(image of rocket taking off from a laptop) Looking for your place in open source? Set up job alerts and get started today on Open Source JobHub

@GossiTheDog #sre

It's not data exfiltration, it's an unscheduled offsite backup.

I'm speaking at SLOconf 2023! It'll be May 15-18th, and the talk is titled "Motivating SLOs Mathematically"

It's going to sound more like an unhinged late night rant than a carefully collected presentation, but I hope it gets people thinking! ❤️

"Have you ever wondered if there's something behind the experiential knowledge that we hold as best practices?" 👀

Also, big shout-out to @ahidalgosre for continually pestering me until I agreed to give a talk :ablobfoxbongo:

#sloconf #slo #sre

A banner picture for SLO conf that shows "speakers: hazel weakly" with the talk title being "Motivating SLOs Mathematically"

The conference will be May 15-18th

OH: (on tickets) Specifically my framework is WTEF. If you come to me or someone on my team for help I need to hear what you Want, what you Tried, what you Expected, and what you Found.

#SRE #DevOps

Reid D. M.
2 months ago

TAW: #sre

Emperor Palpatine photoshop captioned "now witness the power of this fully operational bash script"
Reid D. M.
2 months ago

for id in $(curl $API/list); do curl $API/get/$id; done | jq '.' #sre #taw

Hello World!

#Introduction #Introductions

I'm an #SRE at a mid-sized #BayArea tech company, looking for resources, advice, and support on unionization drives.

This is an anonymous account, for now, because I am concerned about illegal and unethical reprisals from the leadership of my organization if it gets out that I am involved in unionization before we're able to get the NLRB involved.

Expect me to be talking about the tech industry's technical and social barriers to organization.

A few years ago, a young help desk admin flew across the country for a final #interview with me for a (remote) Sysadmin role. He was nervous to be interviewing for his dream job.

The guy next to him on the hop to #Boise was an engineer at some #Idaho #Tech company, when he learned about the interview he spent the whole damn flight doing interview prep with him.

I wish I knew who that engineer was so I could thank him, kid nailed his interview. And I recently promoted him to #SRE.

Roy Greenhilt
2 months ago

Okay #linux #sre and #security nerds. I have an IP on my network. It's showing up a lot in pi-hole as a very active device.

Problem is, I can't identify it.

nmap -O sez "Too many fingerprints match this host to give specific OS details"

nmap alone says no ports are open.

But I can ping it.

MAC address says there's no vendor id for it (it starts with ae:1e:79)

So... what is this thing?
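
One observation that a few lines of code can check: in the first octet of a MAC address, the second-least-significant bit marks a locally administered address rather than a vendor-assigned one, which would be consistent with a prefix that has no vendor ID. The sketch below tests the `ae:1e:79` prefix from the post. The helper name is hypothetical, and the result says nothing about which device this actually is.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the first octet has the locally administered (U/L) bit set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


prefix = "ae:1e:79"  # prefix quoted in the post
print(prefix, "locally administered:", is_locally_administered(prefix))
```

Locally administered addresses are often randomized or software-generated (private Wi-Fi addresses on phones, virtual interfaces, and similar), so checking which devices on the network have MAC randomization enabled may be a reasonable next step.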

dan slimmon
2 months ago

"Eventually this customer has had enough. They leave. This represents both a sizable blow to revenue and a scathing indictment of your product’s reliability at scale. But, on the bright side, both MTTR and MTBF benefit enormously! That’ll look great on the quarterly slide deck." (~700w) #sre #devops #incidentresponse #postmortems

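The arithmetic behind the quote above can be sketched with made-up numbers: if one customer's workload accounts for the longest and most frequent incidents, dropping those incidents from the log improves both averages without any reliability work. The incident data, the window length, and the helper names are hypothetical, and MTBF is approximated here as window length divided by incident count.

```python
# Hypothetical incident log for a 90-day window: (customer, minutes to repair).
WINDOW_HOURS = 90 * 24

incidents = [
    ("big_customer", 240),
    ("big_customer", 180),
    ("big_customer", 300),
    ("other", 30),
    ("other", 60),
]


def mttr_minutes(log):
    """Mean time to repair: average incident duration in minutes."""
    return sum(minutes for _, minutes in log) / len(log)


def mtbf_hours(log):
    """Mean time between failures, approximated as window length / incident count."""
    return WINDOW_HOURS / len(log)


# The big customer leaves, so their incidents no longer occur.
after_churn = [entry for entry in incidents if entry[0] != "big_customer"]

print(f"before: MTTR={mttr_minutes(incidents):.0f} min, MTBF={mtbf_hours(incidents):.0f} h")
print(f"after:  MTTR={mttr_minutes(after_churn):.0f} min, MTBF={mtbf_hours(after_churn):.0f} h")
```

Both metrics look better after the churn, which is the point of the quote: the averages reward losing the customer who stressed the system most.
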
2 months ago

@timo are there any good alternatives? We use OpsGenie but suspect it is pretty much the same thing. Any disruptive new players in the market?

#alerting #monitoring #oncall #sre

Mark Carter
3 months ago

🤔 #Google plans to reduce its ratio of site reliability engineers to #developers by using automation. There is currently one site reliability engineer for every 10 software engineers, and the target would bring this to one #SRE for every 20 software engineers. Some Googlers are interpreting this as a sign of impending #layoffs to meet that goal, or it could mean that Google will hire half as many site reliability engineers as before, the source said.

konst 🇳🇿
3 months ago

I've been on a slow-moving mission against reliance on manual runbooks, in favour of automation.

Yes people will skip steps when doing your manual runbook.

Another drawback is that your manual runbooks have no regression test against them. The steps will absolutely break.

(And no, a wiki stringing 6 different scripts together in the right order and asking people to run them on the right hosts is not automation.)

#DevOps #SRE

3 months ago

I should make more slide decks. Enjoying a wee trip down memory lane with some past team members.


#engineering #leadership #devops #sre #webdev #latency

3 months ago

Did you know that server actions taken by #Wikimedia sysadmins are logged on-wiki, with history going back to 2004?

And since 2019 have been broadcast to the fediverse via @wikimedia_sal? (which moved instances today)

Details about how it works technically are at - it's written and maintained by @bd808

#devops #sre

increasingly keen on writing small go programs instead of bash scripts - at least for anything above 10 lines

still easy to deploy anywhere since you get a static binary, you get type checking thrown in, and you don't have to deal with bash's bullshit

#go #golang #bash #devops #sre

3 months ago

#observability related question for anyone who happens to build or run a tracing backend: what happens if I send duplicate spans?

#DevOps #SRE #OpenTelemetry

Stephen Townshend
3 months ago

In this article I explain the difference between #metrics, #logs, #traces, and #profiles as simply as possible (using the metaphor of a café). I also discuss the relative strengths and weaknesses of each telemetry type. #observability #sre

A graphic of a keyboard where each key is one of the signal types.