Diffstat (limited to 'src/catastrophic-failure.thrust')
-rw-r--r-- | src/catastrophic-failure.thrust | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/src/catastrophic-failure.thrust b/src/catastrophic-failure.thrust
new file mode 100644
index 0000000..e2a7709
--- /dev/null
+++ b/src/catastrophic-failure.thrust
@@ -0,0 +1,51 @@
+---
+title: Catastrophic Failure
+subtitle: your system is larger than you think
+date: 2020-07-01
+---
+{% extends 'templates/base.html' %}
+{% block body %}
+  <nav>
+    <a href='/'>> index</a>
+  </nav>
+  <header>
+    <h1>{{ title }}</h1>
+    <p>{{ subtitle }}</p>
+  </header>
+  <article>
+{% markdown %}
+In a past life, I was a software verification researcher, and attended the [NASA Formal Methods](https://shemesh.larc.nasa.gov/NFM/) conference in 2014.
+The opening speaker at NFM gave a talk with anecdotes about various kinds of failure, from "an implicit double-to-float cast wasted an entire mission" to _catastrophic failure_, in which someone dies.
+Catastrophic failure at NASA is big and obvious and makes the news: if the rocket goes wrong then everyone onboard dies in a big fireball a mile above the ground.
+
+They then moved on to work they had done for the US Postal Service, who had been experiencing thefts by counter staff, and wanted a new point-of-sale system to combat this.
+This involved finding a set of constraints for operating the cash register such that staff could do their job while being unable to take anything extra.
+The speaker laid out the constraints, and offered a $100 Postal Order to the person who could find the Catastrophic Failure.
+
+> _audience:_ could it become locked out and inoperable?<br>
+> _speaker:_ no.<br>
+> _audience:_ could they take money by doing …?<br>
+> _speaker:_ no.<br>
+
+This went on for a while, as we tried to find deadlocks or vulnerabilities in the system.
+Eventually,
+
+> _audience:_ what if someone came in with a gun and demanded the money?<br>
+> _speaker:_ bingo.<br>
+
+USPS decided that stemming theft was not worth risking the death of an employee.
+
+What this story tells us is that software has consequences.
+It's easy to look at a missile guidance system or High Frequency Trading and say "that's unethical!", but far more mundane software performing far more mundane tasks can also have dangerous or even lethal failure modes.
+
+For example, banks are notoriously bad at updating names, and deadnames can resurface at inopportune moments that risk outing the user to housemates.
+Parental spyware will out a kid to their parents, risking homelessness or suicide.
+
+As engineers we must keep the _whole_ system in mind, including its users and their wider lives and situations.
+We must respond to our products' worst failure modes, no matter how unlikely we believe them to be.
+You cannot roll back a corpse.
+
+And if that means a product or feature does not launch, then so be it.
+{% endmarkdown %}
+  </article>
+{% endblock %}