El on Flickr.
It’s been a few years since I’ve been part of Instrument, but I couldn’t be prouder of my friends and colleagues at the studio. Some phenomenal digital craft handmade in Portland, Oregon.
We get to craft a lot of nice UX here at Instrument, but it’s difficult to show, since a lot of it is touchscreen, iPad, or interactive kiosk work out in the real world. So we made this reel to feature some of our favorite UX projects from the past few years.
Featured clients: Google, Umpqua Bank, Nike, Xbox, and more.
After the neologism “NoOps” made its way around the web to much ridicule this week, Lucas at AppFog took a stab at reclaiming the term following its rejection by developers and operations teams alike. I originally posted this as a comment on a link aggregator, but the original post has been deleted so I’ve moved it here to give it a home. The original comment follows:
It is unfortunate that many young tech companies, in an effort to differentiate themselves, work to invent a new term and cloak it with the appearance of a movement or industry trend. It’s further unfortunate when those to whom this term is being marketed recoil with laughter and amusement at its implications, forcing its purveyors to double down and attempt to re-assert control over the meaning of a term which may please analysts and folks with Twitter accounts who fashion themselves “thought leaders,” but whose roots and meaning are firmly in marketing rather than industry and art.
So let’s talk about the marketing term “NoOps.”
In AppFog’s attempt to assert a meaning over the marketing term, we’re told that it means “developers can code and let a service deploy, manage and scale their code.” I have yet to meet a single company facing a sizable technical challenge whose performance and availability needs could be met by a straitjacket PaaS with an autoscale button. I’m not saying that many smaller companies don’t use such services successfully – that’s very much true. But let’s be clear: the dream of push-button autoscaling while letting “somebody else” handle deployment, monitoring, instrumentation, and anything that may go wrong in the middle of the night is a marketing dream. As engineers, we have a business need and an emotional need to own our availability. Placing the sum total of your operations into a PaaS provider’s hands, biting down hard on that marketing dream of NoOps, and throwing the pager out the window doesn’t mean you have nothing to worry about. It just means that you don’t care, and can do nothing about it.
But it’s not enough to stop there. Instead, the author sees fit to posit his / her marketing term as the continuation of a history in the evolution of web operations, and proclaim that the term is one that “traditional operations” personnel revile. If “NoOps” is a success of any sort, it’s a marketing win, not a technical one – and certainly not an operational one. If you think you can place every single one of your company’s eggs in the basket of PaaS providers and never worry again, save for twiddling the auto-scale dial, after which everything will be fine, you’re only fooling yourself.
Who will migrate data from an oversubscribed Postgres shard onto two new shards overnight, partitioned by account ID? Who will enable dual-writes, run a migration, then cut reads over as we move into Riak? Who will notice a spike in await on the Kafka RAID, recognize the week-over-week trend pointing to your team running out of IOPS, and order and rack a set of new boxes with SSDs before it’s too late? Who’s watching the switches, and keeping track of which racks have GigE and which have 10GigE uplinks to the next rack, to avoid oversubscribing the network?
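For the curious, the dual-write cutover mentioned above is worth making concrete. Here’s a minimal sketch of the pattern – writes land in both the old and new stores while a backfill copies history, and reads only cut over once the new store is complete. The dict-backed stores and method names are illustrative stand-ins, not any real Postgres or Riak client API:

```python
class DualWriteStore:
    """Writes land in both stores; reads follow a cutover flag."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.reads_cut_over = False

    def put(self, key, value):
        # Dual-write: keep both stores in sync during the migration window.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # Reads stay on the old store until the backfill completes.
        return self.new[key] if self.reads_cut_over else self.old[key]

    def backfill(self):
        # One-off migration of rows written before dual-writes began.
        # setdefault avoids clobbering rows the dual-writes already landed.
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def cut_over_reads(self):
        self.reads_cut_over = True


old, new = {"a": 1}, {}
store = DualWriteStore(old, new)
store.put("b", 2)        # lands in both stores
store.backfill()         # copies the pre-existing "a" row
store.cut_over_reads()   # reads now served from the new store
assert store.get("a") == 1 and store.get("b") == 2
```

Someone still has to decide when the backfill is verifiably complete, what to do about writes that fail in only one store, and when it’s safe to tear the old store down – which is rather the point.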
It’s rare in our industry to see a promise so removed from reality. Indeed, if NoOps is a movement at all, it’s powered only by the dream of not having to do one’s job, which is to ensure that a company can deliver on its business value. Who among us isn’t tempted by such a promise?
I don’t often make it to conferences, but occasionally submit a talk or two on ideas that have guided (and emerged from) my thought, research, and work over the past year. Here’s the second for 2012:
Faced with unprecedented growth and equally demanding calls for reliability and predictability, we as engineers find ourselves called to develop stable distributed applications with solid scalability characteristics and seamless failure modes – and to get them into production by yesterday. While some applications can be designed as stateless, shared-nothing systems, others (such as databases, caches, stream processing engines, and other stateful systems) require predictable computation and a more complex distribution story. This session provides an overview of popular distributed application design strategies (Dynamo, master / slave, and centrally-coordinated but self-organizing systems), load balancing techniques, warm handoff and rebalancing, and clean handling of failures.
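As a taste of the centrally-coordinated, self-organizing style, here’s a toy sketch of atomic work claiming – the primitive that Ordasity-style systems build on ZooKeeper’s ephemeral znodes. The in-memory `Coordinator` below is a hypothetical stand-in for a real coordination service, not its API:

```python
import threading


class Coordinator:
    """First-claimant-wins check-and-set, like ephemeral znode creation."""

    def __init__(self):
        self._claims = {}
        self._lock = threading.Lock()

    def claim(self, work_unit, node):
        with self._lock:
            # setdefault is the atomic check-and-set: the first claimant
            # wins the work unit; everyone else backs off.
            return self._claims.setdefault(work_unit, node) == node

    def release(self, work_unit, node):
        with self._lock:
            # Only the owner may release (a real service does this for you
            # when the owner's session dies).
            if self._claims.get(work_unit) == node:
                del self._claims[work_unit]


coord = Coordinator()
assert coord.claim("shard-1", "node-a")      # node-a wins shard-1
assert not coord.claim("shard-1", "node-b")  # node-b backs off
coord.release("shard-1", "node-a")
assert coord.claim("shard-1", "node-b")      # freed unit is claimable again
```

With a real coordination service, a node’s claims evaporate when its session expires, which is what lets the surviving nodes notice a failure and rebalance.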
Fortunately, the past year has seen a wealth of new and maturing work in the field, with frameworks such as Riak Core / Akka, coordination systems like ZooKeeper, and higher-level libraries such as LinkedIn’s Norbert and Boundary’s Ordasity and Scalang – all of which help contain the complexity of distributed systems design, reducing the implementation of many common patterns to a few dozen lines of code in one’s language of choice. Appropriate selection among frameworks such as these can aid agile implementation of many systems while avoiding the NIH trap — or at least illuminating the dangers along the way.
This session offers a deep dive into the tradeoffs of different algorithms and techniques for load balancing, methods of containing cascading failures, and designing for resilience. While distributed architectures can introduce a surprising amount of indeterminism if implemented without care, approaches such as central coordination, atomic broadcast, and consistent hashing / token-based partitioning allow engineers to make continuous assertions about the condition and behavior of a system at runtime, ensuring its health and continued operation.
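The consistent hashing / token-based partitioning mentioned above can be sketched in a few lines. This is a generic illustration of the technique – the node names and token count are arbitrary, and no particular system’s partitioning scheme is implied:

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes ("tokens") per member."""

    def __init__(self, nodes, tokens_per_node=64):
        # Each node owns many tokens, smoothing the key distribution.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first token at or past the key's hash,
        # wrapping around the ring.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in map(str, range(1000))}

# Adding a node moves only a fraction of the keys (roughly 1/N) -- the
# property that makes rebalancing and warm handoff tractable.
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = sum(before[k] != bigger.node_for(k) for k in before)
assert 0 < moved < 500
```

Because placement is a pure function of the key and the ring, every node can make the same assertion about who owns what without consulting anyone – which is exactly the kind of runtime invariant the session is about.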
Of course, nothing matters more than the final operational story proven out in production: ensuring that one’s applications are running, available, and delivering on business value 24/7 – while hopefully allowing engineers and ops teams to sleep a solid eight hours. To that end, we’ll also discuss the runtime and reliability tradeoffs involved in failover, distribution, and balancing approaches, ways to minimize volatility across a cluster, and reflect on a few horror stories / postmortems.
With a focus on practical strategies for implementing sound systems correctly, attendees can expect to leave with a solid understanding of the space, a survey of approaches that can be implemented quickly (including a batteries-included framework for building a distributed application with cluster membership, automatic load balancing, and handoff in 25 lines of code), and a path for further exploration and research.