Book Club: The Mormon Way of Doing Business

I've heard The Mormon Way of Doing Business by Jeff Benedict is a pretty interesting book. I'm going to give it a try.

Has anyone read it? What do you think?

Big Iron-y

Scale. Reliability.

When you're in enterprise I.T., you care a lot about those words--sometimes too much.

Like most I.T. shops, we have a very complex environment: multiple hardware platforms, operating systems, databases, and programming languages. You'd expect a fair number of outages.

Some of our biggest outages this year, however, came as a result of (or were made worse by) our attempts to insulate ourselves from outages or to provide better scale:

  • The network load balancer to our data center failed. Luckily we were redundant, right? Wrong. The redundant load-balancing element failed, too. But it didn't know it had failed, so the system thought it was operating when we were actually down. It took us 30 minutes to realize our applications were down because of the failed load balancer.

  • We had an eerily similar situation happen with a database load-balancing solution.

  • We use a storage area network (SAN) primarily to allow us to scale storage at a cheaper price. These SAN cabinets are big iron city. Guess where four of our biggest outages this last year happened. That's right: our SAN. We've since moved to a new vendor for storage.
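
One way to shorten the 30 minutes it took to notice the first outage is to stop trusting a component's own view of its health and instead probe the full request path from outside. Below is a minimal Python sketch of that idea. It is an illustration only, not a description of any tooling we actually run: the function names (`end_to_end_healthy`, `silent_failure`, `http_probe`) and the retry count are hypothetical choices made for the example.

```python
import urllib.request
from typing import Callable


def end_to_end_healthy(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Return True if at least one of `attempts` probes succeeds.

    `probe` should exercise the whole path a real user takes
    (load balancer -> application), not ask a component how it feels.
    """
    for _ in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            # A probe that raises (timeout, connection refused, ...)
            # counts as a failed attempt.
            pass
    return False


def silent_failure(self_reported_ok: bool, probe: Callable[[], bool],
                   attempts: int = 3) -> bool:
    """True when a component claims to be healthy but real requests fail."""
    return self_reported_ok and not end_to_end_healthy(probe, attempts)


def http_probe(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that fetches `url` and checks for an HTTP 200.

    The URL is supplied by the caller; nothing here is specific to any
    particular load balancer or vendor.
    """
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    return probe
```

A monitor built this way would alert whenever `silent_failure(...)` returns True, which is exactly the case where the redundant element believes it is operating and the applications are down.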


So what do you do? Introducing additional protection introduces additional points of failure. Is it worth it?

What's your experience?

Managing complexity

A few weeks ago Joel warned you that there would be occasional guest posts - I am the first volunteer. The brief bio on beta.tech.lds.org should provide you with some understanding of my experience and biases. In this post, I leverage those experiences and biases to offer some observations about complexity....

One of the attributes I failed to develop during my academic training was a proper appreciation for the perils of complexity. I recall creating code that was fast, efficient, and utterly un-maintainable. In the academic context this seemed fine, since I was typically the only one maintaining the code, it was rarely used beyond the end of the class, and system failure affected only my grade. However, over the past 15 years my professional experience has changed my perspective and caused me to value simple, understandable, and maintainable solutions over those which lack the foregoing traits yet are fast, efficient, and theoretically “robust.” I offer the following observations relative to the impact of complexity on the reliability, maintainability, and scalability of the systems we create.

  • Humans cause failures. In my experience, human failure is a more likely cause of downtime than the failure of a physical component. However, we often try to increase system availability by adding redundancy to mitigate component failure. This redundancy has the collateral impact of adding complexity, which unfortunately increases the likelihood of human failure. I’m not sure we always end up net positive on the availability scale.

  • Control planes, which manage highly redundant environments, are often themselves single points of failure. The likelihood of control plane failure generally increases as the component redundancy becomes more complex.

  • Failure modes are difficult to predict and detect. As a result, sometimes secondary components go unused during primary component failure.

  • With redundant systems, we assume that the joint probability of multiple independent failures is small. Unfortunately, the assumption of independence is often incorrect.

  • Complex systems are difficult to scale. RFC 3439 quotes Mike O’Dell, the former Chief Architect of UUNET, as saying, “Complexity is the primary mechanism which impedes efficient scaling, and as a result is the primary driver of increases in both capital expenditures and operational expenditures.”

  • And finally, an observation by Willinger and Doyle et al: “…we point out a very typical, but in the long term potentially quite dangerous engineering approach to dealing with network-internal and -external changes, namely responding to demands for improved performance, better throughput, or more robustness to unexpectedly emerging fragilities with increasingly complex designs or untested short-term solutions. While any increase in complexity has a natural tendency to create further and potentially more disastrous sensitivities, this observation is especially relevant in the Internet context, where the likelihood for ‘unforeseen feature interactions’ in the ensuing highly engineered large-scale structure drastically increases as the network continues to evolve. The result is a complexity/robustness spiral [i.e., robust design → complexity → new fragility → make design more robust → …] that—without reliance on a solid and visionary architecture—can easily and quickly get out of control.”
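
The point above about independence can be made concrete with a little arithmetic. The Python sketch below compares the chance that both halves of a redundant pair are down under two models: fully independent failures, and a simple shared-cause model in which some common event (a control plane, a power feed, a bad change) takes out both at once. The model and the numbers (`p = 0.01`, `shared = 0.001`) are assumptions chosen purely for illustration, not measurements from any real environment.

```python
def both_down_independent(p: float) -> float:
    """Probability that both components are down if failures are independent.

    `p` is the probability that a single component is down.
    """
    return p * p


def both_down_with_shared_cause(p: float, shared: float) -> float:
    """Probability that both components are down under a shared-cause model.

    With probability `shared`, a common event takes out both components.
    Otherwise each component fails independently with probability `p`.
    """
    return shared + (1.0 - shared) * p * p


if __name__ == "__main__":
    p = 0.01        # assumed chance that one component is down
    shared = 0.001  # assumed chance of a common-cause event

    independent = both_down_independent(p)
    correlated = both_down_with_shared_cause(p, shared)

    print(f"independent model:  {independent:.7f}")
    print(f"shared-cause model: {correlated:.7f}")
    print(f"ratio:              {correlated / independent:.1f}x")
```

Under these assumed numbers, a shared cause that is ten times rarer than a single component failure still dominates the result, making a double outage roughly eleven times more likely than the independent model predicts.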


What can be done to help manage complexity or mitigate its impact? I doubt there is a silver bullet, but the following concepts have been helpful to me.

  • Use a crutch to force yourself to remember what is important. My crutch was a note hanging on the side of my monitor to remind me that supportability, maintainability, and reliability were more important than performance and efficiency. Not that the last two were unimportant; they were just not the most important.

  • Document what you are doing as you are doing it. If your solution is simple it should be easy to describe. Consider the documentation process a litmus test for simplicity.

  • Avoid tight coupling and interdependence. Focus on isolation, separation, and modularization.

  • You are more likely to be successful tailoring your system to the capabilities of your operators than tailoring the capabilities of your operators to your system.

  • Use automation, but continue to be vigilant about managing down the underlying complexity that the automation is abstracting.

  • This one is going to be controversial: sometimes you have to make hard tradeoffs in which you abandon some amount of functionality (and possibly redundancy) to maintain simplicity. This involves understanding the difference between what you can do and what you should do.

  • "Make everything as simple as possible, but not simpler." -- attributed to Albert Einstein.


How has complexity manifested itself in the environments in which you work? What are you doing to manage it? Is it hypocritical for a complex post to extol the virtues of simplicity?

Tech Talk in Mountain View, CA

Our next tech talk will be in Mountain View, CA on April 25th. Some of us will be in the area for a conference so we're doubling up.

You will find all of the details here.

Look forward to seeing some of you folks from California!!

Guest Posters

Occasionally you'll see a guest poster on this site. When you do, you'll know it because a) the IQ of the post will seem much higher than typical and b) the author's name will be listed.

Take it easy on them!

A Peculiar I.T. Shop

In many ways the Church's I.T. operations resemble those of a normal company:

  • Network systems

  • Email systems

  • Workflow applications

  • Financial & HR applications

  • Training


The Church is peculiar in that each of these systems is enormously more complicated than it might be for a typical company because each of them potentially supports millions of members of the Church, people who aren't considered "employees."

Let me give you a few examples.

Network systems which support operations. Our team provides the networks for the buildings where Church employees work and for our data centers. The networking needs here are pretty typical. However, consider the number of chapels around the world. They all need some measure of connectivity, either for the ward clerk system (MLS) or for family history centers. All broadband connections into chapels are required to use centrally-filtered internet access. On top of the sheer number of network connections, layer the complexity of having networks in countries like the Philippines and on some of the islands of the South Pacific, places where the network infrastructure isn't as robust as it is in other countries.

Email systems. Providing an email system for Church employees is no big deal--standard stuff. However, we also provide email for all of the LDS missionaries across the world and for local ecclesiastical leaders.

Workflow applications. The Church has some pretty typical workflow applications: budgeting, ERP, policy management, intranet content creation, etc. But we're also creating applications for use by all of the Church members. In the United States and Canada missionaries now sign up for their missions using an online tool. Once an assignment is made by our Church leadership this tool facilitates all of the logistics of getting them to the right place at the right time with the right preparation. We'll be providing more and more of these applications to Church membership over time to help decrease time spent in Church administration.

Financial & HR applications. Along with the solutions we use to manage the general ledger, pay taxes, manage facilities, and so on (all typical stuff), we must also track both the donations of the members worldwide and the disbursement of those funds for welfare, missions, building construction, and so forth. With the number of units and the variety of currencies, you can imagine how complex this task is.

Training. Finally, we don't just provide training for employees. We provide teaching & training resources for members and for local ecclesiastical leaders worldwide. LDS.org and other Church-sponsored web sites garner over 50 million unique page views per month (not including FamilySearch). You can't get much more mission critical than supporting the 11:00pm Saturday LDS.org rush to prepare talks and Sunday school lessons for the next day. :)

By conventional measures, the I.T. operation which supports Church employees should be simple and routine. But our "extended workforce," I guess you could call it, increases the complexity of our I.T. operations significantly, requiring that we act a lot bigger than we are.

It's one of the things that makes our jobs so fun!

Interviewing

Whatever you think about the book Good to Great, it's hard to argue with one of its premises--that great companies don't exist without great people. I'm a believer.

In my experience a great engineer can be equal to two, three or even more average engineers. They have good attitudes. They're productive. They do things right and minimize re-work. They're not defensive. They communicate with others effectively. They look for things to do when they've got spare capacity. They're easy to talk with. And they inspire others. I just love them. People like this are easily worth what their skills and experience demand in the market.

So how do you find them?

That's the key, huh? With a crack (or even average) HR department, finding people to bring in for interviews isn't terribly difficult. It's a little harder at the Church as the intersection of individuals who have temple recommends, want to live in Utah, have the skills & aptitudes we're looking for and are willing to work for less than market wages is small. Still we're able to get people in the door. A good HR department will not only bring many people in the door, but a healthy percentage of them will turn out to be the folks we want to hire.

Once folks are in the door for interviews, however, the hard work begins for the rest of the team. People do not inherently know how to interview. Discovering "great people" in an interview is not intuitive. It takes training, preparation and a good measure of thoughtfulness.

Here are some suggestions that can improve your interviewing techniques.

  • Interviewing for experience is one of the biggest mistakes people make. You can read someone's experience on a resume. Prepare prior to the interview by reviewing the resume carefully. Don't waste your time asking questions you could have found out with a little preparation.

  • Figure out what you're going to ask ahead of time. Write down the questions you'll ask and think about what you're trying to discover with each question.

  • Save time to write down your feedback after the interview. It helps you process the information you've gathered and will help down the line when you're looking back.

  • Smart candidates ask lots of questions and keep the interviewer talking. People love to talk about themselves. I've had many, many managers tell me they loved such-and-such candidate, only for me to ask detailed questions afterward and find that the manager had been duped into telling the candidate about his job, the organization, his family, the weather, what it's like to live in Utah, etc. The "sell" you think you're accomplishing with an interview like that just isn't that important.

  • I look for three things when interviewing a candidate: intelligence, passion for technology, and attitude. Finding questions that test the last two is easy. The first is more difficult. I like asking problem-solving questions. In my opinion, questions that require an "a-ha" moment, some flash of non-intuitive or non-deducible inspiration, aren't that useful. I prefer questions where you can watch candidates think. Encourage them to think out loud and/or to use the board. And I'm perfectly willing to let the candidate stew while they think through the hard questions. You need people who can think.


  • Leave your technology biases at home. So they love Linux and you don't--so what? So they use a different editor than you do, or they put their curly braces in the wrong place--so what? I don't care if a candidate speaks COBOL, Java, .NET, Ruby, Fortran, Forth, LISP, or any other language, for that matter. A candidate's opinion on which operating system is most secure is just not very relevant. If she can solve problems, then her technical biases don't matter.

  • Ask real questions--even simple ones. It helps you see them think in context. If you're hiring a developer, ask them coding questions. If you're hiring an architect, have them create architectures. If you're hiring a business manager, have them write a business plan with you.


  • Try to use the same questions for a set of applicants coming through for the same job. It lets you get a relative view.


These are just a few ideas. The one point I'd get across is this: take recruiting and interviewing seriously. Getting great people is the most important thing you can do as a manager or as an organization.

What additional tips do you have?