Integrity Part 2
When changing goals based on new information gets the side eye: A Facebook story

Background
Last article I started without a theme, riffed on a random interview question I was facing, and ended up with a post I’d labeled as being about “integrity.” By the end of it I’d teased a followup with another related story on the topic. Mainly I did that to force myself out of the “what will I write about next?” trap by preloading a semi-commitment. I figured there was a 20-40% chance I’d back out and cover something else. But the article seemed to hit a vein with folks. Meaning that I got a few more positive notes than I usually do1. One of them was along the lines of “well, that’s a decent story - but I know you’ve got way more provocative ones - such as the time when [insert story about someone who knew a goal was being measured inaccurately but didn’t do anything about it].” I remembered the story they were referring to2 but didn’t want to expand on it given it was a 2nd hand “juking the stats”3 encounter.
The story I had in mind at the end of the last article did fit reasonably into the category of story the friend was referencing. That plus the dopamine hit created by people giving positive feedback on my writing bolstered the “integrity” required to follow through on the promise of my last post.
I’m not sure this story really is all that connected to “integrity” - i.e., having strong moral principles - in general. But it is an example of how caring more about looking good in the short term than about long-term value can degrade your organization’s effectiveness in complex ways. Since my editor has taken the month off4 I’m just going to run with it.
The story: So I come into work one day…
And a couple folks come by and say - “Rich, can we talk?” - and when I nod they grab an office for some privacy. Since they’d probably found me sitting near a kitchen5 in an open space, I was immediately intrigued. Most conversations were fine to have out near the snacks.
But first - a bit of background
At the time, I was working with the Brand Safety team at Facebook (currently Meta). “Brand Safety” is a catchall phrase that describes the desire of advertisers (aka “Brands”) to avoid having their company caught up in an unhelpful mess arising from their advertising dollars (and thus their organization) appearing to support something controversial. For example: Procter & Gamble most likely doesn’t want its ads for soap showing up in ISIS recruitment videos or white supremacist content6. Advertisers similarly (though to a lesser extent) may want to avoid their ads running alongside debated social/political issues or edgier/adult content.
There are tradeoffs involved, as everyone would love to have (i) huge reach and conversion, (ii) low prices, and (iii) Disney-esque brand safety7. In reality you usually cannot have all three. Most advertisers are making some tradeoffs. Large brands with broad appeal tend to care about such stuff8. This is why most content-based advertising platforms have in place (a) a way to evaluate content across the multiple dimensions brands care about, (b) a way for brand advertisers to signal what’s OK and not-OK for them, and (c) a process to avoid matching not-brand-safe content with those advertisers.
It’s not just that brand-sensitive advertisers are avoiding what you’d think of as NSFW content. Someone in the airline industry might be especially interested in avoiding news about air travel issues. Others might just want to avoid news as a category. It gets complicated.
What’s important to the story is that our teams ran a process that labeled each new video that might be eligible for monetization along a few axes of interest. For example: did this video cover a debated social issue, have hateful content, or adult content (say rated R+)? The labeling was done by a combination of machine learning algorithms and human reviewers.
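To make that machinery concrete, here’s a minimal sketch of how (a) content labels, (b) advertiser preferences, and (c) the matching check could fit together. Every name and data shape here is invented for illustration - the real system was far more involved:

```python
from dataclasses import dataclass

@dataclass
class VideoLabels:
    video_id: str
    flags: set  # (a) axes flagged by ML models and/or human reviewers

@dataclass
class BrandPrefs:
    advertiser_id: str
    excluded: set  # (b) axes this brand has signaled are not-OK

def monetization_eligible(video: VideoLabels, brand: BrandPrefs) -> bool:
    """(c) Never pair a brand's ads with content it has excluded."""
    return not (video.flags & brand.excluded)

# A news clip about a delayed flight is fine for most brands, but not
# for an airline that has excluded air-travel news as a category.
clip = VideoLabels("v123", flags={"air_travel_news"})
airline = BrandPrefs("acme_air", excluded={"adult", "hateful", "air_travel_news"})
print(monetization_eligible(clip, airline))  # False -> don't match this pair
```

The hard part, of course, is filling in those flags accurately and quickly at scale - which is what the rest of this story is about.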
Avoiding mistakes was important, as was labeling throughput. The former was necessary to maintain trust with advertisers; the latter impacted monetization for creators (videos tended to get a lot of views early and taper off - so taking too long to get a video checked out would limit financial upside). We cared a lot about this running smoothly - and as such, measuring accuracy was a continual process, against which important goals were set.
The good news was that accuracy had been high for a while - so nothing to see there. Or was there…?
Back to our story - the big reveal
In the conference room the conversation went something like this9:
Engineer/Data Scientists: Rich, we were going over the data for video labeling accuracy and found something. It’s pretty surprising.
me: OK, let me have it
< Insert conversation about how we had a couple of review stages, and the way we combined their results meant the accuracy measurement had been botched for a while - there’s a sketch of one way that can happen just after this exchange >
me: Oh, I see what you’re saying. Once you frame it that way it seems pretty obvious we had it wrong. TBH - I feel a bit silly for not noticing it earlier. So to summarize - it seems like the way we’ve been combining QA assessments means that our accuracy is something like 80%, not the 99%+ we thought10?
Engineer/Data Scientists: Yup - that’s about right. (some folks look meaningfully at the closed door).
me: OK, got it. Well - since you’ve been thinking about this - any ideas on how to improve the accuracy now that we better understand the state of reality?
Engineer/Data Scientists: Yes - actually we do. Let us explain…. <insert long explanation of great ideas to improve.>
me: Hmmm - so let me recap to see if I got this. We’ve thought our accuracy was 99% for the past year (after we made a set of big improvements before I joined). Now we realize it’s lower by a good amount. The “good news” is that advertisers generally were pretty happy with the way the system worked - so it’s not as though there was obviously a big problem in the system. The “better news” is that we realized we had a bunch of ideas to really make accuracy way better than it is now. The “bad news” is we were wrong about a metric we published internally and might feel a bit silly. How’s that?
Engineer/Data Scientists: yes - that’s right.
me: OK - well thanks for finding this and letting me know. So … what’s the question?
Engineer/Data Scientists: well, we’re wondering what to do here given the metric isn’t accurate?
me: Well, it’s an internal metric - so I guess we should write up what we learned, and explain how we will pivot our planned work to implement these big improvement opportunities you found. Of course it’s complicated, so we should help everyone understand the choice by providing the full context. The alternative would be to not mention this and not invite discussion on pivoting to improve the accuracy - and that seems bad for advertisers, creators and long term trust. I’ll need your help but I’ll write up the docs/presentation and speak to them, etc.
Engineer/Data Scientists: (with what seemed like a look of relief on their faces) That sounds great! But you’re sure?
me: Yeah - I mean it’s a little bit awkward but (a) I don’t see much choice as we know the metric we use as a guidepost is inaccurate, and (b) I’m actually truly pumped that we can make it way better. I’m sure that even if, after improving things by like 70%, the final “accurate” number lands slightly below what we publish internally now, it will still be a HUGE win.
Engineer/Data Scientists: (looking slightly skeptical, but still positive) - OK, sounds good.
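An aside for the technically curious: I honestly don’t remember the exact mechanics of how our QA combination was botched10, so the simulation below is a made-up illustration, not our actual bug. It shows one classic way stacking review stages can inflate a measured accuracy number: treat agreement between two stages as “correct” when the stages share blind spots, and the metric reads ~99% while true accuracy sits near 80% (all numbers invented):

```python
import random

random.seed(42)
N = 100_000

# Toy model, all numbers invented: 20% of videos are "hard" calls where
# both review stages share the same blind spot and err identically.
truth = [random.random() < 0.5 for _ in range(N)]   # ground-truth label
hard = [random.random() < 0.20 for _ in range(N)]
blind = [random.random() < 0.95 for _ in range(N)]  # shared blind spot fires

def stage_label(t, h, b):
    if h and b:
        return not t  # both stages get hard items wrong the same way
    return t if random.random() >= 0.005 else not t  # rare independent slips

stage1 = [stage_label(t, h, b) for t, h, b in zip(truth, hard, blind)]
stage2 = [stage_label(t, h, b) for t, h, b in zip(truth, hard, blind)]

# Botched metric: treat agreement between the two stages as "correct."
agreement = sum(a == b for a, b in zip(stage1, stage2)) / N
# Honest metric: compare final labels against an independently audited set.
actual = sum(a == t for a, t in zip(stage1, truth)) / N

print(f"measured 'accuracy' via stage agreement: {agreement:.1%}")  # ~99%
print(f"actual accuracy vs. ground truth:        {actual:.1%}")     # ~81%
```

It also makes the optics math above concrete: cutting a 20% error rate by 70% leaves 6% error - i.e. 94% accuracy, still below the old 99% headline, yet a massive real improvement.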
When one is writing it’s often easy to, without even meaning to, make oneself look good. Therefore, I want to be clear that the heroes of this story are the 3-4 engineers and scientists who noticed this and brought it up - and brought forward effective ways to make the system better.
Anyways … As promised I got to work letting my boss know, and writing up some explanation as to what we’d learned and planned to do.
Why it’s good to be worse off than you thought sometimes
Now, you could see this as justifying a mess, or at least making lemonade out of lemons and retconning why you had lemons in the first place, having been sent out to buy mangos. You’d think that would be my view given how cynical and glass-half-empty I can come across. But I’m more nuanced than you’re giving me credit for. In this sort of scenario I really do view it from the “wow, that new opportunity looks WAY bigger than we thought” angle. It turns out I really love lemonade, including the making-it part. Or I’m delusional and self-justifying - so I’m going to stick with the liking-lemonade thing.
Next steps - goals matter, even when maybe they should matter less
Over the next week or two I met with various people who one wouldn’t want to surprise with a “new and worse” accuracy measure. While I take some personal pride in not overly focusing on managing optics upwards11 - I’m not completely naive and realized that explaining this shift could go very badly. Going badly could mean tons of extra time explaining and re-explaining ourselves, only to have the team’s strong work be unappreciated. That seemed like a lot of wasted time and a bad outcome. To avoid that I tried to be careful and intentional in taking people along.
We built up a way to walk people through what we learned, why it was missed before (including by the many people who would/could later judge the miss), and the big new opportunity to improve - highlighting, of course, that feedback on the existing system had been strong, likely because of the improvements the team had made before I’d arrived.
By this point I’d been at Facebook long enough to see that goals were taken extremely seriously in terms of perceived impact and tied to compensation pretty directly - even though I believed that goal setting itself was a less rigorous process than what I’d lived at Amazon. It still didn’t occur to me that this mattered with respect to this decision - as the path to long term correctness is to worry about the eventual outcome more so than short term setbacks. Believing accuracy was 99% when it was really 80% just meant there was a huge number of basis points of possible improvement we would ignore. I truly did see the “oopsie” moment as a big win in disguise. Naively, I also felt that if you were going to play fast and loose with checking “correctness and alignment with outcomes” of goals on the setting side, you’d need a faster trigger on updating the goals as new info came in.
What happened next? Well - it turns out the team was right to be nervous about “doing the right thing.” While I got understanding feedback as I helped us manage the shift - I don’t think it ever really stuck. I’m confident that at least my manager understood what we’d learned and agreed with the path forward. But it was really, really hard to shake the broader org’s belief that somehow we’d screwed up and were trying to argue our way out of the fuck-up. After all - “how could they have ended the quarter with lower accuracy than they started with?” - or so I heard A LOT.
To this day I don’t know if it was truly an understanding issue (made harder because no one ever went backwards on a goal at this company), or if it was understood but restating a goal was simply not acceptable for any reason.
Nothing truly “bad” happened to the team, even if it was pretty annoying to have this undercurrent of “those people messed up” and “can you explain again why accuracy went backward?” dog us for many months. Also a bit annoying to have these questions show up in one’s performance review. But in reality it was at worst a source of friction - not a truly negative outcome12. Still, it was grating enough to make it clear we should avoid such things in the future - I certainly don’t recall us being celebrated for finding the opportunity and shining a light on reality.
Afterword: aka the insidious impact
Alright - so “Rich and the team made the right decision and it all basically turned out OK in the end.” Boring…..
Yeah - maybe you have a point on the “boring.” This sounded way more interesting when I started. I suspect the live presentation is more interesting than the written form. But no one invited me on their podcast - so written is all I’ve got.
But … I haven’t gotten to what I view as the real point of the story. aka “the twist!”
People learn lessons from their experiences. Even those who start out with high integrity (in this case focused on the true outcomes) get worn down over time. It’s only rational to have that affect future decisions. That’s where the problem starts.
There were two folks on my product team around when this story took place. One of them was present for at least part of the saga of the “improved accuracy goal” situation. The second - let’s call them Chuck - joined later.
Sometime after this all died down, Chuck came to us and described what he’d learned on a recent deep dive. Much of our work had been focused on individual pieces of video content - labeling them as an input to brand safety contributions. But we’d been thinking more about how we reviewed the “brand safety” of organizations (or in Facebook parlance, “Pages” (or page owners)). The example I was fond of was that no advertiser wanted to have their ad run during a cute cat video - if the cute cat video was published by ISIS or the KKK. Chuck had recognized that our assessment of “accuracy” in brand safety was wrong - or at least could stand significant improvement. His proposal was to replace the current entity-level accuracy with a new, clearly improved way of assessing each Page, and then get busy making things better.
Something roughly along the lines of the following conversation took place between Chuck, Bill (the PM who’d been there a while) and myself13.
me: Gotcha Chuck - that all makes a ton of sense. I agree we should go with your new definition of page brand safety and set a related measure for assessment accuracy. What’s your timeline for next steps?
Chuck: I think we should restate the goal immediately, with the new explanation of accuracy and just get started executing the changes.
me: hmmm - I totally agree with the plan, but the end of the quarter is coming up. So what if we wait until then, insert the new goal into the planning cycle, and get going. In the meantime, knowing the change is coming, you can spec out all the work so we can hit the ground running and make great progress on the restated accuracy measurement.
Chuck: Hmmm - but why wait? I mean, the old measurement isn’t super helpful and this new one is way better - even if it does come in something like 40% lower than the old one. The old one isn’t really aligned with the advertiser goals, so why wait?
me: <insert telling of the story shared earlier in this article>
Chuck: Seriously, are you really sure it’s that big a deal that we should wait 30 days?
me: Bill - what do you think?
Bill: Chuck - we should go with Rich’s plan here. <laughing ruefully>
That’s how the flywheel works: it reinforces whatever your culture teaches you to value. In this case, the lesson was that in terms of direct incentives it’s more important to be right on your quarterly goal than on the long-term ones.

One last observation
There’s a risk with any writing that people will take away more than you intend. Maybe Melville just really wanted to write a literal story about some super annoying, very white whale. But in the end readers get what they get from it. If you take away from this that Facebook is bad and Amazon is good, that wasn’t my intent, but I can’t control that. FWIW - I don’t believe it’s that simple. For example, clearly Meta/Facebook had way better snacks and other delightful amenities when I worked there. Amazon had no snacks, but was - at least in my experience - more disciplined in really doing a once-over on goal value before getting started.
When folks look back at some of the biggest oopsies in Facebook’s history (think Cambridge Analytica14) there’s a feeling of “how could anyone non-evil have done that?” But in cases like this, for big companies, I suspect that overly focusing on goals in a one-dimensional way could have contributed. To use an example that may or may not be true - focusing on improving engagement without counterbalancing controls on what’s driving that engagement could have negative outcomes in the long run on some platforms.
Based on my experience, companies that are grounded in core tenets are less likely to overlook 2nd or 3rd order effects that might be negative - making it easier, culturally, to adjust goals when new information suggests a better path. In the long run that’s probably one of the most important types of cultural integrity to shoot for. It’s also a controllable input - which is always nice.
I’m not saying a LOT, just more than usual. Which could be three or so. ;-)
I bet almost all readers have a story like that, given that most people spend their time working with people.
“Juking the stats” - another great single-phrase depiction of systemic weakness, introduced by The Wire.
Come to think of it, where did that editor go? I’m not sure I’ve ever seen them actually.
Ironically it was easier for me to concentrate when not sitting at my assigned desk. Probably because lots of people were actually working near the desk. Grabbing one of the many free spots in public areas tended to be no quieter - but populated with what felt like more ignorable background noise. I was rarely at my desk. In hindsight it was the sort of lie you tell yourself when the real reason was “wouldn’t it be better if I was sitting four steps away from many choices of free food?” That whole Amazon run leading up to the angelic snack closet may have done more lasting emotional damage than I realized at the time. ;-)
You laugh, but (a) the whole ads-on-terrorist-videos thing isn’t a thought experiment, and (b) P&G has had to spend time swatting down rumors they were in league with the literal Devil. So, if brands are cautious about such things, there’s some reason behind it.
I articulated this three-legs-of-the-stool example to explain Brand Safety tradeoffs before Disney somehow got caught up in a fight with the governor of Florida and pissed off strict-constructionist segments of their fanbase over The Little Mermaid and Snow White - making them less the poster child for “everyone loves them” than before. For the record, I skipped seeing the Snow White live-action remake only because I’m still terrified of that film after seeing it in a theater when I was around 7. That's one crazy scary witch imho.
There are some advertisers that don’t care about anything other than performance. Those are likely the ones with the weird ads for games in the weird games your kid downloads, in some endless circle of (weird game) life.
This happened a while ago. I cannot guarantee the details are totally right. Just summarizing with no details seemed boring. I likely took some literary license here and there. Also - I collapsed the identities of several people who were in the room into one persona. This keeps me from making up several names and also I think it will soon be clear that writing dialog is not my superpower. So the sooner we get through this the better.
To be specific - I really don’t remember how far off we were, nor what we thought the original error rate was. It wasn’t like we were at 10% or even 50% accuracy. The key point was it was pretty darn different than the baseline we thought we’d had.
Which reminds me I have an article to finish on that not entirely thought through life philosophy.
Given the non-linear financial benefits of a truly stellar review at Facebook vs. a “solid” performance rating I may be grossly misrepresenting the long term impact here. But I’d rather not think about it too much.
Note - all these names are made up. Well, probably not mine. Probably.
This may not really be even close to one of their biggest goofs. But I didn’t feel like googling for more examples right now.