Production Incident

stlplace

November 16, 2022

(Update 01-07-2024) I spent most of this past week doing #productionSupport. At one point it reminded me of a time at the credit card company when we left one node (server) out of six running for 8+ hours during the maintenance window, before we brought the other five nodes back online. #theFun #withProduction #SRE

(Original 11-16-2022) I had a production incident recently, right after a production deployment. I was not intimately familiar with the Oracle index change (a drop) or its impact on other apps; we lacked visibility into those apps and had no perf testing environment. It’s hard to prevent this sort of thing from happening, but I think as developers we should learn from these mistakes and try to avoid similar ones in the future.

For production incidents, it’s best for the dev team (or production support team) to know before the customers call in, especially when the customers are external. I recall that at my immediate previous employer, during the pandemic, we had a “Screen and go” web app. One morning the app went down; it turned out to be an auto scaler issue. Another time, the Twilio SMS messages were blocked by the carriers. In both cases we found out via the customer service desk.
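One way to hear about an outage before the service desk does is a small synthetic check that probes the app on a schedule and alerts after a few failures in a row. The sketch below is only an illustration, not what we actually ran: the function name, the threshold of 3, and the idea that each probe result arrives as a simple True/False are all assumptions.

```python
from typing import Iterable


def consecutive_failure_alerts(results: Iterable[bool], threshold: int = 3) -> list[int]:
    """Return the indexes of probe results at which an alert would fire.

    `results` is a sequence of probe outcomes (True = healthy). An alert
    fires once when `threshold` failures in a row are seen, and it cannot
    fire again until a healthy probe resets the counter.
    """
    alerts = []
    failures = 0
    for i, ok in enumerate(results):
        if ok:
            failures = 0  # a healthy probe clears the failure streak
            continue
        failures += 1
        if failures == threshold:
            alerts.append(i)  # page the team here (hypothetical hook)
    return alerts
```

Requiring several failures in a row is a trade-off: it delays the alert by a few probe intervals, but it avoids paging the team for a single blip.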

Technical Assessments

At the credit card company I worked at a while ago, we had a thing called technical assessments for a Change. A Change is usually a production deployment of a code or infrastructure change. The author of the Change needs to add the technical assessments of each impacted team; I recall that for one Change I had to include tech assessments from 16 external teams. It took some time and effort to get their blessing. But on the plus side, if we had done this for the incident mentioned above, it might not have happened (provided the impacted teams seriously evaluated the potential impact on their apps).
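The gate itself is simple to picture: a Change is not ready until every impacted team has signed off. Here is a minimal sketch of that rule, with a made-up `Change` class and made-up team names; it is not the tool the credit card company used.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """A production change and the technical assessments collected so far."""

    title: str
    impacted_teams: set[str]  # teams whose apps may be affected (hypothetical)
    approvals: set[str] = field(default_factory=set)  # teams that signed off

    def missing_assessments(self) -> list[str]:
        """Impacted teams that have not yet provided an assessment."""
        return sorted(self.impacted_teams - self.approvals)

    def ready_to_deploy(self) -> bool:
        """True only when every impacted team has signed off."""
        return not self.missing_assessments()


change = Change("Drop index", {"billing", "reports"}, {"billing"})
print(change.missing_assessments())  # teams still to chase
print(change.ready_to_deploy())
```

The hard part in practice is not the check but knowing the full list of impacted teams in the first place, which is the visibility problem described above.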

Btw, I just realized I have not written much about production, other than this post.
