Thursday, March 30, 2023

CVE-2022-4696 mitigation on GKE

There's a CVE on GCP that could lead to privilege escalation (see security bulletin GCP-2023-001). It can be mitigated by blocking the affected io_uring syscalls with a seccomp profile. Unfortunately, Kubernetes doesn't make it easy to deploy a profile: you can only reference a file under the kubelet's root directory, and GKE doesn't provide an easy facility to deploy such files to all the nodes in the cluster. So here's a quick workaround if you need to protect specific at-risk workloads right away (the recommended course of action is of course to upgrade GKE instead).
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: seccomp-config
    app.kubernetes.io/part-of: seccomp
  name: seccomp-config

---

apiVersion: v1
kind: ConfigMap
metadata:
  name: seccomp-profiles
  labels:
    app.kubernetes.io/name: seccomp-config
    app.kubernetes.io/part-of: seccomp
data:
  CVE-2022-4696.json: |
    {
      "defaultAction": "SCMP_ACT_ALLOW",
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
          "names": [
            "io_uring_enter",
            "io_uring_register",
            "io_uring_setup"
          ],
          "action": "SCMP_ACT_ERRNO"
        }
      ]
    }

---

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: seccomp-config
  labels:
    app: seccomp-config
    app.kubernetes.io/name: seccomp-config
    app.kubernetes.io/part-of: seccomp
spec:
  selector:
    matchLabels:
      app: seccomp-config
  template:
    metadata:
      labels:
        app: seccomp-config
        name: seccomp-config
        app.kubernetes.io/name: seccomp-config
        app.kubernetes.io/part-of: seccomp
    spec:
      containers:
      - name: seccomp-config
        image: busybox
        command:
        - "sh"
        - "-c"
        - "ls -lR /host && cp -v /config/*.json /host/ && sleep infinity"
        volumeMounts:
        - name: hostdir
          mountPath: /host
        - name: seccomp-profiles
          mountPath: /config
        resources:
          requests:
            cpu: 1m
            memory: 1Mi
          limits:
            cpu: 25m
            memory: 25Mi
        livenessProbe:
          exec:
            command:
            - "true"
          periodSeconds: 600
        securityContext:
          privileged: true
      volumes:
      - name: seccomp-profiles
        configMap:
          defaultMode: 420
          name: seccomp-profiles
      - name: hostdir
        hostPath:
          path: /var/lib/kubelet/seccomp
          type: DirectoryOrCreate
      serviceAccountName: seccomp-config
This deploys the seccomp profile on all the nodes with a DaemonSet (alternatively, consider using the Security Profiles Operator). You may need to deploy this in a namespace where privileged pods are allowed. Then, for any pod where you want to plug this hole, add this to the container's securityContext:
          seccompProfile:
            localhostProfile: CVE-2022-4696.json
            type: Localhost
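
For illustration, a complete minimal pod using the profile could look like this (the pod/container names and image are made up; the localhostProfile path is relative to the kubelet's seccomp directory, /var/lib/kubelet/seccomp, which is where the DaemonSet above drops the file):

apiVersion: v1
kind: Pod
metadata:
  name: at-risk-workload
spec:
  containers:
  - name: app
    image: nginx
    securityContext:
      seccompProfile:
        localhostProfile: CVE-2022-4696.json
        type: Localhost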

Tuesday, September 13, 2022

tar hanging when extracting an archive during a Docker build

Consider the following excerpt of a Dockerfile building nginx: 

RUN \
  mkdir -p /usr/src/ngx_brotli \
  && cd /usr/src/ngx_brotli \
  && git init \
  && git remote add origin https://github.com/google/ngx_brotli.git \
  && git fetch --depth 1 origin $NGX_BROTLI_COMMIT \
  && git checkout --recurse-submodules -q FETCH_HEAD \
  && git submodule update --init --depth 1 \
  && cd .. \
  && curl -fSL https://nginx.org/download/nginx-$NGINX_VERSION.tar.gz -o nginx.tar.gz \
  && curl -fSL https://nginx.org/download/nginx-$NGINX_VERSION.tar.gz.asc -o nginx.tar.gz.asc \
  && sha512sum nginx.tar.gz nginx.tar.gz.asc \
  && export GNUPGHOME="$(mktemp -d)" \
  && gpg --keyserver keyserver.ubuntu.com --recv-keys 13C82A63B603576156E30A4EA0EA981B66B0D967 \
  && gpg --batch --verify nginx.tar.gz.asc nginx.tar.gz \
  && rm -rf "$GNUPGHOME" \
  && tar -C /usr/src -vxzf nginx.tar.gz

Looks pretty simple, right? Yet it's hanging. Halfway through the archive extraction, tar goes into an endless busyloop:

root@docker-desktop:/# strace -fp 29116
strace: Process 29116 attached
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
wait4(-1, 0x7ffff15dcf24, WNOHANG, NULL) = 0
[...]

More interestingly, I couldn't reproduce this by manually running the shell commands one by one in an identical interactive container.

I realized that the gpg --recv-keys call left behind a couple of processes, dirmngr and gpg-agent, and that those appear to be the culprits (for reasons not yet clear to me).

The easiest way to "fix" this was to ask them to terminate by deleting the $GNUPGHOME directory (which dirmngr watches with inotify).
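
An alternative (assuming GnuPG 2.1 or later, where gpgconf --kill exists) is to shut those daemons down explicitly before the extraction step, e.g.:

  && gpgconf --kill all \
  && rm -rf "$GNUPGHOME" \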

Just posting this here in case anyone else ever gets puzzled the way I did.

Saturday, July 7, 2018

Creating an admin account on Kubernetes

I spent a bunch of time Googling how to do this, so I figured it could help someone else if I posted the steps to add an admin account on a Kubernetes cluster managed with kops.

k8s has service accounts, but that's not what you want for creating an admin account — one equivalent to having root privileges on the cluster.  Instead you simply need to create a certificate/key pair for the user and sign it with the master's CA (certificate authority).

In this example we'll create an account for user foobar.
  1. Create a private key:
    openssl genrsa -out foobar.key 2048
    For extra security you can also opt for a 4096-bit key, but for some reason kops defaults to 2048 right now.
  2. Create a CSR (Certificate Signing Request)
    openssl req -new -key foobar.key -out foobar.csr -subj '/CN=foobar/O=system:masters'
    The CN (Common Name) contains the user name and the O (Organization Name) must be system:masters to be a super-user.
  3. Fetch the master's PKI files (including the CA key) from S3, from the bucket kops was configured to use:
    aws s3 sync $KOPS_STATE_STORE/$NAME/pki pki
    Here the variables $KOPS_STATE_STORE and $NAME are the ones referred to in the kops documentation. For example:
    aws s3 sync s3://prefix-example-com-state-store/myfirstcluster.example.com/pki pki
    All the PKI files will be downloaded from S3 into the local pki directory.
  4. Issue the certificate using the master's CA:
    openssl x509 -req -in foobar.csr -CA pki/issued/ca/*.crt -CAkey pki/private/ca/*.key -CAcreateserial -out foobar.crt -days 3650
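A quick sanity check on the result (standard openssl inspection, nothing kops-specific):

    openssl x509 -in foobar.crt -noout -subject -issuer -dates

The subject should show CN=foobar and O=system:masters.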
At this point you could give the private key (foobar.key) and the certificate (foobar.crt) to the user, but if you want to be a bit nicer and generate a self-contained kubectl config for them, here's how:
kubectl --kubeconfig=kcfg config \
  set-credentials $NAME --client-key=foobar.key --client-certificate=foobar.crt --embed-certs=true
kubectl --kubeconfig=kcfg config \
  set-cluster $NAME --embed-certs=true --server=https://api.k8s.example.com --certificate-authority pki/issued/ca/*.crt
kubectl --kubeconfig=kcfg config \
  set-context $NAME --cluster=$NAME --user=$NAME
kubectl --kubeconfig=kcfg config \
  use-context $NAME
You can then hand over the kcfg file to the user, and they can use it directly as their ~/.kube/config if they don't already have one.
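
A quick way to verify the config works (assuming the API endpoint is reachable from your machine):

kubectl --kubeconfig=kcfg get nodes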

Don't forget to rm -rf pki to delete the files you downloaded from S3.

Tuesday, January 9, 2018

Why I left Arista Networks

edit: I rejoined Arista Networks in early 2020 ;)
5 years ago, I wrote a blog post on why I joined Arista Networks back in 2012.  As I am now suddenly and unexpectedly leaving the company, I figured I'd write a bit of a retrospective and perhaps bring some closure to this otherwise fairly quiet blog.  I know that the original blog post has been used by candidates considering joining Arista, and even though I didn't write it with this in mind originally, I wanted to give a bit of an update for those considering joining the company in 2018 and beyond.

Why I left Arista

I was very happy and thriving at Arista and wasn't looking for a change.  But I guess change was looking for me and somehow managed to convince me to join a new startup as co-founder.  I won't say much more on that topic for now, but it's one of those opportunities that was too big to pass up.  It's not in the networking industry, so not competitive with Arista.

I really struggled with this change; it took a massive amount of questioning to accept the idea of leaving such a great company, with a great team, working on great projects, to throw myself into the unknown and push myself way outside of my comfort zone.  But I felt like I had to try, I had to seize this opportunity.

Arista in 2018

Everything that I wrote in my original blog post is still true as far as I'm concerned.  The big difference is that in the meantime Arista has established itself as one of the truly remarkable success stories in recent Silicon Valley history.

Now “Arista Networks” may not be a household name like Google or Facebook, but make no mistake, Arista's success in the networking industry is on the same track as Google's success in search or Facebook's success in social media.

Many others have tried (or are trying) to claim a piece of the networking cake dominated by Cisco, and I really cannot think of any other company succeeding in any meaningful way in that space.  If anything, previously established players have all but disappeared (e.g. Force10, Brocade) or become largely irrelevant (e.g. Extreme).  As the two remaining industry giants, Cisco and Juniper, are tumbling, steadily losing market share and focus, the brightest rising star in the datacenter networking industry has been Arista.  And yet Arista still only commands a low double digit market share, so there is a lot of room to grow further while also strategically expanding the TAM (Total Addressable Market).

There are a number of tailwinds benefiting the company:
  • Competitors still can't get their act together and continue to overpromise and underdeliver.  Quality issues continue to plague them.  Arista manages its roadmap carefully and will not hesitate to say "no" to a customer if they cannot commit to what the customer is asking for, rather than promise something that they know cannot be delivered on time or at all.  Quality remains paramount and the team is constantly trying to improve automated testing processes to ensure that every new release that comes out is better than the previous one and that no regression sneaks back into the code.  This includes things like automatically running tests based on what code changed by leveraging code-coverage information gleaned during earlier test runs, automatically triaging and root-causing unexplained test failures, and more.  There is a strong emphasis on building/improving tools and creating a development environment where everyone can be productive [1].
  • The routing industry is collapsing into the datacenter networking industry. This trend started a couple years ago and should by now be clear to anybody in the industry.  The gap between a "switch" and a "router" has been shrinking steadily to the point that we now commonly see datacenter switches play the role of edge peering boxes, backbone routers, cross-datacenter interconnects, etc.  This is hurting Juniper particularly badly, because this space was their bread and butter.  But with the wrong hardware and the wrong software, they cannot compete with the density and cost per port of commodity hardware.  The only lead they kept, and mostly the only differences that remain between switches and routers, are in specialized routing software.  And since Arista is a software company, not a hardware company, the team has been hard at work to implement routing features and scale the routing code way beyond what has ever been done on datacenter networking platforms.  This is probably one of the biggest boosts to Arista's TAM and much work remains to be done in that space to close that gap fully.  It's very exciting.
  • Arista has been leading innovation in the networking industry. Whenever a new chip comes out, Arista is often the first to make it bridge a packet, sometimes before the chip vendor has done it themselves.  On many occasions, Arista has managed to push the hardware at a scale that exceeds the data sheet of the underlying hardware.  This is only made possible by Arista's edge on the software front.  Furthermore, Arista has influenced chip design with the silicon vendors they partner with to further widen the gap between the cost/performance of commodity hardware and vendor-proprietary ASICs like those designed at great cost by Cisco and Juniper.  Arista has been leading industry standards like 25/50G and more recently 200/400G, with the new OSFP initiative.  Arista was the first to take to market new technologies like VXLAN, internet-scale routing in a sub-$20k 1RU top of rack switch, streaming telemetry and network programmability, etc.
  • Arista's execution has been flawless.  The company faced some pretty serious challenges, including a set of massive lawsuits from the 800 pound gorilla with a virtually unlimited legal budget that would stop at nothing to slow them down or tarnish their image.  Despite all this, the company kept its head down and its focus, fought fearlessly for what was right, and managed to deliver 14 consecutive "beat and raise" quarters that turned it into a Wall Street darling.  This is really a function of the amazing exec team that has been at the helm of the company.
  • Arista is in the segment of the networking industry that is growing the fastest. There are a lot of products and areas in the overall networking industry but datacenter networking is the one growing the fastest, because everything is going to the cloud, and the cloud runs on this stuff.  Arista has managed to remain laser focused on this specific segment of the industry, slowly expanding into connected areas where opportunities existed to go after some low hanging fruits (e.g. tap aggregation, routing, and more).  Arista is present at a large scale in virtually all the major cloud environments out there.  Again, the name might not quite have the mindshare of a Google or a Facebook, but these days it's virtually impossible to use the Internet without going through Arista devices.
And while the headcount has more than quintupled since I joined, the company has managed to remain surprisingly apolitical and bullshit-free.  There have been growing pains, for sure, and it's not like everything is perfect and just happy rainbow unicorns either, but the company culture is essentially unchanged, and that's what actually matters.

So it was really, really, really freaking hard to say goodbye.  I've been lucky to be very happy everywhere I worked in my career, but to this point Arista has been by far the best company I've worked at.

So... As Douglas Adams would say: So long, and thanks for all the fish.


[1] A footnote worth adding regarding the emphasis on tooling.  Ken Duda, one of the co-founders, is very involved in developer tools.  After becoming a Go fanboy, he spent months working on a new way to put together development workspaces using Docker containers.  There are several people working with him on this new tool now, and it has become the de-facto standard way of managing Arista's massive workspaces, which comprise millions of lines of code and often need to pull in tens of gigabytes of stuff.  This has saved everybody a lot of time and helped support and enable changes to the CI (Continuous Integration) workflow.

Additional disclaimer for this post: the views expressed in this blog are my own, and Arista didn't review/approve/endorse anything I wrote here.

Monday, May 1, 2017

Getting cash without selling stocks

I haven't posted anything here in a while, just been busy with life and hating Blogger's interface (and being too lazy to move to something else).  But I wanted to share some of what I've learned recently on the ways one can get liquidity, because I've run into too many people who told me "damn I wish I'd known this earlier!".

Disclaimer: This post, or anything else on this blog, is not financial / legal / investing / tax advice, just some pointers for you to research.  Each person's situation is different and what works well for one person may not be applicable for another.

So you IPO'ed or got an exit?

Good for you, your shares are now convertible to real $$.  The generally accepted strategy is to sell your stock and buy a low-cost diversified portfolio to avoid the risk of keeping all your eggs in the same basket.  You can either do it yourself by buying Vanguard funds, or use a service like Wealthfront (disclaimer: referral link) to do it for you.

Now, many people also want to use this liquidity to buy a home.  This is especially true in the Bay Area where the real estate market is crazy, with the median home price in SF being around $1.2m as of today, and your typical jumbo loan requiring a 20-30% downpayment, i.e. $240k to $360k of cash on hand.  I personally never had anything even close to this much cash, so I never thought buying a home was an option for me, even though I could technically afford the monthly mortgage payments (see this great rent-vs-buy calculator to run the maths for you, you might be surprised).

I've seen a lot of people sell off a big chunk of their shares, or even sometimes all of it, to buy a home or even just make a downpayment.  They were then hit with a huge tax bill and, sometimes, the regret of having cashed out too soon and not having captured some of the upside of their stock.

There is a lot of research that shows that IPOs typically underperform the market in the short term (1-2 years), and that investors buying at post-IPO prices typically underperform the market in the long term (3-6 years) as well.  Wealthfront has a nice blog post comparing the different selling strategies across a few different scenarios.

But if your strategy of choice is to diversify over the course of the next 3-5 years, as opposed to cashing out as quickly as possible, then that makes it much harder to get access to the cash to buy a home, unless you're willing to take a big tax hit.

Borrowing cash against assets

Enter the wonderfully dangerous world of lines of credit you can get against your (now-liquid) assets.  I didn't even know this was a thing until a couple months ago, but there is a plethora of financial products to get liquidity by borrowing against your stocks.  SBLOC (Securities-Backed Lines of Credit), PAL / LAL (Pledged / Liquidity Asset Line), pledged-asset mortgage, etc.  And margin loans.  They all come with slightly different trade-offs but the basic idea is essentially the same: it's a bit like taking an HELOC (Home Equity Line of Credit) against your assets.  If you don't know what that means, don't worry, keep reading.

I'm going to focus on margin loans because that's what I've researched the most; they're the easiest to access, the most flexible, and the best deal I've found in my case, with Interactive Brokers (IB) offering interest rates currently around 2% (indexed on the overnight Fed rate).

Your brokerage account typically starts as a cash account – i.e. you put cash in (or you get cash by selling shares) and you can use cash to buy stocks.  You can upgrade your account to a margin account in order to increase your buying power, so that your broker will lend you money and use the shares you buy as collateral.  But that's not what we're interested in here: we already have shares and we want to get cash.

Margin loan 101

I found it rather hard in the beginning to grok how this worked, so after being confused for a couple weeks I spent a bunch of time reading select chapters of a couple books that prepare students taking the “Series 7 Examination” to certify stockbrokers, and the explanations there were much clearer than anything else I could find online.  It’s all very simple in the end and makes a lot of sense.  As I mentioned earlier, this works mostly like a HELOC but with investment leverage.

Let’s take a concrete example.  You open a margin account and transfer in $2000 worth of XYZ stock.  Your account now looks like this:

Market Value (MV) = $2000
Debit (DB)        = $0      (you haven’t borrowed anything yet)
Equity (EQ)       = $2000   (you own all the stock you put in)

There are two margin requirements: the “initial” margin requirement, required to open new positions (e.g. buy stock), and the “maintenance” margin requirement, needed to keep your account in good standing.  With IB the initial margin requirement (IM) is 50% and maintenance margin (MM) is 25% (for accounts funded with long positions that meet certain conditions of liquidity, which most mid/large-cap stocks do).

The difference between your equity and your initial margin requirement is the Special Memorandum Account (SMA), it’s like a credit line you can use.
SMA = EQ - IM = $2000 - $1000 = $1000.
(Detail: SMA is actually a high watermark, so it can end up being greater than EQ - IM if your stocks go up and then down.  For example, if XYZ rises to $3000, SMA grows to $1500 and stays there even after XYZ falls back to $2000.)

This $1000 you could withdraw in cash (a bit like taking a HELOC against the part of your house that you own) or you could invest it with leverage (maybe 2x, 3x leverage, sometimes more).

So let’s say you decide to withdraw the entire amount in cash (again, like taking an HELOC).  You now have:
MV  = 2000
DB  = 1000   (you owe the broker $1000)
EQ  = 1000   (you now only really own half of the stock, since you borrowed against the other half)
SMA = 0      (you depleted your credit line)
MM  = 500    (25% of MV: how much equity you need to be in good standing)

Now your equity is $1000, which is greater than your maintenance margin of $500, so you’re good.  Let’s see what happens if XYZ starts to tank.  For example let’s say it drops 25%.

MV  = 1500   (lost 25% of value)
DB  = 1000   (the amount you owe to the broker obviously didn’t change)
EQ  = 500    (difference between MV and DB)
SMA = 0      (still no credit left)
MM  = 375    (25% of MV)

In this case the account is still in good standing because you still have $500 of equity in the account and the maintenance margin is $375.

Now if the stock dips further, let’s say your account value drops to $1350, we have:

MV = 1350
DB = 1000
EQ = 350
MM = 337.5

Now you’re running close to the wire but you’re still good as EQ >= MM.  But if the account value was to drop a bit further, to $1332, you’d be in the red and get a margin call:

MV = 1332
DB = 1000
EQ = 332
MM = 333

Now EQ < MM: your equity is $1 short of the maintenance margin.  The broker will liquidate your XYZ shares until EQ == MM again (and perhaps even a bit more to give you a bit of a cushion).

Bottom line: if you withdraw your entire SMA and don’t open any positions, you can only absorb a 33% drop in market value before you get a margin call for maintenance margin violation.  Obviously if you don’t use the entire SMA, you then have more breathing room.
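
In general, with a 25% maintenance margin, the call comes when EQ < 0.25 * MV, i.e. MV - DB < 0.25 * MV, i.e. MV < DB / 0.75.  With DB = $1000 that means MV < $1333.33, a 33.3% drop from the initial $2000.  Here's a tiny sketch to run those numbers for your own situation (Python; the 25% maintenance requirement is just IB's figure from above, adjust to your broker's):

def max_drop_before_margin_call(market_value, debit, maintenance=0.25):
    """Fractional drop in market value that triggers a margin call.

    A call happens when equity (MV - DB) falls below maintenance * MV,
    i.e. when MV < DB / (1 - maintenance).
    """
    call_mv = debit / (1.0 - maintenance)
    return 1.0 - call_mv / market_value

print(max_drop_before_margin_call(2000, 1000))  # ~0.333: the 33% figure above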

Obviously this whole thing is super safe for the broker: if they start to liquidate you automatically and aggressively when you go into margin violation (like IB would do), there is almost no way they can't recover the money they loaned out to you, unless something absolutely dramatic happens, such as your position becoming illiquid and them getting stuck while trying to liquidate you.  This is why they have requirements such as minimum daily trading volume, minimum market cap, and minimum share price, which, if not met, result in increased margin requirements.  IPO shares are also typically subject to a 100% margin requirement, so you typically have to wait if you're just about to IPO, but it's not clear to me how long exactly (you might be able to get some liquidity before the lockup period expires?).

You have to run the numbers: based on the assets you have and the amount you borrow, how much would the assets need to tank before you get a margin call?  Based on that and your assessment of the likelihood that such a scenario would unfold, you can gauge how much risk you're taking and what's a reasonable balance to maintain.

Negotiating margin terms

I very recently figured out that while Interactive Brokers seems to be the only one with such low interest rates (around 2% when everybody else charges 5-8%), with the exception perhaps of Wealthfront's Portfolio Line of Credit clocking in at around 3-5%, you can actually negotiate the published rates.  I've read various stories online of people getting good deals with the broker of their choice, and usually the negotiation involves transferring your assets to IB and coming back to your broker saying "this is what I get with IB, but if you are willing to earn my business back, we can talk".

I did this with E*TRADE recently; they not only matched but slightly beat IB's rate, and made it a flat rate (as opposed to IB's blended rate, which would only beat my negotiated rate for really large balances), along with a cash incentive and a couple months of free trading (I'm not an active trader anyway but I thought I'd just mention it here).  Morgan Stanley was also willing to give me a similar deal.  I'm not a big fan of E*TRADE (to say the least) but there is some value in keeping things together with my company stock plan, and I also appreciate their efforts to win me back.

Buying a home

So once you have access to liquidity via the margin loan, the cool thing is that you don't pay anything until you start withdrawing money from the account.  And then you'll be paying interest monthly on whatever balance you have (beware that the rate is often based on a daily Fed / LIBOR rate, so keep an eye on how that changes over time).  Actually, you don't even have to pay interest; it'll just get debited from your balance — not that I would recommend this, but let's just say the terms of this type of loan are incredibly flexible.

You can then either do a traditional mortgage, where the downpayment comes in part or in full from the margin loan.  Generally speaking, lenders don't want the downpayment to be borrowed money, but since the margin loan is secured by your assets, that's often fine by them (I've had only one lender, SoFi, ironically, turn me down due to this; other banks were fine with it).  Or, if you have enough assets (more than 2x the value of the property), you can borrow the entire amount in cash, make a cash offer (unfortunately a common occurrence in the Bay Area), and then get a mortgage within 90 days of closing the deal.  This is called delayed financing, and it works exactly like a mortgage, except it kicks in after you closed on the property with cash.  This way you pay yourself back 70-80% of the amount, and enjoy the mortgage interest deduction (while it lasts) and the security of having a fixed rate locked for 30 years.

I know at least two people who are also considering using this trick to do expensive home remodels, where it's not clear just how expensive the work will be, and where the flexibility of getting access to large amounts of cash fast, without selling stocks or incurring taxable events at inconvenient times, is a great plus.

This whole contraption allows you to decouple your spending from the sale of your assets.  Or you may decide to pay the loan back in other ways than by selling assets (e.g. monthly payments using your regular income), thereby preserving your portfolio and saving a lot in taxes.  Basically a bit like having your cake and eating it too.

Friday, March 8, 2013

Why I joined Arista Networks

Over the past few months, many people have asked me why I jumped from the "web world" to the "network industry" to work at Arista Networks.  I asked myself this question more than once, and it was a bit of a leap of faith, but here's why I did it, and why I'm happy I did it.

Choosing a company to work for

There is a negative unemployment rate in Silicon Valley provided you know how to type on a keyboard.  It's ridiculous, but all the tech companies are hiring like there's no tomorrow.  So needless to say, when the time came to make a move, I had too many options available to me.  It's not easy to decide where you'll want to spend the next X years of your life.

My #1 requirement for my next job was to work with great people.  This was ranking above salary, likelihood of company success, and possibly even location (although I really wanted to try to stay in SF).  I wanted to feel like I felt when I was at Google, when I could look around me, and assume all these engineers I didn't know were smarter than me, because most of them were.  I could have returned to Google too, but I was in for something new.

I quickly wound up with 3 really good offers.  One from CloudFlare, who's coming to kick the butt of the big CDNs, one from Twitter, which you know already, and one from this datacenter networking company called Arista.  The first two were to work on interesting, large-scale distributed systems.  But the last one was different.

Why did I interview with Arista?

So why did I decide to interview with Arista in the first place?  In November 2010, I was shopping for datacenter networking gear to rebuild an entire network from scratch.  I heard about Arista and quickly realized that their switches and software architecture were exactly what I'd been looking for over the previous year (since I left Google, basically).  We ended up buying Arista and I was a happy customer for about 2 years, until I joined them.

I don't like to interact with most vendors.  Most of them want to take you out to lunch or ball games, or invite you to useless events to brainwash you with sales pitches.  But my relationship with Arista was good; the people we were interacting with on the sales and SE side were absolutely stellar.  In April 2011, they invited me to an event they regularly hold, a "Customer Exchange", at their HQ.  I wasn't convinced this would make good use of my time, but I decided to give it a shot, and RSVPed yes.

I remember coming home that evening of April, and telling my wife "wow, if I was looking for a job, I'd definitely consider Arista".  The event was entirely bullshit-free, and I got to meet the exec team, who literally blew me away.  If you know me, you know I'm not impressed easily, but that day I was really unsettled by what I'd seen.  I didn't want to change jobs then, so I tried to get over it.

Over the following year, I went to their 2 subsequent customer exchanges, and each time I came back with that same feeling of "darn, these guys are awesome".  I mean, I knew the product already, I knew why it was good, as well as its problems, limitations, areas for improvement, etc, because I used it daily.  I knew the roadmap, so it was clear to me where the company was headed (I unfortunately couldn't say so for Twitter).  Everybody – mark my words – everybody I had met so far at Arista, with no exception, was stellar: support (TAC), sales, a handful of engineers, and all their execs and virtually all VPs, marketing, bizdev, etc.

So I decided to give it a shot and interview with them, and see where that would take me.

What's the deal with Arista's people?

Arista isn't your typical Silicon Valley company.  First of all, it doesn't have any outside investors.  The company was entirely funded by its founders, something quite unusual around the Valley, doubly so for a company that sells hardware.  By the way, Arista isn't a hardware company.  There are 3 times more software engineers than hardware engineers.  Sure we do some really cool stuff on the hardware side, and our hardware engineers are really pushing the envelope, allowing us to build switches that run faster and in a smaller footprint than competitors that use the same chips.  But most of the efforts and investments, and ultimately what really makes the difference, are in the software.

Let's take a look at the three founders, maybe you'll start to get a sense of why I speak so highly of Arista's people.

Andy Bechtolsheim, co-founder of Sun Microsystems, is one of the legends of Silicon Valley.  He's one of the brains who put together hardware design, except he seems to do so one or two years ahead of everybody else.  I always loved his talks at the Arista Customer Exchange as they gave me a glimpse of how technology was going to evolve over the next few years, a glimpse into the future.  Generally he was right, although some of his predictions took more time than anticipated to materialize.
Andy is truly passionate about that stuff, and he seems to have a special interest for optical technologies (e.g. 100Gbps transceivers and such).  He's putting the German touch to our hardware engineering: efficiency.  :)

Then there is David Cheriton, professor at Stanford, who isn't on his first stint with Andy.  The two had founded Granite Systems in '95, which got acquired in just about a year by Cisco, for over $200M.  This apparently made David a bit of a celebrity at Stanford, and in '98 two students called Larry & Sergey sought his advice to start their company, a search engine for the web.  David invited them over to talk about their project, and also invited Andy.  They liked the idea so much that they each gave them a $100k check to start Google.  This 2x$100k investment alone yielded a 10000x return, so now you know why Arista didn't need to raise any money :)
David is passionate about software engineering & distributed systems, and it should be no surprise that virtually all of Arista's software is built upon a framework that came out of David's work.

Last but not least, Ken Duda, who isn't new to the Arista gang either, as he was the first employee at Granite in '95.  Ken is one of the most brilliant software engineers I've ever met.  Other common points he shares with Andy and David: super low key, very pragmatic, visionary, incredibly intelligent, truly passionate about what he's doing.  So passionate in fact that when Arista was hosting a 24h-long hackathon (Hack-a-Switch), he was eager to stay with us all night long to hack on some code (to be fair I think he slept about 2 hours on a beanbag).  I will always remember this WTF moment we had around 5am with some JavaScript idiosyncrasy for the web interface we were building, that was epic (when you're tired...).
Not only is Ken one of those extraordinary software engineers, but he's also one of the best leaders I've met, and I'm glad he's our CTO as he's pushing things in the right direction.

Of course, it's not all about those three guys.  What's even more amazing about Arista is that our VPs of engineering are like that too.  The "management layer" is fairly thin, with only a handful of VPs in engineering and a handful of managers who got promoted based on meritocracy, and that "management layer", if I dare to call it this way, is one of the most technically competent and apt to drive a tech company that I've ever seen.

I would also like to point out that our CEO is a woman, which is also unusual, unfortunately, for a tech company.  It's a coincidence that today is International Women's Day, but let me just say that there is a reason why Jayshree Ullal frequently ranks high in lists such as "Top X most influential executives", "Top X most powerful people in technology", etc.  Like everybody else at Arista, she has a very deep understanding of the industry, our technology, what we're building, how we're building it, and where we should be going next.

Heck, even our VP of marketing, Doug Gourlay, could be VP of engineering or CTO at other tech companies.  I remember the first time I saw him at the first Arista Customer Exchange, I couldn't help but think "here comes the marketing guy".  But his talk not only made a lot of sense, he was also explaining why our approach to configuring networks today sucks and how it could be done better, and he was spot on.  For a moment I just thought he was really good at talking about something he didn't genuinely understand, a common trait of alluring VPs of marketing, but as he kept talking and correctly answering questions, no matter how technical, it was obvious that he knew exactly what he was talking about.  Mind=blown.

Company culture

Hack-a-switch
So we have a bunch of tech leaders, some of the sharpest minds in this industry, who are all passionate, low-key, and want to build the best datacenter networking gear out there.  This has a profound impact on company culture, and Doug made something click in my mind not so long ago: company culture is a lasting competitive advantage.  Company culture is what enables you to hire, design, build, drive, and ship a product in one way vs another.  It's incredibly important.

Arista's culture is open, "do the right thing", "if you see something wrong/broken, fix it because you can", a lot like Google.  No office drama – yes, Silicon Valley startups tend to have a fair bit of office drama.  Ken is particularly sensitive to all the bullshit things you typically see in management, ridiculous processes (e.g. Cisco's infamous "manage out the bottom 10% performers in your organization"), red tape, and other stupid, unproductive things.  Therefore this simply doesn't exist at Arista.

One of the striking peculiarities of the engineering culture at Arista that I haven't seen anywhere else (not saying that it doesn't exist anywhere else, just that I personally never came across this), is that teams aren't very well defined groups.  Teams form and dissolve as projects come and go.  People try to gravitate around the projects they're interested in, and those who end up working together on a particular project make up the de facto team of that project, for the duration of that project.  Then they move along and go do something else with other people.  It's incredibly flexible.

So all in all, I'm very happy I joined Arista, although I'm sure it would have been a lot of fun too with my friends over at Twitter or CloudFlare.  There are a lot of very exciting things happening right now, and a lot of cool challenges to be tackled ahead of us.

Jan 2018 update: I just left Arista.

Additional disclaimer for this post: the views expressed in this blog are my own, and Arista didn't review/approve/endorse anything I wrote here.

Wednesday, February 6, 2013

Google uses captcha to improve StreetView image recognition

I just stumbled on a couple of these for the first time, on some Blogger blogs: captchas that appear to be built from StreetView imagery.  Looks like Google is using captchas to help improve StreetView's address extraction quality.

Sunday, January 27, 2013

Using debootstrap with grsec

If you attempt to use debootstrap with grsec (more specifically with a kernel compiled with CONFIG_GRKERNSEC_CHROOT_MOUNT=y), you may see it bail out because of this error:
W: Failure trying to run: chroot path/to/root mount -t proc proc /proc
One way to work around this is to bind-mount procfs into the new chroot.  Just apply the following patch before running debootstrap:
--- /usr/share/debootstrap/functions.orig       2013-01-27 02:05:55.000000000 -0800
+++ /usr/share/debootstrap/functions    2013-01-27 02:06:39.000000000 -0800
@@ -975,12 +975,12 @@
                umount_on_exit /proc/bus/usb
                umount_on_exit /proc
                umount "$TARGET/proc" 2>/dev/null || true
-               in_target mount -t proc proc /proc
+               sudo mount -o bind /proc "$TARGET/proc"
                if [ -d "$TARGET/sys" ] && \
                   grep -q '[[:space:]]sysfs' /proc/filesystems 2>/dev/null; then
                        umount_on_exit /sys
                        umount "$TARGET/sys" 2>/dev/null || true
-                       in_target mount -t sysfs sysfs /sys
+                       sudo mount -o bind /sys "$TARGET/sys"
                fi
                on_exit clear_mtab
                ;;
As a side note, a minbase chroot of Precise (12.04 LTS) takes only 142MB of disk space.
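
For the record, a minbase chroot like that can be created with something along these lines (the mirror URL is just an example):

debootstrap --variant=minbase precise path/to/root http://archive.ubuntu.com/ubuntu/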

Friday, November 9, 2012

Sudden large increases in MySQL slave lag caused by clock drift

Just in case this ever helps anyone else, I had a machine where slave lag (as reported by Seconds_Behind_Master in SHOW SLAVE STATUS) would sometimes suddenly jump to 7 hours and then come back, and jump again, and come back.


Turns out, the machine's clock was off by 7 hours and no one had noticed!  After fixing NTP synchronization the issue remained; I suspect that MySQL keeps a base timestamp in memory that was still off by 7 hours.

The fix was to restart replication:

STOP SLAVE;
START SLAVE;

Thursday, October 18, 2012

Python's screwed up exception hierarchy

Doing this in Python is bad bad bad:
try:
  # some code
except Exception, e:  # Bad
  log.error("Uncaught exception!", e)
Yet you need to do something like that, typically in the event loop of an application server, or when one library is calling into another library and needs to make sure that no exception escapes from the call, or that all exceptions are re-packaged in another type of exception.

The reason the above is bad is that Python badly screwed up their standard exception hierarchy.
    __builtin__.object
        BaseException
            Exception
                StandardError
                    ArithmeticError
                    AssertionError
                    AttributeError
                    BufferError
                    EOFError
                    EnvironmentError
                    ImportError
                    LookupError
                    MemoryError
                    NameError
                        UnboundLocalError
                    ReferenceError
                    RuntimeError
                        NotImplementedError
                    SyntaxError
                        IndentationError
                            TabError
                    SystemError
                    TypeError
                    ValueError
Meaning, if you try to catch all Exceptions, you're also hiding real problems like syntax errors (!!), typoed imports, etc.  But then what are you gonna do?  Even if you wrote something silly such as:
try:
  # some code
except (ArithmeticError, ..., ValueError), e:
  log.error("Uncaught exception!", e)
You still wouldn't catch the many cases where people define new types of exceptions that inherit directly from Exception. So it looks like your only option is to catch Exception and then filter out things you really don't want to catch, e.g.:
try:
  # some code
except Exception, e:
  if isinstance(e, (AssertionError, ImportError, NameError, SyntaxError, SystemError)):
    raise
  log.error("Uncaught exception!", e)
But then nobody does this. And pylint still complains.

Unfortunately it looks like Python 3.0 didn't fix the problem :( – they only moved SystemExit, KeyboardInterrupt, and GeneratorExit to be subclasses of BaseException but that's all.

They should have introduced another separate level of hierarchy for those errors that you generally don't want to catch because they are programming errors or internal errors (i.e. bugs) in the underlying Python runtime.
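
To make this concrete, a hypothetical hierarchy along these lines (ProgrammingError is a made-up name) would have made the catch-all idiom safe:

    __builtin__.object
        BaseException
            SystemExit
            KeyboardInterrupt
            GeneratorExit
            ProgrammingError    (hypothetical: bugs that should never be swallowed)
                AssertionError
                ImportError
                NameError
                SyntaxError
                SystemError
            Exception           (everything applications are meant to catch)
                ...

Then catching Exception would do what everybody expects.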