Anubis.

Tavis Ormandy

$Id: f0cc50a6fc6a5dd652c2b96ca9c1779f763fd6b1 $

Hey… quick question, why are anime catgirls blocking my access to the Linux kernel?

Intro

I’ve started running into more sites recently that deploy Anubis, a sort of hybrid art project slash network countermeasure. The project “weighs the souls” of HTTP requests to help protect the web from AI crawlers.

If you’ve seen anime catgirl avatars when visiting a new website, that’s Anubis.

A website blocked with Anubis

I’m sympathetic to the cause – I host this blog on a single-core 128MB VPS, so I can tell you some stories about aggressive crawlers!

Anubis recently started blocking how I access git.kernel.org and lore.kernel.org. Those sites host the Linux Kernel Mailing List archive and the kernel git repositories. As far as I know I do have a soul, I just wasn’t using a desktop browser… so how exactly is my soul being weighed?

Note: Linux has Tux 🐧, OpenBSD has Puffy 🐡, SuSE has Geeko 🦎 and Microsoft has Bob 🤓… nothing wrong with mascots! 😸

Problem

The traditional solution to blocking nuisance crawlers is to use a combination of rate limiting and CAPTCHAs. The CAPTCHA forces visitors to solve a problem designed to be very difficult for computers but trivial for humans. This isn’t perfect, of course – we can debate the accessibility tradeoffs and weaknesses – but conceptually the idea makes some sense.

Anubis – confusingly – inverts this idea. It insists visitors solve a problem trivial for computers, but impossible for humans. Visitors are asked to brute force a value that, when appended to a challenge string, causes its SHA-256 to begin with a few zero nibbles.

    calcString := fmt.Sprintf("%s%d", challenge, nonce)
    calculated := internal.SHA256sum(calcString)

    if subtle.ConstantTimeCompare([]byte(response), []byte(calculated)) != 1 {
        // ...
    }

    // compare the leading zeroes
    if !strings.HasPrefix(response, strings.Repeat("0", rule.Challenge.Difficulty)) {
        // ...
    }

source: lib/challenge/proofofwork/proofofwork.go#L66

If that sounds familiar, it’s because it’s similar to how bitcoin mining works. Anubis is not literally mining cryptocurrency, but it is similar in concept to other projects that do exactly that, perhaps most famously Coinhive and JSECoin.

So how do some useless SHA-256 operations prove you’re not a bot? The argument goes that this simply makes it too expensive to crawl your website.

The typical datacenter used by an AI crawler

This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity. It feels like this solution has the problem backwards, effectively only limiting access to those without resources or trying to conserve them.

Numbers

Let’s assume the argument has some merit and math out the claims.

We can see that with the default Anubis configuration, a typical website visitor will have to solve a challenge with a difficulty of 4.

// DefaultDifficulty is the default "difficulty" (number of leading zeroes)
// that must be met by the client in order to pass the challenge.
const DefaultDifficulty = 4

source: anubis.go

This means that a visitor must make the first 4 hex digits of the challenge hash zero – 16 bits, since each hex digit is one 4-bit nibble. Each attempt succeeds with probability 2^-16, so you can expect to mine a suitable nonce within about 2^16 (65,536) SHA-256 operations.

If every single github star on the anubis project represents a website that has deployed Anubis, how much would the cloud services bill be to mine enough tokens to crawl every single website?

Anubis Project Stars

// CookieDefaultExpirationTime is the amount of time before the cookie/JWT expires.
const CookieDefaultExpirationTime = 7 * 24 * time.Hour

source: https://github.com/TecharoHQ/anubis/blob/main/anubis.go#L32

At the time of writing, Anubis has 11,508 github stars.

The default configuration means mining one token gets you access for 7 days (although I think this expiration check is broken, see below), so we need 11,508 * 2^16 SHA-256 operations per week. How expensive is that?

To get some numbers, I started an e2-micro VM on Google Compute Engine and ran openssl speed. This is what you get in the free tier.

$ openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 6915549 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 4631718 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 393694 sha256's in 3.21s
Doing sha256 for 3s on 1024 size blocks: 100123 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 13300 sha256's in 2.98s
Doing sha256 for 3s on 16384 size blocks: 7137 sha256's in 2.99s
version: 3.0.17
built on: Tue Aug  5 07:09:41 2025 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 ...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
sha256           36882.93k    98809.98k    31397.40k    34175.32k    36561.61k    39107.90k

It looks like we can test about 2^21 hashes every second, perhaps a bit more if we used both SMT sibling cores. This amount of compute is simply too cheap to even be worth billing for.

So (11,508 websites * 2^16 SHA-256 operations) / (2^21 hashes per second) works out to about 360 seconds – roughly 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.
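If you want to check my arithmetic, here’s a throwaway Go program (Go to match the Anubis sources) that reproduces the estimate from the numbers above:

package main

import "fmt"

func main() {
	const (
		sites    = 11508   // one Anubis deployment per github star
		attempts = 1 << 16 // expected SHA-256 operations per difficulty-4 token
		hashRate = 1 << 21 // hashes per second on a free-tier e2-micro
	)
	seconds := float64(sites) * attempts / hashRate
	fmt.Printf("%.1f minutes per week of free-tier compute\n", seconds/60)
	// prints: 6.0 minutes per week of free-tier compute
}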

In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.

I’m just not convinced this math works… this is literally nothing for a soulless AI vendor with a monthly cloud services budget in the 8 figures. However, the cost for real soul-owning humans with limited access to compute is high – the Anubis forums are full of complaints like these:

The discussion forums are full of users on limited devices complaining

Alternatives

Anubis cites hashcash – an anti-spam solution from the 90s that was never widely adopted – as the primary inspiration for its design.

The idea of “weighing souls” reminded me of another anti-spam solution from the 90s… believe it or not, there was once a company that used poetry to block spam!

Habeas would license short haikus to companies to embed in email headers. They would then aggressively sue anyone who reproduced their poetry without a license. The idea was that you could safely deliver any email with their header, because it was too legally risky to use it in spam.

Here’s a sample haiku:

winter into spring
brightly anticipated
like Habeas SWE (tm)

Was this a good idea? I don’t know, but they really did sue a few spammers!

Workarounds

So you’re trying to read LKML, but catgirl says no… is there a solution?

My issue is I don’t want to use a desktop browser to mine the required value, so how can I get the auth cookie?

If we look at the response with curl, we can see the challenge in the HTTP headers:

$ curl -I https://lore.kernel.org/
HTTP/2 200
server: nginx
set-cookie: techaro.lol-anubis-auth=; Path=/
set-cookie: techaro.lol-anubis-cookie-test-if-you-block-this-anubis-wont-work=5d737f0600ff2dd; Path=/

The value of that second cookie is the challenge; here is a quick C program to mine an acceptable token:

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

// The SHA256_* routines are deprecated in OpenSSL 3, but still work.
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

int main(int argc, char **argv) {
    unsigned char hash[SHA256_DIGEST_LENGTH];
    int difficulty = 4;     // the Anubis default: leading zero nibbles
    char *message;
    SHA256_CTX base;

    if (argc < 2) {
        fprintf(stderr, "usage: %s challenge\n", *argv);
        return 1;
    }

    // Hash the challenge once, then clone the context for each nonce.
    message = argv[1];
    SHA256_Init(&base);
    SHA256_Update(&base, message, strlen(message));

    // We expect a solution within ~2^16 tries, so 2^18 is plenty.
    for (int i = 0; i < 1 << 18; i++) {
        char nonce[16];

        // snprintf fills the nonce buffer and returns its length
        // before SHA256_Update is called.
        SHA256_CTX ctx = base;
        SHA256_Update(&ctx, nonce, snprintf(nonce, sizeof(nonce), "%d", i));
        SHA256_Final(hash, &ctx);

        // The first difficulty/2 bytes (i.e. difficulty nibbles) must be zero.
        for (int j = 0; j < SHA256_DIGEST_LENGTH; j++) {
            if (hash[j] != 0)
                break;
            if (j < (difficulty >> 1) - 1)
                continue;

            // Looks good
            puts(nonce);
            return 0;
        }
    }
    return 1;
}

Let’s run that and see what it says…

$ gcc -Ofast -march=native anubis-miner.c -lcrypto -o anubis-miner
$ time ./anubis-miner 5d737f0600ff2dd
47224

real    0m0.017s
user    0m0.016s
sys     0m0.000s

Looks okay, let’s verify that solution is correct:

$ printf "5d737f0600ff2dd%d" 47224 | sha256sum
000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1  -

It seems valid, so now we can get a signed auth cookie by sending back the value we mined:

$ curl -I --cookie "techaro.lol-anubis-cookie-test-if-you-block-this-anubis-wont-work=5d737f0600ff2dd" \
    'https://lore.kernel.org/.within.website/x/cmd/anubis/api/pass-challenge?response=000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1&nonce=47224&redir=/&elapsedTime=0'
HTTP/2 302
server: nginx
location: /
set-cookie: techaro.lol-anubis-auth=eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJhY3Rpb24iO...OTYifQ...;

Success – this cookie is now valid for 1 week of access. Let’s validate what it sent us; the actual schema is visible in the code:

    claims["iat"] = time.Now().Unix()
    claims["nbf"] = time.Now().Add(-1 * time.Minute).Unix()
    claims["exp"] = time.Now().Add(s.opts.CookieExpiration).Unix()

source: lib/http.go

We can examine this auth token and see what Anubis gave us…

$ base64 -d <<< eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9 | jq
{
  "alg": "EdDSA",
  "typ": "JWT"
}
$ base64 -d <<< eyJhY3Rpb24iO...OTYifQ== | jq
{
  "action": "CHALLENGE",
  "challenge": "5d737f0600ff2dd",
  "exp": 1756185722,
  "iat": 1755580922,
  "method": "fast",
  "nbf": 1755580862,
  "policyRule": "dbf942088788cc96"
}

It looks like exp is the expiry date, so 1756185722, which is…

$ date --date @1756185722
Mon Aug 25 22:22:02 PDT 2025

Yep, about 7 days from the date I requested it. You can now place that into a cookie file for curl, lynx, etc.
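For curl, that’s a single tab-separated line in the Netscape cookie-jar format – something like this, with the full JWT from the set-cookie header above as the final field (truncated here, as above):

lore.kernel.org	FALSE	/	FALSE	1756185722	techaro.lol-anubis-auth	eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJhY3Rpb24iO...OTYifQ...

$ curl --cookie cookies.txt https://lore.kernel.org/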

Interestingly, sending the same request the next day got me a new signed cookie!?

This seems like a bug – exchanging a mined token for an auth cookie should immediately remove the challenge from the store, otherwise there is a double-spend vulnerability.
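The fix presumably just needs redemption to be single-use. A minimal sketch of the idea in Go – hypothetical names, not the actual Anubis store:

package store

import "sync"

// A hypothetical single-use challenge registry – not the actual Anubis
// code – to illustrate the fix: redeeming a challenge must spend it.
type challengeStore struct {
	mu      sync.Mutex
	pending map[string]struct{} // challenges issued but not yet redeemed
}

// redeem reports whether the challenge was outstanding, and spends it.
// A second redemption of the same challenge – the double spend – fails.
func (s *challengeStore) redeem(challenge string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.pending[challenge]; !ok {
		return false // unknown or already spent
	}
	delete(s.pending, challenge)
	return true
}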

This error benefits me – I have to mine fewer tokens – but I’ll open an issue 😇

This dance to get access is just a minor annoyance for me, but I question how it proves I’m not a bot. These steps can be trivially and cheaply automated.
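For example, here is a rough Go sketch of the whole dance, using the cookie name and pass-challenge endpoint observed above (deployment details that could change at any time):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

const (
	site       = "https://lore.kernel.org/"
	cookieName = "techaro.lol-anubis-cookie-test-if-you-block-this-anubis-wont-work"
	difficulty = 4 // leading zero hex digits, the Anubis default
)

func main() {
	// Step 1: fetch the challenge from the test cookie.
	resp, err := http.Head(site)
	if err != nil {
		panic(err)
	}
	var challenge string
	for _, c := range resp.Cookies() {
		if c.Name == cookieName {
			challenge = c.Value
		}
	}
	if challenge == "" {
		panic("no challenge cookie – not behind Anubis?")
	}

	// Step 2: mine a nonce so sha256(challenge + nonce) has the zero prefix.
	var nonce int
	var response string
	for ; ; nonce++ {
		sum := sha256.Sum256([]byte(fmt.Sprintf("%s%d", challenge, nonce)))
		response = hex.EncodeToString(sum[:])
		if strings.HasPrefix(response, strings.Repeat("0", difficulty)) {
			break
		}
	}

	// Step 3: exchange the solution for the signed auth cookie. Don’t
	// follow the 302; its set-cookie header is what we’re after.
	client := &http.Client{
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	url := fmt.Sprintf("%s.within.website/x/cmd/anubis/api/pass-challenge"+
		"?response=%s&nonce=%d&redir=/&elapsedTime=0", site, response, nonce)
	req, _ := http.NewRequest("GET", url, nil)
	req.AddCookie(&http.Cookie{Name: cookieName, Value: challenge})
	resp, err = client.Do(req)
	if err != nil {
		panic(err)
	}
	for _, c := range resp.Cookies() {
		fmt.Printf("%s=%s\n", c.Name, c.Value)
	}
}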

I think the end result is just that an internet resource I need is a little harder to access, and we have to waste a small amount of energy.

Notes