Summing Multiplicative Functions (Pt. 1.5)

Abstract. We will examine and implement the Min-25 sieve, another technique for computing partial sums of multiplicative functions. It is closely related to Lucy’s algorithm.

Welcome back to Multiplicative Mondays. It’s been a while!¹

Last time we learned some tricks and techniques for computing partial sums of multiplicative functions. If you missed it, I highly recommend at least reading some of the introductory material so you’re familiar with multiplicative functions and the standard ideas for computing their partial sums. This time, we’re going to dedicate some attention to the Min-25 sieve, so a working knowledge of Lucy’s algorithm is needed.

The idea is very straightforward, and comes from the observation that it is possible to efficiently perform Lucy’s algorithm backwards. That is, to start at counts of primes (or sums of primes, or whatever) and efficiently reversing the sieve process to get the desired sum.

Min-25 Sieve

Let’s start by seeing in practice how it’s possible to perform Lucy’s algorithm in reverse. We won’t worry about the Fenwick tree stuff yet, so we’ll be satisfied with an $O(n^{3/4})$ runtime here.
Generally it’s very fast.

The main equation we rely on for prime counting (and sums of primes, or any sums of a completely multiplicative function over primes) is

\[S_g(v, p) = S_g(v, p-1) - g(p)\left[S_g(v/p, p-1) - S_g(p-1, p-1)\right]\]

where $S_g(v, p)$ was the sum of $g(n)$ over $1 < n \leq v$ who remain after sieving out all of the composite integers with some prime factor up to $p$.
Also remember that we had $S_g(v, p) = S_g(v, p-1)$ if $p$ is not a prime.

Initially, we set $S_g(v, 1) = \sum_{n \leq v} g(n)$ for the partial sums of $g$, which we assume we can do in basically constant time. Usually $g(n)$ is just a power of $n$ and so its partial sums are handled by a small polynomial in $n$.

We also know we only need to sieve out the primes up to the square root of the value we’re interested in, and so $S_g(n, \sqrt{n})$ is the sum of $g(p)$ over all primes $p \leq n$.

Now, we’re going to attempt to take these sums over primes and un-sieve all of the composites back in, to hopefully end up back where we started with sums of $g(n)$ over $1 < n \leq v$. Ideally we won’t do any more work than we did to sieve the primes out in the first place.

We’re going to be referencing the article by box on CodeForces which is one of the main English references for this technique.

Suppose we want to sieve back in a prime $p$. This means we’re going to be calculating the contribution of composite integers whose smallest prime factor is $p$. Such composite integers are either a power of $p$, or a product of a power of $p$ times some integer whose smallest prime factor is bigger than $p$. In the second case we know the contributions of those integers whose prime factors are all bigger than $p$, since we can use the current state of the sieve for that.

So, we can put this together as

\[S_g(v, p-1) = S_g(v, p) + \sum_{i \geq 1} g(p^i) \left[S_g(v/p^i, p) - S_g(p, p)\right] + \sum_{i \geq 2, \, p^i \leq v} g(p^i)\]

The first summation is also restricted so that $v/p^i > p$, since we’re trying to sum over the sieved integers strictly greater than $p$ and at most equal to $v/p^i$. This gives us the condition $p^{i+1} \leq v$ which is important to restrict the number of terms we have to deal with.

We’ll follow box and slightly reindex the second sum so things are cleaner:

\[S_g(v, p-1) = S_g(v, p) + \sum_{i \geq 1, \, p^{i+1} \leq v} \Big(g(p^i) \left[S_g(v/p^i, p) - S_g(p, p)\right] + g(p^{i+1})\Big)\]

We are able to compute this sum directly since there are vanishingly many composite prime powers up to $x$. Once we’ve un-sieved each prime $p$, which we have to do in decreasing order this time, we will end up with $S_g(v, 1) = \sum_{1 < n \leq v} g(n)$ as desired.

Now, we assumed from the beginning that this sum was very easy to compute, since we use that to get the sums over primes in the initial Lucy step. The key insight behind the Min-25 sieve is that, if we have a different multiplicative function $f(n)$ that we wish to sum, and $f(p) = g(p)$, then we can un-sieve in a slightly different way and obtain the partial sums of $f(n)$ instead of $g(n)$. None of the un-sieving steps require any knowledge of the partial sums of our function over integers, only the partial sums over primes. Helpfully, the partial sums of $f$ and $g$ over primes are identical.

Algorithm (Min-25 Sieve)

Initialize S[v] as $\sum_{1 < n \leq v} g(n)$ for each key value v.

For primes p in 2..sqrt(x), in increasing order, and then for each key value v satisfying v >= p*p, in decreasing order, update the value at v by S[v] -= g[p] * (S[v div p] - S[p-1]).

At this point, S[v] is the sum of $g(p)$ over all primes $p \leq v$, for each key value v.

Next, for primes p in 2..sqrt(x), in decreasing order this time, and then for each key value v satisfying v >= p*p, again in decreasing order, perform the following:
4a. For each $i \geq 1$ such that $p^{i+1} \leq v$, update the value at v by S[v] += f[p, i] * (S[v div p^i] - S[p]) + f[p, i+1].

Let’s now implement this in Nim, using the FIArray container from our last adventures. First we need to modify our Lucy’s algorithm to compute sums of some completely multiplicative function, which is easy enough:

proc lucy(x: int64, g: proc(p: int): int64, G: proc(v: int64): int64, modulus: int64 = 0): FIArray =
  ##Modified standard Lucy algorithm.
  ##Computes sums of g(p) over primes.
  ##Given is g(p) for primes p, and G(v) = sum g(n) over 1 < n <= v.
  ##Optional modulus.
  var S = newFIArray(x)
  var V = keysFI(x)
  for i, v in V.pairs:
    S.arr[i] = G(v)
  for p in eratosthenes(S.isqrt.int+1):
    #since p is small we have
    let sp = S.arr[p-2] #= S[p-1]
    for i in countdown(V.high, 0):
      let v = V[i]
      if v < p*p: break
      S.arr[i] -= g(p)*(S[v div p] - sp)
      if modulus > 0: S.arr[i] = S.arr[i] mod modulus
  return S

And here is how the described unlucy step looks, implemented as an operation on the FIArray of prime sums:

proc unlucy(S: var FIArray, f: proc(p: int, e: int): int64, modulus: int64 = 0): void =
  ##Unsieves all of the primes. Second phase of Min-25 algorithm.
  ##Computes sums of f(n) over 1 < n <= v.
  ##Given is S[v] = sum of f(p) over primes p <= v, and f(p, e) = f(p^e).
  ##Optional modulus.
  let x = S.x
  var V = keysFI(x)
  let primes = eratosthenes(S.isqrt.int+1)
  for k in countdown(primes.high, 0):
    let p = primes[k]
    #since p is small we have
    let sp = S.arr[p-1] #= S[p]
    for idx in countdown(V.high, 0):
      let v = V[idx]
      if v < p*p: break
      #iterate over p^(i+1) <= v with i >= 1
      var i = 1
      var u = v div p # = v//p^i
      while u >= p: #p^(i+1) <= v
        S.arr[idx] += f(p, i) * (S[u] - sp) + f(p, i+1)
        if modulus > 0: S.arr[idx] = S.arr[idx] mod modulus
        inc i
        u = u div p

Let’s try something silly then. We can use these functions as follows:

proc f(p: int, e: int): int64 = 1
proc g(p: int): int64 = 1
proc G(v: int64): int64 = v-1

var S = lucy(1e12.int64, g, G)
echo S[1e12.int64]
unlucy(S, f)
echo S[1e12.int64]

This program takes about 1.6 sec to output 37607912018, the number of primes up to $10^{12}$, and then a further 2.8 sec to output 999999999999, which is the number of integers between $2$ and $10^{12}$. So it looks like everything is working, and we’re able to fully unlucy the prime counts.

Now let’s try to compute partial sums of the multiplicative function satisfying $f(p^e) = e$ for $e \geq 1$. This is not hard, we just change f(p, e) above:

proc f(p: int, e: int): int64 = e
proc g(p: int): int64 = 1
proc G(v: int64): int64 = v-1

let x = 1e12.int64
var timer = startTimer()
var S = lucy(x, g, G)
unlucy(S, f)
echo S[x]
timer.stop

It spends the same total amount of time and outputs 1943593277778.

It is extremely important to remember that this is the sum $\sum_{1 < n \leq 10^{12}} f(n)$, and usually when this technique is required, the sum including $1$ is asked for. You can get this by just adding $1$ to every key value $v$ in the FIArray.

The Old Way

This particular example we just did is a perfect candidate for using the powerful numbers trick, since $f(p) = 1$ means we can approximate $f(n)$ by the multiplicative function $g(n) = 1$, and then $(f/g)(n) = 0$ unless $n$ is a powerful number, in which case here we have $(f/g)(n) = 1$.

That is, magically, we have

\[\sum_{n \leq x} f(n) = \sum_{n \text{ powerful}} \left\lfloor \frac{x}{n} \right\rfloor\]

We can implement this by iterating over the powerful numbers, or we can use the unique representation of each powerful $n$ as a product $a^2 b^3$ where $b$ is squarefree:

\[\sum_{n \text{ powerful}} \left\lfloor \frac{x}{n} \right\rfloor = \sum_{\substack{b \leq x^{1/3}\\ b \text{ squarefree}}} \sum_{a \leq \sqrt{x/b^3}} \left\lfloor \frac{x}{a^2b^3}\right\rfloor\]

It is possible to compute, for each distinct $y := \lfloor x / b^3 \rfloor$, the sum $\sum_a \lfloor y/a^2 \rfloor$, in time less than $\sqrt{y}$. We will just do it by brute force which is more than good enough for our purposes. Here’s an implementation:

let x = 1e18.int64
let cbrtx = 1e6.int
var timer = startTimer()
var not_sqfr = newSeq[bool](cbrtx+1)
for i in 2..cbrtx:
  if i*i > cbrtx: break
  for j in countup(i*i, cbrtx, i*i):
    not_sqfr[j] = true
var total = 0'i64
for b in 1'i64..cbrtx:
  if not_sqfr[b]: continue
  let a_max = isqrt(x div (b*b*b))
  for a in 1..a_max:
    total += x div (a*a*b*b*b)
echo total
timer.stop

You may notice the summation bounds changed significantly.. this version can compute the sum of $f(n)$ over $n \leq 10^{18}$ as 1943596433648484881 in only 5 seconds on my little laptop. It helps to not need to store any of the powerful numbers or anything like that, and just add their contributions directly.

Why Use Min-25 Then?

The problem we’ll solve to motivate ourselves to use this new method is to sum the multiplicative function defined by $f(p^e) = e$ when $p \equiv 1 \bmod 4$ and $f(p^e) = 0$ otherwise.

Can we use something like the powerful numbers trick here?

At primes, $f(p) = 1$ if $p \equiv 1 \bmod 4$ and $0$ otherwise. So our approximating function is $g(n) = [n \equiv 1 \bmod 4]$. Unfortunately though, $g(n)$ is not multiplicative. For example $0 = g(3)g(7) = g(21) = 1$ breaks multiplicativity. Oh no!²

You can try thinking more about how to fix this, but Min-25 is very well suited to handle this case. To deal with the dependence on primes and their residues mod 4 you will have to have some variant of Lucy’s algorithm on arithmetic progressions implemented (see my prime counting post for how to do it).

Here’s how our Min-25 implementation looks:

let x = 1e12.int64
var timer = startTimer()
var S = lucyAP(x, 4)[0] #counts of primes = 1 mod 4
timer.mark "Lucy step completed."
proc f(p: int, e: int): int64 = 
  if p mod 4 != 1: return 0
  else: return e
unlucy(S, f)
timer.mark "Unlucy step completed."
echo S[x]
timer.stop

The Lucy in arithmetic progressions takes about 3 sec, and the unlucy step takes about 3 sec as well. Here, I get the answer 66671480822. It’s not as fast as other specialized techniques, but it is extremely flexible and well suited to many multiplicative functions that are much more troubling to try doing most other ways.

One Further Note

If the multiplicative function $f(n)$ you want to sum has a certain value $f(p)$ at primes, most often it is a short polynomial in $p$. It’s much easier when it’s just a power of $p$, but we can deal with this situation.

Suppose $f(p) = a_0 + a_1 p + a_2 p^2 + \ldots + a_k p^k$.

The way to apply Min-25 sieve here is to compute sums of each monomial over primes separately. In other words, use

\[\sum_{p \leq v} f(p) = \sum_{i = 0}^k a_i \sum_{p \leq v} p^i\]

This will require $k+1$ different Lucy steps, and then Min-25 will do a single unlucy step at the end to compute $\sum_{1 < n \leq x} f(n)$.

If you were doing the powerful numbers trick, we would approximate with

\[g = u^{a_0} \ast N^{a_1} \ast N_2^{a_2} \ast \ldots \ast N_k^{a_k}\]

where, for all $n$, we define $u(n) = 1$, $N(n) = n$, and $N_i(n) = n^i$.
The powers here are Dirichlet convolutions, for example $u^4 = u\ast u\ast u\ast u$.

When $a_i$ are not integers, it may get a little less nice, and could require you to take a convolution square root or something else complicated³. So long as your summation limit is not unreasonable it’s very simple to just use Min-25 in that case.

Code

The code for this blog post is available here on GitHub.

Thanks to shash4321 for proofreading this and giving good feedback.

I have had a lot of things cooking here but haven’t really had the motivation to work on them. You’ll notice this isn’t even multiplicative sums 2. I have something I want to write about for sums 2, and this one is a bit of a shorter post compared to the last one. Whatever, it’s fine. I’ve also been working on a puzzle game which some people have seen for a very long time. Actually I restarted development on it in a new game engine, Godot instead of GMS2. I’m liking it a lot better. I’ve also been going outside and touching grass more often. Maybe I’ll find some corner of this site to put my photos on. Anyways, thanks for your patience on all this. ↩
Recall that the idea of the powerful numbers approximation trick is to find a multiplicative function $g$ such that $f(p) = g(p)$ for primes $p$, so that we have $(f/g)(p) = 0$ and $f/g$ is supported on powerful numbers which are sparse. If $f$ is multiplicative but $g$ isn’t, then $f/g$ won’t be multiplicative and we’re very sad. ↩
If we have a known function $g$ we can sum, and we want to compute sums of a function $f$ such that $f*f = g$, then start with $F(1) = \sqrt{g(1)}$, and then use the hyperbola method for each key value $v$. You can separate out the term corresponding to $F(v)$, which will give you all the $F(v)$ in a total of $x^{3/4}$ time. If you want to do convolution cube roots or worse, things get much more painful if you try to do it like that. Probably there’s some way to do it but I haven’t had a need for it and don’t have a reference on hand. ↩