(Tutorial) Basic Usage of the Manifold TSM R Package

von-Mises Fisher distribution

We simulate \(n=1000\) samples from a von-Mises distribution with mean direction \((\pi/2, \pi)\) on the sphere, using the rvmf function from the Rfast package.

# Simulated VMF
n = 1000              
centre = c(pi/2, pi) 
centre_euclid = sphere_to_euclid(centre)   
set.seed(4)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)

Now this can be estimated directly, using various methods such as MLE, but if we could only see a portion of this data, then truncated score matching can be used. Say there is a boundary set up such that all measurements below a constant line of \(\phi = \pi/2 + 0.2\) were missing. This can be constructed as follows

dV = cbind(rep(pi/2+0.2, 200), seq(0, 2*pi, len=200))
outside_boundary = x[,1] > (pi/2+0.2)
truncated_x = x[outside_boundary,]

The mean direction can be estimated by calling sphere_sm with a specified boundary function, one of "Haversine" or "Euclidean".

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf(k=6))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf(k=6))

Note that the family argument is vmf(k=6). This is a wrapper function that specifies which distribution to use. The argument k=6 corresponds to the concentration parameter being fixed a t 6, if this weren’t supplied and vmf() was used alone, the concentration parameter will be esitmated. By default the sphere_sm function uses the von Mises Fisher distribution without specifying a value for k. Now that the points are estimated, we can plot them using ggplot.

plot_data = data.frame(colat = c(x[!as.logical(outside_boundary),1],
                                 x[as.logical(outside_boundary),1]),
                       lon = c(x[!as.logical(outside_boundary),2],
                               x[as.logical(outside_boundary),2]),
                       Data = c(rep("Not-Observed", sum(!as.logical(outside_boundary))),
                                rep("Observed", sum(outside_boundary))))
centre_data = data.frame(
  colat = c(est_hav$mu[1], est_euc$mu[1], centre[1]),
  lon = c(est_hav$mu[2], est_euc$mu[2], centre[2]),
  Centres = c("Haversine", "Projected Euclidean", "Real Centre")
)
 ggplot(plot_data) + geom_point(aes(x=lon, y=colat, col=Data), alpha=.7, size=2) +
  scale_color_brewer(type="qual", palette=3) + theme_minimal() +
  xlab(expression(theta)) + ylab(expression(phi)) +
   geom_point(data=centre_data, aes(lon, colat, fill=Centres), size=4, shape = "diamond filled")

The point has been estimated reasonably well. Now lets try the same approach without fixing the concentration parameter.

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf())

Again, the centre has been found with good accuracy. We can inspect the output of sphere_sm to see what value was estimated for k.

est_hav
#> $mu
#>    colat      lon 
#> 1.539329 3.112377 
#> 
#> $k
#>        k 
#> 5.415736 
#> 
#> $family
#> [1] "von Mises Fisher"
#> 
#> $val
#> [1] -2.044959
est_euc
#> $mu
#>    colat      lon 
#> 1.584297 3.111693 
#> 
#> $k
#>        k 
#> 5.481328 
#> 
#> $family
#> [1] "von Mises Fisher"
#> 
#> $val
#> [1] -1.713839

A different Boundary

We can experiment with a different boundary, implementing the same approach to the same dataset but a different truncation region. Start by simulating different data from the same distribution and constructing the boundary as before:

set.seed(7)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]

So in this case, the boundary is anything that fits \(\phi > 1.2\) and \(\theta < 3.6\), roughly a box around the top left portion of the data. We call sphere_sm as before.

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=vmf())

The results are slightly less accurate than before, but still hold.

Kent Distribution

We adopt a similar approach for the Kent distribution, simulating data, truncating it at an artificial boundary and experimenting with fitting the model to see how well the estimated mean direction \(\mu\) is. Firstly, we use a boundary similar to before, truncated at \(\phi = \pi/2+0.1\).

z = rkent(n, k = 10, m = centre_euclid, b=3)
x = euclid_to_sphere(z)
dV = cbind(rep(pi/2+0.1, 500), seq(0, 2*pi, len=500))
outside_boundary = x[,1] > (pi/2 + 0.1)
truncated_x = x[outside_boundary,]

Since the Kent distribution has five parameters, estimation can be difficult and slow in some cases. For the Kent distribution, we assume that the concentration parameter k and ovalness parameter b are known, and specified in advance.

est_hav = sphere_sm(truncated_x, dV, g="Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g="Euclidean", family=kent(k=10, b=3))

And another boundary, similar to before.

set.seed(12)
z = rkent(n, m=centre_euclid, k = 10, b=3)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
x = euclid_to_sphere(z)
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=kent(k=10, b=3))

Daniel Williams
Daniel Williams
CDT Student

I have a PhD in statistics/machine learning/data science/AI (whatever you would like to call it) from the University of Bristol, under the COMPASS CDT. I previously studied a masters in mathematics at the University of Exeter. My research was primarily on truncated density estimation and unnormalised models. But I am also interested in AI more generally, including all the learnings, Machine, Deep and Reinforcement (as well as some others!).