# (Tutorial) Basic Usage of the Manifold TSM R Package

## von-Mises Fisher distribution

We simulate $$n=1000$$ samples from a von-Mises distribution with mean direction $$(\pi/2, \pi)$$ on the sphere, using the rvmf function from the Rfast package.

# Simulated VMF
n = 1000
centre = c(pi/2, pi)
centre_euclid = sphere_to_euclid(centre)
set.seed(4)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)

Now this can be estimated directly, using various methods such as MLE, but if we could only see a portion of this data, then truncated score matching can be used. Say there is a boundary set up such that all measurements below a constant line of $$\phi = \pi/2 + 0.2$$ were missing. This can be constructed as follows

dV = cbind(rep(pi/2+0.2, 200), seq(0, 2*pi, len=200))
outside_boundary = x[,1] > (pi/2+0.2)
truncated_x = x[outside_boundary,]

The mean direction can be estimated by calling sphere_sm with a specified boundary function, one of "Haversine" or "Euclidean".

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf(k=6))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf(k=6))

Note that the family argument is vmf(k=6). This is a wrapper function that specifies which distribution to use. The argument k=6 corresponds to the concentration parameter being fixed a t 6, if this werenâ€™t supplied and vmf() was used alone, the concentration parameter will be esitmated. By default the sphere_sm function uses the von Mises Fisher distribution without specifying a value for k. Now that the points are estimated, we can plot them using ggplot.

plot_data = data.frame(colat = c(x[!as.logical(outside_boundary),1],
x[as.logical(outside_boundary),1]),
lon = c(x[!as.logical(outside_boundary),2],
x[as.logical(outside_boundary),2]),
Data = c(rep("Not-Observed", sum(!as.logical(outside_boundary))),
rep("Observed", sum(outside_boundary))))
centre_data = data.frame(
colat = c(est_hav$mu[1], est_euc$mu[1], centre[1]),
lon = c(est_hav$mu[2], est_euc$mu[2], centre[2]),
Centres = c("Haversine", "Projected Euclidean", "Real Centre")
)
ggplot(plot_data) + geom_point(aes(x=lon, y=colat, col=Data), alpha=.7, size=2) +
scale_color_brewer(type="qual", palette=3) + theme_minimal() +
xlab(expression(theta)) + ylab(expression(phi)) +
geom_point(data=centre_data, aes(lon, colat, fill=Centres), size=4, shape = "diamond filled")

The point has been estimated reasonably well. Now lets try the same approach without fixing the concentration parameter.

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf())

Again, the centre has been found with good accuracy. We can inspect the output of sphere_sm to see what value was estimated for k.

est_hav
#> $mu #> colat lon #> 1.539329 3.112377 #> #>$k
#>        k
#> 5.415736
#>
#> $family #> [1] "von Mises Fisher" #> #>$val
#> [1] -2.044959
est_euc
#> $mu #> colat lon #> 1.584297 3.111693 #> #>$k
#>        k
#> 5.481328
#>
#> $family #> [1] "von Mises Fisher" #> #>$val
#> [1] -1.713839

### A different Boundary

We can experiment with a different boundary, implementing the same approach to the same dataset but a different truncation region. Start by simulating different data from the same distribution and constructing the boundary as before:

set.seed(7)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]

So in this case, the boundary is anything that fits $$\phi > 1.2$$ and $$\theta < 3.6$$, roughly a box around the top left portion of the data. We call sphere_sm as before.

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=vmf())

The results are slightly less accurate than before, but still hold.

## Kent Distribution

We adopt a similar approach for the Kent distribution, simulating data, truncating it at an artificial boundary and experimenting with fitting the model to see how well the estimated mean direction $$\mu$$ is. Firstly, we use a boundary similar to before, truncated at $$\phi = \pi/2+0.1$$.

z = rkent(n, k = 10, m = centre_euclid, b=3)
x = euclid_to_sphere(z)
dV = cbind(rep(pi/2+0.1, 500), seq(0, 2*pi, len=500))
outside_boundary = x[,1] > (pi/2 + 0.1)
truncated_x = x[outside_boundary,]

Since the Kent distribution has five parameters, estimation can be difficult and slow in some cases. For the Kent distribution, we assume that the concentration parameter k and ovalness parameter b are known, and specified in advance.

est_hav = sphere_sm(truncated_x, dV, g="Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g="Euclidean", family=kent(k=10, b=3))

And another boundary, similar to before.

set.seed(12)
z = rkent(n, m=centre_euclid, k = 10, b=3)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
x = euclid_to_sphere(z)
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]

est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=kent(k=10, b=3))

##### Daniel Williams
###### CDT Student

I have a PhD in statistics/machine learning/data science/AI (whatever you would like to call it) from the University of Bristol, under the COMPASS CDT. I previously studied a masters in mathematics at the University of Exeter. My research was primarily on truncated density estimation and unnormalised models. But I am also interested in AI more generally, including all the learnings, Machine, Deep and Reinforcement (as well as some others!).