(Tutorial) Basic Usage of the Manifold TSM R Package
von-Mises Fisher distribution
We simulate \(n=1000\) samples from a von-Mises distribution with mean direction \((\pi/2, \pi)\) on the sphere, using the rvmf
function from the Rfast
package.
# Simulated VMF
n = 1000
centre = c(pi/2, pi)
centre_euclid = sphere_to_euclid(centre)
set.seed(4)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)
Now this can be estimated directly, using various methods such as MLE, but if we could only see a portion of this data, then truncated score matching can be used. Say there is a boundary set up such that all measurements below a constant line of \(\phi = \pi/2 + 0.2\) were missing. This can be constructed as follows
dV = cbind(rep(pi/2+0.2, 200), seq(0, 2*pi, len=200))
outside_boundary = x[,1] > (pi/2+0.2)
truncated_x = x[outside_boundary,]
The mean direction can be estimated by calling sphere_sm
with a specified boundary function, one of "Haversine"
or "Euclidean"
.
est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf(k=6))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf(k=6))
Note that the family
argument is vmf(k=6)
. This is a wrapper function that specifies which distribution to use. The argument k=6
corresponds to the concentration parameter being fixed a t 6, if this weren’t supplied and vmf()
was used alone, the concentration parameter will be esitmated. By default the sphere_sm
function uses the von Mises Fisher distribution without specifying a value for k
. Now that the points are estimated, we can plot them using ggplot
.
plot_data = data.frame(colat = c(x[!as.logical(outside_boundary),1],
x[as.logical(outside_boundary),1]),
lon = c(x[!as.logical(outside_boundary),2],
x[as.logical(outside_boundary),2]),
Data = c(rep("Not-Observed", sum(!as.logical(outside_boundary))),
rep("Observed", sum(outside_boundary))))
centre_data = data.frame(
colat = c(est_hav$mu[1], est_euc$mu[1], centre[1]),
lon = c(est_hav$mu[2], est_euc$mu[2], centre[2]),
Centres = c("Haversine", "Projected Euclidean", "Real Centre")
)
ggplot(plot_data) + geom_point(aes(x=lon, y=colat, col=Data), alpha=.7, size=2) +
scale_color_brewer(type="qual", palette=3) + theme_minimal() +
xlab(expression(theta)) + ylab(expression(phi)) +
geom_point(data=centre_data, aes(lon, colat, fill=Centres), size=4, shape = "diamond filled")
The point has been estimated reasonably well. Now lets try the same approach without fixing the concentration parameter.
est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family = vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family = vmf())
Again, the centre has been found with good accuracy. We can inspect the output of sphere_sm
to see what value was estimated for k
.
est_hav
#> $mu
#> colat lon
#> 1.539329 3.112377
#>
#> $k
#> k
#> 5.415736
#>
#> $family
#> [1] "von Mises Fisher"
#>
#> $val
#> [1] -2.044959
est_euc
#> $mu
#> colat lon
#> 1.584297 3.111693
#>
#> $k
#> k
#> 5.481328
#>
#> $family
#> [1] "von Mises Fisher"
#>
#> $val
#> [1] -1.713839
A different Boundary
We can experiment with a different boundary, implementing the same approach to the same dataset but a different truncation region. Start by simulating different data from the same distribution and constructing the boundary as before:
set.seed(7)
z = rvmf(n, centre_euclid, k = 6)
x = euclid_to_sphere(z)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]
So in this case, the boundary is anything that fits \(\phi > 1.2\) and \(\theta < 3.6\), roughly a box around the top left portion of the data. We call sphere_sm
as before.
est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=vmf())
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=vmf())
The results are slightly less accurate than before, but still hold.
Kent Distribution
We adopt a similar approach for the Kent distribution, simulating data, truncating it at an artificial boundary and experimenting with fitting the model to see how well the estimated mean direction \(\mu\) is. Firstly, we use a boundary similar to before, truncated at \(\phi = \pi/2+0.1\).
z = rkent(n, k = 10, m = centre_euclid, b=3)
x = euclid_to_sphere(z)
dV = cbind(rep(pi/2+0.1, 500), seq(0, 2*pi, len=500))
outside_boundary = x[,1] > (pi/2 + 0.1)
truncated_x = x[outside_boundary,]
Since the Kent distribution has five parameters, estimation can be difficult and slow in some cases. For the Kent distribution, we assume that the concentration parameter k
and ovalness parameter b
are known, and specified in advance.
est_hav = sphere_sm(truncated_x, dV, g="Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g="Euclidean", family=kent(k=10, b=3))
And another boundary, similar to before.
set.seed(12)
z = rkent(n, m=centre_euclid, k = 10, b=3)
dV = cbind(c(rep(1.2, 200), seq(1.2, 3, len=200)), c(seq(0, 3.6, len=200), rep(3.6, len=200)))
x = euclid_to_sphere(z)
outside_boundary = x[,2] < 3.6 & x[,1] > 1.2
truncated_x = x[as.logical(outside_boundary),]
est_hav = sphere_sm(truncated_x, dV, g = "Haversine", family=kent(k=10, b=3))
est_euc = sphere_sm(truncated_x, dV, g = "Euclidean", family=kent(k=10, b=3))