The Fault in Our Star Clustering

In college I took an astronomy class with Dr. Doty, a wildly entertaining professor known for his half-inch-thick glasses, boundless energy, and habit of explaining the universe through baseball metaphors (as one does). His final project was fantastically liberal arts: write your paper about astronomy, but through the lens of your major. A chemistry friend of mine did something with stellar spectra and emission lines. There was, I’m fairly sure, at least one person who turned in a literal book of poems. You get the idea. Sneakily hard, too… you can’t hide behind a rubric, because everyone’s rubric is different.

I was a CS major, one course shy of both a math minor and a classics minor (small liberal-arts school), so I had options. I had to pick which version of myself was writing the paper.

I picked the CS major (but with a Classics flair, because I can’t help myself). I was also taking an artificial intelligence class that semester (mind you, this is pre-LLM AI… closer to what most people would now call “data science”), where I had just learned about clustering. And I was a tired college junior looking for an easy win before heading off abroad. I already had clustering code from the AI class, so reusing it with minor adjustments for the astronomy paper was “basically free”. So my paper was going to be about clustering. Specifically: if you wiped the slate clean and started over, would humanity draw the same constellations?

And yes, duh: constellations are going to cluster together to some extent. I knew that going in. That was sort of the point… head already abroad, I was deliberately picking a question I was pretty sure I already knew the answer to, because “pretty sure I know the answer” is exactly the kind of question a lazy student wants on a deadline.

Buried under that lazy framing, though, is a genuinely fun question. A constellation is a story people agreed to tell about some dots. The Greeks named their constellations, the Chinese named theirs independently, and the two systems carved up some of the same regions even though nobody compared notes. So are constellations discovered or invented? If they trace something real in the sky, an algorithm that knows nothing about Greek myth should rediscover them just from where the stars sit, right?

That was the idea. I ran k-means on the stars’ coordinates, compared its groups to the real constellations, wrote up my findings, and got my (okay) grade.

What 2016 me got wrong

I dug the paper back up recently. Reading your own old work is… something. The programming was in MATLAB because we didn’t really have a choice (I have made my peace with the one-based indexing… mostly). The core of the whole thing was maybe 20 lines: sweep through different values of k, run k-means a few times at each one, keep the best result.
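For the shape of that loop, here’s a rough Python sketch (the 2016 original was MATLAB and isn’t reproduced here). It assumes “best” means lowest inertia, the sum of squared distances from each point to its center, which is the quantity k-means itself minimizes; scikit-learn’s `n_init` handles the “run it a few times” part. `sweep_k` and `restarts` are hypothetical names:

```python
import numpy as np
from sklearn.cluster import KMeans


def sweep_k(X, k_values, restarts=5, seed=0):
    """For each k, run k-means `restarts` times and keep the lowest-inertia run."""
    best = {}
    for k in k_values:
        # n_init reruns k-means from fresh randomized starts and keeps the run
        # with the lowest inertia (sum of squared distances to the nearest center).
        model = KMeans(n_clusters=k, n_init=restarts, random_state=seed).fit(X)
        best[k] = (model.inertia_, model.labels_)
    return best
```

Inertia generally shrinks as k grows, so choosing *which* k to report needs a separate rule; this sketch only records the best run for each k.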

Quick aside if k-means is unfamiliar: you tell the algorithm “I think there are k groups in this data,” and it tries to find them. It picks k random center points, assigns every data point to its nearest center, scoots the centers to the middle of their new groups, and repeats until nothing moves. It’s simple!
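Those steps are Lloyd’s algorithm. Here’s a bare-bones NumPy sketch of exactly that loop (not scikit-learn’s implementation, which adds smarter seeding and other refinements); `kmeans`, `n_iter`, and `seed` are illustrative names:

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest center, move centers, repeat."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # nothing moved, so we're done
        labels = new_labels
        # Scoot each center to the middle (mean) of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave a center in place if it lost all its points
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Lloyd’s loop only finds a local optimum that depends on the random start, which is why it’s usually rerun from several starts and the best run kept.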

Anyway, the clustering loop itself was fine. The problem was never the loop. It was what I fed into it and the question I asked of what came out. Two real problems, and they’re different kinds of wrong.

The first is that I took an easy shortcut with the geometry. Stars get located by two numbers: right ascension (like longitude, 0 to 24 hours, and it wraps, so 23.9h is right next to 0.1h) and declination (like latitude, -90° to +90°). I fed those two numbers straight into k-means as if they were x and y on graph paper. They are not x and y on graph paper. They’re coordinates on a sphere. On a sphere, 23.9h and 0.1h are neighbors, and near the poles two stars twelve hours apart in right ascension can sit only a couple of degrees apart on the sky. I knew this at the time, but treating them as flat was easy, I was on a deadline, and it mostly worked (lol). The problem showed up at the edges: constellations sitting on the right-ascension seam got torn in half by the flat coordinates, and my fix was (and I quote past me from my paper) to “eliminate any constellations that wrap around the ends.” Just… set the wrapping ones aside and don’t consider them at all. Which, look, the analysis I did run was honest about the stars it looked at. But the seam wasn’t a property of those constellations; it was an artifact of pretending a sphere was a sheet of paper. I was patching symptoms instead of fixing the cause and, frankly, I didn’t really care, lol.
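To put numbers on the seam (and on the poles), here’s a small sketch comparing the graph-paper distance the raw inputs imply with the true angular separation from the haversine formula. The function names are illustrative, and the “flat” distance deliberately mixes RA hours with declination degrees, just like feeding the raw numbers to k-means did:

```python
import math


def flat_distance(ra1_h, dec1_deg, ra2_h, dec2_deg):
    """The graph-paper shortcut: treat (RA hours, dec degrees) as plain x/y."""
    return math.hypot(ra1_h - ra2_h, dec1_deg - dec2_deg)


def angular_distance_deg(ra1_h, dec1_deg, ra2_h, dec2_deg):
    """True separation on the celestial sphere (haversine formula), in degrees."""
    ra1, ra2 = math.radians(ra1_h * 15), math.radians(ra2_h * 15)  # hours -> degrees -> radians
    d1, d2 = math.radians(dec1_deg), math.radians(dec2_deg)
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


# Across the seam on the celestial equator: 23.8 units on paper, 3 degrees on the sky.
print(flat_distance(23.9, 0, 0.1, 0), angular_distance_deg(23.9, 0, 0.1, 0))
# Across the pole at dec +89: 12 units on paper, 2 degrees on the sky.
print(flat_distance(0, 89, 12, 89), angular_distance_deg(0, 89, 12, 89))
```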

The second problem is more subtle, and it took me longer to see. The question wasn’t exactly scientific. A constellation is, more or less by definition, a bunch of stars that look close together, so “do the stars in a constellation cluster spatially?” is nearly asking whether nearby things are nearby. That’s the duh from earlier, except here it stops being a self-deprecating aside and quietly becomes the flaw baked into the whole experiment. And the answer came back weirdly mediocre anyway, which should’ve been a clue that I was missing nuance. The more interesting question, the one I should’ve been asking, is whether an algorithm would (or could?) carve the sky into the same partition a human would: same boundaries, same number of groups, same membership. That version can actually be wrong, which is what makes it worth asking.

The fix for the geometry is to put the points on a sphere instead of a flat plane: turn each star into a unit-length 3D direction, then cluster those directions. Luckily, Python makes this easier than MATLAB did:

import numpy as np

def sphere_vectors(stars):
    # RA is in hours (0-24); multiply by 15 to get degrees, then radians.
    ra = np.radians(np.array([s["ra"] for s in stars]) * 15.0)
    dec = np.radians(np.array([s["dec"] for s in stars]))
    # Each star becomes a unit-length (x, y, z) direction on the sphere.
    return np.column_stack([
        np.cos(dec) * np.cos(ra),
        np.cos(dec) * np.sin(ra),
        np.sin(dec),
    ])
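A quick check on why this works: for two unit vectors separated by an angle θ, the straight-line (chord) distance between them is exactly 2 sin(θ/2), which depends only on θ and grows steadily with it. So comparing straight-line distances between these vectors is the same as comparing angular separations on the sky, and two stars straddling the RA seam, or sitting on opposite sides of a pole, come out as the close neighbors they really are. `unit_vector` and `chord_length` below are hypothetical single-star helpers mirroring `sphere_vectors`:

```python
import numpy as np


def unit_vector(ra_hours, dec_deg):
    """Same conversion as sphere_vectors, for a single star."""
    ra = np.radians(ra_hours * 15.0)
    dec = np.radians(dec_deg)
    return np.array([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])


def chord_length(theta_deg):
    """Straight-line distance between two unit vectors theta_deg apart."""
    return 2 * np.sin(np.radians(theta_deg) / 2)


# Across the RA seam: 23.9h and 0.1h on the celestial equator are 3 degrees apart.
seam = np.linalg.norm(unit_vector(23.9, 0) - unit_vector(0.1, 0))
# Across the pole: 0h and 12h at dec +89 degrees are 2 degrees apart.
pole = np.linalg.norm(unit_vector(0, 89) - unit_vector(12, 89))
print(seam, chord_length(3), pole, chord_length(2))
```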

No seam, no pole distortion, no excluding constellations because they’re inconvenient. The modern HYG catalog also just has a constellation column, so I no longer have to slice characters off a designation string to guess. And the data cleaning is brutal in a way worth seeing: almost the entire catalog is stars far too faint for any unaided human eye to see.

Keep every naked-eye star (apparent magnitude ≤ +6, Ptolemy’s old visible limit) and you’re left with 5,070 stars across all 88 constellations. No strategic exclusions due to laziness. Cluster those on the sphere and the torn seam from the previous figure heals; same projection, now whole.
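The filtering step itself is short. A minimal sketch, assuming the HYG CSV uses the column names `ra` (hours), `dec` (degrees), `mag` (apparent magnitude), and `con` (constellation abbreviation); check them against the catalog version you download. `naked_eye_stars` is a hypothetical helper, and the rows in the example are made up. The returned dictionaries carry the `"ra"`/`"dec"` keys that `sphere_vectors` reads:

```python
import csv
import io


def naked_eye_stars(csv_file, mag_limit=6.0):
    """Keep rows at or brighter than mag_limit that carry a constellation label.

    Assumes HYG-style column names: 'ra' (hours), 'dec' (degrees), 'mag', 'con'.
    """
    stars = []
    for row in csv.DictReader(csv_file):
        if not row.get("con") or not row.get("mag"):
            continue  # no constellation label or no magnitude: can't use it
        mag = float(row["mag"])
        if mag <= mag_limit:  # magnitudes run backwards: smaller means brighter
            stars.append({"ra": float(row["ra"]), "dec": float(row["dec"]),
                          "mag": mag, "con": row["con"]})
    return stars


# Made-up rows, just to exercise the filter.
sample = io.StringIO(
    "ra,dec,mag,con\n"
    "5.9,7.4,0.5,Ori\n"   # bright: kept
    "5.2,-8.2,0.2,Ori\n"  # bright: kept
    "1.0,2.0,9.5,Psc\n"   # far too faint for the naked eye: dropped
    "3.0,4.0,1.0,\n"      # no constellation label: dropped
)
stars = naked_eye_stars(sample)
```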

Hover over any star to light up just its group (its real constellation, or whatever the algorithm lumped it with) and see how that group wraps across the sky.

Flip from the real constellations to the algorithm’s groups and watch how little the two disagree. Then, under the algorithm’s groups, swap between the 2016 way and sphere k-means (the fix).

Here’s the anticlimax I did not see coming: the geometry fix barely moved the needle. The actual 2016 pipeline scores about 0.77 (0.7712) on Normalized Mutual Information (how much knowing a star’s cluster tells you about its real constellation), on the 4,489 stars it kept. Doing it properly on the sphere, with all 5,070 stars and all 88 constellations? Also about 0.77 (0.7651), actually slightly lower. Most constellations sit far enough from the poles and the seam that the distortion just washes out, and throwing the seam-crossers away, the thing I agonized over after the fact, changed almost nothing. My college laziness made no impact. Whudda thunk it. Sort of poetic, eh?
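Since NMI carries most of the weight from here on, here’s what it computes, as a small NumPy sketch: build the contingency table of constellation versus cluster, compute the mutual information between the two labelings, and divide by the arithmetic mean of their entropies (scikit-learn’s default `average_method` for `normalized_mutual_info_score`). `nmi` is an illustrative name:

```python
import numpy as np


def nmi(true_labels, pred_labels):
    """Normalized mutual information, normalized by the mean of the two entropies."""
    _, t = np.unique(true_labels, return_inverse=True)
    _, p = np.unique(pred_labels, return_inverse=True)
    # table[i, j] = number of stars from constellation i that landed in cluster j.
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)
    pij = table / table.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)  # marginal distributions
    nz = pij > 0  # 0 * log(0) counts as 0, so skip empty cells
    mutual_info = (pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz])).sum()
    h_true = -(pi * np.log(pi)).sum()
    h_pred = -(pj * np.log(pj)).sum()
    denom = (h_true + h_pred) / 2
    return 1.0 if denom == 0 else float(mutual_info / denom)
```

A score of 1 means the clusters match the constellations exactly up to renaming; 0 means a star’s cluster tells you nothing about its constellation.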

I’d “fixed” the experiment and barely moved the score. Same mediocre answer, fancier math.

So, I went back and looked at what the algorithm was actually getting wrong, instead of what I wanted it to get right.

k-means has a distinctive behavior that wasn’t great in this instance: every point goes to its nearest centroid, so it carves space into compact, roughly round blobs of similar size. But a real constellation can be a long thin chain (Eridanus, the river) or a fat sprawl (Hydra)… shapes that nearest-centroid cells are structurally bad at capturing. And when I looked at which constellations it mangled, they were almost all the big sprawling ones. The compact, bright ones (Crux, Lyra, Gemini) it nailed.

And that was kind of the big clue… humans didn’t draw constellations from the faint stars. We drew them from the bright ones. Nobody picked the saddest, faintest, unnamed star for Orion… they connected Betelgeuse and Rigel and the three bright stars of the belt. Most of the faint stars belong to a constellation only because astronomers later drew official boundaries around the figures, and everything inside came along for the ride. While the idea of including every star visible to the human eye was good in theory, it added way too much noise to produce a good signal for constellations.

And again, duh: of course the bright stars are the ones that make up the constellations, and of course they’re the ones sitting close enough together to get connected into a figure in the first place. Past me could’ve told you that without writing a line of code.

But the win was in the journey, not the destination. And given the anticlimax I just walked you through (the careful geometry fix that changed almost nothing), maybe the “duh” answers aren’t as straightforward as we’d choose to believe. Knowing the answer and being able to show it turn out to be very different things.

So I changed two things. First, restrict to brighter and brighter stars and watch what happens. Second, swap k-means for average-linkage agglomerative clustering, which starts with every star on its own and keeps merging the two groups whose members are closest on average. That lets groups grow along chains of nearby stars instead of being forced into round blobs like k-means. Luckily, scikit-learn makes it pretty easy-peasy:

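import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Assumed setup: `stars` is the filtered list of star dicts (with "ra", "dec"
# and "con" keys), and sphere_vectors is the function defined earlier.
true_constellations = [s["con"] for s in stars]
k = len(set(true_constellations))  # assumption: one cluster per constellation

def flat_features(stars):
    # Assumed helper for the 2016 shortcut: raw (RA hours, dec degrees) as flat x/y.
    return np.array([[s["ra"], s["dec"]] for s in stars])

X = sphere_vectors(stars)  # 3D unit-sphere directions

# The 2016 way: k-means on raw flat (RA, dec).
naive = KMeans(n_clusters=k, n_init=10).fit_predict(flat_features(stars))

# k-means, done right: same algorithm, but on the sphere.
sphere = KMeans(n_clusters=k, n_init=10).fit_predict(X)

# Average linkage: can grow long, thin groups instead of round blobs.
linkage = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(X)

nmi = normalized_mutual_info_score(true_constellations, linkage)
ari = adjusted_rand_score(true_constellations, linkage)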
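The “brighter and brighter” half of the change is a loop around that snippet. A rough sketch for the linkage method, under a few assumptions: `X` holds the unit vectors from `sphere_vectors`, `mags` and `labels` are arrays of apparent magnitudes and true constellation names in the same order, and k is set to the number of constellations still represented at each cutoff. `score_by_magnitude` is a hypothetical helper, not the code behind the figures:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import normalized_mutual_info_score


def score_by_magnitude(X, mags, labels, cutoffs):
    """For each magnitude cutoff, keep the brighter stars, re-cluster, and score with NMI."""
    X, mags, labels = np.asarray(X), np.asarray(mags), np.asarray(labels)
    scores = {}
    for cutoff in cutoffs:
        keep = mags <= cutoff  # smaller magnitude means brighter
        true = labels[keep]
        k = len(np.unique(true))  # assumption: one cluster per constellation still present
        if keep.sum() < 3 or k < 2:
            continue  # too few stars or constellations left for a meaningful score
        pred = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(X[keep])
        scores[cutoff] = normalized_mutual_info_score(true, pred)
    return scores
```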

You can see the difference. k-means draws round blobs… linkage weaves paths. Much closer to how a person traces a figure.
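Under the hood, average linkage is simple enough to write naively. This is a slow, illustrative NumPy sketch with hypothetical names (scikit-learn’s implementation is far more efficient, and this one is far too slow for the full catalog): start with every point alone, then repeatedly merge the two groups with the smallest mean distance between their members until k groups remain.

```python
import numpy as np


def average_linkage(X, k):
    """Naive agglomerative clustering with average linkage; tiny inputs only."""
    X = np.asarray(X, dtype=float)
    # Pairwise straight-line distances between all points.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    groups = [[i] for i in range(len(X))]  # every point starts alone
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average linkage: the mean distance over every cross-group pair.
                d = D[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups[b]  # merge the closest pair of groups
        del groups[b]           # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(groups):
        labels[members] = label
    return labels
```

Because it compares whole groups by their average distance rather than by distance to a single center, nothing forces the result into round cells.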

This is the figure to actually play with: drag the slider toward the bright end and watch the reproducibility score climb as the faint noise falls away, or search for a constellation you know to see how intact it came back.

Push the slider all the way to the bright end and the score jumps. Here’s that same climb for all three methods at once, as one picture:

All three clustering methods improve when you only include brighter stars. The chain-following method (orange) pulls ahead once the shapes start to matter, moving us from a measly NMI of 0.76 to 0.92! And the ARI (Adjusted Rand Index, which checks whether pairs of stars that share a real constellation also land in the same cluster, corrected for chance agreement) climbs from a middling 0.4 to 0.65. Given the right input (bright stars), the algorithm actually draws something close to the map that humans drew thousands of years ago.
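ARI is pair counting with a chance correction. A small NumPy sketch with illustrative names: count the pairs of stars grouped together in both labelings, then compare that against what random labelings with the same group sizes would produce on average. 1 means identical partitions, around 0 means no better than chance, and it can go negative:

```python
import numpy as np


def ari(true_labels, pred_labels):
    """Adjusted Rand Index computed by counting pairs in the contingency table."""
    _, t = np.unique(true_labels, return_inverse=True)
    _, p = np.unique(pred_labels, return_inverse=True)
    # table[i, j] = number of stars in true group i that landed in cluster j.
    table = np.zeros((t.max() + 1, p.max() + 1))
    np.add.at(table, (t, p), 1)

    def pairs(counts):
        # Unordered pairs C(n, 2), summed over the given counts.
        return float((counts * (counts - 1) / 2).sum())

    n = table.sum()
    both = pairs(table)                 # pairs together in both labelings
    in_true = pairs(table.sum(axis=1))  # pairs together in the true labeling
    in_pred = pairs(table.sum(axis=0))  # pairs together in the clustering
    expected = in_true * in_pred / (n * (n - 1) / 2)  # chance level for these group sizes
    best = (in_true + in_pred) / 2
    if best == expected:
        return 1.0  # degenerate case (e.g. everything in one group): treat as perfect agreement
    return (both - expected) / (best - expected)
```

For example, `ari([0, 0, 1, 1], [0, 0, 1, 2])` works out to 4/7 ≈ 0.57: one of the two true pairs gets split apart.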

My original hypothesis (“the stars cluster into constellations”) was too vague to be either true or false. The sharper hypothesis (the bright stars cluster into constellations, and you need a method that follows chains to see it) is the one the data actually supports. I didn’t find the answer in 2016 because I wasn’t asking a question that had one.

So, are constellations real?

The numbers tell a consistent story, and I think it’s a good one. Of course, it’s not perfect. But among the bright anchor stars, the clustering recovers the human map surprisingly well. The constellations aren’t arbitrary scribbles; they trace real clumps of bright stars, which is likely why distant cultures kept landing on overlapping regions all those years ago. But the exact borders aren’t inevitable. Search for Eridanus in the chain-following figure above (recovered about 25% intact) versus Crux or Lyra (basically perfect). Which is the most human answer possible. The raw material is real, if imperfect. Orion’s bright stars really do huddle together up there. But the hunter, the belt, the sword, the dog at his heel? Human creativity and randomness at its finest. It’s likely some future stargazer with no Greek in their education would look at the exact same bright shape and see something else entirely… and that’s okay!

The bright stars do cluster. The stories we drew around them don’t quite match. I think that’s better than a clean yes. And it’s a much better answer than the one I turned in to Dr. Doty.