LK-99: What can machine learning tell us about the candidate superconductor?
The suspense continues about a possibly world-changing discovery
By Dr. Shreyas Honrao, Senior Materials Informatics Scientist, and Dr. Austin Sendek, Chief Executive Officer
The scientific world has been abuzz for the last two weeks with news of a possible ground-breaking discovery: the world’s first room temperature, ambient pressure superconductor. Dubbed LK-99, this material was first proposed in a tantalizing preprint posted to arXiv on July 22 by a Korean research team.
In the two weeks since, scientists all over the world have scrambled to study, reproduce, and scrutinize the findings. As of this article’s publication on August 8, 17 more preprints referencing LK-99 have appeared on arXiv. Some offer varying degrees of support for the extraordinary claims, including density functional theory calculations that show flat bands at the Fermi level, while others are calling it all junk. Either way, it seems certain we will know rather quickly whether LK-99 is the real deal.
Why? Because LK-99, with full chemical formula CuPb9(PO4)6O, appears to be relatively easy to synthesize. It’s also fortunate that LK-99 is made up only of the abundant elements Cu, Pb, P, and O, so if the claims of room temperature superconductivity are true, widespread commercialization may not be far behind.
But that doesn’t mean the crystal structure is simple. To date, other room temperature superconductors have required enormous external pressures to work. It is thought, however, that the unique crystal structure of LK-99, in which copper atoms occupy unusual, high-energy positions, may create a strain effect, or “internal pressure”, that gives rise to superconductivity without any external pressure.
Machine learning methods, like the ones we are developing at Aionics, have emerged as a promising tool to aid scientists in identifying latent patterns in material structure that may elude human intuition. So: what can machine learning tell us about LK-99?
A typical machine learning-based analysis begins with extracting descriptors of materials to feed into a predictive algorithm. You might think of these descriptors as a fingerprint that is distinct for every material, allowing a complex, three-dimensional crystal structure to be represented in a machine-readable numerical format.
Once descriptors have been extracted for a given set of materials, the job of a machine learning model is to use empirical data to determine which descriptors are most strongly correlated with the property of interest. This requires a broad and rich dataset; otherwise, it’s impossible to know whether a given descriptor is useful for predicting the property at hand.
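As a minimal sketch of that selection step, the snippet below ranks descriptors by the absolute Pearson correlation between each descriptor and a target property. Linear correlation is only the simplest possible proxy (real models also capture nonlinear relationships), and the descriptor names, values, and property are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_descriptors(table, target):
    """Rank descriptors by |r| with the target property, strongest first.

    table maps descriptor name -> values (one per material, same order as target).
    """
    scores = {name: pearson(col, target) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: four materials, two descriptors, one target property.
descriptors = {
    "frac_O": [0.2, 0.4, 0.6, 0.8],
    "mean_group": [14.0, 11.0, 15.0, 12.0],
}
target = [10.0, 20.0, 30.0, 40.0]

for name, r in rank_descriptors(descriptors, target):
    print(f"{name}: r = {r:+.2f}")
```

With only four materials, the perfect correlation for frac_O tells us nothing reliable, which is exactly why this kind of analysis needs a broad and rich dataset.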
If LK-99 is truly a room temperature, ambient pressure superconductor, it should have characteristics unlike any material we’ve ever seen before — unlike any superconductor we’ve seen before, for that matter. Can we tease out how LK-99 is unique?
To find out, we aggregated three datasets of materials: the first contains only LK-99; the second is the SuperCon dataset of 16,414 known superconductors with varied critical transition temperatures, as used by Stanev et al. in their analysis; and the third contains more than 50,000 known (non-superconducting) crystal structures as aggregated by the Materials Project.
To begin the analysis of LK-99, we start by looking at its chemical formula: CuPb9(PO4)6O. On its face, this composition doesn’t seem particularly exotic. If you had to sift through the chemical formulas of all tens of thousands of known materials to find the room temperature superconductor, you’d likely blow right past this one.
How does it stack up against all other known materials based only on its chemical formula (or “composition”)? We extracted 239 unique compositional descriptors for every material in the three datasets above, based on the MAGPIE package developed by Ward et al. These descriptors range from simple (e.g. the fraction of the chemical formula that is oxygen) to complicated (e.g. the weighted average ground state magnetic moment of all the elements in the formula).
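To make the idea of a compositional descriptor concrete, here is a small sketch that parses LK-99’s formula into element counts and computes a few MAGPIE-style features: element fractions, a stoichiometry-weighted mean of an elemental property, and the minimum of that property across the formula. This is not MAGPIE’s implementation; the descriptor names are hypothetical, and only the periodic-table group numbers of Cu, Pb, P, and O are tabulated. For the real feature set, matminer’s ElementProperty featurizer offers a "magpie" preset.

```python
import re
from collections import Counter

# Element symbols or parentheses, each followed by an optional integer count.
TOKEN = re.compile(r"([A-Z][a-z]?|\(|\))(\d*)")

# Periodic-table group numbers for LK-99's elements (illustrative subset only).
GROUP = {"Cu": 11, "Pb": 14, "P": 15, "O": 16}


def parse_formula(formula):
    """Expand a formula such as 'CuPb9(PO4)6O' into element counts.

    A stack handles (possibly nested) parenthesized groups.
    """
    stack = [Counter()]
    for symbol, count in TOKEN.findall(formula):
        n = int(count) if count else 1
        if symbol == "(":
            stack.append(Counter())
        elif symbol == ")":
            group = stack.pop()
            for element, c in group.items():
                stack[-1][element] += c * n  # apply the group multiplier
        else:
            stack[-1][symbol] += n
    return stack[0]


def composition_descriptors(formula, prop):
    """A few MAGPIE-style features (hypothetical names) computed from a formula."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: c / total for el, c in counts.items()}
    return {
        "frac_O": fractions.get("O", 0.0),
        "frac_Pb": fractions.get("Pb", 0.0),
        "mean_group": sum(f * prop[el] for el, f in fractions.items()),
        "min_group": min(prop[el] for el in counts),
    }


print(parse_formula("CuPb9(PO4)6O"))
print(composition_descriptors("CuPb9(PO4)6O", GROUP))
```

The parse yields 1 Cu, 9 Pb, 6 P, and 25 O (41 atoms in total), so frac_O ≈ 0.61, frac_Pb ≈ 0.22, mean_group ≈ 15.3, and min_group = 11 (copper).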
Using this descriptor set, LK-99 appears to be… pretty normal!
There are only three descriptors for which LK-99 lies two or more standard deviations from the mean (“2σ”), making it truly exceptional on those axes: the fraction of the chemical formula that is lead (4.4σ), the minimum group number among the elements in the formula (2.2σ), and the maximum number of valence electrons among the elements in the formula (2.0σ). Even so, 810 materials have higher Pb fractions than LK-99, the highest being, of course, pure metallic Pb. There are 4,252 materials whose minimum group number is higher than LK-99’s; the noble gases are (trivially) the highest. And 3,317 entries contain an element with more valence electrons than any element in LK-99, the highest being pure polonium.
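The outlier test described above can be sketched as follows: compute LK-99’s z-score for each descriptor against the background distribution, flag descriptors at or beyond 2σ, and count how many background materials exceed LK-99’s value. Using the population standard deviation and excluding LK-99 from the background are our assumptions (the article does not specify either), and the background values are toy data.

```python
import statistics


def zscore_report(lk99, background, threshold=2.0):
    """Flag descriptors where LK-99 sits at least `threshold` sigma from the mean.

    lk99: descriptor name -> LK-99's value
    background: descriptor name -> values for all other materials
    Returns (name, z, n_higher) tuples, where n_higher counts background
    materials whose value exceeds LK-99's.
    """
    flagged = []
    for name, x in lk99.items():
        values = background[name]
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)  # population standard deviation
        z = (x - mu) / sigma
        if abs(z) >= threshold:
            n_higher = sum(1 for v in values if v > x)
            flagged.append((name, z, n_higher))
    return flagged


# Hypothetical toy background of 100 materials.
background = {
    "frac_Pb": [0.0] * 96 + [0.05] * 3 + [0.5],
    "frac_O": [0.5] * 50 + [0.6] * 50,
}
lk99 = {"frac_Pb": 9 / 41, "frac_O": 25 / 41}

for name, z, n_higher in zscore_report(lk99, background):
    print(f"{name}: z = {z:.1f}, higher in {n_higher} background material(s)")
```

Only frac_Pb is flagged here (z ≈ 4.2); frac_O sits at roughly 1.2σ and is not reported.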
The location of LK-99 in compositional space is visualized below: for all three of our datasets (LK-99, superconductors, all known materials), we reduced the 239-dimensional descriptor space to two dimensions using uniform manifold approximation and projection (UMAP). This gives us a way to visualize how “far” materials are from each other in abstract compositional space.
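UMAP itself is a multi-step algorithm (the umap-learn package exposes it as umap.UMAP(n_components=2).fit_transform(X)). The sketch below covers only its first ingredient: finding each material’s nearest neighbors in the full descriptor space, which UMAP then turns into a weighted graph and lays out in two dimensions. Standardizing each column first is our assumption, since descriptors with large numeric ranges would otherwise dominate the distance; the article does not state its preprocessing, and the data are hypothetical.

```python
import math
import statistics


def standardize(rows):
    """Rescale each descriptor column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    # A constant column has zero spread; divide by 1.0 instead to avoid 0/0.
    stats = [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in columns]
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]


def k_nearest(rows, names, query, k):
    """Return (distance, name) for the k materials closest to rows[query]."""
    q = rows[query]
    dists = sorted(
        (math.dist(q, r), names[i]) for i, r in enumerate(rows) if i != query
    )
    return dists[:k]


# Hypothetical two-descriptor toy data (e.g. Pb fraction, O fraction).
names = ["LK-99", "A", "B", "C"]
rows = [[0.22, 0.61], [0.20, 0.60], [0.0, 0.30], [1.0, 0.0]]

for d, name in k_nearest(standardize(rows), names, query=0, k=2):
    print(f"{name}: {d:.2f}")
```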
Two things jump out here:
(1) There is generally good overlap between the known superconductors (green) and all known materials (light blue), suggesting that superconductors generally do not have distinct chemical formulas among the universe of known materials.
(2) LK-99 sits in an area of compositional space that is generally populated with both superconductors and regular materials. From this perspective, LK-99 does not appear to sit out “on its own” somewhere in space.
Is this weird? Perhaps not. While this analysis is only cursory, the result suggests that our compositional descriptor set is not particularly adept at differentiating between regular and superconducting materials in the first place, so it’s perhaps not surprising that LK-99 is not well differentiated compositionally either.
The spread of LK-99’s compositional descriptors, measured against the background distribution of each descriptor across all known materials, is approximately Gaussian: 77% fall between 0-1σ, 21% between 1-2σ, and 1% beyond 2σ (percentages are rounded). If LK-99’s descriptor values were drawn at random from normally distributed backgrounds, you would expect roughly 68% between 0-1σ, 27% between 1-2σ, and 5% beyond 2σ. This suggests, again, that the composition of LK-99 is not particularly anomalous. The histogram below visualizes this: it looks nearly Gaussian, and the one descriptor reflecting LK-99’s high Pb content is clearly visible.
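The 68/27/5 reference split comes straight from the normal distribution. This sketch computes it with the error function, bins any list of z-scores into the same bands, and converts the beyond-2σ fraction into an expected count for 239 descriptors.

```python
import math


def gaussian_band_fractions():
    """Fractions of a normal distribution with |z| < 1, 1 <= |z| < 2, |z| >= 2."""
    within1 = math.erf(1 / math.sqrt(2))  # P(|z| < 1)
    within2 = math.erf(2 / math.sqrt(2))  # P(|z| < 2)
    return within1, within2 - within1, 1 - within2


def observed_band_fractions(zscores):
    """Bin a list of z-scores into the same three bands."""
    n = len(zscores)
    low = sum(1 for z in zscores if abs(z) < 1) / n
    mid = sum(1 for z in zscores if 1 <= abs(z) < 2) / n
    high = sum(1 for z in zscores if abs(z) >= 2) / n
    return low, mid, high


expected = gaussian_band_fractions()
print([round(f, 2) for f in expected])  # [0.68, 0.27, 0.05]

# Expected count beyond 2 sigma among 239 descriptors, versus 3 observed.
print(round(239 * expected[2], 1))  # 10.9
```

By this yardstick LK-99 has fewer extreme compositional descriptors than a random draw would produce. The bands also assume each background distribution is roughly normal, which many descriptors (element fractions, for example) are not.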
It appears LK-99 is not particularly unique from a compositional perspective. But how about from a structural perspective? To answer this, we run a similar analysis using a structural descriptor set.
To compare the structure of LK-99 with those of other known superconductors and all known materials, we extracted 333 structural descriptors using the MatMiner package, also built by Ward et al. These descriptors are less interpretable than the compositional descriptors because they are generally derived from complicated functions of the three-dimensional arrangement of the atoms. A simple, interpretable structural descriptor might be the packing fraction or the average bond length; these 333 MatMiner features, by contrast, draw on the predicted x-ray diffraction (XRD) spectrum and the radial distribution function around each atom.
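To give a feel for what a radial distribution function descriptor measures, here is a deliberately simplified sketch that histograms interatomic distances into 0.1 Å bins, the same bin width as the features discussed next. It is not matminer’s featurizer: it treats the atoms as an isolated cluster (no periodic images) and skips the shell-volume and density normalization a real RDF applies. The three-atom cluster is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def distance_histogram(coords, bin_size=0.1, cutoff=3.0):
    """Count pairwise interatomic distances in bins of width bin_size (angstroms).

    Simplified stand-in for a radial distribution function: no periodic
    images and no shell-volume or density normalization.
    """
    counts = Counter()
    for a, b in combinations(coords, 2):
        d = math.dist(a, b)
        if d < cutoff:
            # Label each bin by its lower edge; rounding removes float noise.
            bin_start = round(math.floor(d / bin_size) * bin_size, 3)
            counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical three-atom cluster; coordinates in angstroms.
atoms = [(0.0, 0.0, 0.0), (1.52, 0.0, 0.0), (0.0, 2.57, 0.0)]
print(distance_histogram(atoms))  # {1.5: 1, 2.5: 1, 2.9: 1}
```

Each key is the lower edge of a bin, so a large value at 1.5 corresponds to the “1.5-1.6 angstroms” feature named in the next paragraph.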
The structural descriptor analysis returns an interesting result: of the 333 descriptors, 91.9% fall between 0-1σ, 7.5% fall between 1-2σ, and only two descriptors (0.6%) fall beyond 2σ. This makes LK-99 look even more “average” from a structural perspective than it did from a compositional one! The two descriptors for which LK-99 is an outlier are the radial distribution function from 1.5-1.6 angstroms (2.9σ) and the radial distribution function from 2.5-2.6 angstroms (2.6σ). For the former, LK-99 has a value of 245.4, while the average across all materials is 23.5; for the latter, LK-99 has a value of 181.3, while the average across all materials is 28.5. There are 2,174 known materials with larger values of the first feature, the largest being AlPO4 with a value of 1698.9. There are 2,308 materials with larger values of the second feature; the largest is Zr64S127 with a value of 2280.2. The distribution of descriptor values relative to the background distribution of all materials is shown below.
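The figures above are enough to back out the standard deviation each z-score implies, σ = (value − mean) / z, and to see where the reported maxima sit on the same scale. This is simple arithmetic on the reported numbers, not a recomputation from the underlying data.

```python
def implied_sigma(value, mean, z):
    """Standard deviation implied by a reported value, background mean, and z-score."""
    return (value - mean) / z


# Figures reported above for LK-99's two structural outliers.
sigma_15 = implied_sigma(245.4, 23.5, 2.9)  # RDF bin 1.5-1.6 angstroms
sigma_25 = implied_sigma(181.3, 28.5, 2.6)  # RDF bin 2.5-2.6 angstroms
print(round(sigma_15, 1), round(sigma_25, 1))  # 76.5 58.8

# How many sigma the largest reported values sit above the background mean.
print(round((1698.9 - 23.5) / sigma_15, 1))  # 21.9 (AlPO4)
print(round((2280.2 - 28.5) / sigma_25, 1))  # 38.3 (Zr64S127)
```

Because the reported z-scores are rounded, these σ values are approximate. Maxima some 22σ and 38σ above the mean, together with thousands of materials exceeding LK-99’s values, point to heavily right-skewed distributions, where a σ-based threshold is only a rough guide to how unusual a value is.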
Does this mean structural descriptors can’t find anything interesting about LK-99? Not necessarily. In fact, when we use UMAP to reduce the structural descriptor space to two dimensions, the picture differs from the compositional case, as plotted below.
Two points to note:
(1) Superconductors occupy a distinct subset of the structural space occupied by all regular materials; in other words, there are certain regions where regular materials exist but superconductors do not. There appears to be some differentiation between the two in structural space.
(2) LK-99 exists in a location of structural space that is populated by known materials but not well-populated by any known superconductors. This would suggest that LK-99 is more like a regular material, and less like known superconductors.
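One way to put a number on “populated by known materials but not by superconductors” is to look at the labels of LK-99’s nearest neighbors. This metric is our illustration, not part of the analysis above, and the 2D coordinates are hypothetical rather than real UMAP output. Since UMAP preserves local neighborhoods better than global distances, the same count is arguably more trustworthy in the original descriptor space; the 2D version simply keeps the example small.

```python
import math
from collections import Counter


def neighbor_label_fractions(point, embedded, labels, k):
    """Share of each label among the k embedded points nearest to `point`."""
    nearest = sorted(range(len(embedded)), key=lambda i: math.dist(point, embedded[i]))[:k]
    counts = Counter(labels[i] for i in nearest)
    return {label: c / k for label, c in counts.items()}


# Hypothetical 2D embedding coordinates (not real UMAP output).
embedded = [(0.0, 0.0), (0.35, 0.1), (0.25, 0.5), (0.9, 0.8),
            (4.8, 5.1), (5.2, 4.9), (5.0, 5.0)]
labels = ["regular", "regular", "regular", "superconductor",
          "superconductor", "superconductor", "regular"]
lk99 = (0.1, 0.2)

print(neighbor_label_fractions(lk99, embedded, labels, k=4))
# {'regular': 0.75, 'superconductor': 0.25}
```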
This suggests that structural space may yet hold some clues to what makes LK-99 unique, and that a deeper dive is warranted. If the claims of superconductivity do hold, its distance from the known superconductors may suggest a new mechanism is at play. Or, if the claims are untrue, this projection might suggest we could have “known it all along” because it does not sit among known superconductors. But we won’t make any claims either way — we need to know more about the structural differentiators of LK-99 first.
It seems that structural descriptors have identified something unique about LK-99, but with this cursory analysis it’s hard to say exactly what is unique.
Predicting superconductivity from atomistic structure is a historically difficult problem; in fact, it has vexed the physics community for decades. The structure-property map is so complex because superconductivity is an electronic structure phenomenon, and electronic structure is not trivially correlated with atomistic structure. The hope for machine learning is that we can go directly from atomistic structure to properties without needing to solve the Schrödinger equation for the electronic structure.
To find out whether this can be done, descriptors will need to be fed into a machine learning model trained to predict the critical superconducting temperature (Tc) of a material. Whether an accurate model can be trained to predict superconductivity in the first place is an interesting question, and what that model would predict about LK-99 will be even more interesting.
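As a minimal sketch of that next step, the snippet below uses a k-nearest-neighbor regressor as a stand-in “shallow” model for Tc and estimates its accuracy with leave-one-out cross-validation. The choice of kNN, the leave-one-out protocol, and the data are illustrative assumptions, not the model the authors plan to train; a real study would train on a dataset such as SuperCon and evaluate on properly held-out materials.

```python
import math


def knn_predict_tc(query, train_x, train_tc, k=3):
    """Predict Tc as the mean Tc of the k training materials nearest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(query, train_x[i]))
    return sum(train_tc[i] for i in order[:k]) / k


def leave_one_out_mae(train_x, train_tc, k=3):
    """Mean absolute error when each material is predicted from all the others."""
    errors = []
    for i in range(len(train_x)):
        xs = train_x[:i] + train_x[i + 1:]
        ys = train_tc[:i] + train_tc[i + 1:]
        errors.append(abs(knn_predict_tc(train_x[i], xs, ys, k) - train_tc[i]))
    return sum(errors) / len(errors)


# Hypothetical standardized descriptor vectors and Tc values (kelvin).
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
train_tc = [2.0, 3.0, 4.0, 90.0, 92.0, 94.0]

print(leave_one_out_mae(train_x, train_tc, k=2))  # 1.5
print(knn_predict_tc((1.05, 1.05), train_x, train_tc, k=3))  # 92.0
```

The cross-validated error answers the first question (can a model predict Tc at all?) before the prediction for a new material, such as LK-99, is taken seriously.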
For now, all we can say from this analysis is that LK-99’s composition does not seem extraordinary when compared to the distribution of all known materials, but we see a hint that its crystal structure may be distinct enough to offer clues of its potential significance.
The ability of machine learning, even relatively simple and “shallow” (i.e. non-deep learning) models, to drive fundamental inferences about materials is exciting, and we expect it to stay at the forefront as new research on superconductors unfolds.