WHOLE
EARTH
CODEC

Act 1:
Inverting the
Observatory

Today’s massive Large Language Models (LLMs) could not exist without the internet’s exponential proliferation of data over the past decade. GPT-3 is a foundational language model trained on 45TB of text, some 85% of it from the internet, including the Common Crawl, a dataset of scraped, publicly accessible websites. Yet despite its eerie capabilities, GPT-3’s training data ultimately consists of a tiny subset of language itself.

The much broader set of potential data remains inaccessible due to privatization, incompatible formats, or lack of digitization and aggregation. While current foundation models are language-based, nothing limits these models to dealing with text alone. The true potential of these models rests in their ability to synthesize multi-modal streams of information into a single knowledge architecture. This proposal radically expands the scope of foundation models, moving beyond anthropocentric, language data towards the wealth of ecological information immanent to the planet.

Many actors are already attempting to gather such information, albeit in piecemeal ways. The Global Nucleic Acid Observatory and the Australian Acoustic Observatory seek to capture the metagenomic signatures of watersheds and the bioacoustic soundscapes of ecosystems, respectively. Both approaches invert the traditional model of the observatory, looking inwards towards the Earth rather than outwards into the cosmos. Still, they remain limited in scope and do not integrate the data they collect into a broader composition.

Another structure provides a template for how a planetary-scale observatory might be conceived. The Event Horizon Telescope is a global network of synchronized radio telescopes whose coordinated observations produced the first image of a black hole. Such a model recasts the entire Earth as a distributed observatory, with each individual telescope contributing to a composite image of the cosmos. Imagine another mesh observatory, but with an inverted gaze: one that takes the Earth, rather than the cosmos, as the object to be observed.

What if the cognitive infrastructure of such observations took the form of a foundation model: one which represented not only a small subset of human language, but the wealth of information in the biosphere as well?

Act 2:
The Whole
Earth Codec

The imaging process of the Event Horizon Telescope is no different from how we see: photoreceptors absorb electromagnetic radiation and trigger electrical signals along neurons, beyond which higher levels of processing assemble the complete image. The transformation from one signal to another is called transduction, which describes not only the processes by which we see, hear, and smell, but also what machine sensors do.

We are each plugged into a limited bandwidth of a planetary ground truth composed of various forms of radiation, vibration, and energy, yet we integrate these stimuli into a cohesive umwelt. What if this sensory integration took place beyond the scale of the individual organism, at the scale of the entire planet? Enter the Whole Earth Codec, an autoregressive foundation model trained across multiple modalities, which enables interoperability between disparate forms of data and allows an expansive planetary intelligence to emerge.

Data

The Codec ingests broad-spectrum data across modalities. The distributed network of its mesh observatory consists of different sensors receiving different types of stimuli: image, audio, chemical, lidar, pressure, moisture, magnetic fields, etc. The forms of data produced are just as broad as the forms of sensing. The data stream from an individual sensor consists of measurements taken at a modality-relevant sampling rate, e.g. twenty times a second for earthquake seismometers, once every thirty minutes for common AQI sensors. The rate can also differ between sensors within the same modality, such as a microphone attuned to birdsong at 44 kHz versus one for turtles at 24 kHz. Regardless of modality, a UTC timestamp and GPS coordinates are attached to each sample. This anchoring allows the model to make associations based on temporal and spatial correlation across disparate modalities.
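As a toy sketch of this anchoring (all names, bin sizes, and sensor values here are hypothetical), one can imagine every sample carrying its modality, UTC timestamp, and GPS fix, so that samples from disparate sensors can be grouped by spatiotemporal coincidence:

```python
from dataclasses import dataclass
from collections import defaultdict

# Hypothetical per-sample record: every measurement, regardless of
# modality, is anchored with a UTC timestamp and GPS coordinates.
@dataclass(frozen=True)
class Sample:
    modality: str      # e.g. "audio", "aqi", "seismic"
    value: float       # the raw measurement
    utc_s: float       # UTC timestamp in seconds
    lat: float         # GPS latitude in degrees
    lon: float         # GPS longitude in degrees

def spatiotemporal_buckets(samples, time_bin_s=1800.0, grid_deg=0.5):
    """Group samples from disparate modalities that coincide in time and
    space; this co-occurrence is what lets cross-modal associations form."""
    buckets = defaultdict(list)
    for s in samples:
        key = (int(s.utc_s // time_bin_s),
               round(s.lat / grid_deg), round(s.lon / grid_deg))
        buckets[key].append(s)
    return buckets

stream = [
    Sample("aqi", 42.0, 0.0, 47.9, 106.9),
    Sample("audio", 0.13, 120.0, 47.9, 106.9),   # same place, same window
    Sample("seismic", -0.02, 90000.0, 35.7, 139.7),
]
grouped = spatiotemporal_buckets(stream)
# The AQI and audio samples land in one bucket; the seismic sample elsewhere.
```

The bin sizes are placeholders; a real deployment would choose them per modality.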

Pre-training

Foundation models are pre-trained on a massive corpus of unsupervised data, and the Whole Earth Codec is no different. For data streams of sequential samples, the tokenization method is straightforward: the data from these sensors are batched into an aggregated document, and each sample from the stream of measurements becomes a single token. Other forms of non-sequential data require different tokenization techniques, such as images broken into pixels and sequenced in raster order.
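A minimal sketch of these two tokenization paths, under assumptions the text leaves open: here sequential measurements become one token per sample via uniform quantization into a finite vocabulary (one plausible choice), while images are flattened pixel by pixel in raster order. All function names and parameters are hypothetical.

```python
def tokenize_stream(measurements, lo, hi, vocab_size=256):
    """Map each continuous measurement to a discrete token id by
    uniform quantization over the range [lo, hi)."""
    span = hi - lo
    return [min(vocab_size - 1, max(0, int((m - lo) / span * vocab_size)))
            for m in measurements]

def raster_order(image):
    """Flatten a 2-D image (a list of rows) into a 1-D token sequence,
    left to right, top to bottom."""
    return [px for row in image for px in row]

# A seismometer stream tokenized sample by sample, and a tiny image
# sequenced in raster order.
seismo = tokenize_stream([-0.5, 0.0, 0.49], lo=-1.0, hi=1.0)
pixels = raster_order([[1, 2], [3, 4]])
```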

To handle multimodal input, separate encoders are trained for each type of data. These encoders transform disparate forms of input into dense, high-dimensional embeddings within a single, massive cross-modal latent space. The model is trained to project temporally and spatially correlated forms of data into nearby embeddings within the space. Decoders for the different modalities are then trained by translating the latent space embeddings into sequence predictions. Due to the massive scale of information, the model makes only a single pass over the available data. As new data is gathered and aggregated, the model can simply continue training and updating its weights.
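One common way to realize "correlated data maps to nearby embeddings" is an InfoNCE-style contrastive loss; the sketch below assumes that objective, which the text itself does not specify. Encoder architectures are elided: small vectors stand in for encoder outputs, and correlated pairs score a lower loss than shuffled ones.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of matching each anchor embedding (one modality)
    to its spatiotemporally paired positive (another modality), against
    all other positives in the batch."""
    a = [norm(v) for v in anchors]
    p = [norm(v) for v in positives]
    loss = 0.0
    for i in range(len(a)):
        logits = [dot(a[i], p[j]) / temperature for j in range(len(p))]
        z = sum(math.exp(l) for l in logits)
        loss += -math.log(math.exp(logits[i]) / z)
    return loss / len(a)

# Correlated pairs (nearly parallel vectors) yield a much lower loss
# than the same embeddings with their pairings shuffled.
aligned = info_nce([[1, 0], [0, 1]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce([[1, 0], [0, 1]], [[0.1, 0.9], [0.9, 0.1]])
```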

Privacy

The distributed sensing network from which the Whole Earth Codec observes the planet is privy to vast amounts of data, yet sensitive information is protected through structured transparency. Input privacy refers to the ability to process information that is hidden from you, and to allow others to process your information without revealing it to them; output privacy refers to the ability to read the output of an information flow without being able to reverse-engineer further information about the input. Through federated learning, data from the mesh observatory is processed on local servers within a secure enclave, communicating weight updates rather than raw data to the coordinating server containing the main foundation model. This maintains the input privacy of all data ingested by the model. For particularly sensitive information, adding noise to each datapoint preserves output privacy without impacting overall learned predictions; this technique is known as differential privacy. Together, these privacy-enhancing technologies entail a trustless paradigm for training the Whole Earth Codec. Traditional relations of opacity/transparency and antagonism/mutualism are complicated by mutually assured observation.
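A toy sketch of this data flow, with every name and constant hypothetical: each local enclave computes an update on data that never leaves it, optionally perturbs the update with Gaussian noise before sharing (a simplified stand-in for differential privacy), and a coordinating server merely averages the updates. Real deployments add secure aggregation and calibrated noise; this only illustrates what is and is not communicated.

```python
import random

def local_update(weights, local_gradient, lr=0.1):
    """One enclave's update, computed on data that never leaves it."""
    return [w - lr * g for w, g in zip(weights, local_gradient)]

def add_dp_noise(weights, sigma, rng):
    """Perturb an update before sharing it, protecting output privacy."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def federated_average(updates):
    """The coordinating server sees only updates, never raw data."""
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

rng = random.Random(0)
global_w = [0.0, 0.0]
updates = []
# Three enclaves, each with its own locally computed gradient.
for grad in ([1.0, -1.0], [0.5, 0.5], [-0.5, 1.5]):
    u = local_update(global_w, grad)
    updates.append(add_dp_noise(u, sigma=0.01, rng=rng))
global_w = federated_average(updates)
# global_w is close to the noiseless average; individual data stays local.
```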

Governance

Sovereignty may be derived from this technology, but its implementation spreads from the bottom up; the Whole Earth Codec cannot be owned by any one entity. The mesh observatory is a conglomerate of public and private sensors, networked together by organizations ranging from government institutions to research universities to individual landowners. Much like the Internet Engineering Task Force, the standards and protocols of the Whole Earth Codec are maintained by a supra-national body and proposed, developed, and reviewed in an open process. This body maintains interoperable protocols for training and deploying the foundation model, governing processes that include data transmission, federation, and weight aggregation.

Capabilities

By integrating myriad channels of sensing data into a shared latent space, the Whole Earth Codec can make emergent associations across a multiplicity of temporal and spatial scales. Unmoored from the umwelt of any single organism, its sensorium is privy to an amalgamated landscape of previously indiscernible relations. Through the same abrupt capability scaling observed in LLMs, task performance improves sharply as the size of the training corpus expands; this motivates the Codec as a planetary project rather than a fragmented one.

Leveraging the pre-trained baseline, fine-tuning uses a smaller, labeled dataset to update model weights, often for specific capabilities or to address domain shift. The Codec forms the substrate for a rich ecosystem of third-party, fine-tuned models with improved performance on downstream tasks. Within the ecosystem, there are fine-tuned models developed by an economy of research universities and private startups, available open-source or through pay-to-play APIs. Openly available models proliferate in everyday use among climate-minded hobbyists, but industries such as insurance will pay a premium for high-performance proprietary software.
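The fine-tuning step described above can be caricatured in a few lines, with all particulars invented for illustration: a frozen feature map stands in for the pre-trained Codec encoder, and only a small task head is updated on a modest labeled dataset.

```python
def frozen_encoder(x):
    """Stand-in for pre-trained Codec features; never updated."""
    return [x, x * x]

def predict(head, x):
    f = frozen_encoder(x)
    return sum(w * fi for w, fi in zip(head, f))

def fine_tune(head, labeled, lr=0.02, epochs=500):
    """Gradient descent on squared error, updating only the task head."""
    for _ in range(epochs):
        for x, y in labeled:
            err = predict(head, x) - y
            f = frozen_encoder(x)
            head = [w - lr * err * fi for w, fi in zip(head, f)]
    return head

# A small labeled dataset for a downstream task where y = 2x + x^2;
# the fitted head recovers roughly [2, 1] without touching the encoder.
data = [(0.5, 1.25), (1.0, 3.0), (-1.0, -1.0), (2.0, 8.0)]
head = fine_tune([0.0, 0.0], data)
```

The point of the sketch is the division of labor: the expensive pre-trained substrate is shared, while third parties train only small, task-specific components.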

Act 3:
New Planet
Sensorium

One of the key features of foundation models is emergence; self-supervised training and the extremely large scale of the data make it hard to predict exactly what downstream capabilities may appear. This emergent nature challenges the traditional observatory model, in which scientists conduct observations in order to test specific hypotheses. What is an observatory when the object of observation is unknown? Through the assemblage of the Whole Earth Codec, secondary and tertiary modes of observation might emerge: a metagenomic sequence might reveal land use patterns, for example. Observations are assembled rather than conducted. The following speculative case studies explore the potential impacts of the Codec on geopolitics and economies, and how it may trouble assumptions of sovereignty, surveillance, and global coordination.

Cloud
Seeding

Due to the burning of unrefined coal for heating, Mongolia has high air pollution. To remain under internationally mandated pollution levels, the environmental ministry experiments with cloud seeding to induce rain and clear out pollution. The strategy is effective, but citizens in northern China catch wind of the program and accuse Mongolia of “stealing rainfall.”

Chinese researchers develop a fine-tuned model from the Codec to detect cloud seeding, trained on satellite imagery and air chemical analysis. The model is open-sourced and picked up by a team of university researchers in Somalia. While playing around with the model, they discover that cloud seeding is also occurring in drought-stricken farmlands by agricultural companies looking to increase crop yield. Sensors proliferate in the region and further fine-tuned models are developed to detect weather modification.

Forest
Blight

Canada’s monocultural timber forests are plagued by pine beetles and their symbiotic fungi. In order to mitigate risk, insurance companies implement a proprietary model for predicting imminent beetle diffusion. This information is kept from landowners, who see large hikes in their insurance premiums in areas where the beetle is predicted to spread. Insurance companies are reluctant to prevent further infestation as they profit from premium increases.

Landowners learn that acoustic monitoring data, combined with site-specific chemical measurements, can predict beetle spread. Pooling their data, they fine-tune the Codec to produce a better model than the insurance companies’. High-resolution predictions enable targeted eradication of the beetle infestation, improving forest health. To encourage participation in the network, access to the model is granted based on contribution; one must install sensors to increase the fidelity of the network.

Gene
Drive

CRISPR-Cas9 has enabled a genetic engineering technique known as a gene drive: a particular suite of genes can be propagated through a population, permanently changing the genome of an entire species. There is an international moratorium on gene drives, but they are notoriously difficult to regulate. The European Union uses the Codec to implement a large-scale nucleic acid observatory: a mesh network of water sensors that conduct metagenomic sequencing of the surrounding watershed. Federated learning prevents sensitive genomic information from being compromised, as edge devices share only learned weight updates with the fine-tuned model.

In the United Kingdom, there is popular resistance to what is perceived as a mass surveillance system. Politicians reject the fine-tuned model in favor of a homegrown system based on national data. In the EU, the program proves effective in preventing new gene drives; any modifications to the metagenomic baseline are immediately detected. Meanwhile, the UK model lacks the scale necessary for its performance to meet the same benchmark.

An ecoterrorist group exploits this vulnerability by engineering a gene drive that modifies beef cattle in order to render their meat tough and inedible, reducing meat consumption and related emissions. Farmers are bailed out and the UK concedes to participation in the Whole Earth Codec’s nucleic acid observatory to prevent future incidents.

Conclusion

As James C. Scott explains in Seeing Like a State, what can be seen by a state, platform, or protocol determines what can be managed, pillaged, or profited from. While the Whole Earth Codec has been framed as a new model of the observatory, it would be remiss to think of its observation as passive. All three case studies exhibit a recursive dynamic: the act of observation re-discloses the biosphere as an informatic landscape, a technogeography wherein the biosphere and technosphere co-constitute one another.

The Whole Earth Codec complicates this dynamic by introducing an appendage for processing, integration, and newfound synthesis. Capable of universal transfer among data streams, the codec trades cybernetic loops for n-dimensional knots. The future trajectory of foundation models remains opaque, but it is of paramount importance that they be as representative as possible of the entire spectrum of information the planet produces about itself.

Guided by planetary-scale sensing rather than myopic anthropocentrism, the Whole Earth Codec opens up a future of ambivalent possibility within high-resolution prediction and newly interoperable intelligence.