Could I publish my DNA?

That was the question I asked on an Automattic internal P2 discussion board before I left the company.

Not as a metaphor. Not as an artistic rendering. Just my raw genetic code, stored as text, displayed on a WordPress page. Could I publish my DNA? Would it be technically possible?

The human genome clocks in at around 3.2 billion base pairs. That’s 3.2 billion letters (A, T, C, and G) if you’re writing it out. As plain text, at one byte per letter, that would be roughly 3.2 GB. A hefty blog post, but technically feasible.

But could it be compressed? At Automattic, the discussion quickly veered into encoding schemes—could we store DNA more efficiently? Since there are only four bases, it was suggested that two bits per base could work.

That would cut storage requirements down, we thought. Still large, but more manageable.

The two-bit scheme went something like this…

So there are 3.2 billion base pairs. Though there are 4 types of nucleobases (A, T, C, and G), A only pairs with T and C only pairs with G, so the only four possible combinations are AT, TA, CG, and GC.

That means you need two bits to describe a single base pair, and 3.2 billion of these two-bit numbers. So in theory, you’d need (3.2B * 2)/8 bytes, which is 800 million bytes: 800 MB, or roughly 0.745 GiB.

But because you want to store it as text, things will be a little different.

Each text character takes one byte to store, and that’s the smallest possible unit. The most straightforward way to do this would be to use one character for each of your DNA base pairs. That will mean 3.2 billion bytes: 3.2 GB, or roughly 2.98 GiB.
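To double-check the arithmetic, here is a minimal Python sketch that compares the two storage options. The function names are made up for illustration, and 3.2 billion is the approximate genome length used in this post. It prints each size in both decimal units (MB, GB) and binary units (MiB, GiB), since the two are easy to mix up: 800 million bytes is 800 MB, about 763 MiB, or about 0.745 GiB.

```python
BASE_PAIRS = 3_200_000_000  # approximate genome length used in this post


def storage_bytes(base_pairs: int, bits_per_pair: int) -> int:
    """Bytes needed to store base_pairs at bits_per_pair bits each, rounded up."""
    return (base_pairs * bits_per_pair + 7) // 8


def describe(n_bytes: int) -> str:
    """Format a byte count in decimal (MB/GB) and binary (MiB/GiB) units."""
    return (
        f"{n_bytes / 1e6:,.0f} MB = {n_bytes / 1e9:.2f} GB = "
        f"{n_bytes / 2**20:,.0f} MiB = {n_bytes / 2**30:.3f} GiB"
    )


one_char_per_pair = storage_bytes(BASE_PAIRS, 8)  # one text character per base pair
two_bits_per_pair = storage_bytes(BASE_PAIRS, 2)  # the two-bit scheme

print("1 character per base pair:", describe(one_char_per_pair))
print("2 bits per base pair:     ", describe(two_bits_per_pair))
```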

However, you can get clever by assigning a one-byte character to a sequence of 4 base pairs.

For example, if you declare that…

AT = 00
TA = 01
GC = 10
CG = 11

You can create a table like this:

Character    Base pair sequence
a            00000000
b            00000001
c            00000010
d            00000011

…and so on.
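The packing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-bit codes are the ones declared above, the function names pack and unpack are hypothetical, and the sketch works on raw byte values rather than on any particular character set.

```python
# Two-bit codes as declared in the post; all names here are illustrative.
PAIR_TO_BITS = {"AT": 0b00, "TA": 0b01, "GC": 0b10, "CG": 0b11}
BITS_TO_PAIR = {bits: pair for pair, bits in PAIR_TO_BITS.items()}


def pack(pairs: list[str]) -> bytes:
    """Pack base pairs into bytes, four pairs per byte, first pair in the high bits."""
    if len(pairs) % 4 != 0:
        raise ValueError("this sketch expects a multiple of 4 base pairs")
    out = bytearray()
    for i in range(0, len(pairs), 4):
        value = 0
        for pair in pairs[i:i + 4]:
            value = (value << 2) | PAIR_TO_BITS[pair]
        out.append(value)
    return bytes(out)


def unpack(data: bytes) -> list[str]:
    """Reverse of pack: read each byte two bits at a time, high bits first."""
    pairs = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            pairs.append(BITS_TO_PAIR[(byte >> shift) & 0b11])
    return pairs


pairs = ["AT", "AT", "AT", "GC", "CG", "TA", "GC", "AT"]
packed = pack(pairs)
print([f"{b:08b}" for b in packed])  # ['00000010', '11011000']
print(unpack(packed) == pairs)       # True
```

Eight base pairs fit in two bytes here, which is the four-to-one saving the table is meant to deliver.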

In that case, because each character stands for 8 bits, the table has 2^8 = 256 rows, which is conveniently the number of possible characters in extended ASCII. I am not exactly versed in how characters are stored in WP databases, but you could have a single character per row of that table. That table would then be used to decode the text written in extended ASCII back into the DNA sequence.

All that to say, you would need just 3.2B divided by 4 bytes to represent your DNA sequence using the extended ASCII set, since each character covers four base pairs. That takes us back down to 800 MB (about 763 MiB).


A Small Example Explored

When I came back to this concept later for this post, I wanted a hard-and-fast example written out to make sure I fully understood it.

For example, the word cab written on a WordPress website would mean 00000010 00000000 00000001, or AT AT AT GC AT AT AT AT AT AT AT TA if we are mapping to that ASCII table. Every letter written on the site becomes data.
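The same example can be checked with a short Python sketch. One assumption is baked in: the table is taken to continue alphabetically (a = 0, b = 1, c = 2, and so on), so this version only handles the lowercase letters a to z, not the full 256-character table, which the post does not spell out. The function names are hypothetical.

```python
# Two-bit codes from the post, keyed by their bit strings.
BITS_TO_PAIR = {"00": "AT", "01": "TA", "10": "GC", "11": "CG"}


def char_to_value(ch: str) -> int:
    """Row number in the table, assuming it continues alphabetically from 'a' = 0."""
    value = ord(ch) - ord("a")
    if not 0 <= value < 26:
        raise ValueError(f"this sketch only covers 'a' to 'z', got {ch!r}")
    return value


def decode(text: str) -> list[str]:
    """Turn each character into 8 bits, then read the bits two at a time as base pairs."""
    pairs = []
    for ch in text:
        bits = f"{char_to_value(ch):08b}"
        pairs.extend(BITS_TO_PAIR[bits[i:i + 2]] for i in range(0, 8, 2))
    return pairs


print(" ".join(decode("cab")))
# AT AT AT GC AT AT AT AT AT AT AT TA
```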


Exponential Exploration of Nucleobase Combinations

I also needed to explore the concept of nucleobases a bit more to check that I was understanding this correctly, so I popped over to ask ChatGPT-4o a question.

How many possible combinations of nucleobases are there?

Anna

The number of possible combinations of nucleobases depends on how long the sequence is.

Since DNA has 4 nucleobases (A, T, C, G), here’s how combinations work:

  • For a 1-base sequence:
    → 4 combinations (A, T, C, G)
  • For a 2-base sequence:
    → 4 × 4 = 16 combinations (e.g., AT, CG, GC, etc.)
  • For a 3-base sequence:
    → 4 × 4 × 4 = 64 combinations
    (which is interesting because codons—three-base sequences—are what the body uses to code for amino acids)
  • For a sequence of N bases, the number of possible combinations is:
    4^N

So for the entire human genome (~3.2 billion base pairs), the number of possible unique DNA sequences is:

4^3,200,000,000
→ A truly astronomical number. Like, more-than-stars-in-the-universe level astronomical.

ChatGPT-4o
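To get a feel for the 4^N growth in that answer, here is a small Python sketch. The function names are made up for illustration. Rather than computing 4^3,200,000,000 directly, it uses a logarithm to count how many decimal digits that number would have, which works out to roughly 1.9 billion digits.

```python
import math


def combinations(n_bases: int) -> int:
    """Number of distinct sequences of n_bases nucleobases (4 choices per position)."""
    return 4 ** n_bases


def decimal_digits(n_bases: int) -> int:
    """Count the decimal digits of 4**n_bases without building the huge number."""
    return math.floor(n_bases * math.log10(4)) + 1


for n in (1, 2, 3):
    print(n, combinations(n))  # 4, 16, 64

# The genome-sized count is too large to print, so report its length instead.
print(f"{decimal_digits(3_200_000_000):,} digits")
```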

Epigenetic Data

Back on the Automattic P2 post, someone asked: What about publishing epigenetic data? DNA is just the blueprint, but epigenetics—how genes are expressed—adds another layer. That dataset would likely be in the terabyte range.

And then I wondered: What if an AI could query it? If I published my DNA, could someone—someday—run a natural language search on it? Could we build a search engine for ourselves?

How much of who we are could be reduced to data? And what happens if/when we start treating ourselves as queryable databases?

This is when the project stopped being so hypothetical and veered into what would have to be considered if the publication actually happened.


More and More of Ourselves

What would it mean for us if we start digitizing our DNA? More and more of ourselves.

What does it take to capture a picture of a human? Is it the DNA, the epigenetic data, the lived experience? Or a combination of all that, and something more?

I didn’t publish my DNA then. I haven’t published it now. But the questions lingered. If I could encode it, make it queryable, make it public — would I? Should I?


And what are the risks?

Your genome can reveal predispositions to diseases like cancer, heart disease, neurological conditions, and more. Once it’s public, you can’t take it back — even if you delete the site, it could be archived or copied.

Your DNA overlaps with your family. By publishing yours, you may inadvertently expose traits or risks in them — without their consent. It could be used for genealogical tracing, identification, or even forensic investigations without your knowledge.

In the U.S., the Genetic Information Nondiscrimination Act (GINA) prevents health insurers and employers from using genetic data against you.

But GINA does not cover:

  • Life insurance
  • Disability insurance
  • Long-term care insurance

Insurers could potentially refuse coverage or increase premiums based on publicly available data.

AI models may someday analyze public genomes in ways we can’t imagine yet. Your data could be included in datasets without your knowledge, used in predictive models, cultivated, spliced, or utilized in ways we haven’t even considered or may not even comprehend yet.

You have the right to publish your DNA, but does anyone else have a say if they share part of it? How do we, or should we, try to balance individual autonomy with family and community privacy?

WordPress is public and indexable. Once it’s out there, your genetic blueprint becomes part of the searchable internet. Should we treat DNA data as sacred, like medical records? Or open-source, like code?

If making your own data public can be used against you, then maybe the problem isn’t transparency — it’s the system. A healthcare model that punishes disclosure, punishes risk, and prioritizes profits over people is already broken. This may be just one way to see the cracks more clearly.


And what if it’s already happening?

Consumer genetic testing. Wearables. Health apps. Biometric scans. Most of us are probably already inside datasets we’ve never seen. Aggregated, anonymized, then reassembled — de-anonymized again. Sold. Modeled. Searched. Trained on.

So maybe publishing your genome isn’t radical. Maybe it’s just being honest.

If the data is out there, at least I want it to have my voice.


Ethics of Open-Source Humanity

Publishing your genome could be an act of radical transparency, a statement about digital identity, or a form of biological autobiography.

But if this catches on, what norms or tools do we need to protect people and support ethical publishing? And should there be an opt-out feature, even after opting in?


Ripple Effects

Publishing your DNA could be like releasing the source code to your body and your lineage. Some might consider it bold, curious, and potentially revolutionary — but the long-term impacts go beyond the individual.

But don’t all of our actions, to some extent?


2 responses to “Biological Autobiography: the Right to Publish (in 3.2 Billion Characters)”

  1. washington sean

    Good post. I think you are really poking around some interesting questions especially as they pertain to digital twinning, web3.0 and tokenomics towards an emergent super-organism. How our individual DNA as one of the ‘nodes’ in this super-organism might be available for both ‘read’ and ‘write’ capabilities is an important consideration in the changing nature of computing, as we are moving from silicon based to more biological solutions.

    According to Decibel.Bio (a new biotech firm) “A plants DNA is like firmware: the basic plans for building plants…” and “Epigenetics is the control software for when and how those plans are used.”

    https://www.decibel.bio/technology

    But what is largely missing from the more materialist perspective on what comprises a material human (or a plant) is what David Chalmers famously termed the “hard problem of consciousness.” What is consciousness and how is it related to the material manifestation of any given number of base pairs of DNA?

    Here are some abstracts and a brief article on DNA as storage:

    https://www.nature.com/articles/s41570-024-00576-4

    https://link.springer.com/article/10.1007/s13534-024-00386-z

    https://www.tomshardware.com/pc-components/storage/ai-breakthrough-claimed-to-make-dna-data-retrieval-3-200x-faster-with-better-accuracy-but-still-slower-than-standard-storage

    Also, I am prone to believe that AI models are already analyzing public genomes in ways we can’t imagine. See the WasteWaterSCAN program as developed by Michael Bloomberg, Brin Family Foundation, Stanford and Emory. It is a giant shit computer that can find even more than the proverbial needle in the haystack… It can find the world’s tiniest violin playing a solo concert on the tip of that needle.

    1. Anna AM

      Are we sovereign organisms or are we just nodes in a data commons?
