Could I publish my DNA?
That was the question I asked on an Automattic internal P2 discussion board before I left the company.
Not as a metaphor. Not as an artistic rendering. Just my raw genetic code, stored as text, displayed on a WordPress page. Could I publish my DNA? Would it be technically possible?
The human genome clocks in around 3.2 billion base pairs. That’s 3.2 billion letters — A, T, C, and G — if you’re writing it out. As plain text, that would be roughly 3.2GB. A hefty blog post, but technically feasible.
But could it be compressed? At Automattic, the discussion quickly veered into encoding schemes—could we store DNA more efficiently? Since there are only four bases, it was suggested that two bits per base could work.
That would cut storage requirements down, we thought. Still large, but more manageable.
The two-bit scheme went something like this…
So there are 3.2 billion base pairs. Though there are 4 types of nucleobases (A, T, C, and G), the only four possible combinations are AT, TA, CG, and GC.
That means you need two bits to describe a single base pair, and 3.2 billion of these two-bit numbers. So in theory, you’d need (3.2B * 2)/8 bytes, which is 800 million bytes: roughly 800 MB, or about 0.745 GiB.
But because you want to store it as text, things will be a little different.
Each text character takes one byte to store, and that’s the smallest possible unit. The most straightforward way to do this would be to use one character for each of your DNA base pairs. That will mean 3.2 billion bytes: 3.2 GB, or roughly 2.98 GiB.
However, you can get clever by assigning a one-byte character to a sequence of 4 base pairs.
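The arithmetic above can be sanity-checked with a minimal Python sketch. The names `BASE_PAIRS` and `storage_bytes` are illustrative, and the 3.2 billion figure is the approximation used in this post, not a measured genome length. It also shows why the same byte count reads differently in decimal units (GB) and binary units (GiB).

```python
BASE_PAIRS = 3_200_000_000  # approximate figure used in this post


def storage_bytes(base_pairs: int, bits_per_pair: int) -> int:
    """Bytes needed when every base pair takes `bits_per_pair` bits."""
    return base_pairs * bits_per_pair // 8


one_char_per_pair = storage_bytes(BASE_PAIRS, 8)  # one text character each
two_bits_per_pair = storage_bytes(BASE_PAIRS, 2)  # packed two-bit scheme

for label, size in [("1 char per pair", one_char_per_pair),
                    ("2 bits per pair", two_bits_per_pair)]:
    # Show the same size in decimal gigabytes and binary gibibytes.
    print(f"{label}: {size:,} bytes = {size / 1e9:.2f} GB = {size / 2**30:.3f} GiB")
```

The packed size comes out at 800 million bytes, which is 0.8 GB in decimal units and about 0.745 GiB in binary units.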
For example, if you declare that…
AT = 00
TA = 01
GC = 10
CG = 11
You can create a table like this:
| Character | Base pair sequence |
| --- | --- |
| a | 00000000 |
| b | 00000001 |
| c | 00000010 |
| d | 00000011 |
…and so on.
In that case, the table has 2^8 possible rows, so you end up with 256 unique variations, which is conveniently the number of possible characters in extended ASCII. I am not exactly versed in how characters are stored in WP databases, but you could have a single character per line of that table. And that table would be used to decode the written text in extended ASCII to read the DNA.
All that to say, you would need just 3.2B divided by 4 bytes (four base pairs per character) to represent your DNA sequence using the extended ASCII set. Which takes us back down to about 800 MB (0.745 GiB).
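Here is a minimal sketch of the packing idea in Python, assuming the two-bit codes declared above (AT = 00, TA = 01, GC = 10, CG = 11) and assuming the first base pair of each group lands in the highest bits of the byte. The function names `pack` and `unpack` are made up for illustration. It works on raw byte values rather than displayed characters, since how each value maps to a character in a WordPress database is left open here.

```python
PAIR_TO_BITS = {"AT": 0b00, "TA": 0b01, "GC": 0b10, "CG": 0b11}
BITS_TO_PAIR = {bits: pair for pair, bits in PAIR_TO_BITS.items()}


def pack(pairs: list[str]) -> bytes:
    """Pack base pairs into bytes, four pairs per byte, first pair in the high bits."""
    if len(pairs) % 4 != 0:
        raise ValueError("this sketch expects a multiple of 4 base pairs")
    out = bytearray()
    for i in range(0, len(pairs), 4):
        value = 0
        for pair in pairs[i:i + 4]:
            # Shift the bits collected so far and append the next two-bit code.
            value = (value << 2) | PAIR_TO_BITS[pair]
        out.append(value)
    return bytes(out)


def unpack(data: bytes) -> list[str]:
    """Reverse of pack: read four two-bit codes out of every byte."""
    pairs = []
    for value in data:
        for shift in (6, 4, 2, 0):
            pairs.append(BITS_TO_PAIR[(value >> shift) & 0b11])
    return pairs


sample = ["AT", "AT", "AT", "GC", "CG", "TA", "GC", "AT"]
packed = pack(sample)
print(len(sample), "base pairs ->", len(packed), "bytes")
print(unpack(packed) == sample)
```

Eight base pairs fit into two bytes, which is the same 4-to-1 ratio that takes 3.2 billion base pairs down to 800 million bytes.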
A Small Example Explored
When I came back to this concept and post later, I wanted a hard-and-fast example written out to make sure I fully understood it.
For example, the text cab written on a WordPress website would mean 00000010 00000000 00000001, or AT AT AT GC AT AT AT AT AT AT AT TA if we are mapping to that ASCII table. Every letter written on the site becomes data.
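The cab example can be traced with a short Python sketch. It uses only the four table rows spelled out above (a through d) and the two-bit codes declared earlier. The names `CHAR_TO_BYTE` and `decode_text` are hypothetical, and the rest of the 256-row table is deliberately not filled in.

```python
# Only the rows given in the table; the remaining rows are not specified in the post.
CHAR_TO_BYTE = {"a": 0b00000000, "b": 0b00000001, "c": 0b00000010, "d": 0b00000011}
BITS_TO_PAIR = {"00": "AT", "01": "TA", "10": "GC", "11": "CG"}


def decode_text(text: str) -> list[str]:
    """Turn each character into 8 bits, then read the bits two at a time as base pairs."""
    pairs = []
    for ch in text:
        bits = format(CHAR_TO_BYTE[ch], "08b")  # e.g. "c" -> "00000010"
        pairs.extend(BITS_TO_PAIR[bits[i:i + 2]] for i in range(0, 8, 2))
    return pairs


print(" ".join(decode_text("cab")))
# AT AT AT GC AT AT AT AT AT AT AT TA
```

Three characters decode to twelve base pairs, so each letter carries four base pairs.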
Exponential Exploration of Nucleobase Combinations
I needed to explore the concept of nucleobases a bit more to see whether I was understanding this correctly, so I popped over to ask ChatGPT-4o a question.
How many possible combinations of nucleobases are there?
Anna
The number of possible combinations of nucleobases depends on how long the sequence is.
Since DNA has 4 nucleobases (A, T, C, G), here’s how combinations work:
- For a 1-base sequence:
  → 4 combinations (A, T, C, G)
- For a 2-base sequence:
  → 4 × 4 = 16 combinations (e.g., AT, CG, GC, etc.)
- For a 3-base sequence:
  → 4 × 4 × 4 = 64 combinations
  (which is interesting because codons, three-base sequences, are what the body uses to code for amino acids)
- For a sequence of N bases, the number of possible combinations is:
  → 4^N

So for the entire human genome (~3.2 billion base pairs), the number of possible unique DNA sequences is:
4^3,200,000,000
ChatGPT-4o
→ A truly astronomical number. Like, more-than-stars-in-the-universe level astronomical.
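To get a feel for how large 4^3,200,000,000 is, a minimal Python sketch can count its decimal digits without computing the number itself. It relies on the standard identity that a positive integer x has floor(log10(x)) + 1 digits. The function name `digits_in_sequence_count` is made up for illustration, and the result depends on floating-point precision, so treat it as an estimate.

```python
import math


def digits_in_sequence_count(n_bases: int) -> int:
    """Number of decimal digits in 4**n_bases, computed via logarithms."""
    return math.floor(n_bases * math.log10(4)) + 1


print(digits_in_sequence_count(3))              # 4**3 = 64, which has 2 digits
print(digits_in_sequence_count(3_200_000_000))  # digits in 4**3,200,000,000
```

For the genome-sized case the count comes out at roughly 1.9 billion digits. Just writing the number down would take about as much text as the 3.2 billion letters of the genome.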
Epigenetic Data
Back on the Automattic P2 post, someone asked: What about publishing epigenetic data? DNA is just the blueprint, but epigenetics—how genes are expressed—adds another layer. That dataset would likely be in the terabyte range.
And then I wondered: What if an AI could query it? If I published my DNA, could someone—someday—run a natural language search on it? Could we build a search engine for ourselves?
How much of who we are could be reduced to data? And what happens if/when we start treating ourselves as queryable databases?
This is when the project stopped being so hypothetical and veered into what would have to be considered if the publication actually happened.
More and More of Ourselves
What would it mean for us if we start digitizing our DNA? More and more of ourselves.
What does it take to capture a picture of a human? Is it the DNA, the epigenetic data, the lived experience? Or a combination of all that? And something more?
I didn’t publish my DNA then. I haven’t published it now. But the questions lingered. If I could encode it, make it queryable, make it public — would I? Should I?
And what are the risks?
Your genome can reveal predispositions to diseases like cancer, heart disease, neurological conditions, and more. Once it’s public, you can’t take it back — even if you delete the site, it could be archived or copied.
Your DNA overlaps with your family. By publishing yours, you may inadvertently expose traits or risks in them — without their consent. It could be used for genealogical tracing, identification, or even forensic investigations without your knowledge.
In the U.S., the Genetic Information Nondiscrimination Act (GINA) prevents health insurers and employers from using genetic data against you.
But GINA does not cover:
- Life insurance
- Disability insurance
- Long-term care insurance
Insurers could potentially refuse coverage or increase premiums based on publicly available data.
AI models may someday analyze public genomes in ways we can’t imagine yet. Your data could be included in datasets without your knowledge, used in predictive models, cultivated, spliced, or utilized in ways we haven’t even considered or may not even comprehend yet.
You have the right to publish your DNA — but does anyone else have a say if they share part of it? How do we or should we try to balance individual autonomy with family and community privacy?
WordPress is public and indexable. Once it’s out there, your genetic blueprint becomes part of the searchable internet. Should we treat DNA data as sacred, like medical records? Or open-source, like code?
If making your own data public can be used against you, then maybe the problem isn’t transparency — it’s the system. A healthcare model that punishes disclosure, punishes risk, and prioritizes profits over people is already broken. This may be just one way to see the cracks more clearly.
And what if it’s already happening?
Consumer genetic testing. Wearables. Health apps. Biometric scans. Most of us are probably already inside datasets we’ve never seen. Aggregated, anonymized, then reassembled — de-anonymized again. Sold. Modeled. Searched. Trained on.
So maybe publishing your genome isn’t radical. Maybe it’s just being honest.
If the data is out there, at least I want it to have my voice.
Ethics of Open-Source Humanity
Publishing your genome could be an act of radical transparency, a statement about digital identity, or a form of biological autobiography.
But if this catches on, what norms or tools do we need to protect people and support ethical publishing? And should there be an opt-out feature, even after opting in?
Ripple Effects
Publishing your DNA could be like releasing the source code to your body and your lineage. Some might consider it bold, curious, and potentially revolutionary — but the long-term impacts go beyond the individual.
But don’t all of our actions, to some extent?
