Your Digital Legacy: How Long Will It Last?
“I mean, they say you die twice. One time when you stop breathing and a second time, a bit later on, when somebody says your name for the last time.” ― Banksy
“I wonder where the oldest gravestone is?” Did you ever play this game as a child visiting the cemetery? Walking around, trying to find the first person buried there and remarking in awe at how many decades (or centuries) they lived before you?
I’ve searched many times and sometimes came upon a dead end—a gravestone so weathered that even moving my hands across its face, I could not tell if the year ended in a 6 or an 8 or a 9. While there was still a stone that stood as the last possible representation of that person’s life, the most fundamental piece of data we use as an identifier, their name, had been forever erased from this earth. Sarah Smith or Jack Jones or Frank Fenneheimer from 1805-1888: their story now represented by a stone atop a pile of dirt atop a box containing an unidentifiable skeleton.
“Everything on the Internet is Permanent.”
Of course, we think the pendulum has swung the other way in the digital age. It's trivial to copy the 1’s and 0’s of our digital present (e.g. photos, videos, emails, texts) and transfer them to a million different machines. This can be good and bad. It’s great that we can augment our memories instead of relying solely on the limits of our biological brains. The tricky part is that people can dredge up things we’d rather forget about: embarrassing moments, things we are not proud of, etc.
These negative aspects have been played out in the media, particularly among celebrities. These then serve as examples to point to when people warn others that “once you put something out there it’s out there forever.” One commonly cited resource is The Wayback Machine by the Internet Archive. This organization effectively stores snapshots over time of publicly available websites. It’s not a perfect solution by any means. However, it does a reasonable job of capturing most text and image-based media and making it (reasonably) searchable.
However, this persistence and archiving of ALL data ever posted to the web is not absolute. In fact, I would argue that it is becoming the rare exception rather than the default behavior. Let’s start systematically challenging this belief.
Data Production is Exploding
This infographic highlights an absurd amount of data being generated, transferred, and consumed every minute in 2018. And it doesn’t even address the fact that every minute, over 500 hours of video are uploaded to YouTube. A non-profit organization is unlikely to expand its storage capacity by 15GB per minute to accommodate the compressed version of this information. Want to take on uncompressed HD? Now we’re looking at 1.5TB to 15TB per minute.
Now take into consideration that it’s 2019. The rate of data production will continue to double every 12-18 months. Eventually, this will outstrip the doubling of storage capacity (at least based on present scaling). This means that at some point we will need to both ruthlessly filter what is important enough to store and ruthlessly purge from what we have already stored. Sure, advances in technology will occur and unlock a significant amount of additional capacity. However, our capability to produce data (particularly in the age of 4K video, virtual reality, blockchain technologies, etc.) may continue to outpace these advances indefinitely.
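To get a feel for what "doubling every 12-18 months" means, here is a minimal sketch of the compounding arithmetic. The function name and the 10-year horizon are my own choices for illustration; only the 12 and 18 month doubling periods come from the text above.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Return how many times larger a quantity becomes after `years`
    if it doubles once every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


# Illustrative horizon (an assumption, not a figure from the article).
HORIZON_YEARS = 10

for months in (12, 18):
    factor = growth_factor(HORIZON_YEARS, months)
    print(f"doubling every {months} months: about {factor:,.0f}x after {HORIZON_YEARS} years")
```

Over a decade, that is roughly a thousandfold increase at the fast end of the range and roughly a hundredfold at the slow end. Even the slower rate demands a lot from whoever is paying for the disks.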
When my daughter was born, we immediately went into CAPTURE ALL THE MOMENTS mode and were taking a ridiculous number of photos per day. And because digital photos are so cheap, it’s not uncommon to take 15 different photos at slightly different angles while trying to find the perfect one. Later on, it might be necessary to go back and purge to free up storage.
People get bored or tired and move on. They close down Facebook or Flickr accounts. They shut down websites that were experiments or that served a different phase of their life that they no longer connect with. They give away and donate old hardware and potentially erase hard drives before they go out the door. Not all data gets kept.
People Lose Data
I’ve been an advocate for having multiple backups for the last 15 years. Unfortunately, in my haste to upgrade a laptop, I was haphazard in my process, and I lost 100,000 emails. This was a slightly traumatic experience because these messages contained conversations and memories that I can no longer access because I never committed them to long-term memory. I thought I would always be smart enough to keep a copy.
However, mistakes happen. People fat-finger a command or forget to replace a credit card for a backup service. Shit happens, and hard drives get lost or broken or stolen. Passwords get compromised, and hackers can intentionally break in and wipe machines clean.
Setting aside the strict, scientific definition of the 2nd Law of Thermodynamics, the basic rule of thumb here is that it takes ongoing energy and effort to maintain order. Consider that 2 decades ago, we managed photos in photo albums. Now we might have photos taken from multiple devices (phone 1, phone 2, tablet 1, etc.) and stored in multiple locations (Photos, Facebook, Instagram, Twitter, Flickr, etc.). Each of these storage services may be free or paid while being subject to changes in terms of service, company ownership, etc. Fast forward 2 decades and many of those companies will have closed shop, changed business models, or revamped their services.
Tumblr and Flickr recently caused a bit of a stir when each announced major changes to their services. Tumblr stopped updating the iOS app, and the services announced they were either closing completely (Tumblr) or now charging a fee for what used to be unlimited free hosting (Flickr). This caused a mass exodus of content, much of it lost along the way out. However, can you blame either of these services? To keep them running, developers and IT operations have to continually maintain and upgrade servers, operating systems, application dependencies, security patches, etc. Technology keeps evolving, and there is energy required to keep these services up to date and online.
LinkedIn has effectively become a repository of living resumes. However, much of that information can be gated behind a login screen that prevents archival bots from accessing it all. Of course, there are easy ways around that, but that’s only for free services. Some sites require a paid membership. Some even go further requiring approval before accessing. And of course, there are company intranets and services that should forever remain behind a login screen to protect a company’s IP. The point is that not all data ever crosses the point where it’s accessible to the general public.
Another term for this “gated content” is the deep web, where not all information is indexed and therefore findable through a search term. You either need direct access and/or authentication to get to it.
The Dark Web
Unlike the public web (which can be accessed by most standard browsers), other sites require special tools and authorization. Again, this means that data is not only more challenging to obtain, but unless it’s copied to the open web, it’s unlikely to be archived.
The Right to Be Forgotten
Since 2006, discussions around "The Right to Be Forgotten" have made their way into legislation like GDPR, where individuals can request identifying information to be removed from the system (under penalty of stiff fines if not complied with). In a sense, this is just the beginning of laws and legislation to ban or purge certain types of information from the web.
The NSA (aka “No Such Agency”) may find sensitive information exposed in systems like the Internet Archive. Are they just going to wash their hands of it and say, “welp, it’s already out there”? Absolutely NOT! These agencies can effectively use the force of law to purge information deemed sensitive.
Even Archives Need Archives
Assume everything said thus far was a non-issue. Let’s imagine The Wayback Machine had a near infinite bucket of storage. Let’s assume it could find and copy every single bit of information that was ever connected to the Internet. Let’s even assume it had double or triple redundancy on every bit of information. What happens if a 0-day virus attacks its computers? What happens if there is a security breach by hackers? What if there is an inside job? What if a programming error corrupts data? What if future features (AI) make a unilateral decision to remove data based on reasons no human could conceive of? While the likelihood may be 0.01% per year, it is never zero, and over enough time those small odds compound.
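The way a small yearly risk adds up can be sketched in a few lines. This is a back-of-the-envelope illustration, not a real risk model: it assumes each year is independent and carries the same 0.01% chance of a catastrophic loss, and the function name and time horizons are mine.

```python
def chance_of_loss(annual_probability: float, years: int) -> float:
    """Probability of at least one loss event within `years`,
    assuming each year is independent with the same annual risk."""
    chance_of_surviving = (1 - annual_probability) ** years
    return 1 - chance_of_surviving


ANNUAL_RISK = 0.0001  # the 0.01% per year figure used above

# Horizons chosen only for illustration.
for years in (60, 1_000, 10_000):
    print(f"{years:>6} years: {chance_of_loss(ANNUAL_RISK, years):.1%}")
```

Under those assumptions the risk stays under 1% over a 60-year span, but it climbs to roughly 1 in 10 over a millennium and passes even odds somewhere before the 10,000-year mark.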
Nothing Lasts Forever
The beautiful arches in Moab are just breathtaking. Millions of years of plate tectonics followed by the upheaval of earth’s crust followed by weathering from wind and water. These arches will likely last long after humans no longer exist, but they too will eventually fall apart given enough time and wear.
Our legacy is much more ephemeral. Our family trees may exist in print stored in a library basement. (Coincidentally, this is where a flood damaged the oldest existing records of my mother’s grandmothers.) Or our family trees may exist on hard drives scattered around the world (Ancestry.com). Still, many records are even more fragile: photos uploaded to Instagram, a post on Reddit, or a video on Facebook. Some of these may last a year or a decade, but many will not persist as long as we live. I aspire to live to be 100, which means I hope to keep my digital identity for the next 60 years.
That’s a long time! Most people probably don’t concern themselves with these things. What matters is the moment in an active conversation on Twitter. Still, there will be a time (there always is) when you go looking for something that was said a while back. It will be difficult, if not impossible, to find it. What then?
While it may be an impossible task to make my content live for 60 years, I still run a personal blog because I know that I’m more in control of my destiny if I own the platform that hosts my data. However, this is not a perfect strategy. My Drupal 7 site will eventually become unsupported by the community. My hosting provider will eventually not allow me to host it because of a lack of security upgrades. What then? I want my next solution to be one that can persist in the long haul without requiring high monthly fees or a lot of effort on my part to manually maintain it.
Hence my desire to experiment with HAXCMS. It might be the best balance between the authoring experience, accessibility, and long term archiving of my digital legacy. Until then, I leave you with one final challenge.
How long will your digital legacy last?
About Rick Manelius
Quick Stats: CTO of Contact Mapping. Author of Winning the Lottery Within. Graduated from MIT in '03 (BS) and '09 (PhD). Life hacker and peak performance enthusiast. This blog is my experiment in creative writing, self-expression, and sharing what I've learned along my journey.