No Way Out

The Wayback Mission: Mark Graham on Preserving the Past with the Internet Archive

Mark McGrath and Brian "Ponch" Rivera Season 3 Episode 108

Unlock the secrets of the past and secure the future with No Way Out! Hosts Moose and Ponch dive into the digital frontier with Mark Graham, a veteran of the U.S. Air Force and a key figure at the Internet Archive, a nonprofit founded by Brewster Kahle in 1996 to preserve the web and humanity’s published works. Often called a modern-day Library of Alexandria, the Internet Archive has grown from its early days of recording the web to archiving treasures like Grateful Dead bootlegs and Pentagon news clippings. This episode explores the challenges of saving history in the digital age—from emulating lost tech like Flash to harnessing AI’s potential to tackle humanity’s biggest issues. It’s a must-listen for curious minds eager to understand how we archive the web and why it matters. Tune in to discover how the past shapes tomorrow and why preserving it is key!

Mark Graham on LinkedIn

Internet Archive 


AGLX Confidence in Complexity short commercial 

Stay in the Loop. Don't have time to listen to the podcast? Want to make some snowmobiles? Subscribe to our weekly newsletter to receive deeper insights on current and past episodes.

Find us on X. @NoWayOutcast

Substack: The Whirl of ReOrientation

Want to develop your organization’s capacity for free and independent action (Organic Success)? Learn more and follow us at:
https://www.aglx.com/
https://www.youtube.com/@AGLXConsulting
https://www.linkedin.com/company/aglx-consulting-llc/
https://www.linkedin.com/in/briandrivera
https://www.linkedin.com/in/markjmcgrath1
https://www.linkedin.com/in/stevemccrone


Recent podcasts where you’ll also find Mark and Ponch:

The No Bell Podcast Episode 24
...

Mark Graham:

The man who came up with... Frank Burns. Colonel Frank Burns was a mentor of mine, and Frank was part of that team. He was the guy who came up with the phrase "Be all you can be."

Brian "Ponch" Rivera:

Yeah, which the Army, you know, that was the greatest. That was the most well-known ad slogan at the time, and they discontinued it for "An Army of One," which was unbelievable.

Mark Graham:

That's really something. An army of one which was unbelievable.

Brian "Ponch" Rivera:

That's really something. Yeah, there was a General Stubblebine that was an intelligence officer, that was you can start recording this.

Mark Graham:

You are recording it actually, right? Oh, yeah, we are. Okay.

Brian "Ponch" Rivera:

Yeah, we are because it got too interesting, it did.

Mark Graham:

Very quickly. Well, I don't know Stubblebine, but the person I'm most familiar with is Jim Channon, and Jim Channon was the creator of the First Earth Battalion, and I have some of their paraphernalia. Like I say a yes, okay, so people do that. I'm pretty sure we have First Earth Battalion material there. If we don't, I'll definitely have to remedy that. Maybe someone could check that. But there was the main text that described the theories.

Brian "Ponch" Rivera:

Yeah, that's the kind of stuff that Ponch and I certainly talk about. I mean, we talk about a lot of things that would be not mainstream, I guess, when it comes to military strategic thought, maybe like the incorporation of, say, psychedelics to treat PTSD, or you know he didn't know himself was such a controversial figure because he was not an establishment-type person. Right, and in particular, this organization.

Mark Graham:

It was actually called Delta Force, I think it was called, not to be confused with the Delta Force, the fast action alert team, and this was, I think, in the 70s, and they had a mandate to explore the human potential movement, yes, which was blossoming in California. You know, hotbeds of activity included Esalen, for example, and people like Barbara Marx Hubbard, and certainly psychedelics was a big part of that, but also neuro-linguistic programming, NLP. And, as Frank has explained to me, at one point he had given NLP courses in the Pentagon for some of the upper brass, and a congressperson caught wind of this and required him to then give complementary courses to congresspeople. The idea being, you know, this is kind of getting into this whole question of asymmetric warfare that we don't want to have. You know, we want to have both sides, if you will, of the equation have access to the same technology, the same tools.

Brian "Ponch" Rivera:

Interesting. Yeah, because it would have been contemporary to Boyd in his work. With Boyd's work, specifically, that same discussion occurred. You know, should we classify everything? So, like everything that we're versed in, that we read and we've read in the archives of the Marine Corps University in Quantico? You know, Boyd was going to classify everything because, to your point, you know, you don't want the enemy to get this type of thinking in their hands.

Mark Graham:

You know, yeah, yeah, interesting. So on the topic of go ahead, oh no, it's okay, we're just starting to get to know each other here a little bit. You know, yeah, and I don't know, disclose, I'm a veteran as well.

Brian "Ponch" Rivera:

Oh, what branch.

Mark Graham:

US Air Force. Right, Air Force. Yeah, Air Force, just a short tour, like John Boyd, I guess, except I wasn't a colonel; I was a senior airman when I left. I did a short four-year tour from '79 to '83. Okay, all right. Well, I started out training to be a linguist and switched over to computers, so I spent a little more than three years in the Pentagon.

Brian "Ponch" Rivera:

We're really glad to meet you through our mutual connection, Mike. And the Library of Alexandria is a topic that comes up a lot, with the burning down and the archives that weren't preserved, and that's something that you're hard at work at. Very much so. Very much

Mark Graham:

so. You know, I was actually at the Computer History Museum out here in Silicon Valley on Sunday and the founder was there. He'd been there for like 30 years and he says to me, he goes, "You know, Mark, I hope you're taking this seriously. I mean, this is really important stuff you guys are doing." And that did kind of, you know, catch me up a little short, like, take stock of it. He said, "You know, this is our modern day... You are the Internet Archive," you know, not me personally, the whole organization of 120-some people. "You guys are the modern-day Library of Alexandria."

Mark Graham:

And so it is important that you take care of it. Yeah, I'm sorry, yeah.

Mark McGrath:

No. So on that, Mark Graham from the Internet Archive, I want to go back in time: Thomas Jefferson talking about a virtual network that transmits ideas across time and space. We brought up the Library of Alexandria. We also know that Thomas Jefferson had a lot to do with how it influences information flow and how we are excited about information flow. In fact, Thomas Jefferson dreamt about a library that would provide unfettered access to the sum of human knowledge, right? So this continuation of that through your work, which I believe is what you're doing, is: how do you ensure that human knowledge survives through time?

Mark McGrath:

So, starting there kind of walk us through how the archive came about. Just a light one.

Mark Graham:

Yeah, we're just going to start out of the shallow and kind of get deeper. No, well, first of all, we work to ensure, or to increase the probability of... You know, we tend to avoid absolutes. So I'll just say that from the start, despite the fact that we have this mission statement, which is all about absolutes and superlatives, which is universal access to all knowledge. So the origin of this venture goes back to a guy named Brewster Kahle, who started two companies on the same day: one, Internet Archive, as a nonprofit, and the other one, Alexa Internet, as a for-profit. And he ended up selling Alexa Internet to Amazon, Jeff Bezos. So then he worked for Jeff for a while, and Alexa Internet was one of the very first finding tools, if you will, for the emerging internet.

Mark Graham:

This is, we're talking, the very, very beginning days of the web and actually kind of the tail end of Gopher.

Mark Graham:

Before that, Brewster had founded a company called WAIS, Wide Area Information Servers, that he sold to AOL back before AOL had ventured into the web as we know it today. So anyway, he ended up selling Alexa Internet to Jeff Bezos for a fair amount of money and was then able to dedicate all of his attention to Internet Archive and, from the very early stage in the beginning, had this mission, this audacious idea of universal access to all knowledge. And, in particular, what he did was, in effect, he hit record on the new medium of the web. So we've been working every day since then, for the last 28-plus years, getting better at that, getting better at recording more material that's made available via the web, and in that time, have expanded and diversified into a range of other media that human beings publish in. Our mandate is to work toward acquiring and, if necessary, digitizing, preserving, organizing and making available the published works of humankind, and so that's what we get up every day and work toward. I could go on.

Brian "Ponch" Rivera:

No, I mean, I use it a lot right. I use the archive a lot. I think that, well, maybe we should start too. We could talk about some of the detractors that are trying to stop what you're trying to do, because there is such an imperative to not only preserve that knowledge but also to make it available so more people can gain not just knowledge but also understanding.

Mark Graham:

Yeah, you, yeah, I guess we could go there if you wanted to. But actually, why don't we set the groundwork a little bit around the media? Because I mentioned that we certainly work to archive much of the public web, more than 1 billion URLs a day. So I think this year we'll probably exceed more than 1 trillion web pages archived, actually, and a web page can be made up of lots of individual URLs. So we're coming up on a bit of a landmark later in 2025. And I can dig more into the web archiving work that we do. But in addition to that, we archive books and academic papers and material on microfiche and microfilm and vinyl 78s, CDs, television, in particular television news, radio, radio news programs, and other obscure media as it comes along.

Brian "Ponch" Rivera:

I love... The CIA Reading Room is one of my favorite places. I also like that you have all the DTIC, so a lot of the white papers that we use are in there.

Mark Graham:

Yeah, Defense Technical Information Center, I think. DTIC service. Yeah, actually, you know, my favorite, okay, is FBIS. Do you know that one? No? Oh, my God, okay, this is going to blow your mind.

Mark Graham:

So FBIS is the Foreign Broadcast Information Service, and this was an effort that started well, in some ways, it started about 100 years ago and it took on different names and different forms over time. For many decades it was a collaboration between the US and Britain they had their own version of this and for many decades it was a division of the CIA, an unclassified activity of the CIA. Actually, a version of it existed before the CIA was formed too. What it did was it employed at one point, more than a thousand analysts around the world to pay attention, to collect, scan, synthesize publicly accessible news and then send it back to Washington, where it'd be put together and then turned into a variety of kinds of reports. So, foreign broadcast information service, it was the broadcast from around the world on radio, on television, on newspapers, et cetera, and so, therefore, policy people and others in Washington, they could have some sort of an idea of, in fact, what the world was thinking, what people were being told was a reality in different languages and in different geographies.

Mark Graham:

I did a lot of work. When I got out of the Air Force in '83, I shifted my attention to the nuclear war problem and was active in a variety of nuclear war-related activities. I created a computer network for activists called PeaceNet and I got involved in a lot of US-Soviet citizen diplomacy work, which included traveling to the Soviet Union and bringing American-published books there, for example, and connecting with other people that were working with, at that point, the emerging internet, and helping to establish and then operate an email service, the first commercial email service into and out of the Soviet Union. So in many ways I was very interested in what was happening in the Soviet Union in the 80s, and one of the primary ways that I could do that was by getting access to these FBIS reports that my government had the good sense to pull together. That gave us an insight into what was happening there.

Brian "Ponch" Rivera:

So it gave more perspectives to help with the Yep. Yeah, I have to tell you too, I am the one, the other, I guess my favorite collection is you know, I'm a oh my God yeah.

Mark Graham:

Grateful Dead.

Brian "Ponch" Rivera:

Collection on archive Rock and roll. Oh, it's unbelievable.

Mark Graham:

I think we have, I don't know I almost tracked 13,000, maybe something like more than 10,000 concerts there. Yeah.

Mark Graham:

The story behind that was, of course, as you know, they're from San Fran, right? Oh yeah. But the Grateful Dead community, who, some may argue, are the creators, the originators, progenitors of this idea of sharing, a sharing economy. And, in particular, there were the people, tapers, who would create tapes. I think at that point it was mostly cassette tapes, physical cassette tapes. Cassettes and DATs. And DATs, yeah. And then they would pass them around, they would share them, and apparently the band was fine with this as long as no one made any money on it. They were like, you know, yeah, like share and share alike, and so they did this for a long time. And then, at one point, you know, Brewster offered them unlimited storage, forever, for free, and one of the main people was like, "I don't believe you, I don't know." "No, I appreciate, try me. Just, you know, whatever." And they did and, like, as you're experiencing, it works. That's unreal.

Brian "Ponch" Rivera:

So today, now, decades later, yeah, it's probably still one of the things we're most well known for within certain... Well, I can, you know, I'm a, I'm high school, '94, right, but I can remember filling shoe boxes full of bootlegs and then, like, passing them around and trading them, and then you could only get the tapes and then you would, you know, record, like, on the dual cassette decks or whatever. Yeah, side by side.

Brian "Ponch" Rivera:

I had one of those. Yeah, and then when I look at that collection, I just say this is what I would have imagined would have been like Star Trek level in my own life when I came to the Grateful Dead libraries.

Mark Graham:

Absolutely, it's unbelievable passion. You know, yeah, that's great. So, yeah, we got a lot of that. We have a lot of, not just the Grateful Dead. We have a very large collection of live concerts, and these are concerts that have been uploaded by band members. What are some other things?

Brian "Ponch" Rivera:

Like nuggets that, like, you know, you talk about FBIS, we talk about the CIA Reading Room, DTIC, Grateful Dead. Like, what are some other real treasure troves? Because when you do go on the archive, if you don't have a map, I mean, it is a lot of stuff. So what are some other just amazing things you would guide people to and say, hey, go start here and then play around with this, you know?

Mark Graham:

I mean, obviously people are drawn for their own passions, but things like well, knitting you know, plans for knitting is quite extensive, or my mom's going to love that.

Brian "Ponch" Rivera:

My mom's in the knitting cult.

Mark Graham:

Yeah, okay, well, you know you might lose her because she just may fall into that and never, never come out.

Brian "Ponch" Rivera:

Is that a separate, standalone thing, like the dead or the CIA?

Mark Graham:

Well, I mean, it's their collection we organize. So first of all, at a very high level, there's kind of two sides. There's archive.org, which is a series of collections, and then there's the Wayback Machine, and the Wayback Machine is like a special viewer, if you will, for the material that we archive from the public web. But back onto the archive.org side, I'd say another would be old-time radio, and it's amazing. It's like a certain age group of people, and I'll meet someone and they'll go, "Oh my God, you guys do the old-time radio. That's fantastic. I listen to it every day."

Brian "Ponch" Rivera:

How are those things preserved? Like, so, in the old days, say like the 30s, they had a broadcast going out of Radio City, say, here in Manhattan. Like, would they archive on tape or on record? Or how would they save that?

Mark Graham:

I think mostly that was on tape, and then people digitized it and then uploaded it to us. We have someone on staff who's been focusing on ham radio, so shortwave radio, and it's very extensive, and we're talking like the newsletters and the books and the reports and all of the ephemera related to that field. And this person, that's all they do day in and day out: collect and digitize and curate and make available that particular material for the people that are interested in that genre.

Mark McGrath:

Hey, Mark, I'm wondering if you could share with us some of the challenges in preserving data, like migration or emulation, and the reason for this is, I'm curious, from a perspective of business owners and preserving the past for the future, because if we misremember the past, then we can't improve the future.

Mark Graham:

So what are the challenges? When people use our material, the mere act of them using the archive in all the different varieties of interfaces and emulations and media types, et cetera, that we work in, we learn about what's working, maybe what could be working better, maybe what isn't working so well, and that gives us an opportunity to continuously improve and to be able to stay current with what's available to people today. And so a very specific example of that would be a medium called Flash. Right, so you know, Flash was a medium that was very popular on the web as we know it about 25 years ago, and it was used in a variety of ways. It could create animated images, it could be used to stream a video, and many, many websites incorporated Flash. In fact, some whole web services were built entirely on Flash, and for a variety of reasons, much of this driven by the late Steve Jobs, Flash ended up being killed, being abandoned by Adobe and the creator, and then the browsers over time dropped support for Flash. So today, none of the main modern browsers will support the playback of a Flash object. So for some period of time, if there were Flash objects on the Wayback Machine that we had archived, they would play back in browsers. But now, if you're running Chrome or Firefox or Safari or something like that, you're not going to be able to play back a Flash object.

Mark Graham:

So what we did was we found software, open source software, an app called Ruffle, and you can think of this as like a middle layer or an emulation layer that sits between your browser and our backend archives, what was archived in the Wayback Machine. Then this Ruffle software will take the underlying code and it will, in real time, transform it and emulate a new kind of code in JavaScript that your browser will understand, and so you'll see what you would have seen before if your browser had been supporting Flash.

Mark Graham:

Now, this is an evolution; it's not a zero-and-one thing. Flash actually had many, many features, it was a very complex environment, and so we do not support all of the Flash features. We don't support ActionScript level three, for example, yet, but then again, two, three years ago we didn't support any of it, and now we support a lot of it. So there are millions and millions of images and animations and other interactions that we now support through the Wayback Machine.

Mark McGrath:

I wonder if you can help with an analogy here. So we have a 78, we have a record, we get this vinyl thing and we don't have a record player. Right, we don't have a way to play it, a means to play it. What Ruffle does is it allows you to play it, but it may not play all the features that are available on that vinyl, maybe. So this is important, I think. So what's happening is maybe the HTML or the scripts that are getting migrated over and over don't come with the way to decode them. That decoding secret is the Ruffle. It's another platform that kind of emulates how they were played in the past. And can you talk about the dangers of losing something like Ruffle? What would happen 15, 20 years down the road if we lost that capability?

Mark Graham:

Well, I mean, one might lose the ability to interpret that material. But a couple of things. First of all, Ruffle is open source, so it's not like some corporation is going to take away one's ability to run it anymore. So it's out there. Much of the software that we use at the Internet Archive is open source. We contribute back to the open source community by improving the code and in other ways. So there is that. But the other thing is, just because you can't read something doesn't mean that the preservation of it isn't useful, because maybe sometime in the future you will be able to read it, right? I mean, I'll give an example.

Mark Graham:

The island of Rapa Nui, aka Easter Island, had a script, and it's referred to as the Rongo-Rongo script, that was written onto these wooden boards. It's still debated to this day whether the script was created pre or post Europeans visiting the island. There's been some recent radiocarbon dating of these particular boards that suggests that some of them go back to the 1500s, so it was before the original 1722 arrival of the Europeans. In any case, to this day there's an ongoing debate in academia about how to interpret the Rongo-Rongo scripts, and there's only about, I think, 28 or so of these pieces of wood still in existence. So I'll just ask you: the fact that we can't yet definitively understand what these glyphs mean, right, does that take away the value of retaining these pieces of wood so that we can study them and maybe someday figure it out? Well, of course not. We may get the...

Mark McGrath:

We may interpret it incorrectly in the future. That's what I'm getting at. We don't know, right? So that context that comes with it is important, and preserving context is important, right?

Mark Graham:

Preserving context is vitally important, and so a couple of words really drive a lot of our work here. One would be context, as you noted, and another would be provenance. Maybe it's a particular special case of context that has to do with the origin of something. So, for example, when we archive a URL, we record material like the time and the date that object was gotten, where it was gotten from, the IP number maybe of the machine that actually accessed it, the particular kind of software that was used when we made that archive, et cetera, and then we present that to people. There's a link on the top right of every Wayback Machine playback page that says "About this capture," and if you click on that link you can go back in time and you can see every page element on that web page. Because, remember once again, web pages are made up of lots of individual little pieces of material: image files, JavaScript, CSS, HTML, maybe a video object, et cetera. And so, you know, as an archivist working for a library, we care a great deal about provenance, such that we can provide that particular context to readers now and into the future. Can you give us some context as to the rate that, I hate to say information, but part of it is a matter of definitions. We have a lot of physics, for example, high-energy physics materials, CERN and other entities, or, say, telescopes, climate change data from satellites, et cetera, that produce massive exabytes' worth of material. That's just, like, maybe, in some cases, raw data.
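The capture metadata described above (when an object was fetched, where it came from, which machine and software fetched it) can be pictured as a small record stored alongside each archived object. The following is only an illustrative sketch: the `CaptureRecord` class, its field names, and the sample values are hypothetical and are not the Internet Archive's actual schema or storage format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CaptureRecord:
    """Hypothetical provenance record for one archived web object."""

    url: str                # where the object was fetched from
    captured_at: datetime   # time and date of the fetch (UTC)
    source_ip: str          # IP address involved in the fetch
    crawler_software: str   # software used to make the capture
    content_type: str       # kind of page element (HTML, CSS, image, ...)


def to_json(record: CaptureRecord) -> str:
    """Serialize a record, writing the time as a 14-digit UTC timestamp."""
    data = asdict(record)
    data["captured_at"] = record.captured_at.strftime("%Y%m%d%H%M%S")
    return json.dumps(data, sort_keys=True)


# Sample values are made up for illustration only.
record = CaptureRecord(
    url="https://example.com/",
    captured_at=datetime(2025, 2, 20, 12, 30, 0, tzinfo=timezone.utc),
    source_ip="192.0.2.10",
    crawler_software="hypothetical-crawler/1.0",
    content_type="text/html",
)
print(to_json(record))
```

A web page made up of many elements would carry one such record per element, which is what lets a reader later inspect each piece of a capture separately.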

Mark Graham:

So, yeah, I don't know that I can really be an expert witness to that particular case, but I will say that, generally speaking, there are a couple of trends here. One is that before the digital world we had mostly paper and, yeah, there was some television, there was some radio, but paper had been for a long time the main way that people preserved knowledge and shared knowledge, and that was very useful for a variety of reasons. First of all, paper tends to last a long time, and also people tend to produce multiple copies of the things, multiple copies of the books, and so they physically spread them around. They exist in most libraries, despite the fact that the history of libraries is one of destruction. In fact, I'm going to refer your audience to a book called The Library: A Fragile History. It's a relatively recent book. It's a tour de force of the history of libraries, which is pretty much a story of libraries being destroyed by churches and by governments. I think now you might expand that to include corporations and lawyers. Actually, despite the fact that libraries tend to not necessarily live forever, in many, many cases, you'd have a lot of libraries and a lot of books. Generally speaking, the preservation of printed material is pretty good.

Mark Graham:

Now, when we moved over to the digital world and the advent of web servers and the like, things shifted a bit. Instead of having maybe 1,000 or 10,000 or more physical books, you'd have maybe like one web server living on one machine in one building run by one company. And over the last few decades, many, many things have happened: companies have gone out of business, buildings have burned down, machines have failed, and any one of a number of threats has caused the only copy of certain digital material to go away. So I'll reference, for example, a recent Pew Research study that looked at a collection of web pages from 2013. This study was done in 2024, and it found that for that particular collection of web pages from 2013, fully 38% of them were no longer available on the live web.

Mark Graham:

We're a little more than a month into this new presidential administration, and we've been involved in a project at the Internet Archive since 2008, called the End of Term Archive, where we work in collaboration with a number of other organizations.

Mark Graham:

Historically, we've done this with the Library of Congress, the National Archives and Records Administration, US Government Pub, as we can. So, once again, we've been doing this since 2008, every four years. We tend to do it now in three phases: before the election, after the election, and then after the inauguration. So we're now in phase three. We're now working on the after-inauguration archive, where we had identified tens of thousands of US government websites, and we go deep on them and we attempt to get as much of the material from those websites archived as we can. Well, we've seen whole websites taken offline. For example, just a few days ago, the... That particular part of the Department of Education website we have archived, and it is now available in the Wayback Machine, so you could type in a URL and you could see what was there just a week or two ago.
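For readers who want to try the "type in a URL and see what was there" step programmatically, here is a minimal sketch in Python. It only builds the addresses and makes no network request. It assumes the commonly used Wayback Machine playback form `https://web.archive.org/web/<14-digit timestamp>/<url>` and the public availability endpoint at `https://archive.org/wayback/available`; neither is spelled out in the conversation, and the function names and the `example.com` address are placeholders.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public addresses; not stated in the episode itself.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(when: datetime) -> str:
    """Format a datetime as the 14-digit YYYYMMDDhhmmss timestamp."""
    return when.strftime("%Y%m%d%H%M%S")


def playback_url(original_url: str, when: datetime) -> str:
    """Address that asks the Wayback Machine for the capture nearest `when`."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(when)}/{original_url}"


def availability_query(original_url: str, when: datetime) -> str:
    """Address that asks whether a capture exists near `when` (JSON reply)."""
    params = urlencode({"url": original_url, "timestamp": wayback_timestamp(when)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


# Placeholder page and date: "what was there a week or two ago".
two_weeks_ago = datetime(2025, 2, 15)
print(playback_url("example.com", two_weeks_ago))
print(availability_query("example.com", two_weeks_ago))
```

Opening the first printed address in a browser should show the capture closest to the requested date, if one exists; the second is meant for scripts that want to check for a capture before linking to it.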

Mark McGrath:

So, Mark, I'm curious who should be responsible for preserving websites. Is it the Internet Archive? Or, I mean, what happens if you're not around?

Mark Graham:

I'm from California. We try to avoid the word should. Okay, okay, that's what my therapist told me. No, there are no shoulds. I know that's a great question, so don't mean to make light of it. Look it really. Well, I find it curious. I'll just say I find it curious that this little, relatively small nonprofit out here in California does punch above our weight, if you will, and has historically and to this day taken on such a disproportionately large role in this seemingly really important effort to try to preserve the published works of humankind and make them available. But be that as it may, here we are, be here now, right, and so that's what we're doing Now.

Mark Graham:

There are many other efforts. So, for example, in America there's the National Archives, the National Archives and Records Administration, and the Library of Congress. They have efforts that they've been developing for a long time to archive much of this material. For a long time the Library of Congress actually contracted with us, so they outsourced much of their web archiving to the Internet Archive. We're a nonprofit and we survive through the kindness of others: donations by more than 150,000 individuals last year. But in addition we have a program-related business activity called Archive-It, and Archive-It provides our web archiving services to schools and colleges, universities, museums, libraries and governments. So many governments around the world contract with us to do web archiving on their behalf. I already mentioned, for example, the Library Innovation Lab at Harvard. They took on an effort to try to archive data sets from the US government and they archived something like 300,000 data sets. So it's a big community of people, individuals, volunteers, organizations like ours, in some cases governments, government organizations and others that take on this work.

Brian "Ponch" Rivera:

We're talking about paper to digital, and I imagine there's a lot of collections yet to be archived. Ponch and I are intimately familiar with one that we want to get archived and I want to help you with that, by the way.

Mark Graham:

So yes, we're going to follow up on this.

Brian "Ponch" Rivera:

We'll talk offline on that because it's critical. It's really critical to the work. But on that note, I mean of what's been digitized. Have we even put a dent in the stuff that's been collected over hundreds, maybe a thousand years?

Mark Graham:

We put a dent, but there's a lot of work to be done. By some accounts there's been more than 120 million books produced so far, by the way, and we've digitized maybe somewhere close to eight million of them. Google has digitized many more than that, maybe a number north of 40 million. I've heard as high as 40 million, but we're maybe number two, not really sure. A lot of work to be done there. There's a particular collection that I just started working on. You guys might be interested in this one.

Mark Graham:

The Pentagon ran a news clipping service for a long time called Current News, and Current News would come out twice a day: once at 7:30 in the morning, which was about 10 pages long, and once at 11:30 in the morning, called the Main Edition. It tended to be 20 pages or longer, and it was physically produced in the Pentagon by people who would come in at their early hours. They'd start arriving at about 2 am, and then trucks would start delivering newspapers from all over the country, and they would go through these newspapers and magazines and find articles related to military issues, literally cut them with a knife and then paste them up on eight-and-a-half-by-11-inch pieces of cardboard or something, and then photocopy them and then staple them together and distribute. I think they distributed 4,000 a day throughout the building.

Mark Graham:

I collected close to 1,000 of these that I have physically with me now, and I've just begun the effort to digitize this particular collection of material. Now, maybe some of the individual stories, although I'm not sure about some of them because they're from the early 80s, exist in archives from the newspapers they were originally printed in, right? Maybe. Many of them probably don't, because they never digitized those newspaper archives. However, what I guarantee you doesn't exist is the curation of this material in the form of these 10- or 20-page compilations that were published every day. And why is this important? Because this particular collection of curated material was what our nation's military leaders were in many cases reading on a daily basis, and so it was helping to inform their worldview, and that just fascinates me. Yeah.

Brian "Ponch" Rivera:

Yeah, that's really interesting. To your knowledge, what are the mother lodes out there that need to be digitized, that have not been yet, that are maybe in a critical state or in danger, that you know of?

Mark Graham:

Yeah, you know, like, for example, I know that all the presidents have their own libraries and things like that. Do those things get digitized automatically or not? No, not necessarily. I could probably attempt to give you a good answer there, but I would have to say that at times my eyes have glazed over when I've reviewed catalogs of materials that have yet to be digitized. And, you know, there was, for example, just one: we recently became aware of a collection of material at, I think it's at Wright-Patterson Air Force Base, a very large collection of Soviet-era military books and journals, and I doubt that that material has ever been digitized. The Soviets have not done it, and if they have, it's not available to the West, and so that is a particular collection of hundreds of thousands of individual items, for example.

Mark Graham:

That's just one that comes to mind.

Mark McGrath:

I do want to talk about John Boyd's archives here in a second, but before we do that, a question: how are you prepping everything for AI, artificial intelligence? What are you doing to prep the landscape for that?

Mark Graham:

Yeah, well, you know, I'd say we generally think that we should let the computers read. I'll just speak for myself here now: I think that AI is like a bicycle for the mind, and that we as humans, and, you know, the animals that we share this world with, frankly need all the help we can get. We've got a bunch of big problems that we're facing. I'll just list four of them: pandemics, possibly intentional through, you know, biological attacks; climate change; nuclear war; and then the pollution of our information ecosystem with mis- and disinformation. And these interoperate with each other, especially the last one with the other three. So I think, generally speaking, we've got some big problems and we need all the help that we can get, and I think that AI holds promise to be able to help us with some of these problems, especially when you start adding things like quantum computing to the mix, which seems to be potentially right around the corner, with new announcements out of Microsoft this last week. But at the same time, at least within the United States, there are some significant constraints in the law, and so the ability for AI companies, AI technologies, maybe even open source AI projects to be able to access material and use material, either for training or some sort of a RAG process on the back end, is constrained by a lack of clarity in what is allowed.

Mark Graham:

That is currently playing out in the courts. So, instead of having, you know, a clear understanding of what can be done and what can't be done, it's being left to the courts to try to sort this stuff out. And so many, many individuals and organizations are being cautious. We are being cautious at the Internet Archive. There is material, some of which we haven't mentioned in this conversation, certain collections of material, that we consider by all accounts to be open, and so that material that is open may be used, may be being used by some, for a variety of applications, including AI training, but other material that is not generally open is not. We have terms of use. We remind people of that and we encourage people to use our library responsibly.

Mark McGrath:

Great, I appreciate that. So here's what I want to do. Mark and I are going to talk about why the archives, John Boyd's archives, are so important to us, and kind of frame what it's like to go in there, why it's important that we have access to them, and why we see the OODA loop as being different than the majority of folks that have never been in there. So, Moose, when you and I go in there, we open a book and there's John Boyd's notes, marginalia all over, you know, all over everything. So we know the books that John Boyd read, but it's not enough to just download those books and have access to books that he read. It's more important to have access to the books that he physically wrote in, right? And that's, I think, one of the more important things about having open access to John Boyd's work.

Mark McGrath:

Second, Moose, what else? I mean, we know it's open. We know that Mary Boyd has preserved this for everybody to have access to, but not everybody has access to it. You have to drive to Quantico, you have to get on base, you have to do these things to get in there. But any other thoughts to frame a picture for Mark Graham?

Brian "Ponch" Rivera:

Yeah, I mean, I tell you, Mark, the take that we have on Boyd, which we know from his acolytes that worked with him directly, his daughter and others. I mean, we're trying to advance and develop the authentic Boyd that was him, not the one that's commonly misunderstood and reduced to what most people know him for. In there, you're reading in his own writing, for example, his copy of Clausewitz's On War, which is still a very definitive book for training officers at an advanced level. But you read the marginalia. You know, John Boyd is absolutely tearing it to shreds, page by page by page by page, and to have all that marginalia saved in an archive, you know, the book in the archive box, those are the sorts of things that people don't understand about Boyd, and those are the things that are kind of disconnected from his broader work on complexity.

Mark Graham:

We are here to help. Absolutely. Let's roll up our sleeves and just get this done. We worked with another figure, Daniel Ellsberg. A US Marine, I might point out. Indeed, he was, and Dan was a friend of the archive, and I consider Dan a friend. He had a profound impact on my early life, and also Brewster Kahle, our founder, and so we actually had someone physically working in Dan's home for the better part of a year digitizing much of his material. So, in that case, the Internet Archive, or one of our friends actually, I think, maybe helped make that all possible, and we have a collection.

Mark Graham:

It's called archive.org slash details slash Daniel Ellsberg, and it has, let me pull this up here, it has 1,621 items in it, and these items tend to be books and papers and articles either written by Dan or written about Dan. Dan, of course, passed away fairly recently. Before he passed away, he gave me a call one day and he was like, "Mark, I'm trying to track down this copy of a particular book that I think you had digitized," and it was in particular the marginalia that was in the book that he was most interested in. So we have experience doing this. We can digitize books at scale and, depending upon the rights issues of the books, make them available to people. Different books have different rights considerations associated with them, and so we operate within the constraints of law.

Brian "Ponch" Rivera:

So if a government agency, say like CIA, they're like, hey, we're going to work with the archive and we're going to digitize the CIA reading room. I mean, does the CIA pay the archive? Or how does that work? Not yet.

Mark Graham:

The CIA reading room is, I think, what a collection is called in the archive. That work has mostly been done by, to the best of my knowledge, correct me if I'm wrong here, guys, but the National Security Archive out of Georgetown University, I think, is one. Is it Georgetown or George Washington? So the National Security Archive is one and they've done heavy lifting on this. They file FOIA requests, et cetera. But there's also a couple of individuals who have just made it their life's passion to work for years and years and years acquiring materials, digitizing them and uploading them to the Internet Archive. So hundreds and hundreds of thousands of documents, in fact, maybe even millions of documents.

Brian "Ponch" Rivera:

Does it work like when an archive gets sealed? For example, I went to Marquette University, and one of our more notorious alums, both undergrad and law school, is Joe McCarthy, and I had a professor at Marquette that wanted to go and do research on McCarthy's transcripts from college: what classes did he take and what were his grades? And the Joe McCarthy archive is sealed till, like, you know, 2150 or whatever it is, I don't even know. But what are the usual protocols for things like that? Are they waiting for people to die? Are they waiting for

Mark Graham:

You know, I think, generally speaking, and this is actually one of the principles that we follow at the Internet Archive, people who are rights holders can choose. So, for example, if there's maybe some material that's been archived in the Wayback Machine, maybe somebody's blog, say, from 10 years ago, and for whatever reason they say, "I don't want that in the Wayback Machine. I deleted my blog a long time ago and I'd like you to take it out of the Wayback Machine," we first check to make sure that they are who they say they are, take some reasonable effort, some appropriate due diligence, and then I generally honor those requests. So I'd say it's really about the rights holders more than anything else, and trying to do the right thing by people as well.

Brian "Ponch" Rivera:

I can't really speak to that particular McCarthy archive per se. No, it's just fascinating because it did sound like it belonged to, like, his wife or his estate or something like that.

Brian "Ponch" Rivera:

Yeah, and then maybe they had, but I guess a lot of colleges, though, because you mentioned Georgetown. Yeah, I feel like Marquette was one of these that are sanctioned by the National Archives to have those sorts of facilities that could be sealed and not opened, to secure those kinds of documents. Yeah, there's also a program at Columbia University that deals with declassified documents in particular, quite a significant program.

Mark Graham:

You might want to look into that.

Brian "Ponch" Rivera:

Are there ever archives that you wanted, that you sued for, or, like, put in a FOIA request for, or

Mark Graham:

that kind of stuff. I mean, I've done FOIA requests, and we certainly have people that do FOIA requests and upload material from those results to the Internet Archive, but no, that's not really something that we spend a lot of time on.

Brian "Ponch" Rivera:

Yeah, just curious. Yeah, interesting. I mean, it's like the Wayback Machine is like a time machine. It is.

Mark Graham:

Absolutely. It's a time machine for the web, you know, because the web has no version control system and because, as I've already, you know, shared, the web is fairly fragile, and it's not necessarily getting any better, right? With the rise of platforms, in many cases, you have power consolidated down into a small number of very large organizations that can change their policies on a dime. Or, as we've seen most recently with the US government: USAID's entire website was taken offline, and not just USAID's website, but, and I don't have the exact numbers, I think there were something like 29 different channels on YouTube managed by USAID, and all of those videos were taken off of YouTube.

Brian "Ponch" Rivera:

Wow, wow.

Mark Graham:

So the internet never forgets.

Brian "Ponch" Rivera:

Sometimes it does.

Mark Graham:

Well, sometimes it does, but I think we have preserved most of those videos in the Wayback Machine. Just, I mean, one technical question, and we could punch out: do AIs,

Brian "Ponch" Rivera:

are there, like, API abilities that AIs can use to look into the archive, or no?

Mark Graham:

Once again, the Internet Archive makes materials available for a variety of different audiences, and certain open material may be available and used by some of the AI companies. That's really orthogonal to the question of how it's done. And there are a variety of interfaces to the various services of the Internet Archive, including web-based, iOS and Android apps, browser extensions and, of course, APIs. Fascinating.

Brian "Ponch" Rivera:

Well, we want to respect your time. We know we're coming up on your hard stop and we'd love to have you back. We want to take some time to ask a few questions offline. Sounds good. Mark, we really appreciate you coming on the show. Been a pleasure. And sharing with us how our digital orientation is expanding and helping us sense-make and make better decisions. Right on. So, all right, thanks. One day at a time. All right, that's it. Thanks, Mark.

Podcasts we love

Check out these other fine podcasts recommended by us, not an algorithm.

Shawn Ryan Show
Shawn Ryan

Huberman Lab
Scicomm Media

Acta Non Verba
Marcus Aurelius Anderson

No Bell
Sam Alaimo and Rob Huberty | ZeroEyes

The Art of Manliness
The Art of Manliness

MAX Afterburner
Matthew "Whiz" Buckley