How AWS S3 is built

pragmaticengineer · Watch on YouTube (5vL6aCvgQXU) · Published January 20, 2026
Duration: 1:18:14 · Views: 191,114 · Likes: 1,333

14,568 words · Language: English · Auto-generated transcript

AWS S3 is the world's largest cloud storage service, but just how big is it, and how is it engineered to be as reliable as it is at such a massive scale? Mai-Lan is the VP of data and analytics at AWS and has been running S3 for 13 years. Today we discuss the sheer scale of S3 in the data stored and the number of servers it runs on; how, seemingly overnight, AWS went from an eventually consistent data store to a strongly consistent one, and the massive engineering complexity behind this move; what correlated failure, crash consistency, and failure allowances are, and why engineers on S3 live and breathe these concepts; the importance of formal methods to ensure correctness at S3 scale; and much more. A lot of these topics are ones that AWS engineering rarely talks about in public, and I hope you enjoy these rare details. If you're interested in how one of the largest systems in the world is built and keeps evolving, this episode is for you. This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsors. So, Mai-Lan, welcome to the podcast. >> Thanks for having me. >> To kick things off, can you tell me the scale of S3 today? >> Well, if you take a step back and just think about S3, it is a place where you put an incredible amount of data. Right now, S3 holds over 500 trillion objects. We have hundreds of exabytes of data, and we serve hundreds of millions of transactions per second worldwide. And if you want another fun stat, we process over a quadrillion requests every single year. And what's under the hood of all that is also pretty amazing scale. If you think about what's underneath the hood of S3, fundamentally there are disks and servers, which sit in racks, and those sit in buildings. And if you try to think about all of the scale of what is under the hood, we manage tens of millions of hard drives across millions of servers. And that is in 120 availability zones across 38 regions, which is pretty amazing if you think about it. >> So deep down it all starts with hard drives sitting inside servers, sitting inside racks, and then you have a bunch of these racks, and then rows of them, buildings of them, right? So there are tens of millions of hard drives at the very bottom of this. >> That's right. In fact, if you imagine stacking all of our drives one on top of another, it would go all the way to the International Space Station and just about back. It's a fun visual for those of us who work on the service, but fundamentally, it's really hard to get your brain around the scale of S3. And so a lot of our customers don't think about it; they assume the scale is there. They assume that all of the drives are always there, and they just focus on what S3 is to them, which is: it just works. It just works for any type of data and all of your data. >> Yeah. Even for me, when you talk about exabytes, I actually had to look up exabytes, because I know of petabytes, which is already massive. If a company has one or two or three petabytes of data, it's tons. And an exabyte is a thousand petabytes, and you told me that you're thinking at that level. It's just hard to fathom.
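A quick back-of-envelope check on those two mental models. The figures below are assumptions, not from the episode (a 3.5-inch drive is roughly 2.6 cm tall, the ISS orbits at roughly 400 km, and "tens of millions" is taken as 30 million drives):

```python
# Back-of-envelope check on the "drives stacked to the ISS and back" visual.
drive_height_cm = 2.6       # assumed height of one 3.5" hard drive
iss_altitude_km = 400       # assumed ISS orbital altitude
drives = 30_000_000         # "tens of millions of hard drives"

stack_km = drives * drive_height_cm / 100 / 1000
print(f"{stack_km:.0f} km of stacked drives vs. {2 * iss_altitude_km} km there and back")
# ~780 km of drives vs. an ~800 km round trip: "to the ISS and just about back"

# And the unit the host looked up: 1 exabyte = 1,000 petabytes = 1,000,000 terabytes.
print(1_000 * 1_000, "terabytes in one exabyte")
```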
>> Yeah, we have individual customers that have exabytes of data. Individual customers who have exabytes of data in what they call a data lake. Although last week I heard a great term: we had the Sony group CEO talk about what Sony is doing with data, and they referred to it as a data ocean, not a data lake. And if you have exabytes of data in your data lake, it is in fact a data ocean, and that ocean is fundamentally S3. >> Can you tell me how S3 started? I did some research, and there was a story about a distinguished engineer sitting in a pub in Seattle. Who knows if it's true or not, but I read that the story was that he was a bit frustrated with engineers at Amazon building a lot of infrastructure again and again. >> Yeah. S3 development really started in 2005, and we launched as the first AWS service in 2006. And if you think about the technical problems of 2006, a lot of customers were building things like e-commerce websites, like Amazon.com. And so the engineers at Amazon knew that they had a lot of data that at the time was very unstructured: it was PDFs, it was images, it was backups. And they wanted a place where they could store that at an economic price point that let them not think about the growth of storage. And so they built S3, and they really built it for a certain type of storage. The original design of S3 in 2006 was really anchored around eventual consistency. And the idea of eventual consistency is that when you put data in storage for S3, we're not going to give you an ack back on your put unless we actually have your data. So we have your data, but the eventual consistency part is that if you were to list your data, it might not show up, because it's being eventually consistent. It's there, but it might not show up on a list. And we built that consistency model at the time because we were really optimizing for things like durability and availability. And it worked like a champ for e-commerce sites and things like that, because when a human was interacting with an e-commerce site and an image happened to not show up exactly at the moment the data was put into storage, it was okay, because a human would just refresh. And when we launched in 2006, here's a fun fact for you: 2006 is actually when Apache Hadoop first began as a community as well. And so we had a set of what I think of as frontier data customers, like Netflix and Pinterest, who took a look at things like Hadoop and put it together with the economics and the attributes of S3, which is unlimited storage with pretty good performance at a great price point. And they decided to build what we first began to call data lakes at the time: they decided to extend the idea of unstructured storage and include things like tabular data. And so the first wave of frontier data customers were adopting quote-unquote data lakes in about 2013 to 2015. Those were the frontier data customers born in the cloud.
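A toy sketch of the eventual-consistency contract Mai-Lan describes above: the put is acknowledged only once the data is durably stored, but a listing issued right afterwards was allowed to lag and not show the new key yet. This illustrates the original, pre-2020 behavior in miniature, not how S3 works today:

```python
durable_store = {}          # where the object bytes actually live
list_index = set()          # the (possibly lagging) index used to answer LIST

def put(key, value):
    durable_store[key] = value          # data is stored before we ack...
    return "200 OK"                     # ...but the index update may still be in flight

def list_keys():
    return sorted(list_index)           # may not include keys that were just PUT

def index_catches_up():
    list_index.update(durable_store)    # "eventually", the listing becomes consistent

put("photos/cat.jpg", b"...")
print(list_keys())        # [] -- the acked object is not visible in the listing yet
index_catches_up()
print(list_keys())        # ['photos/cat.jpg']
```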
And around 2015 to, I would say, 2020, we started to see all the enterprises take that same data pattern of how can I use S3, the home of all the unstructured data on the planet, and extend it to tabular data. And that's when, about five years ago, in 2020, I started to see a ton of exabytes of basically Parquet files. I have worked on S3 for a minute; I started working on S3 in, I guess it was 2013, and I'd been at AWS since 2010, so kind of a while. And the rise of Parquet was really interesting, because what people did is they said, "Okay, I like the traits and the attributes of S3 and I want to apply them to a table, so I am going to run my own Parquet data in S3." And then around, I would say, 2019 or 2020, we started to see the rise of Iceberg. Iceberg is incredibly popular; it gives table attributes to the underlying Parquet data, and customers started to use it in many of my largest data lakes, across different industries and different customers. And so one of the things that we did in 2024 is we introduced S3 Tables. >> Just for those who don't know what Iceberg is: it's an open-source table format for massive analytic workflows, right? >> That's right. If I ask our customers with these data oceans why they care so much about Iceberg, it's because they want to have what a lot of customers are calling a decentralized analytics architecture, where they can have lines of business or different teams within their company pick what type of analytics to use, as long as it's Iceberg-compliant. And so if Iceberg is the common metaphor for tabular data, then you have flexibility and choice in what type of analytics engines you use in a decentralized analytics architecture. I think that's one of the reasons why Iceberg has just taken off: it makes it easy to use data at scale, but it also gives a business owner, the chief data officers or the CTOs of the world, future-proofing for analytics. They can replace their analytics, they can change it out, they can adopt new types of analytics and AI, because you have Iceberg as the bottom turtle on S3. We launched S3 Tables in December 2024. This year we've had over 15 new features that we've added to S3 Tables. And then this year, of course, we launched the preview of S3 Vectors in July, and last week we went generally available. And so the story of S3 is a story that our customers have written for data, but it's been super fun to work on all these different evolving attributes. >> As an engineer, what is the kind of basic architecture and the basic terminology I should know about when I'm starting to work with S3? >> When we first launched in 2006, the whole goal for S3 was to provide a very simple developer experience, and we've really tried to stick with that. In fact, when the engineers are sitting around and we're talking about what to build next, we always go back to that idea of how to make things really simple to use with S3. And so fundamentally, while S3 has a lot of different capabilities now, it's really about the put and the get: the put of the storage in and the get of the storage out. Doing that really well at scale is the heart of S3.
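For readers who haven't used it, the put and the get (plus the list and delete mentioned next) look like this through boto3; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"   # hypothetical bucket name

# The two core operations: put data in, get data out.
s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello, S3")
obj = s3.get_object(Bucket=bucket, Key="hello.txt")
print(obj["Body"].read())   # b'hello, S3'

# Other HTTP-like primitives: list and delete.
for item in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(item["Key"], item["Size"])
s3.delete_object(Bucket=bucket, Key="hello.txt")
```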
Now, we have a ton of extra capabilities that we've launched over time, but fundamentally, when customers think about using S3, they think about the put and the get. >> Yeah. So, put data, get data, and I guess some of the other operations; it's a bit like HTTP, right? There's also delete, list, copy, a few other primitives. >> There is, and if I think about where we have gone over time, we've added capabilities on top of that just based on what developers are trying to do. Let's just take put. We recently added a set of conditionals to the put capability: last year we did put if absent, or put if match. This year we did a copy if absent or a put if match, and we did delete if match. The core thing for us with conditionals is that we can give developers the capability of doing things like the put, but based on the behaviors of their application. >> Outside of the get and the put, the basic operations, I guess the base terminology that you should just know about is buckets, objects, and keys, right? That's how we think about our data. >> Yeah. And now it's not just objects. If you think about the two latest primitives, or building blocks, we've introduced as native to S3, one of them is the Iceberg table with S3 Tables, and the other one is vectors. Under the hood of an S3 table is a set of Parquet files that we're managing on your behalf. But that's not the case for vectors. A vector is basically a long string of numbers. That is a new data structure for us, and it's sitting in S3 just like your objects. >> Mai-Lan was talking about the building blocks of S3, like the put, get, tables, and vectors. Speaking of primitives for building applications leads nicely to our season sponsor, WorkOS. WorkOS is a set of primitives to make your application enterprise-ready. Primitives like single sign-on authentication, directory sync, MCP authentication, and many others. One feature does not make an app enterprise-ready. Rather, it's the combination of primitives altogether that solves enterprise needs. When your product grows in scale, you can always reach for new building blocks for infrastructure from places like AWS or similar. Similarly, when you need to go upmarket and sell to larger enterprises, WorkOS provides the application-level building blocks that you need for this. WorkOS has seen the edge cases and the enterprise complexity and solves this for you, so you can focus on your core product. One example of such a building block is adding authentication to your MCP server. This is a typical screen when you're about to authenticate with an MCP server. If you had to build it from scratch, it gets pretty complex to set up the OAuth flows behind the scenes. But with WorkOS, it's a few simple steps: add the AuthKit component to your project, configure it via the UI, then direct clients of your MCP server to authorize via AuthKit, verify the response you get via some code, and that's pretty much it. This is the power of well-built primitives. To learn more, head to workos.com. And with this, let's get back to S3 and how it all started. >> So I'd like to go back to the beginning of S3. When it was launched, it was pretty shocking for the broader community, because S3 launched with a pricing of 15 cents per gigabyte per month, which was about a third to a fifth of the price of anything else. The going rate at the time was something like 50 cents or 75 cents.
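A minimal sketch of the conditional puts Mai-Lan mentioned above (put if absent, put if match), as they appear in boto3. The parameter names reflect my understanding of the public S3 conditional-write API, and the bucket, key, and payloads are hypothetical, so check the current documentation before relying on this:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket, key = "example-bucket", "reports/2026-01.json"   # hypothetical names

# "Put if absent": only succeeds if no object exists at this key yet.
try:
    s3.put_object(Bucket=bucket, Key=key, Body=b"{}", IfNoneMatch="*")
except ClientError as err:
    if err.response["Error"]["Code"] == "PreconditionFailed":
        print("someone else created this key first")

# "Put if match": only overwrite if the object is still the version we last read.
head = s3.head_object(Bucket=bucket, Key=key)
s3.put_object(Bucket=bucket, Key=key, Body=b'{"v": 2}', IfMatch=head["ETag"])
```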
And on the first day, I read that something like 12,000 developers signed up immediately. A lot of companies moved over immediately or very quickly, and then the surprising thing was that S3 kept cutting prices, which was unheard of before. You were there in the 2010s when some large price cuts happened. Can you tell me what the thinking was inside the S3 team on this unusual pricing? It seemed customers would have been willing to pay more. And also the cutting of prices continuously, even today; I think today it's something like 2 cents or 2.3 cents for the same storage that was 15 cents at launch. >> Yeah. I think part of this goes back to what the goal is for S3. The mission of S3 is to provide the best storage service on the planet. And our goal, too: if you think about the growth of data, IDC says that data is growing at a rate of 27% year over year. But I have to tell you, we have so many customers that are growing so much faster than that. >> Yeah, I was about to say, that sounds pretty low. >> I know. But that's an average across everything. We have a lot of customers that grow at twice or three times that rate. But think about all the data that's being generated from sensors, from applications, from AI, from all these different... >> From just taking photos every day, right? >> Photos, that's right. And if you think about your phone, think about how the resolution of the cameras on phones has grown; you just have what Sony talked about with the data ocean. And in order to have all that data and to grow it, you have to be able to grow it economically. You have to be able to grow it at a price point where you don't really think, okay, what data am I going to delete now because I'm running out of space. You don't have that conversation with S3 customers because of two things. One is that we do lower the price of either storage or the capabilities of what we're doing. For example, we lowered the cost of compaction for S3 Tables pretty dramatically within a year after launching S3 Tables. But it's not just that; it's the overall total cost of ownership of your storage. We give you the ability to tier and to archive storage. We give you the ability to do something called Intelligent-Tiering, which is: if you don't touch your data for a month, we'll give you an automatic discount on that data, because we're watching your storage, and if you don't touch it for longer, we'll give you up to a 40% discount on that storage. And it's dynamic discounting, so you don't even have to think about it. And so our whole goal is that you can grow the data that you need to grow, because we know that's being used to pre-train models, we know it's being used to fine-tune and do any type of post-training of AI, we know you're using it for analytics, we know you're using it for all these different things, either now or in the future. And so our goal is that you can keep your data and use it in a way that advances whatever the thing is that you're doing, whether it's life sciences or you're an enterprise in manufacturing. Whatever you need, the data should be there, and you should be able to grow it and keep it and use it any way you want. >> Yeah, I did want to ask you about this part. So there's Intelligent-Tiering, which was launched in 2018.
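As a concrete example of the tiering Mai-Lan describes, this is roughly what opting data into the Intelligent-Tiering storage class looks like with boto3; the bucket, prefix, and rule name are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"   # hypothetical

# Option 1: write an object straight into the Intelligent-Tiering storage class.
s3.put_object(Bucket=bucket, Key="logs/2026/01/app.log", Body=b"...",
              StorageClass="INTELLIGENT_TIERING")

# Option 2: a lifecycle rule that moves everything under a prefix into
# Intelligent-Tiering, after which the automatic discounting is handled for you.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
        }]
    },
)
```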
So, like 12 years after S3 was launched. One thing that really got my attention is Amazon Glacier, which was launched in 2012, so a long time ago. You can store data that you don't need immediate access to; you're okay waiting for some time to get access to it, maybe even hours. When it launched, it was only one cent per gigabyte per month, at a time when the going rate for storage was something like 15 cents, so almost 10 times cheaper. How do you do that? What is the architecture and thinking behind how you're able to make this trade-off of: look, if you don't need your data quickly, we can do it a lot cheaper? How could I imagine the kind of trade-offs that you and the engineering team were thinking of making? >> Well, as you know, you're an engineer yourself, a lot of engineering is about constraints. And that is the fun part about working on S3: when you think about the constraints that we have for availability, and the constraints that we have around the cost of storage, we start to get really, really creative. And in S3, because we build all the way down to the metal, down to the drives and the capabilities that we have in our hardware, we're able to drive efficiencies at every single part of our stack. And so when our engineers get together and talk about the constraints and the design goals, we'll do something like set a target for the cost of a byte, and we'll drive for that at every single part of the process. And the part of the process that we include is also the data center: how our data center technicians operate the service of S3 from a hardware and a data center perspective, the physical buildings, just like we do for the software and the layers of S3 itself. And when you have that ability to work across the whole stack, all the way down to the physical buildings, and you're thinking that deeply about the cost and the lifetime of every byte, you're able to do things like Glacier. >> You mentioned something really interesting, that when S3 started it was eventually consistent, which means that data eventually arrives; it might not be there yet, and you might be behind. There's a lot you can do with this, and it gives you some constraints. But you mentioned that the reason the team launched it that way was that durability and availability were more important, and I assume cost as well. During those initial phases, while S3 was eventually consistent, what kind of benefits does eventual consistency give? Is it a cost constraint? Is it just easier to build highly available systems from an engineering perspective? >> Well, from an engineering perspective, the main optimization was availability. It was not necessarily durability; it was availability. So, take a step back and look at the original design of S3: we were focused very hard on availability. So let's take a step back. When you talk about consistency, it's the property where the object retrieval, the object get, reflects the most recent put to that same object.
And so if you think about what parts of the S3 system that really hits, a lot of it starts with our indexing subsystem. The indexing subsystem in S3 holds all of your object metadata: an object's name, its tags, its creation time. And our index is accessed on every single get or put or list or head or delete, any API call like that. Every single data plane request where you go back into our storage system to get an object goes through our index. And if you think about it, more requests go through our index than through our storage system, because, for example, the index is serving things like head requests and list requests that don't actually end up going back into our storage system at all; those are metadata, or index, requests. So if you think about our indexing system, we have a storage system in there. And that is a really central concept: a storage system in the middle of our indexing system. >> So you need a storage system for the index in your index system, right? >> That's right. And we have to configure and size that system to deliver on our design promise for both availability and durability. The data in our index system is stored across a set of replicas, and it uses what is basically a quorum-based algorithm. A quorum-based algorithm tends to be very forgiving of failures. And if you think about how we implemented quorum in our index system, we start first from servers that are running in separate availability zones. The reason we do that is that it lets us avoid correlation on a single fault domain. And since the failure of a single disk, a server, a rack, or a zone only affects a subset of data, it never affects all of the data for a single object, or even a majority of the data for a single object, which we have sharded across a wide spread of servers. So this core of availability for us is the idea that we spread everything. And when a read comes in, it's coming into the S3 front end, and we heavily cache objects across our systems. When a read comes in... >> It could route at random, and you could create a situation where you're creating an inconsistent read. >> And so when we have quorum at the index storage layer, we can see reads and writes overlap, but in the cache, they don't, because we're optimizing for availability.
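The quorum intuition behind what Mai-Lan describes can be made concrete with a tiny check: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap whenever R + W > N. The replica counts below are illustrative, not S3's actual configuration:

```python
from itertools import combinations

N, W, R = 5, 3, 3
replicas = set(range(N))

def quorums_overlap():
    """Every possible write quorum shares at least one replica with every read quorum."""
    for write_q in combinations(replicas, W):
        for read_q in combinations(replicas, R):
            if not set(write_q) & set(read_q):
                return False
    return True

print(quorums_overlap())          # True, because R + W = 6 > N = 5
# A quorum is also forgiving of failures: a write still succeeds with N - W = 2 nodes down.
```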
>> So, just so I understand the first part, the eventual consistency, and correct me if I'm wrong: you can write to all these distributed nodes, and you ask one of them, and if it doesn't have it, no problem, because it will be eventually consistent. You now have high availability because you don't need to worry about all of them being in the same, correct, state. >> That's phase one of S3, and it gives you availability. >> And now you're explaining how you're able, behind the scenes, to turn this into a strongly consistent system. Strong consistency means it's guaranteed to reflect the whole system's state, which is hard to do because you could have distributed failures, and so on. >> And this replicated journal, it took us a while to build, I won't lie. We don't talk about this stuff very much, because this is kind of the secret sauce of S3. But again, our engineers in the room were thinking about how to deliver strong consistency without compromising availability. So I go back to constraints. In that case, we were not trading off consistency and availability anymore. And so the engineers had to come up with a new data structure. We do this in S3: vectors, for example, is a new data structure that we came up with as well. But what we had to invent for strong consistency at S3 scale, without relaxing the constraint of availability, is this replicated journal. The replicated journal is basically a distributed data structure where we're chaining nodes together, so that when a write comes into the system, it flows through the nodes sequentially. So a read or a write in a strongly consistent system for S3 flows through these storage nodes in the journal sequentially, and every node forwards to the next node. And when the storage nodes get written to, they learn the sequence number of the value along with the value itself. And therefore, on a subsequent read, like through our cache, the sequence number can be retrieved and stored. And so now you have this strongly consistent and highly available capability in S3, and the heart of that is actually this replicated journal. >> Okay. But what's the catch, because there are always trade-offs; you always give up something. On one end you obviously have more complicated business logic. And then I guess the second obvious question is: what about failures? Because in the case of eventual consistency, you don't worry too much about one failure. Clearly, in this case, what if a node in the sequence fails, either the first time or later? How does the system monitor this and recover? Because that's going to be the tricky part, right? >> There's another piece to this puzzle that we implemented, which is basically a cache coherency protocol. The idea is that this is where we built what we think of as a failure allowance, where in this mode we needed to retain the property that multiple servers can receive requests and some are allowed to fail. And so it's this combination of the replicated journal as a new data structure, plus this new cache coherency protocol that gave us a failure allowance, and those two things working in concert gave us strong consistency. I will say, too, this does come at some actual cost.
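A toy sketch of the replicated-journal idea she describes: a write flows down a chain of nodes in order, each node records the value along with its sequence number, and a reader can use the sequence number to decide whether a cached copy is still current. This is a simplification for illustration, not S3's actual implementation:

```python
class JournalNode:
    """One node in the chain; applies a write and forwards it to the next node."""
    def __init__(self, name, next_node=None):
        self.name, self.next_node = name, next_node
        self.entries = {}                          # key -> (sequence number, value)

    def apply(self, key, seq, value):
        self.entries[key] = (seq, value)
        if self.next_node:                         # forward the write down the chain
            self.next_node.apply(key, seq, value)

tail = JournalNode("node-c")
head = JournalNode("node-a", JournalNode("node-b", tail))

seq = 0
cache = {}                                         # key -> (sequence number, value)

def write(key, value):
    global seq
    seq += 1
    head.apply(key, seq, value)                    # acked only once the whole chain has it

def read(key):
    latest_seq, value = tail.entries[key]          # the tail only sees fully applied writes
    cached = cache.get(key)
    if cached and cached[0] >= latest_seq:         # sequence number proves the cache is fresh
        return cached[1]
    cache[key] = (latest_seq, value)               # refresh the cache, remembering the seq
    return value

write("k", "v1"); write("k", "v2")
print(read("k"))                                   # 'v2': the read reflects the latest write
```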
>> I was about to say, nothing is free in engineering, right? >> There's hardware cost in this, because you can imagine we've done some more engineering behind the scenes. But I remember sitting in the room with our engineers on S3, and we debated this. We said, there are actual costs to the underlying hardware for this; do we pass them along to customers or not? And we made the explicit decision not to. >> Really? >> Yeah. We said that when we launch this, we should launch strong consistency, we should make it free of charge to customers, and it should just work for any request that comes into S3. We shouldn't say it's only available on this bucket type or what have you; this should be true for every request made to S3. And part of that mindset for S3 is: how can we provide these types of capabilities and make them part of the building blocks of S3, so that you don't have to think about the cost of it. >> This was the very surprising thing about this launch, by the way: that suddenly AWS said, okay, everything is strongly consistent, it does not cost you more, and latency-wise your latencies shouldn't have changed significantly. I'm sure when you rolled it out you did your measurements, but that was the promise, and that was why I couldn't really believe it when I reread the history, because it typically doesn't happen. Typically, strong consistency does add latency, or it increases cost if it doesn't add latency. There are always these trade-offs. And it sounds like you either swallowed the cost or the cost caught up, but it's very unusual. >> If I think about that, one of the things that was also very important for us, and we haven't really talked about this as much, but we think about it a lot on the S3 team, is correctness. It's one thing to say that you're strongly consistent on every request. It's another thing to know it. And so when we built this strong consistency, I talked about our new caching protocol, I talked about this replicated journal as a new data structure; that took a little bit of time to do and to get right. But at S3 scale, we could not say that we were strongly consistent unless we actually knew we were strongly consistent. And so what does that mean? How do you do that at S3 scale, when everybody is using it for every last workload? In fact, one of the reasons why people use it is because our scale is such that we're decorrelating workloads and you can run absolutely anything on S3. But how do you know? >> Mai-Lan just talked about how strong consistency made it so much easier to trust S3. Trust is something that is just as important when writing code, especially when, with AI, we write more code than before. And this is a good time to talk about our season sponsor, Sonar. What is the impact that AI is having on developers? Let's look at some data. A new report from Sonar, the State of Developer Survey report, found that 82% of developers believe they can code faster with AI. But here's what's interesting: in the same survey, 96% of developers said they do not highly trust the accuracy of AI code. This checks out for me as well. While I write code faster with AI agents, I don't exactly trust the code it produces.
This really becomes a problem at the code review stage, where all this AI-generated code must be rigorously verified for security, reliability, and maintainability. SonarQube is precisely built to solve this code verification issue. Sonar has been a leader in the automated code analysis business for over 17 years, analyzing 750 billion lines of code daily. That's over 8 million lines of code per second. I actually first came across Sonar 13 years ago, in 2013, when I was working at Microsoft, and a bunch of teams already used SonarQube to improve the quality of their code. I've been a fan since. Sonar provides an essential and independent verification layer. It's an automated guardrail that analyzes all code, whether it's developer- or AI-generated, ensuring it meets your quality and security standards before it ever reaches production. To get started for free, head to sonarsource.com/pragmatic. And with this, let's get back to the importance of strong consistency at AWS. >> How do you know that you're strongly consistent? And that is why we used automated reasoning. >> What is automated reasoning, for those of us who are not as familiar with this, which will be most people outside of very few domains like S3? >> Yeah. S3 uses automated reasoning all over the place. Automated reasoning is a specialized form of computer science. And Gergely, if you kind of think about it: if computer science and math got married and had kids, it would be automated reasoning. >> Is it formal methods, or based on formal methods? >> That's exactly it. >> Oh, yeah. I mean, I studied computer science, so that's fun. So it's actually proper formal methods that you're using. >> That is right. And we use formal methods in many different places in S3. But one of the first places we adopted it was for us to feel good that we actually had delivered strong consistency across every request. So what we did is we proved it. We basically built a proof for it, and then we incorporated our proof into check-ins to this index area that I talked about, where you have your caching and then you have your storage sublayers of the index capabilities. And so when anybody is working on our index subsystem now and they're checking in code into the code paths that are used for consistency, we are proving through formal methods that we haven't regressed our consistency model. >> And can you just give us a rough idea? Because the formal methods that I have studied were pretty abstract, things like designing languages, how to have the different operators, and of course there is math involved as well. But what are the primitives, like servers, networks, and so on, and the models being built, data flows? How can I imagine a simple proof of something inside S3, roughly, at a really high level? >> Yeah. If you go back to the fundamental notion of a proof, you are proving something to be correct. And the places that we use these proofs: we use them in consistency, where we built a proof across all the different combinatorics to make sure that the consistency model is correct. We use them in cross-region replication, to prove that a replication of data from one region to another arrived. And we use them in different places within S3 to prove the correctness of APIs.
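These are not S3's actual proofs (AWS has written publicly about using tools such as TLA+ for this kind of work), but a toy example can show the flavor: enumerate every reachable state of a small model and check the consistency property in each one. The protocol and the names below are invented for illustration:

```python
def read(cache, store, sequence_aware):
    """Reader path: serve from the cache when allowed, otherwise go to the store."""
    if sequence_aware:
        # Only trust the cache if its sequence number is at least the store's.
        return cache["value"] if cache["seq"] >= store["seq"] else store["value"]
    return cache["value"]                      # naive cache: always trusted

def check(sequence_aware):
    """Exhaustively check: after an acknowledged write of v1, every read must return v1."""
    violations = []
    for cache_refreshed in (False, True):      # the only nondeterminism in this tiny model
        store = {"value": "v1", "seq": 1}      # the acknowledged write
        cache = ({"value": "v1", "seq": 1} if cache_refreshed
                 else {"value": "v0", "seq": 0})
        if read(cache, store, sequence_aware) != "v1":
            violations.append(("cache_refreshed", cache_refreshed))
    return violations

print("naive cache:          ", check(sequence_aware=False))  # finds the stale read
print("sequence-aware cache: ", check(sequence_aware=True))   # no violations
```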
In all of these cases, we talk about durability, we talk about availability, we talk about cost, but just as strong a design principle for us across S3 is correctness. It's the correctness of a thing: an API request, an operation, as it were. And the key thing for us, too, is that you don't want to just prove it once. You want to prove it on every single check-in, and you want to prove it on every single request, so you can validate and verify that you are in fact doing what you say you do. And I think for us, at a certain scale, math has to save you, right? Because at a certain scale you can't do all the combinatorics of every single edge case, but math can save you and help you at S3 scale. And so we use formal methods in many different places in S3. We have some research papers, too; I can send you some links to research papers where we talk about this. >> Yeah, please do, and we will put them in the show notes below so anyone can check them out, because I think it's really interesting. I feel formal methods are not really a thing in a lot of startups, even infrastructure startups, yet, but it sounds very reassuring to me to actually have an ongoing proof of that. And speaking of which, I want to ask about one thing that is related to this: durability. Amazon S3 has very high durability promises. I think it's eleven nines, which I had to double-check, because in backend systems, whenever you say three nines, it's like, hmm; when you say four nines of availability, and that's availability, not durability, four nines is already hard to achieve, and beyond that it just gets very expensive. And I had never heard of eleven nines of durability. Now, this is durability and not availability. One question that I got when I shared this stat publicly, one thing people were asking and I was also thinking: how can you prove that, not just in a formal way? You're now storing, as you said, 500 trillion objects, which is large enough that just by this durability promise you might be losing some of them. Do you actually validate it on the actual data as well, outside of the proof? Because I assume in the proof you will have assumptions on hardware failure rates, which might or might not be true. So my question is: at Amazon S3's level, when you look at whether you are living up to, for example, your durability promise, how do you go about that, and what are your findings? >> Yeah. So we just spent a lot of time talking about our index subsystem, because that is the subsystem that is related to consistency. But when you think about durability, you think about it at different levels of the S3 stack, but we really think about it in the storage layer. And in the storage layer, you have this design promise, and underneath that is a combination of things. It's software, but it's also the physical layout of where our data is across everything that we have in S3. And one of the things that I talked about is that we have disks and servers which sit in racks which sit in buildings, and we have tens of millions of these hard drives, we have millions of servers, and we have 120 availability zones across 38 regions. >> Yeah.
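For context on what eleven nines means numerically, here is the commonly cited framing of that design target (a designed annual durability of 99.999999999%); it is a design target, not a measured statistic:

```python
durability = 0.99999999999              # eleven nines: designed annual durability
annual_loss_rate = 1 - durability       # ~1e-11 expected fraction of objects lost per year

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_rate
print(expected_losses_per_year)         # ~0.0001 objects per year
print(1 / expected_losses_per_year)     # on average, ~10,000 years per single lost object
```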
And an availability zone: two availability zones are two physically separate locations, sometimes quite a ways away from each other, and in some of our regions we have more than three availability zones. Each one gives us a different fault domain. If I were to think about durability, I think the most important thing for us is our auditors. So if you think about a distributed system, we talked about the put and the get. We have many, many microservices that are each doing one or two things very well in the background. And so we have many different varieties of health checks, but we also have repair systems, and we have auditor systems. Our auditor systems go and inspect every single byte across our whole fleet. And if there are signs that repair is needed, another repair system will come into place. And these are all, in the world of distributed systems, microservices working together, loosely coupled, but communicating through well-known interfaces. And that collection of systems, which is over 200 microservices now, all sits behind one S3 regional endpoint. And a fair number of those subsystems, those microservices, are dedicated to the notion of durability. >> So they will go and check and log and report back. So, do I understand correctly that at any given time at S3, someone, or some people, or some systems can actually answer the question of what your durability was over the past week, month, year, and so on? >> Yes. >> Okay, great. So you can actually verify your durability promise and check if the math is mathing. >> Yes. And part of our design is that at any given moment, even during this conversation that you and I have had just today, we're having servers fail, because servers fail. And so what we have built in S3 is an assumption that servers fail. A lot of our systems are always checking to see where any failure might hit an individual node: how does it affect a certain byte, what repair needs to automatically kick into place. And so this system is constantly moving behind the scenes, if you will, and that is a completely separate thing from the get and the put. The get and the put is what the customer sees. There's this whole universe under the hood of how we manage the business of bytes at scale. >> I'm just thinking, because for a lot of us engineers who are building moderately sized systems, I'll say, compared to S3, they can already be big, but a failure is a big deal, like a machine going down. I have a small side project, and my storage filled up and it started to give errors, and this is a big deal because it rarely happens to me; this is the first time it happened in three years. >> Yeah. >> But I understand that in your business, when you work at S3 scale, this is just every day. And the question is not when, it's just how often, and how do you deal with it. I guess it's a different world. >> It is a different world. And the trick is to really think about correlated failure. If you're thinking about availability at any scale, it's the correlated failure that will get you. >> And what is a correlated failure? >> Okay, so that's super interesting. If you think about what I talked about with eventual consistency, we talked about quorum.
And with quorum it's okay for one node to fail, but if all of the nodes go down, for example, and they're in the same availability zone or on the same rack, then you're really going to be messing with the availability of the underlying storage. You've just lost the failure allowance that I talked about with the cache, because they all fail together. And so correlated failure is an incredibly important thing to think about when you're thinking about availability. And when we're designing around correlated failures, the thing we have to think about is how workloads are exposed to different levels of failure. So when you upload an object to S3 with a put, we replicate that object. We don't just store one copy of it; we store it many times. And that replication is important. It's important for durability, but what's interesting is that it's also important for availability, because if any of those correlated failure domains fail, like if a whole availability zone fails, there's still a copy somewhere else, and the data is still available even though an availability zone has failed, or a rack has failed, or a server has failed, and so forth. And so that idea of how you manage and design around correlated failures with our physical infrastructure is super important for S3, for both availability and durability. We also think about something called crash consistency. I mean, Gergely, you can tell I can go on and on about this, so you just have to stop me. >> No, but this is the interesting stuff. >> All right. The whole idea of crash consistency is that any system you build should always return to a consistent state after a fail-stop failure. And if you can reason about the set of states that a system can reach in the presence of failure, and you always assume the presence of failure, then you can also maintain consistency and availability, and you design all of these different microservices to work together in an underlying capability like S3. But that's what our engineers do. They think about crash consistency, they think about correlated failures, they think about failure allowances and caches. It's all that deep distributed-systems work that our engineers come in every day to do. >> Can we talk about how you think about failure allowances? Because there is a concept of error budgets at other companies as well, and I feel it's handled a bit loosely there, whereas I feel this is your bread and butter. So what is a failure allowance, how do you measure it, and what do you do if you overstep it or overspend it? >> Yeah, I think the idea of a failure allowance is not just nice to have; you have to have it. If you assume that you'll never have a failure, you'll actually have a very bad day for your customer. And so we account for failure allowances. But the most important thing, let's just talk about the failure allowance in our cache. How do we manage that? Well, we manage it in such a way that you'll never experience it, because we size it. And when you're sizing the cache, you're making sure that the underlying capabilities and the hardware are always there, and we have, like I talked about, those distributed subsystems, those microservices, all interoperating under the hood.
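A toy illustration of the replica-placement idea behind avoiding correlated failure that Mai-Lan describes above: spread the copies across distinct availability zones so that no single zone (or rack) failure can take out every copy. S3's real layout shards data across a far wider spread of servers, so this captures only the placement intuition:

```python
import random

AZS = ["az-1", "az-2", "az-3"]
RACKS_PER_AZ = 4
COPIES = 3

def place_copies():
    """Put each copy in a different availability zone (and some rack within it)."""
    zones = random.sample(AZS, COPIES)                 # distinct AZs: no shared fault domain
    return [(az, random.randrange(RACKS_PER_AZ)) for az in zones]

def survives(placement, failed_az):
    """Data stays available as long as at least one copy is outside the failed zone."""
    return any(az != failed_az for az, _ in placement)

placement = place_copies()
print(placement)
print(all(survives(placement, az) for az in AZS))      # True: losing any one AZ is fine
```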
We have a ton of them that do nothing but track metrics, and the sizing of our cache is all related to those metrics and the size of our underlying system. >> All the metrics. Yeah. >> That's right. And one of the really big benefits of running on S3 is that because our system is so huge, you have these massive layers, and the massive layers are all managing things like correlated failures and failure allowances. And because they are so huge at the scale of S3, any application that's sitting on top of S3 gets the benefit of it. >> Let's take a break for a minute from S3 to talk about a one-of-a-kind event I'm organizing for the first time: the Pragmatic Summit, in partnership with Statsig. Have you ever wanted to meet standout guests from the Pragmatic Engineer podcast, plus folks from cutting-edge tech companies, and learn about what works and what doesn't in building software in this new age of AI? Come join me on 11 February in San Francisco for a very special one-day event. The Pragmatic Summit features industry legends and past podcast guests like Laura Tacho, Kent Beck, Simon Willison, Chip Huyen, Martin Fowler, and many others. We'll also have insider stories on how engineering teams like Cursor, Linear, OpenAI, Ramp, and others built cutting-edge products. We'll also have roundtables and a carefully curated audience where everyone is someone you'll be interested to meet and chat with, something I'm hoping will make this event extra special. Seats are limited, and you can apply to attend at pragmaticsummit.com. Talks will be recorded and shared, and paid subscribers will get early access afterwards as a thank-you for your additional support. I hope to meet many of you there, and I am so excited about this event. And now, let's jump back to S3 and the massive scale of the service. >> To get a sense of what the reality is like working as an engineer or an engineering leader inside an organization like this, I read a quote from a distinguished engineer, Andy Warfield, and I'm just quoting what he said: "Early in my career, I had this sort of naive view that what it meant to build large-scale commercial software was that it was basically just code. The thing I realized very quickly working on S3 was that the code was inseparable from the organizational memory and the operational practices and the scale of the system." Since you've now been more than a decade in S3, how do you think of this beast, this really complex system, hundreds of microservices, data that is hard to fathom unless you think of the hard drives stacking all the way to the space station? How do your engineers wrangle this? Because it does feel a bit intimidating, I'm not going to lie. >> Well, I think so much of this just comes back to the culture and the commitment on the team. I've worked on S3 for a very long time now, and I have such deep respect for the engineering community on S3. And honestly, this is true for all of the services in our data and analytics stack, but we have engineers in S3 who come in every single day with this deep commitment to the durability and availability and the consistency of your bytes.
And so the types of conversations that we have are so interesting, because we have people who are early out of school, people who've been working on S3 for 15 years, and everything in between. The creativity and the invention on S3: you have this tension where on one side you have to be very conservative with S3, and on the other hand, we have this principal engineering tenet called "respect what came before." That's an Amazon engineering tenet, which says that if something has worked for many, many years, you have to respect that. But then there's another tenet, and these two tenets are a little bit in tension with each other, which is kind of what makes it so fun. That Amazon engineering tenet is called "be technically fearless." And I believe the S3 engineers are just amazing at respecting what came before, because if we build new capabilities in S3, we have to maintain the properties, the traits, of S3, which is that it just works and you get that durability, availability, and so on. But at the same time, we have to be technically fearless, because our ability to go into the world of conditionals, our ability to go into the world of native support for Iceberg or for vectors, means that we are extending this foundation of storage in a way that helps customers build whatever application they need, now and in the future. And that combination of the two things: when I think about our S3 engineering team, I think they come in every day and embody that. >> Now, going back to the evolution of S3 from unstructured to structured data. You were mentioning how Hadoop, the data warehouse, was a big use case where customers started to use it on top of S3, and then at S3 you noticed what a lot of customers, or some of your biggest customers, were doing, and you built it yourself with more structured data, and then S3 Tables came along, and then Vectors. Would you mind sharing a little bit more on how you evolve S3? Because this was another question that came up when I asked people what they'd like to know about S3: is it done? Is it finished, or is it still evolving? Because there is this notion that S3 can store anything already, right? Any object, any blob. What new thing is there? And yet we have a lot of new things. >> Yeah. If you go back in time a little bit and think about the rise of Parquet: the rise of Parquet data in S3 started around 2020, and we started to see more and more people store their tabular data in S3. And if you think about what Iceberg provided, it provided a replacement for Hive. If you think about Hive and Hadoop, Hive was basically giving you file-system-style access into S3's unstructured storage. Iceberg gives you that tabular access, including the compaction and all the table maintenance that goes along with it, to your Parquet data. And I actually think that the world's tabular data is going to live, in the future, in S3. If you just think about the launch that, for example, Supabase did last week: Supabase announced that their Postgres database is going to do secondary writes directly into an S3 table, just like their Postgres extension for vectors is going to integrate directly with S3 Vectors.
And so if the world's data at the source, if you will, goes directly into an S3 table, what does that mean for the world's data? SQL, as we know, is the lingua franca of data, and the world's LLMs have all been trained on decades of SQL, and therefore... >> And Python. SQL and Python. >> Python, and the stuff that's already out there. And so if you think about this, we have many, many AWS customers who know the S3 API pretty darn well by this point. It's a pretty simple API. But now you have the ability to interact with data in S3 through SQL. And what that means is that you don't have to be somebody who's building cloud applications or who knows S3. You just need to know SQL. >> And this is with S3 Tables, right? >> With S3 Tables. And so you can just write SQL against an S3 table, whether you're an AI agent or a human. You're introducing the lingua franca of data as a native property of S3 with S3 Tables, and I think you're just going to see that take off in the upcoming years. >> And your latest launch is S3 Vectors. Can you share a little bit about what it takes to build a new data primitive like vectors, behind the scenes: how long it takes, how the teams come together, and maybe what some of the engineering challenges are of launching something like this? And again, we're talking about vectors, right? So whenever you use LLMs, you create an embedding, which is a vector; you want to store that somewhere, and you will need to do search on it. There are specialized vector databases, there are specialized vector add-ons, and so on. So I'm assuming this is the functionality that S3 Vectors supports very nicely. >> Yeah. Today a lot of customers use vector databases, just like back in the day a lot of people put their tabular data in databases. They just used the structure of the database in order to take advantage of being able to query their data. But they didn't really need to use a database; they just put it in a database. And then S3 came along, and we introduced this way, with the help of open formats like Apache Parquet, of being able to store that structured data in S3. That's kind of what we're doing with vectors right now. And if you think about vectors, vectors are basically a bespoke data type. A vector, at the end of the day, is a very, very long list of numbers. And vectors have been around for a long time, and they've been in vector databases for a while, but they really took off in people's data worlds in the last couple of years with the rise of, as you said, the embedding models. And so if you take a step back and think about one of the great ironies of data, it is that you have to know your data to know your data, right? You have to know what your schema is. You have to know what the data types are. You have to know where it is. And as these data lakes become data oceans, you have this situation where it gets harder and harder to know what's in your data. And the beautiful thing about embeddings is that embedding models will understand your data so that you don't have to understand your data. And the format in which these embedding models put this semantic understanding of your data is, in fact, a vector.
And so when we talk to customers, and they're so excited about how these embedding models are getting better and better, they want to apply more and more semantic understanding to the underlying data, whether unstructured or structured, that they have in storage, and so they want to store billions of vectors. >> But just to check that I understand, and correct me if I'm wrong: hypothetically, you have a bunch of text data, or maybe some image data, and you're saying that a lot of customers and teams would like to write queries that say, hey, can you find an image that looks like a puppy, or can you find an article that contains this or that. And embeddings, as we know, are great for that, but then you need to actually create the embedding, build the system, and so on. Right? >> Yeah, exactly what you're saying. If you think about what vectors can do, and all the data that a given company has: your knowledge across your business, or your knowledge across your life, isn't organized into rows and columns like a database. It's in PDFs. It's on your phone. It's in audio customer-care recordings, which capture the sentiment of how a customer actually feels about their interaction with you. It's on whiteboards; by the end of this day, this whiteboard is totally filled up with ideas. And it's in documents across dozens of systems. So it's not that you don't have data. You have tons of data. But understanding what data you have across all of those different formats is a real problem, and it's one that AI models can help you with. The capabilities of those AI models have gotten so much better in the last 18 to 24 months. But we needed a place to put billions of vectors, billions of these semantic understandings of relationships, and that's what we built S3 Vectors for: state-of-the-art embedding models combined with the ability to have vectors across S3. And it's not a database; it has the cost structure and scale of S3, but it's for vector storage. >> And then, do I understand correctly: did you need to build new primitives to store this, going down to the metal and figuring out exactly where it lives, or did you build it on top of your existing primitives, like blob storage? >> It's actually a new primitive. We talked about S3 Tables; S3 Tables is built on objects, because those individual Parquet files, at the end of the day, are objects. Vectors are totally different. With vectors, we built a new data structure, a new data type. And it turns out that when you're working with vectors, searching for the closest vector in a very high-dimensional space, which is basically vector space... >> Yes. >> ...it's often really hard to find the nearest neighbor. In a database, you essentially have to compare against every vector in the database, and that's often super expensive. And what we do in S3: because we aren't storing all of our vectors in memory, we're storing them on our very large S3 fleet, we still need to provide super low latency. And in our launch last week, we were getting about 100 milliseconds or less for a warm query to our vector space, which is actually pretty fast. It's not database fast, but it's pretty fast. And the way that we do that is we precompute a bunch of, think of them as, vector neighborhoods.
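A rough sketch of the "vector neighborhoods" idea she is describing: essentially coarse clustering plus a restricted search, similar in spirit to inverted-file indexes. The clustering method, sizes, and parameters below are assumptions for illustration, not S3's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(2_000, 64)).astype("float32")    # stand-in embeddings

# Offline, asynchronous step: pick centroids and assign every vector to a neighborhood.
k = 16
centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
assignment = np.argmin(
    np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2), axis=1
)

def query(q, n_probe=3, top_k=5):
    # 1. Cheap step: compare against the k centroids to pick the nearest neighborhoods.
    nearest_hoods = np.argsort(np.linalg.norm(centroids - q, axis=1))[:n_probe]
    # 2. Load only those neighborhoods and do exact nearest-neighbor search inside them.
    candidates = np.where(np.isin(assignment, nearest_hoods))[0]
    dists = np.linalg.norm(vectors[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:top_k]]

print(query(rng.normal(size=64).astype("float32")))         # indexes of the closest vectors
```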
And so it's basically a cluster of vectors that are grouped together by similarity, like, you know, a type of dog as an example. These vector neighborhoods, if you will, are computed ahead of time, offline. They're computed ahead of time, asynchronously, so that when you're doing your query it's not going to impact your query performance. And then every time a new vector is inserted into S3, the vector gets added to one or more of these vector neighborhoods based on where it's located. And so when you are executing a query on S3 Vectors, there's a much smaller search that's done to find the nearest neighborhoods. And it's just the vectors in those vector neighborhoods that are loaded from S3 into fast memory. That's where we apply the nearest neighbor algorithm, and it can result in really good sub-100 millisecond query times. And so, you know, if you think about the scale: S3 will give you up to two billion vectors per index. You think about the scale of an S3 vector bucket, which is up to 20 trillion vectors. And you think about that combined with 100 milliseconds or less for warm query performance. That just opens up what you can do with creating a semantic understanding of your data and how you can query it. >> It sounds very interesting and also challenging, too, because you have to build this for scale from day one. I guess that's one of the benefits and curses of working on S3: everything that you launch has to be prepared for what would be extreme data elsewhere, but here it's just Monday. >> We have S3 service tenets as well. And one of the tenets, one phrase that I use all the time and our engineers do too, is "scale is to your advantage." So if you are an engineer and you think about that, and one of your tenets for anything you build is that scale must be to your advantage, it just changes how you design. It means that you can't build something where the bigger you get, the worse your performance gets, or the worse some attribute gets. It has to be constructed so that the bigger you get, the better your performance gets. The bigger S3 gets, the more decorrelated the workloads are that run in S3. That is a great example of scale is to your advantage. And so when we built vectors, just like we built everything in S3, we asked ourselves: how can we build this such that scale is to our advantage? How can we build this such that 100 milliseconds or less is just the start of the performance that we're going after? And how can we make sure that the more vectors we have in storage, the better the traits of S3 for vectors? >> I have a different question about the limitations of S3. I read that the largest object you can store in S3 is 50 terabytes. Why is there a limit on the largest object? I mean, I think we can imagine this will be spread across multiple hard drives and so on, but why did you decide to have a limit? I'm just interested in the thought process of how the team comes up with, okay, this will be the limit, and this is why. >> I mean, I think, first of all, that limit of 50 terabytes is 10 times greater than what we launched with.
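Stepping out of the conversation for a moment: the neighborhood scheme described above is, in spirit, a clustered (inverted-file-style) approximate nearest neighbor index: centroids are computed offline, every vector is assigned to its nearest neighborhood, and a query scans only the handful of closest neighborhoods instead of the whole collection. The toy sketch below, in Python with NumPy, shows that general technique; all names, cluster counts, and dimensions are illustrative, and it is not the S3 Vectors implementation.

```python
import numpy as np

def build_neighborhoods(vectors: np.ndarray, n_clusters: int = 16, iters: int = 10):
    """Offline step: k-means-style clustering into 'vector neighborhoods'."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        assign = np.argmin(np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = np.array([
            vectors[assign == c].mean(axis=0) if np.any(assign == c) else centroids[c]
            for c in range(n_clusters)
        ])
    neighborhoods = {c: np.where(assign == c)[0] for c in range(n_clusters)}
    return centroids, neighborhoods

def query(q: np.ndarray, vectors, centroids, neighborhoods, n_probe: int = 2, k: int = 5):
    """Online step: probe only the closest neighborhoods, then rank candidates exactly."""
    nearest_clusters = np.argsort(np.linalg.norm(centroids - q, axis=1))[:n_probe]
    candidates = np.concatenate([neighborhoods[c] for c in nearest_clusters])
    dists = np.linalg.norm(vectors[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Usage: 10,000 synthetic 128-dimensional "embeddings".
vecs = np.random.default_rng(1).normal(size=(10_000, 128)).astype(np.float32)
centroids, hoods = build_neighborhoods(vecs)      # precomputed asynchronously, offline
print(query(vecs[42], vecs, centroids, hoods))    # nearest neighbors of vector 42
```

The trade-off is the usual one for this family of indexes: probing more neighborhoods (n_probe) improves recall at the cost of query latency.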
We launched with five terabytes, and now we're at 50 terabytes, and sometimes we sit and tell customers that and they go, what am I going to store that's going to be 50 terabytes? And we're like, high-resolution video, right? And so, you know, if I think about >> known customer >> right. And so if you think about size limits generally speaking, we do try to optimize for certain patterns. And when you raise the size of an object by 10 times, like we did, we're just optimizing for the performance and scale of the underlying systems; it's like how we increased the scale of our batch operations by 10 times last week, too. And the idea behind that is that the underlying systems are just being optimized for distributions of work that are the new norm for how people are doing things. And we'll just keep on changing. We don't have too many limits, to be honest, but we'll just keep on, you know, looking at what customers are doing across a distribution of workloads and seeing if there's something that needs to be changed. The big thing for us, again, is that we did have a lot of conversations with customers and they're like, "Really? I don't have that many individual objects that are that big." But with the increase of, you know, cameras and phones and things like that, we are seeing more and larger objects, and we just wanted them to be able to grow unfettered in S3. >> And so, how does S3 evolve, and how has the roadmap changed? Because so far, everything that you told me is saying, well, you know, our customers were doing this or that. And obviously, here you live and breathe data, so you see the patterns, you see stats, you see the objects, and you also talk with customers. Is it only you talking with customers, seeing what's happening, what they're struggling with, what they're using more of, and then deciding to improve that, whether that be the limits or figuring out that you need a new data type because they're now building their own data types on top of it? Or is there also some kind of, all right, here's a vision, here's a roadmap of what we'll do? >> It's a great question. And in fact, one of the things that we talk about all the time is the coherency of S3, right? And so there are certain things that people always expect from S3. It's the traits of S3; it's the durability and availability attributes that we talked about. And so a fair amount of engineering goes on under the hood for that. Okay. And it's a set of capabilities that, you know, we may or may not have talked about today. In fact, I think back to 2020: I think we've launched over a thousand new capabilities since 2020 in S3. And some of them are what we think of as the 90% of the roadmap, which is what people ask for explicitly. Okay. And so, for example, you know, some of our media customers want the bigger object size, and so we delivered that. We have other customers that do a lot with batch operations. But then we have some things that we invent because, you know, we look at what customers are doing with the data and we ask ourselves how we can build that. Vectors kind of fall into that category. For vectors, when we looked at S3 and how S3 is evolving, we told ourselves: look, we can continue to make S3 the best repository for data on the planet. And we will. We will. We have engineers that come in every day working to make that so.
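A brief technical aside on the object-size discussion above: objects anywhere near these sizes are not sent as a single request; in practice they go up as multipart uploads, where the object is split into parts that are uploaded in parallel and retried independently. Below is a minimal sketch using boto3's high-level transfer manager; the file, bucket, and key names are hypothetical, and the thresholds are illustrative rather than S3's actual limits.

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Hypothetical names; part sizes are illustrative only.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MiB
    multipart_chunksize=256 * 1024 * 1024,  # upload in 256 MiB parts
    max_concurrency=8,                      # upload parts in parallel
)

s3 = boto3.client("s3")
s3.upload_file(
    Filename="raw_footage_8k.mov",
    Bucket="example-media-bucket",
    Key="footage/raw_footage_8k.mov",
    Config=config,
)
```

The same mechanism is what makes very large objects practical at all: a failed part is retried on its own rather than restarting the whole transfer.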
But there's this other element of how you make sure that the data you have is in fact usable, and usable in a way that's, you know, industry standard, like that Iceberg layer on top of our tabular data. It's also usable because AI models have now gotten so good at embeddings that you can have AI give you a semantic understanding of your data, if only you had the cost point of putting billions of vectors into storage, so you could actually understand and use your data in a different way. And so for us, a lot of it is taking a step back and looking not just at what customers ask us for. We want to remove the constraint of the cost of data, which is what we do in S3. And we want to remove the constraint of working with your data, which is what we do in S3 too. And when we can do both of those things, if we can make it possible that your data grows as your business needs it and you can tap into all the capabilities that you're getting with AI and how the world is changing for data, then we have a shape. We call it a product shape. Then we have a product shape. >> Product shape? >> What's a product shape? It's sort of like an emerging... when I think about S3, I think of it as almost this living, breathing organism where the shape of the product is evolving, but it's evolving with coherency around what you expect for the traits of S3, and it's evolving in a way that lets you steer into how you want to use data, not just now but in the future. And we will continue to evolve the product shape of S3 based on what you want to do with data. And so in a lot of ways, we're sort of transcending the boundaries of what object storage was, or what a database traditionally was, because now we have tabular formats, we have conditionals, and we're evolving into this new shape, and it is ultimately uniquely S3. >> It kind of sounds like you have all these microservices, and it's kind of evolving almost like a plant or a living organism, no? >> Yes, I am in fact a former Peace Corps volunteer in forestry, and so, you know, a lot of times I will go back to the natural world for my metaphors. And yeah, I mean, S3 is this living, breathing repository of data that lets people do things with data that they never thought possible. >> It's just interesting, because I think as engineers we don't often think to relate the systems that we build to a living organism, when in fact, obviously there's code, but as you said, there are people, there are servers, there are failures that now happen at a cadence you can almost predict; you can probably predict how many hard drives are failing today, in fact, at your scale. Do you think it's because of the scale, that when things become large enough they start to have these characteristics? Because what I find fascinating talking to you is that the way engineering works inside of S3 feels very different to how it works inside a smaller organization, your kind of startup, which, again, does, you know, terabytes of data or maybe even a few petabytes, but that's kind of it. And you've seen some of these organizations. What changes at this large scale? What do you think makes it feel so different, the world that you and the teams work in? >> It does. So in order for us to sustain the traits of S3 and to evolve it over time, we have to constantly go back to simplification.
We have a very complex system with all of our different microservices, but I keep going back to this: those microservices have to do one or two things really well, and we have to stay true to that. Otherwise, you know, the complexification of a distributed system makes it unmaintainable over time. And for S3, there's this concept that there is a "simple" in S3, and the simple in S3 is a couple of things. One, it's the simplicity of the user model, where not only do you have a simple API, but now you have the simplicity of using SQL with S3, or the simplicity of being able to leverage these AI embedding models, which makes semantic understanding of your data so much easier than having to annotate, you know, a whole metadata layer. And so that concept of simplicity is in the user model of S3, but under the hood, if you sit in on any of our engineering meetings, you will hear our engineers talk about how we make sure that we implement this capability with the greatest simplicity that we possibly can. >> Speaking of which, what type of engineers do you typically hire to work on S3, in terms of the traits and potentially past experience you look for? >> Well, we hire all kinds of engineers. You know, we have a lot of engineers on S3 who are early career; they're straight out of undergrad or graduate school. And like I said, we have a ton of engineers who have been on S3 for a long time, and everything in between. I think there's a really strong element in our teams that work on data around ownership. People feel this personal sense of commitment. I feel it. I feel it every day I come in, where I feel a personal sense of commitment to your byte: to the preservation of your byte, to the usefulness of your byte, to the ability for you to think about what your application does next and not the types of storage that you need or how you grow it. And that deep sense of ownership and that deep sense of commitment is a very, very common thread across our data teams, because we know that at the end of the day every modern business is a data business, and everything that people are trying to do with traditional systems, AI, whatever, is based on your data shaping the core of your application experience. And so that data is our responsibility, and we feel it very deeply. >> And what would your advice be to, let's say, a mid-career software engineer, someone who has a few years of experience working at different places, who, after listening to this, gets really enthusiastic and decides, one day I'd love to work on a deep, strong infrastructure team like S3? For, let's say, more experienced folks, what experiences or activities might you look for that would help you consider them? >> There's a strong value in relentless curiosity. Okay. And, you know, I talked a little bit about coloring within the lines and how, when you work on S3 or a large-scale distributed system which continues to reinvent what storage means, you're not really coloring within the lines. You're taking a step back and you're saying, you know, I will draw what the lines are today, and I will know that I might have to rub those out and draw new lines in the future for wherever things go. And so, you know, I have three kids; two are in university and one is in grad school.
And that is one thing that I think is really important: to always take a step back and take a look at the latest research. Some of the papers that I'll share with you are around how we took formal methods and brought them into storage systems, right, or how we thought about failure in a different way. That relentless curiosity and that creativity with engineering, I don't think you can go wrong with that. I think the next generation of software, no matter whether it's built on S3 or elsewhere, is all driven by the creativity of the engineering mind, and it is in all of us. We just have to unlock it and unleash it, and we will build amazing things like S3. >> And I also love that with S3, not only has S3 created something that did not exist, and I think it was just unimaginable because it didn't exist, but now I'm hearing of startups that are building on top of S3. I think Turbopuffer is a good example. You know, they're building innovation because now they have a base layer, and I feel there are different levels of innovation. You decide where you want to innovate: at the very lowest level, one level higher, and so on. And you just use the right primitives, right? In your case, this is just doing hardware and storage better than anyone. In the other layers, it will be using the right primitives better than anyone. >> Yeah, it's very exciting for us to see so many different types of infrastructure built on S3. >> And as a closing question, what is a book or a paper that you would recommend reading, that you enjoyed, and why? >> I read a lot of different papers. I am fascinated by how quickly the evolution of embedding models is coming along now, and in particular, a field of science that I'm quite interested in is multimodal embedding models, because, as you know, the world that we experience is multimodal, and therefore the understanding that we have of data should be multimodal as well. And so there's this whole field of science that's emerging quite rapidly around multimodal embedding models. That is something that I encourage people who are working in the field of data to look at, because I think that is the next generation of data. If you think about the next world of data lakes, I think it's actually going to be about metadata. It's going to be about the semantic understanding of our data, and understanding how that is created through vectors, how it's being searched, and how it's done across multiple modalities is, I think, an important area of both research and advancement. And so that's what I would encourage people to look at in the world of data. I think vectors are going to be quite big, particularly at the price point that we've introduced for S3 storage for vectors. And I'm excited about it. I think we're just getting started with data and an understanding of our data, and I can't wait to see what comes next. >> Amazing. And do you have any book recommendations? >> I will give you a book recommendation, just in case your readers are interested. It won't be in the field of computer science. It will be about the evolution of the ecology around us and supporting the bees, the native bees and insects around us. So, a tiny bit farther afield, but I'll give you a book recommendation, and if your readers are interested, they can take a look at how to support the bees of the planet.
>> Well, Mai-Lan, thank you very much. This was fascinating, and very interesting to get a peek into this massive world of scale and data, of respecting the byte, treating it well, and making sure that it's durable. >> It was great talking to you, and thank you to both yourself, I know you're a fan of S3, and to all of your listeners who use S3. We quite literally wouldn't be able to do what we do without the feedback and the encouragement from everybody who uses S3 today. So thank you for that. >> Just wow. I always suspected there was a lot of complexity behind a system like S3, but I just did not realize the scale of it. Whenever I worked on systems with even hundreds of virtual machines, the failure of one machine was a rare event and not something that we really counted on. During my conversation with Mai-Lan, she casually mentioned that several machines had failed during our conversation, which is something the S3 team knows about and prepares for, treating it like an everyday event. I personally really liked how AWS has two conflicting tenets heavily used on the S3 team: respect what came before, and be technically fearless. For such a massive system, it would be easy to say, let's move conservatively because of how many companies depend on us. But if they did so, S3 would fall behind. Finally, I'm still in awe that AWS put strong consistency in place, rolled it out to all customers, and did not increase pricing or latency, at S3 scale. This is an absolutely next-level engineering achievement. In fact, it was probably one of the lesser-known engineering feats of the decade. I hope you found the episode as fascinating as I did. If you'd like to learn more about Amazon and AWS, check out the exclusive deep dive I did with AWS's incident management team on how they handle outages, linked in the show notes below. In The Pragmatic Engineer, I also did other deep dives about Amazon and AWS. They are also linked in the show notes. If you enjoy this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating for the show.

Summary

This podcast episode provides an in-depth look at Amazon S3's massive scale, engineering challenges, and evolution from an eventually consistent storage service to a strongly consistent one, highlighting how S3 uses formal methods, distributed systems concepts, and innovative data structures to maintain reliability and performance at a global scale.

Key Points

  • S3 stores over 500 trillion objects and hundreds of exabytes of data across 120 availability zones and 38 regions, on tens of millions of hard drives across millions of servers.
  • S3 evolved from an eventually consistent system in 2006 to a strongly consistent one by implementing a replicated journal and cache coherency protocol, without increasing cost or latency.
  • The team uses formal methods and automated reasoning to mathematically prove correctness of complex systems like strong consistency at S3 scale.
  • S3's architecture is built around core principles: simplicity of API (put/get), durability (11 9s), availability, and correctness.
  • Engineering at S3 scale requires managing correlated failures, crash consistency, and failure allowances, with a culture of 'respect what came before' and 'be technically fearless'.
  • S3 continuously evolves with new data primitives like S3 Tables (for structured data) and S3 Vectors (for AI embeddings), enabling customers to use data in new ways.
  • The team prioritizes customer needs, innovating based on observed patterns like the rise of AI and embeddings, while maintaining core traits like durability and availability.
  • S3's scale is leveraged as an advantage; larger scale enables better performance and more diverse workloads, making the system more robust.
  • The team uses automated auditor systems to verify durability promises by inspecting every byte across the fleet, ensuring the system lives up to its promises.
  • S3's success is attributed to a strong engineering culture focused on ownership, creativity, and relentless curiosity, hiring engineers who are both technically fearless and respectful of the system's history.

Key Takeaways

  • To build reliable large-scale systems, prioritize core traits like durability, availability, and correctness, and use formal methods to mathematically prove complex properties.
  • Manage distributed systems by deeply understanding concepts like correlated failure, crash consistency, and failure allowances to ensure reliability.
  • Innovate by extending core capabilities (like adding SQL to S3 Tables) to make data more accessible and usable for customers, especially with AI.
  • Use scale as an advantage; larger systems can handle more diverse workloads and achieve better performance, but require robust engineering practices.
  • Foster a culture of ownership and technical fearlessness where engineers feel responsible for the system's correctness and are empowered to innovate.

Primary Category

AI Engineering

Secondary Categories

Machine Learning Data Engineering Programming & Development

Topics

AWS S3, cloud storage, distributed systems, eventual consistency, strong consistency, formal methods, correctness, durability, availability, failure tolerance, vector storage, S3 Tables, Iceberg, Parquet, data lakes, data oceans, storage pricing, intelligent tiering, correlated failure, crash consistency, failure allowance, microservices, distributed systems design, formal verification, automated reasoning, engineering culture, product shape

Entities

people
Mai-Lan Tomsen Bukovec, Andy Warfield, Steve Huynh, Dave Anderson
organizations
AWS, Amazon, Statsig, Sonar, WorkOS, Pragmatic Engineer, Pragmatic Summit, Apache Hadoop, Netflix, Pinterest, OpenAI, Ramp, Cursor, Linear, Supabase

Sentiment

0.85 (Positive)

Content Type

deep-dive

Difficulty

advanced

Tone

educational, technical, entertaining, inspirational, professional