Live streaming at world-record scale with Ashutosh Agrawal (ex-Jio / Disney+ Hotstar)

pragmaticengineer · Watch on YouTube (video ID: qXJ3S3T3xJY) · Published February 11, 2025
Duration: 1:02:12
Views: 119,470
Likes: 2,292

Transcript: 12,658 words · Language: English · Auto-generated

We used to run something called a game day. This is basically a simulation of how an actual match is going to be. We generated not just traffic — we simulated the entire operating protocol. We'd say, okay, the match is going to start at 7:00 p.m., let's say. Now every system is supposed to scale beforehand; there's a timeline to it, there's a checklist to it, and so on. No matter whether the teams are ready or not, we will start the live stream and we'll start sending traffic.

Wow, really? You did that?

They don't know what traffic is going to come, what kind of pattern is going to come, or anything — because that's the exact state of our production: you don't know what is coming your way.

Ashutosh Agrawal was a software architect and principal engineer at the largest streaming service in India, JioCinema, which has since merged with Disney+ Hotstar. In the middle of 2023, the system he architected set a world record at the time: 32 million parallel connected live streams during the finale of the Indian Premier League cricket tournament.

But how did they do it? In today's episode we cover how live streaming works behind the scenes and how large-scale live streaming systems are architected; the trade-offs in system design, and how server-side load, stream latency, and stream smoothness are all connected; the importance of capacity planning and drills, including what the game day drill meant at JioCinema; and many more details. If you're interested in large-scale systems or live streaming, this episode is for you. If you enjoy the show, please subscribe to the podcast on any podcast platform and on YouTube.

Welcome to the podcast!

Thank you, thank you, Gergely.

Can you help us imagine what it was like, going back to the day this world record was set? It was roughly 32 million concurrent connections at the time. You knew this big match was coming up, there was a system you had architected, you were preparing it for scale, but then you got a larger spike than ever. What happened? Were you in the office? Were you at home? Was it stressful? Was it chill?

Sure. So most of the world records we set in the past were generally set during the finale of the event, and by the finale you've probably settled into the event. These are not one-off, one-day events like on other platforms — the Indian Premier League is about a 70-day event, so it's 70 days of continuous madness, and every day there are more than 15 to 20 million users watching, depending on the game and the players who are coming in. So it's not like that, to be very honest: the finale is relatively calmer than the opening week. The opening week is where most of the madness is. By the finale you understand the traffic pattern better, you understand what your requirements are, and your protocols are set in much more depth.

In the initial part of the tournament it is all chill — until you're operating above the numbers you planned for. The moment you start to see it getting into the yellow zone is when things become interesting, because then you have to get away from the automated mode — like a car driving itself, running the platform — into a manual mode, which requires you to be looking at all the metrics, looking at every issue that's coming in, and ensuring you're triaging it in a very short period of time; otherwise those issues become noise. So you also have to ensure that the board of ongoing issues is a clean board.

In terms of setup, we typically prefer to be in the office, in the same location, because if there is an incident it's much easier to get into a room with all the key people and be able to take calls and make decisions. We have done this setup remotely as well — we had a strict protocol on how we operate remotely to ensure there isn't too much madness, because it's very easy for the Zoom call to go crazy. The Zoom call on which all of us are connected has over 200 people, with multiple partners from across the globe present to support the event. It's not just our team or my team — it's multiple partners, and partners of partners, who are on the call helping us scale through the event.

This episode is brought to you by WorkOS. If you're building a SaaS app, at some point your customers will start asking for enterprise features like SSO authentication, SCIM provisioning, and fine-grained authorization. That's where WorkOS comes in, making it fast and painless to add enterprise features to your app. Their APIs are easy to understand, and you can ship quickly and get back to building other features. WorkOS also provides a free user management solution called AuthKit for up to 1 million monthly active users. It's a drop-in replacement for Auth0 and comes standard with useful features like domain verification, role-based access control, bot protection, and MFA. It's powered by Radix components, which means zero compromises in design: you get limitless customization as well as modular templates designed for quick integrations. Today, hundreds of fast-growing startups are powered by WorkOS, including ones you probably know, like Cursor, Vercel, and Perplexity. Check it out at workos.com to learn more. That is workos.com.

This episode is brought to you by CodeRabbit, the AI code review platform transforming how engineering teams ship faster without sacrificing code quality. Code reviews are critical but time-consuming. CodeRabbit acts as your AI copilot, providing instant code review comments and the potential impacts of every pull request. Beyond just flagging issues, CodeRabbit provides one-click fix solutions and lets you define custom code quality rules using AST grep patterns, catching subtle issues that traditional static analysis tools might miss. CodeRabbit has so far reviewed more than 5 million pull requests, is installed in 1 million repositories, and is used by 50,000 open-source projects. Try CodeRabbit free for one month at coderabbit.ai
and use the code PRAGMATIC.

Let's go into the specific details. We're talking about live streaming — can you explain to us how video streaming works behind the scenes, at an architectural level?

Sure. Let me share my screen to talk about how it works — just give me a second. So there's something called a source feed. This source feed is coming from the venue — well, actually the source feed comes a step later; from the venue we have feeds like cameras.

So the venue in this case is the stadium where the match is, and you've got a bunch of cameras there.

Yes. All the cameras are connected over fiber, and all of that comes to the PCR. The PCR is the production control room, where all the feeds from the stadium come in, and there is a director. It's like a movie production: there is a director, there's a producer, there are people operating the cameras, and the director is telling the people on the ground which camera to use, what angle to show, which feed, and so on. All of that is happening over there — lots of people, lots of yelling, lots of screens. That's a place we are not allowed to go into; when it's on air, nobody is allowed in there, to be very honest.

From the PCR we get something called the source feed. This is the actual production feed. Now, I'm talking about the video part of it first; then I'll come to the server-side, or backend, part. This source feed is fed into something called a contribution encoder. The contribution encoder's role is to convert the source feed — the source feed is very high quality, so you need to compress it a little bit so that you can stream it to the cloud. From there the source feed goes into the cloud — a cloud ecosystem like AWS or GCP, or wherever the encoding is happening. This is again a peered, private link — nothing goes over the public internet, by the way.

So this is like your master video feed: high quality, but already a little bit encoded, or compressed, to make it manageable.

Yeah. Source feeds can be in the hundreds of Mbps — say 150 Mbps. The contribution encoder will bring it to a standard profile of around 40 Mbps, depending on whether the event is being streamed in 4K, 1080p, or something like that. From there it goes into the cloud, and within the cloud there is your distribution encoder. This is the system that takes care of encoding it into HLS, DASH, or whatever format the user is going to consume.

And HLS and the other formats — these are formats that different players are able to support, right?

Yes. The output is an HLS stream or a DASH stream, in different combinations: you have one type of stream for mobile and a different type of stream for TV, because the form factors are very different and the networks they operate on are very different. So with all those combinations, in the past we have output more than 100 variants of streams to the CDN.
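To make the chain Ashutosh describes concrete, here is a minimal sketch of the stages: source feed from the production control room, a contribution encode to ship it to the cloud, then a distribution encode that fans out into per-format, per-device renditions. All bitrates, rendition names, and formats below are illustrative assumptions, not the service's actual production profiles.

```python
# Illustrative sketch of the live encoding chain described above.
# Bitrates, rendition names, and formats are assumptions, not real profiles.

from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    bitrate_mbps: float


# Venue cameras -> PCR output: very high-bitrate "source feed".
source_feed = Feed("ipl_source_english", bitrate_mbps=150.0)


def contribution_encode(feed: Feed, target_mbps: float = 40.0) -> Feed:
    """Compress the source feed just enough to ship it over a private link to the cloud."""
    return Feed(f"{feed.name}_contrib", bitrate_mbps=min(feed.bitrate_mbps, target_mbps))


def distribution_encode(feed: Feed) -> list[str]:
    """Produce the per-format, per-rendition outputs that get pushed to the CDN."""
    formats = ["hls", "dash"]                   # what the players consume
    device_classes = ["mobile", "tv"]           # different profiles per form factor
    ladder = ["240p", "480p", "720p", "1080p"]  # adaptive-bitrate rendition ladder
    return [
        f"{feed.name}/{fmt}/{device}/{rendition}"
        for fmt in formats
        for device in device_classes
        for rendition in ladder
    ]


contrib = contribution_encode(source_feed)
variants = distribution_encode(contrib)
print(len(variants), "variants for one language feed")
# 2 formats * 2 device classes * 4 renditions = 16 here; with more formats,
# DRM flavours, and device profiles, the real count runs past 100.
```

Multiply an output set like this by the 13-plus language feeds mentioned next and the variant count quickly reaches the "500-plus" range.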
Just so I understand: you are transforming this into multiple streams — you create all of these different ones, as you said, up to 100 of them.

Yeah — and actually, if you think about it, there are multiple source feeds for the different languages. We used to stream in about 13-plus languages, so there would be one source feed coming in for each of the languages, and with the platform combinations and all of that, the output feeds get into the range of 500-plus.

So this whole thing is becoming a lot more complex than you would have thought.

Yeah. And then there is an orchestrator on top of it — this is where our engineering comes in; most of what I've described so far is probably using some partner technology. The orchestrator is basically controlling all of these systems — not the venue and source feed piece; it starts somewhere around contribution encoding: your cloud infra orchestration, what endpoint to push to, what the config of the distribution encoder should be, what CDN endpoints to use. All of this is managed by the orchestrator. This is our engineering product doing the orchestration, because on a single day we were not just hosting the IPL — we used to host other events as well. There are a lot of football games going on, a lot of other esports events, and so on. So while we're focused on the Indian Premier League here, we also used to have another 50 events to take care of, and the orchestrator's role is to ensure that all of those workflows are set up properly.

From here, the orchestrator generates the playback URLs. The orchestrator takes care of knowing what the final playback URLs are going to be — these are your playback endpoints — and once the feed and everything is started, it sends them to something called a content management system, which is what our users are interacting with. The orchestrator pushes all the generated playback URLs over there.

And a playback URL — is this the endpoint of a node or a machine that has the right format?

No, it's not the endpoint of a machine; it's an endpoint on the CDN where the video will be available. For every TV and mobile combination — Apple, Android, and whatever variants exist for each of them — we generate a playback URL. A playback URL follows a certain spec, and along with that spec we push it to the content management system. Your client apps generally interact with the content management system to render the browsing experience: whenever you open the app, you're seeing the content, the list of contents, and everything. That's how it works — but that's not all. This is just the browsing experience, and typically, to scale, you would put a layer of good-enough caching in between to ensure that things are scalable. That's one of the secret sauces of making things work.

So far, what we have spoken about is that you can open the app, you can see the content, you can click on a piece of content and see its details. The moment you hit the play button, you call a much more complex system, which is generally the playback system. I can't talk about the exact internals of it, but this system is responsible for ensuring that the user is authorized to watch the video: it will talk to a user system, it will talk to encryption and DRM systems, it will talk to the content management system, and then give you a kind of encrypted URL, which is what the client will use to play back.
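As a rough illustration of the orchestrator's job — generating a playback URL per platform/language combination and registering it with the content management system — here is a hedged sketch. The URL layout, the hostnames, and the `register_with_cms` function are invented for illustration; the real system's internals aren't public.

```python
# Hypothetical orchestrator step: generate playback URLs for every
# (platform, language) combination and push them to the CMS.
# URL scheme and CMS interface below are illustrative assumptions.

from itertools import product

LANGUAGES = ["hindi", "english", "tamil", "telugu"]   # the real service ran 13+
PLATFORMS = ["android", "ios", "web", "tv"]
FORMATS = {"android": "dash", "ios": "hls", "web": "dash", "tv": "hls"}

CDN_BASE = "https://live-cdn.example.com"             # placeholder CDN hostname


def build_playback_urls(event_id: str) -> dict[tuple[str, str], str]:
    """One playback endpoint (a CDN manifest URL) per platform/language pair."""
    urls = {}
    for platform, language in product(PLATFORMS, LANGUAGES):
        fmt = FORMATS[platform]
        manifest = "master.m3u8" if fmt == "hls" else "manifest.mpd"
        urls[(platform, language)] = (
            f"{CDN_BASE}/{event_id}/{language}/{platform}/{manifest}"
        )
    return urls


def register_with_cms(event_id: str, urls: dict[tuple[str, str], str]) -> None:
    """Stand-in for the push to the content management system the apps browse."""
    for (platform, language), url in urls.items():
        print(f"CMS <- event={event_id} platform={platform} lang={language} url={url}")


register_with_cms("ipl-final-2023", build_playback_urls("ipl-final-2023"))
```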
And are we talking about live streaming here, or is this playing back existing streams?

This is for live streaming. Whatever I'm showing in the diagram is all about live streaming. In a VOD system you would not have this complex setup with the contribution encoder, cloud, and distribution encoder. It would be a little simpler: there would be a single encoder which encodes the content and puts it onto the CDN, and there would be a similar orchestrator for VOD content that takes care of that. So the workflows are almost similar, but there is more complexity in live streaming on this part — on how to select which URL to play back — than in a VOD system.

One question I have: you mentioned CDNs, and CDNs are great for caching stuff and spreading it out so it sits at the edge. But this is a live stream, right? Every, I don't know, 100 milliseconds you'll have new frames arrive — I'm not sure exactly how that's done. How does this square with the fact that CDNs are great for caching but might not be the best for real-time stuff? Or are they good for real time?

They are, actually. The secret sauce of how live streaming works is in the HLS and DASH specs, to be very honest. I can talk a little more about HLS so you get a sense of how it works. In HLS you have something called a master manifest. Whenever you see a playback URL, it will look something like master.m3u8. This master manifest contains multiple pieces of information: it will say "this is a 240p video" and point to a child playlist — child_240p.m3u8 — and similarly for 480p, and so on. So you have layers of manifests. And within each of the child manifests you have a list of segments: segment1.ts, segment2.ts, and so on.
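Here is roughly what that layering looks like, with a few lines of Python pulling the variant playlists and segment names out of a toy manifest. The URIs, bitrates, and sequence numbers are simplified examples, not a real production manifest.

```python
# A toy HLS master manifest and child playlist, plus a tiny parser, to illustrate
# the layering just described: master.m3u8 -> one child playlist per rendition
# -> segment files. All URIs and numbers are made up for illustration.

MASTER_M3U8 = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
child_240p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1400000,RESOLUTION=1280x720
child_720p.m3u8
"""

CHILD_720P_M3U8 = """\
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:101
#EXTINF:4.0,
segment_101.ts
#EXTINF:4.0,
segment_102.ts
"""


def variant_playlists(master: str) -> list[str]:
    """Return the child playlist URIs referenced by a master manifest."""
    return [line for line in master.splitlines()
            if line and not line.startswith("#")]


def segment_uris(child: str) -> list[str]:
    """Return the media segment URIs listed in a child (media) playlist."""
    return [line for line in child.splitlines()
            if line and not line.startswith("#")]


print(variant_playlists(MASTER_M3U8))   # ['child_240p.m3u8', 'child_720p.m3u8']
print(segment_uris(CHILD_720P_M3U8))    # ['segment_101.ts', 'segment_102.ts']
```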
And a segment is, you know, a certain time period within the stream?

Yes.

So how long are segments, usually?

That is configurable. It's all subject to how we design it and what kind of latency we want to offer to the users. Typically it's between four and six seconds — that's what's widely used in the industry.

So four-to-six-second files, one after the other.

Yes. What the client is essentially doing in the HLS protocol is this: when you initialize a video, it calls the master manifest, gets the list of all these child manifests, and keeps track. Based on the starting bitrate and the user's network conditions, it will decide whether this user should start from 480p or 720p and so on. Then the client just calls that child manifest again and again: if your segment duration is, let's say, 4 seconds, it will call the child manifest every 4 seconds, asking for an update.

And the update will be the next file — the next segment.

Yes. It's a sliding-window manifest, which typically has 200-plus segments — you select the duration you want to keep in the manifest — and the player keeps pulling this manifest and getting the list of segments. As soon as it sees a new segment, it goes to the CDN, requests it, downloads it, and puts it into its own buffer. That's how it works. So the point I wanted to call out about the role of the CDN is that the CDN is not operating at 100 milliseconds; it's operating at a 4-to-6-second gap. But still, with so much caching it becomes tricky, and this is where our engineering work comes in: how we tune and fine-tune the CDN configurations, what the right TTL to use is, and so on. Because if you use too short a TTL, then everything is expiring and you're always missing the cache; and if you use a very long TTL, then you have a chance of serving stale data.

Yeah — it's a fine game. But now I guess I understand: when there's a big game happening, be it the World Cup or something like that, and you're watching it on a stream, every now and then you hear a massive scream from the neighborhood, and then two or three seconds later you see the goal, or you see the team scoring. I guess this explains it, because there will naturally be up to a few seconds of delay, depending on which endpoint you're hitting and when you started the stream.

Yeah — good that you brought this in; I'll explain. Any encoding system is more efficient if it has a look-back period. Compression technology basically says: I have a reference frame, and the next frames are deltas from this reference frame. So for compression to be more efficient, the more you can look back, the better. There's something called a group of pictures, or GOP: you say my GOP is 2 seconds, or 1 second, or 4 seconds, which is to say "I'm going to optimize compression over this window." So every encoding stage adds to the latency you're talking about — that's one place where latency is added. The other latency is added on the client side.
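A minimal sketch of the client behaviour just described: pick a starting rendition from measured bandwidth, re-poll the child playlist roughly every segment duration, and pull any new segments into a local buffer. The `fetch` function and the segment-naming logic are placeholders assumed for illustration; a real player such as ExoPlayer does far more.

```python
# Simplified sketch of the HLS client loop described above. `fetch` stands in
# for an HTTP GET against the CDN; renditions and timings are illustrative.

import time

RENDITIONS = {          # average bandwidth per layer, bits/sec (illustrative)
    "child_240p.m3u8": 400_000,
    "child_480p.m3u8": 800_000,
    "child_720p.m3u8": 1_400_000,
}
SEGMENT_SECONDS = 4


def fetch(url: str) -> bytes:
    """Placeholder for an HTTP request to the CDN edge."""
    return b"\x00" * 1_000_000          # pretend every object is ~1 MB


def pick_rendition(measured_bps: float) -> str:
    """Choose the highest layer whose average bandwidth fits under what we measured."""
    fitting = [u for u, bps in RENDITIONS.items() if bps <= measured_bps * 0.8]
    return max(fitting, key=RENDITIONS.get) if fitting else "child_240p.m3u8"


def play(live_for_seconds: int = 12) -> None:
    buffer: list[bytes] = []
    seen: set[str] = set()
    measured_bps = 2_000_000            # seed estimate; refined as segments download
    playlist = pick_rendition(measured_bps)

    for _ in range(live_for_seconds // SEGMENT_SECONDS):
        _manifest = fetch(playlist)     # a real client parses this for new segment URIs
        new_segments = [f"segment_{len(seen) + 101}.ts"]  # pretend one new segment appeared
        for seg in new_segments:
            if seg in seen:
                continue
            start = time.monotonic()
            data = fetch(seg)                          # download from the CDN edge
            elapsed = max(time.monotonic() - start, 1e-3)
            measured_bps = len(data) * 8 / elapsed     # update bandwidth estimate
            buffer.append(data)
            seen.add(seg)
        playlist = pick_rendition(measured_bps)        # adapt up or down if needed
        time.sleep(0)                                  # a real player waits ~SEGMENT_SECONDS

    print(f"buffered {len(buffer)} segments at {playlist}")


play()
```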
Now, when you're streaming over the internet versus on a TV: on TV, if you miss downloading the equivalent of a segment, or certain frames, you'll probably see a blackout and you move forward. TV is not going to wait for you; it's not going to optimize for you to be at the right place. You're always seeing the latest point, because it's a broadcast protocol.

It's real time.

It's real time, right. So if you miss something, you miss it: if there's rain and you miss that packet on the TV, you've missed it. But in internet streaming, every user maintains a buffer, and to ensure smooth streaming you will always have some buffer configuration on the client side taking care of that smoothness. So whenever you start a playback, you're always starting 5 to 10 seconds behind the live point — because if I keep you on the live point and there's a problem for any reason (it's distributed computing, it's a complex system), and for any reason you fail to download that particular segment, you'll end up seeing a spinner and have a bad experience. So it's a fine choice when you're making it: scaling, versus the number of devices you want to support, versus a better UX, versus being closer to the live point. We found a sweet spot somewhere where it works for the users — again, I can't disclose the numbers, but that's the complexity behind it.

I really like that, because this is what it's about: trade-offs. You can choose between having some latency — okay, 5 seconds — but then it works pretty smoothly for most users and you don't have crazy infrastructure requirements. Or, if your requirement was — say someone came up with "you shouldn't have more than 500 milliseconds" — you would need to architect a much bigger system, with very different trade-offs.

So I'll give you an example. Let's say you have 4-second segments. You're downloading 15 segments every minute, and you're making 15 manifest calls — so you're making 30 calls to the CDN every minute. Now, if I change the segment duration to 2 seconds, your volume of calls will double, which means that even though you're downloading the same amount of data, the number of requests you're making — which is the number of compute cycles triggered on the CDN — will increase. And no matter whether we say the internet and the cloud are infinite, they are not really infinite: there is a finite amount of space and resources and everything. So when we are designing for scale, we have to factor all of this in from a capacity perspective — from CDN capacity. Sure, networks have moved to 5G and all of that, but there are so many choke points within the infrastructure that you have to account for, that you have to be aware of. And that is where the playback system becomes very crucial, because it is aware of all these nuances and decides the best experience for the user.
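The segment-duration arithmetic in that example is easy to reproduce. Here's the back-of-envelope version, extended to a hypothetical 20 million concurrent viewers to show why halving segment length is a capacity decision as much as a latency one; the concurrency figure is illustrative, not a quoted production number.

```python
# Back-of-envelope version of the example above: per-client CDN request rate,
# and what it aggregates to at large concurrency. Numbers are illustrative.

def cdn_requests_per_minute(segment_seconds: int) -> int:
    segments = 60 // segment_seconds        # one media segment per segment duration
    manifest_polls = 60 // segment_seconds  # child manifest re-polled just as often
    return segments + manifest_polls


for seg in (4, 2):
    per_client = cdn_requests_per_minute(seg)
    viewers = 20_000_000                    # hypothetical concurrency
    total_rps = per_client * viewers / 60
    print(f"{seg}s segments: {per_client} req/min per client, "
          f"~{total_rps:,.0f} req/s across {viewers:,} viewers")

# 4s segments: 30 req/min per client, ~10,000,000 req/s across 20,000,000 viewers
# 2s segments: 60 req/min per client, ~20,000,000 req/s across 20,000,000 viewers
```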
Yeah, I guess this just shows how different internet streaming is from traditional television. When you had an analog signal going out over the air, you just sent it one way, every device caught it, and if it missed something, it missed it, as you said. Or when you have traditional cable — the cable connection to the TV — again, you just push out the same real-time signal. It just doesn't have this level of complexity: you don't need to worry about the internet infrastructure, the internet providers, your bandwidth, all of this changing. It's fascinating.

I do have one question related to this. Adaptive bitrate streaming is a term I've heard, and I can kind of imagine what it means — you've already touched on it. How does it work, and who controls the adaptive bitrate: is it the server or the client?

It's a mix. Largely the capability is driven by the player, but it's all about fine-tuning. Adaptive bitrate, if I were to explain it simply: you have a network speed of, say, 4 Mbps. Every layer you see within the video — 240p, 480p — has an average bandwidth: 240p would probably be at around 200 Kbps, 720p at about a Mbps, and so on. What your player is doing is, as and when it's downloading a segment, it is also measuring your bandwidth — you're downloading, say, a 4 MB segment chunk, and it knows how much time that took, so it knows an average speed. And when it sees things drifting — your download speeds are getting slow — it will recalibrate, and the player will jump to a lower layer.

So it goes to the manifest and says, "Hey, can I have the 240p? Because I can measure that I'm not fast enough to download all this."

Yes, exactly. That's where you see the blurring of the video, and all of that seamless switching is taken care of by the player. But that is the simplest implementation in ExoPlayer or any client-side player. If you just do that and try to run an IPL match on it, I don't think it will work — you'll encounter numerous issues, you'll get a bunch of customer complaints, and so on. So there are a bunch of parameters the player allows you to control: what thresholds to use to switch between layers, what your starting bitrate should be, what your threshold for errors is, how much buffer duration is left before you make a decision to switch. All of those parameters are fine-tuned when we're doing the engineering. And some of these parameters are governed from the server as well: the server is also looking at how things are operating across the entire ecosystem and then making some calls. If you have to degrade, the server can choose to do degradations by limiting the number of layers you can access.

So what you're saying is: when your server is getting overloaded, you also have the choice to start serving either longer segments or lower bitrates, and so on, in order to make these decisions.

Yeah. So again, in the place where I was saying I'm pushing numerous combinations to the CDN, there are some combinations that are designed from a reliability perspective: we can offer the best experience up to a certain concurrency, but after a certain concurrency we might have to switch users to something a little more degraded.

Yeah, it makes sense.

So all of this is part of the engineering we used to do — this is not something the CDN offers. And sure, the CDN is handling a lot of the segment
delivery and stuff but that orchestration is where the tricky business is yeah so so so I guess is is it safe to say just you know there's that thing where you're streaming something may that be a live video either live video or video on demand and when it gets blurry I guess two things can happen either it's your client side deciding your bandw is not long enough or if it's a live event it might be that the you know the engineering team or or the system decided that in order to sustain all this High load it needed to uh switch back so to conserve the server resources yeah so the third aspect to it right your client uh now now this is where your uh eul uh kind of Concepts come into the internet right when as in when the congestion starts uh let's say you're connected to a tower now 100 people were watching earlier the tower has a finite capacity to stream Downstream to let say Sorry by Tower you mean like a a mobile Tower mobile Tower 4 5G Tower 4G or a 5G Tower right they have finite capacity on the number of clients they can handle like like an access point if they start to get congested they will start to naturally throttle you or start band with sharing which will also trigger slow download so this is not a parameter where I'm going and controlling and saying that hey the user should get blurry or this thing like that it is it is natural which will happen right depending on the infrastructure layout and where you are connected to there there's so many layers into you know what we started off is just you know like oh just stream live video oh it's so it's it's funny right when we get user complaints you have to look at at so many places like was it R system was it our client or was it like some intermediary Network and if it's intermediary Network how do you prove that there was a network inis because we not we we do not control those systems you do not have visibility on the metrics of those systems so we have to build Intelligence on the client to figure out all all of this yeah it's it's very complex when I used to work at Skype uh and you know it was on my LinkedIn people knew and sometimes people would either message me or or friends of a friend saying oh you know you work at Skype so I had this problem where had this video call and it got really blurry like why was that and I was like you know like like like you said I I didn't have as deep of understanding of everything that happened but you mentioned that you cannot monitor everything what is it that you can monitor because I I I I imagine you know just as an engineer you need to have as much information as possible uh to you know like figure out what is the best experience but there's going to be practical limits and also just kind of data collection limits so what is it possible to Monitor and what is it practical to monitor uh on the client side and then on server side so the amount of data collection and processing you can do is infinite right there's no end to data collection and analytics right what it comes down to is what will help you to uh in in a case of an in uh an issue or an incident what will help you to figure it out right and the way uh I I describe metric is that there are leading indicators and there are trailing indicators okay leading indicators are things which will ahead of time tell you that there is a problem about to happen uh so we would we would have 305 metrics which are defined as leading indicators and we are very sensitive we would be very sensitive towards those metrics like on the client side it 
would be uh the amount of buffer time you're seeing which is how many how many seconds did you see a buffer in a minute or uh what is a play failure rate which is basically when you start a video did you encounter a failure or while watching a video did you encounter a failure which is a fatal in nature like not not the app crash but the video crash kind of a thing so there are some of these metrics which will which will be treated as a leading indicator and these metrics would be collected on the client side there will be corresponding metrics on the server side uh we obviously you'll be monitoring bandwidth number of requests coming in latency response times bunch of other things which you'll be monitoring on the server site uh the the leading indicator get priority over the traffic like you would always want to have them process ASAP within a minute sometimes under 30 second also so that you reactive you're getting alert before anyone complains to you so that's that's where you optimize for then you have trailing data which is basically let's say you figured it out that there is an issue now you need to get into the details of it right which and and the funny thing with the live streaming is that your time is also moving right it's it's like it's it's just not that okay there is a system state or there is a state right that the state is also constantly evolving so you need to measure a lot you need to collect a lot of data and you need a lot of tooling to be able to visualize uh what is happening and how it is being rendered so so there's a lot of data collection which get processed a little slowly it's available on dashboards for us to uh consume and figure it out but uh that's how we look at metrix I'm I'm and then on the client side obviously you're collecting all of this data but then you need to like push it back to the surver right like every every you know again I'm assuming this is an enging tradeoff like if it's every second every 10 seconds every minute that kind of stuff I I don't think every second would scale for US imagine like 30 million users watching concurrently and every second they're sending data is like like I cannot scale that system to be very honest yeah but it's interesting because now you know like we have service clients downloading and then we also have the upload of of the metrics which is also this fine balance I assume uh when you when you decide okay even just this relatively simple example of how often should a client upload their leading indicators how did you decide on you know what that will be may that be every minute or so did you you know go to a whiteboard and figure out what does this mean for a server how much can we process how many metab bytes was it a mix of uh prototyping and doing load tests or or was it a combination it's it's combination right it's not an easy decision in fact uh during the event itself the the the frequency keeps changing as in when you're scaling as in when you're scaling you would be okay to compromise on certain aspects uh to scale the system and have some sense right you will also use some some form of sampling as well you'll use a balanced approach to figure out how to do so the and and you can collect all the data at every second level but you have to process there is cost associated with it as well right so and and just not cost there is again cloud is finite so even though CDN are finite even your uh the the cloud service providers also have finite capacity so you have to take into account that do you want to 
prioritize your playback systems and content Management systems or do you want to prioritize data collection so all of this has a priority and an association to say what what tier of the system falls into and that tier design defines how the system degrades so there's always a degradation framework which is to say that okay if if the match is not huge then yes we'll collect more optimistically at every 15 seconds or every 30 seconds or whatever that number is but as the game starts to uh become hot or or more and more traffic starts to come in then you would at runtime start to degrade systems which means that uh degradation happens in different shapes and formats you'll probably reduce data which is coming in you will change the interval at which you're collecting data you would start sampling uh all of those parameters start to come into the play and all of this is part of the design right this is all the exercise we do before the game uh on resource planning on capacity planning like capacity planning is a huge exercise and and that also decides what are the cut off points right sometimes you will say that okay I need to support up to let's say x million users or 50 million users or 100 million users that means you have to work backward from there and say that if I have to support 50 million users what are the P0 service requirements and then you say that okay this is the residual capacity so the P2 Services have to operate in this residual capacity which means that I can do certain things I can't do certain things so on capacity planning you you already kind of outlined how it works but you said as a big exercise what does a good capacity planning look like and and and you know like and why does it take so long because you know what what you describe here honestly yeah you know just do do the maths do this that but obviously it's more complex right so uh capacity planning is a very complex exercise in fact uh for for most of the video streaming the capacity planning for the next year starts at the end of the first like the previous year oh because you have to lay down infrastructure you have to work with providers to ensure that their their data centers have to scale up their Network have to scale up their power requirements have to scale up all of that has to be taken care of it's not that simple so there's physical infrastructure being involved right we not you cannot just like providers will not go and scale their infrastructure sure they have a 10% plan 20% plan but if you're operating at a much more intensity and adding more year on year then you have to work backwards from there so that is one aspect of it the other aspect is uh during during the tournament there can be overlap with other events so there might be reservations done by other companies like there's a big eCommerce sale going on let's if there's a Black Friday like Black Friday is not there but that so so so the point is you have to look at the resource you have to figure it out you have to ensure that you blocked enough resources uh for the time uh for the time period and the the other complex pieces how much to ask every year the the the of the game changes there is no set pattern how many people will come in so there there are certain models we will look at it and and see that okay this is the user traffic which has been coming in which was in the previous year and that's we have seen platform to grow by X percentage so we'll apply some modeling to it and figure it out what should be the next year's optimistic 
number uh actually not an optimistic number it's a very pessimistic number we operate with a very pessimistic number and then work backwards from there so uh so the the reason it's complex is because you have to work with so many providers understand how much resourcing and all of that is there in which means they have to go back within their engineering teams get that numbers share that number with us and just to just to make it really concrete are we talking about how many virtual machines you'll need how much bandwidth you'll need how much like what are the the things you know the the things behind the resources that you know you're going to have a list of like oh we need this much primar it boils on the same core system metrics like compute Ram dis and network yeah uh all of that boils on to that like every tier has those numbers which you have to figure it out uh when you're running uh servers it's going to be number of vcpus generally that is a limiting factor disk and Rams are not limiting in in Cloud uh Network becomes an interesting problem for streaming part of it not the a actually it becomes interesting on API API side also uh because yes video video you are doing like tens of tvps and probably more than that but API is also consuming a significant amount of uh data download so and and those networks are designed differently than the video networks because video networks are more designed from a heavy caching perspective that the offload should be super high but apis is mixed traffic there security there's different tiers of security there is pi data flowing in there is your firewalls and so on so you have to look at all of those parameters uh when I say parameters again it's Network consumption compute resources at each H figure out what is the least common denominator and work work from there yeah and then you did mention before that the cloud is finite uh now I didn't hear Too Many people saying this because usually when you you know like I'm a startup I'm building something that like I will be aware of the cloud is expensive but I'm never going to think that it's fite because as far as I'm concerned I can always pay more to you know scale up in your case that's not really the case right so and and this is because it's so you're so large right this is so large yeah yeah the the use case is so large that like year on year the growth is so much that it's very hard to preempt right and you're as I said that you're doing pessimistic planning because you want to support best best of best uh the infrastructure exercise the the the growth of infrastructure is relatively hard because I I I'll give you an example right so let's say in a in a city like Bangalore Mumbai there is a finite number of uh tvps available uh the cap cian capacity available now if you have to add more capacity the uh the providers need to purchase real estate they need to add links to it they need to deploy uh servers to it the servers need to be procured imported from another country they need to be configured onboarded as part of the network and so on so it's a slow process it's not it it can't happen overnight yes so basically what you're saying is let's say you as a you know goo or any other streaming service is like okay we think that that this game that's going to be in 6 months last year in Bangalore 5 million people watched I'm just saying a number this year we think it'll be 8 million because of our projections uh we think the current cian capacity based on our calculation is not sufficient so now 
you're going to go to the CM providers and try to make a case for them to invest that's that's pretty hardcore not even engineering this is like you know it goes beyond engineering yeah and and and you would you started a discussion saying that hey why do you why don't you Cote because I feel this is a much more harder problem than actually coding it to be very honest okay I'm sorting to I'm starting to understand it's important to stay connected but it seems like this seems just as important it's probably more important to do this thing and you know go to the vendor you know sit in a meeting with them and explain to them why why you need to order those servers from wherever yeah another another common question I'm asked on this subject is that hey oh sure Bangalore doesn't have Cap City why don't you go and get it from Delhi it's not that easy uh anytime like if if you understand it's a it's a caching infrastructure it's a tree right so Edge right it also Edge yeah so it needs to be closed the moment you the the more Upstream you go into a branch and try to get it the the the capacity available over there is finite so if if I were to say that hey let's root this excess of 2 million traffic let's say that's consuming about 2 tvps there has to be that amount of backbone infrastructure present in the country to be bble to rout it to another setting this episode is brought to you by augment code wish your AI copilot actually knew your code base augment code is different augment is built for professional engineering teams working on complex systems not hobbyists building to-do apps other AI assistants struggle with context augment understands your entire codebase architecture patterns dependencies and all we're talking instant understanding of systems with millions of lines of code in under 200 milliseconds teams like webflow lemonade and Kong use augment to ship better code faster no more Contex switching no more generic suggestions just an AI that works the way expert developers do fast and in flow plus with Enterprise great security and zero training on customer code you can trust augment with your systems start building with augment in your favorite IDE and see what it's like when an AI actually understands your code try it free at augment code.com prag itic so we already touched on and this this is I think one pretty good example of like an India APAC specific enging challenge what other engineering challenges have you seen that are pretty specific to APAC you know like if when you're building a system in in Europe or us maybe you don't even need to think about it but in your case you know it gave you a lot of headache so U capacity is one aspect of it uh the other comes is the mobility uh India is a country which is mobile intensive people don't watch it on TVs or laptops or this thing right India is a country which kind of had a Lea frog in terms of generation right people were not using any computers or any mobile devices started onboarding to mobile devices directly they did not see laptops or tablets or connected TVs and so on connected TV is still picking up but it's still not there at the level at which it might be there in the Western countries with Mobility comes the problem of that when the device is constantly moving it is changing radios and networks and cell towers and all that cell towers and all most a lot of audience are like the taxi drivers or the cab drivers or people who are on the move or getting off the office the the event happens in the evening people are getting 
off the work theid phone and watching exactly right like most of the ads used to run like that uh so people are watching on the move so that is that becomes tricky sometimes you're connected to a 5G Tower and then you go to a 4G Tower and 3G Tower and so on right like you're on a train and you're moving moving at a speed your towers are switching that is one aspect the other is a battery uh most of the games happen in the evening right and and if you started your day in the morning you probably charge your phone you're off the work you're not getting it out so your battery so that becomes an interesting problem to also tackle right like how do you ensure that you're not doing so much of computation on the device that you end up burning of the phone phone wow I I I would have not this would have not been on my top list of things to worry about but I guess there you go you need to worry about it you need to worry about it like your your streaming or your downloads or video consumption cannot eat up the phone battery otherwise the user will not watch so how what do you do around this like what kind of considerations do you have so uh lot of how much background processing you're running uh what's the phone brightness you're using what's the volume levels you're using uh what is the color intensity all of these parameters kind of the profile of the the video kind of controls how much battery consumption is going to going to happen uh how frequently are you pulling how frequently you're downloading uh what kind of complex uh encoding algorithm you're using so if you're using h264 versus h265 the Codex right like HC versus ABC each of the codc have its own complexity if you use a more compression then it means it will use more compute uh which means it will consume more battery so so do do you have this thing or where it's theoretically possible then to have on the client Side based on the battery level request a different type of stream you could do that Tech we could do that but uh the point I'm trying to make is that all of these parameters are taken into account when you're these are not probably decisions that you make at runtime because at runtime you would be more focused on like you would want to have as less variables as possible uh so this these would be considerations which you'll probably do when you're releasing features or when you're adding more the same when you're building it yes when you're building it because runtime like I I would I would want to have only one or two parameters which I'm at runtime uh more than number of settings it requires more coordination verifications testing and all of that right sure you can have automation but trust me automation doesn't work after a point like you need to still have some level of manual testing involved okay so an event is on and you know it's you knew it would be a big event and you know more and more people are are joining in and you know the load just keeps going up and you know eventually you said a world record but as this happens what how can you scale things up how can you both plan to scale up a system and what can you do on the spot and I'm assuming horizontal scaling I'm sure will be a thing but for example is vertical scaling basically having bigger boxes bigger devices also an option or a practical thing to do so U I think the vertical scaling I I'm I'm trying to think how much of vert see vertical scaling only happens on the database or the database layer to be very honest right computer is largely horiz computer is 
largely horizontal in nature. Actually, databases have also now become horizontal in nature — you can add more boxes and start to split and scatter data — but you want to avoid any kind of scaling at the data layer at all when you're operating under load. So most of your databases and caches are preemptively scaled, pessimistically, to a higher number. Compute is what you're controlling on the fly. In fact, auto scaling doesn't work.

Auto scaling doesn't work for this kind of workload? Can you tell me more? I would have thought the whole point of auto scaling is that it should work, but obviously under the hood it's a bit more complex. Can you give us a bit of an explainer?

Sure — so why does auto scaling not work? Auto scaling works by adding a box at a certain rate, and the default auto scaling most providers offer has cool-off periods and a bunch of parameters that are not entirely flexible. What happens is that when the game starts, or when there is an innings break and the audience comes back, that's a sudden surge, and in our experience auto scaling has never been able to respond to that in a speedy manner. I'll give you a classic example. Let's say before the innings break you were operating at 20 million concurrency — I'm making that number up. The innings break happens, 15 million people choose to drop off, 5 million stay back. Now, when the innings resumes, you'll have 20 million people coming back in, but auto scaling is not going to respond with that number in mind — it will go, "Oh, I'm seeing this much traffic, this much RPS, let me add some more boxes." But you have to understand that there were already 20 million people on the stream before, and after the innings break you have to get back to that level. So if you use auto scaling, you'll be screwed.

That's why we used to have our own custom scaling systems — custom scalers — and they look at concurrency. Concurrency is the metric we align on; it's the golden metric to scale against, and we build models to translate it to every system and service. It's easier to talk in that language than to have every system scale against a different metric — that's more complex than saying everyone in the company is scaling against one common metric that we are very solid and sure about. So you can compute: I have X million concurrent users, which means this is the traffic I would expect on such-and-such services. This is my user journey: 100 people open the app, some X or Y number of them go to the playback page, then Z number open something else — there's an entire formula you can build out, and from there you can map back and say, "This is the amount of requests you would see."
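A hedged sketch of that "everyone scales against concurrency" idea: translate a target concurrency into expected per-service request rates via user-journey ratios, then into instance counts to pre-provision before the surge. The services, ratios, per-instance capacities, and headroom factor below are invented for illustration — the point is the shape of the model, not the numbers.

```python
# Illustrative concurrency-driven pre-scaling model, as described above: pick a
# target concurrency, map it through user-journey ratios to per-service RPS,
# then to instance counts. All ratios and capacities are made-up numbers.

import math

TARGET_CONCURRENCY = 20_000_000      # e.g. expected peak after an innings break

# Fraction of concurrent users hitting each service per second (journey model).
JOURNEY_RATIOS = {
    "playback_api": 0.004,    # session refresh / heartbeats
    "home_feed_api": 0.010,   # spikes when users press "back" after a wicket
    "manifest_edge": 0.250,   # roughly one manifest poll every ~4s per viewer
}

PER_INSTANCE_RPS = {
    "playback_api": 1_500,
    "home_feed_api": 2_000,
    "manifest_edge": 20_000,
}

HEADROOM = 1.5                        # pessimistic buffer over the modelled load


def plan(concurrency: int) -> dict[str, int]:
    instances = {}
    for service, ratio in JOURNEY_RATIOS.items():
        expected_rps = concurrency * ratio
        instances[service] = math.ceil(expected_rps * HEADROOM / PER_INSTANCE_RPS[service])
    return instances


for service, count in plan(TARGET_CONCURRENCY).items():
    print(f"{service}: pre-scale to {count} instances")
```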
And then there are certain systems — I'll give you an example. When the innings break happens, or some kind of event happens — let's say a key bowler or a key batsman gets out — people have a tendency to press the back button.

By pressing the back button, you mean they exit the stream? Or they want to rewatch it?

No, they just want to go back — close the stream and go back to the homepage.

Oh, gotcha. So to exit, some people will just swipe up and close the app, but a lot of people will press the back button.

Yes, and the back button takes those users to the homepage, and the homepage has a different set of APIs to call, which were not seeing traffic at the rate they're about to see.

Oh — and they're going to get a big spike.

Exactly. So all of that intelligence can't be captured automatically — answering your question on why auto scaling doesn't work: it's because of these UX flows and user journeys, which cause problems.

So basically you're saying that the whole company, or the team, will know "this is our concurrent user target," and every system just needs to prepare to potentially get that many requests in a single second — if, as in your example, people hit the back button and fetch the homepage.

Yeah. So we have custom scaling, and all of this is in place — that's how the systems are taken care of. These custom scalers are constantly watching the concurrency number. Whether the concurrency number being reported is accurate or not is a big question, so you need triangulations and proxies to figure out whether the concurrency number coming from system A is correct or not — because if that gets messed up, your entire scaling gets screwed.
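And a small sketch of that "triangulate the concurrency number" idea: compare the primary reported concurrency against independent proxies and refuse to trust it if they diverge too far. The proxy sources, the one-heartbeat-per-minute assumption, and the threshold here are hypothetical.

```python
# Hypothetical sanity check on the reported concurrency number, using independent
# proxies (client heartbeats, CDN manifest request rate) before feeding it to scalers.

SEGMENT_SECONDS = 4
DIVERGENCE_LIMIT = 0.15          # tolerate 15% disagreement between sources


def concurrency_estimates(reported: float,
                          heartbeats_per_min: float,
                          manifest_rps: float) -> dict[str, float]:
    return {
        "reported": reported,
        # assume each active client sends one heartbeat per minute
        "from_heartbeats": heartbeats_per_min,
        # each viewer polls the child manifest roughly once per segment duration
        "from_manifest_rps": manifest_rps * SEGMENT_SECONDS,
    }


def trusted_concurrency(estimates: dict[str, float]) -> float | None:
    """Return the reported number only if the proxies roughly agree with it."""
    reported = estimates["reported"]
    for name, value in estimates.items():
        if abs(value - reported) / reported > DIVERGENCE_LIMIT:
            print(f"concurrency mismatch: {name}={value:,.0f} vs reported={reported:,.0f}")
            return None               # fall back to manual review / last known-good value
    return reported


est = concurrency_estimates(reported=20_000_000,
                            heartbeats_per_min=19_400_000,
                            manifest_rps=4_900_000)
print(trusted_concurrency(est))
```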
So you've already given us a very deep dive into video streaming, and we've probably only scratched the surface. How did you learn all of these things, outside of on the job? Are there additional resources you'd recommend? Basically, if someone wants to understand a lot more about live video streaming and they don't have the opportunity to work at a company like you did just yet, what would you recommend?

This is, I think, the nice part and the sad part: you cannot Google our issues. There's no Stack Overflow for our issues, or for the things we experience, to be very honest. For some of the things around video there are plenty of good resources — there's a GitHub link I'll find and share with you that is a very detailed explanation of how video works: it starts from text, then goes on to how images work, how image compression is done, and then how images become video, with a frame-by-frame description. It's a very good document, and that is something I learned from. The rest of the stuff is on the job — I don't think these problems are documented anywhere on the internet.

And on the job, what were the things that helped you become a lot more experienced, reliable, senior? What kind of approaches or mentality helped you soak it all in so quickly?

For me it has been planning, and the entire exercise of going through drills. We used to run something called game day. This is basically a simulation of how an actual match is going to be.

So you were simulating this highest-traffic event ahead of time?

And not just the traffic — we simulated the entire operating protocol.

Oh wow, really? You did that?

Yes. We'll say, okay, the match is going to start at 7:00 p.m., let's say. Now every system is supposed to scale beforehand; there's a timeline to it, there's a checklist to it, and so on. And no matter whether the teams are ready or not, we will start the live stream and start sending traffic — because in a live event scenario you don't have a green light; you're not carrying a flag to say "okay, the traffic can come in now." The game is going to start at 7:30 p.m., and it's going to start at 7:30 p.m. whether you like it or not. So we used to do those kinds of drills.

And in these drills, just so I understand: were you simulating synthetic traffic, or was it more that every team had to scale up and do whatever they do?

Synthetic traffic — we would generate terabytes of it.

So there was a team spinning things up, pretending to be users?

We used to have a framework to simulate that kind of traffic. There's a platform called flood.io which we used to use, which would actually drive it, and we would not give the teams access to that dashboard.

So they wouldn't know what's hitting them.

They don't know what traffic is going to come, what kind of pattern is going to come, or anything — because that's the exact state of our production: you don't know what is coming your way. You need your systems to be able to tell you whether they are in a healthy state or a bad state. So all of this comes from learnings there. And most of the learnings — I've been in this industry for about seven-plus years, and it's not like we reached that 32-million number in a single year. We've done smaller numbers and figures, so over the course of time you learn, and while building the systems we learned as well.

One thing I appreciated that you told us: I was very focused — and I think everyone is focused — on when you set the record, but you said something interesting, which is that it was a 70-day marathon, and the beginning was actually a lot more challenging. You weren't setting records yet, but you were already dealing with the traffic patterns and all of those things. I think it's a good reminder that sometimes it's not the thing that gets the most publicity that is the hardest work — it's the work you put in before.

The first week — and it's 70 to 80 days, by the way, not 7 to 8 days.

Wow. Those of us who aren't following cricket are just used to football or other games that are a lot shorter. That's a very long time period.

It's a very long time, and it can get very intense on people as well — it's very stressful. Every day there is a match; some days there are two matches back to back, and two matches are a whole complicated scenario, because how do you switch from one match to another? Traffic shifting and all of that becomes super complex; it's not that straightforward from an operational point of view. So yeah, the first week is very intense — the days before the first week are very intense — but after that you start to settle in. We also aim, year on year, to improve our operating protocol and automate a lot more. So — I moved from Hotstar to Jio, right — at Hotstar, the level of automation there is
So all of this comes from learnings in those drills. And most of the learnings: I've been in this industry for about seven-plus years, and it's not like we reached that 32 million number in a single year. We hit smaller numbers first, so over the course of time you learn, and you keep learning while building the systems. And one thing I appreciated that you told us: I was very focused, and I think everyone is focused, on the moment you set the record, but you said something interesting, which is that it was a 70-day marathon and the beginning was actually a lot more challenging. It wasn't setting records yet, but you were already dealing with the traffic patterns and everything that comes with them. It's a good reminder that sometimes it's not the thing that gets the most publicity that is the hardest work; it's the work you put in before. Right, the first week. And it's 70 to 80 days, by the way, not 7 to 8 days. Wow. Those of us who don't follow cricket and are used to football or other competitions that are much shorter: that's a very long time period. It's a very long time, and it can get very intense for people as well. It's very stressful: every day there is a match, and some days there are two matches back to back. Two matches are a whole complicated scenario, because how do you switch from one match to another? Traffic shifting and all of that becomes super complex; it's not straightforward from an operational point of view. So the first week is very intense, and the days before the first week are very intense, but after that you start to settle in. We also aimed to improve our operating protocol year on year and automate a lot more. I moved from Hotstar to JioCinema, and at Hotstar the level of automation reached the point where people can actually be out on a coffee break and a live match with 20 to 30 million concurrent viewers can just run without them. Oh wow, so over time it gets easier. Yeah.

So for engineers who would like to get better at architecting systems, what were the things that helped you, in between building them, coding them, being in these meetings? And I'm not really talking about live streaming now; this is more about generic software engineering. Do you have advice on activities they can do or behaviors they can pick up that help them learn faster? There are certain things I have followed; I can share what I do, and hopefully that helps. One of the things around scaling and building complex systems which I've learned the hard way is that you must understand that anything and everything that can fail will fail: Murphy's law. People are very overconfident about their systems. They're like, hey, we have done everything, we have scaled the databases, we have tuned the transactional systems, and so on. They underestimate it. I can tell you, I have countless examples of people who have come to me and said, hey, everything is scaled up, everything is fine-tuned, this is good, and I have then broken their systems in load testing. You're good at making friends, I can see. Yeah, people don't like me. Just pulling your leg. But that's one core principle which I feel is super important. The moment you internalize it, you start looking at everything within your system, every decision, every configuration, because you know that any configuration can give way at some point. Even the smallest things: database connection pool sizes, the number of connections you're making to the database, the number of servers. For example, someone said, hey, our connection pool sizes are fine-tuned, we can scale infinitely now. But there is a cap on the number of connections you can make to a database, so you need to account for that: when you're scaling compute, the compute can only scale up to a finite number, because beyond that it will start choking the database, and then your requests and queries will start to get queued up, which increases waiting times, and so on.
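The database connection cap he mentions turns into simple arithmetic. A sketch follows, with entirely illustrative numbers, of the ceiling it puts on horizontal scaling.

```java
// Back-of-the-envelope check, with illustrative numbers only, for the ceiling a database
// connection limit puts on horizontal compute scaling: once instances * pool size exceeds
// what the database will accept, extra compute only queues work and inflates waiting times.
public class ScalingCeiling {

    public static int maxSafeInstances(int dbMaxConnections, int poolSizePerInstance, double reservedFraction) {
        // Hold some connections back for migrations, admin sessions, replication, and so on.
        int usable = (int) Math.floor(dbMaxConnections * (1.0 - reservedFraction));
        return usable / poolSizePerInstance;
    }

    public static void main(String[] args) {
        // For example: a database allowing 10,000 connections, 50 per app instance, 10% held back.
        int ceiling = maxSafeInstances(10_000, 50, 0.10);
        System.out.println("Compute can scale to at most " + ceiling + " instances");
        // Prints 180. Autoscaling past this point makes latency worse, not better.
    }
}
```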
That is one thing. The other is metrics and measurement. It's very important to have very detailed measurements in place for your systems, so that you can figure out where in the whole flow the problem is. There's a common problem I've seen with almost every engineer around how response times are measured. People measure response time at the load balancer: this is my request, and this is the response time. But even inside a load balancer there's a queue: a request gets queued, and then a worker picks the request up and processes it. The way you should measure response latency is from the time the request was queued to the time you return the response. What people do instead is put their timing around the processing function that listens for the request, from the start of the processing function to the end of it, which gives you a response time, and then they say, oh, my server is working fine, everything is good, my response times look great. But you haven't noticed that the request was sitting in the queue. And the same problem can repeat all the way down the stack. That is the level of detail you have to go to when you're designing metrics and measurement.
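A minimal sketch of that measurement point, assuming a hand-rolled in-process queue with made-up names: the request is stamped when it is accepted, not when a worker picks it up, so the reported latency includes the time it spent waiting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of queue-aware latency measurement (illustrative, not a production server).
// Timing only the handler hides an overloaded queue; stamping at accept time does not.
public class QueueAwareLatency {

    record QueuedRequest(String id, Instant enqueuedAt) {}

    private final LinkedBlockingQueue<QueuedRequest> queue = new LinkedBlockingQueue<>();

    void accept(String requestId) {
        queue.add(new QueuedRequest(requestId, Instant.now())); // stamp when the request is queued
    }

    void workerLoop() throws InterruptedException {
        while (true) {
            QueuedRequest req = queue.take();
            Instant processingStart = Instant.now();
            handle(req);
            Instant end = Instant.now();

            long queueWaitMs = Duration.between(req.enqueuedAt(), processingStart).toMillis();
            long processingMs = Duration.between(processingStart, end).toMillis();
            // processingMs alone can look great while queueWaitMs quietly grows;
            // the sum is what the caller actually experiences.
            System.out.printf("%s queue_wait=%dms processing=%dms total=%dms%n",
                    req.id(), queueWaitMs, processingMs, queueWaitMs + processingMs);
        }
    }

    private void handle(QueuedRequest req) {
        // placeholder for the real work
    }
}
```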
Yeah, I guess what I'm hearing is that it also really helps to go deep and understand exactly how things work, because for a load balancer it's easy to think, oh, it just balances load, but once you go in there and understand how it's implemented, what kind of queue it uses, how big that queue can be, all of the internal latency, and so on. Good advice, thank you. The third thing is that I hate APMs; I've never used APMs in my life. Application performance management? Yeah. A lot of people say, hey, why don't we put in an APM, we can figure things out from there. Now, and this is my biased opinion, I feel that it makes you lazy. Sure, it can help you get all the metrics in place, but people become lazy: they don't get into the depths of their system, they're not measuring, they're relying on the APM to do the right thing for them. And sure, APMs have evolved over time, you can argue, and I'm sure if this video goes out people will have a lot of views on it, but the point is that I feel it makes you lazy. Not having an APM forces you to go into the details, to look into every corner, to measure every aspect and every performance characteristic of your code, and to fine-tune it.

Well, thank you. And to close it off, we'll do some rapid-fire questions: I'll fire a question and you just say whatever comes to mind. What programming language are you the most productive with, and which one is your favorite? For building systems and services I use Java. Otherwise, for scripting, I'm a bit of a script junkie and I'm very familiar with Ruby; Ruby is my go-to. For me it's like English: you're writing English and it just works. So Ruby is my go-to for any scripting work. Nice, it's a friendly language. Most people come to it because of Rails, but by itself it's a nice language. Yeah, I started with Rails, actually; my career started with Rails, but then I got the hang of Ruby itself. It's very easy: if you have to process some data, analyze something, write a small model, you just write it in Ruby, pull in the gems, and get it done. What are some blogs or websites that you read to keep up to date? I read a lot of Hacker News; it has a lot of fun and current content. Then LinkedIn is a good source, to be very honest; somehow the network of people I have keeps posting a lot of interesting things, so I get a lot of good content from there. Apart from that, I have a keen interest in the latest technologies, so I keep tabs on conference events, what talks are being presented, and what is being spoken about. Awesome. Well, this has been very interesting, going into all the complexity of something seemingly as simple as live video streaming. Thank you very much for sharing all of this and for being on the podcast. Thank you for inviting me; it's been a wonderful conversation.

Thank you to Ashutosh for taking us behind the scenes of building a large-scale live streaming service, and one that set a world record at the time. Check out the show notes below for references we mentioned in this podcast and to see related deep dives you can read in The Pragmatic Engineer. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. Thanks, and see you in the next one.

Summary

Ashutosh Agrawal, former architect at Disney+ Hotstar, explains how large-scale live streaming systems are designed to handle millions of concurrent users, focusing on the technical challenges of real-time video delivery, capacity planning, and the trade-offs between latency, quality, and infrastructure complexity.

Key Points

  • Live streaming involves complex architecture with multiple stages: source feed, contribution encoder, cloud encoding, distribution encoder, and CDN delivery.
  • Adaptive bitrate streaming allows clients to dynamically switch video quality based on network conditions, managed by the client-side player.
  • Latency in live streaming is caused by encoding (GOP), client buffering, and CDN segment delivery, typically resulting in a 5-10 second delay.
  • CDNs are crucial for caching and distributing live streams efficiently, but their effectiveness depends on proper TTL and configuration tuning.
  • Systems must balance user experience (smooth playback) with infrastructure costs, making trade-offs between latency, quality, and resource usage.
  • Capacity planning for live events is a year-long process involving physical infrastructure, provider coordination, and pessimistic scaling models.
  • Custom scaling systems are needed because auto-scaling cannot respond quickly enough to sudden traffic spikes, such as after a match break.
  • Game day drills simulate real-world traffic and operational failures to prepare teams for live events, including unexpected user behavior.
  • Monitoring focuses on leading indicators (buffer time, play failure rate) to detect issues before users complain, with sampling to manage data volume.
  • Engineering challenges in APAC include mobile-first usage, network switching, and battery consumption, requiring optimizations for low-power devices.

Key Takeaways

  • Design live streaming systems with a clear understanding of the trade-offs between latency, quality, and infrastructure cost.
  • Implement custom scaling mechanisms and conduct game day drills to prepare for sudden traffic spikes and operational failures.
  • Use adaptive bitrate streaming and client-side metrics to optimize user experience while managing server load.
  • Prioritize leading indicators in monitoring to proactively detect issues before they impact users.
  • Plan capacity well in advance, considering physical infrastructure constraints and provider limitations.

Primary Category

AI Engineering

Secondary Categories

Programming & Development

Topics

live streaming video streaming architecture high concurrency systems adaptive bitrate streaming content delivery networks load testing capacity planning APM systems real-time streaming mobile streaming

Entities

people
Ashutosh Agrawal Gergely Orosz The Pragmatic Engineer
organizations
JioCinema Disney+ Hotstar Google DeepMind WorkOS CodeRabbit Augment Code The Pragmatic Engineer

Sentiment

0.70 (Positive)

Content Type

interview

Difficulty

intermediate

Tone

educational technical analytical entertaining inspirational