Code security for software engineers

What are code security basics that every software developer should know? Really know and understand what your code is doing. Maybe that sounds a bit silly and obvious, but that's how security experts find basically security issues in your code. I'll set up an MCP server that says it does something, but secretly it does something else. It runs locally. Boom. with agents and giving away more control. There is a new threat here because it's not just about the dependencies you're using or your machine security in general, but also making sure that the agent you're using is doing the right thing. How do you think AI is changing code security and also security in general? Today, what we are seeing is as software engineers, what should we know about writing secure code? To answer this question, I turn to Johannes Doss, who has been a security expert for 20 years and is currently the VP of code security at Sonar. In today's episode, we cover code security basics all software engineers should know of. Common code security tools worth knowing of and using like static application security testing and more advanced tools like software composition analysis. How AI coding assistants introduce new risks and what we can do about these and more. If you're a software engineer looking for pointers on how to make your code more secure, this episode is for you. This podcast episode is presented by StatSic, the unified platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsor. So, Johannes, welcome to the podcast. Thank you. A big pleasure to be here. >> So, we're going to talk about cyber security today. I wanted to get to know how did you get into cyber security and when was this? It >> must have been like uh 20 years ago. I I remember I got hacked basically. My computer got infected. Uh I think it was the SAS one back in the days and uh I was super frustrated, right? And then also super intrigued like how could someone get access to my computer? And so that led me into you know playing with security things like Trojan horses and that stuff at school time. And um and then I moved to Bh in Germany where you could study it security and that was exciting. And so uh they played capture the flag competitions, right? Those are hacking competitions where university teams connect online in an isolated environment and they try to hack each other to to to get points. And uh uh I got really obsessed playing those competitions and and that was the best learning experience for me you know and uh this led then into you know me getting into professional penetration testing uh writing tools to assist uh the the vulnerability hunting and and this led then into a startup which got acquired by Sona where I am today. How can we imagine penetration testing? So penetration testing is um yeah simulating an attack basically right so you are um kind of like a company hires you as a as a hacker basically and you you have to find out vulnerabilities in a given you know time scope and and and scope of the application that you should test and it was just like a natural move if you do that as a hobby with the hacking competitions right uh where you just do that for winning points in games to do then uh you know kind of like earn money with that as a student uh to to also look for security issues as a professional >> and I I understand that you know like yes you can h now hire penetration testers right as as the company I can hire teams that do this but did you do some of this did you do professional penetration testing? >> Yeah. Yes, absolutely. For a couple of years uh doing this as a freelancer and uh for big companies basically right uh uh looking for security issues they they have and always trying to to get into um their network um typically by by exploiting vulnerabilities and software uh applications they're running and then documenting this right not going further not destroying something doing something malicious but basically reporting then uh this is how an attacker could get in so they can fix it. How does the penetration test look like from when the company says like, "All right, come and penetration test us." How do you actually go around? Do they actually give you access to some of their systems or or you need to just assume no knowledge? How does that work? >> Yeah, the different types, right? So, there's blackbox penetration testing and white box meaning blackbox you don't have access to anything. You you treat it as a real attacker like with no knowledge. Yeah. you you um you know typically you have maybe like a web application running and and you go um and look at it from the attacker perspective and and uh play around with the application from the outside trying to imagine basically what could be the code behind that application what could be the code doing that I'm seeing from from using that application and then trying to figure out what could be vulnerabilities here right what could be something a developer forgot to do and by experience you you learn a bit like what are typical mistakes and and then you you go from there. Uh trying to always you know exploit something where you can uh steal files or or get access to the database or something where there is some data something you know that could be sensitive to the business uh that that is then security critical. Do you bring your own tools there? Do you have like methodologies like I I guess you know like it's very basic but I I do know of the concept called port scanning where you write a software where it tries all the different ports. It sends messages and you hope that if uh they configure it a server incorrectly or or or maybe correctly you can get through. But what what kind of tools that do do you come with you do use tools in in in as a penetration tester? I think mostly for uh kind of like mapping out what's available right that's I think the biggest part you automate um so you don't you know test for all the endpoints or ports as you said but I think then once you found um a good landscape of what's out there you you go in manually at least I used to do uh that manually uh to to uh you know try to to poke and and see what what breaks when you touch it and when when when you play around with it and um That's also the most exciting part I think >> you know in in the real world we're going to be sitting you know as as software engineers inside a company and we're going to be building our software this might be services this might be apps websites and so on and there's going to be attackers uh outside it's going to be like script kids or like you know like like people just like malicious like playing around poking around and there's going to be professionals as well who will be uh trying to get financial gain whatnot. Now inside the company who should own code security you know the in the industry today what we're seeing is um that this is a shared responsibility basically right uh so we talked about the penetration tests and typically security teams are involved in that um and then there is still developers um you know adding and writing code right and I think predominantly in the industry the the view is that the security team should should own all that security right it's it's in the name of the team but I see it quite the other way around. I I think you know uh every software vulnerability basically manifests in code and and developers are the only ones writing the code in in organizations and changing the code, right? And they're the only ones who can fix security issues. And so I think they uh they should own all those code security issues, right? They should own basically their share of the code and also the problems related to their code. And I think that's also more realistic today because you have great education available and great tools available for developers. So uh I think that ownership should be with developers on on the code security problems. I hear what you're saying on devs should own code security but then why have a security team or at what point should you have a security team? again you you now work with a lot of different companies and sizes and previously you also worked as a as a as a security engineer. At what point do do companies bring in a security team and when they do what is their role? I'm kind of like look if there's a security team I'm like come on they they that's their name right like as a developer like security as a whole like to make your your service 100% secure that's pretty daunting. >> Yeah. I don't think that security teams are useless right not at all. I we we talked about the the penetration test that's that's typically something run by security teams and so I think the the field of you know application security is just much more broader than than code security right um so you have maybe compliance requirements uh that that you need to look after and and some you know organizationwide initiatives security initiatives or there are vulnerability reports coming in from a penetration test or new threats um are are available and so security teams I think should look at this broader application security field and and um it's it's good to have a security team for that and the larger the organization uh gets right you I think you need a security team I just think that when you write software and when organizations you know deploy software the security team shouldn't waste their time and and looking into every single security issue that happens during development right and I think that part should be fully owned by by developers right I think it's u a waste of time to look at every single new crosshat scripting issue again and again and try to exploit it and build some some fancy exploit and risk assessment where SC developers could just you know fix issues as they as they code and move on. And that also allows then security teams to to have more time to actually focus on bigger problems on on problems where they can really bring in their expertise like cryptography or or authentication logic or things like this where they can then also be very helpful with their expertise for developers. So I'm kind of hearing some similarities between the kind of feature teams or program teams and platform teams where you know platform teams typically build platforms where engineers can build on and they have a specialized expertise. It might be a massive database platform like for for a large data storage company and then engineers they kind of use the APIs but they don't need to know all the details but when they do they can just go to the platform team saying hey how do I store you know like like two pabytes of data and they'll be like okay here's different ways you can do it. So do I understand correctly that you're kind of saying security teams will also be this like specialized expertise where they can help you with a bunch of stuff and they will try to do tools as well for for devs to like self-service or or you know share common things to watch out for. >> Yeah, exactly. I think it's a good comparison right um definitely helping um but also leaving you know the majority of ownership there with developers so they can basically have security as part of the process of development and not just something that is you know attached to or ad hoc run um and whenever the security teams decides right I think it should be really part of the process of development and and that must be then owned by developers to really you know um engage in security issues and and fix them uh because that's what makes you secure in the I will challenge though that historically I don't think software engineers owned security or were expected to own. Can we talk about how this changed over time over your 20 years how how you've seen changes happen because I do feel it's shifting left on onto developers but what was the historic context here and and what is changing now historically I think it was clearly owned by security teams right so if you imagine 20 years back it was all about compliance driven by compliance and then also the software development life cycle was uh a lot slower than today right you would have your quarterly release and then there would before that there would a security team come in and and do a final audit, right? And then you would release and and today we're just moving at a much more faster pace uh releasing a couple of times a day or per hour, right? And you have AI coding assistance and so we are moving at um a lot faster now. Johanna just talked about how engineering teams today are moving at a much faster pace than before, especially teams using AI coding assistants, which are most engineering teams honestly. Here's something surprising though. As dev teams build more products and features faster than before, coordination is increasingly the problem. You now have more Slack channels pop up, more customer feedback to deal with, and you often end up switching between different tools to decide what to build and how to build it. This is where a seasoned sponsor, Linear, can help dev teams stay focused. Sierra is an AI powered customer experience startup. They were preparing for the next phase of company growth and wanted to find a tool that can help a larger team move quickly without slowing down. They chose linear as their operating system of the company and wired all of their work into the platform. Today, project updates in linear ripple through Slack. Customer requests are logged in linear and stats from Linear are pulled out into company dashboards and into the slides that Sierra shows off as they celebrate wins at all hands meetings. Despite CR being in hyperrowth, everyone understands what they're building, why they're building it, and how the work is progressing. What I love about Sierra's approach is how they didn't set up Linear wanting to know what individuals did on a given week. They wanted to know what was accomplished in service of which projects. This is the beauty of using linear. It helps hyperlow companies stay focused, spend more time building, and less time coordinating. If your team cares about tools that remove additional work for the team instead of adding extra to it, check out Linear at linear.app/pragmat. app/pragmatic. And now let's get back to fastmoving engineering teams and security reviews. You cannot have this disconnected security review that you do afterwards. Um and so in the industry what's also changing here is I think the tools that you need for this. You know historically the tools are built only for security teams, right? Um and and then with that there is a different product philosophy that that comes with uh security products because as a security auditor basically you want to know about every single potential issue right you want to turn every stone and and and better look twice than never to find out what's you know what could go wrong and then now if if you apply this to this new pace of uh of fast development that doesn't work anymore right because you can't get interrupted all the time with findings. It's it's it's too noisy, right? I I like to compare this with uh you know, if you drive a car and you have a security guy in your passenger seat and he would scream and yell at you at every single thing that could go wrong all the time. That's uh maybe interesting the first 50 meters but then gets super painful and annoying. I think with that we see a change in the industry that you know I think developers should own code security issues but also the toolings around code security issues must be owned and and for developers um and then there are other application security tools and and application security as a broader thing that should be still owned by security teams. So, so far you've mentioned two different things or if I caught it correctly, one was code security and then application security and you said that application security is a lot more than code security. So, it's a you know like it's a supererset of of it. What is code security? I mean this this is one of your your expertise as as I understand but how do you define that? You know, where does it start? Where does it end? cuz it it does sound like something that as a software engineer we should be aware of it, right? >> Yeah. For a lack of a better definition, I would say it it it's basically code that is you know free of security issues um free of anything that you know can be leveraged by an attacker to uh exploit your application and then you know get access to some of your data and and put your business at risk. But with that simple definition, I think the the the complexity is a bit you know what are security issues um when we say code is free of security issues and I think here the we think typically as vulnerabilities right SQL injection is a vulnerability and and I think it's much more than this right if you think about bugs like a I don't know null pointer exception where your application crashes then your application is in in an unintended state and this can be abused by attackers in some scenarios or maybe a more obvious example would be you know memory corruption problems in CC++ where as an attacker you can uh you know do a buffer overflow um and then execute code on your server and so I think here the the lines get more blurry and and um then there are also more logical things like if you um write an application where you can upload a profile picture you you shouldn't forget that you know an attacker couldn't be you shouldn't be able to upload a shell to your server and those kind of things so I think we are real realizing izing that code security is much more than just vulnerabilities and and in the end those are just bugs, right? Those are either things you forgot about in your code or those are misspecified uh things. And so it's it's basically technical depth, right? It's it's not so much different than other bugs in your code that you have in your backlog and you just need to fix as a developer. And uh I think from that perspective, it's also more clear why that's a developer problem. Um and and should be owned by developers. >> I understand you. we should be owning, you know, code security, but it is it's a pretty it's a pretty mushy subject, as you say. It's it's a lot of things from from the obvious null pointer exceptions to may the maybe not as so obvious buffer overflows, which are a little bit harder to work with if you're not aware of it. Of course, sometimes you use languages and and that solves for it. As a software engineer, what are code security basics that every software developer should know in in your mind? You just mentioned buffer overflows. I think that the the key here is you know for developers in those basics um they need to only understand uh how to prevent those issues and how to patch them. They don't need to understand the full exploitation techniques to run a buffer overflow attack right like you can patch things uh without necessarily needing to uh run the full chain. And I think some of the basics you should be aware of I think the first thing that comes to my mind is to to really know and understand what your code is doing. And maybe that sounds a bit silly and obvious, but that that's how security experts find basically security issues in your code. They try to look for corner cases and edge cases that you may have forgotten about um or overlooked. And maybe in the time of uh you know AI accelerated development and uh using libraries and open source code that's not so obvious anymore uh to say that we all the time know what our code is doing and how it interacts with our code base, right? So I think one thing we we can do here is to really look through the through the eyes of an attacker at least when working on security sensitive features. What could an attacker do here and and how could an attacker u modify something here right? Uh the industry has been talking for a long time about this input validation input sanitization right maybe that's a good example here where never never trust the input right any input. Yes, exactly. And and um this can be also a bit more subtle, right? Like if if you upload a video to to YouTube uh and someone pauses with their application the YouTube video titles, then that's that's that's input basically, right? Because you you modify the the the YouTube uh title name. But then really making sure we think about this um where is all that you know get parameters, post parameters, cookies uh external input used and where am I using this in my file operation which which you know could be modified to open arbitrary files from attacker or you know traditionally in a SQL query you have a SQL injection in your HTML response page you have cross-ite scripting and and and those typical things and and I think we are still seeing those uh issues right they are the most critical ones and they have been around for for for a long time but we still see those issues. And then secret leaks I think is a is another you know basic thing that is involved in in many popular data breaches you know where developer hardcoded basically maybe just for testing purposes temporarily added like hardcoded into the code like a little API token. So, so like the secret secrets like API token well like all sorts of tokens, right? That that should typically live in your like local environment variables. >> Exactly. Exactly. And it can be API access tokens or cryptography tokens or or passwords for the database or whatever. Attackers nowadays, they crawl the public GitHub repo repositories, right? And and and steal those secrets and try if they to see if they're still valid. Even if you delete your code, right, it's in the git history and it gets passed. So I think that's another basic thing uh we should be uh aware of and and not do and it still happens um because we are humans right. So these were the kind of I guess the basics to cover like as a developer. Is there like a checklist I I could go through cuz again you listed a bunch of them and depending on your level you either say these are super basic or like what are these things but you know the the parameters SQL injections secret leaks and some some some other things like do do you have a go-to list of like you know go through all these things and make sure you understand each of these things and you can check your code or know if they would be applicable. It changes a bit over time also. You know, we are evolving and and we are learning more about um certain security issues and and certain types of issues we we do less and then maybe new types are becoming more prevalent. Maybe also because how the landscape changes or how development changes but again I think those those basic ones we talked about have been around for for a long time and and and they they we don't we still see them. They don't go they apparently they don't go away. And what about the more advanced things that could go wrong? Because these were the basic ones, right? Like I think we just covered the basic ones, but you must have seen some more exotic security issues that maybe would have not been as easily preventable or a lot more creative ones. So there are more advanced things in in the terms of maybe expertise what is needed. Um if we talk about cryptography things, right? If if you know you're encrypting something and an attacker is still able to decrypt it or there is some you know authentication logic or access privileges or password reset functionality is is also something where typically you know often things can go wrong. I think the the key as a developer is to for those more complex features, security features is to not try to reinvent the wheel and just uh use um you know solid frameworks or libraries, something that is vetted by the open source community and and trusted. And I think here again a security team can can help you with that, right? One of the recent security issues that that is coming up in the node ecosystem is packages being poisoned where an attacker takes over some packages, they inject malicious code and whoever is using a package or or a or the downstream dependency of that package. Uh they can be impacted. I think we we we've seen a cryptoreated issue like this. In in your view, who who could best protect against these issues? Would it would it need to be a security team who decides on things like pinning certain versions of packages or scanning updates for it? Or basically as as a developer, if I'm if I'm depending on third party packages, what are good practices I could I can do to try to avoid some of these, you know, dependency security issues which are now becoming more widespread. >> That's a tough one, right? because everyone uses dependencies and your dependencies are using dependencies and and uh so it's uh quite hard to to do something right if you have this whole dependency chain and it you know some developer of that dependency a maintainer gets compromised and and then you know u a dependency get back door uh you have almost no chance in uh having a security problem when you pull in that dependency you you you cannot not use dependencies I think the only thing what you can do here is to uh have tools in place and this is like software composition analysis is is a thing here that basically observe and and and check your dependencies you know for known uh threats right at some point luckily like the the npm package you mentioned became known to be vulnerable or malicious or back door and then those tools uh basically get uh updated on a very frequent basis to to look um what are the threats and what are dependencies you shouldn't be using in a specific version and then warn you about this uh and what what is the the the next version you should use or that you should get rid of that dependency basically. >> And what is software composition analysis? So software composition analysis is um called SCAR is is basically a a technique where you know we look at manifest files your list of dependencies right depending on the package manager you use and then this list of dependencies is checked to a database of known uh security problems right those are called the CVEes those are not the the zero day vulnerabilities we talked about earlier that you typed into your code right that someone else um some maintainer had a security problem. Someone found that problem, reported it, it's documented in a database and then you know you can basically with software composition analysis map uh that this specific lo forj version of your library is is vulnerable to the lock for shell vulnerability that is known and then it can warn you and can you tell us about the CVE program? I understand inside security circle this is very well known and very useful but what should I know as a developer about this and how much should I kind of look it up check it uh worry about it it's run by Mitra right like the US government uh there is some change happening here um so that's the the common vulnerability enumeration uh is the CVE list and it's a database where uh it used to be kind of like the central database for documenting known vulnerabilities um I think it's just too many vulnerabilities reported every day. So I think there's a bit of a bottleneck there and so there are also other databases evolving or places revolving where security issues are collected and and and gathered and scar tools typically use that CVE database um but also other resources to collect all kinds of known vulnerabilities uh to make sure they they know about all potential threats. And as a software engineer strictly focusing on you know I'm trying to make my code secure. Do you see value in kind of trying to keep up with with CVEes with with new vulnerabilities or or do you see this being more of something that you really need someone who is is dedicated focused on this may this be a security engineer? I'm just talking from a practical perspective you know if I'm working at a scaleup where we have a midsize team maybe have one security engineer and I really really want to you know do my best work to try to security is important in in our domain. Do I take some of this on on me or do I say like hey let's let's if if we really need about this let's get more resources dedicated folks who can help with uh you know the the kind of depth of the industry. Yeah, I I would use a tool for this, right? It's a it's a problem that you can automate and um I I wouldn't um hire um more security u um team members for this. So, you can use software composition analysis. You know, it will automatically check all the dependencies. There are I think in the database like over 200,000 CVEes and and every day I think there's like 50 new CVEes coming out. not necessarily I think in in open source libraries right also in known products etc but I think it's um not a good use of your time as a developer but also not as a security team member to to look at um you know every single CVE that comes out I think you should then have a good tool in place a software composition analysis tool in place um that helps you to detect those but also helps you in fixing those right which is much more important than building a huge backlog of security issues uh the important thing is that you can also fix this and get some advice on on on how to fix this. >> Yuan has just talked about how it's a no-brainer to automate much of your security analysis like keeping up with the latest security vulnerabilities in software engineering. Using the right automation and the right tooling means that you get to focus on what matters like building your product and not spend as much time on infrastructure. This is where our presenting sponsor static comes in. Static gives engineering teams a toolkit for safer deployment. Feature gates, gradual rollouts, and experimentation. These are built into your release process. So you ship changes to 10% of users, then expand to the remaining 90%. You validate behavior, measure real impact, and scale only when things look good. If something goes wrong, you can instantly turn it off before it affects everyone. To support this, static includes product and infra analytics, built-in tools for logging and tracing. So you can actually see what your code is doing in production, performance, errors, user behavior all in one place because you cannot secure what you cannot observe. For teams with strict data governance or security requirements, static also offers warehouse native. Your user level data stays in your data warehouse, Snowflake, BigQuery, data bricks, whatever you use. full control inside your security boundary and you get the deployment safety and observability without shipping sensitive data to external systems. Companies like Microsoft at Lashan and Brex use static for safer deployments with enterprisegrade security. Static has a generous free tier to get started and pro pricricing for teams starts at $150 per month. To learn more and get a 30-day enterprise trial, go to static.com/pragmatic. With this, let's get back to code security with Johannes. And recently you've produced a state of code security report which is a pretty comprehensive one as I understand. What are things that you found there? >> Yeah, so at Sona we we scanned 750 billion lines of code daily, right? So there's our analyzer see quite a lot of code and we we we studied like a subset of this. We took 8 billion lines of code that was written by 1 million developers of 40,000 organizations uh globally. took quite a data set and then we looked at what are the issues uh we we see and um I think one finding was that every about 1,000 line of code uh we we see a security issue and that reflects kind of well my uh my my feeling of when I manually uh audited code so every 1,000 line of code and issues is quite a lot I think and then the issue types we found and saw were the basic ones we talked about right the most in the top five at least right there was lo injection ction, cross-ite scripting, SQL injection, hardcoded passwords, the typical things that that um that go wrong. I think some surprises were in there about regular expressions for example was was something that uh we were typically or more often apparently you know we have a slow regular expression or insecure regular expression which can lead to denial of service attacks and so that would be uh something more out of the lines but yeah the the basic ones are still very prominent today in code. It's very interesting because you like you're saying every 1,000 lines of code roughly one security issue. It's funny because lines of code we always argue about is it a good measurement of things you know complexity work whatnot or or is it not but I I guess you know you're still using this heristic right? I mean it's it's it's a statistic we build for the report right but I think yes I think what comes down to that is is that quality here is really connected um to to security right I mean you could uh solve certain problems with you know more lines of code or with less lines of code um and I think equality is something here um that you know is very related in terms of um when you have more lines of code right there's more you know code to review basically and uh it's it's harder to spot security issues in the end while if if you do it in a well-maintained and and and structured way >> exactly this was exactly my feeling on on this. So like what would you say how is the quality of code related to security? Did you see any findings on this? >> Yeah, I I think it's super related, right? And I think it's it's totally underrated in in the industry today. um we I mean we talked about the the null pointer exceptions or these slow regular expressions, right? Uh that can lead to security issues and and that's more maybe the obvious um examples of bugs. Um but also if you think about unreadable code, not well-maintained code, right, that's kind of like spaghetti code, then it's not so obvious at first maybe how this is connected to security. But then if you think about that code is not easy to comprehend, not easy to review and you do peer programming or code reviews in your development team. Then in that spaghetti code, you will more likely um oversee security problems of your peer. And then also if you think about fixing security issues, right, like now maybe someone found an issue or found later an issue and reports that back to you and you as a developer have to fix it. Think about if that's you know not well-maintainable code you you you cannot fix the problem the security problem so quality suddenly becomes a security issue in the sense that the you know attacker window stays open longer at some point you have to to fix the issues and so I think here code quality is super related to code security especially now with AI generated code where we see typically uh poor quality of code right and that becomes a problem for security when we look at code secure security. How does that relate to cyber security a as as a whole? So, there are many fields of security, right? There's data security, cloud security, network security, forensics. As a larger organization, you kind of need all of them. Um, and they are all interconnected and they build multiple lines of defenses. From my perspective, from an you know offensive security uh perspective, I found application security always the most interesting field because if you think about uh you know every organization today basically deploy software they they ship software as a product or they deploy some services online to have customers interact with their business. And so those applications are online 24/7, right? And they're available to to me as an attacker. And that's the you know at the forefront of security and and typically the first entry point into the network. Um and so that makes it so so critical or so interesting for attackers the the application security field whereas the other areas you know more try to prevent the lateral movement once an attacker is in uh can the attacker maybe not decrypt the data he stole or can the attacker and not you know move from one server to the other. >> What is lateral movement? >> Yeah. So typically as an attacker you would um you know gain your first entry point into a network and then maybe you want to expand from there. So you have a shell on one server. You can control a server or a machine. You can run system commands and then you would uh from there you know you are in the internal network and try to see what other services can I reach, what other internal things can I access and then you need a security strategy basically in the broader cyber security uh strategy to prevent that lateral movement between uh internal services. One idea that comes to me about lateral movement with the advent of AI assistant MCP servers, it's probably going to be a pretty tempting attack vector for just thinking as an attacker, hey, let me try to get access to that developer machine. You know, I'll set up an MCP server that says it does something, but secretly it does something else. It runs locally. Boom. I get access to this developer machine. as developers and as as security professionals, how much should we worry about this? Uh and are are you seeing any worries about this specific attack vector? Because I feel until now developers machines were kind of a little bit off limits or were they off limits? Yeah, I mean developers machines I think are not off limits, right? I think supply chain attacks is a a big a big topic where I mean developers are building software and then software is deployed on on all the organizations worldwide right and that makes it so interesting so we talked about an npm package that gets compromised by compromising a developer's machine basically right and then from there uh you can uh compromise uh a super popular dependency right or if you're a software vendor you better make sure the software that is shipped to thousands of organizations maybe uh is not backtored because some developer uh got backtored and uh yes I think also with uh uh agents and and giving away more control there is a a new uh threat here because it's not just about the dependencies you're using or your machine security in general but also uh making sure that the the agent you're using is uh doing the right thing and it doesn't has the privileges to do something accidentally uh or on purpose as you said uh to do something harmful like if the if the agent passes a gyra ticket and something is you know someone can create a malicious gyro ticket that instructs basically the agent to add a backdoor and just in instead of just solving a development problem uh then you suddenly have a new you know type of security problem to think about >> you previously mentioned that if you can automate things uh as for for code security or application security you should try to do that what are the common code security tools that you've you keep seeing engineering teams use for security hygiene like what are the categories that I can think of I think every developer uses an IDE right so there's some basic linting available in ideides and that's great right like because as you type you find issues and you can resolve them just that in an IDE typically you don't have such a broad or in-depth security coverage built in right there are some IDE extensions you can use but then typically you stay at the linting side that means some syntactical and semantical checks um and typically in the current file you're working in simply out of performance reasons right because it has to be done in milliseconds as you code and shouldn't slow you down and then you have uh static application security testing tools SAS tools that um can go more into um a deeper level of of code analysis right so and depending on the SAS tool you use there is for example symbolic execution or taint analysis techniques used where your whole code base is transformed formed into an abstract model basically and then it's uh simulating static analysis is simulating what could happen here at runtime. what um it's not executing the code, right? But but analyzing this and uh connecting what we talked earlier about user inputs for example, how are they flowing in terms of data flows through through all your code paths and simulating what what could go wrong here to find um different issues and and can you just give us a high level of of what is happening because this sounds super interesting. what I understood and you know tell me if if I got it right is you take your code and you kind of turn it into like maybe a graph or or of some sort and then you can try to figure out kind of inputs how they can flow how they can get to components. Yeah, exactly. So your your code is is transformed into a big graph model. This can be any dimensions. Uh yes, right. So, so um every basically every um file of your codebase, every function, every uh if else basically um so whenever the control flow of your application changes, every function call, every if else is a part of that big graph model, right? And then you try to figure out what are all the combinations where you know your variable assignments which which create data flow basically uh where is user input uh received in that application and then passed on with you know data assignments through different if else and function u um calls and where does it end up in something security sensitive right and this can be very very long data flow path and very complicated to do right and also to do do that efficiently, right? Um it it used to be taking days and and now we can do that in minutes and that's a very hard problem to solve. But it helps you to automate that that that process, right? What we talked about earlier where you should um be be mindful of what is user input. It it helps you to automate that and find even very tricky and long uh connections between user input and something security sensitive. >> Okay. So we talked about the kind of llinters inside ids uh the SAS SAS scanners that you said are there other tools worth knowing about? I mean secret detection we talked about hardcoded passwords or um so there are secret detection tools there is um infrastructure as code scanning right if you think about code more broadly it's u also infrastructure as code or your GitHub actions file can be code right and there there are tools to scan this typically if you have a good SAS tool that's all covered by by static analysis basically right because everything can be considered code here and then we talked already about software composition analysis this as another tool uh for developers where you find those known vulnerabilities, those CVEes in your dependencies. >> I guess this is a layered approach, right? So the more security you'd like, the more of these layers you would set up, but do I do I sense that there's a trade-off between them? It's going to be maybe complexity, time to run, uh those kind of things. Like what what is the downside of just like throwing all of these tools onto every single codebase I have? Even if I'm a if I'm a oneperson startup, right? like why would you not recommend that if if you would not recommend it? >> Yeah, I think I mean for the the basic static analysis tools I would definitely recommend uh to do that. I think here what you should be careful of is choosing something that is you know intended to be used by developers and not by security teams. Right? We talked about the noise level that is uh you know interesting for security teams from an oil perspective but uh deadly for your product uh uh um development uh you know productivity where you shouldn't be annoyed and and and um I think that's something to watch out for but and and then there are differences you know for SAS tools and SCAR tools if they are more for the security teams or for developers but uh I would definitely recommend I think at all levels to to run uh static analysis and software composition analysis to have your your basic uh security hygiene in place. So these are static tools they after you run the code they can run on them they can run on CI they can run with continuous deployment. Are there kind of more dynamic tools and I'm just kind of thinking of of of the idea that you know as your code runs as your servers operate that dynamically try to test uh or or or just do funky stuff. >> Absolutely. So there's dynamic application security testing. We we talked about penetration tests, right? And and and a dust tool tries exactly to automate this, right? What is dust? Dynamic application security testing is um testing from the outside as a blackbox when your application is already running on a test server or in production. And it's basically shooting all kinds of malicious payloads from the outside against your application to see how it reacts and is it breaking or you know uh is there a delay is it behaving weird or throwing an error message and then this way um trying to automate such a human penetration test to find out if there are issues it can detect. Um and then there's also on the dynamic side fuzzing which is similar to dust basically where it's more for embedded software you know binaries CC++ uh libraries or applications where typically you pass complex formats or protocols like file formats and then you want to flip every single bit basically in what you're processing to see if something breaks right and you can automate that with fuzzing and then find those crashes. Um so that that works very well. Um I just think that you know those more dynamic tools are not so much for developers uh uh um today because uh you are a bit disconnected from your coding and you know you you have to context switch basically because you cannot find things as you type. You need to kind of like finish what you're doing, deploy it on the test server, get it run and then the feedback loop is just a bit longer. And so um I think for for developers it's it's more inefficient but for security teams is it's a it's a great tool to have uh to to additionally uh maybe run a dust or a fuzzing right. >> Yeah. And as you say like it's it sounds like a bunch of setup uh to do just one more thing like I can I can see why you're saying that it's it's more for a security team. One thing you haven't mentioned but I I was waiting if you would mention AI security reviews. These are, you know, popping up everywhere. There's there's a lot of different tools, a lot of different vendors, some some existing ones, and they're all saying the same thing. Use this thing. It will make your code more secure. What is your take as a software professional? >> I think it's super fascinating and fun to to see, right? Uh and and also impressive what uh AI can uh can can find today. You know, as with static analysis or every other technology, to me, it's it's not necessarily all about just finding issues. uh at least when you want to you know use that in a systematic way as a as a developer uh here you have to get into a good balance of am I not only finding things but how often am I reporting things that are actually not a true or a meaningful issue um and and can I scale this to half a million lines of code etc right so um I think what we're seeing more today is uh you know security research agents uh that go in and and randomly find um uh issues. You know, it's it's a great tool for security teams here. Uh but as a developer, you want to have, I think, something a bit more systematic that that finds all code security problems. And you mentioned maybe um the the you know, to me there's another aspect of being deterministic versus non-deterministic, right? Here uh the the uh AI basically is nondeterministic. And I think you know again for a security team that's not so important but as a development organization you you need to have basically like a a quality gate and that's consistent across your team and all the other teams that always has the same output and uh you cannot fail your gate you know because randomly a new issue is popping up or disappearing etc. So I think that that doesn't really work well for developers today. And lastly to me from a developer perspective right there's also a bit of a contradiction if you think about most or or a lot of code is written by AI itself right depending on who you ask and um if you then use AI to review AI generated code that's a bit uh you know uh having students grade their own homework uh where you you you could think that you know if AI didn't you know could prevent actually generating a security issue why would it data on detect that security issues. So I think we need to have good guardrails and verification in place uh that is not AI to verify then basically this AI generated code. I I I can see where you're coming from. Although I can also see some some people might say like well what if it's a different AI? What is a different LLM like difference but I I we're still not you know we're just chasing one another like we're still not changing the core problem here. And earlier you said that AI can generate lowquality code and this could be an issue when we're talking about the the lines of code per per security issue. Can you go a little bit into more detail on and what you're seeing observing? What what what does low quality mean in in this sense? Or is it the verbose nature of that we're sometimes seeing? So for example at Sona we did um great studies of the most popular LLMs um like cla GPD4 5 lama open coder etc. And we looked at the what we call personalities of them, right? How do they um what kind of issues do they produce and what kind of quality are they producing? And then measured what comes out of of of that. And studied this. And one interesting finding to me was for example that um you know if you if you use the reasoning mode of GPD5 it it actually decreases not eliminates but decreases the number of security issues you find but it it's it's using more verbose you know output to solve the the development problem. It produces more code actually. Um, right. And um this is then again something that leads into security problems because you have more uh lowquality code that maybe have less security issues uh itself but then poses a problem maybe combined with other snippets of your code um or it's it's harder to review um by your peers or later on and it's less maintainable and and you know leads to a security problem. This reminds me of there's this like old saying before AI that code is a liability. The more code you have, the more liability you have. And this was a reason that you know back in the day uh as an experienced engineer would sometimes spend a day or two reducing the lines of code, refactoring, compressing it, bringing single responsibility, removing duplication. And I wonder if we're kind of forgotten this a little bit that the more lines of code you have I mean just taking your statistic of one security issue per thousand lines of code let's just take it for now like yeah like you know like you you would want to have if if kind of an efficient lines of code right like you do want to spend that time and effort of of getting to a system that is simple clear responsibilities concise I think this is something for that developers or you know engineers look at uh today already uh not just uh to to vibe code and accept all the code but to actually make sure um that the code makes sense and is is well structured for all kinds of purposes right to maintain a good architecture in your in your codebase right and and to have a good maintainable code etc outside of the security world that that's already a big code quality uh um problem and and I think developers are aware of this uh but And yes, it it it adds for that uh uh to you know security problem. Uh on top of that there was a nice survey by I think it was stack overflow where I think uh only 3% of the developers asked basically uh said they they they trust uh their AI generated code and I I think uh that's uh that's very reasonable. Yeah. >> Yeah. I mean when I'm using uh AI to like build myself my APIs and test and all those things I also find myself like I I give it instructions and every now and then I also tell it to refactor some things to move things around as I'm watching the output I I just see something is getting bloated you know I have an index.ts TS that is getting this big. I'm like, "All right, let's pause. Let's refactor." But I do this because again, you know, like years of of building software, I know that it's just going to get a mess. I'm not going to be able to navigate. And for me, it's important that I need to understand my code and I need to have the structure for that. So, I guess this doesn't change or and and maybe the people who are Vive coding, they're going to come around to learning the same lessons that, you know, we we all learned the hard way. >> Yes, I guess. But how do you think AI is changing code security uh and also security in general? What impact you see it's already having? >> I mean there's definitely a change, right? I think I mean even for our security tools, I think there's a big change in the sense that it's um it's very powerful and helpful. So um even if if you run deterministic algorithms like static analysis to detect issues, uh you can still enhance that deterministic algorithms with AI, right? So for example, we talked about the taint analysis. Your deterministic analyzer needs to have a lot of knowledge about all the libraries and frameworks that are there and there are millions, right? And so AI can help you with gathering knowledge and information and then feed that into a deterministic algorithm, right? So you can combine technologies um and that's definitely changing I think the static analysis but also dynamic analysis and and other security tool areas. And then what we're also seeing is, you know, fixes work quite well. Like if you throw half a million line of code, you know, into um the the the context window, it's it's not working so well. Um but if you just throw in you have a very specific task, if you say here's a deterministically found security issue, here are the 20 lines of code and that's the problem. And then AI is very good in in fixing those issues, right? Um so so that's very helpful. um because it's about fixing and not just detecting and AI is super powerful here. we we also see um like a change in how code and applications are built right so if you think about um applications traditionally you have this backend front end and in the back end is a is a database if you remove that database then you don't have a SQL injection anymore right um but if you add an LLM to the back end then um you have a prompt injection maybe you know another vulnerability where the attacker you know can modify the system prompt or your prompt engineering and then mess with the LLM logic or or the the output and that's becoming then you know so the threat landscape changes and attackers adjust for it and and certainly the tools and the the industry uh adjust to this and um that's that's maybe taking a bit of time right if you think about all the cobalt code we are still seeing >> yeah but I guess we can add prompt injection right up we can pin it up there with SQL injection in fact who knows prompt injection might become even more uh of with the security issue. >> Yeah, I think as uh you know uh text becomes um code, right? I think uh um prompt injection is kind of like the new code injection, right? The human language is the new code and so if you inject human language, then suddenly uh that's that's your new code injection. So that's interesting from a security perspective. And what about uh with coding assistance? Uh are are you seeing things change with in terms of how we think about code security? I mean I think the the big problem in terms of security is that you you know produce just uh code much more faster and and and writing code is not the challenge anymore and so suddenly the new bottleneck is how are you verifying all that code right that's the new bottleneck not to get your code done but to verify it that's actually secure and uh if if you don't then that leads to security issues or quality issues which then on the long run lead to security problems right so I think That's the big new uh challenge for for security or code security. How do you verify all of that uh faster produced code at scale and at speed? And what is your take on what is working so far? I mean the obvious thing that I'm hearing a lot of engineers and engineering leaders say is like well we need to scale code reviews. We need to figure out ways where humans can look at code reviews you know like more of them. Meaning let's add tools to them. Let's add additional context. But outside of that, do do you see some other maybe promising areas where where we could actually verify from strictly from a security perspective like is this code secure? >> Yeah, I think you mentioned already the the key things uh right to to to add tooling to to automatically verify code as you produce it. I think there are also areas where um and and Sonosource is pioneering something here in the field where you basically look at how are LLMs trained and uh you make sure the data set that an LLM is trained on is actually free of uh common security issues, right? And if you do that and train your LLM on high quality uh code, on high quality data free of security or quality problems, you you are producing from the beginning right a much more um secure code and that's maybe uh another uh thing where in the future we will see more of this. And speaking of of this like again because you you you see a lot of code you do a lot of security analysis. Do you see AI generated code introduce different types of security issues than humans would do especially because we know that elements are trained on you know human code in the end. I think they're doing the same mistakes for for the reason you mentioned maybe the the prevalence changes of certain issue types right the issue types don't don't change so much but uh what we are seeing for example slop squatting is a good example where you know AI proposes to use a library that doesn't even exist right and then so an attacker can register in in npm or maven central that that that package that non-existing package and then with that you suddenly include a malicious package and and there's the back door. And so this security issue was known before and we had dependency confusion, but it's just less likely that a that a um developer um you know mistypes a dependency while with AI that that prevalence changes suddenly, right? There's an acceleration of that while other issues maybe uh decrease, right? I could imagine, I'm not seeing this for now, but I could imagine hard-coded passwords like some some issues that are just oneliner issues uh maybe decrease a little because AI is able to learn that you know I shouldn't do that and um uh then we could see a reduction of of of um you know those issues. Still human developers can add them but maybe the more AI generated code is used um we see less of them and then maybe we will see more of the complicated issues right issues where you need to combine multiple code snippets with each other to form a security issue um that is then not so easy for AI to to uh to to grasp and so u definitely you know some changes in the prevalences of of what we know of already today. What are some commonly misunderstood things about the security industry? Things that you know we we we could call them fallacies. >> I mean I come from the security industry right and I I moved uh more to the developer side of the over the years I would say and and now you know stepping a bit back for a moment and looking at the security industry. It's quite fascinating how we have this separated industry and community from the developer community, right? Where we we both talk about code and bucks basically, but one side is maybe more about building things and the other about attacking and destroying and and so they are a bit distinct somehow and separated. That's that's interesting to me. And um I think one fallacy that comes out of this is uh you know that the security industry is all fascinated about the the security problems and then is selling products basically um that that that promise you know you can have security just as a product right and and um I mean there's a lot of money in that industry and it's it's driven by you know compliance and fear of data breaches um and and so I think as a cuo you have a hard time knowing you know what product should I should I use and and and buy and uh often I think a mistake here is to to look at security as a product and not as something that you are building into the process of development right because I think in reality that's that's what uh you must do uh and not have a tool that finds you yet another 1,000 issues if you hit the scan button right uh but something that embeds into the process finding issues but then engaging developers and and and helping you fix things and uh I think that's the the biggest uh fallacy to me. We talked about the the ownership that comes with that maybe right. I think there is a bit of this mysterium about security and it can only be owned by experts who are top-notch hackers etc where um I think the lines are a bit more blurry here and um it's it's more about fixing things and not just so much about the exploitation stage all the time that the security industry talks about and and finds fascinating and I myself you know uh am guilty here and and find fascinating. Lastly, maybe this, you know, there is no perfect security. If you get the understanding of the security industry, then maybe that's a fallacy here that uh you know there's no perfect security unfortunately. >> Yeah. This this thing about vendors selling products promising you know your your your organization will be secure, your code will be secure. It reminds and and the fact that the reality is a lot more it it doesn't really work like that. Like you need teams, you need people who care about it. Uh sometimes you can I guess you can have a team that uses zero security tools producing really secure code because they're just experienced engineers or working in the domain that they understand and you can have the other way around as well like you can have a team that has all these scanners and whatever and their code is is still like not great not good and unsecure. It reminds me of the developer productivity term like how productive are my engineers and again there's vendors selling all these tools saying hey measure this and you will get this and we just see the same thing. So I I wonder if it's just a thing of there are just some things that are just hard because there's a lot of moving parts. You cannot just measure just one thing because we can optimize for that and still the outcome will not be great. I wonder if there's just some some areas you know developer productivity is one security may maybe software is just hard. It is right. I I I think the more complex software you write um it uh and every developer knows this right the more problems you have hard problems you have um you know bugs you have and also the the more security problems you will run into um and that's just natural um we you know we have a great vulnerability research team at Sona and and uh so those guys are they are picking the the most popular open-source projects in the world that are you deployed everywhere with great communities, great maintainers, buck bounty programs where people get paid if they find something, etc. And um it's it's fascinating to me every time they they they choose such a you know high-profile target that they go in and and and they still find something, right? If you uh you know, if you're motivated enough and u you know, look hard enough, I I I think you you can find something. Um and and unfortunately that's that's that's the that's the reality right >> as a software professional as a security professional in the field for 20 years what can you advise to me as an engineer uh how can I know that my software is secure enough or at what point should I stop and how would you think about this obviously uh there will be differences between if I'm a one person you know tiny business a midsize company at a very large company how would you advise engineers to think about, you know, good enough security. Okay, I can move on. This is good. Let's let's do the other stuff. >> Yeah, it's tough because we said it's it's um you know, perfect security is is um is hard. But then to the question, what is good enough and and how can you solve this? I think using tools is a is the first thing you you you should use, right? I I think um you know, it's like a bit like securing your house, right? like you should make sure you you shut your windows and doors um um and and have some basic hygiene. It doesn't mean that a you know highly skilled or funded attacker can can break in but you can make sure you you shut those windows and doors. I think with software the the challenge is a bit you're adding new windows and doors every day basically like with the new features you're adding and so um I think you need some automation for that uh to have your basics right. Um and then I would recommend basically you can start with a initial assessment where where am I standing today right like you can hire professionals or you know use a tool for this and kind of like assess where do I stand today what are my most critical issues I should fix um and and get them fixed. Um, and then more importantly, as you're adding features and as you're coding, making sure you're not adding more on top of that, right? Making sure you're not adding more security vulnerabilities and also you're not adding more technical depth that and quality problems that in the long run lead to security issues. And I think here automation is key basically. Um and then you know after a quarter or something you can run that assessment again and and look at where am I standing and uh hopefully you have been you know very productive as a developer and adding new features that didn't slow you down uh but also you you increased your security posture um at that point it's a never ending story and and it's a growing field right like we always need to be aware of the latest uh changes right now LMS and prompt injections you'll probably need to ask yourself as as If I'm building on top of LMS or I'm invoking APIs, can they go in there and then you know the next thing will come come again and the next and the next and the next. I guess keeping an eye on the OASP top 10 is never a bad thing just to cover the very basics. >> Yeah, I agree. >> Now, as a closing question, I'm going to put you on the spot here. Which programming language do you think is the most secure? the the the one that you you are very happy either as using it or observing as like okay this is the language itself seems to help prevent a bunch of security issues to start with. >> I think the newer languages um are more secure. I I like go is a is a is a good example. I think uh you know by default things are just you know new languages learned from the past from from older languages uh what uh goes wrong. But I think also other languages are evolving. I think Java is you know we see that a lot in enterprises and I think it's uh it's uh quite secure to use. Um so so that would be my answer here. >> No I I I like that you dropped Go. It's it's it's getting pretty good traction with startups as well in including for now even building web stuff. It's it's picking up. So it's I I guess it's all all down to people's tastes but it's good to hear. So Johannes, this was uh really interesting. Thanks for coming on the podcast. Yeah, thank you. My pleasure to be here and uh thanks for the invite. >> Well, thanks very much for this. Thanks a lot to Johannes for taking us deeper into the topic of code security. The thing that I found the most interesting is just how hard it is to define exactly what makes code secure because there are simply so many impossible attack vectors from using a dependency that gets compromised to AI generating code with glaring security vulnerability like not validating inputs to accidentally leaking credentials. The list just goes on. Security feels like this invisible thing across software. As long as there's no security issues discovered, it doesn't get much attention. But once there is, then there's a scramble on what to do. As a professional software engineer, we need to keep ourselves up to date with common security vulnerabilities and how we can defend against them, including the new ones that AI tools introduce. For more details on security engineering, see the Pragmatic Engineer deep dives linked in the show notes below. If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating on the show.

Code security for software engineers

Scores

Summary

Key Points

Key Takeaways

Primary Category

Secondary Categories

Topics

Entities

people

organizations

products

technologies

Sentiment

Content Type

Difficulty

Tone