• 8 Posts
  • 153 Comments
Joined 3 years ago
Cake day: August 10th, 2023



  • Moderation, much as the average internet user loves to lambast it, is not an easy task at all.

    This is exactly why you should use Lemmy as a forum instead of Discord. One of the recurring problems I have seen in the Android emulation community is that there are many entitled children who harass and troll in these communities. Moderators have to ban them, but the bans are per server. That means each server has to deal with the same troll kicking up a fuss, and then ban them. Then they create a new account and repeat. I have seen communities and projects die due to harassment and trolling, and it makes me sad.

    But on Lemmy, instance bans can be applied to ban problematic users from many communities at once, saving and deduplicating work.

    Moderation is a lot of work, but moderating a Lemmy community is ultimately a team sport rather than an individual one.
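    To make the deduplication argument concrete, here is a toy Python model, assuming one instance that hosts several communities. The class and method names (Instance, ban_from_instance, can_post) are hypothetical and are not Lemmy’s actual API or schema; they only show that one instance-level ban replaces one ban per community. The model does nothing about ban evasion through new accounts.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy model of one instance hosting many communities (hypothetical names)."""

    instance_bans: set[str] = field(default_factory=set)
    community_bans: dict[str, set[str]] = field(default_factory=dict)

    def ban_from_community(self, community: str, user: str) -> None:
        # Per-community ban: every other community must repeat this work.
        self.community_bans.setdefault(community, set()).add(user)

    def ban_from_instance(self, user: str) -> None:
        # One action covers every community hosted on this instance.
        self.instance_bans.add(user)

    def can_post(self, community: str, user: str) -> bool:
        if user in self.instance_bans:
            return False
        return user not in self.community_bans.get(community, set())


inst = Instance()
inst.ban_from_instance("troll123")
print([inst.can_post(c, "troll123") for c in ("emulation", "retro", "android")])
# prints [False, False, False]
```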






  • Project Zero

    Project Zero was entirely human, though; no GenAI. Project Big Sleep has been reliable so far, but there is no real reason for ffmpeg developers to value Big Sleep’s 6.0 CVEs over potentially real, more critical CVEs. The problem is that Google’s security team would still be breathing down the necks of these developers and demanding fixes for the vulns they submitted, which is kinda BS when they aren’t chipping in at all.

    Anyway there’s a big difference between submitting concrete input data that causes an observable crash, and sending a pile of useless spew from a static analyzer and saying “here, have fun”

    Nah, the actually fake bug reports also often have fake “test cases”. That’s what makes the LLM-generated bug reports so difficult to deal with.


  • With a concrete bug report like “using codec xyz and input file f3 10 4d 26 f5 0a a1 7e cd 3a 41 6c 36 66 21 d8… ffmpeg crashes with an oob memory error”, it’s pretty simple to confirm that such a crash happens

    Google’s Big Sleep was pretty good; it gave a Python program that generated an invalid file. It looked plausible, and it was a real issue. The problem is that literally every other generative AI bug report looks just as plausible. As I mentioned before, curl is having a similar issue.

    And here’s what the lead maintainer of curl has to say:

    Stenberg said the amount of time it takes project maintainers to triage each AI-assisted vulnerability report made via HackerOne, only for them to be deemed invalid, is tantamount to a DDoS attack on the project.

    So you can claim testing may be simple, but it looks like that isn’t the case. I would say one of the problems is that all these people are volunteers, so they probably have a very, very limited amount of time to spend on these projects.

    This was the first search hit about ffmpeg cve’s, from June 2024 so not about the current incident. It lists four CVE’s, three of them memory errors (buffer overflow, use-after-free), and one off-by-one error. The class of errors in the first three is supposedly completely eliminated by Rust.

    FFmpeg is not just C code, but also large portions of handwritten, ultra-optimized assembly (per architecture, too…). You are free to rewrite it in Rust if you so desire, but I stated it above and will state it again: ffmpeg traded some security for performance. Rust currently isn’t as performant as optimized C code, and I highly doubt that even unsafe Rust can beat hand-optimized assembly; C can’t, anyway.

    (Google and many big tech companies like ultra-performant projects because performance equals power savings equals cost savings at scale. But this means weaker security when it comes to projects like ffmpeg…)


  • AI tools were apparently used for locating the bugs but the reports were real and legit.

    Yes, but the FFmpeg developers do not know this until after they triage all the bug reports they are getting swamped with. If Google really wants a fix for their 6.0 CVE immediately (because again, part of the problem here was that Google’s security team was breathing down the necks of the maintainers), then Google can submit a fix. Until then, ffmpeg devs have to work on figuring out whether the more critical-looking issues they receive are actually critical.

    It’s nuts to suggest continuing to ship something with known vulnerabilities without, at minimum,

    Again, the problem is false-positive vulnerabilities. “9.0 CVEs” (which are potentially real) must be triaged before Google’s 6.0 CVE.

    It would be great if Google could fix it, but ffmpeg is very hard to work in, not just because of the code organization but because of the very specialized knowledge needed to mess around inside a codec. It would be simpler and probably better for Google to contribute development funding since they depend on the software so heavily.

    Except Google does fix these issues and contribute funding. Summer of Code, bug bounties, and other programs run by Google contribute both funding and fixes to these projects. We are mad because Google has paid for more critical issues in the past, but all of a sudden they are demanding free labor for medium-severity security issues from swamped volunteers.

    Being able to find bugs (say by fuzzing

    Fuzzing is great! But Google’s Big Sleep project is GenAI-based. Fuzzing is part of the process, but the inputs and outputs are not significantly different from the other GenAI reports that ffmpeg receives.
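    What makes classic fuzzing output easy to verify is that the fuzzer hands you the exact input that crashed the program, so anyone can replay it. A minimal mutation-fuzzing sketch in Python, assuming a hypothetical toy parser (parse_chunk) with a deliberately planted length-check bug; real projects use coverage-guided fuzzers such as libFuzzer or AFL++, which are far more effective than random byte flips.

```python
import random


def parse_chunk(data: bytes) -> bytes:
    """Hypothetical toy 'decoder': 1 length byte N, then N payload bytes.

    Deliberate bug: N is never checked against the buffer size. Python slicing
    would silently truncate, so the explicit data[n] read stands in for the
    out-of-bounds read a C decoder would perform.
    """
    if not data:
        raise ValueError("empty input")  # graceful rejection, not a crash
    n = data[0]
    if n > 0:
        _ = data[n]  # IndexError when n >= len(data): our "crash"
    return data[1 : 1 + n]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite 1-4 random bytes of the seed with random values."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int = 10_000, rng_seed: int = 0) -> list[bytes]:
    """Return every mutated input that crashed the parser."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_chunk(candidate)
        except ValueError:
            pass  # rejected cleanly: not a bug
        except IndexError:
            crashes.append(candidate)  # concrete input that reproduces the crash
    return crashes


crashes = fuzz(b"\x03abc", iterations=2_000)
print(f"{len(crashes)} crashing inputs, e.g. {crashes[0].hex(' ')}" if crashes else "no crashes")
```

Each recorded input is a standalone reproducer: feeding it back into parse_chunk crashes every time, which is exactly what a maintainer can check in seconds.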

    Those approaches would be ridiculous bloat, the idea is just supply some kind of wrapper that runs the codec in a chrooted separate process communicating through pipes under ptrace control or however that’s done these days.

    chroot only works on Linux/Unix and requires root, so it doesn’t work in rootless environments. Every sandboxing tool comes with some form of tradeoff, and it’s not ffmpeg’s responsibility to make those decisions for you or your organization.

    Anyway, sandboxing on Linux is basically broken when it comes to high-value targets like Google. I don’t want to go into detail, but I would recommend reading Madaidan’s Insecurities (I mentioned gVisor earlier because gVisor is Google’s own solution to flaws in existing Linux sandboxing solutions). Another problem is that ffmpeg people probably care about performance a lot more than security. They made the tradeoff, and if you want to undo it, it’s not really their job to make that decision for you. It’s not a binary but more of a sliding scale, and “secure enough for Google” is not the same as “secure enough for the average desktop user”.

    I saw earlier you mentioned Google keeping vulnerabilities secret and using them against people or something like that, but it just doesn’t work that way lmao. Google is such a large and high-value organization that they essentially have to treat every employee as a potential threat, so “keeping vulns internal” doesn’t really work. Trying to keep a vulnerability internal will 100% result in it getting leaked and then used against them.

    It’s nuts to suggest continuing to ship something with known vulnerabilities without, at minimum, removing it from the default build and labelling it as having known issues. If you don’t have the resources to fix the bug that’s understandable, but own up to it and tell people to be careful with that module.

    You have no fucking clue how modern software development and deployment works. Getting rid of all CVEs is actually insanely hard, something only orgs like Google can reasonably do, and even Google regularly falls short. The vast majority of organizations and institutions have given up on eliminating CVEs from the products they use. “Don’t ship software with vulnerabilities” sounds good in a vacuum, but the reality is that most people simply settle for something secure enough for their risk level. I bet if you go through any piece of software on your system right now, you can find CVEs in it.

    You don’t need to outrun a hungry bear; you just need to outrun the person next to you. Cybersecurity is about risk management, not risk elimination. You can’t afford risk elimination.


  • It might be appropriate for ffmpeg to get rid of such obscure codecs

    This is why compile-time flags exist. You can compile software without certain features, and that code is left out entirely, decreasing the attack surface. But it’s not really ffmpeg’s job to tell you which flags to pick; that is the responsibility of the people integrating and deploying it into their systems (Google).
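    As an illustration of trimming at compile time, FFmpeg’s ./configure accepts --disable-everything followed by --enable-decoder=, --enable-demuxer=, --enable-parser= and --enable-protocol= for the components you want. The Python helper below only assembles such an argument list; the function name and the chosen components (h264, aac, mov) are example assumptions, and a working build usually needs more (muxers, encoders, filters) depending on what you do with the output. Options such as ./configure --list-decoders print the valid names.

```python
import shlex
from collections.abc import Sequence


def minimal_configure_args(
    decoders: Sequence[str],
    demuxers: Sequence[str],
    parsers: Sequence[str] = (),
    protocols: Sequence[str] = ("file",),
) -> list[str]:
    """Arguments for FFmpeg's ./configure that build in only the listed components."""
    # Start from nothing, then opt back in to exactly what is needed;
    # everything else is never compiled, so it is not attack surface.
    args = ["./configure", "--disable-everything"]
    args += [f"--enable-decoder={name}" for name in decoders]
    args += [f"--enable-demuxer={name}" for name in demuxers]
    args += [f"--enable-parser={name}" for name in parsers]
    args += [f"--enable-protocol={name}" for name in protocols]
    return args


# Example only: H.264 video + AAC audio from MP4/MOV files on local disk.
print(shlex.join(minimal_configure_args(["h264", "aac"], ["mov"], parsers=["h264", "aac"])))
```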

    Sandbox them somehow so RCE’s can’t escape from them, even at an efficiency cost

    This is similar to the above. It’s not really ffmpeg’s job to pick a sandboxing tool (Docker, seccomp, SELinux, k8s, Borg, gVisor, Kata); that is the responsibility of the people integrating and deploying the software.
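    A sketch of where integrator-side isolation can start, assuming a POSIX system and Python’s standard subprocess and resource modules: run the decoder in a separate process, pass data over pipes, and cap CPU time and address space. This is process separation plus resource limits, not a real sandbox (no syscall filtering, namespaces, or filesystem isolation), which is the gap tools like seccomp or gVisor fill. The limit values and the run_isolated name are hypothetical choices.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 5                 # assumption: tune per workload
ADDRESS_SPACE_BYTES = 1024**3   # assumption: 1 GiB virtual memory cap


def _apply_limits() -> None:
    """Runs in the child after fork(), before exec() (POSIX only)."""
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (ADDRESS_SPACE_BYTES, ADDRESS_SPACE_BYTES))
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core dumps of untrusted data


def run_isolated(cmd: list[str], data: bytes, timeout: float = 30.0) -> bytes:
    """Feed `data` to `cmd` over stdin and return its stdout.

    A crash or limit violation kills only the child; the caller just sees an error.
    """
    proc = subprocess.run(
        cmd,
        input=data,
        capture_output=True,
        timeout=timeout,
        preexec_fn=_apply_limits,
        check=False,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with {proc.returncode}")
    return proc.stdout


if __name__ == "__main__":
    # Stand-in child; with FFmpeg installed this could instead be e.g.
    # ["ffmpeg", "-i", "pipe:0", "-f", "wav", "pipe:1"]
    echo = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(run_isolated(echo, b"hello"))
```

Python’s documentation warns that preexec_fn is not safe in multithreaded programs; a threaded service would apply limits some other way, for example with a wrapper such as prlimit.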

    That’s why it’s irritating when these companies whine about stuff that should be handled by the two practices above and ask for immediate fixes via their security programs. Half of our frustration is them asking volunteers to promptly fix CVEs with a score below 6 (while simultaneously being willing to contribute fixes or pay for CVEs with higher scores under their bug bounty programs). This is a very important thing to note. In further comments, you seem to misunderstand the relationship Google and ffmpeg have here: Google’s (and other companies’) security programs apply pressure to fix vulnerabilities promptly. This is not the same thing as “Here’s a bug, fix it at your leisure”. Dealing with this pressure is tiring and burns maintainers out.

    The other half is that when they whine about stuff like this and demand immediate fixes, they reveal that their own security practices aren’t up to par. I mean, it says so in the article:

    Thus, as Mark Atwood, an open source policy expert, pointed out on Twitter, he had to keep telling Amazon to not do things that would mess up FFmpeg because, he had to keep explaining to his bosses that “They are not a vendor, there is no NDA, we have no leverage, your VP has refused to help fund them, and they could kill three major product lines tomorrow with an email. So, stop, and listen to me … ”

    Anyway, the CVE being mentioned has been fixed, if you dig into it: https://xcancel.com/FFmpeg/status/1984178359354483058#m

    But it really should have been fixed by Google, since they brought it up. There is no real guarantee that volunteers will keep fixing these in the future, and burnt-out volunteers will just quit instead. libxml2 decided to straight up stop doing responsible disclosure because they got tired of people asking them to fix vulnerabilities with free labor; it now treats all security issues as regular bug reports that get fixed when maintainers have the time.

    The other problem is that the report was AI-generated, and part of the issue here is that ffmpeg (and curl, and a few other projects) have been swamped with false positives. These AI tools generate security reports that look plausible, sometimes even with a non-working PoC. This wastes a ton of volunteer time, because maintainers have to spend a lot of time filtering through these bug reports and figuring out what’s real and what isn’t.

    So of course, ffmpeg is not really going to prioritize the 6.0 CVE when they are swamped with all of these potentially real “9.0 UlTrA BaD CrItIcAl cVe” reports and have to figure out whether any of them are real before even starting work on them.
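    To make the prioritization concrete: triage has to go by the severity a report claims, since whether it is real is exactly what triage has to establish. A small Python sketch using the CVSS v3.x qualitative bands (Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0); the function names and report titles are made up for illustration.

```python
def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def triage_order(reports: list[tuple[str, float]]) -> list[str]:
    """Order reports by *claimed* score, highest first; ties keep arrival order.

    The claim alone decides the queue position, because whether a report is
    real is exactly what triage has to find out.
    """
    ranked = sorted(enumerate(reports), key=lambda item: (-item[1][1], item[0]))
    return [title for _, (title, _score) in ranked]


inbox = [  # hypothetical reports
    ("medium-severity bug from Big Sleep", 6.0),
    ("LLM report: heap overflow in demuxer", 9.8),
    ("LLM report: RCE in codec", 9.1),
]
scores = dict(inbox)
for title in triage_order(inbox):
    print(f"{cvss_rating(scores[title]):8} {scores[title]:4.1f}  {title}")
```

The real 6.0 lands at the back of the queue behind every unverified 9.x claim, which is the situation described above.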



  • There are a few apps that I think fit this use case really well.

    LanguageTool is a spelling and grammar checker with a client-server model. LibreOffice now has built-in LanguageTool integration and can access a server of your choosing. I point it at a server I run locally, since Arch Linux packages LanguageTool.
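    For anyone wiring up their own client, LanguageTool’s HTTP API takes a POST to /v2/check with text and language form fields and returns JSON with a matches list. A minimal standard-library Python sketch, assuming a local server on port 8081 (adjust to wherever yours listens); the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SERVER = "http://localhost:8081"  # assumption: where your LanguageTool server listens


def build_check_request(text: str, language: str = "en-US", server: str = SERVER) -> urllib.request.Request:
    """POST /v2/check with form-encoded text and language fields."""
    body = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(f"{server}/v2/check", data=body, method="POST")


def summarize(response_json: str) -> list[str]:
    """Turn a /v2/check JSON response into one line per reported issue."""
    lines = []
    for match in json.loads(response_json).get("matches", []):
        fixes = ", ".join(r["value"] for r in match.get("replacements", [])[:3])
        lines.append(
            f"offset {match['offset']}+{match['length']}: {match['message']}"
            f" -> {fixes or '(no suggestion)'}"
        )
    return lines


def check(text: str, language: str = "en-US") -> list[str]:
    """Send text to the server and summarize the result (needs a running server)."""
    with urllib.request.urlopen(build_check_request(text, language), timeout=10) as resp:
        return summarize(resp.read().decode("utf-8"))
```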

    Another is Stirling-PDF, a really good PDF manipulation program that people like, which runs as a server with a web interface.



  • I’ve seen three cases where the docker socket gets exposed to the container (perhaps there are more but I haven’t seen any?):

    1. Watchtower, which does auto updates and/or notifies people

    2. Nextcloud AIO, which uses a management container that controls the docker socket to deploy the rest of the stuff nextcloud wants.

    3. Traefik, which reads the docker socket to automatically reverse proxy services.

    Nextcloud ships the AIO because Nextcloud is a complex service, and it becomes even more complex if you want more features or performance. The AIO handles deploying all the tertiary services for you, but something like this is how you would do it yourself: https://github.com/pimylifeup/compose/blob/main/nextcloud/signed/compose.yaml . Also, that example Docker Compose file does not include other services, like Collabora Online, which is the Google Docs/Sheets/Slides alternative: a web-based office suite.

    Compare this to the Kubernetes deployment, which, yes, may look intimidating at first. But many of the complexities of the Docker deployment of Nextcloud are automated away. Enabling Collabora is just collabora.enabled: true in the chart’s configuration. Tertiary services like Redis or the database are included in the Kubernetes package as well. Instead of configuring the containers yourself, you configure the database parameters via YAML, and other nice things.
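    The same settings can be expressed without a values file by flattening them into Helm’s --set flags. A Python sketch, assuming the chart exposes collabora.enabled, redis.enabled and ingress.enabled and that the chart repo was added under the name nextcloud; check both against the chart’s own values.yaml and your helm repo list before relying on them. The to_set_flags helper is hypothetical.

```python
def to_set_flags(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested chart values into Helm `--set path=value` arguments.

    Simplification: no escaping of commas/dots in keys, no list values.
    """
    flags: list[str] = []
    for key, val in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            flags += to_set_flags(val, path)
        else:
            # Render Python booleans as lowercase true/false.
            rendered = str(val).lower() if isinstance(val, bool) else str(val)
            flags += ["--set", f"{path}={rendered}"]
    return flags


# Assumed value keys; verify them against the chart's values.yaml.
nextcloud_values = {
    "collabora": {"enabled": True},
    "redis": {"enabled": True},
    "ingress": {"enabled": True},
}
cmd = ["helm", "upgrade", "--install", "nextcloud", "nextcloud/nextcloud", *to_set_flags(nextcloud_values)]
print(" ".join(cmd))
```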

    For case 3, Kubernetes has a feature called an “Ingress”, which is essentially a standardized reverse proxy configuration; you can run the reverse proxy itself separately, or use one provided as part of a package. For example, the Nextcloud Kubernetes package I linked above has a way to handle ingresses in its config.

    Kubernetes handles these things pretty well, and it’s part of why I switched. I do auto-upgrade, but only within the supported stable release, which is compatible with auto-upgrades and won’t break anything. This gets me automatic security updates for a period of time before I have to do a manual and potentially breaking upgrade.
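    The “auto-upgrade only within the stable release” policy boils down to a version filter. A Python sketch, assuming plain MAJOR.MINOR.PATCH tags (pre-releases and anything else are ignored); whether a project’s stable series is pinned at the major or the major.minor level varies, so fixed_components is a parameter. The function names are hypothetical.

```python
from __future__ import annotations


def parse_version(tag: str) -> tuple[int, ...] | None:
    """Parse 'v1.2.3' or '1.2.3'; return None for anything else (rc tags, 'latest')."""
    parts = tag.removeprefix("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def safe_upgrade_target(current: str, available: list[str], fixed_components: int = 2) -> str:
    """Newest available version sharing current's first `fixed_components` numbers.

    fixed_components=2 allows patch upgrades only (1.4.x); 1 allows minor ones too (1.x.y).
    Returns `current` unchanged if nothing newer qualifies.
    """
    cur = parse_version(current)
    if cur is None:
        raise ValueError(f"unparseable current version: {current!r}")
    best_tag, best = current, cur
    for tag in available:
        ver = parse_version(tag)
        if ver and ver[:fixed_components] == cur[:fixed_components] and ver > best:
            best_tag, best = tag, ver
    return best_tag


print(safe_upgrade_target("1.4.2", ["1.4.3", "1.4.10", "1.5.0", "2.0.0", "1.4.11-rc1"]))
# prints 1.4.10
```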

    TLDR: You are asking questions that Kubernetes has answers to.




    Many Helm charts, like authentik or Forgejo, integrate Bitnami Helm charts for their databases. So that’s why this is concerning to me.

    But I was planning to switch to operators like CloudNativePG for my databases instead and disable the built-in Bitnami images. With the built-in Bitnami images, automatic migration between major releases is not supported; you have to do it manually, and that disappointed me.
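    With an operator, the database becomes a small declarative resource. A minimal CloudNativePG Cluster (apiVersion postgresql.cnpg.io/v1, kind Cluster, as in the project’s quickstart), built as a Python dict and printed as JSON, which kubectl apply -f - accepts just like YAML. The cluster name, instance count and storage size are placeholder assumptions, and the Helm chart’s built-in database would still need to be disabled separately.

```python
import json


def cnpg_cluster(name: str, instances: int = 3, storage_size: str = "1Gi") -> dict:
    """Minimal CloudNativePG Cluster resource; the operator manages the instances."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "instances": instances,  # one primary plus replicas
            "storage": {"size": storage_size},
        },
    }


if __name__ == "__main__":
    # e.g. python cnpg_cluster.py | kubectl apply -f -
    print(json.dumps(cnpg_cluster("authentik-db"), indent=2))
```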


  • I’m on my phone rn and can’t write a longer post. This comment is to remind me to write an essay later. I’ve been using authentik heavily for my cybersecurity club and have a LOT of thoughts about it.

    The tldr about authentik’s risk of enshittification is that authentik follows a pattern I call “supportware”: extremely complex software (intentionally or accidentally) whose docs (intentionally or accidentally) leave out edge cases, because you are supposed to pay for support.

    I think this is a sustainable business model, and I think Keycloak (and other Red Hat software) has some similar patterns.

    The tldr about authentik itself is that it has a lot of features, but not all of them are relevant to your use case or worth the complexity. I picked up authentik for invites (which afaik are rare; also, the official docs about setting up invites were wrong, see supportware), but invites may not be something you care about.

    Anyway. Longer essay/rant later. Despite my problems, I still think authentik is the best for my use case (cybersecurity club), and the other options I’ve looked at, like ZITADEL (seems to be more developer-focused) or LDAP + an SSO service (no invites afaik), fall short.

    Sidenote: Microsoft Entra offers similar features to what I want from authentik, but I wanted to self-host everything.



  • So instead you decided to go with Canonical’s snap and it’s proprietary backend, a non standard deployment tool that was forced on the community.

    Do you avoid all containers because they weren’t the standard way of deploying software for “decades” as well? (I know people who actually do that, though.) And many of my issues about developers and vendoring, which I mentioned in the other thread I linked earlier, apply to containers as well.

    In fact, they also apply to snap, or even to custom packages distributed by the developer. Arch packages are little more than shell scripts, Deb packages have pre/post hooks that run arbitrary bash or Python code, and RPM is similar. These hooks are almost always used for things like installation. It’s hypocritical to be against curl | bash but be for any form of package distributed by the developers themselves, because all of the issues with curl | bash apply to any non-distro-distributed package, including snaps.

    You are willing to criticize curl | bash because you can’t immediately know what it does to your machine, and I recognize those problems, but guess what snap is doing under the hood to install software: a bash script. Did you read that bash script before installing the microk8s snap? Did you read the tens of others in the repo, used for tertiary tasks, that the snap installer also calls?

    # Try to symlink /var/lib/calico so that the Calico CNI plugin picks up the mtu configuration.

    The bash script used for installation doesn’t seem to be sandboxed either, and it runs as root. I struggle to see any difference between this and a generic bash script used to install software.

    Besides, almost all package managers except Nix/Guix have commonly used pre/during/post-install hooks, so it’s not really a valid criticism to put, say, Deb on a pedestal while dogging on other package managers for using arbitrary bash (or Python) hooks.
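    To see what a .deb will run as root before installing it, you can pull its maintainer scripts (preinst, postinst, prerm, postrm, config) out of the control archive. A standard-library Python sketch, assuming a simple ar layout without long-name tables and a control.tar that is gzip-, xz- or bzip2-compressed or uncompressed; zstd-compressed control archives (used by recent Ubuntu packages) would need a third-party decompressor. For everyday use, dpkg-deb -e does this extraction properly.

```python
import io
import os
import sys
import tarfile

# Scripts dpkg can run (as root) around install, upgrade and removal.
MAINTAINER_SCRIPTS = {"preinst", "postinst", "prerm", "postrm", "config"}


def read_ar_members(blob: bytes) -> dict[str, bytes]:
    """Parse a simple ar archive (the .deb container) into {name: data}."""
    if not blob.startswith(b"!<arch>\n"):
        raise ValueError("not an ar archive")
    members: dict[str, bytes] = {}
    pos = 8
    while pos + 60 <= len(blob):
        header = blob[pos : pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + 60
        members[name] = blob[start : start + size]
        pos = start + size + (size % 2)  # member data is padded to even length
    return members


def maintainer_scripts(deb: bytes) -> dict[str, str]:
    """Return {script_name: contents} for every maintainer script in a .deb."""
    members = read_ar_members(deb)
    control = next((n for n in members if n.startswith("control.tar")), None)
    if control is None:
        raise ValueError("no control archive found")
    if control.endswith(".zst"):
        raise ValueError("zstd control archive: needs a third-party decompressor")
    scripts: dict[str, str] = {}
    # "r:*" lets tarfile detect gzip/bzip2/xz or no compression.
    with tarfile.open(fileobj=io.BytesIO(members[control]), mode="r:*") as tar:
        for info in tar.getmembers():
            base = os.path.basename(info.name)
            if info.isfile() and base in MAINTAINER_SCRIPTS:
                scripts[base] = tar.extractfile(info).read().decode("utf-8", "replace")
    return scripts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, body in maintainer_scripts(f.read()).items():
            print(f"===== {name} =====\n{body}")
```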

    But back on topic: on top of this, you can’t even verify that the bash script in the repo is the one you’re getting, because the snap backend is proprietary. Snap is literally a bash installer, but worse in every way.