AI Remixes: Who's Tweaking Your Favorite Model, and Should We Be Worried?
We've all heard about powerful AI models like the ones that can write stories, create images, or answer complex questions. Companies that build these "foundation models" are starting to face rules and regulations to ensure they are safe. But what happens after these models are released? Often, other people and companies take these models and customize them – they "fine-tune" or "modify" them for specific tasks or uses. These are called downstream AI developers.
Think of it like this: an upstream developer builds a powerful engine (the foundation model). Downstream developers are the mechanics who take that engine and adapt it – maybe they tune it for speed, or efficiency, or put it into a specific kind of vehicle. They play a key role in making AI useful in many different areas like healthcare or finance, because the original developers don't have the time or specific knowledge to do it all.
There are a huge number of these downstream developers across the world, ranging from individuals to large companies, and their numbers are growing rapidly. This is partly because customizing a model requires much less money than building one from scratch.
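To make that cost difference concrete, here is a minimal sketch of what downstream fine-tuning can look like in practice, using the Hugging Face transformers, datasets, and peft libraries with LoRA, a parameter-efficient fine-tuning method. The base model (gpt2) and the training file (domain_corpus.txt) are placeholders standing in for any open-weight model and any domain dataset; nothing here comes from the sources the episode discusses.

```python
# Minimal sketch: adapting a released foundation model with LoRA.
# Model name and dataset file are placeholders, not from the source.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

base = "gpt2"  # stand-in for any open-weight causal language model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small adapter matrices instead of all the weights,
# which is a big part of why downstream customization is cheap.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # typically well under 1% of parameters

# "domain_corpus.txt" is a hypothetical domain-specific text file.
data = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=512),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapter", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("adapter")  # the adapter is megabytes, not gigabytes
```

Because only the small adapter matrices are trained, a run like this can finish on a single consumer GPU, which is exactly why the downstream ecosystem has grown so quickly.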
How Can These Modifications Introduce Risks?
While many downstream modifications are beneficial, they can also increase risks associated with AI. This can happen in two main ways:
- Improving Capabilities That Could Be Misused: Downstream developers can make models more capable in ways that could be harmful. For example, techniques like "tool use" or "scaffolding" can make a model better at interacting with other systems or acting more autonomously. While these techniques can be used for good, they could also enhance a model's ability to identify software vulnerabilities for cyberattacks or assist in acquiring dangerous biological knowledge. Importantly, these improvements can often be achieved relatively cheaply compared to the original training cost. (A minimal sketch of this kind of tool-use scaffolding appears after this list.)
- Compromising Safety Features: Downstream developers can also intentionally or unintentionally remove or bypass the safety measures put in place by the original developer. Research has shown that the safety training of a model can be undone at a low cost while keeping its other abilities. This can even happen unintentionally when fine-tuning a model with common datasets. Examples include using "jailbreaking" techniques to override safety controls in models from major AI labs.
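To illustrate what "scaffolding" means in practice, here is a minimal, self-contained sketch of a tool-use loop: the model is wrapped in code that lets it call external functions and act over multiple steps. The query_model stub is hypothetical (a real scaffold would call an actual model API), and the single calculator tool is deliberately a toy; real scaffolds expose far more powerful tools such as web search, code execution, or shell access.

```python
import json

# Toy tool: evaluates simple arithmetic expressions.
def calculator(expression: str) -> str:
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def query_model(messages):
    """Hypothetical stand-in for any chat-completion API.

    This stub "requests" the calculator once, then answers using the
    tool's output; a real scaffold would call an actual model here.
    """
    if messages[-1]["role"] == "tool":
        return f"The answer is {messages[-1]['content']}."
    return json.dumps({"tool": "calculator", "input": "17 * 24"})

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = query_model(messages)
        try:
            call = json.loads(reply)          # model asked for a tool
            result = TOOLS[call["tool"]](call["input"])
            messages.append({"role": "tool", "content": result})
        except (json.JSONDecodeError, KeyError, TypeError):
            return reply                      # plain text = final answer
    return "step limit reached"

print(run_agent("What is 17 * 24?"))  # -> "The answer is 408."
```

The point is that none of this touches the model's weights: a short wrapper like this is enough to extend what a released model can do, for better or worse.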
The potential risks from modifications might be even greater if the original model was highly capable or if its inner workings (its "weights") are made openly available.
While it can be hard to definitively trace real-world harm back to a specific downstream modification, the potential is clear. Modifications to image models, for instance, have likely made it easier to create realistic deepfakes, which have been used to produce non-consensual intimate imagery and to spread misinformation. The fact that upstream developers include disclaimers about liability for downstream modifications also suggests the concern is recognized within the industry.
Why is Regulating This So Tricky?
Addressing these risks is a complex challenge for policymakers.
- Undermining Upstream Rules: Modifications by downstream developers can potentially sidestep the rules designed for the original model developers.
- Limited Visibility: Downstream developers might not have all the information they need about the original model to fully understand or fix the risks created by their modifications. On the other hand, upstream developers can't possibly predict or prevent every single modification risk.
- Sheer Number and Diversity: As mentioned, downstream developers are a vast and varied group. A single set of rules is unlikely to work for everyone.
- Risk to Innovation: Policymakers are also worried that strict rules could slow down innovation, especially for smaller companies and startups that are essential for bringing the benefits of AI to specific sectors.
The sources discuss several ways policymakers could try to address these risks:
- Regulate Downstream Developers Directly: Put rules directly on the developers who modify models.
  - Pros: Allows regulators to step in directly against risky modifications. Could clarify downstream developers' responsibilities. Could help regulators learn more about this ecosystem.
  - Cons: Significantly expands the number and diversity of entities being regulated, potentially stifling innovation, especially for smaller players. Downstream developers might lack the information or access needed to comply effectively. Enforcement could be difficult.
  - Potential Approach: Regulations could be targeted, perhaps applying only if modifications significantly increase risk or alter safety features.
- Regulate Upstream Developers to Mitigate Downstream Risks: Place obligations on the original model developers to take steps that reduce the risks from downstream modifications.
  - Pros: Can indirectly help manage risks. Builds on work some upstream developers are already doing (like monitoring or setting usage terms). Keeps the regulatory focus narrower.
  - Cons: Regulators might not be able to intervene directly against a risky downstream modification. Could still stifle innovation if upstream developers are overly restrictive. Upstream developers may find it difficult to predict and guard against all possible modifications. Less effective for models whose weights are released openly.
- Use Existing Laws or Voluntary Guidance: Clarify how existing laws (like tort law, which deals with civil wrongs that cause harm) apply, or issue non-binding guidelines.
  - Pros: Avoids creating entirely new regulatory regimes. Voluntary guidance is easier to introduce and less likely to drive companies out of a jurisdiction. Tort law can potentially address unexpected risks after they cause harm.
  - Cons: May not be enough to address the risks effectively. Voluntary guidance might not be widely adopted by such a large and diverse group of downstream developers. Tort law can be slow to adapt, may require significant changes, and proving a direct link between a modification and a harm can be difficult.
Based on the sources, a balanced approach is likely needed. The recommendations suggest:
- Start by developing voluntary guidance for both upstream and downstream developers on best practices for managing these risks.
- When regulating upstream developers, include requirements for them to consider and mitigate risks from downstream modifications where feasible. This could involve upstream developers testing for modification risks, monitoring safeguards, and setting clear operating parameters.
- Meanwhile, monitor the downstream ecosystem to understand the risks and see if harms occur.
- If significant harms do arise from modified models despite these steps, then policymakers should be prepared to introduce targeted and proportionate obligations specifically for downstream developers who have the ability to increase risk to unacceptable levels.
This approach aims to manage risks without overly burdening innovation. The challenge remains how to define and target only those modifications that truly create an unacceptable level of risk, a complex task given the rapidly changing nature of AI customization.