Open-source AI trimmed for efficiency produced detailed bomb-making instructions and other bad responses before retraining
UCR researchers retrain AI models to keep safety intact when trimmed for…