12 Comments
Pedro Avila:

Is the mitigation for instrumental reasoning level 1 effectively just "scratchpads"? And if so, do we have any means of knowing whether what's being output to the scratchpad is a direct line to the AI's "subconscious", or whether it could be telling us what we want to hear there as well?

Andy X Andersen:

The safety plans of AI companies are very preliminary, because we are very early in the process.

All we have so far are LLMs, which predict the most likely output based on training data. They can be quirky but not very competent, and they lack a solid model of what they are dealing with beyond text, and perhaps images.

Next, there will be agents. Those will likely interact more with the real world: they can use tools, iterate, do some reasoning, maybe even gain feedback. But even these are unlikely to be very bright.

As such, while we should be mindful that the industry is moving fast, there is likely still a way to go.

For now, the focus is best placed on thorough reliability testing, as with regular software.

Richard:

But others are saying, no, the process is much further along. The article quotes Dario Amodei saying ASL-4 by next year. Genuine question, not meant confrontationally, but why would I (a mere passer-by everyday moron) believe you and not him?

Andy X Andersen:

I think there is a lot of hype by startups that dearly need funding.

Richard:

That's not a very persuasive response.

Andy X Andersen:

Look at history. Hype merchants usually fail; overeager companies crash and burn. Every time, they think this time is different. Either way, we will see soon enough.

Pedro Avila:

Shouldn't we expect that agents that can gain feedback on their own, but are not too bright, will quickly become brighter? I think this should be a given, but it's entirely possible I'm missing something.

Andy X Andersen:

If we have a good architecture, yes. It will likely take a few iterations. Chatbots primarily know language, and that is not enough. Improving the modeling will take time.

Richard:

But others are saying, no, the process is much further along. The article quotes Dario Amodei saying ASL-4 by next year. Genuine question, not meant confrontationally, but why would I (a mere passer-by everyday moron) believe you and not him?

Andy X Andersen:

Dario Amodei is a startup founder. His startup will go under unless he makes a profit by next year.

The range of opinions on when advanced AI will arrive is wide, and people like Demis Hassabis and Yann LeCun are more conservative.

Then, the problems are hard. Historically, large-scale infrastructure projects have taken many years to go from prototype to finished product.

Pedro Avila:

I agree there is a lot of uncertainty here, and it's hard to know anything for certain. It seems to me we should assume we're as far along as anyone is claiming (or further), so that we don't under-prepare, no?

Andy X Andersen:

It is very hard to prepare for AI. So far, real-world deployment is close to zero; chatbots that answer questions have not moved the needle much.

As in previous waves of technology, deployment and refinement will take time, and one cannot prepare for what does not yet exist.
