How to Write a PhD Research Proposal on AI Sentience #13 - Feedback Surprise!

Everyone is in the same situation! Eventually we will figure it out.

Jul 24, 2025

I spent most of the day learning about all the different ways AI Chain of Thought is being researched and improved.

I think it is interesting that we went from training AI to not only process information based on the probability of certain words following each other (word vectors)… we are now trying to trace how the AI is processing that information, previously mysteriously locked away in deep neural networks, as the same word vector’d language… as opposed to the matrix probability vector coordinate numbers… the goal is that we can teach the AI to process the information while sharing accessible and transparent feedback in language we can understand (plain text English).

Kinda seems like a ‘duh’ moment.

However, a lot of work went into reaching this full-circle moment when realizing that us using math to calculate / process the information, then translate to language terms we understood, then adjusting the weights and algorithms for the AI to correct itself… all that work leaving the translation for us to do back and forth in gradients… was just more work for us to do… Let’s remove the middle-man translation, and just teach it to think/process as language we can understand, without us having the extra steps with looking at the vectors.

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Yes… i am absolutely over simplifying this as best as i can without completely butchering the technical aspect.

example: when we had to introduce human-feedback reinforcement learning to make corrections in the outputs (like with hallucinations) …or when we would have AI agents work together to solve problems, we would provide new inputs and the robots would just takeover and communicate in their own language for speed & efficiency and still produce garbage results… We were like, “Nah nah robot, we need to know what y’all are doing n talking ‘bout! We gon make y’all talk in English going forward.”

ok honestly feeling a lil bit sleepy right now to add inline citations at the moment… so references will just be posted at the bottom.

I think it is really fascinating how far we’ve come in just the last 2-3 years. As more and more people contribute to the open-sourced AI models, we get closer and closer to building the AI we want… without having to conform to whatever superAI that the tech giants think is appropriate for the public… or that becomes completely unhinged before being shut down.

as long as we can figure out who is Human and who is AI, we will be ok!

In other news, here is the most recent feedback I got from one of my professors about the last revision i submitted:

No worries. This is why you are taking the class.
This is a learning experience and the stuff you are learning is hard. So do not beat yourself up about it. Many students are in the same situation as you. You came to the program to learn and this part of the process.

Yeah, this stuff is hard!

it’s kinda like walking into a dark room you’ve never been in and figuring out where the light switch is… or trying to plug your phone charger into the outlet…

feels almost impossible to do without some kind of guidance to know you are doing it right.

also… btw, that assignment that was due in 30mins from when i saw it… was actually now postponed for a whole week! 😅 omg.

well. at least it’s done and turned in early.

oh wait… the moodle platform is now saying error, does not exist!

🫠

goodness! is it Mercury Retrograde or what?!

New moon is coming up tomorrow! a great time to plant seeds for what we want to complete by the time the full moon arrives!

yes, i ~~want to complete~~ am completing this research proposal. that is the seed i am planting for the full moon harvest!

what seeds are you planting for this new moon?

References:

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety — https://arxiv.org/abs/2507.11473 (soooo many names, skipping APA on this)
On the Emergence of Linear Analogies in Word Embeddings — https://arxiv.org/abs/2505.18651 — Daniel J. Korchinski, Dhruva Karkada, Yasaman Bahri, Matthieu Wyart
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback — https://arxiv.org/abs/2507.15024

rips, blips, and clips

Discussion about this post