It’s a New World
Imagine a world where robots learn complex tasks not just through algorithms, but with a little help from us, everyday people, even if we’re not experts and even if we make mistakes. It sounds like something out of a science fiction novel, but researchers from MIT, Harvard University, and the University of Washington are turning this into reality with their new approach, called Human Guided Exploration (HuGE), which also happens to be my boss’s favorite word.
Traditionally, teaching an AI agent, like a robot, to perform a task such as opening a kitchen cabinet has relied heavily on reinforcement learning. The agent learns by trial and error, motivated by rewards for actions that edge it closer to its goal. But designing that reward function has always been hard, typically requiring a human expert to keep updating it as the AI explores. The process works, but it is time-consuming and difficult to scale, especially for complex tasks.
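To make that pain point concrete, here is a rough sketch, in Python, of the kind of hand-shaped reward an expert might write for the cabinet example. Nothing in it comes from the HuGE paper; the state fields (gripper_pos, handle_pos, door_angle) and every weight and threshold are invented for illustration, and each one is a knob somebody has to keep re-tuning as the agent finds new ways to exploit it.

```python
import numpy as np

# Hypothetical hand-shaped reward for "open the kitchen cabinet".
# Every constant below is an expert's guess that typically needs
# revisiting once the agent starts gaming it.
def shaped_reward(state: dict) -> float:
    reach_dist = np.linalg.norm(state["gripper_pos"] - state["handle_pos"])
    reward = -0.5 * reach_dist                      # encourage reaching toward the handle
    if state["gripper_closed"] and reach_dist < 0.03:
        reward += 1.0                               # bonus for grasping the handle
    reward += 2.0 * state["door_angle"]             # reward swinging the door open
    if state["door_angle"] > 1.2:                   # ~70 degrees: call that "open"
        reward += 10.0                              # sparse success bonus
    return reward
```

A crowdsourced alternative is attractive precisely because nobody has to write, or keep babysitting, a function like this.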
HuGE Downsides
Enter HuGE, a groundbreaking method that doesn’t depend on these expertly designed reward functions. Instead, it uses crowdsourced feedback from many non-expert users to guide the AI agent. This method is a game-changer as it enables the AI to learn much faster, even though the crowdsourced data are often riddled with errors. Where other methods might stumble with these inaccuracies, HuGE thrives.
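I haven’t reproduced the paper’s actual training procedure here, but the core idea, as I understand it, is that noisy crowd comparisons steer which states the agent explores toward rather than serving as the reward the policy optimizes directly. A rough sketch of that flavor of mechanism, with made-up function names and a simple Bradley-Terry-style scorer standing in for whatever the authors actually use, might look like this:

```python
import numpy as np

# Illustrative only: crowd answers to "which of these two states looks closer
# to the goal?" train a simple scoring model, and that score merely biases
# which frontier state the agent explores from next.

def fit_score_model(comparisons, dim, lr=0.1, epochs=50):
    """comparisons: list of (state_a, state_b, label) numpy triples, label=1 if a was judged closer."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for a, b, label in comparisons:
            p = 1.0 / (1.0 + np.exp(-(w @ (a - b))))   # Bradley-Terry-style preference
            w += lr * (label - p) * (a - b)            # logistic-regression update
    return w

def pick_exploration_goal(frontier_states, w, temperature=1.0):
    """Softmax-sample a state to explore toward, weighted by the learned score."""
    scores = np.array([w @ s for s in frontier_states])
    probs = np.exp((scores - scores.max()) / temperature)
    probs /= probs.sum()
    return frontier_states[np.random.choice(len(frontier_states), p=probs)]
```

Because the crowd labels only bias which goal gets explored next, a few wrong answers nudge exploration in a suboptimal direction for a while instead of poisoning the learned behavior outright, which is presumably why the method tolerates non-expert mistakes.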
The first question that comes to mind: will this pass the snicker test with the European regulators who just passed a regulation insisting on transparency about how LLMs are trained? We will soon discover, as I have believed all along, that attempting to regulate AI is a fool’s errand. How many slippery slopes can YOU count? Stay tuned.
One of the most striking aspects of HuGE is its ability to gather feedback asynchronously, allowing people from around the globe to contribute to the AI’s learning process. Imagine a robot in your home learning to carry out specific tasks quickly, guided by this crowdsourced, non-expert feedback. It’s a method that teaches the AI what to explore, rather than dictating the exact actions needed to complete a task. This approach is more forgiving of human errors and inaccuracies in supervision, allowing the AI to learn effectively through exploration.
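The asynchronous part is easy to picture in code. Again, this is a hedged sketch rather than the released implementation: annotators push comparisons into a shared queue whenever they get around to it, and the learner drains whatever has accumulated between exploration episodes, so training never blocks on a human.

```python
import queue
import threading
import time

# Illustrative sketch of asynchronous, crowdsourced feedback.
feedback_queue: queue.Queue = queue.Queue()

def remote_annotator(annotator_id: int) -> None:
    """Stand-in for a non-expert contributor answering on their own schedule."""
    for i in range(3):
        time.sleep(0.05 * annotator_id)          # feedback arrives whenever it arrives
        feedback_queue.put((f"state_a_{i}", f"state_b_{i}", annotator_id % 2))

def run_exploration_episode() -> None:
    """Placeholder for a self-supervised exploration rollout."""
    time.sleep(0.05)

def training_loop(num_episodes: int) -> None:
    comparisons = []
    for episode in range(num_episodes):
        run_exploration_episode()                # exploration never waits on labels
        while not feedback_queue.empty():        # drain whatever has shown up so far
            comparisons.append(feedback_queue.get_nowait())
        # a real system would refit the goal-selection model here if new labels arrived
        print(f"episode {episode}: {len(comparisons)} comparisons collected so far")

annotators = [threading.Thread(target=remote_annotator, args=(i,)) for i in (1, 2, 3)]
for t in annotators:
    t.start()
training_loop(num_episodes=5)
for t in annotators:
    t.join()
```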
Gaming the Crowd
And the odds of someone gaming the crowdsourcing are what? 100:1? It always amazes me that most adults seem to want to believe the best of the human psyche, to the extent that they are willing to gamble critical outcomes like these on that thesis.
The practical applications of HuGE are vast (or Huge). Researchers tested it by training robotic arms to draw the letter “U” and to pick and place objects, with data crowdsourced from 109 non-expert users across three continents. The results were impressive. HuGE helped these agents achieve their goals faster than other methods.
But this isn’t just about speed. It’s about making AI learning accessible and scalable. This method opens up possibilities for AI agents to learn from various forms of communication, including natural language and physical interactions.
It is Possible
The future of AI learning looks promising with HuGE: robots could potentially learn to perform multiple tasks and even autonomously reset their environments to continue learning. The method is a step toward aligning AI agents more closely with human values, ensuring that as they learn and grow, they do so in a way that resonates with our needs and preferences.
Yet I fear that without strong guidance from folks like Sam Altman and Elon Musk, HuGE may instead represent a leap backward: its natural bridge between human intuition and robotic efficiency duped into serving the deterministic intentions of folks with unique political agendas.
The diligence our governing bodies, which don’t yet exist, must exercise in overseeing these sorts of movements is exactly why we need objective town squares in which everyone can share their observations, reasoning and intentions.
Author
Steve King
Managing Director, CyberEd
King, an experienced cybersecurity professional, has served in senior leadership roles in technology development for the past 20 years. He has founded nine startups, including Endymion Systems and seeCommerce. He has held leadership roles in marketing and product development, operating as CEO, CTO and CISO for several startups, including Netswitch Technology Management. He also served as CIO for Memorex and was the co-founder of the Cambridge Systems Group.