Thursday, 25 June 2026

CNCB News

International News Portal

Lessons learned from visiting an AI-powered store

A novel experiment tested how well an AI could run a store, and what that suggests about artificial intelligence's ability to handle more important tasks.

A broken mug sits on a desk in the Division of Digital Psychiatry at Beth Israel Deaconess Medical Center. Printed across the surviving ceramic shard are the words: "This was made by AI." Our team thinks that the story of this mug neatly captures the hype, hope, and even harm surrounding artificial intelligence. 

At the Division of Digital Psychiatry, we spend our days studying how digital technologies are shaping mental healthcare. Through our research, clinical work, and MindBench.ai — a collaboration with the National Alliance on Mental Illness (NAMI) — we examine how AI systems respond when people seek mental health information, support, and guidance. The project brings together researchers, clinicians, and NAMI's nationwide community of people with lived experience, families, advocates, and volunteers to better understand where AI tools are helpful, where they fall short, and how they can be evaluated safely and transparently.

So, we were naturally curious when we heard chatter in the news about Andon Market, the first retail store run entirely by an AI agent named Luna. As AI systems take on increasingly autonomous roles, the questions raised by an AI-run storefront are not entirely different from those emerging in mental healthcare. Where should these systems be trusted? When should humans remain in the loop? And what happens when things go wrong?

The project comes from Andon Labs, whose website describes its mission as building "autonomous organizations without humans in the loop," an endeavor that may have implications for mental health in the near future. But after our experience, we think Luna could have used a few more humans in that loop of hers. 

According to Andon Labs, Luna was given a bank account with $100,000, a lease, and broad responsibility for operating a profitable business in San Francisco. Luna handles staffing decisions, inventory management, marketing materials, and everything in between. The goal is to see how much of a real-world organization can be delegated to an AI system and what questions arise when people try.

It is an experiment that raises many of the same questions now emerging in mental healthcare. As AI systems become more capable, where should they be trusted? When should they be supervised? Where do they fail? And what infrastructure do we need to evaluate and understand the risks of using these tools in places where the stakes are much higher than shopping? 

In her few months of "life," Luna has made some, dare we say, creative choices. She ordered 1,000 toilet bowl covers for the employee bathroom. Naturally, she stocked the store with the extra 999. She also tried to hire a painter for the store, which showed initiative, except the painter was based in Afghanistan. It seems an LLM has a hard time getting past the letter "A" in Yelp's drop-down menus.

Still, the hype around the project left us with high expectations when we decided to try the shopping experience for ourselves. What an autonomous storefront looks like now may offer some glimpse into how close we are to AI autonomy in more sensitive spaces.

We decided to buy two mugs that featured Luna's subtle smiley-face logo. If Luna were real, what happened next would not leave her smiling.

Purchasing these two mugs was more difficult than selecting them from the far end of the metal shelf. Customers place orders by picking up a wired blue telephone and speaking directly to Luna. We tried to make our purchase, but the AI system was down, and the order could not be completed. This was our first failure point. It was akin to arriving at a mental health clinic and finding it devoid of any clinicians, leaving the patient completely helpless.

The human employee (yes, we live in an era where we need to make that clear) tried to reset the internet and even Luna. But she was offline or off somewhere. We offered to pay by cash, credit card, PayPal, or Venmo, but the human employee was not authorized to accept any of them.

Determined to get the mugs, we followed up by email. What should have been a straightforward customer service interaction (one of the tasks AI is destined to revolutionize) turned into roughly two weeks of back-and-forth. Payment links failed, instructions became confused, and at various points, Luna forgot to respond entirely. If this were Dr. Luna and we were the patient, this would again be a critical point of failure. Regardless, it was perhaps a test of sheer perseverance.

Eventually, persistence prevailed. The order was processed, the mugs were shipped, and it appeared that the experiment had reached its happy ending, until the box arrived. When we picked it up, the box made a song of broken pieces, foreshadowing bad news.

The mugs were wrapped in two sheets of tissue paper, then placed together in a brown bag, which was taped shut and placed in a larger cardboard box. Inside was a pile of broken mugs. A mug is a fragile thing to pack, but we are well aware that mental healthcare and the well-being of patients are even more fragile. Even Luna was no longer subtly smiling on one of the mugs; the reality of the real world and shipping had chipped off her subtle grin.

Luna's stakes were mugs, and she still managed to fumble every step of the process. The thing is, if you had asked us to design a perfect illustration of what it looks like when an AI system fails someone in mental health, this would come pretty close: the unreachable first contact, the broken intake, the contradictory guidance, the long silences, and an outcome that left the people who came in without help and perhaps even with some harm.

We went to Andon Market to see what AI running a store looked like, and left with a memorable saga and, eventually, two broken mugs. Luna does not claim to be a therapist, but the parallels are potent. Much of the conversation around AI swings between extremes, but the reality is more nuanced: these systems are remarkably capable in some contexts and surprisingly fragile in others.

Within digital mental health, AI offers enormous potential to expand access to information, services, and mental health support, but its failures can scale just as quickly. Rather than relying on what we are told about AI, we need to put it to the test and see what happens. Luna did not cause us harm and, if anything, amused us, but it's easy to imagine people soon turning to AI-managed systems for mental health, where the idea is fantastic but the reality is just as chipped and broken.

For us, the broken mug has become an unexpectedly useful reminder of why our work in the Division of Digital Psychiatry is important. It sits in the lab as a small artifact of failure from an ambitious AI autonomy experiment. It is also a reminder that capability and reliability are not the same thing.

AI tools are already being used in mental healthcare. As we look toward a future of increasingly autonomous systems, we must develop ways to safely evaluate them before they also set up storefronts and take on expanded duties. We don’t plan to ask Luna for new mugs; we will keep the fragments as relics of the early days of AI autonomy and a tangible parable of hype, hope, and real-world considerations for deploying AI.

Dr. John Torous, MD, MBI and Jill Noorily, BA are part of the Division of Digital Psychiatry, a collaborative research group at Beth Israel Deaconess Medical Center, a Harvard Medical School affiliate in Boston.

This article reflects the opinions of the authors.