Yet Another AI Assistant
- Mehrdad Majzoobi
- Jun 29
- 6 min read
With all the hype around Jony Ive’s new AI device with OpenAI, the Rabbit R1’s mixed reception, the defunct Humane AI Pin, and other attempts to build the next breakthrough AI device, fundamental questions remain unanswered:
What would such a device look like, and how would it function?
Do we even need a new hardware device?
Are these devices going to make your phone obsolete?
How would they differ from the Apple HomePod, Google Home, Amazon Alexa, and the like?

There are many questions and few answers.
To find answers, we first have to take a step back and look at the limitations and issues of the current generation of devices.
This post explores those gaps and shares my journey building Ubo Pod, an open source hackable personal AI assistant designed to address them.
Made in Secrecy
Big tech companies develop devices in secrecy for years without user input, often creating products misaligned with real needs.

While software can be updated post-launch, hardware design decisions are permanent once in production.
This inflexibility likely explains the Humane AI Pin's failure—early user testing might have prevented building something nobody wanted.
Privacy
Big tech devices arrive as black boxes with built-in microphones and cameras, entering our most private spaces.
Many of us have seen ads for topics we only discussed privately at the dinner table, making us wonder which device was listening.

Since both the hardware and software are closed source, it is very difficult to verify manufacturers' privacy claims. One must either trust the brand or hope that security researchers will verify those claims through reverse engineering.
Closed Source
Big tech devices are typically closed source, preventing tech-savvy users from customizing core functionality.
This lack of hackability leaves devices dependent on manufacturer updates and unable to evolve with users' specific needs.
Data and model ownership
We feed personal data to AI models, which companies use to improve their services and increase user engagement. However, you don't own the models trained on your data—this deliberate design locks you into specific providers.
What if instead, your personalized AI model stayed on your device and remained fully portable?
Requires Internet Connection
Most connected devices become nearly useless without internet, limiting their value in areas with poor connectivity like RVs, boats, or off-grid locations.
This dependence on cloud services, combined with closed source design, also raises significant privacy and security concerns.
Not Developer Friendly
The current devices' SDKs are too opinionated, language and platform specific.
For example, devices like Alexa and Apple HomePod do offer SDK such as HomeKit and Alexa Skills Kit (ASK).
They also impose a rigid boundary on what developers can and cannot do. For starters, developers cannot even change their branded wake word.
Limited function
Current AI assistants only perform pre-defined functions through command templates mapped to specific actions.
If you request something outside their programmed skills, they can't help—requiring large developer teams to build functionality for every possible use case.
Single mode
AI assistants have traditionally been voice-only, with vision recently added to some devices. However, additional sensors could enable more operational modes and context awareness.
Single agent
Existing devices use one agent triggered by a wake word, lacking the ability to define multiple specialized agents with their own triggers. Multi-agent systems could offer domain-specific experts fine-tuned for narrower task sets.
Single Service
Proprietary devices are tied to their company's cloud services (Alexa uses AWS, Google Home uses Google's services), creating vendor lock-in and the risk of service discontinuation bricking devices. Some also require monthly subscriptions.
Hardware specific
Devices designed by big tech companies are limited to a specific target hardware platform. There is often zero flexibility to DIY your hardware or to choose a different hardware platform.
Lack of upgradability
Consumer devices typically last 3-5 years before disposal due to limited upgrade options—a profit-driven design that generates significant waste.
Introducing Ubo Pod
Hackable and Open Source Personal AI Assistant

Frustrated with proprietary assistant devices and the struggle of building a polished UX on non-mobile/desktop platforms, I decided to build Ubo Pod.
What started as a Raspberry Pi UX layer evolved into a multi-modal AI assistant platform for developers.
Users simply add API keys (obtained from the respective service providers) through our web interface, while Ubo Pro 5 supports local, cloud-free models with extensive customization options.
Developers can make shallow or deep customizations, depending on their comfort level.
Open source and hackable
Ubo is fully open source with a 3D-printable enclosure, though production units ship in sandblasted aluminum. Developers can contribute to all aspects—code, electrical, mechanical design, apps, and custom experiences.
Private
Ubo's open source design lets you see exactly how everything works. We've included physical privacy controls, like a microphone switch and a camera cover. And by running AI models locally, your data never leaves your device.
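As a concrete illustration of local inference, here is a minimal sketch that queries a model served by Ollama (one of the supported local providers listed below) over localhost, so the prompt and response never leave the device. The model name is just an example of one you might have pulled locally.

```python
import json
import urllib.request

# Query a locally hosted model via Ollama's HTTP API on localhost.
# Nothing in this exchange touches a remote server.
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3.2",  # example: any model pulled locally
        "prompt": "Summarize my shopping list.",
        "stream": False,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])
```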
Modular, Upgradable, Repairable
Modularity is one of the core philosophies of the design, and we apply that principle to both hardware and software. The modular design also makes it easier for AI engines to assist developers with customizations. I will get into that in our upcoming design articles.
Modularity also makes it possible to upgrade and repair devices without creating waste in the process.
Users can pay modest fees for component upgrades that extend device life and add new capabilities, or even 3D print some subcomponents themselves.
Multi-agent, multi-modal, and multi-service
Multiple agents can be summoned to respond to different requests, and each agent can have its own wake word or trigger mechanism.
You can deploy multi-modal agents that take text, audio, video/images, and sensor readings as inputs.
Depending on the hardware model, you can run locally hosted AI models and/or use any third-party cloud AI/LLM provider of your choice, without being limited or tied to a specific service provider.
For example, you can use Picovoice for wake word detection, AssemblyAI for speech-to-text, Anthropic's Claude Sonnet as the LLM, and ElevenLabs for text-to-speech. For a list of supported providers, see below.

Ubo Assistant is also capable of multi-turn conversations similar to this example.
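To make the multi-agent, multi-service idea concrete, here is a rough sketch of what wiring agents to providers could look like. The `ubo_assistant` module and all class names below are hypothetical, invented purely for illustration; the actual API may differ.

```python
# Hypothetical sketch of a multi-agent, multi-service setup.
# The `ubo_assistant` module and these class names are illustrative,
# not the actual Ubo API.
from ubo_assistant import Agent, Assistant, providers

kitchen = Agent(
    name="chef",
    wake_word="hey chef",                     # each agent has its own trigger
    stt=providers.AssemblyAI(api_key="..."),  # cloud speech-to-text
    llm=providers.Anthropic(model="claude-sonnet"),
    tts=providers.ElevenLabs(voice="..."),
)

workshop = Agent(
    name="technician",
    wake_word="hey tech",
    stt=providers.Vosk(),                     # fully local speech-to-text
    llm=providers.Ollama(model="llama3.2"),   # fully local LLM
    tts=providers.Piper(),                    # fully local text-to-speech
)

# One device, two specialized agents, mixing local and cloud services.
Assistant(agents=[kitchen, workshop]).run()
```

The point of the sketch is that each agent binds its own trigger and its own provider stack, so a privacy-sensitive agent can stay fully local while another leans on cloud services.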
Hardware agnostic
We are working to let users fully DIY their hardware or use off-the-shelf cameras, mics, speakers, etc. with a mini PC or single-board computer (SBC).
A fully virtualized instance could also be deployed on a remote server down the road.
Programming Language Agnostic API
The core exposes a gRPC API, and soon an MCP server, for developers. Together they expose a large set of tools that can be invoked through LLM tool/function calling.
This also makes it much easier to integrate assistant functionality and multi-modal UX into third-party applications.
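For instance, a third-party app could drive the assistant over gRPC along these lines. This is only a sketch: the service, method, and message names are invented for illustration, and the real interface is defined by the published `.proto` files in the repo.

```python
# Hypothetical gRPC client; service/method names are illustrative only.
# Stubs would be generated from the published .proto files, e.g.:
#   python -m grpc_tools.protoc -I protos --python_out=. --grpc_python_out=. protos/ubo.proto
import grpc
import ubo_pb2
import ubo_pb2_grpc

# Connect to the Pod on the local network (hostname is a placeholder).
channel = grpc.insecure_channel("ubo-pod.local:50051")
stub = ubo_pb2_grpc.AssistantStub(channel)

# Show a notification on the Pod's screen from a third-party app.
stub.Notify(ubo_pb2.NotifyRequest(title="Doorbell", body="Someone is at the door"))
```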
Tool calling
With a large set of tools and functions at its disposal, the Ubo assistant figures out how to use them and reinforces successful procedures: users ask for something, the assistant tries to do it, and if it succeeds, it remembers the approach for next time.
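Under the hood, this builds on standard LLM tool/function calling. As a simplified illustration, a device action might be exposed to the model as a JSON tool schema like the following (OpenAI-style format; the tool name and fields are made up for this example):

```python
# A simplified, OpenAI-style tool schema for one hypothetical device action.
# The assistant advertises many such tools; the LLM picks one, fills in the
# arguments, and the runtime executes the matching function.
set_led_tool = {
    "type": "function",
    "function": {
        "name": "set_led_ring",  # hypothetical tool name
        "description": "Set the color of the Pod's LED ring.",
        "parameters": {
            "type": "object",
            "properties": {
                "color": {
                    "type": "string",
                    "description": "Color name, e.g. 'red' or 'warm white'",
                },
                "brightness": {
                    "type": "number",
                    "description": "Brightness from 0.0 to 1.0",
                },
            },
            "required": ["color"],
        },
    },
}

def set_led_ring(color: str, brightness: float = 1.0) -> str:
    """Runs when the LLM calls the tool; the return value is fed back to the model."""
    # ...drive the LED hardware here...
    return f"LED ring set to {color} at {brightness:.0%} brightness"
```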
Hardware Acceleration (Available on Pi 5 only)
You can accelerate vision and voice AI functions with dedicated AI accelerators from Google Coral and Hailo. With an accelerator, the device can process multiple video and audio streams in parallel.
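As one example of what accelerated inference looks like in practice, Google's pycoral library runs a compiled TFLite model on a Coral Edge TPU roughly as follows. The model and image paths are placeholders; how Ubo wires accelerators in internally may differ.

```python
# Run a classification model on a Coral Edge TPU using pycoral.
# Paths are placeholders; the model must be compiled for the Edge TPU.
from PIL import Image
from pycoral.adapters import classify, common
from pycoral.utils.edgetpu import make_interpreter

interpreter = make_interpreter("mobilenet_v2_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("frame.jpg").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()  # inference runs on the Edge TPU, not the CPU

for klass in classify.get_classes(interpreter, top_k=1):
    print(klass.id, klass.score)
```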
The table below contrasts Ubo AI Assistant with current devices in the market.

Where can I buy this?
I am launching a Kickstarter campaign to bring this device to mass production. Right now we have a small inventory from our pilot production run, which we use for beta testing and gathering user feedback. You can reserve yours for $1 or sign up for the pre-launch campaign to be among the very first to get your Ubo Pod.
Supported AI Services
The following services are supported out of the box. To use them, you need to sign up with the respective service providers, accept their terms of service, and obtain an API key. More services will be supported down the road.
| Category | Services |
| --- | --- |
| Speech-to-Text | AssemblyAI, Amazon Transcribe, Azure, Cartesia, Deepgram, Fal Wizper, Gladia, Google Cloud, Groq (Whisper), OpenAI Whisper (Local), Parakeet NVIDIA (Local), Picovoice Cheetah (Local), Vosk (Local), Ultravox |
| LLMs | Anthropic, AWS, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, OpenRouter, Perplexity, Qwen, Together AI |
| Text-to-Speech | AWS, Azure, Cartesia, Deepgram, ElevenLabs, FastPitch NVIDIA (Local), Fish (Local), Google, LMNT, MiniMax, Neuphonic, OpenAI, Piper (Local), PlayHT, Rime, Sarvam, XTTS (Local), Picovoice Orca (Local) |
| Speech-to-Speech | |
| Transport | Daily (WebRTC), FastAPI WebSocket, SmallWebRTCTransport, WebSocket Server, gRPC, Local |
| Video | |
| Memory | |
| Vision & Image | |
| Audio Processing | |
Ubo software architecture
The diagram below shows the Ubo software architecture. A more in-depth article on this will be published soon.

Resources
The official core software is already open sourced on GitHub. You can also open a discussion on our GitHub repo to discuss any specific topic you have in mind. Below you can find links to various design files for this project.