Yet Another AI Assistant
- Mehrdad Majzoobi
- Jun 29
- 6 min read
With all the hype around Jony Ive’s new AI device with OpenAI, the Rabbit R1’s mixed reception, the defunct Humane AI Pin, and other attempts to build the next breakthrough AI device, fundamental questions remain unanswered:
What would such a device look like, and how would it function?
Do we even need a new hardware device?
Are these devices going to make your phone obsolete?
How would they differ from the Apple HomePod, Google Home, Amazon Alexa, and the like?

There are many questions and few answers.
To find answers, we first have to take a step back and look at the limitations and issues of the current generation of devices.
This post explores those gaps and shares my journey building Ubo Pod, an open source hackable personal AI assistant designed to address them.
Made in Secrecy
Big tech companies develop devices in secrecy for years without user input, often creating products misaligned with real needs.

While software can be updated post-launch, hardware design decisions are permanent once in production.
This inflexibility likely explains the Humane AI Pin's failure—early user testing might have prevented building something nobody wanted.
Privacy
Big tech devices arrive as black boxes with built-in microphones and cameras, entering our most private spaces.
Many of us have seen ads for topics we only discussed privately at the dinner table, making us wonder which device was listening.

Since both the hardware and software are closed source, it is very difficult to verify manufacturers' privacy claims. One must either trust the brand or hope that security researchers will verify those claims through reverse engineering.
Closed Source
Big tech devices are typically closed source, preventing tech-savvy users from customizing core functionality.
This lack of hackability leaves devices dependent on manufacturer updates and unable to evolve with users' specific needs.
Data and model ownership
We feed personal data to AI models, which companies use to improve their services and increase user engagement. However, you don't own the models trained on your data—this deliberate design locks you into specific providers.
What if instead, your personalized AI model stayed on your device and remained fully portable?
Requires Internet Connection
Most connected devices become nearly useless without internet, limiting their value in areas with poor connectivity like RVs, boats, or off-grid locations.
This dependence on cloud services, combined with closed source design, also raises significant privacy and security concerns.
Not Developer Friendly
The current devices' SDKs are too opinionated, language and platform specific.
For example, devices like Alexa and Apple HomePod do offer SDK such as HomeKit and Alexa Skills Kit (ASK).
They also impose a rigid boundary on what developers can and cannot do. For starters, developers cannot even change their branded wake word.
Limited function
Current AI assistants only perform pre-defined functions through command templates mapped to specific actions.
If you request something outside their programmed skills, they can't help—requiring large developer teams to build functionality for every possible use case.
Single mode
AI assistants have traditionally been voice-only, with vision recently added to some devices. However, additional sensors could enable more operational modes and context awareness.
Single agent
Existing devices use one agent triggered by a wake word, lacking the ability to define multiple specialized agents with their own triggers. Multi-agent systems could offer domain-specific experts fine-tuned for narrower task sets.
Single Service
Proprietary devices are tied to their company's cloud services (Alexa uses AWS, Google Home uses Google's services), creating vendor lock-in and the risk of service discontinuation bricking devices. Some also require monthly subscriptions.
Hardware specific
Devices designed by big tech companies are limited to a specific target hardware platform. There is often zero flexibility to DIY your hardware or to choose a different hardware platform.
Lack of upgradability
Consumer devices typically last 3-5 years before disposal due to limited upgrade options—a profit-driven design that generates significant waste.
Introducing Ubo Pod
Hackable and Open Source Personal AI Assistant

Frustrated with proprietary assistant devices and the struggle of building a polished UX on non-mobile/desktop platforms, I decided to build Ubo Pod.
What started as a Raspberry Pi UX layer evolved into a multi-modal AI assistant platform for developers.
Users simply add API keys (obtained from the respective service providers) through our web interface, while Ubo Pro 5 supports local, cloud-free models with extensive customization options.
Developers can make shallow or deep customizations, depending on their comfort level.
Open source and hackable
Ubo is fully open source with a 3D-printable enclosure, though production units ship in sandblasted aluminum. Developers can contribute to all aspects—code, electrical, mechanical design, apps, and custom experiences.
Private
Ubo's open source design lets you see exactly how everything works. We've included physical privacy controls, like a microphone switch and a camera cover. And by running AI models locally, your data never leaves your device.
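As a concrete illustration of local inference, here is a minimal sketch that queries a model served by Ollama (one of the supported local providers listed below) over localhost, so the prompt and response never leave the device. The model name is just an example of one you might have pulled locally.

```python
import json
import urllib.request

# Query a locally hosted model via Ollama's HTTP API on localhost.
# Nothing in this exchange touches a remote server.
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3.2",  # example: any model pulled locally
        "prompt": "Summarize my shopping list.",
        "stream": False,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(result["response"])
```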
Modular, Upgradable, Repairable
Modularity is one of the core philosophies of the design, and we apply that principle to both hardware and software. The modular design also makes it easier for AI engines to assist developers with customizations. I will get into that in our upcoming design articles.
Modularity also makes it possible to upgrade and repair devices without creating waste in the process.
Users can pay modest fees for component upgrades that extend device life and add new capabilities, or even 3D print some subcomponents themselves.
Multi-agent, multi-modal, and multi-service
Multiple agents can be summoned to respond to different requests, and each agent can have its own wake word or trigger mechanism.
You can deploy multi-modal agents that take text, audio, video/images, and sensor readings as inputs.
Depending on the hardware model, you can run locally hosted AI models and/or use any third-party cloud AI/LLM provider of your choice, without being limited or tied to a specific service provider.
For example, you can use Picovoice for wake word detection, AssemblyAI for speech-to-text, Anthropic's Claude Sonnet as the LLM, and ElevenLabs for text-to-speech. For a list of supported providers, see below.

Ubo Assistant is also capable of multi-turn conversations similar to this example.
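To make the multi-agent, multi-service idea concrete, here is a rough sketch of what wiring agents to providers could look like. The `ubo_assistant` module and all class names below are hypothetical, invented purely for illustration; the actual API may differ.

```python
# Hypothetical sketch of a multi-agent, multi-service setup.
# The `ubo_assistant` module and these class names are illustrative,
# not the actual Ubo API.
from ubo_assistant import Agent, Assistant, providers

kitchen = Agent(
    name="chef",
    wake_word="hey chef",                     # each agent has its own trigger
    stt=providers.AssemblyAI(api_key="..."),  # cloud speech-to-text
    llm=providers.Anthropic(model="claude-sonnet"),
    tts=providers.ElevenLabs(voice="..."),
)

workshop = Agent(
    name="technician",
    wake_word="hey tech",
    stt=providers.Vosk(),                     # fully local speech-to-text
    llm=providers.Ollama(model="llama3.2"),   # fully local LLM
    tts=providers.Piper(),                    # fully local text-to-speech
)

# One device, two specialized agents, mixing local and cloud services.
Assistant(agents=[kitchen, workshop]).run()
```

The point of the sketch is that each agent binds its own trigger and its own provider stack, so a privacy-sensitive agent can stay fully local while another leans on cloud services.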
Hardware agnostic
We are working to let users fully DIY their hardware or use off-the-shelf cameras, mics, speakers, etc. with a mini PC or single-board computer (SBC).
A fully virtualized instance could also be deployed on a remote server down the road.
Programming Language Agnostic API
The core exposes a gRPC API, and soon an MCP server, for developers. Together they expose a large set of tools that can be invoked through LLM tool/function calling.
This also makes it much easier to integrate assistant functionality and multi-modal UX into third-party applications.
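For instance, a third-party app could drive the assistant over gRPC along these lines. This is only a sketch: the service, method, and message names are invented for illustration, and the real interface is defined by the published `.proto` files in the repo.

```python
# Hypothetical gRPC client; service/method names are illustrative only.
# Stubs would be generated from the published .proto files, e.g.:
#   python -m grpc_tools.protoc -I protos --python_out=. --grpc_python_out=. protos/ubo.proto
import grpc
import ubo_pb2
import ubo_pb2_grpc

# Connect to the Pod on the local network (hostname is a placeholder).
channel = grpc.insecure_channel("ubo-pod.local:50051")
stub = ubo_pb2_grpc.AssistantStub(channel)

# Show a notification on the Pod's screen from a third-party app.
stub.Notify(ubo_pb2.NotifyRequest(title="Doorbell", body="Someone is at the door"))
```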
Tool calling
With a large set of tools and functions at its disposal, the Ubo assistant figures out how to use them and reinforces successful procedures: users ask for something, the assistant tries to do it, and if it succeeds, it remembers the approach for next time.
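Under the hood, this builds on standard LLM tool/function calling. As a simplified illustration, a device action might be exposed to the model as a JSON tool schema like the following (OpenAI-style format; the tool name and fields are made up for this example):

```python
# A simplified, OpenAI-style tool schema for one hypothetical device action.
# The assistant advertises many such tools; the LLM picks one, fills in the
# arguments, and the runtime executes the matching function.
set_led_tool = {
    "type": "function",
    "function": {
        "name": "set_led_ring",  # hypothetical tool name
        "description": "Set the color of the Pod's LED ring.",
        "parameters": {
            "type": "object",
            "properties": {
                "color": {
                    "type": "string",
                    "description": "Color name, e.g. 'red' or 'warm white'",
                },
                "brightness": {
                    "type": "number",
                    "description": "Brightness from 0.0 to 1.0",
                },
            },
            "required": ["color"],
        },
    },
}

def set_led_ring(color: str, brightness: float = 1.0) -> str:
    """Runs when the LLM calls the tool; the return value is fed back to the model."""
    # ...drive the LED hardware here...
    return f"LED ring set to {color} at {brightness:.0%} brightness"
```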
Hardware Acceleration (Available on Pi 5 only)
You can accelerate vision and voice AI functions with dedicated AI accelerators from Google Coral and Hailo. With an accelerator, the device can process multiple video and audio streams in parallel.
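As one example of what accelerated inference looks like in practice, Google's pycoral library runs a compiled TFLite model on a Coral Edge TPU roughly as follows. The model and image paths are placeholders; how Ubo wires accelerators in internally may differ.

```python
# Run a classification model on a Coral Edge TPU using pycoral.
# Paths are placeholders; the model must be compiled for the Edge TPU.
from PIL import Image
from pycoral.adapters import classify, common
from pycoral.utils.edgetpu import make_interpreter

interpreter = make_interpreter("mobilenet_v2_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("frame.jpg").resize(common.input_size(interpreter))
common.set_input(interpreter, image)
interpreter.invoke()  # inference runs on the Edge TPU, not the CPU

for klass in classify.get_classes(interpreter, top_k=1):
    print(klass.id, klass.score)
```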
The table below contrasts Ubo AI Assistant with current devices in the market.

Where can I buy this?
I am launching a Kickstarter campaign to bring this device to mass production. Right now we have a small inventory from our pilot production run, which we use for beta testing and gathering user feedback. You can reserve yours for $1 or sign up for the pre-launch campaign to be among the very first to get your Ubo Pod.
Supported AI Services
The following services are supported out of the box. To use them, you need to sign up with the respective service providers, accept their terms of service, and obtain an API key. More services will be supported down the road.
| Category | Services |
| --- | --- |
| Speech-to-Text | AssemblyAI, Amazon Transcribe, Azure, Cartesia, Deepgram, Fal Wizper, Gladia, Google Cloud, Groq (Whisper), OpenAI Whisper (Local), Parakeet NVIDIA (Local), Picovoice Cheetah (Local), Vosk (Local), Ultravox |
| LLMs | Anthropic, AWS, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, OpenRouter, Perplexity, Qwen, Together AI |
| Text-to-Speech | AWS, Azure, Cartesia, Deepgram, ElevenLabs, FastPitch NVIDIA (Local), Fish (Local), Google, LMNT, MiniMax, Neuphonic, OpenAI, Piper (Local), PlayHT, Rime, Sarvam, XTTS (Local), Picovoice Orca (Local) |
| Speech-to-Speech | |
| Transport | Daily (WebRTC), FastAPI WebSocket, SmallWebRTCTransport, WebSocket Server, gRPC, Local |
| Video | |
| Memory | |
| Vision & Image | |
| Audio Processing | |
Ubo software architecture
The diagram below shows the Ubo software architecture. A more in-depth article on this will be published soon.

Resources
The official core software is already open sourced on GitHub. You can also open a discussion on our GitHub repo to discuss any specific topic you have in mind. Below you can find links to various design files for this project.