
Yet Another AI Assistant

With all the hype around Jony Ive's new AI device with OpenAI, the Rabbit R1's mixed reception, the defunct Humane AI Pin, and other attempts to build the next breakthrough AI device, fundamental questions remain unanswered:


  • What would such a device look like, and how would it function?

  • Do we even need a new hardware device?

  • Are these devices going to make your phone obsolete?

  • How are they going to be different from the Apple HomePod, Google Home, Alexa, etc.?


Race to make the third device

There are many questions and few answers.


To answer them, we first have to take a step back and examine the limitations and issues of the current generation of devices.


This post explores those gaps and shares my journey building Ubo Pod, an open source hackable personal AI assistant designed to address them.


Made in Secrecy


Big tech companies develop devices in secrecy for years without user input, often creating products misaligned with real needs.


While software can be updated post-launch, hardware design decisions are permanent once in production.


This inflexibility likely explains the Humane AI Pin's failure—early user testing might have prevented building something nobody wanted.


Privacy


Big tech devices arrive as black boxes with built-in microphones and cameras, entering our most private spaces. 


Many of us have seen ads for topics we only discussed privately at the dinner table, making us wonder which device was listening. 



Since both the hardware and software are closed source, it is very difficult to verify manufacturers' privacy claims. One must either trust the brand or hope security researchers will verify those claims through reverse engineering.


Closed Source 


Big tech devices are typically closed source, preventing tech-savvy users from customizing core functionality.


This lack of hackability leaves devices dependent on manufacturer updates and unable to evolve with users' specific needs.


Data and model ownership


We feed personal data to AI models, which companies use to improve their services and increase user engagement. However, you don't own the models trained on your data—this deliberate design locks you into specific providers.


What if instead, your personalized AI model stayed on your device and remained fully portable?

 

Requires Internet Connection


Most connected devices become nearly useless without internet, limiting their value in areas with poor connectivity like RVs, boats, or off-grid locations.


This dependence on cloud services, combined with closed source design, also raises significant privacy and security concerns.

 

Not Developer Friendly 


The current devices' SDKs are too opinionated and are language- and platform-specific.


For example, devices like Alexa and the Apple HomePod do offer SDKs, such as HomeKit and the Alexa Skills Kit (ASK).


But these SDKs impose a rigid boundary on what developers can and cannot do. For starters, developers cannot even change the branded wake word.


Limited function


Current AI assistants only perform pre-defined functions through command templates mapped to specific actions.


If you request something outside their programmed skills, they can't help—requiring large developer teams to build functionality for every possible use case.
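The template-to-action mapping described above can be sketched in a few lines. This is purely illustrative (the templates, action names, and `route` function are made up, not any vendor's actual implementation), but it shows why anything outside the predefined patterns falls through:

```python
import re

# Illustrative sketch of template-based intent routing: each "skill"
# registers a command template (here, a regex), and any utterance that
# matches none of them is simply rejected.
TEMPLATES = {
    r"turn (on|off) the (?P<device>\w+)": "smart_home.toggle",
    r"set a timer for (?P<minutes>\d+) minutes": "timers.create",
}

def route(utterance: str):
    """Map an utterance to a (action, slots) pair, or (None, {}) if unknown."""
    for pattern, action in TEMPLATES.items():
        match = re.fullmatch(pattern, utterance.lower())
        if match:
            return action, match.groupdict()
    # Outside the programmed skills: the assistant gives up.
    return None, {}

print(route("turn on the lights"))
print(route("summarize my unread emails"))
```

Covering every phrasing and every use case this way requires a new template and handler per capability, which is exactly the scaling problem described above.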


Single mode


AI assistants have traditionally been voice-only, with vision recently added to some devices. However, additional sensors could enable more operational modes and context awareness.


Single agent 


Existing devices use one agent triggered by a wake word, lacking the ability to define multiple specialized agents with their own triggers. Multi-agent systems could offer domain-specific experts fine-tuned for narrower task sets.


Single Service


Proprietary devices are tied to their company's cloud services (Alexa uses AWS, Google Home uses Google's services), creating vendor lock-in and the risk of service discontinuation bricking devices. Some also require monthly subscriptions.


Hardware specific 


Devices designed by big tech companies are locked to a specific hardware platform. There is often zero flexibility to DIY your own hardware or choose a different hardware platform.


Lack of upgradability


Consumer devices typically last 3-5 years before disposal due to limited upgrade options—a profit-driven design that generates significant waste. 


Introducing Ubo Pod

Hackable and Open Source Personal AI Assistant 


Hackable, Open Source, and Modular Personal AI Assistant For Developers

Frustrated with proprietary assistant devices and the struggle of building polished UX on non-mobile/desktop platforms, I decided to build Ubo Pod.


What started as a Raspberry Pi UX layer evolved into a multi-modal AI assistant platform for developers.


Users simply add API keys (obtained from the respective service providers) through our web interface, while the Ubo Pro 5 supports local, cloud-free models with extensive customization options.

Developers can do shallow or deep customizations depending on their level of comfort.


Open source and hackable

Ubo is fully open source with a 3D-printable enclosure, though production units ship in sandblasted aluminum. Developers can contribute to all aspects—code, electrical, mechanical design, apps, and custom experiences.


Private


Ubo's open source design lets you see exactly how everything works. We've included physical privacy controls like a microphone switch and a camera cover. And by running AI models locally, your data never leaves your device.


Modular, Upgradable, Repairable


Modularity is one of the core philosophies of the design, and we apply that principle in both hardware and software. The modular design also makes it easier for AI coding tools to assist developers with customizations. I will go into more detail in our upcoming design articles.


Modularity also makes it possible to upgrade and repair devices without creating waste in the process.


Users can pay modest fees for component upgrades that extend device life and add new capabilities, or even 3D print some subcomponents themselves.


Multi-agent, multi-modal, and multi-service 


Multiple agents can be summoned to respond to different requests, each with its own wake word or trigger mechanism.


You can deploy multi-modal agents that take text, audio, video/images, and sensor readings as inputs.


Depending on the hardware model, you can run locally hosted AI models and/or use any third-party cloud AI/LLM provider of choice, without being limited or tied to a specific service provider.


For example, you can use Picovoice for wake word detection, AssemblyAI for speech-to-text, Claude Sonnet as the LLM, and ElevenLabs for text-to-speech. For a list of supported providers, see below.
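As a sketch of how this mix-and-match might look in configuration, each pipeline stage can be a named provider slot, so swapping one vendor for another means changing a single field. The `Pipeline` class and provider identifiers below are illustrative assumptions, not Ubo's actual API:

```python
from dataclasses import dataclass

# Illustrative pipeline configuration: one provider per stage, each
# independently swappable. Provider identifiers mirror the examples
# in the text but are hypothetical strings, not real SDK handles.
@dataclass
class Pipeline:
    wake_word: str = "picovoice"
    speech_to_text: str = "assemblyai"
    llm: str = "claude-sonnet"
    text_to_speech: str = "elevenlabs"

    def describe(self) -> str:
        """Render the stage chain for display or logging."""
        return " -> ".join(
            (self.wake_word, self.speech_to_text, self.llm, self.text_to_speech)
        )

cloud = Pipeline()
# Swap a single stage (e.g. a local model) without touching the rest.
local = Pipeline(llm="llama-3-local")
```

The point of the design is that no stage is hard-wired to a vendor, in contrast to the single-service devices discussed earlier.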


Example showing entering Picovoice API key via WebUI

Ubo Assistant is also capable of multi-turn conversations similar to this example.


Hardware agnostic 


We are working to let users fully DIY their hardware or use off-the-shelf cameras, microphones, speakers, etc. with a mini PC or single-board computer (SBC).


A fully virtualized instance could also be deployed on a remote server down the road.


Programming Language Agnostic API


The core exposes a gRPC API, and soon an MCP server, for developers. Together these expose a large tool set that can be invoked through an LLM's tool/function-calling capabilities.


This would also make it much easier to integrate assistant functionalities and multi-modal UX into third-party applications. 
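The reason an RPC layer makes the core language-agnostic is that tools are advertised as plain data (name plus parameter schema), so a client in any language only has to speak the wire format, not Python. The tool names and manifest shape below are hypothetical, sketched for illustration rather than taken from Ubo's actual service definitions:

```python
import json

# Hypothetical tool manifest of the kind a gRPC/MCP endpoint might
# serve: tools described as data, so any client language can call them.
TOOLS = [
    {"name": "display.show_text", "params": {"text": "string"}},
    {"name": "audio.play_chime", "params": {"name": "string"}},
    {"name": "sensors.read_temperature", "params": {}},
]

def tool_manifest() -> str:
    """Serialize the tool set for a non-Python client to consume."""
    return json.dumps({"tools": TOOLS})

print(tool_manifest())
```

A TypeScript app, a Go service, or an LLM runtime could all discover and call the same tools from this one description.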


Tool calling


With a large set of tools and functions at its disposal, the Ubo assistant will figure out how to use these tools and reinforce successful procedures. Users ask for something, the assistant tries to do it, and if it succeeds, it remembers the procedure for next time.


Hardware Acceleration (Available on Pi 5 only)


You can accelerate vision and voice AI functions with dedicated AI accelerators from Google Coral and Hailo. With an accelerator, the device can process multiple video and audio streams in parallel.


The table below contrasts the Ubo AI Assistant with current devices on the market.


Comparing Ubo Pod design with status quo AI assistant devices

Where can I buy this?


I am launching a Kickstarter campaign to bring this device to mass production. Right now we have a small inventory from our pilot production run, which we use for beta testing and gathering user feedback. You can reserve yours for $1 or sign up for the pre-launch campaign to be among the very first to get your Ubo Pod.


Supported AI Services


The following services are supported out of the box. To use them, you need to sign up with the respective service providers, accept their terms of service, and obtain an API key. More services will be supported down the road.

Service categories:

  • Speech-to-Text
  • LLMs
  • Text-to-Speech
  • Speech-to-Speech
  • Transport: Daily (WebRTC), FastAPI Websocket, SmallWebRTCTransport, WebSocket Server, gRPC, Local
  • Video
  • Memory
  • Vision & Image
  • Audio Processing

Ubo software architecture


The diagram below shows the Ubo software architecture. A more in-depth article on this will be published soon.


Block diagram architecture of Ubo core software

Resources


The official core software is already open-sourced on GitHub. You can also open a discussion on our GitHub repo to discuss any specific topic you have in mind. Below you can find links to various design files for this project.


 
 
 
