The technology to start capturing all the audio around you is ready. Someday soon, you will be able to train an AI with these captures and it will know everything that you have heard. The sooner you start capturing, the farther back your new enhanced audio memories will stretch.
Here is how to start capturing…
Buy this magical little device…
I’ve updated the link above to point to what seems to be the same device as the original but with a different name, and it is currently on sale for $31. I’ve also tested this alternate device…
…which works just as well, but is about twice as large and costs more. It does have a USB-C port rather than micro-USB which is nice. My suggested settings for it are here.
Wow, it looks like the Amazon listing for this product was removed immediately after I published this article! I wonder what happened there? It looks like it is still available on ebay and Walmart.ca (but not walmart.com?). Try this link and see what you get…
I have a couple of similar devices that are still on Amazon, but none are as good as the Hfuear. I’ll add a comment soon when I find a suitable replacement!
There are other devices like this, but this is the best one I have found so far (at least for my personal use patterns).
- small enough to fit in a breast pocket
- has enough battery to last several days between charges
- has enough memory to last more than a week between downloads
- has proven to be reliable with more than good-enough sound quality
If you find one that is better, please let me know.
Configure the device (optional)
Plug it into a computer and a new USB drive will show up. Edit the “factory.txt” file and put today’s time and date in there. I like to use UTC time which you can get at time.gov. The time will get set the moment you save the file.
You can also change recording quality, but the default is fine. Bits are cheap.
You can also change the auto voice activation level, but I strongly prefer to use continuous recording so this setting does not matter to me. Bits are cheap.
Note that having an accurate timestamp on the files just makes it easier to manage them. But if you follow my daily routine below then you will also have the date/time recorded inside the voice stream.
Every morning
- Unplug from computer or charger.
- Move switch to continuous recording mode (🎙️).
- Wait for the red LED to stop blinking. This takes a couple of seconds.
- Say today’s time and date out loud. Optionally say where you are if you travel. Optionally read the headline from your favorite widely circulated news source if you want to be able to later establish “earliest possible creation date” verifiability.
- Drop the device into your breast pocket. If you do not have a breast pocket, you can also keep it in your front pants pocket but the voice quality will not be as good. You can also wear it as a necklace or a brooch if you are brave.
Every evening
- Move the switch to the off position.
- Wait for the red LED to turn off. (This is important, if you plug it in before it is done shutting down it can corrupt the drive.)
- Plug into a USB port on your computer or into a USB charger (note that you can go for at least 2 days without charging, probably more).
- Drag to move the files from the RECORD directory on the device to a folder on your computer. (Note that you can skip doing this for several weeks at a time, with the risk that if you lose the device then you will lose everything in it, and a person who finds it will have access to everything in it.)
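If you would rather not drag files by hand, the download step can be sketched in a few lines of Python. This is only a minimal sketch: the function name and both folder paths are hypothetical, and the only thing taken from the device is that recordings live in a directory named RECORD.

```python
import shutil
from pathlib import Path


def move_recordings(device_record_dir, archive_dir):
    """Move every file from the device's RECORD folder into archive_dir.

    Returns the list of destination paths. A file that already exists in the
    archive is never overwritten; a numeric suffix is added instead.
    """
    source = Path(device_record_dir)
    destination = Path(archive_dir)
    destination.mkdir(parents=True, exist_ok=True)

    moved = []
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue
        target = destination / item.name
        counter = 1
        while target.exists():
            # Same name already archived: pick name_1, name_2, ...
            target = destination / f"{item.stem}_{counter}{item.suffix}"
            counter += 1
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

You would call it with wherever the device mounts on your machine and wherever you keep your captures, for example move_recordings("E:/RECORD", "D:/captures"). As with the manual routine, only run it after the device has fully shut down and been plugged in.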
All day, every day
- Remember that everything you say matters. When you are about to say something bad about someone, ask yourself if you want that statement on your permanent record (even if that record is only available to future you). Same for exaggerations, and outright lies. Embrace this discipline to be a better, more consistent, and more trustworthy person.
- Repeat back important facts. When you meet someone, say their name back to them. When someone says something important or surprising, restate it in your own words. This will likely improve the durability of your internal memory as well.
- Make up catch phrases and vocalize them to mark special internal or visual-only moments. “hashnote wow I just saw the most beautiful sunset ever”. “hashnote ouch i have a bad headache this morning”. “hashnote damn that red corvette almost just hit me!”. A great novel has a great narrative, and so should your life. Someday these catch phrases will add a layer of meaning and exposition to whoever (more likely whatever) is processing these files.
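To show why a spoken marker word pays off later, here is a small sketch of how the catch phrases could be pulled back out. It assumes the recordings have already been turned into plain text, one utterance per line, by some speech-to-text step that is not shown; the function name is made up, and "hashnote" is simply the marker used in the examples above.

```python
import re

# Assumption: the marker word is "hashnote" and the transcript is plain text
# with one utterance per line.
HASHNOTE = re.compile(r"\bhashnote\b[\s,:]*(.*)", re.IGNORECASE)


def extract_hashnotes(transcript):
    """Return the text that follows each spoken "hashnote" marker."""
    notes = []
    for line in transcript.splitlines():
        match = HASHNOTE.search(line)
        if match:
            notes.append(match.group(1).strip())
    return notes
```

Any word that you never say in normal conversation works as a marker; the point is that it is trivial to search for.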
That’s it. Really.
Your external audio memories start today. You can figure out all the details later. In the very near future it will become very easy to use these files that you are accumulating in all sorts of useful ways, but your future retrieval and processing system can only go back as far as your capture files do, so do not wait to start capturing them!
Extra credit for nerds
Store your captured files someplace where they will not be lost if your hard drive crashes. This is very cheap and easy with something like Google Drive or Apple's iCloud. You will be generating about 1GB per day with the default voice quality settings and no additional compression.
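As a rough check on what that adds up to, here is the arithmetic as a tiny sketch. The one-gigabyte-per-day figure is the estimate above (read as gigabytes); the function name and the space_saved parameter are made up for illustration.

```python
def yearly_storage_gb(gb_per_day=1.0, space_saved=0.0, days=365):
    """Estimate the storage needed for `days` of continuous capture.

    gb_per_day  -- raw capture rate (the rough figure above is 1 GB per day)
    space_saved -- fraction removed by compression, 0.0 for none
    """
    if not 0.0 <= space_saved < 1.0:
        raise ValueError("space_saved must be in [0, 1)")
    return gb_per_day * days * (1.0 - space_saved)
```

With the defaults this comes to 365 GB per year, so a decade of uncompressed captures fits on a single consumer hard drive.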
Compress your files to save space. ffmpeg + libopus gives me about 60%-70% compression but is lossy. But do not over think this because bits are cheap.
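A minimal sketch of that compression step, assuming ffmpeg is installed with libopus support and is on your PATH. The 32k bitrate and the function names are assumptions for illustration, not the settings behind the numbers above.

```python
import subprocess
from pathlib import Path


def opus_command(wav_path, bitrate="32k"):
    """Build the ffmpeg argument list that encodes one WAV file to Opus."""
    source = Path(wav_path)
    target = source.with_suffix(".opus")
    return [
        "ffmpeg",
        "-n",               # never overwrite an existing output file
        "-i", str(source),  # input recording
        "-c:a", "libopus",  # encode the audio with libopus
        "-b:a", bitrate,    # target bitrate (assumed value, tune to taste)
        str(target),
    ]


def compress(wav_path, bitrate="32k"):
    """Run ffmpeg; raises CalledProcessError if the encode fails."""
    subprocess.run(opus_command(wav_path, bitrate), check=True)
```

Because the encoding is lossy, it is worth listening to a few results before you delete any original WAV files.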
Encrypt your files to make sure no one but you can access them. If this is your thing then you already know how to gpg. But for most people, these are probably not the highest value targets on your hard drive so whatever you do to protect everything else should be good enough (you are doing something to protect everything else, right?).
Make a hash of each batch of files and publicly timestamp it. This establishes a verifiable “existed before” date for your captures. If you also followed my advice to read a newspaper headline each morning, then you have the ability to verify when a recording was created to within a day or two. This is potentially very valuable, so if you do not know how to do this and want to, LMK and I’ll give you guidance.
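The hashing half of this can be sketched as follows. The manifest layout and function names are assumptions, and the public timestamping itself (getting the final digest published somewhere that you do not control) is deliberately left out.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large recordings fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def batch_hash(directory):
    """Return (manifest_text, batch_digest) for all files in a directory.

    The manifest has one "<sha256>  <filename>" line per file, sorted by
    name, so the same set of files always yields the same batch digest.
    """
    lines = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            lines.append(f"{file_sha256(path)}  {path.name}")
    manifest = "\n".join(lines) + "\n"
    return manifest, hashlib.sha256(manifest.encode("utf-8")).hexdigest()
```

Keep the manifest with the batch and publish only the batch digest. Anyone holding the files can later recompute it, while the digest alone reveals nothing about what was recorded.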
Use the text corpus to fine tune any of the fine LLMs to either be able to answer questions (“what was Joe’s wife’s last name again?”) or (with a big enough data set and a decent voice cloner) be a backup you.
Make a script to do all the above steps automatically every time you plug the device into your computer.
Do all this on a little RaspberryPi so you can keep it next to your bed.
Why would I even want to record what I say?
On a practical level, it can be very handy to be able to ambiently collect vast amounts of high-fidelity knowledge with almost no effort. Even my current primitive iteration of a retrieval system saves me from lots of tedious note taking. I look forward to the very near future when I can feed this knowledge base into an LLM so that it just knows massive amounts of information about my life and will be able to use this to augment and leverage my memory. Most people will take a long time to realize how powerful this tool is or worse, they will resist it and suffer a huge and needless disadvantage.
On an epistemological level, knowing the objective truth is fundamentally good. If there is ever controversy about what I may have said in the past, I want to know if my internal memory is correct or if it is lying to me. Without knowing the truth, it is hard for me to make corrections and improvements. Reputation also increasingly matters, so I want to make sure that I am right as often as possible and self-correct as quickly as possible when I am wrong so that people will believe me.
On an emotional level, I am very sentimental about my memories. I love looking at photographs of my daughter’s 6th birthday party and I similarly love listening to her little voice describe her first day of school. Why are visual memories worth cherishing and recording, but audio memories are not?
When will all this audio data that I am capturing become useful to me?
It is already possible today to extract value from this data, but it is cutting edge technology and requires a lot of effort. If you are asking this question, then it is probably not worth the effort for you yet. But I think that within 5 years this will become completely mainstream. You will then be able to easily import all of the data you collect now into your system and extend your memory back in time farther than most other people.
But people freak out when you record them
This is probably the best argument against using this system. When I started playing with creating memex systems for myself, my kids were young and I didn’t want to ruin their social lives so I ended up pulling way back on when and how I preserved my audio memories.
But now I am old and so the only person’s social life I am capable of destroying is my own, so here we are. If you are my friend and you didn’t know about this project (or at least could not have predicted it based on everything you know about me) and you are now upset… well, I am sorry. If you want me to delete all my data that happens to capture your past verbal communications with me, let me know. Also please do let me know if you also want me to delete all the emails and texts I’ve ever gotten from you (I have them ALL) and any photographs I’ve ever taken of you or journal notes that include you (there are LOTS). Do you also want me to try to suppress any fond biological memories I have of the times we’ve spent together?
But why do people freak out at the idea of being recorded?
People seem not to have a problem with talking with someone who has a very good memory. They usually have no problem with talking to someone who later goes home and writes journal notes. They probably even have no problem with someone taking contemporaneous notes during a conversation. But as soon as you capture the waveform that represents the changing pressure levels of the air coming out of their mouth, then they sometimes get very upset.
Maybe this is just a general fear of technology? Brains and pens and paper they understand, but MEMS microphones and audio CODECs are scary. Maybe a bit.
But I think in the general case, people are actually afraid of not being able to disavow something they said. We are used to being able to say different (and conflicting) things to different people in different contexts, and even being able to tell different things to *ourselves* when it suits us. I can easily object that your notes or even a transcript are biased, cherry picked, or even fake – but an audio recording of my actual voice that captures exactly everything I said with all my intonations is very hard to disown (at least until very recently! 😊 ).
So why do we all want so badly to be able to talk out of both sides of our mouths with impunity? This could be its own multipage article! But my short take is: the world is a hard place to understand and the social world rewards people who have strong opinions – even in the face of uncertainty and conflicting interests. I would argue that this is fundamentally bad and we should use technology to mitigate the crappy instincts that nature has given us.
Are there any reasons for not wanting high fidelity memories?
I think there are some interesting edge cases, but to be clear I do not believe they are practically important. If you are using any of these as your justification for recording ick, it is probably an excuse to avoid facing the hard questions.
- Trauma. It is possible that some memories are just bad and you are better off without them. There is nothing you can learn from them and they only hold you back and cause you pain.
- There are some things you are just better off not knowing. This is an extremely interesting and important topic, and no one has done more or better thinking on it than Cass Sunstein, so I will defer to his book on it. Read it (or at least watch some free videos). These problems will only get more important over time as our collective and individual capacities to know things get more powerful.
- Embarrassment. Sometimes we all say things we are ashamed of. One reaction is to try to forget these and pretend they never happened. Another is to see these as learning opportunities.
- Shiftiness. You like being able to tell different people (including yourself) different things at different times with impunity and have no interest in presenting (or even having) a consistent worldview.
I will say that as time goes on it will be harder and harder for you to maintain your ignorance and/or shiftiness. Much of what you say today is already recorded by others and going forward this will continue to increase. We are also just on the verge of anyone being able to produce a compelling recording of you saying anything they want you to say. Having a large, consistent, and trustworthy oeuvre of everything you’ve ever actually said is a great advantage in these situations.
But what about secrets?
When you tell your PIN code to your banker, you do not want anyone else to be able to hear it. Same goes for your comments during the deliberations of the Committee of Secret Correspondence.
The solution to this is to try to keep your external memories as secure as your internal ones. Encryption and obscurity can help here, but fundamentally under current US law you can be compelled to disclose your external memories while your internal ones are specially protected (except the ones that secure your external ones!). I do not have a good answer for this but as our external memories continue to grow in size and importance, so will this difficult issue.
Why not do contemporaneous transcription?
You need the recognition engine accuracy not only to be much better than a human, but you also want it to capture speaker identity across time, speaker tone, and incidental sounds. Maybe someday, but we are not there yet. Also, a transcription is a lossy compression and audio data is relatively small, so why settle for the sheet music when you can listen to the symphony? And bits are free.
Why not do contemporaneous and continuous LLM training/fine tuning?
We already do this, just with not-so-great wetware. Doing this on more capable and reliable hardware is the holy grail, but we are not there yet. My guess is that systems that claim to do this today are actually behind the scenes making an audio recording, then doing voice-to-text on that recording, and then feeding that text into an LLM’s context window, and then maybe later using the flat text again for fine tuning. This is cool, but it is still capturing (and then discarding!) an audio recording and so suffers from all the same issues that come with that. Also, because the LLM is not actually being trained as the new data comes in, running context is lost and understanding will suffer.
Our brains not only forward train in real time as we listen, they can even go back in time and re-evaluate sound that we heard a few seconds ago and reprocess it using information that came later in time. This is what is happening when someone says something that you mishear or do not understand, but then after the next sentence gives you more context you actually retroactively hear the previous sentence correctly. And that context window can be your whole life. If you hear some gibberish, and then realize it was your aunt who has a strong accent, your brain can use everything that you know about your aunt and her story and her accent to reprocess the gibberish into coherent speech. Amazing stuff, and AI has nothing like this (yet)!
I think it is better to start saving the good raw data now and wait for the processing tools to catch up, rather than try to contemporaneously process now and be stuck with the low quality artifacts that today’s processing can create.
Isn’t it illegal to record other people talking?
It depends on where you live and the situation. In most states (like New York where I live), as long as one party consents (me), then it is legal. Even in the few states that are not “one party consent”, it is likely that “incidental” recordings (unintentionally recording two people talking as they walk past you) and recordings of non-confidential conversations are fine. Lots of info here…
BUT I AM NOT A LAWYER.
Isn’t it illegal to record copyrighted materials?
Maybe not when you are making an incidental recording of the copyrighted material with no plans to profit commercially from it, distribute it, or even ever listen to it yourself.
To be safe, you can always just turn off your device on your way into the Taylor Swift concert.
BUT I AM NOT A LAWYER.
When I first started working on OrsonEar (my first attempt at capturing audio for memex purposes) many years ago, I also started the companion project OrsonEye to capture visual information. It was *way* ahead of its time and turned out to be amazingly useful – but these days I just have a bunch of Nest cameras that suck but are good enough. I played around with body-mounted memex cameras, but they attracted so much negative attention (even at nerd festivals where you’d expect people would be cooler about it). I could have warned Google about the glasshole problem long before they started working on Glass. If you pull out a Leica rangefinder, people will climb over each other to get in front of the lens but if you try to wear a crappy digital camera on your person then people want to punch you in the face. Why?
But you should get in the habit of taking lots and lots of photos with your phone. I can take a photo and have the phone back in my pocket in about a second. Do not worry about taking too many photos. Bits are cheap and someday soon it will be very easy to productively deal with your huge image collection. Turn on photo geotagging so you also record where each photo is taken.
Also turn on Timeline in Google Maps and occasionally export your KML files. This data is so rich and so easy to acquire. Do not delete non-spam emails. Export as much of your messaging histories as possible.
Hoard your data. Bits are cheap and very soon it will be very easy to extract huge amounts of value from them!
If anyone cares, I can do a tear-down of this specific device. It is very simple: basically a microphone, a battery, and some flash memory.
If a lot of people care (500+?), I could make a better version of this device with…
- Probably could be about 1/2 the size
- Longer battery life
- Possibly better microphone and acoustic path (although the current one works way better than I would have expected!)
- On-device, real-time encryption
- More accurate time keeping
- Record always, even while charging
- Better file format than WAV
- BLE link to a phone so, say, an app could automatically turn recording on and off based on things like location
- BLE beacon that broadcasts a running hash chain, combined with an app that externally timestamps these hashes to further narrow the verifiable creation time window. Maybe the app also listens for, stores, and signs other peoples’ broadcasts too, creating a web of trust that further increases the trustworthiness of captures
- Inductive charging
- Ability to add an external microphone (like a lav)
- Include a dock that automatically downloads and processes capture data
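The running hash chain in the wish list above could work roughly like this. This is a sketch of the general technique only: the chunking, the all-zero starting value, and the function names are assumptions, not a description of any existing device.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value; any agreed constant works


def extend_chain(previous_link, audio_chunk):
    """Next link = SHA-256(previous link || SHA-256(audio chunk))."""
    chunk_digest = hashlib.sha256(audio_chunk).digest()
    return hashlib.sha256(previous_link + chunk_digest).digest()


def chain_links(chunks, start=GENESIS):
    """Return every link the beacon would broadcast, in order."""
    links = []
    link = start
    for chunk in chunks:
        link = extend_chain(link, chunk)
        links.append(link)
    return links


def verify_chain(chunks, links, start=GENESIS):
    """True if the recorded chunks reproduce the broadcast links exactly."""
    return chain_links(chunks, start) == list(links)
```

Each link depends on every chunk before it, so a link that someone else timestamped at 3pm proves that all of the audio up to that point already existed at 3pm. Changing any earlier chunk changes every later link.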
Everyone agrees that it is a tragedy when memories are lost due to disease. It will take some time before most people recognize the tragedy of memories lost due to indifference and phobia.