How we used Snackable in this blog post
Chapters, to give context
Our AI got to work and broke the full recording into topical chapters, creating a table of contents. So if “audio search and accessibility” is what you truly care about, jump right to minute 12. Imagine how much time you’d save for your listeners if you started running your episodes through Snackable!
Highlights, ready for social media
Next, our AI chose a few short highlights that we packaged into an Audiogram - it’s like a trailer, perfect for giving your audience a sneak peek into what they can expect, and ideal for sharing on socials.
Metadata, for an SEO boost without the hassle
Then, we exported both the Audiogram and chapters directly to YouTube from Snackable to share our content with the world’s largest stage for video and audio, picking up the SEO benefits from Google’s Video Search engine which indexes chapters.
Tags, for better visibility
Tags were automatically extracted, so we could get in front of the right people in seconds. Snackable is also perfect if you have a large library of content - you can make the tags searchable so that your users can discover similar content using keywords, names and concepts.
Transcripts, for easy reading
Of course, our AI generated a human readable transcript in seconds, breaking the wall of text into paragraph-sized clips and turning it into content built for browsing.
The Sonic Truth team passed over the podcast episode title and description, but Snackable grabs it automatically when your RSS feed is linked.
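For the curious, here is a minimal sketch of what reading an episode title and description from an RSS feed can look like. It is not Snackable's implementation: the feed content and the function name `episode_metadata` are made up for illustration, and a real feed would be fetched over HTTP rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>The Sonic Truth</title>
    <item>
      <title>Example episode title</title>
      <description>Example episode description.</description>
    </item>
  </channel>
</rss>"""


def episode_metadata(feed_xml: str) -> list[dict[str, str]]:
    """Return the title and description of every <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return []  # not an RSS 2.0 document with a <channel> element
    episodes = []
    for item in channel.findall("item"):
        episodes.append({
            # findtext returns None when the element is missing
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return episodes


print(episode_metadata(SAMPLE_FEED))
```

Missing fields come back as empty strings rather than raising an error, which keeps the sketch tolerant of sparse feeds.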
The ultimate content packaging tool
The result? Content that’s going places - ready for discovery by humans and machines and packaged neatly into shareable pieces. Or, to be more precise, an easily browsable, accessible, time-saving, insight-highlighting representation of an information-rich recording.
Now read what Snackable did!
Bite-sized, shareable content. By the time you hear this podcast you will have likely already shared three TikToks. But sometimes short-form content can solve a real business challenge. Take the packaging of spoken-word content. How do you get a podcast, for example, in front of consumers easily, and give them the gist of it quickly? Mari Joller created Snackable to solve the problem, using AI to produce clearly-summarized, easily-discoverable and shareable spoken word content. Veritonic CEO Scott Simonelli sits down with Mari on this episode of The Sonic Truth.
Keywords: audio content, metadata, sound, content, audio video, short form, Estonia, marketing, long form, relevance, recording, transcription, user, packaging, monetization, analogy, keyword, enterprise, discovery
"Imagine you went to a bookstore and every single book in that store was shrink-wrapped. All you could see is the front cover and the back cover. But you couldn't open it."
Mari Joller / Founder & CEO, Snackable AI
Preview · 45 sec
Full episode · 26 minutes
00:00 Chapter 1: Snackable founder story
02:16 Chapter 2: The audio-first-world
06:48 Chapter 3: Making audio discoverable
12:15 Chapter 4: Tackling content search & accessibility with AI
20:11 Chapter 5: Packaging content to drive reach and engagement
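A chapter list like this is easy to work with programmatically. The sketch below parses the first three chapter lines above into start offsets and looks up which chapter is playing at a given moment. The function names `parse_chapters` and `chapter_at` are hypothetical, and the line format is assumed to be exactly "MM:SS Chapter N: Title"; it illustrates the idea rather than how Snackable stores chapters.

```python
import re

# The first three chapter lines from the list above.
CHAPTER_LINES = [
    "00:00 Chapter 1: Snackable founder story",
    "02:16 Chapter 2: The audio-first-world",
    "06:48 Chapter 3: Making audio discoverable",
]

# Assumed line shape: "MM:SS Chapter N: Title".
LINE_PATTERN = re.compile(r"^(\d{1,2}):(\d{2})\s+Chapter\s+(\d+):\s+(.+)$")


def parse_chapters(lines):
    """Turn chapter lines into dicts with a start offset in seconds."""
    chapters = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not chapter markers
        minutes, seconds, number, title = match.groups()
        chapters.append({
            "number": int(number),
            "title": title,
            "start": int(minutes) * 60 + int(seconds),
        })
    return chapters


def chapter_at(chapters, position_seconds):
    """Return the chapter playing at an offset (chapters must be sorted by start)."""
    current = None
    for chapter in chapters:
        if chapter["start"] <= position_seconds:
            current = chapter
    return current


chapters = parse_chapters(CHAPTER_LINES)
print(chapter_at(chapters, 200)["title"])
```

A player could use the same lookup to highlight the current chapter while the episode plays.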
Chapter 1: 00:00 - 02:16
Snackable founder story
So tell us a little bit about yourself and what you do now and how you got here.
I'm Mari Joller. I'm founder and CEO of Snackable. We are a technology startup, three years old, venture-funded, and headquartered in New York City. And personally, I've been in the audio-meets-content-meets-artificial-intelligence world for the past five years. Snackable is actually my second startup in this space. Before Snackable, I founded a company called Scarlet and we conceived this entirely new kind of voice assistant, one who would talk to you proactively as opposed to forcing you to ask questions.
And Scarlet was really the first place where we saw how powerful short-form audio can be. And in some ways, it was also the genesis for Snackable. Originally, I'm from Estonia and I was fortunate as a kid to grow up at a time when a really transformative change was happening in Estonia, meaning that we got our independence from the Soviet Union. And so a whole new country was basically built from scratch. And I saw firsthand how technology could be leveraged because Estonia simply didn't have the resources to build this expensive analog infrastructure. So everything was built digitally.
And today, Estonia is a leader in digital government and provides a platform for public and private services. So again, I think seeing technology from an early age and the transformative power of it really is something that has drawn me to it since, and also to the opportunity to have a part in building products that can effect this transformative, positive change. So I stayed in Estonia for high school and then I came to the US without a long-term plan to stay.
Twenty years later, here I am. And I came to college. I went to Middlebury in Vermont, studied Chinese and Economics, and then went into consulting. I worked for a boutique consultancy in Washington, D.C. called Kaiser Associates, and did the majority of my projects for tech and media clients; got my MBA from Harvard and then went to the operational side. So I was actually an intrapreneur before I was an entrepreneur. I did some early work at companies like Virgin Mobile and Nokia, building and scaling products like bringing mobile maps to emerging market users.
Chapter 2: 02:16 - 06:48
The audio-first-world
Well, you've done a little bit. So it's a lot of diverse experiences and a lot of different things and a lot of different cultures and different environments kind of across that, both Estonia versus the US versus where you went to college and the different companies.
What, if anything, kind of drew you back to audio specifically? Was there something about sound that made you say, I want to be in this business, I want to be in this field? What brings that passion? Obviously, starting a company, starting a business, there's got to be something, kind of that internal fire, that can drive you. What about audio kind of pulled you to that?
Yeah. So I think there are really two events that I can think back on. One was when we first launched Scarlet and we really saw how much people were drawn to this short-form, personalized audio. And we wanted to be able to build a content ecosystem around that.
And perhaps unsurprisingly, to many of the listeners of this podcast, we all know that audio doesn't get produced in short, personalized clips. It's really kind of bigger, monolithic blocks of content. You can't really look inside. There's very little metadata to make it truly searchable.
So the possibility of unlocking the wealth of this world's audio is something that really sounded very compelling to me and that also later gave birth to Snackable. And the other is really 2020. I mean, last year we were talking about the approaching audio-first-web at Snackable and then 2020 brought us the audio-first-world.
I mean, with Covid we were just thrown in head-first. The digital transformation that companies had been talking about for a decade now just happened overnight. And it was a drastic change because all of the conversations were now being held online. I mean, the personal examples are known to everybody: home-schooling kids, schools happening online, working on Zoom in PJ bottoms all day long.
But what's really interesting for us at Snackable is the enterprise side of things like what's been happening with all the large online events like Apple's product launch, all the trade shows, the thought leadership that companies produce. That hasn't gone away. Vegas got shut down, but those events moved online. So, what's interesting about it is actually it's more expensive, turns out, to hold these events online than it is to do them face-to-face.
So it's even more imperative to get the content in front of the right audiences, in the right packages, and in the right channels.
Wow. I never thought I'd say this, but I would take an in-person event right now over everything.
So you look at audio as part of that landscape. Obviously, it exists in the context of podcasting and certain educational tools, but also Zoom and all the things that we're doing now. What does Snackable do and how does it fit into that ecosystem? So, assuming our audience is marketers and brands and people who work in enterprise companies, what is Snackable and how does it fit into their business? How does it fit into the world?
Yeah, so we solve essentially the packaging problem for spoken word. So we sit in between the content production and content distribution. So, to give you an analogy from the physical world.
So imagine you went to a bookstore and every single book in that store was shrink-wrapped. So all you could see is the front cover and the back cover. On the front cover, you would see the title of the book, the author. On the back cover, you would maybe see a couple of reviews by people who are paid to say nice things about it. But you couldn't open it. You couldn't see the table of contents, you couldn't browse the chapters. You couldn't read out an interesting highlight to a friend.
So basically it was extremely limited. I mean, when we got to the online book sales, we would have Amazon introduce a feature like Look Inside, unsurprisingly that was really necessary. But if you're an audio producer today, you're still creating and distributing the content like a shrink-wrapped book.
So what Snackable does is that we use A.I. to basically provision that audio content with data and structure so it can become transparent. So you can put it in front of the right audiences, in the right channels, and also in the right packages. And in our parlance, it's mostly short-form content for better discovery or in other words, Snackable content.
Chapter 3: 06:48 - 12:15
Making audio discoverable
That's one thing that obviously my world here and our world at Veritonic is audio-focused as well. And one of the things that we're always fascinated by is the time aspect. How does time relate to audio and audio relate to time? And I think one of the challenges is that you can flip through a couple of pages of a book, to stay with your analogy, but it takes time to hear all that content. Transcribing is not good enough, because of the time factor, to provide structure to it.
How is that different? And for my own curiosity as well, how does that packaging work, and what does Snackable do to help make it accessible, make it something that people can use?
So I think audio, in general, is really resistant to discovery. You have what we call the monolithic blocks, which essentially means fairly long recordings. And at best you have indexing on the title, maybe the category, or maybe the description level. But what's inside is still unstructured for search engines and other internet tools.
And it's really difficult to kind of extract the right bits. So those right, the interesting bits, so to speak, still remain hidden within the long-form audio. So what we're trying to do is really go deeper into the actual content itself and say, okay, here's how this content breaks down into its natural chapters.
Here's a different guest in this podcast. Here are different parts of this webinar. Here's the metadata. So it basically allows this content to become much more easily searchable and accessible and allows us to extract kind of short-form pieces out of that for various distribution purposes and also monetization purposes. Because now if you have short-form content and you have the right metadata, it also becomes much easier to give the targeting elements for better monetization.
So going the other direction. So obviously that helps me or somebody to understand what's happening inside audio. One thing that I think is also a challenge is people interacting with audio. So we're seeing voice as a user interface become more and more a part of everybody's lives. Look at the world in general. I really expect to be able to talk to just about anything these days. But when you’re looking at a smart speaker or looking at kind of voice technology, you know, a lot of times the results you get back are very limited, and that conversation can be very binary.
How do you see Snackable playing a role there? If I'm saying, you know, if I'm talking to Alexa or Siri, how can Snackable help improve that response?
Yeah. So I think when it comes to audio search, we've really only just scratched the surface. And then you're right, today when you ask a question to Alexa or Google Assistant, at best, you'll get back a Wikipedia entry read out by the machine voice.
Now, imagine you're able to instead replace that with a real excerpt from a real human conversation. I mean, as humans, we've recorded a hundred and sixty years of audio content. So there's like an incredibly rich universe of content that exists, both historic and also modern. And it is, you know, highly emotive. Audio is really powerful as a storytelling tool.
So I think we have a long ways to go to, not very long ways to go, but there's a lot more opportunity to unlock the potential of audio search. And expand what we can do, but that, of course, then requires that we're able to, again, peek inside that long-form content and extract the right pieces and by right pieces, we mean not only kind of like the right length of a clip, but also relevancy and also be able to make that clip kind of semantically and linguistically coherent.
So you can understand like who's talking, what they're talking about, and why that's relevant to you. And so if I could come back to the 2020 context, I think that we've now had another proliferation of content because this entire new industry has mushroomed from nowhere, which is digital events.
I mean, they did exist before. But what's happened this year is that there's so many of them. And the role that they play, how prevalent they are, and how much they're important in terms of various communications. This is entirely new since Covid. And it's actually a really competitive industry. There's new interfaces, people are spending a lot of resources. Again, it's more expensive to put on these events.
And what's interesting about it is also that these events are much less of a one-and-done, because more than half the people attending webinars, or any sort of recordings for the purposes of this audio conversation, are attending after the fact, not when the event is actually airing live. So the interesting opportunity there for marketers is actually to put that event in front of many more eyeballs and earbuds, if you will, because the internet, unlike the conference rooms in Vegas, doesn't have capacity constraints.
So it's a really interesting way to repurpose the content and also get the content in front of the right people and again, to the idea of extracting the right elements from that content. I was talking to a marketing executive the other day and she said, if I can get five minutes of engagement from someone to whom I've given an hour worth of content, that's a huge win for me.
Chapter 4: 12:15 - 20:11
Tackling content search & accessibility with AI
Right. That's actually a perfect segue.
So getting to brands and marketers is really hard. There's a lot of, no pun intended, there's a lot of noise and it's difficult. Even if you said audio search is going to be bigger than web search, which audio search is clearly undervalued. But there's a lot of pieces there in the way if someone's listening to this or if you were to take a quote from this podcast if you were to use Snackable. If I'm a brand, why should I care? What's the thing that if I'm a marketer.
If I’m working at an enterprise or a brand. Why should I care about this? Why is this important to me?
Yeah, I think it's definitely important because brands, and many of the brands, have just incredibly rich content libraries that are effectively collecting dust. The investment has already been made, the content has been produced. And so making it more accessible and searchable allows this content to be reused, repurposed, and remonetized. Marketers have been doing this for years with written content and I think this is an opportunity now to do the same with audio/video.
I think the really interesting thing with audio and if you think about it from a brand's perspective and the true difficulty from an end user’s ability to engage with that content is that it's really hard to grasp the sheer size of audio. I mean, we all understand 140 characters on Twitter. But when it comes to audio, for example, if you had an hour and a half worth of recording, that actually translates to 17 pages of written transcription. That's a really long article that you'd probably think twice about investing into reading.
So there again, it becomes more imperative to make that content transparent, so that you can break that monolithic hour and a half down into logical component parts. That way, when somebody you're trying to target only has five minutes, you can still get engagement from the target user.
And that's a huge piece of it, and I think that it's hard. I think it's one of those things that kind of hides in plain sight. And people don't think about, one, the quantity of audio within their enterprise, within their organization, but also the value that might be unlocked within it if you could use it, if it were useful in some way or if you had any access to it. What is the legacy technology?
Because I think sometimes we look at these things and it's, okay, well, even if somebody today realized I've got this piece of audio content, I've got this sitting somewhere in my company and I can use this, I can use something that's audio-focused or has audio, what was the thing that kind of people would do now?
How would I even do that without Snackable? What is the legacy disruption that's happening here in your mind, or is it even doable at all or just sitting there somewhere in the database?
I mean, we've seen scenarios really wall to wall, so to speak, really across the spectrum. I mean, the great analogy is that someone famous dies, say David Bowie died. Where do we have him? We had him in the studio. When was that? Three years ago. Can you go find that recording or where is it exactly? We can't find it. So it can create this entire havoc of people trying to find the right content.
Most of the time there is a CMS to look in, but oftentimes there's very little metadata that gets written about a certain recording. So it's like episode 151 X, Y, Z, and David Bowie is not mentioned anywhere. Or, you actually have some sort of an organizational structure, but again, he happened to be a guest for five minutes in a recording and that recording has five minutes of really valuable content.
But unless you've really indexed that, there's really no way to easily make that searchable and be able to pull that out and really quickly get that content in front of the right users in that very moment. So it really depends but I think the opportunity is to go much further than that and really, truly structure and index the audio libraries in a way that that content could be just very easily pulled out, repurposed and monetized.
Not to get into the weeds here, but I'm curious. Like, I've seen the light. I want to do this. How do I onboard my existing content? Maybe it's just sitting in a CMS somewhere, maybe it's on some database somewhere. How do I go from the past to the present and future? How does that happen and how does that onboarding look?
Onboarding is not difficult. I mean, there's ways in which you can connect your RSS feeds, connect through APIs, just ingest the content. So there are a couple of logical steps that happen around transcription, metadata extraction, and the actual segmentation of the content into chapters, a table of contents, various topics, speakers, et cetera.
So there's pretty easy ways with the help of A.I. I think without that it would be pretty difficult to, so to speak, digitize and index a library to allow it to do exactly what you're describing.
So to be clear, it's not just this big dump into like a new index. There's A.I. involved, and there's obviously more to it than probably meets the eye.
Yeah, absolutely. I mean, you know, if you just went with transcription, it's really difficult to make the audio truly searchable, because I could do the transcription, I could just bing out the keywords, but say I was looking for a popular keyword.
And the first result, like the first episode alone, had 80 mentions of that. So how do I sift through the first 80 mentions in just the first result to find what I'm looking for? So definitely the A.I. is needed to truly understand the structure and the component parts of the audio, not just within each recording but across your entire immense library, and make it structured in a way that actually allows the end-user to find the relevant bits first.
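To make the "80 mentions" problem concrete, here is a deliberately simple sketch of why structure helps: instead of listing every raw keyword hit, it groups hits by chapter and ranks chapters by how densely they mention the keyword. This is an illustrative stand-in, not Snackable's A.I.; the function names and the sample chapter texts are hypothetical, and real relevance ranking would involve far more than counting words.

```python
import re


def count_mentions(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def rank_chapters(chapters, keyword):
    """Rank (title, text) chapters by keyword density, highest first.

    Returns (title, mentions, density) tuples; chapters with no
    mentions are left out.
    """
    ranked = []
    for title, text in chapters:
        word_count = len(text.split())
        mentions = count_mentions(text, keyword)
        if mentions == 0 or word_count == 0:
            continue
        ranked.append((title, mentions, mentions / word_count))
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Hypothetical chapter texts, standing in for transcribed segments.
SAMPLE_CHAPTERS = [
    ("Intro", "audio is great and audio matters"),
    ("Search", "audio search audio audio"),
    ("Outro", "thanks for listening"),
]

for title, mentions, density in rank_chapters(SAMPLE_CHAPTERS, "audio"):
    print(title, mentions, round(density, 2))
```

The listener sees a short ranked list of chapters rather than dozens of individual hits.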
Do you find that people don't know they need it because they didn't know it existed? It's one of those things where, because they're so used to it being something they can't solve for, or something that would be too much work to bother with because it's so locked down and mired in this mess, they aren't looking for a solution. Have they given up or do you have to let them know that this is possible?
Where is the state of awareness of this problem? And how does that come to I need to solve this problem? You have to remind them they have it or they realize they have it and or both.
I would say the discoverability problem of audio has been one that all of us have been really hearing publishers voice for the past few years. I mean, the analogy that we bring is that we are with audio where we were with the written web pre-Google, where you literally had Yahoo. And to quote-unquote search the internet, you went to the Yahoo! portal and you clicked on the right link. And that was your entire internet experience.
And when Google came, now you could actually unlock the entire internet and actually go and find broadly what was interesting to you and not be just limited to some editors' choices in terms of putting links in front of you. So I think it's a very relevant problem. I think it's a very timely problem.
And with Covid, more and more of the content is moving into digital audio/video, so it's one where marketers and publishers are really seeing the problem front and center and actively looking for solutions. And again, the piece that's really driving that is that it's not just one and done, where you create a piece of content and push it out, then you create the next piece of content and push it out. There's much more thinking around how do I package it and how do I distribute that.
Chapter 5: 20:11 - 26:11
“Packaging” content to drive reach and engagement
How do I get the right piece in front of the right user? Because there's so much noise in 2020, as we're all just inundated with content, that the ability to get through and be able to get in front of the right user with the right message is more important than ever.
I appreciate that you use search on the internet as an analog. Our marketing team is in their mind because I do that all the time, but it's true, and thank you for validating. You know, you think about it. Imagine the written web without search and all the things that came out of that. Again, it's not just about finding things. There are thousands and thousands of applications built upon the premise that this content is accessible to me and this content is relevant to me at all.
And that's key. Right? There is not a lot of people going past the first page of Google, so relevancy is paramount. And I think what you're talking about with Snackable is right in that area. It's right in the bull's eye there.
So let's say everything goes right and then let's hope the world gets a little better. Well, Covid may have brought in a little more focus to some of these things. Let's hope it does go in the right direction over the long haul. But what is the ideal state of kind of the world for Snackable? What is the future? What do we have to look forward to? And how are you going to make it better for all of us?
One hundred percent agree with you that we all hope that Covid is soon going to be more of a memory and, if anything, will have made us perhaps a little bit less frantic, and perhaps we travel a little bit less and act in more environmentally friendly ways when the pandemic is over. But I certainly can't wait to have the world again be not upside down. But in terms of content discovery, the trends are definitely here to stay.
And what we're building at Snackable: in this world, we would have structure in audio, so we would have logical chapters and a table of contents for recordings like this. So if anybody wanted to come to this podcast afterwards, they would be able to read the chapter titles and click into the part of our conversation that was relevant for them. We would also have smart highlights that both Sonic Truth and Snackable would be able to post through their social media presence to drive awareness to this conversation, and/or put them onto our website or app as short previews for the long-form conversation.
We would have metadata, for example, transcripts that make audio readable regardless of your hearing capabilities. Accessibility is a big and important trend, and it also aligns the content more with how modern-day SEO works, so that content can be indexed beyond its recording into the actual content and structure of it and the component parts of it. So I think overall it's very possible to take this richness of audio content, the world of audio we've recorded as humans for the past 160 years, and make it accessible, make it easy to engage with, especially with the help of A.I.
I think that's awesome. Just one last question for you. I think you have a really interesting story, a very interesting background, and Snackable is solving a really, really big problem.
There's a lot going on here. What are the things that you see if you are to provide some advice to someone at an organization who wants to unlock the value of audio, who is in the space and has recognized the secular story around audio? Like, they're seeing all these things in their lives. They're living in the world today.
What is your advice to them, given all your amazing set of experiences, like what would you say is a great place to start and maybe a good first step? Because sometimes it can just be overwhelming. It's like I have this huge, you're not going to solve these huge problems with one action. What's the first step? Or maybe some kind of short-term advice you give to somebody who recognizes that this problem exists?
I think if I'm thinking about someone who is at a publisher and is thinking about how do I get the most awareness and discoverability for my content? And I think the way to think about it is what are the channels that I'm using today and where are my users?
So what is it that I'm actually going to do to help my content cut through the noise? I think the biggest thing in 2020 is that all of our attention spans have just completely shrunk because we have been inundated with so much content. So I think the way to think about it is that the trend we used to have was TL;DR, which is too long, didn't read. I mean, in 2020 we have TL;DL, too long, didn't listen. So I think it's helpful to think of content production that way. It's like, you know, there is absolutely a reason to produce long-form content.
It's a beautiful storytelling mechanism, a medium. But how do you then take bits and pieces of that and put them in front of the right audiences who have very limited time? So basically build the mechanism to kind of get the tentacles out into the internet and be able to draw the users back in.
OK, I didn't think our attention span could get any shorter, but I guess that's where we're going. OK, well, thank you for coming on today. I think there's a huge opportunity here for anybody, publishers and really anybody in enterprise of any kind.
Audio is a big part of everybody's existence and a big part of everybody's customers and clients. So this podcast is just one of many examples that we use. Even though we're an audio-focused company, we communicate using audio a million different ways. And there's a lot of value in that content everywhere. So hopefully you'll make that better for everybody. So thank you so much.
Thanks for having me. It's been fun.