Wednesday, April 17, 2024
Coding STEM

Summarization & Detecting Topics by Deepgram Whisper AI

This is the last of a 3-part series on Deepgram’s Audio->Text Transcriber using the Whisper AI engine. If you haven’t already, read the posts in this order to follow along best:

  1. Powerful Auto-Transcription Using AI (OpenAI’s Whisper)
  2. AI Transcription With Diarization
  3. Summarization & Detecting Topics by Deepgram Whisper AI (this post)
  4. One Of The Newest TTS On The Block: a related post with code going in the opposite direction, text->audio creation using AI (Bark).

To round out the features I care to implement, today I’m adding the last 2 to my code: Summarization and Topic Detection.

Let’s first see what they are and how to configure our request parameters to get the responses back. Then I’ll demonstrate how to make sense of the response we get from the server and format it in our own way.

Summarization

Deepgram’s Summarization feature summarizes sections of content in the audio and returns these summaries in the JSON response.

To enable Summarization, set the ‘summarize’ parameter to True in the options.
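As a rough illustration, here is how the flag could be passed as a query parameter if you call Deepgram’s pre-recorded REST endpoint directly. The helper name `build_listen_url`, the endpoint URL constant, and the `model=whisper` value are assumptions for this sketch, not code from the earlier posts; if you use the SDK, pass the same option through whichever call you already have.

```python
from urllib.parse import urlencode

# Assumed base URL of Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(summarize=True, detect_topics=False, model="whisper"):
    """Return the request URL with the chosen features switched on.

    Hypothetical helper: it only builds the URL and sends nothing.
    """
    options = {"model": model}
    if summarize:
        options["summarize"] = "true"
    if detect_topics:
        options["detect_topics"] = "true"
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(options)}"


print(build_listen_url())
```

Keeping the options in one dictionary makes it easy to switch features on and off per request without touching the rest of the code.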

The JSON response has the following basic structure (condensed to focus on the ‘alternatives’ array):

    {
      "metadata": {
        "transaction_key": "string",
        "request_id": "string",
        "sha256": "string",
        "created": "string",
        "duration": 0,
        "channels": 0
      },
      "results": {
        "channels": [
          {
            "alternatives": []
          }
        ]
      }
    }

Each item in the ‘alternatives’ array in turn contains the following:

    "alternatives": [
      {
        "transcript": "<Entire transcript of audio will be here>",
        "confidence": 0.99107355,
        "words": [],
        "summaries": [
          {
            "summary": "<Summary text for the section will be here in one string>",
            "start_word": 0,
            "end_word": 623
          },
          {
            "summary": "<Summary text for the section will be here in one string>",
            "start_word": 623,
            "end_word": 1227
          },
          …
        ]
      }
    ]

Once we understand the structure, the next task is to extract the information of interest from this. My code to parse this and extract all summaries and create one paragraph is as follows:

    summaries = [] # create a list to populate all summary values there in the loop
    for channel in data['results']['channels']:
        for alternative in channel['alternatives']:
            for summary in alternative['summaries']:
                summaries.append(summary['summary'])

    summary_string = ' '.join(summaries) # separate by a space.
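For a quick test without calling the API, the same loop can be wrapped in a function and run against a hand-made response. This is only a sketch: the function name `collect_summaries` and the sample data are made up for illustration, and the `.get()` calls are an added safeguard for responses where summarization was not requested.

```python
def collect_summaries(data):
    """Join every section summary in a Deepgram-style response into one string."""
    summaries = []
    for channel in data.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", []):
            # 'summaries' is missing when summarization was not requested.
            for section in alternative.get("summaries", []):
                summaries.append(section["summary"])
    return " ".join(summaries)


# Made-up sample shaped like the structure shown above.
sample = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "...",
                        "summaries": [
                            {"summary": "First part.", "start_word": 0, "end_word": 5},
                            {"summary": "Second part.", "start_word": 5, "end_word": 9},
                        ],
                    }
                ]
            }
        ]
    }
}

print(collect_summaries(sample))
```

With an empty or summary-free response the function returns an empty string instead of raising a KeyError.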

With that taken care of, let’s move on to Topic Detection.

Topic Detection

Deepgram’s Topic Detection feature identifies and extracts key topics from content in the audio and returns these topics in the JSON response.

To enable Topic Detection, set the ‘detect_topics’ parameter to True in the options. The response has the same top-level structure as before:

    {
      "metadata": {
        "transaction_key": "string",
        "request_id": "string",
        "sha256": "string",
        "created": "string",
        "duration": 0,
        "channels": 0
      },
      "results": {
        "channels": [
          {
            "alternatives": []
          }
        ]
      }
    }
As we saw in the Summarization section, each item in the ‘alternatives’ array contains:

    transcript: Transcript for the audio being processed.

    confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.

    words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.

    summaries: Object containing the information about summaries for the audio being processed.

    And we see that each summaries object contains:

    summary: Summary of the audio section being summarized.

    start_word: Index of the first word in the section of audio being summarized.

    end_word: Index of the last word in the section of audio being summarized.
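The start_word and end_word indexes can be used to recover the stretch of transcript that a given summary covers. The sketch below assumes that both values are positions in the ‘words’ list, that each word object has a `word` key, and that end_word is exclusive (the sample above, where one section ends at 623 and the next starts at 623, hints at this, but it is worth checking against a real response). The function name `section_text` is hypothetical.

```python
def section_text(words, start_word, end_word):
    """Rebuild the transcript text covered by one summary.

    Assumes start_word/end_word index into the 'words' list and that
    end_word is exclusive; verify this against a real response.
    """
    return " ".join(w["word"] for w in words[start_word:end_word])


# Made-up word objects; real ones also carry start, end and confidence.
words = [{"word": w} for w in "no pets allowed by m a cummings".split()]
summary = {"summary": "Title.", "start_word": 0, "end_word": 3}

print(section_text(words, summary["start_word"], summary["end_word"]))
```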

With Topic Detection enabled, the ‘alternatives’ array instead looks like this (condensed):

    {
      "alternatives": [
        {
          "transcript": "<Entire transcript will be here>",
          "confidence": 0.99121094,
          "words": […],
          "topics": [
            …
            {
              "text": "That work in the afternoon, that dip that you feel. For instance, it’s just one example and weight management. And we all respond differently sometimes a little bit sometimes vastly differently even to the same foods. So one type of carbohydrate that my body might process well, let’s say it’s fruit or rice or sweet potato, your body might not. The levels app interprets your glucose data and provides a simple score after you eat a meal. You can see how different foods affect you and then develop a personalized diet that’s right. For you and your goals. Seeing this data in real time at least for me and for so many others, who used levels is a really powerful behavioral change mechanism. And many of the guests on the podcast have talked about this.",
              "start_word": 27240,
              "end_word": 27375,
              "topics": [
                {
                  "topic": "<detected topic1>", "confidence": 0.9869026
                },
                {
                  "topic": "<detected topic2>", "confidence": 0.97236645
                },
                {
                  "topic": "<detected topic3>", "confidence": 0.4745059
                }
              ]
            },
            …
          ]
        }
      ]
    }

We see that each ‘alternative’ in the array contains the following items:

      transcript: Transcript for the audio being processed.

      confidence: Floating point value between 0 and 1 that indicates overall transcript reliability. Larger values indicate higher confidence.

      words: Object containing each word in the transcript, along with its start time and end time (in seconds) from the beginning of the audio stream, and a confidence value.

      topics: Object containing the information about topics for the audio being processed.

And we see that each topics object contains:

      text: Transcript of the audio section being inspected for topics.

      start_word: Location of the first character of the first word in the section of audio being inspected for topics.

      end_word: Location of the first character of the last word in the section of audio being inspected for topics.

      topics: Object containing the key topics of the section of audio being inspected, each with a confidence value. For a list of supported topics, see Identifiable Topics.

My code to parse this and extract all topics and create one line of comma-separated topics is as follows:

    topics = []
    for channel in data["results"]["channels"]:
        for alternative in channel["alternatives"]:
            for topic_group in alternative["topics"]:
                for topic in topic_group["topics"]:
                    if topic["topic"]:
                        topics.append(topic["topic"])
    result = ", ".join(topics)
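Because the same topic can be detected in several sections, and some detections come with low confidence (0.47 in the sample above), it can help to de-duplicate and filter before joining. The following is a minimal sketch of that idea, not part of the original code; the function name `collect_topics`, the `min_confidence` threshold of 0.5, and the sample data are all assumptions.

```python
def collect_topics(data, min_confidence=0.5):
    """Return unique topics at or above min_confidence, in first-seen order."""
    seen = []
    for channel in data.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", []):
            for group in alternative.get("topics", []):
                for item in group.get("topics", []):
                    name = item.get("topic")
                    confident = item.get("confidence", 0.0) >= min_confidence
                    if name and confident and name not in seen:
                        seen.append(name)
    return ", ".join(seen)


# Made-up sample shaped like the topics structure shown above.
sample = {
    "results": {"channels": [{"alternatives": [{"topics": [
        {"text": "...", "start_word": 0, "end_word": 10, "topics": [
            {"topic": "security", "confidence": 0.98},
            {"topic": "prison", "confidence": 0.97},
            {"topic": "loans", "confidence": 0.47},
        ]},
        {"text": "...", "start_word": 10, "end_word": 20, "topics": [
            {"topic": "security", "confidence": 0.90},
        ]},
    ]}]}]}
}

print(collect_topics(sample))
```

Passing `min_confidence=0.0` keeps every detected topic and only removes the duplicates.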

Putting it all together with the code we discussed in the previous 2 posts about using Deepgram’s AI model, here’s a real-world example: an 8:04-minute audiobook titled “No Pets Allowed” by M.A. Cummings.

Feeding this audio to my code outputs the following statistics and information in about 15 seconds. The code does several things: it gets the full transcript (without diarization), gets the diarized transcript, organizes every phrase in order by speaker number (in this case, the book was read by one person, so it correctly identified just one speaker), counts the number of words detected in the full audio, shows the topics discussed, and shows a summary of the audio, all in about 15 seconds on a regular computer.
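A minimal sketch of how such statistics could be derived from one ‘alternative’ is shown here. It rests on several assumptions that are not confirmed by the posts: that each word object carries a `speaker` key when diarization is on, that the "overall accuracy" figure is the alternative’s `confidence` expressed as a percentage, and that timing is measured around the request. The function name `transcript_stats` and the sample data are hypothetical.

```python
def transcript_stats(alternative, audio_seconds, elapsed_seconds):
    """Summarize one transcription alternative as a small dict of statistics."""
    words = alternative.get("words", [])
    # 'speaker' is assumed to be present only when diarization was requested.
    speakers = {w["speaker"] for w in words if "speaker" in w}
    return {
        "word_count": len(words),
        "unique_speakers": len(speakers),
        # Assumption: "overall accuracy" is the confidence as a percentage.
        "accuracy_pct": round(alternative.get("confidence", 0.0) * 100, 3),
        # How many times faster than real-time playback the run was.
        "speedup": round(audio_seconds / elapsed_seconds, 1),
    }


# Made-up alternative with three words from a single speaker.
sample_alternative = {
    "confidence": 0.5,
    "words": [
        {"word": "no", "speaker": 0},
        {"word": "pets", "speaker": 0},
        {"word": "allowed", "speaker": 0},
    ],
}

# 8:04 of audio is 484 seconds; 15.266 is the run time reported in this post.
print(transcript_stats(sample_alternative, audio_seconds=484, elapsed_seconds=15.266))
```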

Audio length: 8:04 mins

Seconds to execute: 15.266
** Overall accuracy ** : 99.701%

Count of words detected: 1336 | Unique speakers detected: 1
Topics discussed: security, prison, loans
Summary of audio: Mae, Monette, to her friends, Cummings, returns with another hauntingly persuasive tale, of a tomorrow that may not be as gleaming as we hope. Grem the flycatcher was so sleek and pretty, she was a pet to be proud of. Every 3 or 4 weeks, 3 of the council members came to take a part of the treasure or to add to it. Grim, the little old member, asked me to give him the key. I was unhappy to displease him, but I said, I can't let you have it. She's grown so much larger now and more beautiful than ever. But I hope she hasn't developed a taste for human flesh. Lately when she stretches out her feelers, it seems that she's trying to reach me.

*** Full transcript (not diarized)***:

No pets allowed by m a Cummings.
This is a LibriVox recording.
All LibriVox recordings are in the public domain.
For more information or to volunteer, please visit librivox dot org, reading by Belona Times.
No pets allowed by MA Cummings.
Mae, Monette, to her friends, Cummings, returns with another hauntingly persuasive tale, of a tomorrow that may not be as gleaming as we hope.
Her recent story, The Weardies, apparently delighted some and startled others.
And this in Los Angeles.
What’s happening there? He didn’t know how he could have stood the 4 months there alone.
She was company, and 1 could talk to her.
I can’t tell anyone about it.
In the first place, they never believed me.
And if they did, I’d probably be punished for having her, because we aren’t allowed to have pets of any kind.
It wouldn’t have happened.
If they hadn’t sent me way out there to work.
But you see, there are so many things I can’t do.
I remember the day the chief of vocation took me before the council.
I’ve tried them on a dozen things, he reported.
People always talk about me as if I can’t understand what they mean, but I’m really not that dumb.
That doesn’t seem to be a thing he can do, the chief went on.
Actually, his intelligence seems to be no greater than that which we believe our ancestors had back in the twentieth century.
As bad as that, observed 1 of the council members You do have a problem.
But we must find something for him to do, said another.
We can’t have an idle person in the state.
It’s unthinkable.
But what asked the chief, he’s utterly incapable of running any of the machines I’ve tried to teach him.
The only things he can do are already being done much better by robots.
There was a long silence broken at last by 1 little old council member.
I haven’t, he cried.
The very thing, we’ll make him guard of the treasure.
But there’s no need of a guard, no 1 will touch the treasure without permission.
We haven’t had a dishonest person in the state for more than 3000 years.
That’s it exactly.
There aren’t any dishonest people, so there won’t be anything for him to do.
But we will have solved the problem of his idleness.
It might be a solution.
Said the chief, at least a temporary 1, I suppose we will have to find something else later on, but this will give us time to look for something.
So I became guard of the treasure, with a badge and nothing to do unless you count watching the key.
The gates were kept locked.
Just as they were in the old days, but the large keys hung beside them.
Of course, no 1 wanted to bother carrying it around.
It was too heavy.
The only ones who ever used it.
Anyway, were members of the council.
As the man said, we haven’t had a dishonest person in the state for thousands of years, Even I know that much.
Of course, this left me with lots of time on my hands.
That’s how I happened to get her in the first place.
I’d always wanted 1, but pets were forbidden.
Busy people didn’t have time for them, so I knew I was breaking the law.
But I figured that no 1 would ever find out.
First, I fixed a place for her and made a brush screen.
So that she couldn’t be seen by anyone coming to the gates.
Then 1 night, I sneaked into the forest and got her.
It wasn’t so lonely after that.
Now, I had something to talk to.
She was small when I got her.
It would be too dangerous to go near a full grown 1.
But she grew rapidly.
That was because I cut small animals and brought them to her, not having to depend on.
What she could catch, she grew almost twice as fast as usual and was so sleek and pretty.
Really, she was a pet to be proud of.
I don’t know how I could have stood the 4 months there alone, if I hadn’t had her to talk to.
I don’t think she really understood me.
But I pretended she did, and that helped.
Every 3 or 4 weeks, 3 of the council members came to take a part of the treasure or to add to it.
Always 3 of them.
That’s why I was so surprised 1 day to see 1 man coming by himself.
It was Grim, the little old member.
Who had recommended that I be given this job.
I was happy to see him, and we talked for a while, mostly about my work and how I liked it.
I almost told him about my pet, but I didn’t, because he might be angry at me for breaking the law.
Finally, he asked me to give him the key.
I’ve been sent to get something from the treasure.
He explained.
I was unhappy to displease him, but I said, I can’t let you have it.
There must be 3 members.
You know that.
Of course, I know it, but something came up suddenly, so they sent me a loan.
Now let me have it.
I shook my head.
That was the 1 order they had given me.
Never to give the key to any 1 person who came along.
Graham became quite angry.
You idiot.
He shouted, what do you think I had to put out here? It was so I could get in there and help myself to the treasure.
But that would be dishonest and there are no dishonest people in the state.
For 3000 years, I know.
His usually kind face had an ugly look I had never seen before.
But I’m gonna get part of that treasure.
And it won’t do you any good to report it because no 1 is going to take the world of a fool like you against a respected council member.
They’ll think you are the dishonest 1.
Now, give me that key.
It’s a terrible thing to disobey, a council member.
But if I obeyed him, I would be disobeying all the others, and that would be worse.
No, I shouted.
He threw himself upon me.
For his size and age, he was very strong, stronger, even than I.
I fought as hard as I could, but I knew I wouldn’t be able to keep him away from the key for very long.
And if he took the treasure, I would be blamed.
The council would have to think a new punishment for dishonesty.
Whatever it was, it would be terrible, indeed.
He drew back and rushed at me.
Just as he hit me, my foot caught upon a root and I fell.
His rush carried him past me.
And he crashed through the brush screen beside the path.
I heard him scream twice, then there was silence.
I was bruised all over, but I managed to pull myself up and take away what was left of the screen.
There was no sign of grimm.
But my beautiful pet was waving her pearl green feelers as she always did and thanks for a good meal.
That’s why I can’t tell anyone what happened.
No 1 would believe that Grim would be dishonest.
And I can’t prove it because she ate the proof.
Even if I did tell them, no 1 is going to believe that a flycatcher plant even a big 1 like mine would actually be able to eat a man.
So they think that Grem disappeared, and I’m still out here with her.
She’s grown so much larger now and more beautiful than ever.
But I hope she hasn’t developed a taste for human flesh, Lately when she stretches out her feelers, it seems that she’s trying to reach me.
End of, no pets allowed by MA Cummings.
.
*** Diarized transcript ***:

[Speaker:0] No pets allowed
[Speaker:0] by m a Cummings.
[Speaker:0] This is a LibriVox recording. All LibriVox recordings are in the public domain. For more information or to volunteer, please visit librivox dot org, reading by Belona Times.
[Speaker:0] No pets allowed
[Speaker:0] by MA Cummings.
[Speaker:0] Mae,
[Speaker:0] Monette, to her friends, Cummings,
[Speaker:0] returns with another hauntingly persuasive tale, of a tomorrow that may not be as gleaming as we hope. Her recent story, The Weardies,
[Speaker:0] apparently delighted some and startled others. And this in Los Angeles. What’s happening there?
[Speaker:0] He didn’t know how he could have stood the 4 months there alone. She was company, and 1 could talk to her. I can’t tell anyone about it. In the first place, they never believed me. And if they did, I’d probably be punished for having her, because we aren’t allowed to have pets of any kind.
[Speaker:0] It wouldn’t have happened. If they hadn’t sent me way out there to work. But you see, there are so many things I can’t do. I remember the day the chief of vocation took me before the council.
[Speaker:0] I’ve tried them on a dozen things, he reported. People always talk about me as if I can’t understand what they mean, but I’m really not that dumb. That doesn’t seem to be a thing he can do, the chief went on.
[Speaker:0] Actually, his intelligence seems to be no greater than that which we believe our ancestors
[Speaker:0] had back in the twentieth century.
[Speaker:0] As bad as that, observed 1 of the council members You do have a problem.
[Speaker:0] But we must find something for him to do, said another. We can’t have an idle person in the state. It’s unthinkable.
[Speaker:0] But what asked the chief, he’s utterly incapable
[Speaker:0] of running any of the machines
[Speaker:0] I’ve tried to teach him. The only things he can do are already being done much better by robots.
[Speaker:0] There was a long silence broken at last by 1 little old council member.
[Speaker:0] I haven’t,
[Speaker:0] he cried.
[Speaker:0] The very thing, we’ll make him guard of the treasure.
[Speaker:0] But there’s no need of a guard, no 1 will touch the treasure without permission. We haven’t had a dishonest person in the state for more than 3000 years.
[Speaker:0] That’s it exactly. There aren’t any dishonest people, so there won’t be anything for him to do.
[Speaker:0] But we will have solved the problem of his idleness.
[Speaker:0] It might be a solution.
[Speaker:0] Said the chief, at least a temporary 1, I suppose we will have to find something else later on, but this will give us time to look for something.
[Speaker:0] So I became guard of the treasure,
[Speaker:0] with a badge and nothing to do unless you count watching the key.
[Speaker:0] The gates were kept locked. Just as they were in the old days, but the large keys
[Speaker:0] hung beside them. Of course, no 1 wanted to bother carrying it around. It was too heavy.
[Speaker:0] The only ones who ever used it. Anyway, were members of the council. As the man said, we haven’t had a dishonest person in the state for thousands of years, Even I know that much.
[Speaker:0] Of course, this left me with lots of time on my hands. That’s how I happened to get her in the first place.
[Speaker:0] I’d always wanted 1, but pets were forbidden.
[Speaker:0] Busy people didn’t have time for them,
[Speaker:0] so I knew I was breaking the law. But I figured that no 1 would ever find out.
[Speaker:0] First, I fixed a place for her and made a brush screen. So that she couldn’t be seen by anyone coming to the gates.
[Speaker:0] Then 1 night, I sneaked into the forest and got her.
[Speaker:0] It wasn’t so lonely after that.
[Speaker:0] Now, I had something to talk to. She was small when I got her. It would be too dangerous to go near a full grown 1. But she grew rapidly.
[Speaker:0] That was because I cut small animals and brought them to her,
[Speaker:0] not having to depend on. What she could catch, she grew almost twice as fast as usual and was so sleek and pretty. Really, she was a pet to be proud of.
[Speaker:0] I don’t know how I could have stood the 4 months there alone,
[Speaker:0] if I hadn’t had her to talk to. I don’t think she really understood me. But I pretended she did, and that helped.
[Speaker:0] Every 3 or 4 weeks, 3 of the council members came to take a part of the treasure or to add to it. Always 3 of them. That’s why I was so surprised 1 day to see 1 man coming by himself.
[Speaker:0] It was Grim, the little old member. Who had recommended
[Speaker:0] that I be given this job. I was happy to see him, and we talked for a while, mostly about my work and how I liked it. I almost told him about my pet, but I didn’t, because he might be angry at me for breaking the law.
[Speaker:0] Finally, he asked me to give him the key.
[Speaker:0] I’ve been sent to get something from the treasure. He explained.
[Speaker:0] I was unhappy to displease him, but I said, I can’t let you have it. There must be 3 members. You know that.
[Speaker:0] Of course, I know it, but something came up suddenly, so they sent me a loan. Now let me have it.
[Speaker:0] I shook my head.
[Speaker:0] That was the 1 order they had given me. Never to give the key to any 1 person who came along.
[Speaker:0] Graham became quite angry. You idiot.
[Speaker:0] He shouted, what do you think I had to put out here? It was so I could get in there and help myself to the treasure.
[Speaker:0] But that would be dishonest
[Speaker:0] and there are no dishonest people in the state.
[Speaker:0] For 3000
[Speaker:0] years, I know.
[Speaker:0] His usually kind face had an ugly look I had never seen before.
[Speaker:0] But I’m gonna get part of that treasure. And it won’t do you any good to report it because no 1 is going to take the world of a fool like you against a respected council member.
[Speaker:0] They’ll think you are the dishonest 1. Now, give me that key.
[Speaker:0] It’s a terrible thing to disobey,
[Speaker:0] a council member.
[Speaker:0] But if I obeyed him, I would be disobeying all the others, and that would be worse.
[Speaker:0] No, I shouted.
[Speaker:0] He threw himself upon me. For his size and age, he was very strong,
[Speaker:0] stronger, even than I. I fought as hard as I could, but I knew I wouldn’t be able to keep him away from the key for very long. And if he took the treasure, I would be blamed.
[Speaker:0] The council would have to think a new punishment for dishonesty.
[Speaker:0] Whatever it was, it would be terrible, indeed.
[Speaker:0] He drew back and rushed at me. Just as he hit me, my foot caught upon a root and I fell. His rush carried him past me. And he crashed through the brush screen beside the path. I heard him scream twice,
[Speaker:0] then there was silence.
[Speaker:0] I was bruised all over, but I managed to pull myself up and take away what was left of the screen.
[Speaker:0] There was no sign of grimm. But my beautiful pet was waving her pearl green feelers as she always did and thanks for a good meal.
[Speaker:0] That’s why I can’t tell anyone what happened.
[Speaker:0] No 1 would believe that Grim would be dishonest.
[Speaker:0] And I can’t prove it because she ate the proof.
[Speaker:0] Even if I did tell them, no 1 is going to believe that a flycatcher plant even a big 1 like mine would actually be able to eat a man.
[Speaker:0] So they think that Grem disappeared,
[Speaker:0] and I’m still out here with her.
[Speaker:0] She’s grown so much larger now and more beautiful than ever.
[Speaker:0] But I hope she hasn’t developed
[Speaker:0] a taste for human flesh, Lately when she stretches out her feelers,
[Speaker:0] it seems
[Speaker:0] that she’s trying to reach me.
[Speaker:0] End of, no pets allowed
[Speaker:0] by MA Cummings.

To convince you even more, see the output of an audio with multiple speakers of different genders in my previous post: AI Transcription With Diarization

There you have it! As you can see, it did a wonderful job on every task for an 8+ minute audio in about 15 seconds, far less time than it takes to just play it. I hope this series was educational as well as exciting. And by the way, if you’re thinking of using AI to go the other way (text-to-audio creation), I have done that for you too! Check out the TTS post here.
