Greg Shackles: Using Speech Recognition in iOS 10

There’s a lot of awesome stuff coming in iOS 10, but one of the things I’m most excited about is that now there’s actually an API for doing speech recognition in your apps. It was sort of possible to do this previously through some hackery (I have a bit of experience there), but now Apple is finally making it easy to leverage their great dictation feature in any app. Let’s take a look at how easy it is to plug dictation into an app.

Requesting Permission

Similar to other iOS APIs such as location, your app must request authorization from the user before it can make use of this feature. First you'll need to add two new entries to your Info.plist file:

<key>NSSpeechRecognitionUsageDescription</key>  
<string>For science</string>  
<key>NSMicrophoneUsageDescription</key>  
<string>For science</string>  

For this example there are two things that require explicit permission from the user:

  • use speech recognition
  • record audio from the microphone

The descriptions you provide here will be displayed in the prompt iOS presents to the user:

Permissions prompt

Basic Setup

We’ll want to add some private members to the controller to track most of the moving parts of the dictation:

private readonly AVAudioEngine _audioEngine = new AVAudioEngine();  
private readonly SFSpeechRecognizer _speechRecognizer = new SFSpeechRecognizer();  
private SFSpeechAudioBufferRecognitionRequest _speechRequest;  
private SFSpeechRecognitionTask _currentSpeechTask;  

This is a mix of pre-existing AVFoundation functionality and additions that are new in iOS 10, such as SFSpeechAudioBufferRecognitionRequest.

Next we need to actually trigger that prompt:

public override void ViewDidLoad()  
{
    base.ViewDidLoad();

    Dictate.Enabled = false;

    SFSpeechRecognizer.RequestAuthorization(status =>
    {
        if (status != SFSpeechRecognizerAuthorizationStatus.Authorized)
            return;

        _audioEngine.InputNode.InstallTapOnBus(
            bus: 0,
            bufferSize: 1024,
            format: _audioEngine.InputNode.GetBusOutputFormat(0),
            tapBlock: (buffer, when) => _speechRequest?.Append(buffer));
        _audioEngine.Prepare();

        InvokeOnMainThread(() => Dictate.Enabled = true);
    });
}

When the screen loads it requests authorization from the user, enabling or disabling a button in the UI based on the authorization status. In a real application you would of course want to provide more feedback to the user. Once the user grants permission we also install a tap on the microphone's input node that appends each received buffer to the current speech request.

Side note: if you fail to set NSMicrophoneUsageDescription in your Info.plist file, attempting to access _audioEngine.InputNode here will cause your app to crash immediately in a way that you can’t catch and that doesn’t report any useful messages. Major hat tip to Larry O’Brien for helping me get to the bottom of that one, as it was driving me crazy.

Define the UI

For the sake of simplicity, the app’s UI will be super simple:

UI

There’s a button that will either start or stop the dictation, and a label that will display the output.

Implement Speech Recognition

Okay, now the fun part. First, let’s wire up the button:

partial void onDictateTapped(UIButton sender)  
{
    if (_currentSpeechTask?.State == SFSpeechRecognitionTaskState.Running)
    {
        InvokeOnMainThread(() =>
           Dictate.SetTitle("Start Dictating", UIControlState.Normal));

        stopDictating();
    }
    else
    {
        InvokeOnMainThread(() =>
        {
            Dictate.SetTitle("Stop Dictating", UIControlState.Normal);
            DictationResults.Text = "Waiting for dictation...";
        });

        startDictating();
    }
}

With that in place we can now implement startDictating():

private void startDictating()  
{
    NSError error;
    if (!_audioEngine.StartAndReturnError(out error))
        return; // bail out if the audio engine fails to start

    _speechRequest = new SFSpeechAudioBufferRecognitionRequest();
    _currentSpeechTask = _speechRecognizer.GetRecognitionTask(_speechRequest, (result, err) => InvokeOnMainThread(() =>
    {
        if (result == null) return;

        DictationResults.Text = result.BestTranscription.FormattedString;
        DictationResults.BackgroundColor = result.Final ? UIColor.Black : UIColor.Green;
        DictationResults.TextColor = UIColor.White;
    }));
}

In this method we create a new recognition request and speech task. When updates come in from the recognizer we update our label to display the text, and alter its background color based on whether dictation is still in progress or not.

Finally we just need to implement stopDictating():

private void stopDictating()  
{
    _audioEngine.Stop();
    _speechRequest?.EndAudio();
}

It’s that easy! Running the app will look like this:

Final app

Instead of recording audio live from the microphone, you can also provide a prerecorded file to the speech recognition APIs and have it transcribed the same way.
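As a sketch of what that looks like, the file-based counterpart to the buffer request is SFSpeechUrlRecognitionRequest; the file name below is just a placeholder for an audio file your app actually has access to:

```csharp
// Sketch: transcribe a prerecorded file instead of live microphone input.
// "recording.m4a" is a hypothetical file assumed to exist in the app bundle.
private void recognizeFile()
{
    var url = NSUrl.FromFilename("recording.m4a");
    var request = new SFSpeechUrlRecognitionRequest(url);

    _speechRecognizer.GetRecognitionTask(request, (result, err) => InvokeOnMainThread(() =>
    {
        if (result == null) return;

        DictationResults.Text = result.BestTranscription.FormattedString;
    }));
}
```

Note that no audio engine or tap is needed here, since the recognizer reads the audio straight from the file.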

Summary

That might have seemed like a lot, but most of it was just boilerplate, really. The actual code to tap into the microphone, record input, pass it into the speech recognizer, and display the results in real time as the user speaks only takes a few lines. I don't know about you, but I'm pretty excited about that.


Greg Shackles: Composable Customizations with Xamarin.Forms

In the early days of Xamarin.Forms, if you wanted to customize any of the built-in controls beyond what they explicitly exposed, you probably found yourself creating a lot of custom renderers. Renderers are a great way to hook into the Xamarin.Forms rendering pipeline and inject your own customizations, but they can quickly get unwieldy if you use them a lot, and are often a larger hammer than you want to be swinging.

In more recent versions of Xamarin.Forms, the concepts of behaviors and effects were introduced, which make these types of customizations much more approachable. In contrast to renderers, where you can only have a single implementation for any given control, behaviors and effects allow you to define customizations at a much more granular level. In addition, you can add any number of behaviors and effects to a control, meaning you can compose them together to get the outcome you're looking for.
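To give a flavor of the granularity involved, here's a minimal behavior sketch (the class name and the lowercasing rule are purely illustrative):

```csharp
// Sketch of a reusable Xamarin.Forms behavior: forces an Entry's text to lowercase.
public class LowercaseBehavior : Behavior<Entry>
{
    protected override void OnAttachedTo(Entry entry)
    {
        base.OnAttachedTo(entry);
        entry.TextChanged += onTextChanged;
    }

    protected override void OnDetachingFrom(Entry entry)
    {
        entry.TextChanged -= onTextChanged;
        base.OnDetachingFrom(entry);
    }

    private void onTextChanged(object sender, TextChangedEventArgs e)
    {
        ((Entry)sender).Text = e.NewTextValue?.ToLowerInvariant();
    }
}
```

Attaching it is a one-liner (entry.Behaviors.Add(new LowercaseBehavior());), and since Behaviors is a collection you can stack several of these on the same control.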

Read the rest of the article over at Visual Studio Magazine.


Xamarin: Performing OCR for iOS, Android, and Windows with Microsoft Cognitive Services

Optical character recognition, commonly known as OCR, detects the text found in an image or video and extracts the recognized words. By using OCR, we can provide our users a much better user experience; instead of having to manually perform data entry on a mobile device, users can simply take a photo, and OCR can […]

The post Performing OCR for iOS, Android, and Windows with Microsoft Cognitive Services appeared first on Xamarin Blog.


Xamarin: Creating Animations with Xamarin.Forms

Animations are a great way to add polish to your user interface and make your app stand out. Xamarin.Forms includes its own animation infrastructure that allows for easy creation of simple animations, while also being versatile enough to create complex animations. The Xamarin.Forms animation classes target different properties of visual elements, with a typical animation […]
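For a taste of how simple the basic cases are, here's a small sketch using the awaitable extension methods from Xamarin.Forms.ViewExtensions (the method and the specific durations are just illustrative):

```csharp
// Sketch: fade a view out, then spring it back to full size.
async Task AnimateAsync(View view)
{
    await view.FadeTo(0, 250);   // animate Opacity to 0 over 250ms
    view.Scale = 0.5;
    view.Opacity = 1;
    await view.ScaleTo(1, 250, Easing.SpringOut);  // animate Scale back to 1
}
```

Because these methods return tasks, sequencing and composing animations falls out naturally from async/await.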

The post Creating Animations with Xamarin.Forms appeared first on Xamarin Blog.
