Using the Vision Framework for Facial Recognition in iOS

The Vision framework can recognize objects in pictures, including faces. The Vision API has been available since iOS 11 (a quick availability guard is sketched after the list below). Some useful capabilities of the framework:

  • it can count the number of faces in a picture
  • it can find the position of each face
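
Since Vision requires iOS 11, a project that still supports older releases can guard its use with an availability check. A minimal sketch, assuming the rest of the app can fall back gracefully:

import Vision

func detectFaces(in cgImage: CGImage) {
    if #available(iOS 11.0, *) {
        let request = VNDetectFaceRectanglesRequest()
        // perform(_:) runs synchronously; errors are ignored in this sketch.
        try? VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
        print("Found \(request.results?.count ?? 0) face(s)")
    } else {
        print("Vision is unavailable on this iOS version")
    }
}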

The first step is to drag and drop a picture into the Assets folder. Make sure the picture contains a face.

Add some views to the ContentView

import SwiftUI
import Vision

struct ContentView: View {
    @State var img = UIImage(named: "face")!

    var body: some View {
        NavigationView {
            VStack {
                Image(uiImage: img)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .frame(width: 200, height: 200)

                Button {
                    recognizeImage(image: img)
                } label: {
                    Text("Recognize")
                }.padding()
            }
            // navigationTitle must be attached to the NavigationView's
            // content, not to the NavigationView itself, to be displayed.
            .navigationTitle("Facial Recognition")
        }
    }
}

The Image view displays an image stored in the Assets folder. The .resizable() modifier makes sure the image can be resized, the .aspectRatio(contentMode: .fit) makes the image fit within the frame of the Image view, and the .frame(width: 200, height: 200) modifier defines the size of the Image view.

When the user taps the Button, it calls a function named recognizeImage.

Add the following functions inside ContentView (they need access to the view's state):

func recognizeImage(image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    let request = VNDetectFaceRectanglesRequest(completionHandler: handleRecognize)
    do {
        try handler.perform([request])
    } catch {
        print("Face detection failed: \(error)")
    }
}

func handleRecognize(request: VNRequest, error: Error?) {
    // A failed cast means the request produced unexpected observation types;
    // an empty array simply means no faces were found.
    guard let results = request.results as? [VNFaceObservation] else {
        print("Unexpected result type from VNDetectFaceRectanglesRequest")
        return
    }
    print("Found \(results.count) face(s)")
}

This function uses VNDetectFaceRectanglesRequest from the Vision framework to detect faces in an image. After the picture has been analyzed, the request invokes another function called handleRecognize.

The handleRecognize function simply prints the number of faces. Keep in mind that face detection is not always accurate; one way to be stricter is to filter observations by confidence, as sketched below.
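
A minimal sketch of such a filter; the 0.9 cutoff is an arbitrary example value, not a Vision recommendation:

func handleRecognizeConfident(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNFaceObservation] else { return }
    // VNObservation.confidence ranges from 0.0 to 1.0.
    let confident = results.filter { $0.confidence > 0.9 }
    print("Found \(confident.count) confident face(s) out of \(results.count)")
}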

Highlighting a face

You can also highlight each face with a rectangle to show exactly which parts of a picture the Vision framework recognized as a face. Each VNFaceObservation carries a boundingBox with the normalized position of the detected face.
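
The boundingBox is normalized to the range 0...1 and uses a bottom-left origin. Vision provides the VNImageRectForNormalizedRect helper to convert it into pixel coordinates; a small sketch, assuming the UIImage has a backing cgImage:

func pixelRect(for observation: VNFaceObservation, in image: UIImage) -> CGRect {
    guard let cgImage = image.cgImage else { return .zero }
    // The converted rect still uses a bottom-left origin, matching Vision.
    return VNImageRectForNormalizedRect(observation.boundingBox,
                                        cgImage.width, cgImage.height)
}

The highlighting handler below uses the same normalized values directly, scaling them by the image size inside a flipped drawing context.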

func handleRecognizeAndHighlight(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNFaceObservation] else {
        print("Unexpected result type from VNDetectFaceRectanglesRequest")
        return
    }
    print("Found \(results.count) face(s)")
    // Drawing into an image context is expensive, so do it off the main queue.
    DispatchQueue.global().async {
        for observation in results {
            highlightFace(with: observation)
        }
    }
}

func highlightFace(with observation: VNFaceObservation) {
    let source = img
    let boundary = observation.boundingBox

    UIGraphicsBeginImageContextWithOptions(source.size, false, 1)

    let context = UIGraphicsGetCurrentContext()!
    context.setShouldAntialias(true)
    context.setAllowsAntialiasing(true)
    // Flip the context vertically so it matches the bottom-left origin
    // used by both Core Graphics drawing and Vision's boundingBox.
    context.translateBy(x: 0, y: source.size.height)
    context.scaleBy(x: 1.0, y: -1.0)
    context.setLineJoin(.round)
    context.setLineCap(.round)

    let rect = CGRect(x: 0, y: 0, width: source.size.width, height: source.size.height)
    context.draw(source.cgImage!, in: rect)

    let strokeColor: UIColor = .green
    strokeColor.setStroke()

    // Scale the normalized boundingBox up to the image's dimensions.
    let rectangleWidth = source.size.width * boundary.size.width
    let rectangleHeight = source.size.height * boundary.size.height
    context.setLineWidth(5)
    context.addRect(CGRect(x: boundary.origin.x * source.size.width,
                           y: boundary.origin.y * source.size.height,
                           width: rectangleWidth,
                           height: rectangleHeight))
    context.drawPath(using: .stroke)
    let highlightedImage: UIImage = UIGraphicsGetImageFromCurrentImageContext()!
    UIGraphicsEndImageContext()
    // State updates must happen on the main queue.
    DispatchQueue.main.async {
        img = highlightedImage
    }
}
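
To try the highlighting variant, route the request to the new completion handler; a small wiring sketch mirroring recognizeImage:

func recognizeAndHighlight(image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // Same request as before, but with the highlighting completion handler.
    let request = VNDetectFaceRectanglesRequest(completionHandler: handleRecognizeAndHighlight)
    try? handler.perform([request])
}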

Centering image on a face

So far we can identify a face in a picture and process its position. Let's extend this and add the ability to crop the image according to the detected face position. Here is an initial solution.

Extension to UIImage

enum FaceCropError: Error {
    case notFound
    case unknown(Error)
}

extension UIImage {
    func cropToFace(completion: @escaping (Result<UIImage, FaceCropError>) -> Void) {
        guard let cgImg = self.cgImage else {
            completion(.failure(.notFound))
            return
        }

        let req = VNDetectFaceRectanglesRequest { request, error in
            if let error = error {
                completion(.failure(.unknown(error)))
                return
            }

            // An empty results array means Vision did not find any faces.
            guard let results = request.results as? [VNFaceObservation],
                  !results.isEmpty else {
                completion(.failure(.notFound))
                return
            }

            let rect = self.getRectangle(for: results)

            guard let result = cgImg.cropping(to: rect) else {
                completion(.failure(.notFound))
                return
            }

            completion(.success(UIImage(cgImage: result)))
        }

        do {
            try VNImageRequestHandler(cgImage: cgImg, options: [:]).perform([req])
        } catch {
            completion(.failure(.unknown(error)))
        }
    }

    // Computes a crop rectangle around the average face position, padded by
    // fixed margins (arbitrary values; tune them for your pictures).
    private func getRectangle(for faces: [VNFaceObservation]) -> CGRect {
        let marginX: CGFloat = 50
        let marginY: CGFloat = 100

        // cropToFace has already verified that cgImage exists.
        let imageWidth = CGFloat(self.cgImage!.width)
        let imageHeight = CGFloat(self.cgImage!.height)

        var totalX: CGFloat = 0
        var totalY: CGFloat = 0
        var totalW: CGFloat = 0
        var totalH: CGFloat = 0

        var minX = CGFloat.greatestFiniteMagnitude
        var minY = CGFloat.greatestFiniteMagnitude
        let numFaces = CGFloat(faces.count)

        for face in faces {
            // boundingBox is normalized with a bottom-left origin; convert it
            // to pixel coordinates with a top-left origin for CGImage.cropping.
            let w = face.boundingBox.width * imageWidth
            let h = face.boundingBox.height * imageHeight
            let x = face.boundingBox.origin.x * imageWidth
            let y = (1 - face.boundingBox.origin.y) * imageHeight - h

            totalX += x
            totalY += y
            totalW += w
            totalH += h
            minX = .minimum(minX, x)
            minY = .minimum(minY, y)
        }

        let avgX = totalX / numFaces
        let avgY = totalY / numFaces
        let avgW = totalW / numFaces
        let avgH = totalH / numFaces

        let offsetX = marginX + avgX - minX
        let offsetY = marginY + avgY - minY

        return CGRect(x: avgX - offsetX, y: avgY - offsetY, width: avgW + (offsetX * 2), height: avgH + (offsetY * 2))
    }
}

Usage

struct ContentView: View {
    @State var img = UIImage(named: "face")!

    var body: some View {
        NavigationView {
            VStack {
                Image(uiImage: img)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .frame(width: 200, height: 200)

                Button {
                    cropToFace()
                } label: {
                    Text("Crop to face")
                }.padding()
            }
            .navigationTitle("Facial Recognition")
        }
    }

    func cropToFace() {
        // perform(_:) is synchronous, so run the detection off the main queue.
        DispatchQueue.global().async {
            img.cropToFace { result in
                switch result {
                case .success(let image):
                    // State updates must happen on the main queue.
                    DispatchQueue.main.async {
                        img = image
                    }
                case .failure(let error):
                    print(error)
                }
            }
        }
    }
}
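
If the project uses Swift concurrency, the completion-based API can also be wrapped in async/await. A hypothetical convenience wrapper, not part of the code above:

extension UIImage {
    func croppedToFace() async throws -> UIImage {
        try await withCheckedThrowingContinuation { continuation in
            self.cropToFace { result in
                // Forward the Result straight into the continuation.
                continuation.resume(with: result)
            }
        }
    }
}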