AVFoundation. Capturing a photo using AVCaptureSession in iOS. 31.12.2023

UIImagePickerController provides a straightforward way to take a picture. It supports all the basic features, such as choosing the camera source (front or back), tapping on an area to lock focus and exposure, and very simple editing.
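For comparison, that route takes only a few lines. Here is a minimal sketch; the presenting view controller and its delegate wiring are illustrative, not part of the project shown later:

import UIKit

class PickerExampleViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    func presentCamera() {
        guard UIImagePickerController.isSourceTypeAvailable(.camera) else { return }

        let picker = UIImagePickerController()
        picker.sourceType = .camera
        picker.cameraDevice = .rear   // or .front
        picker.allowsEditing = true   // the very simple built-in editing
        picker.delegate = self
        present(picker, animated: true)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        let image = info[.editedImage] as? UIImage ?? info[.originalImage] as? UIImage
        picker.dismiss(animated: true)
        _ = image // use the captured image here
    }
}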

However, when direct access to the camera is necessary, the AVFoundation framework gives full control: for example, changing hardware parameters programmatically or manipulating the live preview.
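For instance, hardware parameters are changed by locking the AVCaptureDevice for configuration. A hedged sketch of tap-to-focus (the helper function below is illustrative and not part of the project that follows):

import AVFoundation
import CoreGraphics

// Illustrative helper: the point of interest uses normalized (0...1) coordinates.
func focusAndExpose(_ device: AVCaptureDevice, at point: CGPoint) {
    do {
        try device.lockForConfiguration()
        if device.isFocusPointOfInterestSupported {
            device.focusPointOfInterest = point
            device.focusMode = .continuousAutoFocus
        }
        if device.isExposurePointOfInterestSupported {
            device.exposurePointOfInterest = point
            device.exposureMode = .continuousAutoExposure
        }
        device.unlockForConfiguration()
    } catch {
        print("Could not lock the device for configuration: \(error)")
    }
}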

AVFoundation, along with a couple of other frameworks, is the gateway to the camera, the microphone, and multimedia support in iOS. It allows us to combine inputs (such as the camera or microphone) and outputs (such as a UIImage or a video file), which we can then use for any purpose.

Here are the main classes of the AVFoundation framework:

  • AVCaptureSession. An object that manages capture activity and coordinates the flow of data from input devices to capture outputs. To perform real-time capture, you instantiate an AVCaptureSession object and add appropriate inputs and outputs. When the session is ready, we need to provide a source of data; for this purpose we create an instance of AVCaptureDevice.
  • AVCaptureDevice. It is the interface to the hardware camera and is used to control the hardware features such as the position of the lens, the exposure, and the flash.
  • AVCaptureDeviceInput. It provides the data coming from the device.
  • AVCaptureOutput. It is an abstract class describing the output of a capture session. There are concrete subclasses that capture a still image or the raw frames for a live preview.
  • AVCaptureVideoPreviewLayer. The preview layer is just a CALayer that you create from a capture session and add as a sublayer to your view. All it does is present the video that is running through the capture session.

Let's look at the AVFoundation capture process in general. You can think of it as a pipeline from hardware to software. At its center is an AVCaptureSession with inputs and outputs, and it mediates the data between the two. Your inputs come from AVCaptureDevice objects, which are software representations of the different audio/visual hardware components of an iOS device. The AVCaptureOutput extracts data from whatever is feeding into the capture session.
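Condensed to its essentials, that wiring looks roughly like this (just a sketch; the complete, working version follows below):

import AVFoundation

func makePhotoPipeline() throws -> (AVCaptureSession, AVCapturePhotoOutput, AVCaptureVideoPreviewLayer)? {
    guard let device = AVCaptureDevice.default(for: .video) else { return nil }

    let session = AVCaptureSession()                                 // mediates between inputs and outputs
    let input = try AVCaptureDeviceInput(device: device)             // data coming from the hardware
    let output = AVCapturePhotoOutput()                              // still-image output

    guard session.canAddInput(input), session.canAddOutput(output) else { return nil }
    session.addInput(input)
    session.addOutput(output)

    let previewLayer = AVCaptureVideoPreviewLayer(session: session)  // live preview of the session
    return (session, output, previewLayer)
}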

Now, let's write some code to capture a photo using AVCaptureSession.

Before setting up the capture session, add the camera permission key Privacy - Camera Usage Description (NSCameraUsageDescription) to the app's Info.plist; it is required to access the camera. The photo library key is also included here because the example below saves the captured photo to the gallery.

<key>NSCameraUsageDescription</key>
<string>Accessing your camera to take photo.</string>
<key>NSPhotoLibraryUsageDescription</key>
<string>We are accessing your photos.</string>

Here is an example of manually capturing a photo and saving it to the gallery, or recognizing text in the photo (just uncomment self.recognizeImage(image)).

import UIKit
import AVFoundation
import Vision

class AVTestViewController: UIViewController {
    let session = AVCaptureSession()
    let output = AVCapturePhotoOutput()
    var previewLayer = AVCaptureVideoPreviewLayer()

    let avQueue = DispatchQueue(label: "AVQueue", qos: .userInitiated)

    let shutterButton: UIButton = {
        let v = UIButton(frame: CGRect(x: 0, y: 0, width: 100, height: 100))
        v.layer.cornerRadius = 50
        v.layer.borderWidth = 10
        v.layer.borderColor = UIColor.white.cgColor
        return v
    }()

    override var preferredInterfaceOrientationForPresentation: UIInterfaceOrientation {
        return .portrait
    }

    override var shouldAutorotate: Bool {
        return true
    }

    override var prefersStatusBarHidden: Bool {
        return true
    }

    override func viewDidLoad() {
        super.viewDidLoad()

        view.layer.addSublayer(previewLayer)

        view.addSubview(shutterButton)
        shutterButton.addTarget(self, action: #selector(handleTap), for: .touchUpInside)

        checkPermission()
    }

    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        shutterButton.center = CGPoint(x: view.frame.size.width / 2, y: view.frame.size.height - 100)
        updatePreviewLayerFrame()
    }

    func setupCamera() {
        if let device = getDevice() {
            do {
                session.sessionPreset = .photo

                let input = try AVCaptureDeviceInput(device: device)
                if session.canAddInput(input) {
                    session.addInput(input)
                }

                if session.canAddOutput(output) {
                    session.addOutput(output)
                }

                previewLayer.videoGravity = .resizeAspectFill
                previewLayer.session = session

                updatePreviewLayerFrame()
                avQueue.async { [weak self] in
                    self?.session.startRunning()
                }
            } catch {
                print(error.localizedDescription)
            }
        }
    }

    func getDevice() -> AVCaptureDevice? {
        // Default video device. Uncomment one of the blocks below to pick a specific camera.
        var anyDevice = AVCaptureDevice.default(for: .video)

//        // get back camera
//        if let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back) {
//            anyDevice = device
//        } else {
//            fatalError("no back camera")
//        }
//
//        // get front camera
//        if let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) {
//            anyDevice = device
//        } else {
//            fatalError("no front camera")
//        }

        return anyDevice
    }

    func checkPermission() {
        switch AVCaptureDevice.authorizationStatus(for: .video) {
        case .notDetermined:
            AVCaptureDevice.requestAccess(for: .video) { granted in
                guard granted else { return }

                DispatchQueue.main.async { [weak self] in
                    self?.setupCamera()
                }
            }
        case .restricted:
            break
        case .denied:
            break
        case .authorized:
            setupCamera()
        @unknown default:
            break
        }
    }

    @objc func handleTap() {
        // Match the photo orientation to the current device orientation
        let myShotOrientation = UIDevice.current.orientation.asCaptureVideoOrientation
        if let photoOutputConnection = output.connection(with: .video) {
            photoOutputConnection.videoOrientation = myShotOrientation
        }

        let photoSettings = AVCapturePhotoSettings()
        //photoSettings.isHighResolutionPhotoEnabled = true
        // Only request auto flash when the current camera actually supports it
        if output.supportedFlashModes.contains(.auto) {
            photoSettings.flashMode = .auto
        }
        output.capturePhoto(with: photoSettings, delegate: self)

        let generator = UIImpactFeedbackGenerator(style: .light)
        generator.impactOccurred()
    }

    private func updatePreviewLayerFrame() {
        previewLayer.frame = view.bounds

        // Keep the preview orientation in sync with the device orientation
        if let connection = previewLayer.connection, connection.isVideoOrientationSupported {
            connection.videoOrientation = UIDevice.current.orientation.asCaptureVideoOrientation
        }
        previewLayer.removeAllAnimations()
    }
}

extension AVTestViewController: AVCapturePhotoCaptureDelegate {
    func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
        if let error = error {
            print("Error capturing photo: \(error)")
            return
        }

        // Stop the session on the dedicated queue, off the main thread
        avQueue.async { [weak self] in
            self?.session.stopRunning()
        }

        guard let data = photo.fileDataRepresentation(),
              let image = UIImage(data: data) else { return }

        DispatchQueue.main.async { [weak self] in
            guard let self = self else { return }

            // Show the captured photo on top of the preview
            let imageView = UIImageView(image: image)
            imageView.contentMode = .scaleAspectFill
            imageView.frame = self.view.bounds
            self.view.addSubview(imageView)

            // Save to the Gallery
            UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil)

            // Recognize text
            //self.recognizeImage(image)
        }
    }

    func recognizeImage(_ image: UIImage) {
        let request = VNRecognizeTextRequest { (request, error) in
            guard let observations = request.results as? [VNRecognizedTextObservation] else {
                print("Error recognizing text: \(String(describing: error))")
                return
            }

            for observation in observations {
                guard let topCandidate = observation.topCandidates(1).first else { continue }
                print("Recognized text: \(topCandidate.string)")
            }
        }

        guard let cgImage = image.cgImage else { return }
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])

        do {
            try handler.perform([request])
        } catch {
            print("Error performing Vision request: \(error)")
        }
    }
}

extension UIDeviceOrientation {
    var asCaptureVideoOrientation: AVCaptureVideoOrientation {
        switch self {
        case .landscapeLeft: return .landscapeRight
        case .landscapeRight: return .landscapeLeft
        case .portraitUpsideDown: return .portraitUpsideDown
        default: return .portrait
        }
    }
}
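
To try it out, the controller can be presented full screen from anywhere in the app; the presenting code below is just an illustration:

let cameraVC = AVTestViewController()
cameraVC.modalPresentationStyle = .fullScreen
present(cameraVC, animated: true)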