Placing Vision recognized-object results into an ARView as AnchorEntities

I get ARFrames from the session delegate of an ARView and then run inference on them with Core ML + Vision using a YOLOv5 model. I successfully get an array of [VNRecognizedObjectObservation].

I pass these observations to a function like this:

```swift
import ARKit
import RealityKit
import Vision

// Member of the view controller that owns `arView`.
func add(inferenceResults: [VNRecognizedObjectObservation], from frame: ARFrame) {

    for inference in inferenceResults {
        //NOTE: 1
        // Vision boxes are normalized with a lower-left origin; convert to an upper-left origin.
        let flippedNormalizedBoundingBox = inference.boundingBox.flipYCoordinateFromBottomLeftToUpperLeft

        let point = flippedNormalizedBoundingBox.center()
        // Highest-confidence classification label, if any.
        let label = inference.labels.first?.identifier ?? "Unknown"

        //PROBLEM: 1
        // Skip this observation if something is already at this point.
        // `continue` moves on to the next observation; `break` would end the whole loop.
        guard arView.entity(at: point) == nil else {
            continue
        }

        let estimatedPlane = ARRaycastQuery.Target.estimatedPlane
        let alignment = ARRaycastQuery.TargetAlignment.any

        //NOTE: 2
        // Build the ray from the frame that was analyzed, then cast it in the live session.
        let raycastQuery = frame.raycastQuery(from: point, allowing: estimatedPlane, alignment: alignment)
        guard let raycastResult = arView.session.raycast(raycastQuery).first else {
            print("No raycast results")
            continue
        }

        let newAnchor = AnchorEntity(world: raycastResult.worldTransform)

        //PROBLEM: 2
        let squareMaterial = SimpleMaterial(color: .blue, isMetallic: true)
        let textMaterial = SimpleMaterial(color: .white, isMetallic: true)
        let squareEntity = ModelEntity(mesh: MeshResource.generatePlane(width: 0.1, height: 0.1, cornerRadius: 0), materials: [squareMaterial])
        let textMesh = MeshResource.generateText(label, extrusionDepth: 0.1, font: .systemFont(ofSize: 2), containerFrame: .zero, alignment: .center, lineBreakMode: .byCharWrapping)
        let textEntity = ModelEntity(mesh: textMesh, materials: [textMaterial])
        textEntity.scale = SIMD3<Float>(0.03, 0.03, 0.1)
        squareEntity.addChild(textEntity)
        newAnchor.name = label
        newAnchor.addChild(squareEntity)
        //PROBLEM 3
        self.arView.scene.addAnchor(newAnchor)
    }
}
```
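As an alternative to hit-testing for the "don't keep adding anchors at the same location" goal, one option is to remember where each labeled anchor was placed in world space and skip new detections that land close to an existing one with the same label. The sketch below assumes a 10 cm radius is a reasonable "same object" threshold; `PlacedObjectRegistry` and its methods are hypothetical names, not part of ARKit or RealityKit.

```swift
import Foundation

/// Hypothetical helper: remembers where labeled anchors were placed so repeated
/// detections of the same object are not added again.
/// The default 0.1 m separation is an arbitrary assumption.
struct PlacedObjectRegistry {
    private(set) var placed: [(label: String, position: SIMD3<Float>)] = []
    let minimumSeparation: Float

    init(minimumSeparation: Float = 0.1) {
        self.minimumSeparation = minimumSeparation
    }

    /// True if an entry with the same label lies within `minimumSeparation` meters.
    func isDuplicate(label: String, at position: SIMD3<Float>) -> Bool {
        placed.contains { entry in
            let delta = entry.position - position
            let distance = (delta * delta).sum().squareRoot()
            return entry.label == label && distance < minimumSeparation
        }
    }

    /// Records the position and returns true if it is new; returns false for a duplicate.
    mutating func register(label: String, at position: SIMD3<Float>) -> Bool {
        guard !isDuplicate(label: label, at: position) else { return false }
        placed.append((label: label, position: position))
        return true
    }
}
```

In the loop, the world position could come from the raycast result, e.g. `let t = raycastResult.worldTransform.columns.3` and `SIMD3<Float>(t.x, t.y, t.z)`, and the anchor would only be created when `register(label:at:)` returns true.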

Here are the helper extensions used above:

```swift
import CoreGraphics

extension CGRect {
    /// Converts a normalized rect whose origin is at the lower left (Vision's convention)
    /// into one whose origin is at the upper left.
    public var flipYCoordinateFromBottomLeftToUpperLeft: CGRect {
        return CGRect(x: origin.x, y: 1 - origin.y - height, width: width, height: height)
    }

    /// Returns a `CGPoint` that represents the center of the `CGRect`
    /// - Returns: A `CGPoint` constructed from the `midX` and `midY` values
    public func center() -> CGPoint {
        return CGPoint(x: midX, y: midY)
    }
}
```

I end up getting results like this


NOTE 1: Bounding boxes from Vision are normalized (0 to 1) and have their origin at the lower left, so I flip the Y coordinate to get an upper-left origin.
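To make the flip concrete, the sketch below converts a Vision bounding box into a pixel rect with an upper-left origin. It assumes the box is normalized to the same image (and orientation) that was handed to the Vision request; `topLeftPixelRect(fromVisionRect:imageWidth:imageHeight:)` is a hypothetical helper. Vision's own `VNImageRectForNormalizedRect` scales to pixels but, as far as I know, does not flip the Y axis.

```swift
import Foundation

/// Hypothetical helper: converts a Vision bounding box (normalized, lower-left origin)
/// into a pixel-space rect with an upper-left origin for an image of the given size.
/// Assumes the rect was produced for this same image and orientation.
func topLeftPixelRect(fromVisionRect r: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    // In a top-left space, the box's top edge sits at 1 - (minY + height).
    let flippedY = 1 - r.origin.y - r.size.height
    return CGRect(x: r.origin.x * w,
                  y: flippedY * h,
                  width: r.size.width * w,
                  height: r.size.height * h)
}
```

For example, a Vision box `(x: 0.25, y: 0.5, width: 0.5, height: 0.25)` on a 400 × 200 image becomes `(x: 100, y: 50, width: 200, height: 50)`.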

PROBLEM 1: Because inference runs quickly, I don't want to keep adding AnchorEntities at the same location. The guard is an attempt to skip further processing when an entity already exists at that point, but it never triggers.
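Two things likely keep that guard from ever firing. `ARView.entity(at:)` expects a point in the view's coordinate space (points, upper-left origin), not a normalized 0 to 1 point, and, as far as I know, RealityKit hit tests only see entities that have collision shapes (for example after calling `generateCollisionShapes(recursive:)`), which the question's ModelEntities lack. On device, `frame.displayTransform(for:viewportSize:)` can be used to map between normalized image coordinates and view coordinates, handling rotation and the aspect-fill crop. The sketch below hand-computes only the aspect-fill crop and assumes the normalized point is already in the same orientation as the view; `viewPoint(fromNormalized:imageSize:viewSize:)` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: maps a normalized point (upper-left origin) in a camera image
/// to view coordinates, assuming the image is drawn aspect-fill and centered, and that
/// the image is already rotated to match the view's orientation.
func viewPoint(fromNormalized p: CGPoint, imageSize: CGSize, viewSize: CGSize) -> CGPoint {
    // Aspect-fill: scale until the image covers the whole view.
    let scale = max(viewSize.width / imageSize.width, viewSize.height / imageSize.height)
    let scaledWidth = imageSize.width * scale
    let scaledHeight = imageSize.height * scale
    // The overflow is cropped equally on both sides.
    let offsetX = (scaledWidth - viewSize.width) / 2
    let offsetY = (scaledHeight - viewSize.height) / 2
    return CGPoint(x: p.x * scaledWidth - offsetX, y: p.y * scaledHeight - offsetY)
}
```

A point that falls in the cropped-away region maps outside the view bounds, which is worth checking before hit-testing.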

NOTE 2: I know ARView has its own raycast function, but it seems I want to use ARFrame's raycastQuery instead. My reasoning is that inference takes a few milliseconds on a background thread, and by the time it finishes the user may have moved the device, so a ray built from the current view could differ from one built from the frame that was actually analyzed. Is that right?
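As far as I understand, `ARFrame.raycastQuery(from:allowing:alignment:)` builds the ray from that frame's camera, so using the frame that was analyzed keeps the ray consistent with the detection; `session.raycast(_:)` then intersects it with the session's current scene understanding. The documentation describes the point as normalized image coordinates of the captured image, so if the Vision request used a non-default orientation, the box may need rotating back first. A minimal sketch, assuming you want at most one request in flight and want to keep the frame it ran on: `SingleInferenceGate` is a hypothetical helper, with `Frame` standing in for `ARFrame`.

```swift
import Foundation

/// Hypothetical helper: lets at most one inference run at a time and remembers the
/// frame it analyzed, so the raycast can later be built from that same frame.
/// In the app, `Frame` would be `ARFrame`.
final class SingleInferenceGate<Frame> {
    private let lock = NSLock()
    private var inFlight: Frame?

    /// Returns true if `frame` was accepted; false if an inference is already running.
    func tryBegin(with frame: Frame) -> Bool {
        lock.lock(); defer { lock.unlock() }
        guard inFlight == nil else { return false }
        inFlight = frame
        return true
    }

    /// Ends the current inference and returns the frame it was run on, if any.
    func finish() -> Frame? {
        lock.lock(); defer { lock.unlock() }
        let frame = inFlight
        inFlight = nil
        return frame
    }
}
```

In `session(_:didUpdate:)`, frames for which `tryBegin(with:)` returns false are simply dropped; when Vision completes, `finish()` supplies the frame to pass to `add(inferenceResults:from:)`. ARKit tends to warn when many ARFrames are retained, so holding only one and releasing it promptly seems prudent.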

PROBLEM 2: My AnchorEntities are always black.
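One commonly cited cause is `isMetallic: true`: a metallic surface mostly shows reflections of its environment, and without a useful environment map it can render very dark. The sketch below rebuilds the question's labeled square with non-metallic materials, centers the text slightly in front of the plane to avoid z-fighting, and adds collision shapes so `ARView.entity(at:)` can find it. `makeLabeledSquare(label:side:)` is a hypothetical helper, and the 0.01 text scale is an assumption chosen to fit a short label on a 10 cm square.

```swift
import RealityKit

/// Hypothetical helper: the question's labeled square, with materials that keep their
/// color without reflections and with collision shapes for hit testing.
@MainActor
func makeLabeledSquare(label: String, side: Float = 0.1) -> ModelEntity {
    // Non-metallic materials show their base color instead of mostly reflections.
    let squareMaterial = SimpleMaterial(color: .blue, isMetallic: false)
    let textMaterial = SimpleMaterial(color: .white, isMetallic: false)

    // generatePlane(width:height:) builds a plane in the XY plane, facing +Z.
    let square = ModelEntity(mesh: .generatePlane(width: side, height: side),
                             materials: [squareMaterial])

    let textMesh = MeshResource.generateText(label,
                                             extrusionDepth: 0.1,
                                             font: .systemFont(ofSize: 2),
                                             containerFrame: .zero,
                                             alignment: .center,
                                             lineBreakMode: .byCharWrapping)
    let text = ModelEntity(mesh: textMesh, materials: [textMaterial])
    let textScale: Float = 0.01
    text.scale = SIMD3<Float>(repeating: textScale)
    // Center the text on the square and nudge it toward +Z so it does not z-fight.
    let center = textMesh.bounds.center
    text.position = SIMD3<Float>(-center.x * textScale, -center.y * textScale, 0.002)
    square.addChild(text)

    // Collision shapes make the square (and its text) visible to entity(at:).
    square.generateCollisionShapes(recursive: true)
    return square
}
```

`UnlitMaterial` is another option if the label should ignore scene lighting entirely.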

PROBLEM 3: The text and the bounding-box square are never oriented toward the camera ("billboard" style).
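A minimal billboard sketch, assuming only a yaw rotation is wanted so the label stays upright: the generated plane and text face +Z, and rotating +Z by an angle θ about the world Y axis gives (sin θ, 0, cos θ), so θ = atan2(dx, dz) for the vector d from the entity to the camera. `billboardYaw(entityPosition:cameraPosition:)` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: yaw (radians, about world +Y) that turns an entity's +Z axis
/// toward the camera while keeping it upright.
func billboardYaw(entityPosition: SIMD3<Float>, cameraPosition: SIMD3<Float>) -> Float {
    let toCamera = cameraPosition - entityPosition
    // Rotating (0, 0, 1) by θ about Y gives (sin θ, 0, cos θ); solve for θ.
    return atan2(toCamera.x, toCamera.z)
}
```

In a per-frame handler (for example one subscribed with `arView.scene.subscribe(to: SceneEvents.Update.self)`), you could pass `squareEntity.position(relativeTo: nil)` and `arView.cameraTransform.translation`, then apply `squareEntity.setOrientation(simd_quatf(angle: yaw, axis: [0, 1, 0]), relativeTo: nil)`.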

Ultimately, I would like to place a labeled square in AR whose size reflects the bounding box from Vision, but I need to get past these problems before refining to that level. Any help is appreciated! AR is fun.
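For sizing the square from the bounding box, one approach is the pinhole camera model: physical size ≈ size in pixels × distance / focal length in pixels. The sketch below assumes the box is normalized to the captured image in the same orientation as `frame.camera.imageResolution`, that the object roughly faces the camera, and that the distance comes from the raycast hit; `estimatedPhysicalSize(normalizedSize:imageResolution:fx:fy:distance:)` is a hypothetical helper.

```swift
import Foundation

/// Hypothetical helper: approximate physical width and height (meters) of a detected
/// object using the pinhole model, size ≈ pixels × distance / focal length.
/// Assumes `normalizedSize` is relative to an image of `imageResolution`.
func estimatedPhysicalSize(normalizedSize: CGSize, imageResolution: CGSize,
                           fx: Float, fy: Float, distance: Float) -> SIMD2<Float> {
    let widthPixels = Float(normalizedSize.width * imageResolution.width)
    let heightPixels = Float(normalizedSize.height * imageResolution.height)
    return SIMD2<Float>(widthPixels * distance / fx, heightPixels * distance / fy)
}
```

The focal lengths could be read from `frame.camera.intrinsics` (`[0][0]` for fx, `[1][1]` for fy), and the result passed to `MeshResource.generatePlane(width:height:)`. If Vision ran with a rotated orientation, width and height (and fx and fy) would need swapping.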



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
