Vision Recognized Object results into ARView as AnchorEntity
I get ARFrames from the session delegate of an ARView and then perform inference with CoreML + Vision using a YOLOv5 model. I successfully get an array of observations of type [VNRecognizedObjectObservation].
I pass these observations to a function like this:
func add(inferenceResults: [VNRecognizedObjectObservation], from frame: ARFrame) {
    for inference in inferenceResults {
        //NOTE: 1
        let flippedNormalizedBoundingBox = inference.boundingBox.flipYCoordinateFromBottomLeftToUpperLeft
        let point = flippedNormalizedBoundingBox.center()
        let label = inference.labels.first?.identifier ?? "Unknown"
        //PROBLEM: 1
        guard arView.entity(at: point) == nil else {
            break
        }
        let estimatedPlane = ARRaycastQuery.Target.estimatedPlane
        let alignment = ARRaycastQuery.TargetAlignment.any
        //NOTE: 2
        let raycastQuery = frame.raycastQuery(from: point, allowing: estimatedPlane, alignment: alignment)
        guard let raycastResult = arView.session.raycast(raycastQuery).first else {
            print("No Ray cast results")
            break
        }
        let newAnchor = AnchorEntity(world: raycastResult.worldTransform)
        //PROBLEM: 2
        let squareMaterial = SimpleMaterial(color: .blue, isMetallic: true)
        let textMaterial = SimpleMaterial(color: .white, isMetallic: true)
        let squareEntity = ModelEntity(mesh: MeshResource.generatePlane(width: 0.1, height: 0.1, cornerRadius: 0), materials: [squareMaterial])
        let textMesh = MeshResource.generateText(label, extrusionDepth: 0.1, font: .systemFont(ofSize: 2), containerFrame: .zero, alignment: .center, lineBreakMode: .byCharWrapping)
        let textEntity = ModelEntity(mesh: textMesh, materials: [textMaterial])
        textEntity.scale = SIMD3<Float>(0.03, 0.03, 0.1)
        squareEntity.addChild(textEntity)
        newAnchor.name = label
        newAnchor.addChild(squareEntity)
        //PROBLEM: 3
        self.arView.scene.addAnchor(newAnchor)
    }
}
Some extensions
extension CGRect {
    /// This will change the Y origin from the lower left corner to the upper left corner
    public var flipYCoordinateFromBottomLeftToUpperLeft: CGRect {
        return CGRect.init(x: self.origin.x, y: (1 - self.origin.y - self.height), width: self.width, height: self.height)
    }
    /// Returns a `CGPoint` that represents the center of the `CGRect`
    /// - Returns: A `CGPoint` constructed by obtaining the `midX` and `midY` values
    public func center() -> CGPoint {
        let midY = self.midY
        let midX = self.midX
        let point = CGPoint(x: midX, y: midY)
        return point
    }
}
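One thing worth checking here: after the flip, the center point is still normalized (0 to 1). As far as I know, `frame.raycastQuery(from:allowing:alignment:)` expects a point in normalized image coordinates, while `arView.entity(at:)` expects a point in the view's own coordinate space, so the same `point` probably cannot serve both calls. Below is a minimal sketch of the extra scaling step in plain Swift. It assumes the camera image fills the view exactly, with no aspect-fill cropping or rotation (a real app would have to account for those, for example with `ARFrame.displayTransform(for:viewportSize:)`). The names `flippedToTopLeft` and `scaled` are hypothetical.

```swift
import Foundation

/// Flips a Vision-style normalized rect (origin at bottom-left)
/// so that its origin is at the top-left. Hypothetical helper.
func flippedToTopLeft(_ normalized: CGRect) -> CGRect {
    CGRect(x: normalized.minX,
           y: 1 - normalized.maxY,
           width: normalized.width,
           height: normalized.height)
}

/// Scales a normalized, top-left-origin rect to view points.
/// Assumption: the image maps 1:1 onto the view (no cropping, no rotation).
func scaled(_ normalized: CGRect, to viewSize: CGSize) -> CGRect {
    CGRect(x: normalized.minX * viewSize.width,
           y: normalized.minY * viewSize.height,
           width: normalized.width * viewSize.width,
           height: normalized.height * viewSize.height)
}

// Example values chosen for illustration only.
let visionBox = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let flipped = flippedToTopLeft(visionBox)                 // still normalized
let viewRect = scaled(flipped, to: CGSize(width: 400, height: 800))
let normalizedCenter = CGPoint(x: flipped.midX, y: flipped.midY)   // for the frame-based raycast query
let viewCenter = CGPoint(x: viewRect.midX, y: viewRect.midY)       // for view-space hit tests
```

With this split, the normalized center would go to the frame's raycast query and the view-space center to anything that works in screen points.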
I end up getting results like this
NOTE 1: Bounding boxes from Vision are normalized and have their origin in the lower-left corner.
PROBLEM 1: Because I can do inference quickly, I don't want to keep adding AnchorEntities at the same location. The guard is an attempt to stop further processing, but it never breaks.
NOTE 2: I know there is a raycast function on ARView, but it seems like I want to use the one from the ARFrame. I speculate that after a few milliseconds of inference on a background thread, the results may differ depending on which object I raycast from, because the user has moved?
PROBLEM 2: My AnchorEntities are always black.
PROBLEM 3: The text and bounding box are never aligned with the camera ("billboard style").
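For PROBLEM 3, one common way to get a billboard effect is to rotate the entity about the world Y axis so that its front faces the camera, and to repeat that whenever the camera moves. Below is a minimal sketch of the angle computation in plain Swift. It assumes the entity's front is its local +Z axis and that both positions are in world space. `billboardYaw` is a hypothetical helper, not a RealityKit API.

```swift
import Foundation

/// Returns the rotation (radians, about the world Y axis) that turns an
/// entity's local +Z axis toward the camera. Height differences are ignored,
/// so the entity stays upright. Hypothetical helper.
func billboardYaw(entityPosition: SIMD3<Float>,
                  cameraPosition: SIMD3<Float>) -> Float {
    let toCamera = cameraPosition - entityPosition
    // A Y rotation by angle a maps +Z to (sin a, 0, cos a),
    // so the angle we need is atan2(x, z) of the direction to the camera.
    return atan2(toCamera.x, toCamera.z)
}

// Camera straight ahead on +Z: no rotation needed.
let yawFront = billboardYaw(entityPosition: SIMD3<Float>(0, 0, 0),
                            cameraPosition: SIMD3<Float>(0, 0, 2))
// Camera on +X: a quarter turn.
let yawSide = billboardYaw(entityPosition: SIMD3<Float>(0, 0, 0),
                           cameraPosition: SIMD3<Float>(3, 0, 0))
```

In RealityKit the angle could then be applied with something like `entity.setOrientation(simd_quatf(angle: yaw, axis: [0, 1, 0]), relativeTo: nil)`, with the camera position taken from the translation column of `frame.camera.transform`.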
In general, I would like to place a square with a label in AR that reflects the size of the bounding box from Vision. I need to get past these few problems before I refine to that level. Any help is appreciated! AR is fun.
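For PROBLEM 1, two details may matter. First, `break` exits the whole `for` loop rather than skipping a single observation (`continue` would skip). Second, `arView.entity(at:)` is a hit test in view coordinates which, as far as I know, only finds entities that have collision shapes. An alternative that avoids hit testing is to remember where each label was already placed and skip new results that land nearby. Below is a minimal sketch in plain Swift. `PlacementRegistry`, `registerIfNew`, and the 0.2 m threshold are hypothetical, and the position is assumed to come from the translation column of `raycastResult.worldTransform`.

```swift
/// One previously placed label and its world-space position.
struct PlacedLabel {
    let label: String
    let position: SIMD3<Float>
}

/// Tracks placed labels and rejects near-duplicates. Hypothetical type.
struct PlacementRegistry {
    private(set) var placed: [PlacedLabel] = []
    let minimumSeparation: Float   // meters; the threshold is a tuning assumption

    init(minimumSeparation: Float) {
        self.minimumSeparation = minimumSeparation
    }

    /// Records the placement and returns true, unless the same label
    /// already exists within `minimumSeparation`, in which case returns false.
    mutating func registerIfNew(label: String, at position: SIMD3<Float>) -> Bool {
        let isDuplicate = placed.contains { existing in
            guard existing.label == label else { return false }
            let delta = existing.position - position
            let distance = (delta * delta).sum().squareRoot()
            return distance < minimumSeparation
        }
        if isDuplicate { return false }
        placed.append(PlacedLabel(label: label, position: position))
        return true
    }
}

var registry = PlacementRegistry(minimumSeparation: 0.2)
let firstCup = registry.registerIfNew(label: "cup", at: SIMD3<Float>(0, 0, -1))      // new
let sameCup = registry.registerIfNew(label: "cup", at: SIMD3<Float>(0.05, 0, -1))   // 5 cm away: rejected
let otherCup = registry.registerIfNew(label: "cup", at: SIMD3<Float>(1, 0, -1))     // 1 m away: new
```

Inside the loop, a `false` result would be paired with `continue` so the remaining observations in the same frame are still processed.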
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

