I am trying to use an RGB-D camera to detect 3D bounding boxes of books on a shelf.
Here are some examples of possible scenarios: books on a shelf 1 or books on a shelf 2. I would like to infer the dimensions (length, width and height) and the pose of each book and of the shelf, that is, to approximate each element in the scene with a 3D bounding box.
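To make the target output concrete, here is a minimal sketch of one simple way to box a single object. It assumes the object's 3D points have already been segmented out of the RGB-D point cloud, that z is the vertical axis, and that the object stands upright, so only the rotation about z (yaw) is unknown. The name `fit_upright_box` and its parameters are hypothetical, and the brute-force yaw search is only an illustration, not the method of any particular paper or package.

```python
import math


def fit_upright_box(points, step_deg=1.0):
    """Fit a gravity-aligned 3D box to (x, y, z) points by brute-force yaw search.

    Assumes z is vertical and that `points` belong to a single object.
    Returns (center, size, yaw_rad), where size is the extent along the
    box's own x, y and z axes.
    """
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    best = None
    # A rectangle repeats every 90 degrees, so searching [0, 90) is enough.
    for i in range(int(round(90.0 / step_deg))):
        yaw = math.radians(i * step_deg)
        c, s = math.cos(yaw), math.sin(yaw)
        # Coordinates of each point in a frame rotated by `yaw` about z.
        us = [c * x + s * y for x, y, _ in points]
        vs = [-s * x + c * y for x, y, _ in points]
        du, dv = max(us) - min(us), max(vs) - min(vs)
        # Keep the yaw whose axis-aligned footprint has the smallest area.
        if best is None or du * dv < best[0]:
            mu, mv = (max(us) + min(us)) / 2, (max(vs) + min(vs)) / 2
            best = (du * dv, yaw, du, dv, mu, mv)
    _, yaw, du, dv, mu, mv = best
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the footprint center back into the original frame.
    center = (c * mu - s * mv, s * mu + c * mv, (z_min + z_max) / 2)
    return center, (du, dv, z_max - z_min), yaw
```

Calling `fit_upright_box(points)` on the points of one book would give its center, its three dimensions and its yaw. The hard part, which this sketch does not address, is separating the individual books and the shelf in the first place.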
I would like suggestions on which computer vision or image processing techniques are best suited to this problem. I am thinking of YOLOv5 or the following paper:
Deng, Z. and Latecki, L.J., 2017. Amodal detection of 3D objects: Inferring 3D bounding boxes from 2D ones in RGB-depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5762-5770).
Are there any better or easier alternatives or already implemented packages?
Thank you all very much!
question from:
https://stackoverflow.com/questions/65934659/how-to-create-3d-bounding-boxes-of-books-on-a-shelf-from-rgb-d-images