Chapter 3: Feature Detection

The first step is the one that actually rests on the principle of Structure from Motion (SfM). It addresses the following problem: recovering the 3D shape of an object that is depicted in several images taken from different directions. SfM software uses multiple 2D digital images to calculate the precise positions of the cameras from which the images were captured. These camera positions can then be used to triangulate the precise 3D position of individual pixels recorded in multiple overlapping frames. In the first stage, SfM software relies on algorithms that identify individual edges, boundaries or key pixels (called Features) within multiple photographs taken whilst moving around the object or view being recorded. These individual points or pixels, identified in a minimum of 3 images, are used to create a Sparse Point Cloud of the key points and to triangulate the precise camera positions (Powlesland 2016: 21). This stage is called Feature Detection. Without features, the images cannot be matched and therefore no reconstruction can take place. There are many different approaches to feature detection, but at the moment an algorithm called Scale-Invariant Feature Transform (henceforth: SIFT) is the most popular one (Lowe 1999; 2004).
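To make the triangulation idea more concrete, here is a minimal sketch of the textbook midpoint method: given two camera centres and the viewing rays that pass through the same matched feature, it returns the 3D point halfway between the closest points of the two rays. It assumes the camera centres and ray directions are already known and uses only two views; it is an illustration of the geometry, not the algorithm Metashape uses internally, and the function name `triangulate_midpoint` is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(p: Vec3, d: Vec3, s: float) -> Vec3:
    # Point reached after travelling s units of direction d from p.
    return (p[0] + s * d[0], p[1] + s * d[1], p[2] + s * d[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Hypothetical helper: midpoint of the closest points of rays c1 + s*d1 and c2 + t*d2.

    c1, c2 are camera centres; d1, d2 are ray directions towards the same feature.
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = _along(c1, d1, s), _along(c2, d2, t)
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Two cameras two units apart, both looking at the same feature.
print(triangulate_midpoint((0, 0, 0), (1, 1, 1), (2, 0, 0), (-1, 1, 1)))
```

With noise-free rays, as in the usage line, both rays meet exactly at (1.0, 1.0, 1.0). With real, slightly inconsistent rays they miss each other, and the midpoint is a simple compromise; SfM software instead refines all points and cameras together, which is one reason features seen in more images give better results.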

The SIFT algorithm works in several steps. SIFT allows the relative position of a feature to shift dramatically with only small changes in its Descriptor (Carrivick/Smith/Quincey 2016: 40). A descriptor is a numerical code that describes a feature in an image in a way that is largely independent of its orientation and of the illumination. This makes it possible to detect individual features and to find them again in other photos, because the same feature receives a very similar descriptor in each photo, regardless of orientation or lighting conditions. This is also why SfM works best with high-contrast objects: strong local contrast produces distinctive features that can be recognised again in other photos.
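One ingredient of the illumination robustness described above can be shown in a few lines. SIFT builds its descriptor from image gradients (a real SIFT descriptor has 128 values), so a uniform contrast change scales every entry by the same factor; Lowe (2004) normalises the vector to unit length, clamps large entries at 0.2 and normalises again. The sketch below applies only that normalisation step to a short toy vector; the vector values and the function name `normalize_descriptor` are made up for illustration and do not come from a real image.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    # Scale the vector to length 1 (leave an all-zero vector unchanged).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def normalize_descriptor(raw: Sequence[float], clamp: float = 0.2) -> List[float]:
    """Hypothetical helper: unit-normalise, clamp large entries, renormalise."""
    v = _unit(raw)
    v = [min(x, clamp) for x in v]  # limit the influence of very strong gradients
    return _unit(v)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 8-value gradient histogram of one feature in photo A ...
photo_a = [4.0, 0.0, 1.0, 3.0, 0.0, 2.0, 5.0, 1.0]
# ... and the same feature in photo B, taken with twice the contrast.
photo_b = [2 * x for x in photo_a]

print(distance(photo_a, photo_b))  # raw vectors differ
print(distance(normalize_descriptor(photo_a), normalize_descriptor(photo_b)))  # ~0
```

After normalisation the two descriptors coincide, so the feature can be recognised in both photos. This only covers uniform contrast changes; invariance to rotation and scale comes from other steps of SIFT that are not shown here.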

After detecting the features in a set of photos, the SIFT algorithm searches for matching partners in the other photos and discards features that have no or too few matches. There are different approaches to this matching problem, but they all end with a list of matched features across the given set of photos.
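A common way to decide whether a feature has a reliable partner is the nearest-neighbour ratio test proposed by Lowe (2004): a match is accepted only if the closest descriptor in the other photo is clearly closer than the second-closest one. The sketch below compares descriptors by brute force; the threshold of 0.8, the toy descriptors and the function name `match_ratio_test` are assumptions for illustration, and Metashape's own matching strategy is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_ratio_test(
    desc_a: Sequence[Descriptor], desc_b: Sequence[Descriptor], ratio: float = 0.8
) -> List[Tuple[int, int]]:
    """Hypothetical helper: (i, j) pairs matching feature i in photo A to feature j in photo B.

    Features whose best match is not clearly better than the second best are discarded.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, da in enumerate(desc_a):
        dists = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        (best, j_best), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


photo_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
photo_b = [[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]]
print(match_ratio_test(photo_a, photo_b))  # [(0, 0), (1, 1)]
```

The third feature in photo A is about equally close to two features in photo B, so it is ambiguous and gets discarded. Real pipelines usually replace the brute-force loop with approximate nearest-neighbour search, because comparing every descriptor with every other one becomes slow for thousands of features per photo.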

Starting the process

First, start the application. You should find it in the Start Menu under Agisoft, listed as Agisoft Metashape Standard (64 bit) (or similar). Once the software has started, you will see an empty Workspace on the left, the grey Perspective Model View on the right and the toolbar at the top of the screen. To add your colour-corrected photos, click the Add Photos button at the top of the Workspace, or open the Workflow menu and click Add Photos. A file dialog appears in which you can navigate to your data set. Select all the pictures in that folder at once by clicking one of them and pressing Ctrl+A (Strg+A on German keyboards). Open the images by clicking the Open button. At the bottom of the screen you will see three tabs: Photos, Console and Jobs. Click Photos. You should now see the opened images at the bottom of the screen.


We will now start the alignment process. Open the Workflow menu at the top and select Align Photos…; a settings dialog appears. For this example, we will use High accuracy and Generic pair preselection. Click OK and let the computer do the work.

Additional information: High accuracy means that the software processes the photos at their original resolution. Medium accuracy downscales the photos by a factor of 4, Low by a factor of 16 and Lowest by a factor of 64. These factors refer to the number of pixels, so Medium halves each side of the image. Highest accuracy, however, upscales the image by a factor of 4. This is only recommended for very sharp, professional pictures. Needless to say, the higher the accuracy, the longer the calculation. With Generic preselection, the software first pre-aligns the photos at a lower accuracy setting and then searches for common features in the photos that have already been roughly aligned. With Reference preselection (Sequential), feature matching is only performed between neighbouring photos rather than between every pair of photos. This is especially helpful when you took the photos in sequence.
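The arithmetic behind these settings can be checked with a short sketch. The first function applies the pixel-count factors given above (each side scales by the square root of the factor); the other two count how many photo pairs have to be compared if every photo is matched with every other one, versus a sequential scheme where each photo is only compared with its next few neighbours. The function names, the neighbour window and the example camera resolution are assumptions for illustration; Metashape's actual neighbour selection and the Generic preselection step are not modelled.

```python
import math
from typing import Tuple

# Pixel-count scale factor per Align Photos accuracy setting, as described in the text.
PIXEL_FACTOR = {
    "Highest": 4.0,
    "High": 1.0,
    "Medium": 1 / 4,
    "Low": 1 / 16,
    "Lowest": 1 / 64,
}


def processed_size(width: int, height: int, accuracy: str) -> Tuple[int, int]:
    """Hypothetical helper: image size used for matching at a given accuracy setting."""
    side = math.sqrt(PIXEL_FACTOR[accuracy])  # e.g. 1/4 of the pixels -> 1/2 per side
    return round(width * side), round(height * side)


def exhaustive_pairs(n_photos: int) -> int:
    """Every photo compared with every other photo: n(n-1)/2 pairs."""
    return n_photos * (n_photos - 1) // 2


def sequential_pairs(n_photos: int, window: int) -> int:
    """Hypothetical scheme: each photo compared only with the next `window` photos."""
    return sum(min(window, n_photos - 1 - i) for i in range(n_photos))


for setting in PIXEL_FACTOR:
    print(setting, processed_size(6000, 4000, setting))
print(exhaustive_pairs(100), sequential_pairs(100, 5))  # 4950 485
```

For a hypothetical 6000 × 4000 pixel camera, Medium works on 3000 × 2000 pixel images and Lowest on 750 × 500. For 100 photos, comparing only the five following neighbours reduces the number of photo pairs from 4950 to 485, which is why sequential preselection can save a lot of time when the photos were taken in order.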

Rotating and cleaning the Sparse Point Cloud

The result of the first alignment step should look roughly like this. As we are working in a non-referenced coordinate system, the model is placed somewhere arbitrary in virtual space. We will first move it into the centre of the screen and orient it correctly. To do so, we first want to get rid of the blue boxes, which represent the camera positions from which the photos were taken. To hide them, simply click the small Camera icon in the top menu. To move the object, use the Move tools from the top menu and the mouse wheel to zoom. You can also press the mouse wheel to move the object. Use the Navigation tool to move your view and the Move Object tool to move the object into the world centre. You can rotate around the object by clicking on the rotation sphere in the centre of the screen. Try to move the object inside that sphere, as shown in the last screenshot. If you position your object correctly, it will be much easier to navigate around it when adjusting the Bounding Box.
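Moving the object into the world centre amounts, geometrically, to translating the point cloud so that its centre lies at the origin. The sketch below does this by subtracting the centroid (the average of all point coordinates) from every point. It only illustrates the geometric idea behind the Move Object tool; how Metashape stores the transformation internally is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Average x, y and z of all points."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def center_points(points: Sequence[Point]) -> List[Point]:
    """Hypothetical helper: shift all points so that their centroid is at the origin."""
    cx, cy, cz = centroid(points)
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


sparse_cloud = [(10.0, 20.0, 30.0), (12.0, 22.0, 32.0)]
print(center_points(sparse_cloud))  # [(-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)]
```

Only the position changes; the shape and size of the object stay the same, which is why this step can be done freely before the model is scaled or referenced.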

Cleaning unnecessary points

If you zoom out a little, you will see many background points that we do not need. We can either select and delete them by hand or simply adapt the Bounding Box around the object. Only points within the Bounding Box will be processed. You should already have seen the box around the object, as Metashape automatically tries to estimate the object of interest. We still need to make sure that all necessary points are inside that box, including the parts of the table where we placed our scale. Navigate around the object to see it from all sides, and use the Move, Resize and Rotate Region tools to fit the box around the object (and the scale!). Make it as small as possible without cutting off parts of the object. The result of the cleaning and positioning should look like this. Don't worry if it does not look exactly like the image here, but try to get as close as possible.
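The rule "only points within the Bounding Box will be processed" can be expressed as a simple test. The sketch below describes the box by a centre, a size along each of its own axes and a rotation matrix whose columns are the box axes, roughly mirroring the Move, Resize and Rotate Region tools. This parameterisation and the function names are assumptions for illustration, not Metashape's internal representation of the region.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def inside_box(p: Point, center: Point, size: Point, rotation: Matrix3) -> bool:
    """True if p lies inside an oriented box; columns of `rotation` are the box axes."""
    d = [p[k] - center[k] for k in range(3)]
    for axis in range(3):
        # Distance of the point from the box centre along this box axis.
        local = sum(rotation[row][axis] * d[row] for row in range(3))
        if abs(local) > size[axis] / 2:
            return False
    return True


def crop(points: Sequence[Point], center: Point, size: Point, rotation: Matrix3) -> List[Point]:
    """Hypothetical helper: keep only the points inside the bounding box."""
    return [p for p in points if inside_box(p, center, size, rotation)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
ROT_45_Z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]  # box rotated 45 degrees about z

cloud = [(0.0, 0.0, 0.0), (0.9, -0.9, 0.5), (1.2, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(crop(cloud, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0), IDENTITY))
print(crop(cloud, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0), ROT_45_Z))
```

The point (1.2, 0, 0) falls outside the axis-aligned box but inside the same box rotated by 45 degrees, which is why rotating the region can help to enclose the object (and the scale) tightly without including background points.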


  • Carrivick, Jonathan L., Mark W. Smith, and Duncan J. Quincey. 2016. Structure from Motion in the Geosciences. Chichester: Wiley Blackwell.
  • Lowe, David G. 1999. “Object Recognition from Local Scale-Invariant Features.” In Proceedings of the International Conference on Computer Vision, Corfu (Sept. 1999), 1–8.
  • Lowe, David G. 2004. “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision, 1–28.
  • Powlesland, Dominic. 2016. “3Di - Enhancing the Record, Extending the Returns, 3D Imaging from Free Range Photography and Its Application during Excavation.” In The Three Dimensions of Archaeology. Proceedings of the XVII UISPP World Congress (1–7 September 2014, Burgos, Spain), edited by Hans Kamermans, Wieke de Neef, Chiara Piccoli, Axel G. Posluschny, and Roberto Scopigno, 13–32. Volume 7/Sessions A4b and A12. Oxford: Archaeopress.

This page was last edited on 2024-04-11 14:20

Powered by Wiki|Docs

Sebastian Hageneuer
CC BY-NC-SA 4.0 Deed