I guess there are many variables, each with its own degree of estimated accuracy (sketched as data after this list):
1) The offset, orientation, FoV, focal point, and other attributes of the view (the camera's extrinsic and intrinsic parameters).
2) The estimated depth of a matchpoint in each view (giving its relative position).
3) The orientation, scale, and position of the point cloud, given by the user.
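To make that concrete, here is a rough Python sketch of the state being estimated. All of the names and the per-item sigma fields are my own invention, meant only to show each quantity carrying its own confidence:

```python
from dataclasses import dataclass

@dataclass
class ViewEstimate:        # 1) per-image camera parameters
    position: tuple        # offset of the view
    orientation: tuple
    fov_deg: float
    sigma: float           # rough 1-sigma confidence in this estimate

@dataclass
class MatchPointDepth:     # 2) estimated depth of a matchpoint in one view
    view: ViewEstimate
    depth: float
    sigma: float           # tightens as more views cover the point

@dataclass
class CloudPlacement:      # 3) user-supplied orientation/scale/position
    orientation: tuple
    scale: float
    position: tuple
    sigma: float           # large: user input is the least trustworthy
```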
1 and 2 are linked: where matchpoints exist in images taken from a greater range of locations and zoom levels within the same point cloud, their estimated accuracy increases.
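As a back-of-the-envelope illustration of that link, the standard stereo depth-error approximation sigma_Z ≈ Z² · sigma_d / (f · B) says the depth uncertainty of a matchpoint shrinks as the baseline between views grows. The numbers below are made up:

```python
def depth_sigma(depth_m, focal_px, baseline_m, disparity_sigma_px=0.5):
    """Approximate 1-sigma depth error for a stereo pair:
    sigma_Z ~= Z^2 * sigma_d / (f * B)."""
    return depth_m ** 2 * disparity_sigma_px / (focal_px * baseline_m)

# A matchpoint 50 m away, 1500 px focal length: moving the views apart
# (a greater range of locations) tightens the depth estimate.
for baseline_m in (0.5, 2.0, 10.0):
    print(f"baseline {baseline_m:5.1f} m -> "
          f"depth sigma {depth_sigma(50.0, 1500.0, baseline_m):.2f} m")
```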
3 is problematic: it depends on user input and is therefore always wrong! I guess the positions have to be relative to something, some form of reference point, and the best option here would be to have the bird's-eye photography from Bing Maps be the glue that holds everything together, while the rest stays flexible.
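If the Bing Maps imagery is the fixed reference, the user-supplied placement in 3 could be replaced (or at least corrected) by fitting a similarity transform from a handful of cloud points to the same points picked off the reference imagery. Here is a minimal sketch using the standard Umeyama least-squares method; the function name and the example points are mine, not from any existing tool:

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping src onto dst, i.e. dst ~= s * R @ src + t (Umeyama, 1991)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # rule out a reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical example: four cloud points and the same four points read
# off georeferenced bird's-eye imagery (here exactly scale 2, offset (10,20,5)).
cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
geo = [(10, 20, 5), (12, 20, 5), (10, 22, 5), (10, 20, 7)]
s, R, t = fit_similarity(cloud, geo)
print(s, t)  # -> 2.0, [10. 20. 5.]
```

With something like that in place, the user's input only seeds the fit; the reference imagery, not the user, is what actually pins the cloud down.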