Make the pixels match the template.
User photos are hand-held: tilted, zoomed, slightly rotated. Before anything else, SIFT features in both frames are matched against the reference, and each frame is warped into template space. Below: the raw phone photo (left) and the reference template (right), with green lines connecting the keypoints that survived Lowe’s ratio test and RANSAC.

