General object recognition and image understanding are among the hardest problems in computer vision and multimedia retrieval. Taking the discovery of general objects in large image sets as the pivot, this paper proposes GORIUM, which addresses general object recognition and image understanding in a unified framework. GORIUM is a four-layer, bottom-up model. For the lower two layers, we have proposed and implemented several salient region detection and segmentation algorithms that can precisely extract visual objects from any image. For the third layer, we have proposed an unsupervised approach that automatically discovers general objects in large image sets by pairwise matching with local invariant features. On this basis, a visual dictionary can readily be constructed with existing, mature algorithms. As a consequence, a general object recognition and image understanding machine (GORIUMachine for short) could be realized in the near future.