this paper proposes a method to detect a human and then track them using only depth images. It is also the first paper I have read in this domain that uses the Kinect sensor…
why the depth cue is so reliable:
objects may not have consistent color and texture but must occupy an integrated region in space.
object contour detection:
looking at the depth array, there are salient gradients between regions at different distances, so the edges are easy to get with a Canny edge detector. The authors also eliminate small edges by simply counting the pixels contained in each edge.
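The small-edge cleanup can be sketched as follows. This is only my own minimal sketch, not the authors' code: it assumes the Canny output is already available as a binary grid, treats each 8-connected group of edge pixels as one edge, and uses a hypothetical `min_pixels` threshold (the notes above do not give the value used in the paper).

```python
from collections import deque


def remove_small_edges(edge_map, min_pixels):
    """Keep only 8-connected edge components with at least min_pixels pixels.

    edge_map is a list of rows holding 0/1 values (assumed binary edge image).
    min_pixels is a hypothetical threshold chosen by the caller.
    """
    rows, cols = len(edge_map), len(edge_map[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not edge_map[r][c] or seen[r][c]:
                continue
            # Collect one connected edge with a breadth-first search.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edge_map[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Count the pixels in this edge and keep it only if it is long enough.
            if len(component) >= min_pixels:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned


edges = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
cleaned = remove_small_edges(edges, min_pixels=2)
```

Here the three-pixel edge in the first row survives and the isolated pixel in the corner is dropped.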
A computer is just a machine; without a knowledge base, it cannot recognize an object as human ^-^. But is the head shape really unique to humans? I am not sure. In this paper, the authors use a binary head template to identify an “object as human”.
The last step claims “object as human”, but that does not mean the object really is human. In this paper, the authors use the depth array to build a 3D head model (estimating its parameters), fit it to the detected regions (the ones matched by the binary head template), calculate the squared error between each region and the 3D model, and keep the regions that fit accurately as real humans. (How about a gorilla? ^-^)
extract the whole human shape:
there are several cases where a simple edge detector fails: a) other non-human objects are close to the human; b) the feet and the ground have the same depth. So the authors develop a so-called region growing algorithm to extract the whole human body. It is simple to understand. Since the steps above can identify the head successfully in most cases, the identified region is used as the initial seed. Compute its mean depth, scan the neighboring pixels, and calculate their similarity to the mean. The most similar ones win: grow the region with them, recalculate the mean, and repeat…
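The region growing loop can be sketched like this. It is a minimal sketch of the idea as I understand it, not the authors' implementation: similarity is assumed to be the absolute difference between a pixel's depth and the current region mean, neighbors are 4-connected, and the stopping threshold `max_diff` is a hypothetical parameter of mine.

```python
def grow_region(depth, seed, max_diff):
    """Grow a region from seed by repeatedly adding the most similar neighbor.

    depth is a list of rows of depth values, seed is a (row, col) pair.
    max_diff is an assumed stopping threshold on |depth - region mean|.
    """
    rows, cols = len(depth), len(depth[0])
    region = {seed}
    total = float(depth[seed[0]][seed[1]])
    frontier = set()

    def add_neighbors(y, x):
        # 4-connected neighbors that are inside the image and not yet in the region.
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in region:
                frontier.add((ny, nx))

    add_neighbors(*seed)
    while frontier:
        mean = total / len(region)
        # The neighbor whose depth is closest to the region mean wins.
        best = min(frontier, key=lambda p: (abs(depth[p[0]][p[1]] - mean), p))
        if abs(depth[best[0]][best[1]] - mean) > max_diff:
            break  # even the best candidate is too different: stop growing
        frontier.remove(best)
        region.add(best)
        total += depth[best[0]][best[1]]  # the mean is recomputed on the next pass
        add_neighbors(*best)
    return region


depth = [
    [100, 101, 200],
    [102, 103, 200],
    [200, 200, 200],
]
body = grow_region(depth, seed=(0, 0), max_diff=5)
```

With this toy depth map, the region grows over the four pixels near depth 100 and stops at the background pixels at depth 200. Scanning the whole frontier on every pass is slow for real images; it is only meant to keep the sketch short.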
human movement between neighboring frames should be smooth. Compute the 3D coordinates and the speed of each detected blob (the speed is the difference between the blob centers in neighboring frames). The authors define an energy function that measures how smooth the movement is, using two data terms (3D coordinates and speed); the candidate with the smallest energy is taken to be the same person.
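The matching step can be sketched as below. The exact form of the energy function is not written down in these notes, so this sketch assumes a weighted sum of two Euclidean distances: how far the blob center moved, and how much the speed changed. The weights `w_pos` and `w_vel`, the track dictionary layout, and the function names are all hypothetical.

```python
import math


def energy(track, candidate, w_pos=1.0, w_vel=1.0):
    """Assumed smoothness energy between a tracked person and a candidate blob.

    track holds the previous 3D center ("position") and speed ("velocity").
    candidate is the 3D center of a blob detected in the current frame.
    """
    # Speed the person would have if this candidate were the same person.
    velocity = tuple(c - p for c, p in zip(candidate, track["position"]))
    pos_term = math.dist(candidate, track["position"])  # change in 3D coordinates
    vel_term = math.dist(velocity, track["velocity"])   # change in speed
    return w_pos * pos_term + w_vel * vel_term


def match(track, candidates):
    """Return the index of the candidate with the smallest energy."""
    return min(range(len(candidates)), key=lambda i: energy(track, candidates[i]))


track = {"position": (0.0, 0.0, 2.0), "velocity": (0.1, 0.0, 0.0)}
candidates = [(1.0, 0.0, 2.0), (0.1, 0.0, 2.0)]
same_person = match(track, candidates)
```

In this toy case the second candidate continues the previous motion exactly, so it has the smaller energy and is picked as the same person.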
comparison with Sho Ikemura et al., “Real-Time Human Detection using Relational Depth Similarity Features”, ACCV 2010:
Method        Precision   Recall   Accuracy
this paper    100%        96.0%    98.4%
Ikemura’s     90.0%       32.9%    85.8%