
Controlling a Computer by Hand Gesture

Study group of Sydney Machine Learning

This study group was formed to study Harvard's CS109 among the members of the Sydney Machine Learning Meetup. (YouTube channel: https://www.youtube.com/channel/UCcZ5Sy4JzVUaiD1ZYRGDM0g/videos)

Members were grouped into 10 teams, and each team completed its own project. My team was "Team Echo", named after the prize, an Amazon Echo.

Project

Among the 10 teams, we won first prize (and an Amazon Echo (Alexa) as the award), shared with one other team (DeepAI).

Brief introduction to this project

Almost everyone uses a desktop or laptop these days. One serious problem is that we are stuck at the keyboard and mouse, which can cause serious health problems in the long run. Moreover, in the VR/AR age we cannot use a keyboard and mouse at all. Our purpose is to replace the keyboard and mouse with hand gestures. We devised a virtual keyboard and a virtual mouse based on subtle hand gestures and trained a machine learning model to recognise them, so that we can control our computer remotely. Amazon Echo has ears now; in the future it will have eyes. We need to make a standard set of gestures that people can adopt as easily as the standard keyboard and mouse. ML will address this for us humans. We used deep learning as the ML method in this project.

Introduction

People have used the keyboard and mouse as the standard way of input to a computer for a long time. One serious problem with the keyboard and mouse is that we are tied to them and sit still while using a desktop or laptop computer. This causes serious health problems in the long run.

Moreover, we are entering the VR (virtual reality) / AR (augmented reality) age, in which we cannot use a keyboard and mouse as we did before. We have to use some kind of mobile controller, or gestures!

ML (machine learning) can recognise our gestures and control the computer remotely as we intend. We will feel like magicians. By moving our bodies to control the computer, we can avoid sitting still and developing health problems. Moreover, we can play games with a much more immersive experience by moving our hands, heads and bodies.

To replace the keyboard and mouse, we have to devise many subtle gestures, and these gestures have to be easy for people to learn. They should also become a standard, like the standard keyboard and mouse: it would be annoying to have to learn different gestures to control different devices in the future.

We suggest a standard set of hand gestures as follows. The goal of this project is to explore whether these subtle hand gestures can control a computer well enough to replace the keyboard and mouse completely.

Recent work in this field

  1. Carnegie Mellon University OpenPose: webcam

  2. Microsoft HoloLens: 3D sensor

  3. Microsoft hand tracking: 3D sensor

  4. Leap Motion: 3D sensor

  5. ManoMotion: smartphone camera

  6. PilotBit Mobile Hand Tracking: 3D sensor

Challenging points

Workflow

  1. Searching GitHub for open-source code and YouTube for useful information.

  2. Defining as many hand gestures as possible to cover all the keys of a keyboard and mouse.

  3. Coding tools to create as many data samples as possible in a short time.

  4. Coding a deep learning model that can recognise these hand gestures in real time.

  5. Coding the hand tracking and capture.

  6. Coding some demos, such as basic calculation or a game.

Tools used for teamwork

  1. GitHub site for the project page: https://github.com/whatifif/handgesture
  2. GitHub site for the code: https://github.com/whatifif/handgesturecode
  3. Slack for team communication and instant file sharing: https://sml109.slack.com

Technical Details

Main Dependencies:

Hardware and software

Hand Gestures as a Standard Way of Input like a Keyboard and a Mouse

We normally have two hands, so we can use one hand for the keyboard and the other for the mouse. The standard gestures should be easy for people to learn. Just imagine there is a virtual keyboard in front of your left hand and a virtual mouse on your right. Let's focus on the virtual keyboard first.

We divided the left side into three regions:

  1. left region
  2. middle region
  3. right region

We can make 10 distinct, easy gestures in each region:

  1. closed hand
  2. open hand
  3. thumb only
  4. additional index finger
  5. additional middle finger
  6. folding first finger
  7. folding second finger
  8. folding middle finger
  9. folding index finger
  10. folding thumb

If we use gestures 6 to 10 as inputs, we have 5 input gestures in each region, i.e. 5 * 3 = 15 different input gestures across the three regions.

If we use gestures 3, 4 and 5 as controls, we have 3 * 3 = 9 control states, and therefore 15 * 9 = 135 different combinations, which will cover the whole range of keys (numbers, lowercase letters, uppercase letters, special keys and controls).
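
As a quick sanity check on this arithmetic, the sketch below enumerates the combinations. The gesture names follow the lists above; the actual assignment of combinations to keys is left open.

```python
# Sanity check of the count above: 5 input gestures x 3 regions = 15 inputs,
# 3 control gestures x 3 regions = 9 control states, 15 x 9 = 135 combinations.
from itertools import product

regions = ["left", "middle", "right"]
inputs = ["fold_first", "fold_second", "fold_middle", "fold_index", "fold_thumb"]
controls = ["thumb_only", "add_index_finger", "add_middle_finger"]

base_inputs = list(product(regions, inputs))                # 15 base inputs
control_states = list(product(regions, controls))           # 9 control states
combinations = list(product(control_states, base_inputs))   # 135 pairs

print(len(base_inputs), len(control_states), len(combinations))  # 15 9 135
```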

For the right hand as a mouse, we can use the same 10 gestures as for the left hand, which cover all mouse inputs. The centre of the hand acts as the mouse cursor: the computer's cursor tracks the centre of the right hand. There is only one region on the right side for the mouse.
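
A minimal sketch of that cursor behaviour, assuming the pyautogui library and a naive linear mapping from webcam coordinates to screen coordinates; the webcam resolution constants are illustrative:

```python
# Move the system cursor to follow the hand centroid (a sketch, not the
# project's actual implementation).
import pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()
CAM_W, CAM_H = 640, 480                # assumed webcam resolution

def move_cursor(cx, cy):
    """Map a hand centroid (cx, cy) in webcam coordinates to the screen."""
    x = (1.0 - cx / CAM_W) * SCREEN_W  # mirror x: moving the hand right moves the cursor right
    y = (cy / CAM_H) * SCREEN_H
    pyautogui.moveTo(x, y)
```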

For left-handed people, of course, the left and right sides can be swapped.

Making a data set

To train the ML model, several thousand samples are needed, and these had to be prepared by ourselves. So we made a program for capturing hand images easily. With this capture program, about 2000 hand images were collected in a short period.
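
A minimal sketch of such a capture tool, assuming OpenCV; the label name, output directory and detection-region coordinates are hypothetical:

```python
# Capture labelled hand images from the webcam, one keypress per sample.
import os
import cv2

label = "closed_hand"                  # gesture being recorded this session (assumed label scheme)
out_dir = os.path.join("data", label)
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)              # default webcam
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[100:400, 100:400]      # fixed detection region (assumed coordinates)
    cv2.imshow("capture", roi)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):                # press 'c' to save one sample
        cv2.imwrite(os.path.join(out_dir, f"{label}_{count:04d}.png"), roi)
        count += 1
    elif key == ord("q"):              # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```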

Detecting and tracking hands

Since we move our hands freely in front of the webcam, the hands must be detected and tracked correctly in the webcam frame in real time. Haar cascades, background subtraction and skin-colour detection were tried for tracking a hand. Skin-colour detection was found to be the most stable.
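
A sketch of skin-colour detection with OpenCV, as one plausible version of this approach (4.x findContours signature); the HSV threshold values are illustrative assumptions and depend heavily on lighting and skin tone:

```python
# Find the centre of the largest skin-coloured blob in a frame.
import cv2
import numpy as np

def hand_centroid(frame):
    """Return the (x, y) centre of the largest skin-coloured blob, or None."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed skin-tone range
    upper = np.array([25, 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)         # largest blob = the hand
    m = cv2.moments(hand)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```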

Detection region for hand and mouse

Since our face has the same skin colour as our hands, we need a way to ignore the face. A Haar cascade could be applied for this purpose, but due to the time limits of this project we simply defined a detection region and tried not to put our faces into that region.
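
The same trick in code: only a fixed sub-rectangle of the frame is searched, so a face outside it cannot be mistaken for a hand. This reuses the hand_centroid helper from the sketch above; the region coordinates are hypothetical:

```python
# Track the hand only inside a fixed detection region.
import cv2

X0, Y0, X1, Y1 = 100, 100, 400, 400   # assumed detection region

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    roi = frame[Y0:Y1, X0:X1]         # crop before skin detection
    c = hand_centroid(roi)            # from the sketch above
    if c is not None:
        # Map the centroid back to full-frame coordinates and mark it.
        cv2.circle(frame, (c[0] + X0, c[1] + Y0), 8, (255, 0, 0), 2)
    cv2.rectangle(frame, (X0, Y0), (X1, Y1), (255, 0, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```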

See the Blue_Background page for the setup used to capture hand images.

Tracking hands

Since we use skin colour to track the hand, the background and our shirts should be in colours that contrast with skin. We also have to wear long-sleeved shirts to hide our arms from detection.

Ambient light affects skin colour significantly, so a bright room was avoided. A blue screen made from a blue tablecloth was used as the background to get good data. It turned out that a whiteboard is also a good background.

Deep Learning Model to recognise the gesture

Difficulties

Why was MxNet chosen as the deep learning framework for this project?

How can the trained model be transferred to work on mobile devices such as smartphones?

Two ways:

  1. The trained model can be installed on the smartphone and run prediction on the client side. In this case the model needs to be updated periodically, which might require a large update package for the application.

  2. The trained model stays on the application server, while the detection utility and the necessary preprocessing are kept on the client side to reduce the workload on the server. The model can then be updated periodically without much impact on the client application.
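
A minimal sketch of the second option, assuming a Flask server; the endpoint name, the input wire format and the predict() stub are hypothetical stand-ins for the real trained model:

```python
# Serve gesture predictions over HTTP; the client does detection and
# preprocessing, then posts a fixed-size crop.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(image: np.ndarray) -> str:
    # Placeholder for the trained gesture model's forward pass.
    return "closed_hand"

@app.route("/predict", methods=["POST"])
def predict_route():
    # Assumed wire format: a 64x64 grayscale crop as raw bytes.
    image = np.frombuffer(request.data, dtype=np.uint8).reshape(64, 64)
    return jsonify({"gesture": predict(image)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```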

Main Workflow:

Model Details

Summary of Progress

Demo

Future Work

P.S. What will be the future of humans in the AI (Artificial Intelligence) age?

What will the future of us humans be in the AI age? There may be no work that humans must do for a living. We may have a Universal Basic Income and the freedom to do what we like. AI may create a utopian world for humans. To reach that utopian world as soon as possible, why not cooperate in developing AI rather than compete for limited resources and be greedy? By using technology, including AI, we can make our resources abundant enough for all of humanity. Humans will become a multiplanetary species in the future, and there are infinite resources out there in the universe. Let's make AI work for us humans, and let's enjoy our lives as human beings.