TensorFlow: Building a Person Recognition System

rensanning

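Build your own person-recognition CNN model from scratch to identify who appears in an image. The example here distinguishes Ella and Selina of SHE!

This is only a simple example. The point is to understand the machine learning workflow and where its difficulties lie, for example:
- Data (number and quality of samples)
- Model (structure, algorithm)
- Learning method (initial node values, learning rate)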

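Machine learning needs a large number of training samples, but collecting sample data at scale and labeling each item is not easy. The overall process is:
1-Use a crawler to fetch images for the given keywords (for example from Baidu or Google)
2-Process the fetched images as required (for example, detect and crop faces with OpenCV)
3-Check and organize the images (filter them, resize them, and so on)
4-Prepare the label files
5-Write the model
6-Train on the data
7-Test and verify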

Versions
TensorFlow 1.2 + OpenCV 2.5

Related articles:
TensorFlow: Getting Started
TensorFlow: Handwritten Digit Recognition with MNIST
TensorFlow: Object Detection
TensorFlow: Building a Person Recognition System

(1) File structure

/usr/local/tensorflow/sample/tf-she-image

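├ ckpt [checkpoint files]
├ data [training results (TensorBoard logs)]
├ eval_images [test images]
├ face [faces extracted by OpenCV]
│　├　ella
│　└　selina
├ original [original images fetched from Baidu Images]
│　├　ella
│　└　selina
├ test [images for checking model accuracy after training]
│　├　data.txt [image paths and labels]
│　├　ella
│　└　selina
└ train [images used for training]
　 ├　data.txt [image paths and labels]
　 ├　ella
　 └　selina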
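
The layout above can be created in one go. The following is a minimal sketch, not part of the original project: the function name `create_layout` and the three list constants are assumptions made for illustration, and only the directory names come from the tree.

```python
import os

# Directory names taken from the tree above
PEOPLE = ["ella", "selina"]
PER_PERSON_DIRS = ["face", "original", "test", "train"]
PLAIN_DIRS = ["ckpt", "data", "eval_images"]


def create_layout(base_dir):
    """Create the project directories under base_dir and return their paths."""
    created = [os.path.join(base_dir, name) for name in PLAIN_DIRS]
    for name in PER_PERSON_DIRS:
        for person in PEOPLE:
            created.append(os.path.join(base_dir, name, person))
    for path in created:
        # exist_ok avoids an error when the script is run a second time
        os.makedirs(path, exist_ok=True)
    return created
```

Calling `create_layout('/usr/local/tensorflow/sample/tf-she-image')` would produce the same folders as listed; the two data.txt files are written later.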

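(2) Fetching images

Fetch images from the Baidu Images search results by keyword. There are plenty of simple Python examples online for scraping Baidu Images. Because the images serve as training samples for the machine, fetch as many high-quality images with clear facial features as possible.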

/usr/local/tensorflow/sample/tf-she-image/original/ella

/usr/local/tensorflow/sample/tf-she-image/original/selina
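
The crawler itself is not shown in this article. As a rough sketch of its final stage only, the code below saves an already collected list of image URLs as numbered files (0.jpg, 1.jpg, ...), which is the naming that face_detect.py expects. How the URLs are obtained from the search results is left out. The names `fetch_bytes` and `save_numbered_images` are hypothetical, and every download is assumed to be a JPEG.

```python
import os
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download one URL and return the response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def save_numbered_images(urls, save_dir, fetch=fetch_bytes):
    """Save each URL as <index>.jpg in save_dir and return the paths written.

    Failed downloads are skipped, so the numbering may have gaps.
    """
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        try:
            data = fetch(url)
        except Exception as error:
            print("image%d: download failed - %s" % (index, error))
            continue
        path = os.path.join(save_dir, "%d.jpg" % index)
        with open(path, "wb") as image_file:
            image_file.write(data)
        saved.append(path)
    return saved
```

The `fetch` parameter exists only so the download step can be swapped out, for example to add request headers or to test without network access.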

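(3) Extracting faces

Face recognition samples need only the face region, so the fetched images require extra processing: detect the faces in each image with OpenCV, then crop and save them.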

/usr/local/tensorflow/sample/tf-she-image/face/ella

/usr/local/tensorflow/sample/tf-she-image/face/selina

face_detect.py

import os.path

import cv2

# Directory of numbered source images (0.jpg, 1.jpg, ...) and output directory for the face crops
input_data_path = '/usr/local/tensorflow/sample/tf-she-image/original/ella/'
save_path = '/usr/local/tensorflow/sample/tf-she-image/face/ella/'
# Haar cascade for frontal faces
cascade_path = '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml'
faceCascade = cv2.CascadeClassifier(cascade_path)

# Upper bound of the file numbers to try
image_count = 16000

# Number of face crops written so far, also used to name the output files
face_detect_count = 0

for i in range(image_count):
  image_file = input_data_path + str(i) + '.jpg'
  if os.path.isfile(image_file):
    try:
      img = cv2.imread(image_file, cv2.IMREAD_COLOR)
      # Detection runs on the grayscale image
      gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
      # Arguments: scaleFactor=1.1, minNeighbors=3
      face = faceCascade.detectMultiScale(gray, 1.1, 3)

      if len(face) > 0:
        # Save every detected face as a separate image
        for (x, y, w, h) in face:
          cv2.imwrite(save_path + 'face-' + str(face_detect_count) + '.jpg', img[y:y+h, x:x+w])
          face_detect_count = face_detect_count + 1
      else:
        print('image' + str(i) + ': No Face')
    except Exception as e:
      # cv2.imread returns None for unreadable files, which makes cvtColor raise
      print('image' + str(i) + ': Exception - ' + str(e))
  else:
    print('image' + str(i) + ': No File')

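(3) Organizing images

Because of the quality of the fetched images and OpenCV's detection errors, the images have to be screened again, keeping only those with clearly visible facial features.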

/usr/local/tensorflow/sample/tf-she-image/train/ella

/usr/local/tensorflow/sample/tf-she-image/train/selina

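This step is very time-consuming, because the higher the quality of the training samples, the more accurate the recognition. In the end this left 380 images of ella and 350 of selina. It makes you especially grateful to everyone who has published open datasets!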
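
Part of the screening can be automated before the manual pass. The sketch below assumes that very small detections are either false positives or too blurry to be useful; the function name `keep_large_faces` and the 60-pixel threshold are illustrative choices, not values from the original project. It works on the (x, y, w, h) rectangles that `detectMultiScale` returns.

```python
def keep_large_faces(rects, min_size=60):
    """Return the (x, y, w, h) rectangles whose width and height reach min_size."""
    return [(x, y, w, h) for (x, y, w, h) in rects if w >= min_size and h >= min_size]


# Example: only the first rectangle is large enough in both dimensions
candidates = [(10, 10, 120, 120), (5, 5, 20, 20), (0, 0, 60, 59)]
print(keep_large_faces(candidates))
```

A filter like this only reduces the number of images to look at; mislabeled or poorly cropped faces still have to be removed by hand.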

Label file for the organized images: data.txt

/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00001.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00002.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00003.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00004.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00005.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00006.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00007.jpg 0
/usr/local/tensorflow/sample/tf-she-image/train/ella/ella-00008.jpg 0
...
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00344.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00345.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00346.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00347.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00348.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00349.jpg 1
/usr/local/tensorflow/sample/tf-she-image/train/selina/selina-00350.jpg 1
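
Writing 730 lines by hand is unnecessary. The following sketch generates a data.txt in the format shown above from the per-person folders. The names `LABELS`, `build_label_lines` and `write_label_file` are hypothetical; only the line format and the ella -> 0, selina -> 1 mapping come from the listing. Because train.py splits each line on whitespace, the image paths must not contain spaces.

```python
import os

# Label index per person, matching the listing above
LABELS = {"ella": 0, "selina": 1}


def build_label_lines(image_root):
    """Return '<image path> <label>' lines for every .jpg under image_root/<person>/."""
    lines = []
    for person in sorted(LABELS, key=LABELS.get):
        person_dir = os.path.join(image_root, person)
        for file_name in sorted(os.listdir(person_dir)):
            if file_name.lower().endswith(".jpg"):
                lines.append("%s %d" % (os.path.join(person_dir, file_name), LABELS[person]))
    return lines


def write_label_file(image_root):
    """Write image_root/data.txt and return the number of labeled images."""
    lines = build_label_lines(image_root)
    with open(os.path.join(image_root, "data.txt"), "w") as label_file:
        label_file.write("\n".join(lines) + "\n")
    return len(lines)
```

For the training set this would be called as `write_label_file('/usr/local/tensorflow/sample/tf-she-image/train')`.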

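*** The images used to check model accuracy can be the same ones used for training, although the test accuracy then says little about how the model handles images it has not seen.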
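
If a separate test set is preferred, the label lines can be split before training. This is a minimal sketch under assumed settings: the function name `split_labels`, the 20% ratio and the fixed seed are illustrative and do not appear in the original project.

```python
import random


def split_labels(lines, test_ratio=0.2, seed=0):
    """Shuffle label lines and return (train_lines, test_lines)."""
    shuffled = list(lines)
    # A fixed seed makes the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    test_count = int(len(shuffled) * test_ratio)
    return shuffled[test_count:], shuffled[:test_count]
```

The two returned lists would then be written to train/data.txt and test/data.txt. Shuffling also mixes the two people within each list, whereas the listing above keeps all ella lines before all selina lines.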

(4) Writing the model

train.py

import cv2
import numpy as np
import tensorflow as tf

# Number of people to tell apart (label 0: ella, label 1: selina)
NUM_CLASSES = 2

# Every image is resized to IMAGE_SIZE x IMAGE_SIZE pixels
IMAGE_SIZE = 28

# Length of one flattened image: width * height * 3 color channels
IMAGE_PIXELS = IMAGE_SIZE*IMAGE_SIZE*3

flags = tf.app.flags
FLAGS = flags.FLAGS

# Label files with one "<image path> <label>" pair per line
flags.DEFINE_string('train', '/usr/local/tensorflow/sample/tf-she-image/train/data.txt', 'File name of train data')

flags.DEFINE_string('test', '/usr/local/tensorflow/sample/tf-she-image/test/data.txt', 'File name of test data')

# Output directory for the TensorBoard summaries
flags.DEFINE_string('train_dir', '/usr/local/tensorflow/sample/tf-she-image/data/', 'Directory to put the training data')

# One step in this script is a full pass over the training set
flags.DEFINE_integer('max_steps', 100, 'Number of steps to run trainer.')

flags.DEFINE_integer('batch_size', 20, 'Batch size. Must divide evenly into the dataset sizes.')

flags.DEFINE_float('learning_rate', 1e-4, 'Initial learning rate.')

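def inference(images_placeholder, keep_prob):
  # Weights start as small random values from a truncated normal distribution
  def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

  # Biases start slightly positive
  def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

  # Convolution with stride 1; 'SAME' padding keeps width and height unchanged
  def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

  # 2x2 max pooling with stride 2 halves width and height
  def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

  # Turn the flat input back into [batch, height, width, channels]
  x_image = tf.reshape(images_placeholder, [-1, IMAGE_SIZE, IMAGE_SIZE, 3])

  # First convolution layer: 5x5 kernels, 3 input channels, 32 feature maps
  with tf.name_scope('conv1') as scope:
    W_conv1 = weight_variable([5, 5, 3, 32])

    b_conv1 = bias_variable([32])

    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

  # First pooling layer: 28x28 -> 14x14
  with tf.name_scope('pool1') as scope:
    h_pool1 = max_pool_2x2(h_conv1)

  # Second convolution layer: 5x5 kernels, 32 -> 64 feature maps
  with tf.name_scope('conv2') as scope:
    W_conv2 = weight_variable([5, 5, 32, 64])

    b_conv2 = bias_variable([64])

    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

  # Second pooling layer: 14x14 -> 7x7
  with tf.name_scope('pool2') as scope:
    h_pool2 = max_pool_2x2(h_conv2)

  # Fully connected layer: the flattened 7*7*64 features -> 1024 units
  with tf.name_scope('fc1') as scope:
    W_fc1 = weight_variable([7*7*64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    # Dropout is active during training (keep_prob < 1) and disabled with keep_prob = 1.0
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

  # Output layer: 1024 units -> one score per person
  with tf.name_scope('fc2') as scope:
    W_fc2 = weight_variable([1024, NUM_CLASSES])
    b_fc2 = bias_variable([NUM_CLASSES])

  # Softmax turns the scores into probabilities
  with tf.name_scope('softmax') as scope:
    y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

  return y_conv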

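def loss(logits, labels):
  # "logits" already holds softmax probabilities here; clip them so that log(0) cannot produce NaN
  cross_entropy = -tf.reduce_sum(labels*tf.log(tf.clip_by_value(logits, 1e-10, 1.0)))

  tf.summary.scalar("cross_entropy", cross_entropy)

  return cross_entropy

def training(loss, learning_rate):
  # Minimize the cross entropy with the Adam optimizer
  train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
  return train_step

def accuracy(logits, labels):
  # A prediction is correct when the most probable class equals the labeled class
  correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))

  # Fraction of correct predictions
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

  tf.summary.scalar("accuracy", accuracy)
  return accuracy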

if __name__ == '__main__':

  # Load the training images: one "<image path> <label>" pair per line
  train_image = []
  train_label = []
  with open(FLAGS.train, 'r') as f:
    for line in f:
      line = line.rstrip()
      l = line.split()

      img = cv2.imread(l[0])
      img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))

      # Flatten to one row and scale the pixel values to [0, 1]
      train_image.append(img.flatten().astype(np.float32)/255.0)

      # One-hot label, for example label 1 -> [0, 1]
      tmp = np.zeros(NUM_CLASSES)
      tmp[int(l[1])] = 1
      train_label.append(tmp)

  train_image = np.asarray(train_image)
  train_label = np.asarray(train_label)

  # Load the test images in the same way
  test_image = []
  test_label = []
  with open(FLAGS.test, 'r') as f:
    for line in f:
      line = line.rstrip()
      l = line.split()
      img = cv2.imread(l[0])
      img = cv2.resize(img, (IMAGE_SIZE, IMAGE_SIZE))
      test_image.append(img.flatten().astype(np.float32)/255.0)
      tmp = np.zeros(NUM_CLASSES)
      tmp[int(l[1])] = 1
      test_label.append(tmp)
  test_image = np.asarray(test_image)
  test_label = np.asarray(test_label)

  with tf.Graph().as_default():
    # Inputs: flattened images, one-hot labels and the dropout keep probability
    images_placeholder = tf.placeholder("float", shape=(None, IMAGE_PIXELS))
    labels_placeholder = tf.placeholder("float", shape=(None, NUM_CLASSES))
    keep_prob = tf.placeholder("float")
    # Build the graph: prediction, loss, training op and accuracy
    logits = inference(images_placeholder, keep_prob)
    loss_value = loss(logits, labels_placeholder)
    train_op = training(loss_value, FLAGS.learning_rate)
    acc = accuracy(logits, labels_placeholder)

    saver = tf.train.Saver()

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    summary_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter(FLAGS.train_dir, sess.graph)

    for step in range(FLAGS.max_steps):
      # Train on consecutive mini-batches in file order; samples beyond the last full batch are skipped
      for i in range(int(len(train_image)/FLAGS.batch_size)):
        batch = FLAGS.batch_size*i

        sess.run(train_op, feed_dict={
          images_placeholder: train_image[batch:batch+FLAGS.batch_size],
          labels_placeholder: train_label[batch:batch+FLAGS.batch_size],
          keep_prob: 0.5})

      # After each pass, measure accuracy on the whole training set with dropout disabled
      train_accuracy = sess.run(acc, feed_dict={
        images_placeholder: train_image,
        labels_placeholder: train_label,
        keep_prob: 1.0})
      print("step %d, training accuracy %g" % (step, train_accuracy))

      # Record the summaries for TensorBoard
      summary_str = sess.run(summary_op, feed_dict={
        images_placeholder: train_image,
        labels_placeholder: train_label,
        keep_prob: 1.0})
      summary_writer.add_summary(summary_str, step)

    # Accuracy on the test set
    print("test accuracy %g" % sess.run(acc, feed_dict={
      images_placeholder: test_image,
      labels_placeholder: test_label,
      keep_prob: 1.0}))

    # Save the trained variables so that eval.py can restore them
    save_path = saver.save(sess, '/usr/local/tensorflow/sample/tf-she-image/ckpt/model.ckpt')
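
The hard-coded 7*7*64 in the fc1 layer follows from IMAGE_SIZE = 28: with padding='SAME' a layer's output size is the input size divided by the stride, rounded up, so each stride-1 convolution keeps the size and each stride-2 pooling halves it. The sketch below reproduces that arithmetic in plain Python; the function names are illustrative and not part of train.py.

```python
import math


def same_padding_output(size, stride):
    """Output width/height of a conv or pooling layer with padding='SAME'."""
    return math.ceil(size / stride)


def flattened_features(image_size=28, pool_layers=2, channels=64):
    """Number of values fed into fc1 after the conv/pool stack in inference()."""
    size = image_size
    for _ in range(pool_layers):
        size = same_padding_output(size, 1)  # 5x5 convolution, stride 1: size unchanged
        size = same_padding_output(size, 2)  # 2x2 max pooling, stride 2: size halved
    return size * size * channels


print(flattened_features())  # 28 -> 14 -> 7, so 7 * 7 * 64
```

If IMAGE_SIZE is changed, the two occurrences of 7*7*64 in inference() have to change with it; `flattened_features(image_size=56)` gives 14 * 14 * 64, for example.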

(5) Training

The closer the accuracy is to 1, the more accurate the model.

(tensorflow) [root@localhost tf-she-image]# python train.py
step 0, training accuracy 0.479452
step 1, training accuracy 0.479452
step 2, training accuracy 0.480822
step 3, training accuracy 0.505479
step 4, training accuracy 0.531507
step 5, training accuracy 0.609589
step 6, training accuracy 0.630137
step 7, training accuracy 0.639726
step 8, training accuracy 0.732877
step 9, training accuracy 0.713699
...
step 89, training accuracy 0.994521
step 90, training accuracy 0.994521
step 91, training accuracy 0.994521
step 92, training accuracy 0.994521
step 93, training accuracy 0.994521
step 94, training accuracy 0.994521
step 95, training accuracy 0.994521
step 96, training accuracy 0.994521
step 97, training accuracy 0.994521
step 98, training accuracy 0.994521
step 99, training accuracy 0.994521
test accuracy 0.994521
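
The figures in this log can be read back against the dataset size of 380 + 350 = 730 images. The starting value 0.479452 equals 350/730, which is what a model that answers selina for every image would score, and the final 0.994521 equals 726/730, that is, four images misclassified. This reading is an inference from the numbers, not something the log states. A small pure-Python check (the helper name `accuracy` simply mirrors the one in train.py):

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of samples whose predicted label matches the true label."""
    matches = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return matches / len(true_labels)


# Training set from this article: 380 ella images (label 0), 350 selina images (label 1)
true_labels = [0] * 380 + [1] * 350

# A model that answers "selina" for every image
always_selina = [1] * 730
print("%g" % accuracy(always_selina, true_labels))  # 0.479452
```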

After the run finishes, the following files are created in /usr/local/tensorflow/sample/tf-she-image/ckpt/:

model.ckpt.index
model.ckpt.meta
model.ckpt.data-00000-of-00001
checkpoint

(6) Viewing the training results

(tensorflow) [root@localhost tf-she-image]# tensorboard --logdir=/usr/local/tensorflow/sample/tf-she-image/data

(7) Testing

Prepare four images to check whether they are recognized correctly:

test-ella-01.jpg

test-ella-02.jpg

test-selina-01.jpg

test-selina-02.jpg

Results:

(tensorflow) [root@localhost tf-she-image]# python eval.py
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-ella-01.jpg
[{'name': 'ella', 'rate': 85.299999999999997, 'label': 0}, {'name': 'selina', 'rate': 14.699999999999999, 'label': 1}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-ella-02.jpg
[{'name': 'ella', 'rate': 99.799999999999997, 'label': 0}, {'name': 'selina', 'rate': 0.20000000000000001, 'label': 1}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-selina-01.jpg
[{'name': 'selina', 'rate': 100.0, 'label': 1}, {'name': 'ella', 'rate': 0.0, 'label': 0}]
/usr/local/tensorflow/sample/tf-she-image/eval_images/test-selina-02.jpg
[{'name': 'selina', 'rate': 99.900000000000006, 'label': 1}, {'name': 'ella', 'rate': 0.10000000000000001, 'label': 0}]

As the output shows, the model's confidence in the correct person is 85.3%, 99.8%, 100% and 99.9% respectively. Not bad!
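
Each 'rate' above is a softmax probability for a single image, expressed as a percentage, rather than an accuracy measured over many images. With two classes the two rates therefore add up to 100, apart from rounding. The following sketch shows the conversion in plain Python; `softmax` and `to_rates` are illustrative helpers, and the input scores are made up.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_rates(probabilities):
    """Round probabilities to percentages with one decimal place, as eval.py does."""
    return [round(p * 100.0, 1) for p in probabilities]


# Made-up scores for (ella, selina); a larger gap gives a more confident result
print(to_rates(softmax([2.0, 0.0])))
```

A rate of 100.0 means the probability rounded to 100.0 at one decimal place, not that the model is certain.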

eval.py

import os
import random

import cv2
import numpy as np
import tensorflow as tf

import train

cascade_path = '/usr/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml'
faceCascade = cv2.CascadeClassifier(cascade_path)

# Label index -> person name, matching the labels in data.txt
HUMAN_NAMES = {
  0: u"ella",
  1: u"selina"
}

def evaluation(img_path, ckpt_path):
  # Start from an empty graph so that repeated calls do not accumulate variables
  tf.reset_default_graph()

  img = cv2.imread(img_path, cv2.IMREAD_COLOR)

  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  face = faceCascade.detectMultiScale(gray, 1.1, 3)

  if len(face) > 0:
    for rect in face:
      random_str = str(random.random())

      x = rect[0]
      y = rect[1]
      w = rect[2]
      h = rect[3]

      # Save the cropped face under a random file name.
      # The crop is taken from the untouched image, so no drawn rectangle ends up in it.
      # If several faces are detected, the last one is the one that gets classified.
      target_image_path = '/usr/local/tensorflow/sample/tf-she-image/eval_images/' + random_str + '.jpg'
      cv2.imwrite(target_image_path, img[y:y+h, x:x+w])
  else:
    print('image:No Face')
    return

  # Preprocess exactly as in train.py: resize, flatten, scale to [0, 1]
  image = []
  img = cv2.imread(target_image_path)
  img = cv2.resize(img, (train.IMAGE_SIZE, train.IMAGE_SIZE))

  image.append(img.flatten().astype(np.float32)/255.0)
  image = np.asarray(image)

  # Rebuild the same network with dropout disabled (keep_prob = 1.0)
  logits = train.inference(image, 1.0)

  sess = tf.InteractiveSession()

  saver = tf.train.Saver()

  sess.run(tf.global_variables_initializer())

  # Overwrite the freshly initialized variables with the trained ones
  if ckpt_path:
    saver.restore(sess, ckpt_path)

  softmax = logits.eval()
  sess.close()

  result = softmax[0]

  # Probabilities as percentages with one decimal place
  rates = [round(n * 100.0, 1) for n in result]
  humans = []

  for index, rate in enumerate(rates):
    name = HUMAN_NAMES[index]
    humans.append({
      'label': index,
      'name': name,
      'rate': rate
    })

  # Most probable person first
  rank = sorted(humans, key=lambda x: x['rate'], reverse=True)

  print(img_path)
  print(rank)

  return [rank, os.path.basename(img_path), random_str + '.jpg']

if __name__ == '__main__':
  TEST_IMAGE_PATHS = [ 'test-ella-01.jpg', 'test-ella-02.jpg', 'test-selina-01.jpg', 'test-selina-02.jpg' ]
  for image_path in TEST_IMAGE_PATHS:
    evaluation('/usr/local/tensorflow/sample/tf-she-image/eval_images/'+image_path, '/usr/local/tensorflow/sample/tf-she-image/ckpt/model.ckpt')
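
When a test image contains more than one detection, eval.py classifies whichever face comes last in the loop. One possible refinement, assuming the person of interest is the largest face in the picture, is to choose the rectangle with the largest area first. The helper name `largest_face` is hypothetical and not part of the original eval.py.

```python
def largest_face(rects):
    """Return the (x, y, w, h) rectangle with the largest area, or None if rects is empty."""
    if len(rects) == 0:
        return None
    return max(rects, key=lambda rect: rect[2] * rect[3])


# Example: the second rectangle covers the largest area
print(largest_face([(0, 0, 10, 10), (5, 5, 40, 30), (1, 1, 20, 20)]))
```

In eval.py the result would replace the `for rect in face` loop, so that only one crop is written and classified.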

Reference:
http://qiita.com/neriai/items/bd7bc36ec42c8ef65b2e