TextCNN文本分类详解--使用TensorFlow一步步带你实现简单TextCNN
-
前言
近期在项目组工作中,使用TextCNN对文本分类取得了不错的准确率,为了更清晰地了解TextCNN的结构,特翻译TensorFlow实现的TextCNN一文。
一、什么是TextCNN
TextCNN 是利用卷积神经网络对文本进行分类的算法,由 Yoon Kim 在 《Convolutional Neural Networks for Sentence Classification》 中提出.
1. Model
<center>图1 TextCNN结构图</center>第一层将单词嵌入到低维矢量中。下一层使用多个过滤器大小对嵌入的单词向量执行卷积。例如,一次滑动3,4或5个单词。接下来,将卷积层的结果最大池化为一个长特征向量,添加dropout正则,并使用softmax对结果进行分类。
二、一步一步带你实现简单的TextCNN
代码使用dennybritz实现的[Github]cnn-text-classification-tf,下面将对其代码内容进行详解
1. TextCNN类
TextCNN类的初始化方法如下:
class TextCNN(object): def __init__(self,sequence_length, num_classes, vocab_size, embedding_size, filter_sizes, num_filters,l2_reg_lambda=0.0):
各参数的介绍:
参数名 意义 例 sequence_length 句子长度--输入句子的最大长度 20
num_classes 分类数 2400
vocab_size 单词数 3000
embedding_size 单词嵌入维度 128
filter_sizes 卷积过滤器的大小 [3, 4, 5]
-- 使用大小分别为3,4,5的过滤器,每个过滤器有一个与之对应的num_filters
,本例共有3*[num_filters
]个过滤器num_filters 每个过滤器大小的过滤器数量 l2_reg_lambda l2正则权值 0.0 1.1 Input Placeholders
# Placeholders for input, output and dropout self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x") self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y") self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
这里,我们创建了placeholder变量作为训练的输入和测试的输入,placeholder的第二项变量中第一维为batch_size,
None
意味着该维度可为任意值,使用None
将该维度交给网络自由决定。
将神经元保存在dropout层中的概率也作为网络的输入,因为我们在测试时不使用dropout。1.2 Embedding层
Embedding层将单词向量使用更低维向量表示。
with tf.device('/cpu:0'), tf.name_scope("embedding"): W = tf.Variable( tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W") self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x) self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
这里我们使用了一些特殊的特性:
tf.device('/cpu:0')
将Embedding操作交给cpu执行。默认情况下TensorFlow会将该操作交给gpu执行(前提是有gpu),但是当前embedding在gpu中执行会报错。tf.name_scope("embedding")
:本操作将embedding
加入到命名空间(name scope)中。命名空间将所有操作加入到名为embedding
的顶层节点中,因此在使用TensorBoard进行网络可视化时能有一个良好的层次结构。
W是我们在训练中学习的嵌入矩阵。 我们使用随机均匀分布来初始化它。
tf.nn.embedding_lookup
创建实际的嵌入操作。 嵌入操作的结果是形状为[None,sequence_length,embedding_size]
的三维张量。
TensorFlow的卷积转换操作具有对应于批次,宽度,高度和通道的尺寸的4维张量。 我们嵌入的结果不包含通道尺寸,所以我们手动添加,留下一层shape为[None,sequence_length,embedding_size,1]
。1.3 Convolution and Max-Pooling Layers
下面开始构建卷积层,再进行max-pooling。因为每个卷积产生不同形状的张量,因此为他们中的每一个创建一个层,然后合并结果为一个大的特征向量。
pooled_outputs = [] # 池化输出结果 for i, filter_size in enumerate(filter_sizes): # 遍历多个filter_size with tf.name_scope("conv-maxpool-%s" % filter_size): # Convolution Layer filter_shape = [filter_size, embedding_size, 1, num_filters] W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W") b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b") conv = tf.nn.conv2d( self.embedded_chars_expanded, W, strides=[1, 1, 1, 1], padding="VALID", name="conv") # Apply nonlinearity h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu") # Max-pooling over the outputs pooled = tf.nn.max_pool( h, ksize=[1, sequence_length - filter_size + 1, 1, 1], strides=[1, 1, 1, 1], padding='VALID', name="pool") pooled_outputs.append(pooled) # Combine all the pooled features num_filters_total = num_filters * len(filter_sizes) self.h_pool = tf.concat(3, pooled_outputs) self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
这里,W是过滤器矩阵,h是将非线性应用于卷积输出的结果。 每个过滤器在整个嵌入中滑动,但是它涵盖的字数有所不同。
VALID
填充意味着我们在没有填充边缘的情况下将过滤器滑过我们的句子,执行给我们输出形状[1,sequence_length - filter_size + 1,1,1]的窄卷积。 在特定过滤器大小的输出上执行最大值池将留下一张张量的形状[batch_size,1,num_filters]。 这本质上是一个特征向量,其中最后一个维度对应于我们的特征。 一旦我们从每个过滤器大小得到所有的汇总输出张量,我们将它们组合成一个长形特征向量[batch_size,num_filters_total]。 在tf.reshape中使用-1可以告诉TensorFlow在可能的情况下平坦化维度。
1.4 Dropout层
Dropout是使卷积神经网络正则化的最受欢迎的方法,Dropout的想法很简单:Dropout层随机“禁用”神经元的一部分,这可以防止神经元共同适应并迫使他们独立学习有用的特征。神经元中启用的比例是由初始化参数中的
dropout_keep_prob
决定的,训练时我们将它定义为0.5,而在测试时定义为1(禁用Dropout)。# Add dropout with tf.name_scope("dropout"): self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
1.5 评估和预测
使用从max-pooling中得到的特征向量(带Dropout),我们可以通过矩阵乘法生成预测并选择得分最高的分类,我们使用softmax将原分数转换为归一化概率,但它并不会改变预测结果。
with tf.name_scope("output"): W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W") b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b") self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores") self.predictions = tf.argmax(self.scores, 1, name="predictions")
其中,
tf.nn.xw_plus
是一个实现矩阵乘法的一个封装方法。 1.6 loss和准确率计算
我们可以使用1.5得到的score来定义loss function。分类问题的标准损失方程为交叉熵损失方程。
# Calculate mean cross-entropy loss with tf.name_scope("loss"): losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, self.input_y) self.loss = tf.reduce_mean(losses)
其中,
tf.nn.softmax_cross_entropy_with_logits
是一个对每个分类计算交叉熵损失的封装方法,通过score和正确分类作为参数,我们可以得到每一类的loss,对其求平均值,可以得到平均损失。我们也定义了准确率函数。
# Calculate Accuracy with tf.name_scope("accuracy"): correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1)) self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
1.7 网络的可视化
到这里,我们已经完成了网络的构建,为了得到网络的可视化图,我们可以使用TensorBoard对网络进行可视化。
<center>图4 网络可视化图</center>1.8 训练
FLAGS = tf.flags.FLAGS with tf.Graph().as_default(): session_conf = tf.ConfigProto( allow_soft_placement=FLAGS.allow_soft_placement, log_device_placement=FLAGS.log_device_placement ) sess = tf.Session(config=session_conf) with sess.as_default():
显式创建graph便于训练结束后释放资源,
1.9 实例化CNN和最小化损失
当我们实例化我们的TextCNN模型时,所有定义的变量和操作将被放置在上面创建的默认图和会话中。
cnn = TextCNN( sequence_length=x_train.shape[1], num_classes=y_train.shape[1], vocab_size=len(vocab_processor.vocabulary) embedding_size=FLAGS.num_filters, filter_sizes = map(int, FLAGS.filter_sizes.split(",")), num_filters = FLAGS.num_filters)
接下来,我们定义如何优化网络的损失函数。 TensorFlow有几个内置优化器。 我们正在使用Adam优化器。
# Define Training procedure global_step = tf.Variable(0,name="global_step",trainable=False) optimizer = tf.train.AdamOptimizer(1e-4) grads_and_vars = optimizer.compute_gradients(cnn.loss) train_op = optimizer.apply_gradients(grads_and_vars,global_step=global_step)
1.10 概览(SUMMARIES)
TensorFlow有一个概述(summaries),可以在训练和评估过程中跟踪和查看各种数值。 例如,您可能希望跟踪您的损失和准确性随时间的变化。您还可以跟踪更复杂的数值,例如图层激活的直方图。 summaries是序列化对象,并使用SummaryWriter写入磁盘。
# Output directory for models and summaries timestamp = str(int(time.time())) out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp)) print("Writing to {}\n".format(out_dir)) # Summaries for loss and accuracy loss_summary = tf.scalar_summary("loss", cnn.loss) acc_summary = tf.scalar_summary("accuracy", cnn.accuracy) # Train Summaries train_summary_op = tf.merge_summary([loss_summary, acc_summary]) train_summary_dir = os.path.join(out_dir, "summaries", "train") train_summary_writer = tf.train.SummaryWriter(train_summary_dir, sess.graph_def) # Dev summaries dev_summary_op = tf.merge_summary([loss_summary, acc_summary]) dev_summary_dir = os.path.join(out_dir, "summaries", "dev") dev_summary_writer = tf.train.SummaryWriter(dev_summary_dir, sess.graph_def)
在这里,我们分别跟踪培训和评估的总结。 在我们的情况下,这些数值是相同的,但是您可能只有在训练过程中跟踪的数值(如参数更新值)。 tf.merge_summary是将多个摘要操作合并到可以执行的单个操作中的便利函数。
1.11 CHECKPOINTING
通常使用TensorFlow的另一个功能是checkpointing- 保存模型的参数以便稍后恢复。Checkpoints 可用于在以后的时间继续训练,或使用 early stopping选择最佳参数设置。 使用Saver对象创建 Checkpoints。
# Checkpointing checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints")) checkpoint_prefix = os.path.join(checkpoint_dir, "model") # Tensorflow assumes this directory already exists so we need to create it if not os.path.exists(checkpoint_dir): os.makedirs(checkpoint_dir) saver = tf.train.Saver(tf.all_variables())
1.12 初始化变量
在训练模型前,我们需要初始化变量。
# Initialize all variables sess.run(tf.global_variables_initializer())
1.13 定义单步训练函数
现在我们来定义一个训练步骤的函数,评估一批数据上的模型并更新模型参数。
def train_step(x_batch,y_batch): """ A single training step """ feed_dict = { cnn.input_x:x_batch, cnn.input_y:y_batch, cnn.dropout_keep_prob:FLAGS.dropout_keep_prob } _,step,summaries,loss,accuracy = sess.run( [train_op,global_step,train_summary_op,cnn.loss,cnn.accuracy],feed_dict ) time_str = datetime.datetime.now().isoformat() print("{}:step{},loss{:g},acc{:g}".format(time_str,step,loss,accuracy)) train_summary_writer.add_summary(summaries,step)
def dev_step(x_batch, y_batch, writer=None): """ Evaluates model on a dev set """ feed_dict = { cnn.input_x: x_batch, cnn.input_y: y_batch, cnn.dropout_keep_prob: 1.0 } step, summaries, loss, accuracy = sess.run( [global_step, dev_summary_op, cnn.loss, cnn.accuracy], feed_dict) time_str = datetime.datetime.now().isoformat() print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy)) if writer: writer.add_summary(summaries, step)
1.14 TRAINING LOOP
下面写训练循环。
# Generate batches batches = data_helpers.batch_iter( zip(x_train, y_train), FLAGS.batch_size, FLAGS.num_epochs) # Training loop. For each batch... for batch in batches: x_batch, y_batch = zip(*batch) train_step(x_batch, y_batch) current_step = tf.train.global_step(sess, global_step) if current_step % FLAGS.evaluate_every == 0: print("\nEvaluation:") dev_step(x_dev, y_dev, writer=dev_summary_writer) print("") if current_step % FLAGS.checkpoint_every == 0: path = saver.save(sess, checkpoint_prefix, global_step=current_step) print("Saved model checkpoint to {}\n".format(path))
2. 模型可优化内容
- 使用word2vec词向量初始化embedding层,为了达到提升,你需要使用300维以上的词向量。
- 限制最后一层权重向量的L2范数,就像原始文献一样。你可以通过定义一个新的操作,在每次训练步骤之后更新权重值。
- 将L2正则化添加到网络以防止过拟合,同时也提高dropout比率。(Github上的代码已经包括L2正则化,但默认情况下禁用)
- 添加权重更新和图层操作的直方图summaries,并在TensorBoard中进行可视化。
后续内容待更
参考文献
-
2018.5.5更新:使用word2vec作为输入进行训练
使用Word2Vec
使用word2vec训练其实早已写完代码,一直没有整理上来。先将修改后的代码放到这里,以后有时间再介绍修改过程。
- text_cnn.py
import tensorflow as tf import numpy as np class TextCNN(object): def __init__( self, sequence_length, num_classes, vocab_size, embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0): # Placeholders for input, output and dropout self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x") self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y") self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob") self.learning_rate = tf.placeholder(tf.float32, name='learning_rate') # Keeping track of l2 regularization loss (optional) l2_loss = tf.constant(0.0) # Embedding layer with tf.device('/cpu:0'), tf.name_scope("embedding"): self.W = tf.Variable( tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="W") self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x) self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1) # Create a convolution + maxpool layer for each filter size pooled_outputs = [] for i, filter_size in enumerate(filter_sizes): with tf.name_scope("conv-maxpool-%s" % filter_size): # Convolution Layer filter_shape = [filter_size, embedding_size, 1, num_filters] W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W") b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b") conv = tf.nn.conv2d( self.embedded_chars_expanded, W, strides=[1, 1, 1, 1], padding="VALID", name="conv") # Apply nonlinearity h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu") # Maxpooling over the outputs pooled = tf.nn.max_pool( h, ksize=[1, sequence_length - filter_size + 1, 1, 1], strides=[1, 1, 1, 1], padding='VALID', name="pool") pooled_outputs.append(pooled) # Combine all the pooled features num_filters_total = num_filters * len(filter_sizes) self.h_pool = tf.concat(pooled_outputs, 3) self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total]) # Add dropout with tf.name_scope("dropout"): self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob) # Final (unnormalized) scores and predictions with tf.name_scope("output"): W = tf.get_variable( "W", shape=[num_filters_total, num_classes], initializer=tf.contrib.layers.xavier_initializer()) b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b") l2_loss += tf.nn.l2_loss(W) l2_loss += tf.nn.l2_loss(b) self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores") self.predictions = tf.argmax(self.scores, 1, name="predictions") # CalculateMean cross-entropy loss with tf.name_scope("loss"): losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y) self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss # Accuracy with tf.name_scope("accuracy"): correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1)) self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
- data_helpers.py
import numpy as np import re import jieba import itertools from collections import Counter from tensorflow.contrib import learn from gensim.models import Word2Vec def clean_str(string): """ Tokenization/string cleaning for all datasets except for SST. Original taken from https://github.com/yoonkim/CNN_sentence/blob/master/process_data.py """ string = re.sub(r"[^A-Za-z0-9(),!?\'\`]", " ", string) string = re.sub(r"\'s", " \'s", string) string = re.sub(r"\'ve", " \'ve", string) string = re.sub(r"n\'t", " n\'t", string) string = re.sub(r"\'re", " \'re", string) string = re.sub(r"\'d", " \'d", string) string = re.sub(r"\'ll", " \'ll", string) string = re.sub(r",", " , ", string) string = re.sub(r"!", " ! ", string) string = re.sub(r"
\(", " \( ", string) string = re.sub(r"\)", " \) ", string) string = re.sub(r"\?", " \? ", string) string = re.sub(r"\s{2,}", " ", string) return string.strip().lower() def load_data_and_labels(train_file_path, test_file_path): """ 获取训练与测试数据 :param train_file_path: :param test_file_path: :return: 分词后的训练句子列表, 训练分类列表, 分词后的测试句子列表, 测试分类列表 """ # Load data from files x_train = list() # 训练数据 y_train_data = list() # y训练分类数据 x_test = list() # 测试数据 y_test_data = list() # y测试分类数据 y_labels = list() # 分类集 # 读取训练数据 with open(train_file_path, 'r', encoding='utf-8') as train_file: for line in train_file.read().split('\n'): sp = line.split('||') if len(sp) != 2: continue x_train.append(' '.join(jieba.cut(sp[0]))) y_train_data.append(sp[1]) # 读取测试数据 with open(test_file_path, 'r', encoding='utf-8') as test_file: for line in test_file.read().split('\n'): sp = line.split('||') if len(sp) != 2: continue x_test.append(' '.join(jieba.cut(sp[0]))) y_test_data.append(sp[1]) # 构建分类列表 for item in y_train_data: if item not in y_labels: y_labels.append(item) labels_len = len(y_labels) print('分类数为: ', labels_len) # 构建训练y y_train = np.zeros((len(y_train_data), labels_len), dtype=np.int) for index in range(len(y_train_data)): y_train[index][y_labels.index(y_train_data[index])] = 1 # 构建测试y y_test = np.zeros((len(y_test_data), labels_len), dtype=np.int) for index in range(len(y_test_data)): y_test[index][y_labels.index(y_test_data[index])] = 1 return [x_train, y_train, x_test, y_test, y_labels] def load_train_dev_data(train_file_path, test_file_path): x_train_text, y_train, x_test_text, y_test, _ = load_data_and_labels(train_file_path, test_file_path) # Load data print("Loading data...") # Build vocabulary max_train_document_length = max([len(x.split(" ")) for x in x_train_text]) max_test_document_length = max([len(x.split(" ")) for x in x_test_text]) max_document_length = max_test_document_length \ if max_test_document_length > max_train_document_length \ else max_train_document_length # 使用VocabularyProcessor处理输入 vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length) x_train = np.array(list(vocab_processor.fit_transform(x_train_text))) x_test = np.array(list(vocab_processor.fit_transform(x_test_text))) # Randomly shuffle data -- 随机搅乱数据 np.random.seed(10) shuffle_indices = np.random.permutation(np.arange(len(y_train))) x_train = x_train[shuffle_indices] y_train = y_train[shuffle_indices] print("Vocabulary Size: {:d}".format(len(vocab_processor.vocabulary_))) print("Train/Dev split: {:d}/{:d}".format(len(y_train), len(y_test))) return x_train, y_train, x_test, y_test, vocab_processor def load_embedding_vectors_word2vec(vocabulary, filename, binary): word2vec_model = Word2Vec.load(filename) embedding_vectors = np.random.uniform(-0.25, 0.25, (len(vocabulary), 200)) for word in word2vec_model.wv.vocab: idx = vocabulary.get(word) if idx != 0: embedding_vectors[idx] = word2vec_model[word] return embedding_vectors def batch_iter(data, batch_size, num_epochs, shuffle=True): """ Generates a batch iterator for a dataset. """ data = np.array(data) data_size = len(data) num_batches_per_epoch = int((len(data)-1)/batch_size) + 1 for epoch in range(num_epochs): # Shuffle the data at each epoch if shuffle: shuffle_indices = np.random.permutation(np.arange(data_size)) shuffled_data = data[shuffle_indices] else: shuffled_data = data for batch_num in range(num_batches_per_epoch): start_index = batch_num * batch_size end_index = min((batch_num + 1) * batch_size, data_size) yield shuffled_data[start_index:end_index] - train.py
#! /usr/bin/env python import tensorflow as tf import numpy as np import os import time import datetime import data_helpers from text_cnn import TextCNN from tensorflow.contrib import learn import yaml # Parameters # ================================================== # Data loading params tf.app.flags.DEFINE_float("dev_sample_percentage", .1, "Percentage of the training data to use for validation") tf.app.flags.DEFINE_string("train_file", "../data/train_data.txt", "Train file source.") tf.app.flags.DEFINE_string("test_file", "../data/test_data.txt", "Test file source.") # Model Hyperparameters tf.app.flags.DEFINE_integer("embedding_dim", 128, "Dimensionality of character embedding (default: 128)") tf.app.flags.DEFINE_string("filter_sizes", "3,4,5", "Comma-separated filter sizes (default: '3,4,5')") tf.app.flags.DEFINE_integer("num_filters", 128, "Number of filters per filter size (default: 128)") tf.app.flags.DEFINE_float("dropout_keep_prob", 0.5, "Dropout keep probability (default: 0.5)") tf.app.flags.DEFINE_float("l2_reg_lambda", 0.0, "L2 regularization lambda (default: 0.0)") # Training parameters tf.app.flags.DEFINE_integer("batch_size", 128, "Batch Size (default: 64)") tf.app.flags.DEFINE_integer("num_epochs", 100, "Number of training epochs (default: 200)") tf.app.flags.DEFINE_integer("evaluate_every", 100, "Evaluate model on dev set after this many steps (default: 100)") tf.app.flags.DEFINE_integer("checkpoint_every", 1000, "Save model after this many steps (default: 100)") tf.app.flags.DEFINE_integer("num_checkpoints", 5, "Number of checkpoints to store (default: 5)") # Misc Parameters tf.app.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement") tf.app.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices") FLAGS = tf.app.flags.FLAGS print("\nParameters:") for attr, value in sorted(FLAGS.__flags.items()): print("{}={}".format(attr.upper(), value)) print("") # Data Preparation # ================================================== # Load data with open("config.yml", 'r') as ymlfile: cfg = yaml.load(ymlfile) print("Loading data...") x_train, y_train, x_test, y_test, vocab_processor = data_helpers.load_train_dev_data(FLAGS.train_file, FLAGS.test_file) embedding_name = cfg['word_embeddings']['default'] embedding_dimension = cfg['word_embeddings'][embedding_name]['dimension'] # Training # ================================================== with tf.Graph().as_default(): session_conf = tf.ConfigProto( allow_soft_placement=FLAGS.allow_soft_placement, log_device_placement=FLAGS.log_device_placement) sess = tf.Session(config=session_conf) with sess.as_default(): cnn = TextCNN( sequence_length=x_train.shape[1], num_classes=y_train.shape[1], vocab_size=len(vocab_processor.vocabulary_), embedding_size=embedding_dimension, filter_sizes=list(map(int, FLAGS.filter_sizes.split(","))), num_filters=FLAGS.num_filters, l2_reg_lambda=FLAGS.l2_reg_lambda) cnn.learning_rate = 0.01 # Define Training procedure global_step = tf.Variable(0, name="global_step", trainable=False) optimizer = tf.train.AdamOptimizer(cnn.learning_rate) grads_and_vars = optimizer.compute_gradients(cnn.loss) train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step) # Keep track of gradient values and sparsity (optional) grad_summaries = [] for g, v in grads_and_vars: if g is not None: grad_hist_summary = tf.summary.histogram("{}/grad/hist".format(v.name), g) sparsity_summary = tf.summary.scalar("{}/grad/sparsity".format(v.name), tf.nn.zero_fraction(g)) grad_summaries.append(grad_hist_summary) grad_summaries.append(sparsity_summary) grad_summaries_merged = tf.summary.merge(grad_summaries) # Output directory for models and summaries timestamp = str(int(time.time())) out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp)) print("Writing to {}\n".format(out_dir)) # Summaries for loss and accuracy loss_summary = tf.summary.scalar("loss", cnn.loss) acc_summary = tf.summary.scalar("accuracy", cnn.accuracy) # Train Summaries train_summary_op = tf.summary.merge([loss_summary, acc_summary, grad_summaries_merged]) train_summary_dir = os.path.join(out_dir, "summaries", "train") train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph) # Dev summaries dev_summary_op = tf.summary.merge([loss_summary, acc_summary]) dev_summary_dir = os.path.join(out_dir, "summaries", "dev") dev_summary_writer = tf.summary.FileWriter(dev_summary_dir, sess.graph) # Checkpoint directory. Tensorflow assumes this directory already exists so we need to create it checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints")) checkpoint_prefix = os.path.join(checkpoint_dir, "model") if not os.path.exists(checkpoint_dir): os.makedirs(checkpoint_dir) saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints) # Write vocabulary vocab_processor.save(os.path.join(out_dir, "vocab")) # Initialize all variables sess.run(tf.global_variables_initializer()) vocabulary = vocab_processor.vocabulary_ initW = data_helpers.load_embedding_vectors_word2vec(vocabulary, cfg['word_embeddings']['word2vec']['path'], cfg['word_embeddings']['word2vec']['binary']) print(initW.shape) sess.run(cnn.W.assign(initW)) def train_step(x_batch, y_batch): """ A single training step """ feed_dict = { cnn.input_x: x_batch, cnn.input_y: y_batch, cnn.dropout_keep_prob: FLAGS.dropout_keep_prob } _, step, summaries, loss, accuracy = sess.run( [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy], feed_dict) time_str = datetime.datetime.now().isoformat() print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy)) train_summary_writer.add_summary(summaries, step) def dev_step(x_batch, y_batch, writer=None): """ Evaluates model on a dev set """ feed_dict = { cnn.input_x: x_batch, cnn.input_y: y_batch, cnn.dropout_keep_prob: 1.0 } step, summaries, loss, accuracy = sess.run( [global_step, dev_summary_op, cnn.loss, cnn.accuracy], feed_dict) time_str = datetime.datetime.now().isoformat() print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy)) if writer: writer.add_summary(summaries, step) if step % FLAGS.batch_size == 0: print('epoch ', step % FLAGS.batch_size) # Generate batches batches = data_helpers.batch_iter( list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs) # Training loop. For each batch... for batch in batches: x_batch, y_batch = zip(*batch) train_step(x_batch, y_batch) current_step = tf.train.global_step(sess, global_step) if current_step % FLAGS.evaluate_every == 0: print("\nEvaluation:") dev_step(x_test, y_test, writer=dev_summary_writer) print("") if current_step % FLAGS.checkpoint_every == 0: path = saver.save(sess, checkpoint_prefix, global_step=current_step) print("Saved model checkpoint to {}\n".format(path))
-
占楼待更。
-
大力顶楼主
-
666666666666666 待会来学习
-
学了CS224n之后才真正透彻理解TextCNN,果然还是要一点一点把基础打好,现在再看就能更好的理解作者为什么要这么做,为什么这样效果就会好。
-
您好,请问一下你这个用word2Vec的数据也是github上的那个数据集吗?还是自己的
-
可以是自己训练的,也可以是别人训练好的,根据具体情况决定
-
@hunto 你这个就是用的是github上的那个英文数据吗?
-
不是。。。。。
-
@hunto 您那个w2vec的data_helper的代码没写完,能给我发一下完整的吗
-
582回来逛逛