AI Study Notes

Contents

I. OpenCV continued — cascade classifiers

II. Evolutionary algorithm in C

III. MNIST handwritten digit recognition with LeNet

IV. The simplest neural network model

V. Multi-Layered Perceptron

1. Gradient Descent Optimization

2. Multi-Layered Perceptrons and Backpropagation

3. Single-layer perceptron model

<1> Create the dataset (X: feature vectors, Y: labels)

<2> Forward pass

<3> Convert outputs to probabilities with softmax

<4> Cross-entropy loss

<5> Loss Minimization Problem and Network Training

<6> Summary of the layer classes

<7> Training the Model

4. Network model

<1> Define the network class

<2> Multi-Layered Models

5. Full code

6. MNIST handwritten digit recognition with a 3-layer network

VI. Neural Network Frameworks

1. Keras

<1> Training One-Layer Network (Perceptron)

① Model definition

② Model compilation (loss function, optimizer, metrics)

③ Training

<2> Multi-Class Classification

<3> Multi-Label Classification

<4> Summary of Classification Loss Functions


Reference: microsoft/AI-For-Beginners: 12 Weeks, 24 Lessons, AI for All! (github.com)

I. OpenCV continued — cascade classifiers

Reference: OpenCV study notes — "Digital Image Processing Based on OpenCV" (CSDN blog)


#include "opencv.hpp"
#include "highgui.hpp"
#include "imgproc.hpp"
#include <vector>
using namespace cv;
using namespace std;
#pragma comment(lib,"opencv_world480d.lib")

VideoCapture capture(0);
Mat image;
CascadeClassifier face_cascade;
// face detection results
vector<Rect> faces;

int main()
{
    Mat frame_gray;
    face_cascade.load("OPENCV_INSTALL_PATH/opencv/sources/data/haarcascades/haarcascade_frontalface_alt.xml");
    while (capture.isOpened())
    {
        capture >> image;
        if (image.empty())break;

        if (waitKey(1) == 27)break;

        // BGR2GRAY
        cvtColor(image, frame_gray, COLOR_BGR2GRAY);

        face_cascade.detectMultiScale(frame_gray, faces);

        for (size_t i = 0; i < faces.size(); i++)
        {
            // draw a box around each detected face
            rectangle(image, faces[i], Scalar(255, 0, 0), 1, 8);
        }

        imshow("Face detection", image);
    }
    return 0;
}

II. Evolutionary algorithm in C

Evolutionary algorithm in C — Introduction to Artificial Intelligence <1> (CSDN blog)

III. MNIST handwritten digit recognition with LeNet

Computer vision assignment, junior year, Computer Science, Xidian University.
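The original section only links to the coursework. For reference, here is a minimal, hypothetical sketch of a LeNet-5 style network for 28×28 MNIST digits in PyTorch; the layer sizes follow the common LeNet-5 convention and are an assumption, not the author's exact assignment code:

import torch
import torch.nn as nn

class LeNet(nn.Module):
    # Classic LeNet-style CNN for 28x28 MNIST digits (sizes are assumptions)
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# quick shape check
print(LeNet()(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])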

IV. The simplest neural network model

A one-layered perceptron: a linear two-class classification model.

Perceptron Model:

    Suppose our model has N features; the input is then a vector of size N. A perceptron is a binary classification model, i.e. it can distinguish between two classes of input data. We will assume that for each input vector x the output of the perceptron is +1 or -1, depending on the class. The output is computed using the formula:

y(x) = f(w^T x)

Training the Perceptron:

    To train the perceptron, we need to find a weight vector w that classifies most of the values correctly, i.e. results in the smallest error. This error is defined by the perceptron criterion:

E(w) = -∑_i w^T x_i t_i

where the sum is taken over those training data points i that result in wrong classification, x_i is the input data, and t_i is -1 or +1 for negative and positive examples accordingly.

    This criterion is considered as a function of the weights w, and we need to minimize it. Often we use a method called gradient descent, in which we start with some initial weights w^(0) and then, at each step, update the weights according to the formula:

w^(t+1) = w^(t) - η∇E(w)

Here η is the so-called learning rate and ∇E(w) denotes the gradient of E. After calculating the gradient, we end up with:

w^(t+1) = w^(t) + η ∑_i x_i t_i


//perceptron.h
#ifndef _PERCEPTRON_H
#define _PERCEPTRON_H
//the simplest neural network model - one-layered perceptron, a linear two-class classification model.
#include<stdio.h>
#include<stdlib.h>
#include<time.h>

#define FREATURE_NUM 2 //number of features (dimension of the input vector)
#define LEARNING_RATE 1 //learning rate

typedef struct input_data{
    double freature[FREATURE_NUM];
    int label;
}input_data;
typedef struct input_dataset{
    input_data* input;
    int set_num;
}input_dataset;

double weight[FREATURE_NUM]={0};

void train(input_dataset dataset,int iteration);
void perceptron(input_data *input);

#endif


//perceptron.c
#include"perceptron.h"

void train(input_dataset dataset,int iteration)
{
    //seed the random number generator
    srand((unsigned)time(NULL));

    int set_num=dataset.set_num;
    int i,j,k;
    for(i=0;i<iteration;i++){
        k=rand()%set_num;
        //stochastic gradient-descent style update on one random sample
        for(j=0;j<FREATURE_NUM;j++)
        {
            weight[j]+=1.0*LEARNING_RATE*dataset.input[k].freature[j]*dataset.input[k].label;
            // printf("%lf %lf\n",weight[j],dataset.input[k].freature[j]);
        }
    }
    return;
}

void perceptron(input_data *input){
    int i;
    double temp;
    for(i=0,temp=0;i<FREATURE_NUM;i++)temp+=weight[i]*input->freature[i];
    if(temp>=0)input->label=1;
    else input->label=-1;

    printf("label:%d\n",input->label);
    return;
}


//main.c
#include<stdio.h>
#include"perceptron.c"

int main(){

    input_data input[4];
    input[0].freature[0]=-3.0;
    input[0].freature[1]=1.0;
    input[0].label=1;
    input[1].freature[0]=-1.0;
    input[1].freature[1]=3.0;
    input[1].label=1;
    input[2].freature[0]=2.0;
    input[2].freature[1]=4.0;
    input[2].label=-1;
    input[3].freature[0]=4.0;
    input[3].freature[1]=-2.0;
    input[3].label=-1;

    input_dataset dataset;
    dataset.input=input;
    dataset.set_num=4;

    train(dataset,10);

    int i;
    for(i=0;i<FREATURE_NUM;i++)printf("%lf\n",weight[i]);

    input_data test;
    scanf("%lf%lf",&test.freature[0],&test.freature[1]);
    perceptron(&test);

    return 0;
}

Python implementation and MNIST handwritten digit recognition (two classes): [NeuralNetworks/03-Perceptron at main](https://github.com/microsoft/AI-For-Beginners/tree/main/lessons/3-NeuralNetworks/03-Perceptron)

(features: 28 px × 28 px)

Implementing an N-class perceptron by training N perceptrons (a sketch follows the list):

  1. Create 10 one-vs-all datasets for all digits
  2. Train 10 perceptrons
  3. Define classify function to perform digit classification
  4. Measure the accuracy of classification and print confusion matrix
  5. [Optional] Create improved classify function that performs the classification using one matrix multiplication.
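A minimal sketch (my own illustration, not the repository's exact code) of step 5: doing the one-vs-all classification with a single matrix multiplication, assuming a hypothetical weight matrix W whose rows are the 10 trained perceptron weight vectors:

import numpy as np

# Assume W has shape (10, 784): one trained perceptron weight vector per digit,
# and X has shape (n_samples, 784): flattened 28x28 images.
def classify(W, X):
    scores = X @ W.T                   # (n_samples, 10): perceptron score for each digit
    return np.argmax(scores, axis=1)   # pick the digit with the highest score

# usage sketch with random placeholders
W = np.random.randn(10, 784)
X = np.random.randn(5, 784)
print(classify(W, X))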

V. Multi-Layered Perceptron

Overview:

we will extend the model above into a more flexible framework, allowing us to:

  • perform multi-class classification in addition to two-class
  • solve regression problems in addition to classification
  • separate classes that are not linearly separable

We will also develop our own modular framework in Python that will allow us to
construct different neural network architectures.

Suppose we have a training dataset X with labels Y, and we need to build a model f that will make the most accurate predictions. The quality of predictions is measured by a loss function. The following loss functions are often used:

  • For regression problems, when we need to predict a number, we can use absolute error ∑_i |f(x^(i)) - y^(i)|, or squared error ∑_i (f(x^(i)) - y^(i))²
  • For classification, we use 0-1 loss (which is essentially the same as the accuracy of the model), or logistic loss.

Judging by how the predicted probability p affects the loss L, the logistic loss is the better choice: it grows without bound as the probability assigned to the true class approaches 0, so confident mistakes are penalized heavily, whereas 0-1 loss provides no useful gradient (a small plotting sketch follows).
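A minimal matplotlib sketch (my own illustration, not from the original notebook) comparing 0-1 loss and logistic loss as functions of the predicted probability p of the true class:

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.01, 0.99, 200)        # predicted probability of the true class
zero_one = (p < 0.5).astype(float)      # 0-1 loss: counted as wrong if p < 0.5
logistic = -np.log(p)                   # logistic (log) loss: -log p

plt.plot(p, zero_one, label="0-1 loss")
plt.plot(p, logistic, label="logistic loss")
plt.xlabel("predicted probability of the true class p")
plt.ylabel("loss")
plt.legend()
plt.show()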

For a one-level perceptron, the function f was defined as a linear function f(x) = wx + b (here w is the weight matrix, x is the vector of input features, and b is the bias vector). For different neural network architectures, this function can take a more complex form.

In the case of classification, it is often desirable to get probabilities of the corresponding classes as the network output. To convert arbitrary numbers to probabilities (i.e. to normalize the output), we often use the softmax function σ, and the function f becomes f(x) = σ(wx + b)

In the definition of f above, w and b are called parameters θ = ⟨w, b⟩. Given the dataset ⟨X, Y⟩, we can compute an overall error on the whole dataset as a function of the parameters θ.

The goal of neural network training is to minimize the error (loss function) by varying the parameters θ.

1. Gradient Descent Optimization

This can be formalized as follows:

  • Initialize parameters by some random values w(0), b(0)
  • Repeat the following step many times:
    • w(i+1) = w(i)-η∂ℒ/∂w
    • b(i+1) = b(i)-η∂ℒ/∂b

During training, the optimization steps are supposed to be calculated considering the whole dataset (remember that loss is calculated as a sum over all training samples). However, in real life we take small portions of the dataset called minibatches, and calculate gradients based on a subset of the data. Because the subset is taken randomly each time, this method is called stochastic gradient descent (SGD). (A small numpy sketch follows.)
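A minimal numpy sketch (my own illustration; the notebook's full implementation follows in the next sections) of one SGD step for a linear model f(x) = wx + b with squared-error loss on a single minibatch:

import numpy as np

np.random.seed(0)
w, b = np.zeros((1, 1)), np.zeros(1)    # parameters
eta = 0.1                               # learning rate

x = np.random.randn(4, 1)               # one minibatch of 4 samples
y = 2 * x[:, 0] + 1                     # targets

pred = (x @ w)[:, 0] + b                # forward pass
err = pred - y                          # prediction error
grad_w = 2 * x.T @ err[:, None] / len(x)   # dL/dw for L = mean((pred - y)^2)
grad_b = 2 * err.mean(keepdims=True)       # dL/db

w -= eta * grad_w                       # w(i+1) = w(i) - eta * dL/dw
b -= eta * grad_b                       # b(i+1) = b(i) - eta * dL/db
print(w, b)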

2. Multi-Layered Perceptrons and Backpropagation

An example: a two-layer perceptron

A one-layer network, as we have seen above, is capable of classifying linearly separable classes. To build a richer model, we can combine several layers of the network. Mathematically it means that the function f has a more complex form and is computed in several steps:

  • z1=w1x+b1
  • z2=w2α(z1)+b2
  • f = σ(z2)

Here, α is a non-linear activation function, σ is a softmax function, and the parameters are θ = ⟨w1, b1, w2, b2⟩.

The gradient descent algorithm would remain the same, but it would be more
difficult to calculate gradients. Given the chain differentiation rule, we can
calculate derivatives as:

  • ∂ℒ/∂w2 = (∂ℒ/∂σ)(∂σ/∂z2)(∂z2/∂w2)
  • ∂ℒ/∂w1 = (∂ℒ/∂σ)(∂σ/∂z2)(∂z2/∂α)(∂α/∂z1)(∂z1/∂w1)

✅ The chain differentiation rule is used to calculate derivatives of the
loss function with respect to parameters.

The chain rule and backpropagation are used to update the parameters θ.

Note that the left-most part of all those expressions is the same, and thus we can effectively calculate derivatives starting from the loss function and going "backwards" through the computational graph. Thus the method of training a multi-layered perceptron is called backpropagation, or 'backprop'.
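A small numpy sketch (my own check, not from the notebook) that verifies the chain-rule gradient ∂L/∂w1 numerically with a finite difference, for a tiny two-layer network with tanh activation and a simple squared-error loss instead of softmax + cross-entropy:

import numpy as np

np.random.seed(0)
x = np.random.randn(3)                 # input
w1, b1 = np.random.randn(4, 3), np.random.randn(4)
w2, b2 = np.random.randn(2, 4), np.random.randn(2)
t = np.array([1.0, 0.0])               # target

def loss(w1):
    z1 = w1 @ x + b1
    z2 = w2 @ np.tanh(z1) + b2
    return 0.5 * np.sum((z2 - t) ** 2)  # simple squared-error loss

# analytic gradient via the chain rule
z1 = w1 @ x + b1
a = np.tanh(z1)
z2 = w2 @ a + b2
dz2 = z2 - t                           # dL/dz2
da = w2.T @ dz2                        # dL/da
dz1 = (1 - a ** 2) * da                # dL/dz1 (derivative of tanh)
dw1 = np.outer(dz1, x)                 # dL/dw1

# numerical check of one entry
eps = 1e-6
w1p = w1.copy(); w1p[0, 0] += eps
num = (loss(w1p) - loss(w1)) / eps
print(dw1[0, 0], num)                  # the two values should be close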


3. Single-layer perceptron model

    The two outputs of the network correspond to the two classes, and the class with the highest value among the two outputs corresponds to the right solution.

The model is defined as:

Dependencies:


import matplotlib.pyplot as plt
from matplotlib import gridspec
from sklearn.datasets import make_classification
import numpy as np
# pick the seed for reproducibility - change it to explore the effects of random variations
np.random.seed(0)
import random

<1> Create the dataset (X: feature vectors, Y: labels):


n = 100
X, Y = make_classification(n_samples = n, n_features=2,
                           n_redundant=0, n_informative=2, flip_y=0.2)
X = X.astype(np.float32)
Y = Y.astype(np.int32)

# Split into train and test dataset
train_x, test_x = np.split(X, [n*8//10])
train_labels, test_labels = np.split(Y, [n*8//10])


# show the dataset
print(train_x[:5])
print(train_labels[:5])


[[-0.836906  -1.382417 ]
 [ 3.0352616 -1.1195285]
 [ 1.6688806  2.4989042]
 [-0.5790065  2.1814067]
 [-0.8730455 -1.4692409]]
[0 1 1 1 0]

<2> Forward pass:


class Linear:
    # initialize the weights
    def __init__(self,nin,nout):
        self.W = np.random.normal(0, 1.0/np.sqrt(nin), (nout, nin))
        self.b = np.zeros((1,nout))
    # forward pass
    def forward(self, x):
        return np.dot(x, self.W.T) + self.b

net = Linear(2,2)
net.forward(train_x[0:5])


# outputs for the 5 inputs
0,  1.772021, -0.253845
1,  0.283708, -0.396106
2, -0.300974,  0.305132
3, -0.812048,  0.560794
4, -1.235197,  0.339497

<3> Convert outputs to probabilities with softmax:


class Softmax:
    def forward(self,z):
        zmax = z.max(axis=1,keepdims=True)
        expz = np.exp(z-zmax)
        Z = expz.sum(axis=1,keepdims=True)
        return expz / Z

softmax = Softmax()
softmax.forward(net.forward(train_x[0:10]))


In case we have more than 2 classes, softmax will normalize probabilities across all of them.
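A quick check (my own addition, using the `net` and `softmax` objects defined above) that each row of the softmax output is a valid probability distribution:

p = softmax.forward(net.forward(train_x[0:10]))
print(p.sum(axis=1))   # every entry should be 1.0 (up to floating-point error)
print((p >= 0).all())  # probabilities are non-negative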

<4> Cross-entropy loss


A loss function in classification is typically a logistic function, which can be generalized as cross-entropy loss. Cross-entropy loss is a function that can calculate similarity between two arbitrary probability distributions.


def cross_ent(prediction, ground_truth):
    t = 1 if ground_truth > 0.5 else 0
    return -t * np.log(prediction) - (1 - t) * np.log(1 - prediction)
plot_cross_ent()   # plotting helper defined in the original notebook


Cross-entropy loss will be defined again as a separate layer, but its forward function will have two input values: the output of the previous layers of the network p, and the expected class y:

Usage:


class CrossEntropyLoss:
    def forward(self,p,y):
        self.p = p
        self.y = y
        p_of_y = p[np.arange(len(y)), y]
        log_prob = np.log(p_of_y)
        return -log_prob.mean() # average over all input samples

cross_ent_loss = CrossEntropyLoss()
p = softmax.forward(net.forward(train_x[0:10]))
cross_ent_loss.forward(p,train_labels[0:10])

IMPORTANT : Loss function returns a number that shows how good (or bad)
our network performs. It should return us one number for the whole dataset,
or for the part of the dataset (minibatch). Thus after calculating cross-
entropy loss for each individual component of the input vector, we need to
average (or add) all components together - which is done by the call to
.mean().

(Note that what is computed is the mean cross-entropy: return -log_prob.mean() # average over all input samples)

z = net.forward(train_x[0:10])                       # raw network outputs
p = softmax.forward(z)                               # softmax normalization
loss = cross_ent_loss.forward(p,train_labels[0:10])  # cross_ent_loss = CrossEntropyLoss()
print(loss)

<5>Loss Minimization Problem and Network Training:

Mathematical formulation:

Computed with gradient descent (see 2. above).

Network training consists of a forward and a backward pass (see 2. and 3.<2> above for the principles).

One pass of the network training consists of two parts:

  • Forward pass , when we calculate the value of loss function for a given input minibatch
  • Backward pass , when we try to minimize this error by distributing it back to the model parameters through the computational graph.

Concrete implementation of the backward pass:

Note that the parameters are updated only after the whole minibatch has been processed, not after each individual sample.


def update(self,lr):
    self.W -= lr*self.dW
    self.b -= lr*self.db
    # lr is the learning rate

<6> Summary of the layer classes


class Linear:
    def __init__(self,nin,nout):
        self.W = np.random.normal(0, 1.0/np.sqrt(nin), (nout, nin))
        self.b = np.zeros((1,nout))
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)

    def forward(self, x):
        self.x=x
        return np.dot(x, self.W.T) + self.b

    def backward(self, dz):
        dx = np.dot(dz, self.W)
        dW = np.dot(dz.T, self.x)
        db = dz.sum(axis=0)
        self.dW = dW
        self.db = db
        return dx

    def update(self,lr):
        self.W -= lr*self.dW
        self.b -= lr*self.db


class Softmax:
    def forward(self,z):
        self.z = z
        zmax = z.max(axis=1,keepdims=True)
        expz = np.exp(z-zmax)
        Z = expz.sum(axis=1,keepdims=True)
        return expz / Z
    def backward(self,dp):
        p = self.forward(self.z)
        pdp = p * dp
        return pdp - p * pdp.sum(axis=1, keepdims=True)

class CrossEntropyLoss:
    def forward(self,p,y):
        self.p = p
        self.y = y
        p_of_y = p[np.arange(len(y)), y]
        log_prob = np.log(p_of_y)
        return -log_prob.mean()
    def backward(self,loss):
        dlog_softmax = np.zeros_like(self.p)
        dlog_softmax[np.arange(len(self.y)), self.y] -= 1.0/len(self.y)
        return dlog_softmax / self.p

<7>Training the Model

    Now we are ready to write the training loop, which will go through our dataset and perform the optimization minibatch by minibatch. One complete pass through the dataset is often called an epoch:


lin = Linear(2,2)
softmax = Softmax()
cross_ent_loss = CrossEntropyLoss()

learning_rate = 0.1

pred = np.argmax(lin.forward(train_x),axis=1)
acc = (pred==train_labels).mean()
print("Initial accuracy: ",acc)

batch_size=4
for i in range(0,len(train_x),batch_size):
    xb = train_x[i:i+batch_size]
    yb = train_labels[i:i+batch_size]

    # forward pass
    z = lin.forward(xb)
    p = softmax.forward(z)
    loss = cross_ent_loss.forward(p,yb)

    # backward pass
    dp = cross_ent_loss.backward(loss)
    dz = softmax.backward(dp)
    dx = lin.backward(dz)
    lin.update(learning_rate)

pred = np.argmax(lin.forward(train_x),axis=1)
acc = (pred==train_labels).mean()
print("Final accuracy: ",acc)


Initial accuracy:  0.2625
Final accuracy:  0.7875

4. Network model

<1> Define the network class

    Since in many cases a neural network is just a composition of layers, we can build a class that allows us to stack layers together and make forward and backward passes through them without explicitly programming that logic. We will store the list of layers inside the `Net` class, and use the `add()` function to add new layers:


class Net:
    def __init__(self):
        self.layers = []

    def add(self,l):
        self.layers.append(l)

    def forward(self,x):
        for l in self.layers:
            x = l.forward(x)
        return x

    def backward(self,z):
        for l in self.layers[::-1]:
            z = l.backward(z)
        return z

    def update(self,lr):
        for l in self.layers:
            if 'update' in l.__dir__():
                l.update(lr)

Define the network and train it:


net = Net()
net.add(Linear(2,2))
net.add(Softmax())
loss = CrossEntropyLoss()

def get_loss_acc(x,y,loss=CrossEntropyLoss()):
    p = net.forward(x)
    l = loss.forward(p,y)
    pred = np.argmax(p,axis=1)
    acc = (pred==y).mean()
    return l,acc

print("Initial loss={}, accuracy={}: ".format(*get_loss_acc(train_x,train_labels)))

def train_epoch(net, train_x, train_labels, loss=CrossEntropyLoss(), batch_size=4, lr=0.1):
    for i in range(0,len(train_x),batch_size):
        xb = train_x[i:i+batch_size]
        yb = train_labels[i:i+batch_size]

        p = net.forward(xb)
        l = loss.forward(p,yb)
        dp = loss.backward(l)
        dx = net.backward(dp)
        net.update(lr)

train_epoch(net,train_x,train_labels)

print("Final loss={}, accuracy={}: ".format(*get_loss_acc(train_x,train_labels)))
print("Test loss={}, accuracy={}: ".format(*get_loss_acc(test_x,test_labels)))


Initial loss=0.8977914474068779, accuracy=0.4625:
Final loss=0.47908832233966514, accuracy=0.825:
Test loss=0.5317198099647931, accuracy=0.8:

<2>Multi-Layered Models


A very important thing to note, however, is that in between linear layers we need to have a non-linear activation function, such as tanh. Without such non-linearity, several linear layers would have the same expressive power as just one layer - because a composition of linear functions is also linear!

Add an activation function between the linear layers; stacking linear functions is still linear.


class Tanh:
    def forward(self,x):
        y = np.tanh(x)
        self.y = y
        return y
    def backward(self,dy):
        return (1.0-self.y**2)*dy

Adding several layers makes sense because, unlike a one-layer network, a multi-layered model will be able to accurately classify sets that are not linearly separable; i.e., a model with several layers will be richer.

It can be demonstrated that with a sufficient number of neurons a two-layered model is capable of classifying any convex set of data points, and a three-layered network can classify virtually any set.

The form of a multi-layer network was given above (see 2.).

A two-layer network example:


net = Net()
net.add(Linear(2,10))
net.add(Tanh())
net.add(Linear(10,2))
net.add(Softmax())
loss = CrossEntropyLoss()

On the difference between a linear model and a complex multi-layered model, and the problem of overfitting:

A linear model:

  • We are likely to get high training loss - so-called underfitting , when the model does not have enough power to correctly separate all data.
  • Validation loss and training loss are more or less the same. The model is likely to generalize well to test data.

Complex multi-layered model

  • Low training loss - the model can approximate training data well, because it has enough expressive power.
  • Validation loss can be much higher than training loss and can start to increase during training - this is because the model “memorizes” training points, and loses the “overall picture”

Summary:

Takeaways

  • Simple models (fewer layers, fewer neurons) with a low number of parameters ("low capacity") are less likely to overfit
  • More complex models (more layers, more neurons on each layer, high capacity) are likely to overfit. We need to monitor validation error to make sure it does not start to rise with further training (see the sketch after this list)
  • More complex models need more data to train on.
  • You can solve the overfitting problem by either:
    • simplifying your model
    • increasing the amount of training data
  • Bias-variance trade-off is a term that shows that you need to find a compromise
    • between the power of the model and the amount of data,
    • between overfitting and underfitting
  • There is no single recipe for how many layers or parameters you need - the best way is to experiment
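A minimal sketch (my own addition, built on the `Net`, `train_epoch` and `get_loss_acc` defined above) of monitoring training and validation loss per epoch, so you can spot the point where validation loss starts rising:

train_hist, val_hist = [], []
for epoch in range(30):
    train_epoch(net, train_x, train_labels)
    train_loss, train_acc = get_loss_acc(train_x, train_labels)
    val_loss, val_acc = get_loss_acc(test_x, test_labels)
    train_hist.append(train_loss)
    val_hist.append(val_loss)
    # simple early-stopping signal: validation loss has clearly moved above its minimum
    if epoch > 5 and val_hist[-1] > min(val_hist) * 1.1:
        print("validation loss is rising - consider stopping / keeping a snapshot of the model")
        break

plt.plot(train_hist, label="train loss")
plt.plot(val_hist, label="validation loss")
plt.legend(); plt.show()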

5. Full code


###################################################################
# package
# matplotlib nbagg
import matplotlib.pyplot as plt
from matplotlib import gridspec
from sklearn.datasets import make_classification
import numpy as np
# pick the seed for reproducibility - change it to explore the effects of random variations
np.random.seed(0)
import random


###################################################################
# dataset
n = 100
X, Y = make_classification(n_samples = n, n_features=2,
                           n_redundant=0, n_informative=2, flip_y=0.2)
X = X.astype(np.float32)
Y = Y.astype(np.int32)

# Split into train and test dataset
train_x, test_x = np.split(X, [n*8//10])
train_labels, test_labels = np.split(Y, [n*8//10])


###################################################################
# layers
class Linear:
    def __init__(self,nin,nout):
        self.W = np.random.normal(0, 1.0/np.sqrt(nin), (nout, nin))
        self.b = np.zeros((1,nout))
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)

    def forward(self, x):
        self.x=x
        return np.dot(x, self.W.T) + self.b

    def backward(self, dz):
        dx = np.dot(dz, self.W)
        dW = np.dot(dz.T, self.x)
        db = dz.sum(axis=0)
        self.dW = dW
        self.db = db
        return dx

    def update(self,lr):
        self.W -= lr*self.dW
        self.b -= lr*self.db

class Tanh:
    def forward(self,x):
        y = np.tanh(x)
        self.y = y
        return y
    def backward(self,dy):
        return (1.0-self.y**2)*dy


class Softmax:
    def forward(self,z):
        self.z = z
        zmax = z.max(axis=1,keepdims=True)
        expz = np.exp(z-zmax)
        Z = expz.sum(axis=1,keepdims=True)
        return expz / Z
    def backward(self,dp):
        p = self.forward(self.z)
        pdp = p * dp
        return pdp - p * pdp.sum(axis=1, keepdims=True)


class CrossEntropyLoss:
    def forward(self,p,y):
        self.p = p
        self.y = y
        p_of_y = p[np.arange(len(y)), y]
        log_prob = np.log(p_of_y)
        return -log_prob.mean()
    def backward(self,loss):
        dlog_softmax = np.zeros_like(self.p)
        dlog_softmax[np.arange(len(self.y)), self.y] -= 1.0/len(self.y)
        return dlog_softmax / self.p


###################################################################
# network
class Net:
    def __init__(self):
        self.layers = []

    def add(self,l):
        self.layers.append(l)

    def forward(self,x):
        for l in self.layers:
            x = l.forward(x)
        return x

    def backward(self,z):
        for l in self.layers[::-1]:
            z = l.backward(z)
        return z

    def update(self,lr):
        for l in self.layers:
            if 'update' in l.__dir__():
                l.update(lr)

def get_loss_acc(x,y,loss=CrossEntropyLoss()):
    p = net.forward(x)
    l = loss.forward(p,y)
    pred = np.argmax(p,axis=1)
    acc = (pred==y).mean()
    return l,acc

def train_epoch(net, train_x, train_labels, loss=CrossEntropyLoss(), batch_size=4, lr=0.1):
    for i in range(0,len(train_x),batch_size):
        xb = train_x[i:i+batch_size]
        yb = train_labels[i:i+batch_size]

        p = net.forward(xb)
        l = loss.forward(p,yb)
        dp = loss.backward(l)
        dx = net.backward(dp)
        net.update(lr)
        print("epoch={}: ".format(i),end="")
        print("Final loss={}, accuracy={}: ".format(*get_loss_acc(train_x,train_labels)))
        print("Test loss={}, accuracy={}: ".format(*get_loss_acc(test_x,test_labels)))

###################################################################
# main
net = Net()
net.add(Linear(2,10))
net.add(Tanh())
net.add(Linear(10,2))
net.add(Softmax())
train_epoch(net,train_x,train_labels)

6. MNIST handwritten digit recognition with a 3-layer network

Train the model and save the result:


###################################################################
# packages
import matplotlib.pyplot as plt
from matplotlib import gridspec
from sklearn.datasets import make_classification
import numpy as np
# pick the seed for reproducibility - change it to explore the effects of random variations
np.random.seed(0)
import random


###################################################################
# dataset
n=70000
# generate data
# X, Y = make_classification(n_samples = n, n_features=28*28, n_redundant=0, n_informative=8*8, flip_y=0.2)
# get data from mnist
from torchvision import datasets, transforms
mnist_train = datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor())
X = mnist_train.data.numpy()
Y = mnist_train.targets.numpy()
X = X.reshape(X.shape[0],-1)
X = X.astype(np.float32)
Y = Y.astype(np.int32)

# Split into train and test dataset
train_x, test_x = np.split(X, [n*8//10]) # 80% training and 20% test
train_labels, test_labels = np.split(Y, [n*8//10])


###################################################################
# layers
class Linear:
    def __init__(self,nin,nout):
        self.W = np.random.normal(0, 1.0/np.sqrt(nin), (nout, nin))
        self.b = np.zeros((1,nout))
        self.dW = np.zeros_like(self.W)
        self.db = np.zeros_like(self.b)

    def forward(self, x):
        self.x=x
        return np.dot(x, self.W.T) + self.b

    def backward(self, dz):
        dx = np.dot(dz, self.W)
        dW = np.dot(dz.T, self.x)
        db = dz.sum(axis=0)
        self.dW = dW
        self.db = db
        return dx

    def update(self,lr):
        self.W -= lr*self.dW
        self.b -= lr*self.db

class Tanh:
    def forward(self,x):
        y = np.tanh(x)
        self.y = y
        return y
    def backward(self,dy):
        return (1.0-self.y**2)*dy


class Softmax:
    def forward(self,z):
        self.z = z
        zmax = z.max(axis=1,keepdims=True)
        expz = np.exp(z-zmax)
        Z = expz.sum(axis=1,keepdims=True)
        return expz / Z
    def backward(self,dp):
        p = self.forward(self.z)
        pdp = p * dp
        return pdp - p * pdp.sum(axis=1, keepdims=True)


class CrossEntropyLoss:
    def forward(self,p,y):
        self.p = p
        self.y = y
        p_of_y = p[np.arange(len(y)), y]
        log_prob = np.log(p_of_y)
        return -log_prob.mean()
    def backward(self,loss):
        dlog_softmax = np.zeros_like(self.p)
        dlog_softmax[np.arange(len(self.y)), self.y] -= 1.0/len(self.y)
        return dlog_softmax / self.p


###################################################################
# network
class Net:
    def __init__(self):
        self.layers = []

    def add(self,l):
        self.layers.append(l)

    def forward(self,x):
        for l in self.layers:
            x = l.forward(x)
        return x

    def backward(self,z):
        for l in self.layers[::-1]:
            z = l.backward(z)
        return z

    def update(self,lr):
        for l in self.layers:
            if 'update' in l.__dir__():
                l.update(lr)

def get_loss_acc(x,y,loss=CrossEntropyLoss()):
    p = net.forward(x)
    l = loss.forward(p,y)
    pred = np.argmax(p,axis=1)
    acc = (pred==y).mean()
    return l,acc

def train_epoch(net, train_x, train_labels, loss=CrossEntropyLoss(), batch_size=4, lr=0.1):
    for i in range(0,len(train_x),batch_size):
        xb = train_x[i:i+batch_size]
        yb = train_labels[i:i+batch_size]

        p = net.forward(xb)
        l = loss.forward(p,yb)
        dp = loss.backward(l)
        dx = net.backward(dp)
        net.update(lr)
        print("epoch={}: ".format(i//batch_size))
        print("Final loss={}, accuracy={}: ".format(*get_loss_acc(train_x,train_labels)))
        print("Test loss={}, accuracy={}: ".format(*get_loss_acc(test_x,test_labels)))

###################################################################
# main
if __name__ == '__main__':
    # model
    net = Net()
    net.add(Linear(28*28,300))
    net.add(Tanh())
    net.add(Linear(300,10))
    net.add(Softmax())
    train_epoch(net,train_x,train_labels,batch_size=1000) 

    #save the model
    import pickle
    with open('model.pkl', 'wb') as f:
        pickle.dump(net, f)    

Load the model and test it:


import OwnFramework
import torchvision
import numpy as np
import pickle
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import random

# import the model
with open('model.pkl', 'rb') as f:
    OwnFramework.net = pickle.load(f)

# test data from mnist
test_data = torchvision.datasets.MNIST('./data', train=False, download=False)
test_x = test_data.data.numpy().reshape(-1,28*28)
test_labels = test_data.targets.numpy()

# test the model
print("Test loss={}, accuracy={}: ".format(*OwnFramework.get_loss_acc(test_x,test_labels)))

# show the images and the predictions
fig=plt.figure(figsize=(8, 8))
gs = gridspec.GridSpec(4, 4)
for i in range(16):
    j=random.randint(0,len(test_x)-1)
    ax = plt.subplot(gs[i])
    ax.imshow(test_x[j].reshape(28,28))
    ax.set_title("Predicted: {}".format(np.argmax(OwnFramework.net.forward(test_x[j:j+1]))))
    ax.axis('off')
plt.show()

# show the images that are not predicted correctly
fig=plt.figure(figsize=(12, 8))
gs = gridspec.GridSpec(4, 4)
i=0
for j in range(len(test_x)):
    if np.argmax(OwnFramework.net.forward(test_x[j:j+1])) != test_labels[j]:
        ax = plt.subplot(gs[i])
        ax.imshow(test_x[j].reshape(28,28))
        ax.set_title("Predicted: {}, True: {}".format(np.argmax(OwnFramework.net.forward(test_x[j:j+1])),test_labels[j]))
        ax.axis('off')
        i+=1
    if i==16:
        break
plt.show()

VI. Neural Network Frameworks

Framework APIs:

To be able to train neural networks efficiently we need to do two things:

  • To operate on tensors , eg. to multiply, add, and compute some functions such as sigmoid or softmax

  • To compute gradients of all expressions, in order to perform gradient descent optimization

    While the numpy library can do the first part, we need some mechanism to compute gradients. In the framework that we developed in the previous section, we had to manually program all derivative functions inside the backward method, which does backpropagation. Ideally, a framework should give us the opportunity to compute gradients of any expression that we can define.

    Another important thing is to be able to perform computations on GPU, or any other specialized compute units, such as [TPU](https://en.wikipedia.org/wiki/Tensor_Processing_Unit). Deep neural network training requires a lot of computations, and being able to parallelize those computations on GPUs is very important.

Low-level and high-level APIs:

    Currently, the two most popular neural frameworks are [TensorFlow](http://tensorflow.org/) and [PyTorch](https://pytorch.org/). Both provide a low-level API to operate with tensors on both CPU and GPU. On top of the low-level API, there is also a higher-level API, called [Keras](https://keras.io/) and [PyTorch Lightning](https://pytorchlightning.ai/) respectively.

| Low-level API | TensorFlow | PyTorch |
|---|---|---|
| High-level API | Keras | PyTorch Lightning |

Low-level APIs in both frameworks allow you to build so-called
computational graphs. This graph defines how to compute the output
(usually the loss function) with given input parameters , and can be
pushed for computation on GPU , if it is available. There are functions to
differentiate this computational graph and compute gradients, which can then
be used for optimizing model parameters.

High-level APIs pretty much consider neural networks as a sequence of layers, and make constructing most neural networks much easier. Training the model usually requires preparing the data and then calling a fit function to do the job.

    The high-level API allows you to construct typical neural networks very quickly without worrying about lots of details. At the same time, low-level APIs offer much more control over the training process, and thus they are used a lot in research, when you are dealing with new neural network architectures.

    It is also important to understand that you can use both APIs together, e.g. you can develop your own network layer architecture using the low-level API and then use it inside a larger network constructed and trained with the high-level API. Or you can define a network using the high-level API as a sequence of layers, and then use your own low-level training loop to perform optimization. Both APIs use the same basic underlying concepts, and they are designed to work well together. (See the sketch below.)
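A minimal sketch (my own illustration, assuming TensorFlow 2.x / Keras and random placeholder data) of mixing the two levels: the model is defined with the high-level Sequential API, while the training step uses a low-level GradientTape loop:

import tensorflow as tf
from tensorflow import keras
import numpy as np

# high-level API: define the model as a sequence of layers
model = keras.models.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(2,)),
    keras.layers.Dense(2, activation='softmax'),
])
loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.SGD(learning_rate=0.1)

# low-level API: a custom training step with GradientTape
x = np.random.randn(8, 2).astype(np.float32)   # placeholder minibatch
y = np.random.randint(0, 2, size=(8,))         # placeholder integer labels
with tf.GradientTape() as tape:
    p = model(x, training=True)
    loss = loss_fn(y, p)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print(float(loss))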

Detecting overfitting:

How to detect overfitting

    As you can see from the graph above, overfitting can be detected by a very low training error and a high validation error. Normally during training we will see both training and validation errors starting to decrease, and then at some point the validation error might stop decreasing and start rising. This is a sign of overfitting, and an indicator that we should probably stop training at this point (or at least make a snapshot of the model). A Keras callback sketch follows.
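In Keras this can be automated with callbacks. A minimal sketch (my own addition using standard Keras callbacks; the data names in the commented fit call are placeholders) that stops training when validation loss stops improving and keeps a snapshot of the best model:

from tensorflow import keras

callbacks = [
    # stop when val_loss has not improved for 3 epochs, and restore the best weights
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    # keep a snapshot of the best model on disk ('best_model.keras' is a placeholder path)
    keras.callbacks.ModelCheckpoint('best_model.keras', monitor='val_loss', save_best_only=True),
]

# usage sketch (x_train, y_train, x_val, y_val are placeholders for your data):
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100, callbacks=callbacks)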

1.Keras

    Keras is a part of the Tensorflow 2.x framework. Let's make sure we have version 2.x.x of Tensorflow installed:


# packages
import tensorflow as tf
from tensorflow import keras
import numpy as np
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
print(f'Tensorflow version = {tf.__version__}')


# data preparation
np.random.seed(0) # pick the seed for reproducibility - change it to explore the effects of random variations

n = 100
X, Y = make_classification(n_samples = n, n_features=2,
                           n_redundant=0, n_informative=2, flip_y=0.05,class_sep=1.5)
X = X.astype(np.float32)
Y = Y.astype(np.int32)

split = [ 70*n//100 ]
train_x, test_x = np.split(X, split)
train_labels, test_labels = np.split(Y, split)

The concept of a tensor (a multi-dimensional array):

A tensor is a multi-dimensional array. It is very convenient to use tensors to represent different types of data:

  • 400x400 - black-and-white picture
  • 400x400x3 - color picture
  • 16x400x400x3 - minibatch of 16 color pictures
  • 25x400x400x3 - one second of 25-fps video
  • 8x25x400x400x3 - minibatch of 8 1-second videos

Tensors give us a convenient way to represent input/output data, as well as the weights inside the neural network (see the shape sketch below).
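A tiny sketch (my own addition) showing how some of those shapes look as TensorFlow tensors:

import tensorflow as tf

picture_bw  = tf.zeros((400, 400))         # black-and-white picture
picture_rgb = tf.zeros((400, 400, 3))      # color picture
minibatch   = tf.zeros((16, 400, 400, 3))  # minibatch of 16 color pictures
print(picture_rgb.shape, minibatch.shape)  # (400, 400, 3) (16, 400, 400, 3)
# a minibatch of 8 one-second 25-fps videos would have shape (8, 25, 400, 400, 3)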

Normalizing data (keeping the values flowing through the network in a standard range):

Before training, it is common to bring our input features to the standard range of [0,1] (or [-1,1]). The exact reasons for that we will discuss later in the course, but in short the reason is the following. We want to avoid values that flow through our network getting too big or too small, and we normally agree to keep all values in a small range close to 0. Thus we initialize the weights with small random numbers, and we keep signals in the same range.


train_x_norm = (train_x-np.min(train_x,axis=0)) / (np.max(train_x,axis=0)-np.min(train_x,axis=0))
test_x_norm = (test_x-np.min(train_x,axis=0)) / (np.max(train_x,axis=0)-np.min(train_x,axis=0))

<1>Training One-Layer Network (Perceptron)

① Model definition

In many cases, a neural network will be a sequence of layers. It can be defined in Keras using the Sequential model in the following manner:


model = keras.models.Sequential()
model.add(keras.Input(shape=(2,)))
model.add(keras.layers.Dense(1))
model.add(keras.layers.Activation(keras.activations.sigmoid))
model.summary()

# or
# Input size, as well as activation function, can also be specified directly in the Dense layer for brevity:
model = keras.models.Sequential()
model.add(keras.layers.Dense(1,input_shape=(2,),activation='sigmoid'))
model.summary()

Notes:

Here, we first create the model, and then add layers to it:

  • First Input layer (which is not strictly speaking a layer) contains the specification of network’s input size

  • Dense layer is the actual perceptron that contains trainable weights

  • Finally, there is a layer with the sigmoid Activation function to bring the result of the network into the 0-1 range (to make it a probability).

    Model: "sequential"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #
    =================================================================
     dense (Dense)               (None, 1)                 3

     activation (Activation)     (None, 1)                 0

    =================================================================
    Total params: 3 (12.00 Byte)
    Trainable params: 3 (12.00 Byte)
    Non-trainable params: 0 (0.00 Byte)


② Model compilation (specify the loss function, optimization method [e.g. gradient descent], and metrics)

Before training the model, we need to compile it, which essentially means specifying:

  • Loss function , which defines how loss is calculated. Because we have two-class classification problem, we will use binary cross-entropy loss.
  • Optimizer to use. The simplest option would be to use sgd for stochastic gradient descent , or you can use more sophisticated optimizers such as adam.
  • Metrics that we want to use to measure the success of our training. Since it is a classification task, a good metric would be Accuracy (or acc for short)

We can specify loss, metrics and optimizer either as strings, or by providing objects from the Keras framework. In our example, we need to specify the learning_rate parameter to fine-tune the learning speed of our model, and thus we provide the full Keras SGD optimizer object.

(Strings or objects can both be used.)


model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.2),loss='binary_crossentropy',metrics=['acc'])

③ Training

After compiling the model, we can do the actual training by calling the fit method. The most important parameters are:

  • x and y specify training data, features and labels respectively

  • If we want validation to be performed on each epoch, we can specify the validation_data parameter, which would be a tuple of features and labels

  • epochs specifies the number of epochs

  • If we want training to happen in minibatches, we can specify the batch_size parameter. You can also pre-batch the data manually before passing it to x/y/validation_data, in which case you do not need batch_size

    model.fit(x=train_x_norm,y=train_labels,validation_data=(test_x_norm,test_labels),epochs=10,batch_size=1)

Note that you can call the fit function several times in a row to further train the network. If you want to start training from scratch, you need to re-run the cell with the model definition.

Note: training is cumulative; to train from scratch, re-define the network.

<2> Multi-Class Classification

    If you need to solve a problem of multi-class classification, your network will have more than one output, corresponding to the number of classes. Each output will contain the probability of a given class.

    When you expect a network to output a set of probabilities, we need all of them to add up to 1. To ensure this, we use softmax as the final activation function on the last layer. Softmax takes a vector input and makes sure that all components of that vector are transformed into probabilities (so that they sum to 1).

    Also, since the output of the network is a C-dimensional vector, we need labels to have the same form. This can be achieved by using one-hot encoding, where the class number i is converted to a vector of zeroes with 1 at the i-th position.

    To compare the probability output of the neural network with the expected one-hot-encoded label, we use the cross-entropy loss function. It takes two probability distributions and outputs a value of how different they are.

So, to summarize what we need to do for multi-class classification with C classes:

  • The network should have C neurons in the last layer

  • Last activation function should be softmax

  • Loss should be cross-entropy loss

  • Labels should be converted to one-hot encoding (this can be done using numpy, or using Keras utils to_categorical)

    model = keras.models.Sequential([
    keras.layers.Dense(5,input_shape=(2,),activation='relu'),
    keras.layers.Dense(2,activation='softmax')
    ])
    model.compile(keras.optimizers.Adam(0.01),'categorical_crossentropy',['acc'])

    Two ways to convert to one-hot encoding

    train_labels_onehot = keras.utils.to_categorical(train_labels)
    test_labels_onehot = np.eye(2)[test_labels]

    hist = model.fit(x=train_x_norm,y=train_labels_onehot,validation_data=[test_x_norm,test_labels_onehot],batch_size=1,epochs=10)

Sparse categorical cross-entropy (integer class labels instead of one-hot labels)

Often labels in multi-class classification are represented by class numbers. Keras also supports another kind of loss function called sparse categorical crossentropy, which expects the class number to be an integer, not a one-hot vector. Using this kind of loss function, we can simplify our training code:


model.compile(keras.optimizers.Adam(0.01),'sparse_categorical_crossentropy',['acc'])
model.fit(x=train_x_norm,y=train_labels,validation_data=[test_x_norm,test_labels],batch_size=1,epochs=10)

<3> Multi-Label Classification

    With multi-label classification, instead of a one-hot encoded vector, we have a vector with 1 in every position corresponding to a class relevant to the input sample. Thus, the output of the network should not contain probabilities normalized across all classes, but rather a probability for each class individually - which corresponds to using the sigmoid activation function. Cross-entropy loss can still be used as a loss function. (A minimal sketch follows.)
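A minimal Keras sketch (my own illustration; n_features and n_classes are hypothetical placeholders) of a multi-label setup with per-class sigmoid outputs and binary cross-entropy applied to each class independently:

from tensorflow import keras

n_features, n_classes = 20, 5   # hypothetical sizes

model = keras.models.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(n_features,)),
    # one independent probability per class - no softmax normalization across classes
    keras.layers.Dense(n_classes, activation='sigmoid'),
])
# binary cross-entropy treats each output as its own two-class problem
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

# labels are multi-hot vectors, e.g. [1, 0, 1, 0, 0] means classes 0 and 2 are present
# model.fit(x_train, y_multi_hot, epochs=10)   # x_train, y_multi_hot are placeholders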

<4> Summary of Classification Loss Functions

    We have seen that binary, multi-class and multi-label classification differ by the type of loss function and activation function on the last layer of the network. It may all be a little bit confusing if you are just starting to learn, but here are a few rules to keep in mind:
  • If the network has one output (binary classification), we use the sigmoid activation function; for multiclass classification - softmax
  • If the output class is represented as one-hot encoding, the loss function will be cross-entropy loss (categorical cross-entropy); if the output contains the class number - sparse categorical cross-entropy. For binary classification - use binary cross-entropy (same as log loss)
  • Multi-label classification is when we can have an object belonging to several classes at the same time. In this case, we need to encode labels using one-hot encoding, and use sigmoid as the activation function, so that each class probability is between 0 and 1.
| Classification | Label Format | Activation Function | Loss |
|---|---|---|---|
| Binary | Probability of 1st class | sigmoid | binary crossentropy |
| Binary | One-hot encoding (2 outputs) | softmax | categorical crossentropy |
| Multiclass | One-hot encoding | softmax | categorical crossentropy |
| Multiclass | Class number | softmax | sparse categorical crossentropy |
| Multilabel | One-hot encoding | sigmoid | binary crossentropy |

2. Tensorflow 2.x + Keras


Tensorflow 2.x + Keras - a newer version of Tensorflow with integrated Keras functionality, which supports a dynamic computation graph, allowing tensor operations very similar to numpy (and PyTorch)


import tensorflow as tf
import numpy as np
print(tf.__version__)

<1> Simple tensor operations

① Creation

You can easily create simple tensors from lists or np-arrays, or generate random ones


# create a constant tensor
a = tf.constant([[1,2],[3,4]])
print(a)
# create a random 10x3 tensor from a normal distribution
a = tf.random.normal(shape=(10,3))
print(a)

② Operations

You can use arithmetic operations on tensors, which are performed element-wise, as in numpy. Tensors are automatically expanded to the required dimension if needed. To extract a numpy array from a tensor, use .numpy(). Some example operations:


print(a-a[0])
print(tf.exp(a)[0].numpy())

<2> Computing gradients

For back propagation, you need to compute gradients. This is done using
tf.GradientTape() idiom:

  • Add with tf.GradientTape() as tape: block around our computations

  • Mark those tensors with respect to which we need to compute gradients by calling tape.watch (all variables are watched automatically)

  • Compute whatever we need (build computational graph)

  • Obtain gradients using tape.gradient

    a = tf.random.normal(shape=(2, 2))
    b = tf.random.normal(shape=(2, 2))

    with tf.GradientTape() as tape:
        tape.watch(a)  # Start recording the history of operations applied to a
        c = tf.sqrt(tf.square(a) + tf.square(b))  # Do some math using a

    # What's the gradient of c with respect to a?
    dc_da = tape.gradient(c, a)
    print(dc_da)

Watch the variables, build the computation graph, then compute the gradients.

<3> Example 1: linear regression

Generate the dataset:


import matplotlib.pyplot as plt
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
import random

np.random.seed(13) # pick the seed for reproducibility - change it to explore the effects of random variations

train_x = np.linspace(0, 3, 120)
train_labels = 2 * train_x + 0.9 + np.random.randn(*train_x.shape) * 0.5

plt.scatter(train_x,train_labels)

Define the model and the loss function:


input_dim = 1
output_dim = 1
learning_rate = 0.1

# This is our weight matrix
w = tf.Variable([[100.0]])
# This is our bias vector
b = tf.Variable(tf.zeros(shape=(output_dim,)))

def f(x):
    return tf.matmul(x,w) + b

def compute_loss(labels, predictions):
    return tf.reduce_mean(tf.square(labels - predictions))

The training function:


def train_on_batch(x, y):
    with tf.GradientTape() as tape:
        predictions = f(x)
        loss = compute_loss(y, predictions)
        # Note that tape.gradient works with a list as well (w, b).
        dloss_dw, dloss_db = tape.gradient(loss, [w, b])
    w.assign_sub(learning_rate * dloss_dw)
    b.assign_sub(learning_rate * dloss_db)
    return loss

Prepare the training data:


# Shuffle the data
indices = np.random.permutation(len(train_x))
features = tf.constant(train_x[indices],dtype=tf.float32)
labels = tf.constant(train_labels[indices],dtype=tf.float32)

Training loop (samples i to i+batch_size form one batch):


batch_size = 4
for epoch in range(10):
    for i in range(0,len(features),batch_size):
        loss = train_on_batch(tf.reshape(features[i:i+batch_size],(-1,1)),tf.reshape(labels[i:i+batch_size],(-1,1)))
    print('Epoch %d: last batch loss = %.4f' % (epoch, float(loss)))

Plot the result:


plt.scatter(train_x,train_labels)
x = np.array([min(train_x),max(train_x)])
y = w.numpy()[0,0]*x+b.numpy()[0]
plt.plot(x,y,color='red')


​ We now have obtained optimized parameters $W$ and $b$. Note that their values are similar to the original values used when generating the dataset (W=2, b=1)

This article is reposted from https://blog.csdn.net/qq_32971095/article/details/137124492; if there is any infringement, please contact the author for removal.

> --------------- THE END -------------- <