Turing Complete?

The point of stating that a mathematical model is Turing complete is to reveal the model's capability to perform any computation given a sufficient (i.e. unbounded) amount of resources, not to show that a specific implementation of the model actually has those resources. A model that is not Turing complete cannot handle certain computations even with unlimited resources, which reveals a difference in how the two kinds of models operate even when their resources are limited. Of course, proving this property requires assuming that the models can use an unbounded amount of resources, but the property remains relevant even when resources are limited.

LSTM Question

Sigmoids make sense for the gates, as they control how much of the signal is let into/out of the cell. Think of it as a percentage: what fraction of the input signal should I store in the cell (or pass out of the cell)? It doesn't make sense to amplify a signal and write 110% of the current cell signal to the output; that's not what the gates are for. Likewise, it doesn't make sense for the input gate to say "the current input is 900% relevant for the memory cell, so please store it 9 times as strongly as usual". If that were the case, the input/output weights would have made the signal 900% stronger to begin with.
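To make the "percentage" picture concrete, here is a rough sketch of a single LSTM step (a generic textbook formulation, not tied to any particular paper; the weight names are placeholders):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b are dicts of per-gate weight matrices and bias vectors (placeholder names).
    i = sigmoid(W['i'].dot(x) + U['i'].dot(h_prev) + b['i'])  # input gate, values in (0, 1)
    f = sigmoid(W['f'].dot(x) + U['f'].dot(h_prev) + b['f'])  # forget gate, values in (0, 1)
    o = sigmoid(W['o'].dot(x) + U['o'].dot(h_prev) + b['o'])  # output gate, values in (0, 1)
    g = np.tanh(W['g'].dot(x) + U['g'].dot(h_prev) + b['g'])  # candidate cell update
    c = f * c_prev + i * g   # the gates scale signals; they never amplify beyond 100%
    h = o * np.tanh(c)       # how much of the cell state to expose as output
    return h, c

Because i, f and o come out of a sigmoid they stay between 0 and 1, which is exactly the "what fraction of the signal do I let through" behavior described above; swapping those sigmoids for ReLUs would allow a gate to scale the cell contents by more than 100%.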

For the output activation, ReLU can of course be used. However, you might easily run into numerical problems, given that gradients often need to be clipped already (and ReLU doesn't dampen them the way sigmoids do). If I recall correctly, Bengio's lab has a paper somewhere in which they use ReLUs for RNNs and report problems of this kind (I may be wrong, though, and I'm unable to find the paper right now).

Also, one of the benefits of ReLUs is that they prevent vanishing gradients. But LSTM was already designed not to suffer from that in the first place. Given that you don't have a vanishing gradient problem, it comes down to the question of whether ReLU is better than sigmoid in principle (because it can learn better functions) or whether its main advantage is simply easier training. Of course, this is a simplified view, especially since LSTM was not originally designed to be "deep": if you stack many layers of LSTMs, you might still get vanishing gradients if you use sigmoids.

Deep Hybrid Models (published at UAI)

Volodymyr Kuleshov, Stefano Ermon

Most methods in machine learning are described as either discriminative or generative. The former often attain higher predictive accuracy, while the latter are more strongly regularized and can deal with missing data.

Here, we propose a new framework to combine a broad class of discriminative and generative models, interpolating between the two extremes with a multiconditional likelihood objective.

Unlike previous approaches, we couple the two components through shared latent variables, and train using recent advances in variational inference.

Instantiating our framework with modern deep architectures gives rise to deep hybrid models, a highly flexible family that generalizes several existing models and is effective in the semi-supervised setting, where it results in improvements over the state of the art on the SVHN dataset.

Thompson Sampling Is Asymptotically Optimal in General Environments

Jan Leike et al

We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments.

These environments can be non-Markov, nonergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption, regret is sublinear.

Improve Your Python: Python Classes and Object Oriented Programming
The class is a fundamental building block in Python. It is the underpinning for not only many popular programs and libraries, but the Python standard library as well. Understanding what classes are, when to use them, and how they can be useful is essential, and the goal of this article. In the process, we’ll explore what the term Object-Oriented Programming means and how it ties together with Python classes.

Everything Is An Object…
What is the class keyword used for, exactly? Like its function-based cousin def, it concerns the definition of things. While def is used to define a function, class is used to define a class. And what is a class? Simply a logical grouping of data and functions (the latter of which are frequently referred to as “methods” when defined within a class).

What do we mean by “logical grouping”? Well, a class can contain any data we’d like it to, and can have any functions (methods) attached to it that we please. Rather than just throwing random things together under the name “class”, we try to create classes where there is a logical connection between things. Many times, classes are based on objects in the real world (like Customer or Product). Other times, classes are based on concepts in our system, like HTTPRequest or Owner.

Regardless, classes are a modeling technique; a way of thinking about programs. When you think about and implement your system in this way, you’re said to be performing Object-Oriented Programming. “Classes” and “objects” are words that are often used interchangeably, but they’re not really the same thing. Understanding what makes them different is the key to understanding what they are and how they work.

...So Everything Has A Class?
Classes can be thought of as blueprints for creating objects. When I define a Customer class using the class keyword, I haven’t actually created a customer. Instead, what I’ve created is a sort of instruction manual for constructing “customer” objects. Let’s look at the following example code:

class Customer(object):
    """A customer of ABC Bank with a checking account. Customers have the
    following properties:

    Attributes:
        name: A string representing the customer's name.
        balance: A float tracking the current balance of the customer's account.
    """

    def __init__(self, name, balance=0.0):
        """Return a Customer object whose name is *name* and starting
        balance is *balance*."""
        self.name = name
        self.balance = balance

    def withdraw(self, amount):
        """Return the balance remaining after withdrawing *amount*
        dollars."""
        if amount > self.balance:
            raise RuntimeError('Amount greater than available balance.')
        self.balance -= amount
        return self.balance

    def deposit(self, amount):
        """Return the balance remaining after depositing *amount*
        dollars."""
        self.balance += amount
        return self.balance

The class Customer(object) line does not create a new customer. That is, just because we've defined a Customer doesn't mean we've created one; we've merely outlined the blueprint to create a Customer object. To do so, we call the class's __init__ method with the proper number of arguments (minus self, which we'll get to in a moment).

So, to use the "blueprint" that we created by defining the class Customer (which is used to create Customer objects), we call the class name almost as if it were a function: jeff = Customer('Jeff Knupp', 1000.0). This line simply says "use the Customer blueprint to create me a new object, which I'll refer to as jeff."

The jeff object, known as an instance, is the realized version of the Customer class. Before we called Customer(), no Customer object existed. We can, of course, create as many Customer objects as we’d like. There is still, however, only one Customer class, regardless of how many instances of the class we create.

self?
So what’s with that self parameter to all of the Customer methods? What is it? Why, it’s the instance, of course! Put another way, a method like withdraw defines the instructions for withdrawing money from some abstract customer’s account. Calling jeff.withdraw(100.0) puts those instructions to use on the jeff instance.

So when we say def withdraw(self, amount):, we're saying, "here's how you withdraw money from a Customer object (which we'll call self) and a dollar figure (which we'll call amount)." self is the instance of the Customer that withdraw is being called on. That's not me making analogies, either. jeff.withdraw(100.0) is just shorthand for Customer.withdraw(jeff, 100.0), which is perfectly valid (if not often seen) code.

__init__

self may make sense for other methods, but what about __init__? When we call __init__, we're in the process of creating an object, so how can there already be a self? Python allows us to extend the self pattern to when objects are constructed as well, even though it doesn't exactly fit. Just imagine that jeff = Customer('Jeff Knupp', 1000.0) is the same as calling jeff = Customer(jeff, 'Jeff Knupp', 1000.0); the jeff that's passed in is also the object that gets handed back as the result.

This is why when we call __init__, we initialize objects by saying things like self.name = name. Remember, since self is the instance, this is equivalent to saying jeff.name = name, which is the same as jeff.name = 'Jeff Knupp'. Similarly, self.balance = balance is the same as jeff.balance = 1000.0. After these two lines, we consider the Customer object "initialized" and ready for use.

Be careful what you __init__

After __init__ has finished, the caller can rightly assume that the object is ready to use. That is, after jeff = Customer('Jeff Knupp', 1000.0), we can start making deposit and withdraw calls on jeff; jeff is a fully-initialized object.

Imagine for a moment we had defined the Customer class slightly differently:

class Customer(object):
    """A customer of ABC Bank with a checking account. Customers have the
    following properties:

    Attributes:
        name: A string representing the customer's name.
        balance: A float tracking the current balance of the customer's account.
    """

    def __init__(self, name):
        """Return a Customer object whose name is *name*."""
        self.name = name

    def set_balance(self, balance=0.0):
        """Set the customer's starting balance."""
        self.balance = balance

    def withdraw(self, amount):
        """Return the balance remaining after withdrawing *amount*
        dollars."""
        if amount > self.balance:
            raise RuntimeError('Amount greater than available balance.')
        self.balance -= amount
        return self.balance

    def deposit(self, amount):
        """Return the balance remaining after depositing *amount*
        dollars."""
        self.balance += amount
        return self.balance

This may look like a reasonable alternative; we simply need to call set_balance before we begin using the instance. There’s no way, however, to communicate this to the caller. Even if we document it extensively, we can’t force the caller to call jeff.set_balance(1000.0) before calling jeff.withdraw(100.0). Since the jeff instance doesn’t even have a balance attribute until jeff.set_balance is called, this means that the object hasn’t been “fully” initialized.
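For example, with this alternative definition the following sketch of the failure mode would blow up, because the balance attribute doesn't exist yet:

jeff = Customer('Jeff Knupp')
jeff.withdraw(100.0)  # raises AttributeError: 'Customer' object has no attribute 'balance'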

The rule of thumb is, don't introduce a new attribute outside of the __init__ method, otherwise you've given the caller an object that isn't fully initialized. There are exceptions, of course, but it's a good principle to keep in mind. This is part of a larger concept of object consistency: there shouldn't be any series of method calls that can result in the object entering a state that doesn't make sense.

Invariants (like, "balance should always be a non-negative number") should hold both when a method is entered and when it is exited. It should be impossible for an object to get into an invalid state just by calling its methods. It goes without saying, then, that an object should start in a valid state as well, which is why it's important to initialize everything in the __init__ method.

Instance Attributes and Methods
A function defined in a class is called a "method". Methods have access to all the data contained on the instance of the object; they can access and modify anything previously set on self. Because they use self, they require an instance of the class in order to be used. For this reason, they're often referred to as "instance methods".

If there are “instance methods”, then surely there are other types of methods as well, right? Yes, there are, but these methods are a bit more esoteric. We’ll cover them briefly here, but feel free to research these topics in more depth.

Static Methods
Class attributes are attributes that are set at the class-level, as opposed to the instance-level. Normal attributes are introduced in the init method, but some attributes of a class hold for all instances in all cases. For example, consider the following definition of a Car object:

class Car(object):

    wheels = 4

    def __init__(self, make, model):
        self.make = make
        self.model = model

mustang = Car('Ford', 'Mustang')
print mustang.wheels
# 4
print Car.wheels
# 4

A Car always has four wheels, regardless of the make or model. Instance methods can access these attributes in the same way they access regular attributes: through self (i.e. self.wheels).

There is a class of methods, though, called static methods, that don’t have access to self. Just like class attributes, they are methods that work without requiring an instance to be present. Since instances are always referenced through self, static methods have no self parameter.

The following would be a valid static method on the Car class:

class Car(object):

    def make_car_sound():
        print 'VRooooommmm!'

No matter what kind of car we have, it always makes the same sound (or so I tell my ten-month-old daughter). To make it clear that this method should not receive the instance as the first parameter (i.e. self on "normal" methods), the @staticmethod decorator is used, turning our definition into:

class Car(object):

    @staticmethod
    def make_car_sound():
        print 'VRooooommmm!'

Class Methods
A variant of the static method is the class method. Instead of receiving the instance as the first parameter, it is passed the class. It, too, is defined using a decorator:

class Vehicle(object):

    @classmethod
    def is_motorcycle(cls):
        return cls.wheels == 2

Class methods may not make much sense right now, but that's because they're used most often in connection with our next topic: inheritance.
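As a quick sketch of why receiving the class (rather than the instance) matters, imagine a hypothetical Vehicle that also defines a wheels class attribute and a Motorcycle subclass that overrides it (these class bodies are my own illustration, not from the article):

class Vehicle(object):

    wheels = 4

    @classmethod
    def is_motorcycle(cls):
        return cls.wheels == 2

class Motorcycle(Vehicle):

    wheels = 2

print Vehicle.is_motorcycle()     # False -- cls is Vehicle here
print Motorcycle.is_motorcycle()  # True  -- cls is Motorcycle here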

Inheritance
While Object-oriented Programming is useful as a modeling tool, it truly gains power when the concept of inheritance is introduced. Inheritance is the process by which a "child" class derives the data and behavior of a "parent" class. An example will definitely help us here.

Imagine we run a car dealership. We sell all types of vehicles, from motorcycles to trucks. We set ourselves apart from the competition by our prices. Specifically, how we determine the price of a vehicle on our lot: $5,000 x number of wheels a vehicle has. We love buying back our vehicles as well. We offer a flat rate minus $0.10 for each mile driven on the vehicle. For trucks, that flat rate is $10,000; for cars, $8,000; for motorcycles, $4,000.

If we wanted to create a sales system for our dealership using Object-oriented techniques, how would we do so? What would the objects be? We might have a Sale class, a Customer class, an Inventory class, and so forth, but we’d almost certainly have a Car, Truck, and Motorcycle class.

What would these classes look like? Using what we’ve learned, here’s a possible implementation of the Car class:

class Car(object):
    """A car for sale by Jeffco Car Dealership.

    Attributes:
        wheels: An integer representing the number of wheels the car has.
        miles: The integral number of miles driven on the car.
        make: The make of the car as a string.
        model: The model of the car as a string.
        year: The integral year the car was built.
        sold_on: The date the vehicle was sold.
    """

    def __init__(self, wheels, miles, make, model, year, sold_on):
        """Return a new Car object."""
        self.wheels = wheels
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on

    def sale_price(self):
        """Return the sale price for this car as a float amount."""
        if self.sold_on is not None:
            return 0.0  # Already sold
        return 5000.0 * self.wheels

    def purchase_price(self):
        """Return the price for which we would pay to purchase the car."""
        if self.sold_on is None:
            return 0.0  # Not yet sold
        return 8000 - (.10 * self.miles)

    ...

OK, that looks pretty reasonable. Of course, we would likely have a number of other methods on the class, but I’ve shown two of particular interest to us: sale_price and purchase_price. We’ll see why these are important in a bit.

Now that we've got the Car class, perhaps we should create a Truck class? Let's follow the same pattern we did for the Car class:

class Truck(object):
    """A truck for sale by Jeffco Car Dealership.

    Attributes:
        wheels: An integer representing the number of wheels the truck has.
        miles: The integral number of miles driven on the truck.
        make: The make of the truck as a string.
        model: The model of the truck as a string.
        year: The integral year the truck was built.
        sold_on: The date the vehicle was sold.
    """

    def __init__(self, wheels, miles, make, model, year, sold_on):
        """Return a new Truck object."""
        self.wheels = wheels
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on

    def sale_price(self):
        """Return the sale price for this truck as a float amount."""
        if self.sold_on is not None:
            return 0.0  # Already sold
        return 5000.0 * self.wheels

    def purchase_price(self):
        """Return the price for which we would pay to purchase the truck."""
        if self.sold_on is None:
            return 0.0  # Not yet sold
        return 10000 - (.10 * self.miles)

    ...

Wow. That's almost identical to the Car class. One of the most important rules of programming (in general, not just when dealing with objects) is "DRY," or "Don't Repeat Yourself," and we've definitely repeated ourselves here. In fact, the Car and Truck classes differ only in the purchase-price constant (aside from their docstrings).

So what gives? Where did we go wrong? Our main problem is that we raced straight to the concrete: Cars and Trucks are real things, tangible objects that make intuitive sense as classes. However, they share so much data and functionality in common that it seems there must be an abstraction we can introduce here. Indeed there is: the notion of Vehicles.

Abstract Classes
A Vehicle is not a real-world object. Rather, it is a concept that some real-world objects (like cars, trucks, and motorcycles) embody. We would like to use the fact that each of these objects can be considered a vehicle to remove repeated code. We can do that by creating a Vehicle class:

class Vehicle(object):
    """A vehicle for sale by Jeffco Car Dealership.

    Attributes:
        wheels: An integer representing the number of wheels the vehicle has.
        miles: The integral number of miles driven on the vehicle.
        make: The make of the vehicle as a string.
        model: The model of the vehicle as a string.
        year: The integral year the vehicle was built.
        sold_on: The date the vehicle was sold.
    """

    base_sale_price = 0

    def __init__(self, wheels, miles, make, model, year, sold_on):
        """Return a new Vehicle object."""
        self.wheels = wheels
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on

    def sale_price(self):
        """Return the sale price for this vehicle as a float amount."""
        if self.sold_on is not None:
            return 0.0  # Already sold
        return 5000.0 * self.wheels

    def purchase_price(self):
        """Return the price for which we would pay to purchase the vehicle."""
        if self.sold_on is None:
            return 0.0  # Not yet sold
        return self.base_sale_price - (.10 * self.miles)
Now we can make the Car and Truck classes inherit from the Vehicle class by replacing object with Vehicle in the line class Car(object). The class in parentheses is the class that is inherited from (object essentially means "no inheritance"; we'll discuss exactly why we write that in a bit).

We can now define Car and Truck in a very straightforward way:

class Car(Vehicle):

    def __init__(self, wheels, miles, make, model, year, sold_on):
        """Return a new Car object."""
        self.wheels = wheels
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on
        self.base_sale_price = 8000

class Truck(Vehicle):

    def __init__(self, wheels, miles, make, model, year, sold_on):
        """Return a new Truck object."""
        self.wheels = wheels
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on
        self.base_sale_price = 10000

This works, but has a few problems. First, we’re still repeating a lot of code. We’d ultimately like to get rid of all repetition. Second, and more problematically, we’ve introduced the Vehicle class, but should we really allow people to create Vehicle objects (as opposed to Cars or Trucks)? A Vehicle is just a concept, not a real thing, so what does it mean to say the following:

v = Vehicle(4, 0, 'Honda', 'Accord', 2014, None)
print v.purchase_price()

A Vehicle doesn't have a base_sale_price; only the individual child classes like Car and Truck do. The issue is that Vehicle should really be an Abstract Base Class. Abstract Base Classes are classes that are only meant to be inherited from; you can't create an instance of an ABC. That means that, if Vehicle is an ABC, the following is illegal:

v = Vehicle(4, 0, 'Honda', 'Accord', 2014, None)

It makes sense to disallow this, as we never meant for vehicles to be used directly. We just wanted to use it to abstract away some common data and behavior. So how do we make a class an ABC? Simple! The abc module contains a metaclass called ABCMeta (metaclasses are a bit outside the scope of this article). Setting a class's metaclass to ABCMeta and making one of its methods abstract makes it an ABC. An abstract method is one that the ABC says must exist in child classes, but that the ABC doesn't necessarily implement itself. For example, the Vehicle class may be defined as follows:

from abc import ABCMeta, abstractmethod

class Vehicle(object):
    """A vehicle for sale by Jeffco Car Dealership.

    Attributes:
        wheels: An integer representing the number of wheels the vehicle has.
        miles: The integral number of miles driven on the vehicle.
        make: The make of the vehicle as a string.
        model: The model of the vehicle as a string.
        year: The integral year the vehicle was built.
        sold_on: The date the vehicle was sold.
    """

    __metaclass__ = ABCMeta

    base_sale_price = 0

    def sale_price(self):
        """Return the sale price for this vehicle as a float amount."""
        if self.sold_on is not None:
            return 0.0  # Already sold
        return 5000.0 * self.wheels

    def purchase_price(self):
        """Return the price for which we would pay to purchase the vehicle."""
        if self.sold_on is None:
            return 0.0  # Not yet sold
        return self.base_sale_price - (.10 * self.miles)

    @abstractmethod
    def vehicle_type(self):
        """Return a string representing the type of vehicle this is."""
        pass

Now, since vehicle_type is an abstractmethod, we can’t directly create an instance of Vehicle. As long as Car and Truck inherit from Vehicle and define vehicle_type, we can instantiate those classes just fine.

Returning to the repetition in our Car and Truck classes, let's see if we can't remove that by hoisting up common functionality to the base class, Vehicle:

from abc import ABCMeta, abstractmethod

class Vehicle(object):
    """A vehicle for sale by Jeffco Car Dealership.

    Attributes:
        wheels: An integer representing the number of wheels the vehicle has.
        miles: The integral number of miles driven on the vehicle.
        make: The make of the vehicle as a string.
        model: The model of the vehicle as a string.
        year: The integral year the vehicle was built.
        sold_on: The date the vehicle was sold.
    """

    __metaclass__ = ABCMeta

    base_sale_price = 0
    wheels = 0

    def __init__(self, miles, make, model, year, sold_on):
        self.miles = miles
        self.make = make
        self.model = model
        self.year = year
        self.sold_on = sold_on

    def sale_price(self):
        """Return the sale price for this vehicle as a float amount."""
        if self.sold_on is not None:
            return 0.0  # Already sold
        return 5000.0 * self.wheels

    def purchase_price(self):
        """Return the price for which we would pay to purchase the vehicle."""
        if self.sold_on is None:
            return 0.0  # Not yet sold
        return self.base_sale_price - (.10 * self.miles)

    @abstractmethod
    def vehicle_type(self):
        """Return a string representing the type of vehicle this is."""
        pass

Now the Car and Truck classes become:

class Car(Vehicle):
    """A car for sale by Jeffco Car Dealership."""

    base_sale_price = 8000
    wheels = 4

    def vehicle_type(self):
        """Return a string representing the type of vehicle this is."""
        return 'car'

class Truck(Vehicle):
    """A truck for sale by Jeffco Car Dealership."""

    base_sale_price = 10000
    wheels = 4

    def vehicle_type(self):
        """Return a string representing the type of vehicle this is."""
        return 'truck'

This fits perfectly with our intuition: as far as our system is concerned, the only difference between a car and truck is the base sale price. Defining a Motorcycle class, then, is similarly simple:

class Motorcycle(Vehicle):
    """A motorcycle for sale by Jeffco Car Dealership."""

    base_sale_price = 4000
    wheels = 2

    def vehicle_type(self):
        """Return a string representing the type of vehicle this is."""
        return 'motorcycle'
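To see the hierarchy in action, here's a short usage sketch (the specific argument values are made up for illustration):

accord = Car(miles=10000, make='Honda', model='Accord', year=2014, sold_on=None)
harley = Motorcycle(miles=1000, make='Harley-Davidson', model='Sportster', year=2015, sold_on=None)

print accord.sale_price()    # 20000.0 (5000.0 * 4 wheels)
print harley.sale_price()    # 10000.0 (5000.0 * 2 wheels)
print accord.vehicle_type()  # car

v = Vehicle(0, 'Honda', 'Civic', 2014, None)  # TypeError: can't instantiate abstract class Vehicle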

Inheritance and the LSP
Even though it seems like we used inheritance to get rid of duplication, what we were really doing was simply providing the proper level of abstraction. And abstraction is the key to understanding inheritance. We've seen how one side-effect of using inheritance is that we reduce duplicated code, but what about from the caller's perspective? How does using inheritance change that code?

Quite a bit, it turns out. Imagine we have two classes, Dog and Person, and we want to write a function that takes either type of object and prints out whether or not the instance in question can speak (a dog can’t, a person can). We might write code like the following:

def can_speak(animal):
    if isinstance(animal, Person):
        return True
    elif isinstance(animal, Dog):
        return False
    else:
        raise RuntimeError('Unknown animal!')

That works when we only have two types of animals, but what if we have twenty, or two hundred? That if...elif chain is going to get quite long.

The key insight here is that can_speak shouldn't care what type of animal it's dealing with; the animal class itself should tell us whether it can speak. By introducing a common base class, Animal, that defines can_speak, we relieve the function of its type-checking burden. Now, as long as it knows it was an Animal that was passed in, determining if it can speak is trivial:

def can_speak(animal):
    return animal.can_speak()

This works because Person and Dog (and whatever other classes we create to derive from Animal) follow the Liskov Substitution Principle. This states that we should be able to use a child class (like Person or Dog) wherever a parent class (Animal) is expected and everything will work fine. This sounds simple, but it is the basis for a powerful concept we'll discuss in a future article: interfaces.
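For completeness, here's a minimal sketch of what such a hierarchy might look like (the exact class bodies are my own illustration, not code from the article):

class Animal(object):

    def can_speak(self):
        raise NotImplementedError

class Person(Animal):

    def can_speak(self):
        return True

class Dog(Animal):

    def can_speak(self):
        return False

def can_speak(animal):
    return animal.can_speak()

print can_speak(Person())  # True
print can_speak(Dog())     # False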

Summary
Hopefully, you've learned a lot about what Python classes are, why they're useful, and how to use them. The topic of classes and Object-oriented Programming is insanely deep. Indeed, it reaches to the core of computer science. This article is not meant to be an exhaustive study of classes, nor should it be your only reference. There are literally thousands of explanations of OOP and classes available online, so if you didn't find this one suitable, certainly a bit of searching will reveal one better suited to you.

As always, corrections and arguments are welcome in the comments. Just try to keep it civil.

Lastly, it’s not too late to see me speak at the upcoming Wharton Web Conference at UPenn! Check the site for info and tickets.

GAN Tutorial

This report summarizes Ian Goodfellow's tutorial on generative adversarial networks at NIPS. Its contents are: (1) why generative modeling is a topic worth studying; (2) how generative models work, and how GANs compare to other generative models; (3) how GANs work; (4) research frontiers in GANs; and (5) state-of-the-art image models that combine GANs with other methods. Finally, it provides three exercises for the reader and solutions to those exercises.

Introduction

This report summarizes the tutorial on generative adversarial networks presented at NIPS 2016. It answers most of the questions raised by audience members beforehand and tries to be as useful as possible. It is not, of course, a complete review of the GAN field; many excellent papers are not covered, simply because they were not aimed at the questions the audience asked, and the tutorial was a two-hour talk, so there was not enough time to cover every topic.

This report gives: (1) why generative modeling is a field worth studying; (2) how generative models work, and how GANs compare to other generative models; (3) the details of how GANs work; (4) research frontiers in GANs; and (5) combinations of GANs with other methods that yield state-of-the-art image models. Finally, the report includes three exercises and their solutions.

The slides that accompany the tutorial are also available, in both pdf and keynote formats: http://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf and http://www.iangoodfellow.com/slides/2016-12-04-NIPS.key

Figure 1

Figure 2

The tutorial video was recorded by NIPS and should become available in due course.

Generative adversarial networks are one kind of generative model. The term "generative model" is used in many ways. In this tutorial it refers to any model that takes a training set, consisting of samples drawn from a distribution p_data, and learns to represent an estimate of that distribution. The result is a probability distribution p_model. In some cases, the model estimates p_model explicitly, as shown in Figure 1. In other cases, the model is only able to generate samples from p_model, as shown in Figure 2. Some models are able to do both. Although GANs are designed to be able to do both, we focus here on GANs' sample generation.

1 Why study generative modeling?

One might legitimately wonder why generative models are worth studying, especially generative models that are only capable of generating data rather than providing an estimate of the density function. After all, when applied to images, such models seem to merely provide more images, and the world has no shortage of images.

Here are several reasons to study and research generative models:

  • Training generative models and sampling from them is an excellent test of our ability to represent and manipulate high-dimensional probability distributions. High-dimensional probability distributions are important objects of study in a wide variety of applied mathematics and engineering domains.
  • Generative models can be incorporated into reinforcement learning in several ways. Reinforcement learning algorithms are often divided into two categories, model-based and model-free, and model-based algorithms are the ones that contain a generative model. Generative models of time-series data can be used to simulate possible futures; such models can be used for planning and for reinforcement learning in a variety of ways. A generative model used for planning can learn a conditional distribution over future states of the world, given the current state of the world and hypothetical actions the agent might take as input. The agent can query the model with different candidate actions and choose the action that the model predicts is most likely to lead to the desired state. A recent example of such a model is the work of Finn et al., and an example of using it for planning is the work of Finn and Levine. Another way generative models can be used for reinforcement learning is to enable learning in an imaginary environment, where mistaken actions do not cause real harm to the agent. Generative models can also be used to guide exploration, by keeping track of how often different states have been visited or different actions have been attempted. GANs in particular can be used for inverse reinforcement learning; their connection to reinforcement learning is discussed further in Section 5.6.
  • Generative models can be trained with missing data and can provide predictions on the missing parts of the input. A particularly interesting case of missing data is semi-supervised learning, in which the labels for many (or even most) training examples are missing. Modern deep learning algorithms typically require very many labeled examples in order to generalize well. Semi-supervised learning is one strategy for reducing the number of required labels: the learning algorithm can improve its generalization by studying a large number of unlabeled examples, which are usually easier to obtain. Generative models, and GANs in particular, perform semi-supervised learning quite well. This is discussed in Section 5.4.

Figure 3

  • Generative models, and GANs in particular, enable machine learning to work with multi-modal outputs. For many tasks, a single input may correspond to many different correct answers, each of which is acceptable. Some traditional ways of training machine learning models, such as minimizing the mean squared error between a desired output and the model's predicted output, are not able to train models that can produce multiple different correct answers. One example of such a scenario is predicting the next frame of a video, as shown in Figure 3.
  • Finally, many tasks intrinsically require samples drawn from some distribution.

Some examples of such tasks are listed below:

  • Single image super-resolution: the goal of this task is to take a low-resolution image as input and synthesize a high-resolution image with the same content. Generative modeling lets the model impute the information that ought to have been in the input image. There are many possible high-resolution images corresponding to the same low-resolution image; the model should choose an image that is a sample from the probability distribution over possible images. Choosing an image that is an average of all possible images would yield a result that is too blurry. See Figure 4.

Figure 4

  • Tasks where the goal is to create art. Two recent projects have shown that generative models, and GANs in particular, can be used to create interactive programs that assist the user in creating realistic images corresponding to rough sketches of the scenes they imagine. See Figure 5 and Figure 6.

Figure 5

Figure 6

  • Image-to-image translation applications can convert aerial photos into maps or convert sketches into images. There are many difficult but useful creative applications of this kind. See Figure 7.

Figure 7

All of these and other applications of generative models show that it is worth investing the time and resources needed to improve their performance.

Early Inference in Energy-Based Models Approximates Back-Propagation

Early Inference in Energy-Based Models Approximates Back-Propagation (Arxiv)
Author: Yoshua Bengio, CIFAR Senior Fellow
Montreal Institute for Learning Algorithms, University of Montreal

Abstract

This paper shows that Langevin MCMC inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary distribution, correspond to propagating error gradients into the inner layers, much like back-propagation. The error that is back-propagated corresponds to visible units that are subject to an external driving force pushing them away from the stationary distribution. The back-propagated error gradients correspond to temporal derivatives of the hidden units' activations. This finding could serve as a theoretical starting point for explaining how brains perform credit assignment in deep hierarchies as efficiently as back-propagation does. In this theory, continuous-valued latent variables correspond to averaged voltage potentials (averaged over time, spikes, and the neurons in the same minicolumn), and neural computation corresponds simultaneously to approximate inference and to back-propagation of errors.

A minicolumn is one of the smallest structural units of the human brain. A cortical minicolumn is a vertical column through the cortical layers of the brain, comprising perhaps 80-120 neurons, except in the primate primary visual cortex (V1), where there are typically more than twice that number.

1. Introduction

Many scientists have advanced the following hypothesis: given the state of the sensory information (current and past inputs), neurons perform a form of inference, i.e., they move toward configurations that better "explain" the sensory information. We can think of the configuration of the internal neurons (latent variables, or hidden units) as an explanation of the observed sensory data.

Under this hypothesis, when an unpredicted signal arrives at the visible neurons, the other neurons, which were sitting in a stochastic equilibrium, change their own state to reflect the change in the input. If this error signal slowly drives the visible neurons toward the externally observed values, it perturbs the whole network. In this paper, we consider what happens early on, as the perturbation caused by the incorrect prediction at the visible neurons propagates toward the inner areas of the brain. We show that the propagation of this perturbation is mathematically equivalent to the propagation of activation gradients in the back-propagation algorithm for deep neural networks.

This result assumes that the latent variables are real-valued (unlike in the Boltzmann machine), so that the system is an energy-based model, defined by an energy function over the visible and hidden neurons. It also assumes that the neurons are noise-injected leaky integrators, so that neural computation gradually goes down the energy while injecting noise, which corresponds to inference by Langevin MCMC.

leaky integrator: A kind of deliberately imperfect integration used in numerical work is called leaky integration. The name derives from the analogous situation of electrical circuits, where the voltage on a capacitor is the integral of the current: in real life, some of the current leaks away. An equation to model leaky integration is dx/dt = -Ax + C.

Of course, this is only a stepping stone. Much remains to be clarified before we have a complete theory of learning, inference and credit assignment that is both biologically plausible and sensible from a machine learning point of view (global optimization of the whole network, not limited to learning at the visible neurons). In particular, energy-based models require symmetric connections, although note that the hidden units of the model do not have to correspond one-to-one to actual neurons in the brain (for example, a hidden unit could correspond to a group of neurons in a cortical microcircuit). How such symmetry could arise from the learning procedure itself also needs to be investigated. Another interesting question concerns the learning algorithm itself: as outlined in this paper, synapses change in proportion to the stochastic gradient with respect to the prediction error, and such synaptic updates mimic spike-timing-dependent plasticity (STDP) if they resemble the minimization of a local objective function; see Bengio et al. (2015b).

2. Neural Computation as Inference: Going Down in Energy

We consider the following hypothesis: the core of what neural computation does (the behavior of the neurons over a short time window during which the synaptic weights stay essentially constant) is iterative inference. Iterative inference means that the hidden units $h$ of the network gradually change toward configurations that are more probable given the sensory input $x$, according to the current model of the world (given the current parameter settings). In other words, they approximately move toward configurations that are more probable under $P(h|x)$, and eventually sample from $P(h|x)$.

Before establishing the connection between Boltzmann machines or energy-based models and back-propagation, let us go through some of the mathematical details of how neural computation can be interpreted as inference.

2.1 Leaky Integrator Neurons as an Analogue of Langevin MCMC

First, let us look at the classical leaky integrator neural computation equation. Let $s_t$ denote the state of the system at time $t$, a vector with one element per unit, where $s_{t,i}$ is the real value associated with unit $i$, corresponding to a time-integrated voltage potential.

Let $x_t$ denote the visible units, the subset of $s_t$ that is driven by the external input, and $h_t$ the set of hidden units, so that $s_t = (x_t, h_t)$. Let $f$ be the function that computes the new value of $s_t$ from the previous complete state, with $f = (f_x, f_h)$ denoting the parts of $f$ that output the predictions on the visible and hidden units respectively. The temporal evolution of the hidden units is required to follow the leaky integration equation, i.e.

$$h_{t+1} = h_t + \epsilon\,\big(R_h(\tilde{s}_t) - h_t\big)$$

where $R(\tilde{s}_t)$ represents the pressure that the network exerts on each neuron, i.e. $R_i(s)$ is the value toward which the rest of the network asks neuron $i$ to move, and $\tilde{s}$ accounts for synaptic noise and spiking effects. Here we use a crude noise model:

We see that the above equations correspond to a discretization of the following differential equation:

which drives $h$ exponentially fast toward the target value $R_h(s)$, along with a random walk contributed by $\eta$. We assume that $R_i(s)$ is a weighted sum of the input signals coming from the neurons connected to neuron $i$, although the derivation below does not depend on the specific form of $R$, only on the requirement that it corresponds to the gradient of an energy function.

2.2 A Machine Learning Interpretation

In Equation (1), $R(\tilde{s}_t)$ represents a guess for a new configuration, and $R(\tilde{s}_t) - s_t$ the noisy direction of the move. The noise-free move would be $R(s_t) - s_t$, but injecting noise is important in order to explore the full distribution $P(h|x)$ rather than a single mode.

There is an interesting connection here to recent work on unsupervised learning with denoising autoencoders and denoising score matching. If $R(s)$ is a linear combination of the firing rates $\rho(s)$, that line of work establishes a relationship between $R(s) - s$ and the energy of a probability model $P(s) \propto e^{-E(s)}$, namely that $R(s) - s \propto -\frac{\partial E(s)}{\partial s}$.

Under this interpretation, the leaky integrator neural computation of Equation (1) corresponds to Langevin MCMC:

where for the last line we used Equations (3) and (2), so that the gradient is taken at $\tilde{s}_t$. Hence, seen from the point of view of the noisy state $\tilde{s}$, the update equation looks as follows:

which is like a step along the gradient of the energy with "learning rate" $\epsilon$ and added "noise" $\eta_{t+1} - (1-\epsilon)\eta_t$.

2.3 A Possible Energy Function

To fix ideas and illustrate the possibility of a driving function $R$ that corresponds to the gradient of an energy function, we introduce the following energy function, similar to that of the Boltzmann machine but with a continuous nonlinearity:

where $W_{i,j}$ is the weight from unit $j$ to unit $i$ and $\rho$ is the neural nonlinearity, some monotonically increasing bounded function with values between 0 and 1 that corresponds to a firing rate. With this energy function, the driving function $R$ is

$$R_i(s) = \rho'(s_i)\Big(b_i + \sum_j W_{i,j}\,\rho(s_j)\Big).$$

To obtain this we have assumed that $W_{i,j} = W_{j,i}$. Otherwise we would get $R_i(s) = \rho'(s_i)(b_i + \sum_j \frac{1}{2}(W_{i,j} + W_{j,i})\rho(s_j))$, which is equivalent.

This formula is similar to the usual weighted sum of firing rates, except for the novel factor $\rho'(s_i)$, which says that when a neuron is saturated (either shut off or firing at its maximal rate), external inputs have no effect on its state. The remaining term in the neural update equation (1) is the one that drives the state toward 0, pulling the neuron out of saturation so that it can again reflect changes coming from outside, as long as $\rho$ is not saturated around 0. This idea is developed further below.
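As a rough numerical sketch of the relaxation dynamics described in this section (assuming symmetric weights W, taking rho to be the logistic sigmoid as one admissible monotone bounded choice, and using arbitrary values for the step size epsilon and noise scale sigma; clamping of the visible units to observed data is omitted):

import numpy as np

def rho(s):
    return 1.0 / (1.0 + np.exp(-s))   # a monotone, bounded firing-rate nonlinearity

def rho_prime(s):
    return rho(s) * (1.0 - rho(s))

def R(s, W, b):
    # Driving function: a weighted sum of presynaptic firing rates, gated by saturation.
    return rho_prime(s) * (b + W.dot(rho(s)))

def relaxation_step(s, W, b, epsilon=0.1, sigma=0.01):
    s_noisy = s + sigma * np.random.randn(*s.shape)   # crude model of synaptic/spiking noise
    return s + epsilon * (R(s_noisy, W, b) - s)       # noisy move toward R, with a leak toward 0

Iterating relaxation_step corresponds to the noisy relaxation toward lower-energy configurations discussed above.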

2.3.1 A Fixed-Point Property

In particular, there is an interesting fixed-point property of the state dynamics.

This means that

Let us consider the hypothesis that some units are saturated, i.e. that $\rho'(s_i) \equiv 0$. In that case $R_i(s) = 0$ and the neural update equation becomes

which converges to $s_i = 0$. If the derivative is clearly nonzero around the origin ($|\rho'(0)| > 0$), we conclude that the network cannot be at a fixed point while $\rho'(s_i) = 0$; otherwise the state would be driven toward 0 and could not remain at a fixed point.

3. The Link to Back-Propagation

We now present the main result: neural computation, seen as inference in an energy-based model, back-propagates prediction-error gradients.

3.1 Propagation of Perturbations

Suppose the network is at equilibrium, i.e. at a fixed point of Equation (8), so that the average gradient of the energy is 0 and the average weight update is also 0.
To make the link with supervised back-propagation simpler, let us consider two kinds of visible units: input units $x$ and output units $y$, so that $s = (x, y, h)$. Suppose we start by letting the network settle to a fixed point with $x$ clamped to the observed input values. We then obtain an output $\hat{y}$ at the fixed point $\hat{s}$, where $R_y(\hat{s}) = \hat{y}$ and $R_h(\hat{s}) = \hat{h}$. Equivalently, we have:

The free units (hidden and output) have settled to values that agree with the input units.

Now suppose that a target value $y$ is observed, gradually driving the output units away from their fixed-point value $\hat{y}$ and toward $y$. This happens because the output units are also leaky integrator neurons, whose state gradually changes according to their input signal, now driven toward $y$ rather than $R_y(\hat{s})$. We denote by

the resulting initial change of $\hat{y}$ from time step 0 to 1 (following Equation (1), but with $y$ substituted for $R_y(\hat{s})$). This moves the global state $\hat{s}$ away from equilibrium, into a region where $\partial E(s)/\partial s$ is nonzero. Initially, the only part of the STDP objective function (Equation 22) that is nonzero is the prediction error:

which corresponds to the difference between the prediction $R_y(\hat{s})$ and the target value $y$, or equivalently

since at equilibrium $R_y(\hat{s}) = \hat{y}$. Note that

where $\epsilon$ can be seen as a learning rate if we were doing SGD directly on $\hat{y}$. Now, how does the rest of the network react to this external perturbation? Each hidden unit should move approximately along the gradient of $C$, but only the neurons directly connected to the output units feel the pressure to minimize $C$ at first. This perturbation (which in real neurons would take the form of spike-based signals) then propagates to the next ring of neurons, those directly connected to $h_1$, and so on.

Let us look at the details. Consider a typical MLP structure, with connections between the output layer $\hat{y}$ and the top hidden layer $\hat{h}_1$, between $\hat{h}_1$ and $\hat{h}_2$, and so on. The change $\Delta y$ is transmitted to $\hat{h}_1$ through the neural update; ignoring the effect of the injected noise, this produces a change in $\hat{h}_1$ of

If $\Delta y$ is small (by assumption, the visible units are only gradually driven toward their target), we can approximate the above using a Taylor expansion of $R$ around $\hat{y}$,

Checking that at the fixed point $R_{h_1}(\hat{s}) = \hat{h}_1$, we obtain

Hence we have

To obtain back-propagation, the quantity we need is the one that corresponds to an application of the chain rule, so that $\Delta h_1 \approx \partial C/\partial \hat{h}_1$. The good news is that this equality does hold, because $R$ is the first derivative of a function tied to the energy function:

3.2 Stochastic Gradient Descent Weight Update

The above result was inspired by, and agrees with, Hinton (2007): temporal change can encode back-propagated gradients. What weight update would turn the $\Delta \hat{h}_k$ obtained at layer $k$ into stochastic gradient descent on the prediction error $||y-\hat{y}||^2$? Since the rate of change of the state, $\dot{s}$, represents the gradient of the prediction error with respect to $s$, the weight change $\Delta W_{i,j}$ required by SGD should be proportional to the rate of change of the post-synaptic neuron's state, $\dot{s}_i$, and to the pre-synaptic firing rate, $\rho(s_j)$, i.e. to $\partial s_i/\partial W_{i,j}$:

This means that such a learning rule can reproduce the relationship between spike timing and synaptic change seen in STDP, as shown in Bengio et al. (2015a). It yields a stochastic gradient step on a predictive objective function

Hence, following this STDP-compatible learning rule, the changes of the weights in the inner layers should be proportional to the back-propagation update, since they correspond to $\Delta h\, \partial h/\partial W$. Note, however, the multiplier $\epsilon^{k+2}$ on $\hat{h}_k$ at layer $k$, which makes the initial change much slower for layers that are farther from the output. This is because the leaky integrator neurons need time to integrate the incoming information, so in practice the change of $\hat{h}_k$ only becomes clearly visible after a number of time constants proportional to $k$, unless the learning rates are adjusted accordingly.

Although we have seen that the proposed neural dynamics and weight updates behave very much like back-propagation, many differences remain, especially when we consider what happens over more time steps. But perhaps the most important message of this paper is the following: we know that back-propagation works extremely well for both supervised and unsupervised learning. Here, back-propagation essentially corresponds to a variational update in the regime where inference is infinitesimal, i.e. we run only a single step of inference, corresponding to a small move in the direction of decreasing energy.

Related Work, Contributions and Future Work

This work was strongly inspired by Hinton's (2007) idea that the brain could implement back-propagation by using temporal derivatives to represent activation gradients, combined with the hypothesis that this, together with STDP, yields stochastic gradient descent on the synaptic weights.

The idea that neural computation corresponds to a form of stochastic relaxation toward lower-energy configurations is an old one (Hinton and Sejnowski, 1986), together with its Gibbs-sampling realization; for more recent developments see Berkes et al. (2011). The difference with Boltzmann machines here is that we consider the state space to be continuous (associated with the expected voltage potential, integrating out the stochastic effects of spikes) rather than discrete. We also consider very small steps (down the gradient of the energy), which is more like Langevin MCMC, rather than allowing each neuron to stochastically jump to a more probable configuration given its neighbors' configuration, as in Gibbs sampling.

There are also many papers discussing STDP from a theoretical standpoint; the reader may refer to Markram et al. (2012). However, much work remains to connect STDP to an unsupervised learning objective function that could be used to train not only a single layer of a network (as with PCA and Hebbian updates) but also a deep unsupervised model. Many approaches (Fiete and Seung, 2006; Rezende and Gerstner, 2014) rely on a reinforcement-learning style estimate of the gradient of a global objective function (obtained by correlating random variations at each neuron with changes in the global objective). Although these principles are simple, it is not clear that they scale to very large networks, since the variance of the estimator grows linearly with the number of neurons. It is therefore worth exploring other avenues, and we hope that the energy-based view and the variational-inference perspective put forward here can form the basis of a more efficient, STDP-consistent principle for unsupervised learning in deep networks.

A practical outcome of this work is the prediction that synaptic weight changes should vanish when the post-synaptic firing rate stays constant, as seen in Equation (21). This should make for interesting biological experiments.

Much work remains to be done to obtain a complete probabilistic theory of unsupervised learning that is consistent with STDP, but we believe we have presented some interesting elements. One aspect that requires further development is how the STDP objective function helps us fit the sensory observations $x$. If, as hypothesized above, neural computation is approximate inference (Langevin MCMC), then each step of inference brings us, on average, toward more probable configurations of $h$; each step moves approximately in the direction of decreasing energy of $P(h|x)$. Now, in an EM or variational-EM setting, with $x$ fixed as the data whose distribution we wish to model, the target toward which the parameters should be updated is the joint distribution of $h \sim P(h|x)$ and $x$, which we call $Q(h,x)$ (the inference distribution), following the terminology of Neal and Hinton (1999), Kingma and Welling (2014) and Bengio et al. (2015b). We conjecture that by minimizing a predictive criterion such as $J_{\mathrm{STDP}}$, the model parameters move in a direction that makes the model more consistent with $Q(h,x)$, which would maximize a variational EM bound on the data likelihood $P(x)$. The idea is that we change the inference process so that it reaches its final state faster, that state corresponding to a configuration of $h$ that fits the observation $x$ well.

Another open question is how to reconcile the weight symmetry required by the energy function with the fact that $W_{i,j}$ and $W_{j,i}$ are physically stored at two separate locations in biological neurons. An encouraging observation is that early work on autoencoders empirically showed that, even when the forward and backward weights are not tied, they tend to converge to symmetric values; in the linear case, minimizing reconstruction error actually yields symmetric weights (Vincent et al., 2010).

Acknowledgments

The authors would like to thank Benjamin Scellier, Asja Fischer, Thomas Mesnard, Saizheng Zhang, Yuhuai Wu, Dong-Hyun Lee, Jyri Kivinen, Jorg Bornschein, Roland Memisevic and Tim Lillicrap for feedback and discussions, as well as NSERC, CIFAR, Samsung and Canada Research Chairs for funding.

References

  1. Alain, G. and Bengio, Y. (2013). What regularized autoencoders learn from the data generating distribution. In ICLR’2013. also arXiv report 1211.4246.
  2. Andrieu, C., de Freitas, N., Doucet, A., and Jordan, M. (2003). An introduction to MCMC for machine learning. Machine Learning, 50, 5–43.
  3. Bengio, Y., Mesnard, T., Fischer, A., Zhang, S., and Wu, Y. (2015a). An objective function for stdp. arXiv:1509.05936.
  4. Bengio, Y., Lee, D.-H., Bornschein, J., and Lin, Z. (2015b). Towards biologically plausible deep learning. arXiv:1502.04156.
  5. Berkes, P., Orban, G., Lengyel, M., and Fiser, J. (2011). Spontaneous cortical activity reveals hallmarks of an optimal internal model of the environment. Science, 331, 83–87.
  6. Fiete, I. R. and Seung, H. S. (2006). Gradient learning in spiking neural networks by dynamic perturbations of conductances. Physical Review Letters, 97(4).
  7. Friston, K. J. and Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159, 417–458.
  8. Hinton, G. E. (2007). How to do backpropagation in a brain. Invited talk at the NIPS’2007 Deep Learning Workshop.
  9. Hinton, G. E. and Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, pages 282–317. MIT Press, Cambridge, MA.
  10. Kingma, D. P. and Welling, M. (2014). Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR).
  11. Markram, H., Gerstner, W., and Sjöström, P. (2012). Spike-timing-dependent plasticity: A comprehensive overview. Frontiers in Synaptic Plasticity, 4(2).
  12. Neal, R. and Hinton, G. (1999). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models. MIT Press, Cambridge, MA.
  13. Rezende, D. J. and Gerstner, W. (2014). Stochastic variational learning in recurrent spiking networks. Frontiers in Computational Neuroscience, 8(38).
  14. Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7).
  15. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Machine Learning Res., 11.
  16. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.

Elementary RL

Q-Learning

Stochastic gradient update rule:

$$\hat{Q}_{\mathrm{opt}}(s,a) \leftarrow \hat{Q}_{\mathrm{opt}}(s,a) - \eta [\color{red}{\underbrace{\hat{Q}_{\mathrm{opt}}(s,a)}_\mathrm{prediction}} - \color{green}{\underbrace{(r + \gamma\hat{V}_{\mathrm{opt}}(s'))}_\mathrm{target}}]$$

This is rote memorization: each $\hat{Q}_{\mathrm{opt}}(s,a)$ gets its own separate value.

$\color{red}{\text{The problem}}$ is that this cannot generalize to unseen states/actions.

Looking back at the Q-learning algorithm from a machine learning perspective, you will notice that we simply treat each $(s,a)$ pair independently and memorize its Q-value.

In other words, we cannot generalize at all, and generalization is really the most essential part of learning!
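Concretely, the rote tabular update above amounts to something like the following dictionary-based sketch (the values of eta and gamma are arbitrary); every (s, a) key is updated independently, which is exactly why nothing generalizes:

from collections import defaultdict

Q = defaultdict(float)   # Q[(s, a)] starts at 0 for every unseen pair
eta, gamma = 0.1, 0.95   # step size and discount factor (arbitrary values)

def v_opt(s_next, actions):
    # \hat{V}_opt(s') = max over a' of \hat{Q}_opt(s', a'); 0 if s' is terminal
    return max(Q[(s_next, a)] for a in actions) if actions else 0.0

def q_update(s, a, r, s_next, actions):
    prediction = Q[(s, a)]
    target = r + gamma * v_opt(s_next, actions)
    Q[(s, a)] = prediction - eta * (prediction - target)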

Function Approximation

$\color{red}{\text{Key idea: linear regression model}}$
Define features $\color{blue}{\phi(s,a)}$ and weights $\color{blue}{\mathbf{w}}$:
$\hat{Q}_{\mathrm{opt}}(s,a;\mathbf{w}) = \color{blue}{\mathbf{w} \cdot \phi(s,a)}$

$\color{brown}{\text{Example: features for the volcano-crossing problem}}$

  • $\phi_1(s,a) = \mathbf{1}[a=W]$
  • $\phi_2(s,a) = \mathbf{1}[a=E]$
  • $\phi_7(s,a) = \mathbf{1}[s = (5,*)]$
  • $\phi_8(s,a) = \mathbf{1}[s = (*,6)]$
  • Function approximation solves the problem above by parameterizing $\hat{Q}_{\mathrm{opt}}$ with a weight vector and a feature vector, just as in linear regression.

  • Features can be thought of as properties of the state-action pair $(s,a)$ that serve as measures of the quality of taking action $a$ in state $s$.
  • The consequence is that all state-action pairs with similar features will have similar Q-values. For example, suppose $\phi$ contains the feature $\mathbf{1}[s=(*,4)]$. If we are in state $(1,4)$, take action $\mathbf{E}$, and obtain a high reward, then Q-learning with function approximation will propagate this positive signal to taking any action from any position in column 4.
  • In this example, we define features on actions (to capture the fact that moving east is generally a good idea) and features on states (to capture the fact that the 6th column is best avoided and that the 5th row is a good place to be).

$\color{blue}{\text{Algorithm: Q-learning with function approximation}}$
For each $(s,a,r,s')$:
$\mathbf{w} \leftarrow \mathbf{w} - \eta[\color{red}{\underbrace{\hat{Q}_{\mathrm{opt}}(s,a;\mathbf{w})}_\mathrm{prediction}} - \color{green}{\underbrace{(r + \gamma\hat{V}_{\mathrm{opt}}(s'))}_\mathrm{target}}] \color{blue}{\phi(s,a)}$

Objective function:
$$\min_\mathbf{w} \sum_{(s,a,r,s')}\Big(\color{red}{\underbrace{\hat{Q}_{\mathrm{opt}}(s,a;\mathbf{w})}_\mathrm{prediction}} - \color{green}{\underbrace{(r + \gamma\hat{V}_{\mathrm{opt}}(s'))}_\mathrm{target}}\Big)^2$$

  • Now we have turned linear regression into an algorithm: here, the stochastic gradient approach used in RL just works.
  • We simply wrote down the least-squares objective and then computed the gradient with respect to $\mathbf{w}$ rather than with respect to $\hat{Q}_{\mathrm{opt}}$ as before; the chain rule takes care of the rest.
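A minimal sketch of the function-approximation version, where phi(s, a) is some problem-specific feature map returning a numpy array (left abstract here):

import numpy as np

def q_opt(w, phi, s, a):
    # \hat{Q}_opt(s, a; w) = w . phi(s, a)
    return np.dot(w, phi(s, a))

def q_update_fa(w, phi, s, a, r, s_next, actions, eta=0.1, gamma=0.95):
    prediction = q_opt(w, phi, s, a)
    target = r + gamma * max(q_opt(w, phi, s_next, a2) for a2 in actions)
    # Gradient of the squared objective with respect to w is (prediction - target) * phi(s, a)
    return w - eta * (prediction - target) * phi(s, a)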

Deep Reinforcement Learning

Definition: use a neural network to approximate $\hat{Q}_{\mathrm{opt}}$
Playing Atari games [Google DeepMind, 2013]:

  • last 4 frames (images) => 3-layer network => joystick control
  • $\epsilon$-greedy, trained on 10 million frames with a replay memory of 1 million transitions
  • human-level performance on some games (Breakout), somewhat worse on others (Space Invaders)

  • Recently there has been renewed interest in reinforcement learning thanks to the success of deep learning. If one runs reinforcement learning in a simulator, it is possible to generate a large amount of data, which plays to the strengths of neural networks.

  • A recent success story comes from DeepMind, who trained a neural network to represent $\hat{Q}_{\mathrm{opt}}$ for playing Atari games. The impressive part is that no prior knowledge was involved: the network simply takes the raw images as input and outputs the choice of joystick action.

Dealing with the unknown world

  • Epsilon-greedy: balance exploration and exploitation (see the sketch below)
  • Function approximation: generalize to unseen states
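A sketch of the epsilon-greedy choice mentioned above (Q here is any Q-value lookup, e.g. the dictionary or the linear model from the previous sections; the value of epsilon is arbitrary):

import random

def epsilon_greedy(s, actions, Q, epsilon=0.1):
    # With probability epsilon explore a random action, otherwise exploit the current estimate.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])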

Summary so far

  • Online setting: learn and take actions in the real world!
  • Exploration/exploitation tradeoff
  • Monte Carlo: estimate transitions, rewards, and Q-values from data
  • Bootstrapping: update toward a target that depends on an estimate rather than just the raw data
