Recently, there has been increased interest in Artificial Intelligence (AI), both in the mainstream news and in the world of Intellectual Property. From completely self-driving cars that seem to be just around the proverbial if not the actual corner, to the European Patent Office organizing a seminar on patenting AI – the technology, or at least the promise thereof, seems to be everywhere.
With increasing sophistication of AI comes increasing complexity, and this complexity poses challenges for the patent practitioner. How can AI related inventions be adequately patented? This article will explore not so much the claiming of AI inventions (much has recently been written on that already) but rather the disclosing of AI inventions, in particular in view of the requirement that a patent application disclose an invention in sufficient detail for a skilled person to work the invention.
Where this article talks about sufficiency of disclosure, the meaning of the term as in Art 83 EPC and related case law is intended. For the US, similar requirements apply, and the remarks made here with respect to sufficiency of disclosure may be translated to the US practice as well.
OVERVIEW OF ARTIFICIAL INTELLIGENCE TECHNOLOGIES
It is important to distinguish between different forms of AI, since the main topic of this article, sufficiency of disclosure, is not equally relevant to every form of AI. We follow the overview of AI in Goodfellow, Bengio and Courville’s “Deep Learning” book (MIT Press, 2016), as illustrated in the Venn diagram of Figure 1.
Figure 1: Venn diagram of Artificial Intelligence techniques, from Goodfellow, Bengio, Courville "Deep Learning", MIT Press (2016).
On the outside, a generic example of AI is the knowledge base (also known as an expert system), which is essentially a store of data and a set of rules for drawing logical conclusions from this data. Both the data and the rules must be supplied by operators of the AI.
The next level is Machine Learning, in which the AI uses input and output data presented by an operator and tries to find a rule (for example using logistic regression) that maps the input data to the output data, so that it can make predictions for input data for which no output data is available. The procedure of finding a rule is typically called “training” and the data used “training data”.
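As a rough illustration of what “finding a rule” means in practice, the following minimal sketch trains a one-feature logistic regression by gradient descent. The training data, learning rate and number of iterations are hypothetical values chosen for illustration only and do not come from any cited source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(inputs, labels, lr=0.5, epochs=2000):
    """Fit weight w and bias b so that sigmoid(w*x + b) approximates the labels."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(inputs, labels):
            err = sigmoid(w * x + b) - y  # prediction error for this example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Apply the learned rule to an input for which no output is known."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Hypothetical training data: one input feature, binary output.
train_x = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
train_y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(train_x, train_y)
```

The learned “rule” here is nothing more than the two numbers w and b. This is why such a model is comparatively easy to describe in full.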
Next in sophistication is “Representation Learning”, which is a specific form of Machine Learning. Compared to the logistic regression example above, the AI now also learns to transform the input data to a form that is better suited for the specific problem at hand.
This is a vital step when the data becomes more unstructured. For example, suppose an AI model should be trained to recognize if a digital image shows a cat or a dog. Without representation learning, a team of specialists would be needed for this task: a biologist to list key physiological differences between cats and dogs, a graphical artist to draw these differences from various viewpoints (front, back, side, etc.), a mathematician to design a means of calculating how well the drawings match the pixels in the digital image, and a programmer to implement the matching algorithm. The contribution of the Machine Learning, devising a rule to resolve the detected match values into a binary cat-or-dog output, would be a relatively minor feat.
Representation learning AI technologies can provide superior results compared to basic Machine Learning. However, this is also where sufficiency of disclosure may become a significant factor. Representation learning comes up with a representation which may not be readily understood, and at least may be difficult to describe.
Finally, Deep Learning is a subset of Representation Learning using a model with a number of layers (the “depth”). A general term is “Multi-Layer Perceptron” (MLP) which essentially means that a number of relatively simple mathematical operations are applied, each operation adding a layer.
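The idea of stacking relatively simple operations can be sketched as follows. This is a minimal illustration only: the layer sizes and weights are hypothetical, and the choice of a ReLU function between layers is an assumption made for the example, not something prescribed by the text above.

```python
def relu(values):
    """Simple non-linearity: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: a weighted sum of the inputs plus a bias, per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Apply each (weights, biases) layer in turn, with ReLU between layers."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:  # no ReLU after the final layer
            activations = relu(activations)
    return activations

# Hypothetical two-layer model: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 2.0, -1.0]], [0.5]),
]
output = mlp_forward([2.0, 1.0], layers)  # [4.5]
```

Each additional entry in `layers` adds one more layer of the same simple arithmetic, which is what the “depth” of the model refers to.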
A slightly more whimsical (but perhaps illuminating) definition is that older AI technologies such as knowledge bases and logistic regression are typically useful for “things that are hard for humans, but easy for computers” (think of applying predetermined rules to large data sets and least squares optimizations), while newer AI technologies such as representation learning and in particular deep learning are useful for “things that are easy for humans, but hard for computers” (think of pattern recognition, image processing, natural language processing, etc).
The Deep Learning AI technologies have seen dramatic improvements in this century and are the main reason for the current excitement around AI. In the following, we will focus on Deep Learning.
CLAIMING ARTIFICIAL INTELLIGENCE
Much has been written already on this topic by various commentators. A quick summary of the situation for European patent applications will now be given, with a few comments on the situation in the US.
Fundamental AI technology
In a nutshell, the EPO has indicated that the approach it has developed for Computer Implemented Inventions (CII) also applies to Artificial Intelligence. In effect, this means that an AI enabled invention can be patentable provided that the claimed technical features are inventive (that is, any claimed non-technical features are not considered for inventive step). Any claimed AI related features as such are not considered technical (being mathematical in nature) and are only considered to contribute to an inventive step if they support a technical effect or purpose.
This approach immediately closes the door on the patentability of fundamental AI algorithms (by which is meant an AI algorithm that is not directly coupled to a specific application). While this is certainly understandable in the case of AI technologies involving relatively simple and well-known mathematics (such as logistic regression), it is in the author’s opinion at least questionable whether this treatment is also suitable for the far more complicated, multi-layered models of deep learning, even though every layer by itself is still mathematical in nature.
While patentability of fundamental AI algorithms is effectively ruled out in Europe, in the US the door seems slightly ajar. The “two-prong approach” of the Mayo framework will work on the assumption (“prong one”) that a fundamental AI algorithm as a mathematical concept is an “abstract idea” and thus not eligible for patenting, but (“prong two”) provides a way out in that a claim is eligible if “the claim, as a whole, integrates the recited judicial exception into a practical application of that exception” (source: 2019 Revised Patent Subject Matter Eligibility Guidance).
Applied AI technologies
When AI technology is used for a technical goal, this can in principle be claimed in Europe. However, the question of inventive step will depend largely on the definition of “technical”, which can sometimes be surprising for those not fluent in EPO case law. For example:
- Computer implemented method using AI algorithms for classifying digital images, videos, audio or speech signals > patentable (technical purpose)
- Computer implemented method using AI algorithm for classifying text documents > not patentable (linguistic purpose)
Both examples come from the EPO Guidelines for Examination, Part G, section II 3.3.1 “Artificial intelligence and machine learning”. With respect to classifying text documents, the Guidelines cite EPO Board of Appeal case T 1358/09 and remark that “classifying text documents solely in respect of their textual content is however not regarded to be per se a technical purpose but a linguistic one”.
There is a list of purposes that are considered “technical” by the EPO, which can be found in the Guidelines for Examination, G II 3.3. As a general rule, any purpose that is related to one of the exclusions of patentability under Article 52(2) EPC will be considered non-technical. The most notable exclusions are mathematical methods (the reason for excluding fundamental AI technologies), methods for performing mental acts or doing business (this rules out most applications of AI in finance), and presentations of information.
In the US, the situation under the Mayo framework seems slightly different. The purpose of the invention does not play as prominent a role as in Europe; what matters instead is whether the claim relates to a practical application of a judicial exception.
Now that we more or less know which type of AI enabled invention may be claimed in Europe or the US, it is time to finally move on to the requirements for disclosure.
DISCLOSING ARTIFICIAL INTELLIGENCE
Basic structure of an AI model
A typical example of a Deep Learning model is a Convolutional Neural Network (CNN). An example CNN is shown below (from: Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, 2013). These types of models are widely used for image analysis, but can also be applied to video analysis, natural language processing, drug discovery, etc.
Figure 2: Diagram indicating layers of a Convolutional Neural Network, from Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks" (2013).
Without going into too much detail, a diagram such as the example in Figure 2 will tell the “skilled person” exactly how many layers the model has, and how each layer should be configured. Programming libraries for AI models have also developed a handy shorthand for defining models, which could be used to disclose the model’s structure in a patent application (see e.g. Keras’ Model API).
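To give an impression of such a layer-by-layer shorthand, the sketch below lists a model's structure as plain data and derives the output shape of each layer. It deliberately does not use Keras itself: the dictionary keys, the `describe` helper and all layer sizes are hypothetical and chosen for illustration, and the model shown is not the network of Figure 2.

```python
# Hypothetical layer-by-layer specification (not the network of Figure 2).
MODEL_SPEC = [
    {"type": "conv", "filters": 8, "kernel": 3, "stride": 1},
    {"type": "pool", "size": 2},
    {"type": "conv", "filters": 16, "kernel": 3, "stride": 1},
    {"type": "pool", "size": 2},
    {"type": "dense", "units": 2},
]

def describe(spec, height, width, channels):
    """Return one text line per layer, tracking the output shape of each."""
    lines = []
    for number, layer in enumerate(spec, start=1):
        if layer["type"] == "conv":
            # Unpadded convolution: (size - kernel) // stride + 1 positions.
            height = (height - layer["kernel"]) // layer["stride"] + 1
            width = (width - layer["kernel"]) // layer["stride"] + 1
            channels = layer["filters"]
            shape = f"{height}x{width}x{channels}"
        elif layer["type"] == "pool":
            # Non-overlapping pooling divides height and width by the pool size.
            height //= layer["size"]
            width //= layer["size"]
            shape = f"{height}x{width}x{channels}"
        else:  # dense layer applied to the flattened input
            shape = str(layer["units"])
        lines.append(f"layer {number}: {layer['type']} -> {shape}")
    return lines

for line in describe(MODEL_SPEC, height=28, width=28, channels=1):
    print(line)
```

A listing of this kind carries the same information as a diagram: the number of layers, their order and the configuration of each layer.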
If a claimed AI model uses a component that is not standard (for example, a custom activation layer), this novel component would of course have to be described exactly, in mathematical form, pseudo code or actual computer code. This, too, should present no major challenge. The same holds for novel optimization schemes or non-standard feedback loops.
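By way of example, an exact description of a non-standard component in actual computer code could be as short as the following sketch. The activation function shown, including its name and its default parameter values, is purely hypothetical and serves only to show the level of detail involved.

```python
def clipped_leaky_activation(x, slope=0.1, ceiling=6.0):
    """Hypothetical custom activation.

    Returns slope * x for negative x, x itself for 0 <= x <= ceiling,
    and ceiling for larger x.
    """
    if x < 0.0:
        return slope * x
    return min(x, ceiling)
```

A few lines like these leave no room for doubt about how the component behaves for any input, which is the point of an exact description.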
Training and trained coefficients
Once the basic model is adequately described, the skilled person still does not have enough information to implement the model. In other words, the invention is not yet sufficiently disclosed. What is still needed is at least one of:
- A description of the way the model is trained, including a reference to the training data; or
- Every learned coefficient/weight of the model
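The two options listed above can be illustrated with a deliberately tiny model. In this minimal sketch, the training procedure, the data and the rounding of the disclosed values are all hypothetical. It also assumes a fully deterministic training procedure, whereas real deep learning training usually involves random initialization, so that re-training may not reproduce the original coefficients exactly.

```python
def train(data, lr=0.05, epochs=2000):
    """Deterministic gradient descent fitting y = w*x + b to (x, y) pairs."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_linear(coefficients, x):
    w, b = coefficients
    return w * x + b

# Hypothetical training data, standing in for the applicant's data set.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Option 1: the skilled person re-runs the described training on the data.
retrained = train(training_data)

# Option 2: the skilled person copies coefficients printed in the description
# (simulated here by rounding the applicant's own trained values).
applicant_w, applicant_b = train(training_data)
disclosed = (round(applicant_w, 3), round(applicant_b, 3))
```

Either route ends with a usable set of coefficients. Only the first route also leaves the skilled person able to train a modified model.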
The importance of the above cannot be overstated. The only parts of the structure in a deep learning model (such as the one shown in Figure 2) that are predetermined are the input image at the beginning and the output values at the end. All of the layers between input and output are called “hidden layers”, which already conveys the notion that it is not well known what exactly occurs in these layers after they have been formed during training. In fact, the earlier cited Zeiler and Fergus paper is considered a very important paper in the field for the very reason that it was one of the first to actually investigate and visualize what these layers do in a fully trained model. Figure 3 shows a few examples of the (surprising) specializations that emerged in the trained filters of the layers of Figure 2.
Figure 3: Examples of filter visualizations in a fully trained CNN, showing image features which elicit a large response from the various filters. In layer 1 (left), only basic edge detectors are found (such as might be designed by hand), layer 3 (middle) shows mostly pattern detectors, while in layer 5 (right) very specialized filters that trigger on assorted recurring features of the training set such as dog's faces, bicycle wheels, and bird eyes, have (surprisingly) emerged (from Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks" (2013)).
Regarding option 1, describing the method of training is generally not very hard. The training data can be a mixture of publicly available data (such as ImageNet data) combined with domain specific data (which will typically not be publicly available). It can also consist exclusively of domain specific data.
For the domain specific data, the question arises whether a description of said data suffices (“1000 pictures of cats and 1000 pictures of dogs”), or whether the training data itself must be made available to the public to ensure sufficiency of disclosure. If the latter, how should it be made available? A large library like ImageNet contains millions of images, so it is not a workable proposition to include such a dataset with a patent application. Even smaller data sets used for domain specific training typically include thousands of images.
Regarding option 2, a drawback is that the amount of data to be disclosed in the description can be quite high. For example, in the CNN of Figure 2, layer 1 will consist of 96 x 7 x 7 = 4704 trained values, layer 2 of 256 x 5 x 5 = 6400, and so on. However, it is quite possible to include this data in tables to fully disclose at least one embodiment of a fully trained model.
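The arithmetic behind these counts can be sketched as follows. The helper function and its `input_channels` parameter are hypothetical additions for illustration. The figures in the text correspond to counting one kernel per filter, and the total would scale accordingly if each filter also spans several input channels (bias values are left out).

```python
def filter_values(num_filters, kernel_height, kernel_width, input_channels=1):
    """Number of trained filter values in one convolutional layer (biases excluded)."""
    return num_filters * kernel_height * kernel_width * input_channels

layer_1 = filter_values(96, 7, 7)   # 4704, as in the text
layer_2 = filter_values(256, 5, 5)  # 6400, as in the text

# Assumption for illustration: layer 1 filters spanning 3 colour channels.
layer_1_rgb = filter_values(96, 7, 7, input_channels=3)  # 14112
```

Whichever way the values are counted, the totals remain within what can be listed in tables.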
A bigger drawback, however, is that the trained coefficient data is not very useful as a basis for continued research. It is thus imaginable that future case law could decide that such tables do not comply with the “spirit” of the obligation to disclose the invention. After all, the goal of patent publications is to further the global knowledge (in exchange for a temporary monopoly for the applicant). Merely disclosing the trained coefficients might be just enough to enable the skilled person to reproduce a particular embodiment of the invention, but it hardly allows him/her to improve on it (e.g. by tweaking the model structure), since that would require access to the actual training data and training methodology in order to train a modified model.
Based on the above, it is recommended to disclose the method of training and the training data, rather than only the trained model coefficients. The available options for disclosing the structure and the content of the AI model are shown in Figure 4.
Figure 4: Various options for disclosing an AI model
A drawback of disclosing the training data is that the applicant may have spent a significant effort in meticulously gathering and, in the case of supervised learning, labelling training data. The applicant may not be inclined to make this data set available to the public, reasoning that a competitor could use the very same data set to quickly train a different AI model (carefully selected to avoid infringing the applicant’s claims), and thus gain an unfair competitive advantage. This is possibly one reason that often only a description of the training method is included in the description, while the training data is omitted.
It must be stated that the author is not aware of any present case law which held an AI related patent (application) invalid for lack of disclosure of the training data. However, such case law may well develop in the future and may then adversely affect patent applications being drafted today. Therefore, it is recommended to attempt to “future-proof” AI related patent applications in this respect.
Microorganisms to the rescue?
It is interesting to note that a similar disclosure problem was already addressed a long time ago in an entirely different field of patenting, namely for inventions involving microorganisms.
A biotechnology invention might use certain microorganisms to produce a useful substance from basic materials, just like an AI invention might use a trained AI model to make useful predictions based on input data. In the case of microorganisms, there is the similar problem of how to allow the public access to these microorganisms in order to work the invention. This has resulted in the system of biological material deposits, under the Budapest Treaty of 1977. Access to the biological material is given under strict conditions, such as for research purposes.
While not a perfect solution, a similar system for AI training data deposits would give the public access to proprietary data for research, while attempting to safeguard the interests of the applicants who have collected said data.
CONCLUSION
With the increasing attention that AI inventions are receiving, it is to be expected that new case law will develop in the coming years. With increasing complexity of the AI models, sufficiency of disclosure may no longer be a given.
Certain precautions can already be taken today. At least the structure of an exemplary AI model should be clearly described. In addition, the skilled person should have all the information needed either to train the model or to set the model’s coefficients to properly trained values. It remains to be seen whether the required information for training also includes the training data used.
As the “neural networks” are actually modelled on their biological counterparts, it is perhaps not even surprising if patenting neural networks will ultimately come to inherit certain traits from patenting biotechnology, such as depositing training data similar to depositing biological material.