INFERENCE MODEL TRAINING

TECHNICAL FIELD

[0001] The present invention relates to methods and systems for training inference models that determine one or more parameters of a product of a fabrication process.

BACKGROUND

[0002] A lithographic apparatus is a machine constructed to apply a desired pattern onto a substrate. A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs). A lithographic apparatus may, for example, project a pattern (also often referred to as “design layout” or “design”) at a patterning device (e.g., a mask) onto a layer of radiation-sensitive material (resist) provided on a substrate (e.g., a wafer).

[0003] To project a pattern on a substrate, a lithographic apparatus may use electromagnetic radiation. The wavelength of this radiation determines the minimum size of features which can be formed on the substrate. Typical wavelengths currently in use are 365 nm (i-line), 248 nm, 193 nm and 13.5 nm. A lithographic apparatus which uses extreme ultraviolet (EUV) radiation, having a wavelength within the range 4-20 nm, for example 6.7 nm or 13.5 nm, may be used to form smaller features on a substrate than a lithographic apparatus which uses, for example, radiation with a wavelength of 193 nm.

[0004] Low-$k_1$ lithography may be used to process features with dimensions smaller than the classical resolution limit of a lithographic apparatus. In such a process, the resolution formula may be expressed as $\mathrm{CD} = k_1 \times \lambda / \mathrm{NA}$, where $\lambda$ is the wavelength of radiation employed, NA is the numerical aperture of the projection optics in the lithographic apparatus, CD is the “critical dimension” (generally the smallest feature size printed, but in this case half-pitch) and $k_1$ is an empirical resolution factor.
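As a worked illustration of the resolution formula, the short calculation below evaluates CD for two parameter sets. The wavelengths (193 nm and 13.5 nm) come from paragraph [0003]. The values $k_1 = 0.3$, NA = 1.35 and NA = 0.33 are assumed, illustrative numbers and are not taken from this disclosure. The function name `resolution_cd` is hypothetical.

```python
def resolution_cd(k1: float, wavelength_nm: float, na: float) -> float:
    """Half-pitch CD from CD = k1 * lambda / NA (all lengths in nm)."""
    if na <= 0:
        raise ValueError("numerical aperture must be positive")
    return k1 * wavelength_nm / na


# Illustrative (assumed) values: the same k1 with a 193 nm and a 13.5 nm system.
duv_cd = resolution_cd(k1=0.3, wavelength_nm=193.0, na=1.35)
euv_cd = resolution_cd(k1=0.3, wavelength_nm=13.5, na=0.33)
print(f"DUV: {duv_cd:.1f} nm, EUV: {euv_cd:.1f} nm")  # prints "DUV: 42.9 nm, EUV: 12.3 nm"
```

With the same $k_1$, the shorter wavelength gives a much smaller printable half-pitch. Conversely, for a fixed wavelength and NA, printing a smaller CD requires a smaller $k_1$.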
In general, the smaller $k_1$, the more difficult it becomes to reproduce on the substrate a pattern that resembles the shape and dimensions planned by a circuit designer in order to achieve particular electrical functionality and performance.

[0005] To overcome these difficulties, sophisticated fine-tuning steps may be applied to the lithographic projection apparatus and/or design layout. These include, for example but not limited to, optimization of NA, customized illumination schemes, use of phase shifting patterning devices, various optimizations of the design layout such as optical proximity correction (OPC, sometimes also referred to as “optical and process correction”), or other methods generally defined as “resolution enhancement techniques” (RET). Alternatively, tight control loops for controlling the stability of the lithographic apparatus may be used to improve reproduction of the pattern at low $k_1$.

Confidential

[0006] In lithographic processes, it is frequently desirable to make measurements of the structures created, e.g., for process control and verification. Various tools for making such measurements are known, including scanning electron microscopes, which are often used to measure critical dimension (CD), and specialized tools to measure overlay, the accuracy of alignment of two layers in a device. Recently, various forms of scatterometers have been developed for use in the lithographic field.

[0007] Known scatterometers often rely on the provision of dedicated metrology targets. For example, a method may require a target in the form of a simple grating that is large enough that a measurement beam generates a spot that is smaller than the grating (i.e., the grating is underfilled). In so-called reconstruction methods, properties of the grating can be calculated by simulating the interaction of scattered radiation with a mathematical model of the target structure.
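The reconstruction principle of paragraph [0007] can be illustrated by a minimal sketch that rests on strongly simplifying assumptions. The target is treated as a thin binary amplitude grating in scalar Fraunhofer theory. Under that model, the intensity of diffraction order $m$ is $(d\,\mathrm{sinc}(m d))^2$, with duty cycle $d = w/p$ for linewidth $w$ and pitch $p$. Practical reconstruction relies on rigorous electromagnetic solvers and many more model parameters. The function names, the 180 nm / 500 nm example values, and the choice of a grid search refined by golden-section search are all hypothetical.

```python
import math


def sinc(x: float) -> float:
    """Normalised sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def simulate_orders(linewidth: float, pitch: float, orders=(0, 1, 2, 3)) -> list:
    """Toy forward model (assumption): order intensities of a thin binary
    amplitude grating in scalar Fraunhofer theory, I_m = (d * sinc(m * d))**2."""
    d = linewidth / pitch
    return [(d * sinc(m * d)) ** 2 for m in orders]


def golden_section(f, a: float, b: float, tol: float = 1e-6) -> float:
    """Minimise a function that is unimodal on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def reconstruct_linewidth(measured, pitch: float, steps: int = 5000) -> float:
    """Adjust the model parameter (linewidth) until the simulated diffraction
    orders best match the measured ones in a least-squares sense."""
    def cost(w: float) -> float:
        simulated = simulate_orders(w, pitch, orders=range(len(measured)))
        return sum((s - m) ** 2 for s, m in zip(simulated, measured))

    step = pitch / steps
    # Coarse global search over the allowed range 0 < w < pitch.
    coarse = min((step * (i + 0.5) for i in range(steps)), key=cost)
    # Local refinement in a small bracket around the best grid point.
    return golden_section(cost, max(coarse - step, 0.0), min(coarse + step, pitch))


# Synthetic "measurement" generated from an assumed true linewidth of 180 nm.
measured = simulate_orders(180.0, 500.0)
print(round(reconstruct_linewidth(measured, 500.0), 3))  # close to 180.0
```

In this toy model, the zeroth-order intensity $d^2$ alone fixes the duty cycle. This makes the fit unique, whereas the higher orders by themselves cannot distinguish $d$ from $1-d$.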
Parameters of the model are adjusted until the simulated interaction produces a diffraction pattern similar to that observed from the real target.

[0008] In addition to measurement of feature shapes by reconstruction, diffraction-based overlay can be measured using such apparatus, as described in published patent application US2006066855A1. Diffraction-based overlay metrology using dark-field imaging of the diffraction orders enables overlay measurements on smaller targets. These targets can be smaller than the illumination spot and may be surrounded by product structures on a wafer. Examples of dark-field imaging metrology can be found in numerous published patent applications, such as US2011102753A1 and US20120044470A. Multiple gratings can be measured in one image, using a composite grating target. The known scatterometers tend to use light in the visible or near-infrared (IR) wavelength range, which requires the pitch of the grating to be much coarser than the product structures whose properties are actually of interest. Such product features may be defined using deep ultraviolet (DUV), extreme ultraviolet (EUV) or X-ray radiation having far shorter wavelengths. Unfortunately, such wavelengths are not normally available or usable for metrology.

[0009] On the other hand, the dimensions of modern product structures are so small that they cannot be imaged by optical metrology techniques. Small features include, for example, those formed by multiple-patterning processes and/or pitch multiplication. Hence, targets used for high-volume metrology often use features that are much larger than the product structures whose overlay errors or critical dimensions are the properties of interest.
The measurement results are only indirectly related to the dimensions of the real product structures. They may also be inaccurate, because the metrology target does not suffer the same distortions under optical projection in the lithographic apparatus and/or may undergo different processing in other steps of the manufacturing process. While scanning electron microscopy (SEM) is able to resolve these modern product structures directly, SEM is much more time-consuming than optical measurement. Moreover, electrons are not able to penetrate through thick process layers, which makes them less suitable for metrology applications. Other techniques, such as measuring electrical properties using contact pads, are also known, but provide only indirect evidence of the true product structure. By decreasing the wavelength of the radiation used during metrology, it is possible to resolve smaller structures, to increase sensitivity to structural variations of the structures and/or to penetrate further into the product structures. One such method of generating suitably high-frequency radiation (e.g., hard X-ray, soft X-ray and/or EUV radiation) may use pump radiation (e.g., infrared (IR) radiation) to excite a generating medium, thereby generating emitted radiation. The emitted radiation is optionally high harmonic generation radiation comprising high-frequency radiation.

SUMMARY

[00010] According to an aspect of the invention, there is provided a method of training an inference model to determine one or more parameters of a product of a fabrication process from measurements of the product. The method comprises obtaining a dataset of measurements of one or more products of the fabrication process. Each of the measurements comprises an array of values obtained by measuring a corresponding one of the products. The method further comprises selecting a proper subset of the dataset for use in training the inference model.
The subset is selected by applying an optimisation procedure to an objective function. The objective function provides a measure of differences between each measurement in the dataset and corresponding reproduced values of that measurement. The reproduced values are obtained using a reproduction function whose domain comprises the measurements in the subset and excludes the measurements not in the subset. The inference model is trained using only the proper subset of the dataset of measurements; that is, the portion of the dataset of measurements that is not included in the proper subset is not used in training the inference model.

[00011] This has the advantage that training the inference model is easier, since the number of measurements provided as input to the training procedure may be smaller than if the entire dataset were used. Experimentally, it has been found that inference results can be obtained which are almost as accurate as those obtained using the entire dataset of measurements in the training procedure, but at much reduced computational cost.

[00012] The reproduction...
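This excerpt does not specify the form of the reproduction function or of the optimisation procedure, so the following is only a minimal sketch under assumptions chosen purely for illustration:

- each measurement array is flattened to a vector, forming a row of `X`;
- the reproduction function rebuilds every measurement as a least-squares linear combination of the subset measurements;
- the optimisation procedure is greedy forward selection;
- ridge regression stands in for the inference model, with reference parameter values assumed to be available for the selected measurements.

All function names and the synthetic data are hypothetical.

```python
import numpy as np


def reproduction_error(X: np.ndarray, subset: list) -> float:
    """Objective function: total squared difference between every measurement
    (row of X) and its reproduction computed from the subset rows only.

    Hypothetical reproduction function: least-squares linear combination of
    the subset measurements, i.e. orthogonal projection onto their span."""
    basis = X[subset].T                                   # (m, k): subset rows only
    coeffs, *_ = np.linalg.lstsq(basis, X.T, rcond=None)  # (k, n)
    reproduced = (basis @ coeffs).T                       # (n, m)
    return float(np.sum((X - reproduced) ** 2))


def select_subset(X: np.ndarray, k: int) -> list:
    """Greedy forward selection (one possible optimisation procedure):
    repeatedly add the measurement that most reduces the objective."""
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must give a non-empty proper subset")
    subset = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in subset]
        best = min(candidates, key=lambda i: reproduction_error(X, subset + [i]))
        subset.append(best)
    return subset


def train_ridge(X_train: np.ndarray, y_train: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Stand-in inference model: ridge regression with a bias term."""
    A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)


def predict(weights: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the ridge model (with bias column) to every row of X."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ weights


# Synthetic dataset: 200 measurements of 50 values each, driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(200, 50))
y = latent @ np.array([1.0, -0.5, 2.0])  # hypothetical product parameter

subset = select_subset(X, k=5)
weights = train_ridge(X[subset], y[subset])  # rows outside the subset are not used
rmse = float(np.sqrt(np.mean((predict(weights, X) - y) ** 2)))
print(subset, rmse)
```

Greedy selection evaluates the objective on the order of k·n times. Other search strategies could serve equally well as the optimisation procedure; the greedy choice here is only for brevity.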