HOLISMOKES -- II. Identifying galaxy-scale strong gravitational lenses in Pan-STARRS using convolutional neural networks

We present a systematic search for wide-separation (Einstein radius>1.5"), galaxy-scale strong lenses in the 30 000 sq.deg of the Pan-STARRS 3pi survey on the Northern sky. With long time delays of a few days to weeks, such systems are particularly well suited for catching strongly lensed supernovae with spatially-resolved multiple images and open new perspectives on early-phase supernova spectroscopy and cosmography. We produce a set of realistic simulations by painting lensed COSMOS sources on Pan-STARRS image cutouts of lens luminous red galaxies with known redshift and velocity dispersion from SDSS. First of all, we compute the photometry of mock lenses in gri bands and apply a simple catalog-level neural network to identify a sample of 1050207 galaxies with similar colors and magnitudes as the mocks. Secondly, we train a convolutional neural network (CNN) on Pan-STARRS gri image cutouts to classify this sample and obtain sets of 105760 and 12382 lens candidates with scores pCNN>0.5 and>0.9, respectively. Extensive tests show that CNN performances rely heavily on the design of lens simulations and choice of negative examples for training, but little on the network architecture. Finally, we visually inspect all galaxies with pCNN>0.9 to assemble a final set of 330 high-quality newly-discovered lens candidates while recovering 23 published systems. For a subset, SDSS spectroscopy on the lens central regions proves our method correctly identifies lens LRGs at z~0.1-0.7. Five spectra also show robust signatures of high-redshift background sources and Pan-STARRS imaging confirms one of them as a quadruply-imaged red source at z_s = 1.185 strongly lensed by a foreground LRG at z_d = 0.3155. In the future, we expect that the efficient and automated two-step classification method presented in this paper will be applicable to the deeper gri stacks from the LSST with minor adjustments.


Introduction
Strongly lensed systems with time-variable sources provide competitive probes of the Hubble constant H 0 that are independent from CMB observations (Planck Collaboration 2018) and the local distance ladder (Riess et al. 2019;Freedman et al. 2019Freedman et al. , 2020, and allow one to assess the significance of the current H 0 tension. The H0LiCOW and COSMOGRAIL projects (e.g., Suyu et al. 2017;Courbin et al. 2018) have recently established the capacity of combining time-delay measurements and robust strong lensing models to constrain H 0 , and measured H 0 = 73.3 +1.7 −1.8 km s −1 Mpc −1 in flat ΛCDM cosmology using six lensed quasars (Wong et al. 2019). The seventh lens has been analysed by the STRIDES collaboration (Shajib et al. 2020;Buckley-Geer et al. 2020), and a detailed study of systematic effects is presented by Millon et al. (2019) as part of the TDCOSMO organisation. Moreover, the first two strongly lensed supernovae (SNe) with spatially-resolved multiple images have been detected in recent years, one core-collapse SN behind the strong lensing cluster MACS J1149.5+222.3 (SN Refsdal, Kelly et al. 2015), and one type Ia SN behind an isolated lens galaxy (iPTF16geu, Goobar et al. 2017). These findings open a promising window for future H 0 measurements with lensed SNe. Such systems are indeed well suited for time-delay measurements given the smooth, non-erratic SNe light curves which require shorter high-cadence monitoring than lensed quasars, and the possibility to reduce microlensing ef-fects by focusing on the early, achromatic expansion phase a few weeks after explosion, and by using color light curves Huber et al. 2019;Bonvin et al. 2019;Goldstein et al. 2018, Huber et al., 2020). For lensed type Ia SNe, the standardizable intrinsic peak luminosity of the source is also a valuable input for breaking the mass-sheet degeneracy in lens mass models (Falco et al. 1985). Constraints on H 0 have already been derived with SN Refsdal (Grillo et al. 2018(Grillo et al. , 2020 and illustrate the great potential of such measurements, in particular for galaxy-scale strong lens systems that have simpler lens mass distributions than galaxy clusters. Lensed SNe with adequate image separations providing time delays of a few days to weeks are particularly promising and are relatively less sensitive to microlensing effects Huber et al. 2019).
Besides precise measurements of the Hubble constant, strongly lensed SNe open an interesting window on early-phase SN studies. Multiply-imaged SNe detected from the first image can be combined with strong lensing models to predict the time delays and future SN reappearance (as was done for SN Refsdal) in order to trigger follow-up observations within a few days of explosion, which is currently not feasible for unlensed SNe beyond the local universe due to their late discovery near peak luminosity. Such early-phase studies are particularly valuable to tackle the progenitor problem of type Ia SNe and disentangle the single-degenerate (Whelan & Iben 1973), double-degenerate (Tutukov & Yungelson 1981), and additional scenarios that have been extensively debated over the last decades. For core-collapse SNe, these observations are important to characterize the progenitor properties and compare with current stellar evolution models. Early-phase spectroscopy of type II SNe would yield novel constraints on the mass-loss history just before explosion.
We recently initiated the Highly Optimised Lensing Investigations of Supernovae, Microlensing Objects, and Kinematics of Ellipticals and Spirals (HOLISMOKES, Suyu et al. 2020) programme to address these fundamental questions on stellar physics and cosmology. The number of strongly lensed SNe is expected to grow over the next few years, thanks to the on-going Zwicky Transient Facility (ZTF, Masci et al. 2019) high-cadence survey on the northern hemisphere and the forthcoming Rubin Observatory Legacy Survey of Space and Time (LSST, Ivezić et al. 2019) on the south. Oguri & Marshall (2010) predict 45 strongly lensed type Ia SNe over the 10 years of LSST, which corresponds to a few events for the shallower ZTF survey. This assumes a selection from spatially-resolved multiple images targeting the most useful wide-separation systems. Using complementary selection techniques solely based on magnification of SN Ia light curves, Goldstein & Nugent (2017) predict 10 to 20 times more lensed SN Ia albeit mostly with small image separations (see also Wojtak et al. 2019). Importantly, new lensed SNe candidates have to be selected early enough to start the follow-up sequence in a timely manner. One way is to extend the numerous, successful searches of galaxy-scale strong lenses that were traditionally conducted on surveys with optimal image quality (e.g. More et al. 2016;Sonnenfeld et al. 2018), to surveys with largest sky coverage, in order to quickly identify transients matching the position of background lensed sources. Ultimately, these lens finding pipelines will be directly applicable to the deep LSST stacks which are expected to yield approximately a hundred thousand new systems (Collett 2015).
Galaxy-scale strong gravitational lenses without timevariable sources also provide valuable insights into the lens total mass distributions, including the inner dark-matter fractions (e.g., Gavazzi et al. 2007;Grillo et al. 2009;Sonnenfeld et al. 2015;Schuldt et al. 2019), the slopes of the total and dark-matter mass density profiles (e.g., Treu & Koopmans 2002;Koopmans et al. 2009; Barnabè et al. 2011;Shu et al. 2015), and the spatial extent of dark-matter halos (e.g., Halkola et al. 2007;Suyu & Halkola 2010). Such systems play a crucial role in characterizing the lens stellar initial mass function (IMF), a major ingredient for stellar mass estimates, as a function of galaxy physical properties (e.g., Cañameras et al. 2017b; Barnabè et al. 2013;Sonnenfeld et al. 2019), and they are well-suited to search for dark-matter substructures (e.g., Vegetti et al. 2012;Hezaveh et al. 2016;Ritondale et al. 2019). Moreover, high magnification factors provide unique diagnostics on the local interstellar medium physical conditions in lensed high-redshift galaxies and on the local feedback mechanisms driving their evolution (e.g., Danielson et al. 2011;Cañameras et al. 2017a;Cava et al. 2018).
Strong lensing events are rare, about 1 in 1000 for highresolution space-based imaging (e.g., Marshall et al. 2009) and down to about 1 in 10 5 for seeing-limited ground-based data (e.g., Jacobs et al. 2019a), and their identification thus requires dedicated and automated methods. For instance, arcfinder algorithms (e.g., Gavazzi et al. 2014;Sonnenfeld et al. 2018) and citizen-science classification projects (Space Warps, Marshall et al. 2016) have been developed over the last decade. In particular, convolutional neural networks (CNNs) are supervised machine-learning algorithms optimized to image analysis (LeCun et al. 1998) that have proven to outperform other classification techniques and that require little pre-processing. They are very efficient to peer into large imaging data sets and have been increasingly used in the field of astronomy over the last five years. These studies have established the ability of CNNs in recognizing galaxy morphologies (Dieleman et al. 2015), including the key features of strong gravitational lenses (Metcalf et al. 2019). Several CNN searches for new strong lens candidates have focused on ground-based imaging data, from the CFHTLS (Jacobs et al. 2017), KiDS DR3 (Petrillo et al. 2017) and DR4 (Petrillo et al. 2019;Li et al. 2020), DES Year 3 (Jacobs et al. 2019b,a), or the DESI DECam Legacy survey (Huang et al. 2019). Efficient classification pipelines using deep neural networks have also been developed and tested on simulated Euclid and LSST images to prepare for these forthcoming surveys which will tremendously increase the number of detectable strong lensing systems (Lanusse et al. 2018;Schaefer et al. 2018;Davies et al. 2019;Cheng et al. 2019;Avestruz et al. 2019).
In the meantime, no systematic searches of galaxy-galaxy strong lenses have so far taken advantage of the Pan-STARRS imaging covering the entire Northern sky. With this survey, Berghea et al. (2017) identified a strongly lensed QSO forming a quadruple system by cross-matching the position of a variable AGN selected in the mid-infrared. More recently, Rusu et al. (2019) performed a more systematic search of lensed QSOs by applying color/magnitude cuts and visually inspecting Pan-STARRS image cutouts of AGN candidates from the Wide-field Infrared Survey Explorer (Secrest et al. 2015). In this paper, we perform a comprehensive search for galaxy-scale strong lensing systems with luminous red galaxies (LRGs) as deflectors and typical high-redshift galaxies as background sources, using the extended footprint of nearly 30 000 deg 2 of the Pan-STARRS 3π survey on the Northern sky. The automated pipeline, based on a catalog-level pre-selection of galaxies and a convolutional neural network, results in a ranked list of candidates which we further inspect visually to select those with higher confidence for spectroscopic follow-up.
The outline of the paper is as follows. In Sect. 2 and 3, we give a short overview of the Pan-STARRS surveys and the overall search methodology and, in Sect. 4, we describe the simulation of strong lenses. In Sect. 5, we present the networks and training processes, and we extensively test the CNN performance. In Sect. 6 we finally apply the CNN to pre-selected Pan-STARRS image cutouts, provide the list of strong lens candidates from visual inspection, and characterize their overall properties. Our main conclusions appear in Sect. 7. Throughout this work, we adopt the flat concordant ΛCDM cosmology with Ω M = 0.308, and Ω Λ = 1−Ω M (Planck Collaboration XIII 2016), and with H 0 = 72 km s −1 Mpc −1 ).

The Pan-STARRS1 survey
The Pan-STARRS1 (PS1) surveys were conducted with a 7 deg 2 field-of-view camera mounted on a 1.8 m telescope near the Haleakala summit, Hawaii, in the five broadband grizy SDSSlike filters . The camera has a pixel size of 0.258 pix −1 (Tonry & Onaka 2009). PS1 includes both the 3π and Medium Deep surveys. The former was completed in 2014 and made publicly available in DR1 and DR2. It covers 30 000 deg 2 on the Northern sky down to -30 deg in the five grizy filters with a depth of 21-23 mag. The median seeing FWHM is 1.31 , 1.19 , and 1.11 in g, r, and i bands, respectively, but reaches >1.60 , >1.45 , and >1.35 over 20% of the footprint . The Medium Deep survey consists of 10 fields covering a total of 70 deg 2 , with multiple visits in five filters optimized for transient detection. The few hundred exposures will eventually provide deep stacks with 5σ point source detection limits down to i ∼ 26.0 mag and will be available in the next data release (DR3).
We performed our search on the entire Northern sky with the 3π survey that overlaps nicely with on-going optical timedomain surveys on the Northern hemisphere (e.g., ZTF, Masci et al. 2019) which provide a wealth of astronomical transients, including strongly-lensed SNe. PS1 extends the SDSS to lower declinations and achieves higher depth. In particular, we apply our pipeline to gri stack images from DR1 which provide the optimal coaddition of individual exposures and have higher 5σ point-source sensitivities than z and y, probing down to 23.3, 23.2, and 23.1 mag in g, r, and i, respectively . These three bands conveniently span a wavelength range sensitive to young stellar populations in blue star-forming galaxies at z 1 and to more dust-obscured or evolved early-type galaxies. Although the overall survey strategy provided limited variations in depth and image quality over large scales, 3π images have non uniform coverage on small scales due to the stacking process. We found that limiting magnitudes vary by up to ∼0.5 mag on PS1 cutouts over the extragalactic sky and accounted for this effect in our analysis.

Overview of the lens-search method
In this paper, we aim at identifying galaxy-scale strong lensing systems on the extragalactic sky covered by PS1. We focus our search on typical high-redshift galaxies strongly lensed by massive LRGs, which have a higher lensing cross-section (Turner et al. 1984) and smooth light profiles that help separate the foreground and background emissions. In particular, given the long standing difficulty in distinguishing strong lensing features from arms of low redshift spirals, lenticular galaxies, tidal tails and other contaminants with arc-like features (e.g., Huang et al. 2019;Jacobs et al. 2019a), restricting to lens LRGs increases our chance of robustly identifying multiple lensed images with the > 1 average PSF FWHM of PS1.
Selecting these rare systems on the entire Northern sky requires an efficient analysis of the properties of the 3 billion sources detected in the PS1 3π survey image stacks. To circumvent memory limitations, a number of CNN searches in the literature have focused on subsets of galaxies with LRGlike photometry using, for instance, the Baryon Oscillation Spectroscopic Survey (BOSS) sample (Schlegel et al. 2009;Dawson et al. 2013) or dedicated color/magnitude gri cuts adapted from Eisenstein et al. (2001). However, this approach requires low contamination from lensed images to the lens multiband photometry and was essentially applied to deeper surveys with better image quality (sub-arsec PSF FWHM in optical bands) than Pan-STARRS, such as the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP, Aihara et al. 2018;Sonnenfeld et al. 2018) or the Kilo-Degree Survey with OmegaCAM on the VLT Survey Telescope (KiDS, de Jong et al. 2013;Petrillo et al. 2019). Due to the lower image quality, applying such simple cuts on the photometry tabulated in Pan-STARRS DR2 catalogs would exclude significant fractions of interesting systems with strongly lensed arcs blended with the lens and altering its photometry. We therefore adopt a two-step approach: (1) a catalog-based neural network classification of source photometry, (2) a CNN trained on gri image cutouts.
In addition, we aim at finding wide-separation lens systems because these configurations provide longer time delays of a few days to weeks between multiple images, which is crucial to measure accurate, microlensing-free time delays for cosmology (Huber et al. 2019). Recently, the extensive follow-up of the lensed Type Ia SN iPTF16geu at z = 0.4 with θ E ∼ 0.3 (Goobar et al. 2017) illustrated the difficulty in reaching the time-delay precision required for cosmography on small-separation systems (∆t < 2 days, Dhawan et al. 2020), and demonstrated the impact of microlensing Yahalomi et al. 2017;Bonvin et al. 2019). Focusing on wide separations is an effective search strategy for the Pan-STARRS survey with limited angular resolution, and will help triggering timely imaging and spectroscopic follow-up.
As further described in Sect. 5, CNNs capture image characteristics by learning the coefficients of convolutional filters (kernels) of given width and height and creating a range of feature maps. They are invariant to translation and rotation. During the learning phase, the CNNs rely on training sets with representative labelled images to minimize the difference between predictions and ground truth. Classification algorithms require training sets of a few 10 4 to a million of labelled images depending on the number of classes, image complexity and network depth. In contrast to the recent computer vision image recognition challenges using deep CNNs (e.g., Russakovsky et al. 2015), relatively modest training sets of few 10 5 examples are sufficient for our two-class problem applied to small, galaxy-scale image cutouts (e.g., Jacobs et al. 2017Jacobs et al. , 2019a. Nonetheless, the small number and heterogeneous properties of spectroscopically-confirmed strong lenses (see the MasterLens database) make it necessary to use simulated systems.
Generating realistic mocks that account for the complexity of PS1 stack images is a critical ingredient to reach optimal classification performances (e.g., Lanusse et al. 2018). We construct our mocks by painting lensed arcs on PS1 gri images of LRGs with known redshift and velocity dispersion from SDSS spectroscopy. This approach captures the 3π survey properties, such as background artifacts, the presence of line-of-sight neighbouring galaxies, and local variations of seeing FWHM, exposure time and noise levels, while also accounting for variations in individual bands. In contrast to fully-simulated images, using real cutouts also guarantees positive examples that best mimic the small scale background properties inherited from the complex masking and stacking of individual PS1 exposures. As background sources, we use representative high-redshift galaxies from the COSMOS field. High S/N image cutouts are taken from HSC-SSP and Hubble Space Telescope (HST) to simulate lens distortions and magnifications, and we use gri bands (similar filter set as PS1) to provide color information.
In Sect. 4, we present the selection of lens and source galaxies and the pipeline to produce a set of mocks. Our first cataloglevel network, described in Sect. 5.2, is trained on the multiband photometry of mocks and non-lens systems from the PS1 catalog, and assigns an output score, p cat , ranging between 0 and 1. A much lower fraction of sources with p cat > 0.5 are then classified with the CNN presented in Sect. 5.3, resulting in the final score, p CNN . We eventually examine visually all sources with highest scores to assign grades and collect a list of high-confidence candidates for future validation.

Selection of lens galaxies
Realistic strong lensing simulations require knowledge on the lens mass distribution and redshift. We therefore draw our sample of lens LRGs from the SDSS spectroscopic samples with reliable velocity dispersion measurements, to have a proxy of the lens total mass. We use the SDSS large scale structure catalogs of galaxies and QSOs for cosmological studies, including LOWZ and CMASS samples for BOSS (from SDSS DR12), and the higher-redshift LRG catalog for eBOSS (from SDSS DR14, Bautista et al. 2018). QSOs are excluded using SDSS class flag. This results in a broad sample of LRGs selected for their redder rest-frame colors using gri color/magnitude cuts (Eisenstein et al. 2001). The sample is volume limited up to z ∼ 0.4 (LOWZ), with additionally more luminous LRGs in the range 0.4 < z < 0.7 (CMASS). eBOSS LRGs lie at higher redshift (z med ∼ 0.7, Prakash et al. 2016) due to a combination of optical/mid-infrared cuts in SDSS and WISE bands.
We clean this spectroscopic catalog to keep LRGs with reliable velocity dispersions, using v disp ≤ 500 km s −1 and v disp,err ≤ 100 km s −1 , and obtain 1 192 472 LRGs to build the mocks. We then cross-match with the PS1 catalog to obtain their photometry, image depth and seeing FWHM in PS1.

Selection of background sources
The sample of galaxies used to mock up high redshift lensed sources is drawn from the COSMOS field to take advantage of the wealth of existing data including ultra-deep optical imaging, multiband photometry, spectroscopic follow-up, and morphological classification. We select galaxies with morphological information from Galaxy Zoo: HST  and within the COSMOS2015 photometric catalog (Laigle et al. 2016) that also lists physical parameters from SED fitting. The former is a citizen science project that extends the original Galaxy Zoo (Lintott et al. 2008(Lintott et al. , 2011Willett et al. 2013) with a thorough visual classification of galaxies with ACS imaging from the Hubble legacy surveys (see Scoville et al. 2007;Koekemoer et al. 2007, for the COSMOS field). In particular for COSMOS, Galaxy Zoo: HST relies on 3-color images obtained by combining the HST F814W mosaic with color gradients from ground-based imaging 1 .
We clean the resulting catalog from sources identified as stars or artifacts (COSMOS2015 flag or visual identification), and remove very extended galaxies with R eff > 1.5 , as well as galaxies contaminated by emission from companions within 5 , and brighter by 1 mag in r band (Laigle et al. 2016). The output sample includes 52 696 galaxies for the strong lensing simulations. Redshifts are taken from public spectroscopic redshift catalogs drawn from surveys with VLT/VIMOS (zCOSMOSbright, Lilly et al. 2007), VLT/FORS2 (Comparat et al. 2015), Subaru/FMOS (Silverman et al. 2015), VLT/VIMOS (VUDS, Le Fèvre et al. 2015;Tasca et al. 2017), Keck/DEIMOS (Hasinger et al. 2018), or the best photometric redshift estimate from Laigle et al. (2016) for galaxies without z spec available.
For the purpose of using this pipeline in future lensed SN searches, the properties of COSMOS sources are compared with expectations for high-redshift SN hosts. Firstly, the cleaned catalog has a redshift distribution peaking at z ∼ 0.8 with a tail extending to z 1.5, akin to the mock lensed SN catalog of Oguri & Marshall (2010), both for LSST-like imaging or for current, shallower surveys probing down to R ∼ 20-21 (e.g., ZTF, Masci et al. 2019). Secondly, the morphologies and star formation activities can be put in context with properties of SN hosts constrained in the local universe. Using a compilation of >3000 SN and host properties from SDSS-DR8, Hakobyan et al. (2012) show that ∼13% of SNe of all types at z 0.1 explode in galaxies with elliptical or lenticular morphologies, typical of earlytype hosts. We augment the quiescent vs. star-forming classes of Laigle et al. (2016) based on redshift-dependent NUV − r/r − J cuts, with Galaxy Zoo morphologies to conclude that our sample is strongly dominated by star-forming galaxies, with ∼15% 1 Galaxy Zoo: CANDELS (Simmons et al. 2017) performs similar classifications using deeper, multiband HST images but only over a subset of the COSMOS field (Grogin et al. 2011;Koekemoer et al. 2011) and strongly restricts the sample size.
classified as quiescent. This shows that our sample of sources broadly matches the expected properties of SN hosts.

Downloading and processing image cutouts
For the lenses, PS1 gri image cutouts of 20 × 20 are downloaded from the PS1 cutout service 2 . We characterized the image depth in individual cutouts with SExtractor (Bertin & Arnouts 1996) and verified that the observing strategy leads to nearly uniform depth. Although it depends on several observing factors, the depth is weakly correlated with the number of individual warp exposures used in a given stack. In particular, it rapidly drops by 0.2-0.3 mag for the 10% of cutouts obtained by coadding less than 8, 10, and 12 frames in g, r, and i bands, respectively. A small fraction of <5% of these PS1 images were discarded from the analysis.
Multiband images of COSMOS galaxies used as lensed sources are taken from the first data release of the HSC SSP (Aihara et al. 2018). The HSC ultra-deep stacks are providing the deepest optical exposures with best image quality over the 2 deg 2 of COSMOS and are well suited for the simulation pipeline. The 5σ point-source sensitivities are 27.8, 27.7, 27.6 mag, in g, r, i, respectively, and seeing conditions are excellent, with median values of 0.92 , 0.57 , and 0.63 in g, r, and i bands, and negligible variations over the COSMOS field (Tanaka et al. 2017).
We download gri cutouts of 10 × 10 , sufficient to enclose all emission from galaxies with R eff < 1.5 . Fainter companions within a few arcsec are masked using segmentation maps created in r band with SExtractor (using relatively few deblending subthresholds, Bertin & Arnouts 1996) to isolate the central galaxy of interest. To overcome the limited spatial resolution of groundbased images, we combine these frames with the HST F814W high-resolution images over the COSMOS field (see Leauthaud et al. 2007;Scoville et al. 2007;Koekemoer et al. 2007) to produce pseudo color images following the steps described in Griffith et al. (2012). First of all, F814W images are aligned and rescaled as if observed in HSC i band, and masked HSC frames are resampled with SWarp (Bertin et al. 2002) to the HST scaling of 0.03 pix −1 using nearest-neighbour interpolation. Secondly, we multiply each resampled frame by an illumination map, defined as F814W divided by HSC i band. This process preserves HSC source photometry, and results in gri images with highresolution light profiles and colour gradients with seeing-limited resolution.

Strong lensing simulations
Due to the limited angular resolution of Pan-STARRS, we focus on the search for wide-separation lens systems with bright arcs that can be easily recognized by eye. We impose a lower limit on the Einstein radius θ E of mocks of 1.5 , larger than the median FWHM of PS1 seeing in gri bands. This ensures that individual counter-images are well deblended from each other in the mocks (albeit often blended with the lens). Each lens deflector drawn from the LRG catalog is cross-matched with a random COSMOS source at z source > z deflector , rejecting pairs with θ E < 1.5 3 , and repeating the process iteratively to obtain 90 000 lens+source pairs. Focusing on larger Einstein radii amounts to selecting LRGs in the high-mass range, with v disp ∼ 230-400 km s −1 , and redshifts z d ∼ 0.1-0.7 representative of the in-put BOSS sample, and sources in the redshift range z s ∼ 0.5-3.0. The pairs mainly cover the θ E range of 1.5-3.0 , which is dominated by galaxy-scale dark-matter lens halos on the highend of the mass distribution (Oguri 2006), while group-scale lenses contribute predominantly for image separations above 3 . θ E values are not uniformly distributed but drop by a factor 100 from 1.5 to 3.0 , akin to real galaxy-scale lenses, implying that our CNN will be predominantly exposed to mock systems with θ E ∼ 1.5 .
For each pair, mock images are created with the simulation pipeline described in Schuldt et al. (in prep.). In short, the lens potential is modeled with a Singular Isothermal Ellipsoid (SIE) profile, based on the known v disp and z d , and using the centroid, axis ratio, and position angle from the i-band light distribution, with random perturbations typical of SLACS lenses (Bolton et al. 2008). The combined HSC+F814W cutouts of COSMOS sources are randomly positioned in the source plane, over regions next to the caustics corresponding to magnifications µ ≥ 5. The sources are then lensed onto the image plane with the GLEE software (Suyu & Halkola 2010;Suyu et al. 2012). The resulting frames are convolved with the Pan-STARRS PSF model described below, resampled and rescaled using Pan-STARRS zeropoints, and eventually coadded with the lens LRG cutouts to obtain the final mock image. The process is repeated for gri bands. In order to produce a set of mocks with systematically bright lensing features, we artificially boost the lensed source brightness by one magnitude in all bands. Systems with lensed sources having S/N < 10 in i band 4 are placed iteratively closer to the caustics to meet this S/N threshold, or completely discarded.
The Pan-STARRS analysis pipeline computes PSF models at individual positions of stack images over a grid of about 8 steps, and interpolates these models to predict the PSF FWHM across the sky ). This introduces deviations between the modeled and true PSF, since the later varies on very small scales due to stacking of individual exposures with variable FWHM. However, we found that these deviations are usually within 10% when comparing the tabulated FWHMs with those measured on isolated, unsaturated stars from GSC-DR2 (Lasker et al. 2008). We therefore create a library of gri PSF models in steps of 0.05 FWHM, by stacking PS1 postage stamps of 9 to 11 stars with adequate PSF FWHM. For each mock, lensed arcs are convolved with the PSF model corresponding to the PSF FWHM listed on PS1 tables at the position of the lens LRG.
We generate a total of 90 000 mock lens systems (see Fig. 1). Lens LRGs selected multiple times are rotated by kπ/2 and used only once for a given orientation, with different lensed arc configurations, so the networks never get the exact same image several times as input. The mocks cover realistic source colors and lensing configurations, including quads, near-complete Einstein rings, fold and cusp arcs, and doubles (see Schuldt et al., in prep.). Constraining the source plane positions to large magnifications likely biases our set to lower fractions of doubles than in blind samples of real lenses.

Photometry of mock lens systems
Focusing the CNN lens search on a subset of the 3 billion sources detected in the PS1 3π survey requires a pre-selection of sources based on their properties released in public catalogs. For this purpose, we compute the photometry of mock lensed systems in the same way as the PS1 image processing pipeline (Magnier Fig. 2. Aperture magnitudes and colors of galaxies in the lens LRG catalog (red regions) and in the set of 90 000 mock lens systems (blue regions). The red and blue dots show the median of distributions. Left: (g − i) vs. i diagram for the R3 aperture, a circular aperture of 1.04 radius. Right: Difference in (g − i) color between the inner R3 aperture and concentric annuli between R3 and R4 (1.76 outer radius), and between R4 and R5 (3.00 outer radius). Grey regions mark the position of 100 000 random sources with reliable gri aperture photometry selected from the PS1 DR2 catalogs. Orange regions show galaxies with p cat > 0.5 on the catalog-level network, which match the colors and magnitudes of mocks lens systems. Solid and dashed lines show the 0.5, 1, and 2σ contours. et al. 2016). Fixed aperture photometry is particularly important to measure reliable colors. We derive the integrated magnitudes of our mocks within the four smaller PS1 circular apertures of 1.04 (R3), 1.76 (R4), 3.00 (R5), and 4.64 (R6) radii, which are best suited to capture color gradients due to the presence of lensed arcs at angular separations of 1.5-3.0 with respect to the lens center. The two largest of these apertures are also expected to be relatively good proxies of the integrated magnitudes of mocks. We use SExtractor in dual-image mode, with a 3σ detection threshold in r-band, and assuming an ideal sky subtraction in PS1 deep stacks (see details in Waters et al. 2016). To compute the aperture magnitudes of a given mock, the image zero-points are taken from the PS1 catalog of stack detections at the position of the LRG used to produce this mock.
The method is tested on LRG-only images by comparing SExtractor estimates with those from the PS1 catalog, for standard stacks. For the four apertures, fitting the distributions of magnitude offsets between these two estimates with Gaussian functions leads to µ = 0.00-0.02 and σ = 0.02-0.05 in g band, µ = 0.00-0.01 and σ = 0.01-0.02 in r band, µ = 0.00 and σ = 0.01-0.02 in i band, which proves the overall robustness of our photometry. Unsurprisingly, the scatter only rises above these average values for the large aperture magnitudes of fainter objects, which mostly enclose background noise. These residual biases up to 0.1-0.2 mag for 22.0 mag likely indicate small differences in the local background subtraction between both methods, or the contamination from neighbours which are subtracted by the PS1 pipeline  but not with SExtractor. Nonetheless, these offsets remain minor compared to other uncertainties in the analysis. On the contrary, Kron, Petrosian and Sersic photometry are discarded due to systematic biases with respect to values tabulated in PS1 catalogs.

Systematic search of strong lenses
The next sections describe the steps followed for the generic lens search on the full Pan-STARRS 3π survey.

Pre-selecting Pan-STARRS detections
As shown in Fig. 2, the 90 000 mock lens systems have globally bluer colors than the LRG sample due to the relative color of lensed arcs, and they are brighter than ∼70% of sources detected in the PS1 stack images. This population of fainter and bluer galaxies can be excluded from the analysis. We use simple color-magnitude cuts in the (g − i) vs. i, (g − r) vs. r, and (r − i) vs. i diagrams for the R3, R4, R5, R6 circular apertures to rule out regions in these diagrams that are not representative of the mocks (i.e. not colored in blue in Fig. 2). These cuts are conservative and include 96% of the mocks, according to the aperture magnitudes obtained in Sect. 4.5, while excluding ∼84% of PS1 sources. They are applied to the complete catalog of stack detections from PS1 DR2 using the PanSTARRS1 Catalog Archive Server Jobs System 5 , with detectionFlags3 > 65536 to remove multiple entries and select detections from the optimal stack image (Flewelling et al. 2016).
The stars are removed from the resulting sample using the r-band cuts from Farrow et al. (2014): This selection conservatively includes 98% galaxies, those discarded being mainly at the faint end (r 21) and with higher magnitudes than our mocks. While these cuts misidentify saturated stars with r 14 as galaxies (Farrow et al. 2014), such bright sources have already been excluded from the analysis.
Regions with elevated Galactic dust extinction are removed as strong reddening could alter our selection on the catalog level. The interstellar dust reddening 2D map of Schlegel et al. (1998) are loaded with the dustmaps python interface from M. Green (2018). After converting to PS1 bandpasses using coefficients in Table 6 of Schlafly & Finkbeiner (2011), we apply a reddening threshold of E(g − i) < 0.3. These steps result in a catalog of 23.1 million galaxies for classification.

Applying machine learning to the Pan-STARRS photometry
Limitations due to download speed of PS1 cutouts from the archive can be overcome by further reducing the size of this catalog with additional selection criteria. The simple color/magnitude cuts applied to the complete PS1 catalog do not capture all photometric properties of mock lens systems, such as their precise locus on two-dimensional color/color and color/magnitude diagrams or their radial color gradients. For instance, Figure 2 indicates that mocks are generally redder within the smaller R3 aperture of 1.04 radius than within external, concentric annuli between R3 and R4, and between R4 and R5. These gradients are caused by the presence of bluer, lensed arcs at >1.5 from the lens center and disappear on the LRG-only sample. To exploit this information, we train a simple fullyconnected neural network on the photometry of mocks and random PS1 sources using aperture fluxes that ensure robust colors.
The data set contains gri fluxes in the four apertures for 90 000 lens and 90 000 non-lens examples as inputs, and the ground truth labels of 1.0 and 0.0, respectively, as outputs for binary classification. The negative examples are fluxes of random sources that match our loose color/magnitude gri cuts in Sect. 5.1. The data set is split into training, validation, and test sets with respective fractions of 56%, 14%, and 30%. All fluxes are normalized to the average over the entire data set in order to speed up the learning process. The network architecture consists of 12 dimensional input data, three fully connected hidden layers of 50, 30, and 5 neurons each, with Rectified Linear Unit (ReLU, Nair & Hinton 2010) non-linear activations 6 , and a single-neuron output layer with sigmoid activation 7 .
During the training phase, the network derives a model for classifying galaxies in the training set as lenses or non-lenses according to their input photometry, as briefly summarized with the following stages. After the weight parameters and bias in each neuron are initialized, a subset of the training data is passed through the entire network to calculate predicted labels (forward propagation), and the difference between predictions and ground truth labels is quantified with a loss function L. This information is propagated to the network weights and biases (back propagation, Rumelhart et al. 1986) which are then modified using a gradient descent algorithm to minimize the total loss and improve the model. These stages are repeated iteratively to perform a complete pass through the entire training set, corresponding to one epoch, and then over multiple epochs until the model reaches optimal accuracy. After each epoch, the validation loss is evaluated by classifying inputs from the validation set, in order to determine whether the decrease in training loss reveals better performance or an overfitting to the training set. After training, the network performance is finally quantified using independent data from the test set. Further details can be found in the review of LeCun et al. (2015).
Network parameter optimization is performed via mini-batch stochastic gradient descent, a common variant that consists of splitting the training set into small batches and adjusting the weights according to the average corrections over each batch. Our network minimizes the cross-entropy loss function which Fig. 3. Architecture of the convolutional neural network, inspired from LeNet (LeCun et al. 1998), and comprised of three convolutional layers with 11×11, 7×7, and 3×3 kernel sizes, and 32, 64, and 128 filters, respectively, followed by three fully connected hidden layers with 2048, 512, and 32 neurons. ReLU activations are applied between each layers. Max-pooling layers with 2×2 kernel sizes and stride = 2 are inserted after the first two convolutional layers, and dropout of 0.5 is used before the fully connected layers.
penalizes robust and incorrect predictions, and is expressed as follows for a binary classification problem where y i are the ground truth labels, and p i the network predictions, namely scores in the range [0, 1] resulting from the sigmoid activation on the output layer. The loss is computed over each batch of size N. To avoid unbalanced splits of the data set, we use 5-fold cross-validation that consists of reshuffling the training and validation sets and building the performance metrics. Cross-validation runs trained over 500 epochs are used to optimize the neural network hyperparameters with a grid search, varying the learning rate over the range [0.0001, 0.1] and the weight decay over [0.00001, 0.01], with momentum fixed to 0.9. We train a final network with the entire data set using an optimal learning rate and weight decay of 0.001 and 0.00001, respectively, and applying early stopping at epoch 193 that matches the lowest average loss over the cross-validation runs. Among the 23.1 million input galaxies, 1 050 207 are assigned scores p cat > 0.5 indicating that their gri aperture photometry is consistent with the mocks. Fig. 2 illustrates the network predictions by comparing the colors and magnitudes of random galaxies, mocks, and galaxies having p cat > 0.5. The good agreement between 2σ contours of mocks and p cat > 0.5 shows that the network correctly captures the position of mocks in color/magnitude diagrams, as well as their color variations within different apertures. Moreover, our photometric selection is successfully tested on the aperture fluxes of known strong lensing systems listed as grade A or B with θ E = 1.0-3.0 in the MasterLens database 8 , and with lensed arcs visible in the PS1 stack images. We therefore keep these 1 050 207 galaxies for CNN classification.

Training the convolutional neural networks
Data sets for the CNNs include 180 000 images in g, r, and i bands with the same fraction of positive (lens) and negative (nonlens) examples. Positive examples are taken from the sample of simulated lens galaxies with θ E > 1.5 described in Sect. 4.4. The choice of negative examples is strongly influencing the network predictions and is modified iteratively to improve the network performances (see Sect. 5.5). In short, we boost the fraction of galaxies with specific morphological types using incorrect identifications of strong lenses in previous networks. This  allows the network to learn how to discriminate strongly lensed arcs from the usual contaminants such as extended arms of low redshift spirals, lenticular galaxies, and mergers, and to distinguish isolated LRGs from LRGs with the relevant strong lensing features depicted in the mocks. Our resulting set of 90 000 negative examples includes: -30% LRGs selected directly from the catalog of SDSS LRGs on the high-end of the mass distribution used to create the mocks, -20% spirals classified as likely face-on galaxies in Galaxy Zoo 2 (Willett et al. 2013) and with r-band Sersic radii <4.5 from PS1 (Flewelling et al. 2016), to restrict to blue spiral arms with similar extension as the lensing features present in the mocks, -10% smooth, isolated galaxies from Galaxy Zoo 2 without bright companion and bluer colors than LRGs, -<1% galaxies with apparent dust lanes identified in Galaxy Zoo 2 (Willett et al. 2013), -32% randomly selected galaxies from the PS1 catalog, including diverse types, groups and mergers, and with negligible contamination from the rare strong lenses, -7% false positives from previous neural networks selected by visually classifying candidates with scores > 0.9.
Negative examples are not included in the p cat > 0.5 sample to be classified with the CNN, but they broadly follow the same color and magnitude distributions in gri bands. Some examples are shown in Fig. 1. The data set is split into training, validation, and test sets with the same proportions as before (Sect. 5.2).
We use data augmentation and apply random shifts between -5 and +5 pixels to all images so the network becomes invariant on small positional offsets. This results in input images of 70×70 pixels which conservatively include all emission from the central galaxy of interest. All survey galaxies in the set of negative examples are unique, and LRGs used multiple times in the sample of mocks have been rotated by kπ/2 so they appear only once with a given orientation in the set of positive examples. This ensures that the network is not sensitive to preferential orientations. Other common image augmentation techniques, such as image stretching, normalisation and rescaling (e.g., Petrillo et al. 2017Petrillo et al. , 2019 have been discarded as they do not significantly improve the learning process. The CNN architecture is inspired from LeNet (LeCun et al. 1998) and from the lens modeling CNN of Schuldt et al. (in prep.). After the 70×70×3 input layer, it contains three convolutional layers with 11×11, 7×7, and 3×3 kernel sizes, and 32, 64, and 128 filters, respectively, followed by three fully connected hidden layers with 2048, 512, and 32 neurons (see Fig. 3). ReLU activations act as non-linear transformations between each one of these layers. Max-pooling layers (Ranzato et al. 2007) with 2 × 2 kernel sizes and stride = 2 are inserted after the first two convolutional layers and are essential to make the CNN invariant to local translations of the relevant features in gri image cutouts, while reducing the network parameters. Dropout regularization (Srivastava et al. 2014) with a dropout rate of 0.5 is applied before the fully connected layers. This is an efficient regularization method that consists of randomly ignoring neurons during training in order to reduce overfitting on the training set and improve the CNN generalization. The output layer consists of a single neuron with sigmoid activation and results in a score, p CNN , in range [0, 1] which corresponds to the network lens/non-lens prediction 9 . Our CNN with moderate depth is well suited for binary classification of small PS1 cutouts.
During the training process, the CNN learns the relevant patterns in gri images by adjusting the convolutional kernel weights, through a minimization of the binary cross-entropy loss between ground truth and predicted labels. After the gradient calculation and optimization, information learned by the network is stored in the two-dimensional filters. As for the cataloglevel network, we use mini-batch gradient descent with a batch size of 128 and performed five cross-validation runs. We find an optimal learning rate and weight decay of 0.0006 and 0.001, respectively, using a grid search with momentum fixed to 0.9. The number of training epochs is then chosen from the minimum average validation loss over the cross-validation runs, which corresponds to optimal network performance without overfitting.
The evolution of the training and validation loss for the network with optimized hyperparameters is shown in Fig. 4, until epoch 47 which corresponds to the lowest validation loss. The gap between both curves (generalization gap) is small, showing that the model predictions do not deteriorate much on new data with similar properties as the training set. The final network performances are characterized with the test set which was not seen during training and validation and contains about 54 000 entries. In Fig. 5, we show the model probability predictions for all lens and non-lens examples in the test set. Lenses dominate the distribution for p CNN > 0.6. The model reaches 94.2% accuracy, 93.1% purity and 95.5% completeness on this set suggesting good pattern recognition abilities on new images 10 . In addition, the Receiver Operating Characteristic (ROC) curve in Fig. 6 illustrates the relation between the true positive rate (TPR, the number of lenses correctly identified over the total number of lenses) and the false positive rate (FPR, the number of nonlenses identified as lenses over the total number of non-lenses) in our trained model. The curve is obtained by varying the network probability threshold for lens identification in the range [0, 1], and an ideal network would correspond to an area under curve (AUC) of 1. Our CNN gives AUC = 0.985, and can correctly identify 95.5% of lenses in the test set with a FPR of 7.1% using p CNN > 0.5, or 77.0% of lenses in the test set with a FPR of 0.8% using p CNN > 0.9.
After optimizing hyperparameters and quantifying the performance of the network on the test set, we train a final CNN with the complete data set of 180 000 labelled images and we fix the resulting model parameters to classify the 1 050 207 galaxies with p cat > 0.5.

Evaluating the network performance
The performance of the network is further evaluated on galaxies with various properties on the test set. Fig. 7 depicts the normalized distributions of scores for positive examples in different bins of θ E , θ E /R eff , and r Kron , where R eff is the effective radius from the r-band Sersic fit of the LRG-only image (Flewelling et al. 2016). Somewhat counterintuitively, scores are closer to 1.0 for lenses with θ E < 2 . The low fraction of θ E > 2 in the training set might explain the slightly lower CNN performance on these systems, which generally do not have the most challenging morphologies. Histograms for θ E /R eff demonstrate the ability of the network to assign p CNN closer to 1.0 for mocks 10 Accuracy is defined as the sum of true positives and true negatives over the total number of systems (lenses + non-lenses), and purity is defined as the number of lenses correctly identified with p CNN > 0.5 over the number of systems with p CNN > 0.5. with Einstein radius larger than the effective radius of the lens light distribution, where lensed arcs are in principle better deblended from the lens. We also find a higher fraction of scores p CNN > 0.9 for lens LRGs with higher r Kron (i.e. fainter LRGs), perhaps because the brightest lenses outshine the lensed source emission. Nonetheless, these variations remain generally minor.
Finding acceptable network performances on the test set might be misleading as it relies on our choices in simulating the strong lenses and assembling a set of negative examples. A valuable independent test consists of applying the CNN to strong lenses from the literature. For that purpose, we collect all grade A and B galaxy-scale lenses in the MasterLens catalog, restricting to Einstein radii ∼1-3 similar to the range probed by our 90 000 mocks. These systems were discovered from various techniques including the identification of emission lines from star-forming galaxies behind LRGs using spatiallyintegrated spectra (SLACS, Bolton et al. 2008), and the analysis of high-quality imaging from HST (e.g., in COSMOS, Faure et al. 2008) or deep multiband surveys (e.g., CFHTLS SL2S, Cabanac et al. 2007;Gavazzi et al. 2014;More et al. 2016). Most of these lenses are not detectable in gri Pan-STARRS stacks and need to be excluded from our test set. We thoroughly scanned all PS1 gri single-band and 3-color images of this sample to find those with detected lensed arcs, and assembled a test set of 16 systems. While these published lenses have colors, z d , z s , and configurations similar to our mocks, some of their multiple images are strongly blended with the lenses and difficult to identify with PS1 data.
Pan-STARRS cutouts of these 16 lenses are scored with our trained neural network and the results are presented in Fig. 8. A total of 14/16 lenses are correctly identified as p CNN > 0.5 by the CNN, while 9/16 and 7/16 have higher scores p CNN > 0.8 and p CNN > 0.9, respectively. SL2SJ0217−0513 and SDSSJ1112+0826 are the two incorrect classifications. SL2SJ0217−0513 has a faint lens galaxy falling in the upper range of the redshift distribution (z d = 0.646) and blended with a faint blue arclet. It is worth noting that the network performs better for SDSSJ1110+3649 (p CNN = 0.971) which has a similar morphology to SL2SJ0217−0513, but with the addition of a low S/N, blue counter image on the other side of the lens galaxy. SDSSJ1112+0826 has typical properties of our mocks but lacks a bright counter image. On the other hand, lenses with well-detected and deblended arcs and counter images (e.g., SDSSJ1430+4105, SDSSJ0201+3228, CSWA21) have high scores, showing that the network is able to extract relevant features.
Our comparison extends to systems with Einstein radii below the 1.5 cutoff applied to our simulations. The CNN performance in this regime remains acceptable, as illustrated by the scores of 0.786, 0.612, and 0.971 assigned respectively to SDSSJ1134+6027, SDSSJ2156+1204, SDSSJ2231−0849 that have θ E ∼ 1.1 . In contrast, with θ E = 3.26 due to its groupscale environment , CSWA21 falls on the upper range of the Einstein radius distribution that is underrepresented with only a few hundred examples in our training set. This system is nonetheless given a score of 0.873 and confirms that the CNN can identify these simple configurations. Interestingly, the four systems with ≥ 4 magnified images according to the MasterLens database have p CNN > 0.9. The small number of test lenses however prevents robust estimates of the method purity and completeness.

Discussion on the CNN training
The final version of the CNN from Sect. 5.3 is selected from a range of networks with different architectures, after testing the impact of the training set content. To identify the optimal network, we compare scores assigned to the 16 known test lenses and require low false positive rates by examining gri cutouts of the few hundred galaxies with highest scores p CNN . Overall, different choices of positive and negative examples have much stronger impact than changes in the CNN architecture.
The sets of negative examples tested include: (1) random PS1 sources drawn from the pre-selection in Sect. 5.1; (2) typical LRGs selected as in Eisenstein et al. (2001), mostly less massive than LRGs in mocks; (3) high-mass LRGs similar to those used in Sect. 4; (4) a combination of LRGs, face-on spirals, and random sources (varying the fractions of LRGs and contaminants). Scores on the MasterLens systems from CNNs using sets (1) and (4) are comparable to those in Fig. 8, but introduce an overwhelming number of 400 000 galaxies with p CNN > 0.5 and 250 000 with p CNN > 0.9 implying high false positive rates, and are ruled out. Other sets of positive examples are tested by (1) modifying the θ E lower limit, and (2) suppressing the artificial boost in arc brightness to get more realistic lens/source flux ratios. Both CNNs trained on fainter arcs, more strongly blended with the lens significantly reduce the fraction of genuine lens examples with scores p CNN ∼ 1 and are also discarded. Our tests on the architecture include: (1) adding and removing one convolutional layer, or one fully-connected layer; (2) changing the number of neurons, the kernel sizes or number of feature maps in these layers; (3) using smaller or larger strides on the max-pooling layers; (4) modifying the dropout rates; (5) implementing batch normalization (Ioffe & Szegedy 2015). Each of these changes degrade the network performance as measured from the loss and ROC curves on the test set, and from the 16 MasterLens systems. Using dropout normalization before fully connected layers (as in Fig. 3) turns out to be the most efficient solution to reduce overfitting.

Visual classification
Galaxies with high CNN scores, p CNN , are visually inspected by different authors to assign a final grade. We started classifying galaxies with p CNN close to 1.0, and progressively lowered the threshold to introduce additional galaxies until the fraction of reliable candidates from visual inspection became too low.
We grade the single-band and 3-color PS1 cutouts in gri bands, zoomed to 12 × 12 , and optimally displayed with linear and arcsinh scaling using dedicated scripts to emphasize faint, sometimes blended strong lensing features 11 . To aid the visual classification, we plot grz single-band and 3-color postage stamps from the DESI Legacy Imaging Surveys that significantly overlaps the PS1 3π survey footprint on the extragalactic Northern sky, and that provides slightly deeper, higher quality images (Dey et al. 2019), and we plot residual frames from subtraction of the best-fit light profile. On smaller regions of the sky, we also include gri images from HSC DR2 wide-field surveys (Aihara et al. 2019). The set of CNN candidates is divided into four equal parts, each inspected either by R. C., S. S., S. H. S., or S. T., in order to assign one of the following grades: 0: non-lens, 1: maybe a lens, 2: probable lens, 3: definite lens, similarly to Sonnenfeld et al. (2018) and Jacobs et al. (2019a). All candidates with grades ≥ 2 from this first iteration are then inspected by the three other authors, so final grades were averaged over the four authors. The output list of candidates corresponds to average grades ≥ 2.  Table 1 and Sect. 6.1) Those marked in blue show unambiguous spectral signatures of high-redshift background sources in our inspection of SDSS BOSS DR16 data. Bottom: Examples of random false positives with p CNN = 1. All cutouts are 15 × 15 .
Non-lenses are galaxies clearly identified as nearby spirals, ring galaxies, groups, or other contaminants from their morphology, or cutouts with artifacts. Candidates listed as grade 1 have faint companions and/or weakly distorted features suggesting possible strong lensing signatures, but may also correspond to galaxy satellites or spiral arms. Probable lenses show multiple elongated sources with similar colors, and orientation and angular separation expected for counter-images, while the available 3-color images cannot firmly rule out contaminants. Those assigned grades of 3 have similar, although brighter, non-blended and unambiguous signatures of galaxy-scale strong lenses.

Final candidates from visual inspection
Out of the 1.1 million galaxies the CNN scores 598 130 with p CNN = 0, and 105 760, 12 382, and 1714 as candidate lenses with p CNN > 0.5, p CNN > 0.9, and p CNN = 1, respectively. Scores p CNN > 0.5 amount to 10% of sources ranked by the CNN, and only 0.5% of the input catalog of 23.1 million galaxies. The human inspection process is necessary to increase purity of the candidate sample. However, a systematic visual classification of galaxies with p CNN > 0.5 would be unrealistic and we use a higher p CNN threshold which impacts the purity and completeness in a way that is difficult to quantify. Predictions for mock lenses on the test set (see Fig. 5) and for known lenses (see Fig. 8 A). These systems are selected as high confidence candidates with CNN scores > 0.9, and average grades ≥ 2.0 from visual inspection. Columns are: source name; right ascension; declination; output score from the CNN; average of the visual grades from four authors; g-, r−, and i-band Kron magnitudes of the lens and source blends from the PS1 catalog; g-, r−, and i-band aperture magnitudes of 1.04 radii covering the lens central regions; SDSS photometric redshifts or spectroscopic redshifts marked as ( * ) where available; previously published confirmed or candidate systems ( p CNN > 0.9 with only a few more in the range 0.5 < p CNN < 0.9, and we therefore started inspecting all 12 382 with p CNN > 0.9. As the fraction of visual grades ≥ 2 quickly drops with decreasing CNN scores, down to 1% when extending to 0.8 < p CNN < 0.9, we restricted our final classification to p CNN > 0.9. Our selection results in 321 high-confidence candidates with grades ≥ 2 (hereafter, grades refer to the average grades from the visual inspectors). Respective fractions of 36%, 34%, and 30% of these candidates have scores in the intervals 0.99 < p CNN ≤ 1.00, 0.95 < p CNN ≤ 0.99 and 0.90 < p CNN ≤ 0.95, which demonstrates that the CNN learns meaningful information and assigns high scores for most of the probable or definite lenses. The rate of grades ≥ 2 over the 12 382 galaxies with p CNN > 0.9 is about 2.5%. This fraction is slightly lower but comparable to previous CNN lens searches using deeper, higher quality imaging surveys (KiDS DR4, DES Year 3, Petrillo et al. 2019;Jacobs et al. 2019a). This is not surprising as some of these searches have focused on catalogs of LRGs with robust photometry and are less subject to contamination by low-redshift spirals. Obtaining equivalent performance suggests that our initial catalog-level classification plays an important role in our systematic search that balances the suboptimal imaging quality of the PS1 3π survey. Finally, 37 additional candidates from previous CNNs with p CNN = 1.0 and grades ≥ 2 are included (see Sect. 5.5), bring- Fig. 10. Pan-STARRS lens candidate with prominent Balmer absorption features from a background galaxy at z s = 1.185 blended with the BOSS spectrum of the lens LRG at z d = 0.3155. The black and grey lines show the spectrum observed with a fiber of 2 diameter and the 1σ noise level, respectively, and the blue line corresponds to the best-fit LRG template from the automated SDSS pipeline at λ < 8000 Å (see inset), which poorly fits the continuum and Balmer lines at longer wavelengths. The red line shows that combining the SED templates of an LRG at z d = 0.3155 (brown curve) and a recently quenched galaxy at z s = 1.185 (orange curve) correctly fits the spectrum over the entire range (see details in text).
ing the total number of resulting lens candidates to 358 from our two-step search.
Examples of good candidates and false positives (grades ≤ 1) after visual inspection are shown in Fig. 9. In most cases, false positives belong to specific galaxy types frequently misclassified by the network such as nearly face-on spirals with obvious and extended arms, nearby lenticular galaxies, and bright LRGs with faint unlensed companions. Galaxy groups, mergers with perturbed morphologies, and cutouts with background artifacts are more rare. Other ambiguous systems listed as false positives are galaxies with red bulges and faint arms, with color gradients or companions, where all components are blended and mimic lensed arcs.
The PS1 lens candidates are cross-matched with galaxyscale strong lenses from previous searches with the status of spectroscopally confirmed or candidate systems, by building upon and expanding the current MasterLens database. We compiled grade A and B systems (or equivalent) from a number of recent searches based on optical/near-infrared imaging from DES (Diehl et al. 2017;Jacobs et al. 2019b,a), CFHTLS Jacobs et al. 2017), KiDS (Petrillo et al. 2017(Petrillo et al. , 2019Li et al. 2020), DESI (Huang et al. 2019), and HSC (Wong et al. 2018;Sonnenfeld et al. 2018;Jaelani et al. 2020;Sonnenfeld et al. 2020). Since our network may also be sensitive to lensed quasars with colors and configurations similar to our mock lenses, we also cross-matched with the all-sky database of ∼220 confirmed lensed quasars in the literature (Lemon et al. 2019, and references therein) 12 , and with previously identified lensed quasars from HSC ) and from PS1 (Rusu et al. 2019). For the candidates not included in those tables, we searched in the SIMBAD Astronomical Database 13 .
To our knowledge, besides the test lenses from the MasterLens database, 23 of our 358 CNN candidates are already listed in the literature and corresponding references are listed in Table 1. The vast majority of them are also galaxy-scale strong lens candidates from ground-based multi-12 https://www.ast.cam.ac.uk/ioa/research/lensedquasars/ 13 http://simbad.u-strasbg.fr/simbad/sim-fcoo band imaging searches and lack both spectroscopic and highresolution imaging follow-up. The only galaxy-galaxy lens systems confirmed with spectroscopy and/or space-based imaging are PS1J0143+1607, PS1J0145−0455, PS1J2343−0030, and PS1J0454−0308. Firstly, PS1J0143+1607 is CSWA 116, discovered through the search of blue lensed features near LRGs in SDSS imaging (Stark et al. 2013). Its properties (z d = 0.415, z s = 1.499, θ E = 2.7 ) are very similar to our mock lenses, and the system was assigned p CNN = 1.0 and a visual grade of 2.75. Secondly, PS1J0145−0455 is CSWA 103, also presented by Stark et al. (2013), and has z d = 0.633, z s = 1.958, and θ E = 1.9 , akin to our mocks. It has p CNN = 1.0 and a visual grade of 2.50. Thirdly, PS1J2343−0030 is a SLACS lens with z d = 0.181 and z s = 0.463 which was discovered and modeled by Auger et al. (2009) using SDSS spectroscopy and HST imaging. It has an Einstein radius of 1.5 , higher than the majority of lenses in the SLACS sample, which explains its elevated scores from the CNN (p CNN = 0.917) and PS1 cutout inspection (2.50). Lastly, PS1J0454−0308 (p CNN = 1.000 and grade = 2.75) is a peculiar system thoroughly studied in Schirmer et al. (2010), where the main lens is the brightest elliptical galaxy of a fossil group at z = 0.26 only 8 from the MS0451−0305 cluster at z = 0.54. The HST F814W frame clearly resolves the background source into an extended arc and a compact counter image, corresponding to θ E = 2.4 . Most other galaxy-galaxy lenses in the literature were missed by our selection due to limited PS1 depth or lens configurations not represented in our mocks (e.g., higher Einstein radii). Finally, two of these 23 published systems are confirmed lensed quasars. PS1J2350+3654 (p CNN = 1.0 and grade of 2.25) was discovered in Gaia DR2 (Lemon et al. 2019), and PS1J1640+1932 (p CNN = 1.0 and grade of 2.25) from SDSS (Wang et al. 2017).
Postage stamps of the 335 newly-discovered and 23 published galaxy-scale lens candidates from our Pan-STARRS CNN search are shown in Fig. 9 and in Appendix A, together with p CNN and visual inspection grades. Given that some visual identifications rely on Legacy imaging, which is sometimes deeper than PS1, we also show Legacy 3-color images of our candidates located in the Legacy footprint in Appendix A. Fig. 11. Example of a lens candidate with multiple emission lines from a high-redshift background galaxy overlaid on the SDSS BOSS spectrum of the foreground LRG. The black, grey and blue lines show the observed spectrum, the 1σ noise level, and the best-fit SDSS template for the LRG, respectively. The spectrum is zoomed on the spectral features associated with the background line-emitter at z = 0.8198 rather than with the LRG at z = 0.2386.

Ancillary spectroscopy
We inspect SDSS BOSS spectra from the 16 th data release available for 104 out of 358 lens candidates, in order to characterize the candidate lens galaxies and to search for spectral signatures of high-redshift background galaxies. This approach was previously used to select the SLACS sample (Bolton et al. 2008) and relies on spectral features captured within the small, 2 diameter aperture fibers. Our examination results in: Finding a vast majority of LRGs 14 and only 3/104 star-forming galaxies at lower redshift demonstrates the validity of our method. This result shows that our CNN and visual inspection method predominantly selects the targeted population of galaxyscale strong lens candidates with lens LRGs, and efficiently distinguishes strong lensing features from usual interlopers (e.g., spiral arms, tidal features, blue rings). Moreover, the five false positives (PS1J2249+3228, PS1J1551+2156, PS1J1452+1047, PS1J0834+1443, and the QSO PS1J1156+5032) are all listed as lower confidence candidates with grades ≤ 2.25. The rarity of spectroscopic signatures from lensed galaxies is not surprising because the 2 BOSS fibers enclose little lensed source emission in the θ E 1.5 systems targeted by our search. Background line flux can only be detected in few favorable cases, like in the presence of counter-images closer to the lens center. In addition, emission lines in most lensed galaxies might be too faint for SDSS spectroscopy, and the observable spectral band of BOSS excludes Hβ4863, [OIII]4960 and [OIII]5008 for z > 0.9, which rules out multiple line detections for the majority of sources expected at z > 1.
The only Pan-STARRS candidate robustly confirmed as a strong lens system is PS1J1415+1112. Pan-STARRS imaging spatially resolves the system into a quadruply-imaged background source forming a typical fold configuration, and with redder color than the lens LRG (Fig. 9). The best-fit model from the BOSS pipeline primarily identifies spectral features from the foreground LRG at z d = 0.3155, but poorly fits the continuum and remarkably prominent Balmer absorption lines at λ obs > 8000 Å (Fig. 10). These features are associated with the background galaxy at z s = 1.185. Further inspection of the BOSS spectrum shows that the strong Balmer absorption lines, together with the weak 4000 Å break indicative of relatively young stellar populations, and the lack of nebular emission lines at z s = 1.185 suggesting little on-going star formation, favor the scenario of a recently quenched galaxy. To verify this hypothesis, we use the BC03 stellar population synthesis models (Bruzual & Charlot 2003) to combine the SED template of an elliptical galaxy at z d = 0.3155 with 7 Gyr old stellar populations and solar metallicity with various templates of post-starbursts at z s = 1.185. After varying the stellar age, metallicity, and dust attenuation, we find that rescaling the SED of a young galaxy with 25 Myr old stellar populations, no on-going star formation, and Z = 0.2 Z correctly fits the overall wavelength range in Fig. 10. Together with the red colors of multiple images this adds further evidence that the lensed galaxy has indeed ceased star formation recently.
The last four CNN candidates showing multiple emission lines from a background, possibly lensed galaxy do not exhibit such clear configurations in Pan-STARRS and Legacy gri images. Their BOSS spectra reveal simultaneous detections of [OII]λλ3727, Hβλ4863, [OIII]λ4960, and [OIII]λ5008 on top of the prominent stellar continuum from the foreground LRG (see Fig. 11). The first three are PS1J2345−0209 with z LRG = 0.2940 and z s = 0.6581, PS1J1134+1712 with z LRG = 0.3752 and z s = 0.8121, and PS1J0917+3109 with z LRG = 0.2386 and z s = 0.8198. These candidates were listed as probable lenses from our grades and, although z LRG , z s , and σ * LRG suggest plausible Einstein radii ∼ 1 for singular isothermal sphere profiles, high-resolution follow-up imaging is needed to ascertain their configuration. These data will determine whether the background, spectroscopically-confirmed galaxies are those visually identified with Pan-STARRS and indeed multiply-imaged, or whether they are out of the strong lensing regime. Lastly, PS1J1724+3146 comprises a foreground LRG at z LRG = 0.2097 and a background source at z s = 0.3461, likely too close to the lens to enter the strong lensing regime.

Properties of candidates
Our visual inspection stage implies that resulting lens candidates have morphologies and configurations easily detectable by the human eye. Figure 9 confirms that higher grades are given to systems with extended arcs, clear counter images and often compact lens light profiles. Postage stamps make it clear that our most reliable systems have relatively brighter and distorted arcs (as plausible strongly lensed background galaxies) in gri bands which will facilitate their future follow-up. Interestingly, besides the numerous blue arcs (e.g., PS1J1647+1117, PS1J1508−1652), several candidates have lensed sources redder than the foreground LRGs (e.g., PS1J1559−3147, PS1J1445+3649) and others have compact, nearly point-like morphologies suggestive of lensed quasars (e.g., PS1J1926−2138). Since the presence of possible counter images is a prerequisite for lens identifications, our candidate set is presumably biased towards higher fractions of quads than doubles, at least for grades ∼ 3. Other biases on the relative positions of lenses and sources might also arise. For instance, although our tests suggest that systems with θ E /R eff < 1 are correctly classified by the CNN, with only minor differences in p CNN with respect to θ E /R eff > 1 (Fig. 7), candidates and known lenses recovered by visual inspection should be relatively less blended and lie on the high-end of the θ E /R eff distribution. Quantifying this effect is nevertheless an arduous task due to uncertainties in reliably measuring R eff from Pan-STARRS imaging. As previously mentioned, systems with θ E differing from simulated lenses in our training set are likely all discarded.
Our 358 candidates have lens redshifts in the range z d ∼ 0.15-0.65. The distribution does not change much between candidates with confirmed lens z spec from SDSS, and those with more uncertain SDSS z phot due to lens/source blending. It closely matches the lens redshift ranges in other samples selected through multiband, ground-based imaging such as CASSOWARY (Stark et al. 2013) and SL2S (Sonnenfeld et al. 2013). Interestingly, Pan-STARRS candidates with SDSS stellar velocity dispersion have σ * mean ∼ 275 km s −1 . This is significantly lower than σ * mean ∼ 310 km s −1 in the set of LRGs for simulations (Sect. 4.4), indicating that lens LRGs in our candidate set are less massive on average.
To understand implications of such differences in σ * we followed Petrillo et al. (2017) and, for each lens candidate with z d and σ * available from BOSS, we calculated Einstein radii for a singular isothermal sphere profile and different source redshifts spanning the most plausible z s = 0.5-3.0 range. We assumed σ SIS ∼ σ * . In the vast majority of cases (86%), Einstein radii predicted from the lens galaxy dynamics match the approximate θ E ∼ 1-2 of PS1 candidates from a basic inspection of 3-color images in Fig. 9. This range is also expected from construction of the training set and our CNN selection function. Predicted and observed image configurations usually match for z s > 1. This qualitative test argues in favour of plausible lens configurations and few contaminants despite the moderate σ * from BOSS.
In some cases, the large Einstein radii 1.5 are likely to be partly attributed to a contribution from the environment of the primary lens galaxy, probably from a smooth dark-matter halo associated with an extended group-or cluster-scale structure (Oguri 2006). As a matter of fact, obtaining a few strong lens candidates with θ E in the same range as our lens simulations but with lower SDSS stellar velocity dispersions proves that, if they are genuine strong lenses, they host moderately massive lens LRGs with significant external shear and magnification contributions from the environment. We verified that some of our Pan-STARRS candidates are indeed matching the position of the central brightest cluster galaxy of SDSS clusters (e.g., PS1J0907+4233, PS1J2153+1154, PS1J1417+2120, PS1J1210+2843, PS1J1730+3405, Szabo et al. 2011), and of candidate galaxy clusters from a red sequence search in DES imaging (e.g., PS1J0919+0336, PS1J2233+3012, PS1J1210+2843, PS1J1414+6447, Rykoff et al. 2016). This environmental effect is well illustrated by the case of PS1J0454−0308 characterized in Schirmer et al. (2010). We also found that PS1J2210+0620 and its bright red arc of ∼8 extension lie behind the core of Abell 2422 galaxy cluster (Abell et al. 1989). Lastly, PS1J1142+5830 is located in the field of the rich, X-ray luminous galaxy cluster MACSJ1142.4+5831 at z = 0.325, also known as Abell 1351, which has a bimodal velocity distribution suggesting an on-going merger (Barrena et al. 2014). The SDSS photo-z of PS1J1142+5830 is consistent with the cluster redshift but our candidate is offset by 1.3 from the brightest cluster galaxy.
The Pan-STARRS gri photometry of our 358 candidates is comparable to that of our lens simulations. Observed (g − i) colors of the deflector central regions within 2 diameter apertures range between 1.3 and 2.7 mag and broadly follow the color distribution of mocks and p cat > 0.5 galaxies in Fig. 2. Our best candidates are therefore representative of the global sample and visual inspection does not introduce significant biases in terms of lens color. Kron (g − i) colors from blended deflector and source emissions are ∼0.3 mag lower because lensed arcs are predominantly bluer than the lens LRGs. Given uncertainties on the lensed source colors, quantitative analyses would require subtracting the lens LRG light profile, a hazardous task with Pan-STARRS imaging. However, it is worth noting that a majority of p CNN > 0.9 systems with blue arcs closely mimicking extended arms of star-forming spirals were conservatively discarded by visual inspection, while systems with similar configurations but redder arcs where considered less ambiguous and assigned higher grades. Consequently, the output set is likely biased towards a higher fraction of red lensed arcs compared to all genuine lens systems assigned p CNN > 0.9.
We did not find significant correlations between average grades and photometric properties. Although our high and lowerconfidence candidates have comparable lens brightness, those with higher grades have slightly redder lenses by ∼0.15 mag in (g − i) on average. Mean lens redshifts and velocity dispersions are also increasing in the top half of the candidate set, by 0.05 and 50 km s −1 , respectively.

Discussion
Our study demonstrates that a two-step approach combining catalog-level and image-level classifications, as already applied to lensed quasars searches (Agnello et al. 2015), is efficient in selecting galaxy-scale strong lenses from very wide-field surveys without imposing restrictive cuts on the input galaxy catalog. Using three bands and multi-aperture photometry, our cataloglevel neural network conveniently reduces the sample size for CNN classification to the order of 1 million sources which, in the case of Pan-STARRS, allows us to download all gri cutouts within a manageable amount of time of about one week. This first stage will be particularly beneficial to overcome data volume limitations in the new era of deep, large scale surveys such as LSST. At the same time, the low false negative rate and good performance on known lens systems of our catalog-level network applied to Pan-STARRS photometry shows that this stage discards very few potentially interesting systems. After that, we obtain encouraging results from our CNN combined with visual inspection. The compilation of 330 newly-discovered lens candidates (after removing the five false-positives identified from SDSS BOSS spectra) and confirmation of 23 lenses already selected from other, more suitable surveys (e.g., DES, KiDS) demonstrates the efficiency of our approach regardless of the limited depth and seeing in Pan-STARRS.
The strategy presented in this paper is expected to be easily applicable to the deeper LSST gri stacks in order to identify the large set of new, wide-separation galaxy-scale lenses from this survey. The simulations of Collett (2015) suggest that the LSST final stacks in gri bands will yield about 39 000 detectable lens systems, with θ E greater than the seeing FWHM and large enough to spatially-resolve multiple images, and with total S/N of lensed arcs > 20 in at least one band. Under these assumptions, future LSST lenses will mainly cover θ E = 1-3 , a similar range as our Pan-STARRS candidates, and their systematic identification will be crucial for several science cases including the search for strongly lensed SNe with spatially-resolved multiple images for early-phase SN spectroscopy and cosmography . In order to achieve this, we can customize our simulation pipeline of realistic Pan-STARRS mock lenses to produce LSST mocks and quickly assemble a training set, by using gri cutouts of the relevant LRG lenses from LSST stacks rather than Pan-STARRS. Extending the search to broader populations of lens galaxies might also become feasible, in light of the ∼4 mag deeper gri LSST stacks.
Tests on the CNN performance presented in this paper are likely valid essentially for Pan-STARRS seeing and depth, while better imaging quality should enable identifications of broader, more complex image patterns and would likely benefit from updates on the neural network. The main challenge is the minimization of false positives which, given the rarity of strong lenses per deg 2 , will largely dominate the sample of CNN candidates even with FPR down to 1% or less. Potential avenues for improvements include adding other convolutional layers or boosting the learning process with residual networks (He et al. 2015), as the latter might give better performance for image fea-tures not represented in the training data (for LSST simulations, see Lanusse et al. 2018). A possibility would be to use CNNs optimized for image outlier detection (e.g., Margalef-Bentabol et al. 2020), either to identify strong lenses as the outlier class, or simply to exclude cutouts with partial coverage, background artifacts or other anomalies before classification. In addition, although Teimoorinia et al. (2020) found little difference with single-band HST images, implementing additional classes for the usual interlopers (spirals, ring galaxies, ...) might help distinguish signatures of strong lenses in gri bands. Our Pan-STARRS lens search demonstrates that ultimately, most progress should come from the choice of positive and negative examples in the training set (see Sect. 5.5). Highly-realistic lens simulations are one of the main ingredients (see Lanusse et al. 2018). For instance, as multiple images are strongly blended with lenses in about half of the mocks produced in Sect. 4.4, one could only include those visually classified as definite lenses from their bright arcs and unambiguous configurations. Fine-tuning the selection of positive examples in this way could help the network assign high scores exclusively to robust candidates that are worth including for follow-up campaigns.
Reducing the false positive rates and making sure that only high-confidence candidates get p CNN ∼ 1 will be crucial for future lens searches in the LSST era. However, the most efficient neural networks should provide orders of magnitudes more lens candidates than Pan-STARRS and their systematic visual inspection might become unrealistic. Assembling complete strong lens samples will therefore require one to abandon this nonautomated, human-classification stage and to search for alternatives. For instance, calibrating the CNN scores as probabilities (e.g., Guo et al. 2017) could help quantify purity and completeness, and automatically select a subset of candidates for highresolution imaging or spectroscopic follow-up.

Conclusion
In this paper, we present a systematic search for wide-separation, galaxy-scale strong lenses in Pan-STARRS using machine learning classification of gri images from the 3π survey on the entire northern sky. We focused our search on massive LRGs acting as strong lenses and producing Einstein radii ≥ 1.5 , and we simulated a set of highly-realistic mocks by painting lensed arcs on Pan-STARRS image cutouts of LRGs with known redshift and velocity dispersion from SDSS. This strategy ensures that mocks include background artifacts and field galaxies, and that the network becomes invariant under the small scale exposure and FWHM variations on the survey stacks.
We computed the gri aperture photometry of mocks to preselect a conservative catalog of 23.1 million sources, and followed a two-step approach for classification: (1) a catalog-based neural network on the source photometry, (2) a CNN trained on gri image cutouts. The catalog-level network assigned p cat > 0.5 for 1 050 207 galaxies while excluding little known lenses from the MasterLens catalog. The image-level network then yielded sets of 105 760 and 12 382 candidates with scores p CNN > 0.5 and > 0.9, respectively. We visually inspected those with p CNN > 0.9 and combined with previous CNNs to assemble a final set of 330 high-quality newly-discovered candidates with average visual grades ≥ 2. Publicly available BOSS spectroscopy of the lens candidates' central regions proves that the vast majority are indeed LRGs at z ∼ 0.1-0.7, and five newly-discovered candidates show robust signatures of blended, high-redshift background sources. Pan-STARRS clearly resolves one of them as a quadruply-imaged red galaxy at z s = 1.185 (likely recently quenched), behind a lens LRG at z d = 0.3155.
This new set of bright lens candidates is particularly valuable for future lensed SNe searches. Strong lenses with such image separations are indeed likely to produce long time delays of a few days to weeks which alleviates difficulties affecting θ E < 1 lensed SNe such as iPTF16geu and allows one to measure accurate, microlensing-free time delays for cosmography. Such time delays are also well-suited to trigger timely imaging and spectroscopic follow-up of a SN reappearance to characterize the SN's early-phase behaviour within a few days after explosion. In the meantime, validating these Pan-STARRS lens candidates and deriving strong lensing models will require high-resolution imaging and/or spectroscopic follow-up.
Our CNN exhibits good performance on known lenses detected in Pan-STARRS, correctly classifying 14/16 systems as lenses and assigning scores p CNN > 0.9 to 7/16. We found that CNN predictions strongly depend on the construction of the training set but little on the network architecture. Our search also recovered 23 confirmed or candidate strong lenses in the literature out of the MasterLens catalog. In the near future, the release of deep stacks with best seeing conditions in smaller fields as part of Pan-STARRS DR3 will give the opportunity to extend our search to optimal PS1 data quality. In addition, we expect that the efficient and automated two-step classification method presented in this paper will be applicable to the deep gri image stacks from LSST with only minor adjustments.