A Wide and Deep Exploration of Radio Galaxies with Subaru HSC (WERGS). IV. Rapidly Growing (Super-)Massive Black Holes in Extremely Radio-Loud Galaxies

We present the optical and infrared properties of 39 extremely radio-loud galaxies discovered by cross-matching the Subaru/Hyper Suprime-Cam (HSC) deep optical imaging survey and VLA/FIRST 1.4 GHz radio survey. The recent Subaru/HSC strategic survey revealed optically-faint radio galaxies (RG) down to $g_\mathrm{AB} \sim 26$, opening a new parameter space of extremely radio-loud galaxies (ERGs) with radio-loudness parameter of $\log \mathcal{R}_\mathrm{rest} = \log (f_{1.4 \mathrm{GHz,rest}}/f_{g,\mathrm{rest}})>4$. Because of their optical faintness and small number density of $\sim1~$deg$^{-2}$, such ERGs were difficult to find in the previous wide but shallow, or deep but small area optical surveys. ERGs show intriguing properties that are different from the conventional RGs: (1) most ERGs reside above or on the star-forming main-sequence, and some of them might be low-mass galaxies with $\log (M_\star/M_\odot)<10$. (2) ERGs exhibit a high specific black hole accretion rate, reaching the order of the Eddington limit. The intrinsic radio-loudness ($\mathcal{R}_\mathrm{int}$), defined by the ratio of jet power over bolometric radiation luminosity, is one order of magnitude higher than that of radio quasars. This suggests that ERGs harbor a unique type of active galactic nuclei (AGN) that show both powerful radiations and jets. Therefore, ERGs are prominent candidates of very rapidly growing black holes reaching Eddington-limited accretion just before the onset of intensive AGN feedback.


INTRODUCTION
The formation of supermassive black holes (SMBHs) and their growth across the cosmic time are fundamental arXiv:2108.02781v1 [astro-ph.GA] 5 Aug 2021 questions in modern astronomy. In the local universe at z ∼ 0, the mass of SMBH (M BH ) and their host properties show tight correlations for the SMBH mass range of 6 log(M BH /M ) 10 with a scatter of σ ∼ 0.3 dex (e.g., Magorrian et al. 1998;Gebhardt et al. 2000;Ferrarese & Merritt 2000;Tremaine et al. 2002;Marconi & Hunt 2003;Häring & Rix 2004;Sani et al. 2011;Kormendy & Ho 2013;McConnell & Ma 2013). Such a tight correlation is considered to be established by a balance of feeding and feedback processes between the central SMBHs and the host galaxies. Local Radio galaxies have been primary targets for investigating the effect of SMBHs on the host galaxies because powerful radio galaxies or radio-loud active galactic nuclei (AGN) mainly reside in massive galaxies whose star-formation is quenched, with the presence of strong jets dispersing the interstellar medium (e.g., Morganti et al. 2005;Holt et al. 2008;Nesvadba et al. 2017), or producing cavities in the host galaxies (e.g., Rafferty et al. 2006;McNamara & Nulsen 2007;Bîrzan et al. 2008;Blandford et al. 2019). Those radio galaxies tend to show a low accretion rate, i.e., Eddington ratio of λ Edd < 10 −2 , suggesting that the energy release is dominated by the kinetic power by jet, not by the radiation from AGN accretion disk.
However, the situation may be different at z > 1. Using over 10 3 radio AGN selected from the Very Large Array (VLA)-COSMOS 3 GHz large project (Smolčić et al. 2017a), Delvecchio et al. (2018) demonstrated that SMBH accretion in radio-bright (L 1.4GHz > 10 25 W Hz −1 ) AGN becomes more radiatively efficient (λ Edd > 10 −2 ) at z > 1. They reside in star-forming galaxies, which contain plenty of cold gas. This picture of radio AGN is completely different from those seen in the local universe in the same radio luminosity range (e.g., Hickox et al. 2009). Still, the survey volume of VLA-COSMOS surveys is small so that they may be missing a rare, but radio-bright population. The FIRST survey is the best tool to explore such a radio bright end since it covers half the sky with the VLA at 1.4 GHz (e.g., Becker et al. 1995;Helfand et al. 2015). However, cross-matching the VLA/FIRST sources with the SDSS survey catalog identified optical counterparts in only 30% of the radio sources (Ivezić et al. 2002;Best & Heckman 2012).
Recent Subaru/Hyper Suprime-Cam (HSC; Miyazaki et al. 2018) strategic survey shed light on such a situation. We have conducted a search for optically-faint radio-galaxies (RGs) using the Subaru HSC survey catalog (Aihara et al. 2018b) and the VLA/FIRST radio continuum catalog, and we have found a large number (> 3 × 10 3 sources) of RGs at z ∼ 0-5 (Ya-mashita et al. 2018(Ya-mashita et al. , 2020. The project is called Wide and deep Exploration of Radio Galaxies with Subaru/HSC (WERGS; Yamashita et al. 2018). Yamashita et al. (2018) demonstrated that over 60% of populations now have reliable optical counterparts thanks to deep HSC/optical imaging. Figure 1 shows that the WERGS sample spans a wide range of optical magnitude g AB = 18 − 26 and radio-loudness parameter of log R rest = log(f 1.4GHz,rest /f g band,rest ) = 1-6 (e.g., Ivezić et al. 2002).
Our particular interest is the new parameter space of optically faint RGs that are extremely radio-loud galaxies (ERGs) with log R rest > 4. Previously known radio quasars have a peak of radio-loudness at log R rest = 2-3, and rarely contain any sources with log R rest > 4 (Ivezić et al. 2002;Inoue et al. 2017). Therefore, the optical emission of ERGs seems not originate from the AGN accretion disk anymore probably due to dust obscuration of the nuclei, but it traces the stellar-component of the host galaxies (e.g., Terashima & Wilson 2003). Given that the optical band is very faint with median magnitudes of g AB ∼ 24.5, ERGs might have smaller stellar mass than previously known radio sources. Considering the very small number density of ERGs (∼ 1 source deg −2 ), it is not surprising that deep but smaller volume surveys (such as VLA-COSMOS Smolčić et al. 2017a,b) have rarely detected such ERGs, and the very wide SDSS survey (e.g., SDSS-selected radio galaxies Best & Heckman 2012) could not detect ERGs either because of their shallow sensitivity down to only g AB ∼ 22.
In this study, we will explore the physical properties of ERGs. Toba et al. (2019) already compiled multiwavelength data covering the optical, IR, and radio band for the WERGS sample, and derived physical parameters of host galaxies such as stellar mass, star-formation rate as well as the AGN luminosity estimated from the mid-IR bands. By using this data set, we will explore the AGN and host galaxy properties of the ERGs. Thoroughout this paper, we adopt the same cosmological parameters as Yamashita et al. (2018) and Toba et al. (2019); H 0 = 70 km s −1 Mpc −1 , Ω M = 0.27, and Ω Λ = 0.73.

SAMPLE SELECTION AND PROPERTIES
Our initial sample is based on the WERGS sample (∼ 3600 sources; Yamashita et al. 2018), which compiled the HSC Subaru Strategic Program (HSC-SSP) optical counterparts of the VLA/FIRST data. Here, we briefly summarize the WERGS sample and the reader should refer to Yamashita et al. (2018) for the WERGS catalog and Toba et al. (2019) for the IR catalog of the WERGS sample. We also describe further selection criteria im- log Rrest versus rest g band magnitude of our sample. The original WERGS sample obtained by Yamashita et al. (2018) is shown with gray circles. The finally selected 987 WERGS sample with log Rrest < 4 used in this study (we call the sample "NRGs" in the text) and 39 ERGs (the sample with log Rrest > 4) are shown with blue circles, and orange stars, respectively. The vertical dashed-line indicates the magnitude limit of the SDSS survey at gAB = 22.
posed on the sample, which are suitable for this study. The number of WERGS sources after each sample selection cut is also summarized in Table 1 The VLA/FIRST survey contains radio imaging data at 1.4 GHz with a spatial resolution of 5.4 arcsec (Becker et al. 1995;White et al. 1997), which completely covers the footprint of the HSC-SSP Wide-layer (see the text below). Yamashita et al. (2018) utilized the final release catalog of FIRST (Helfand et al. 2015) with the flux limit of > 1 mJy, and extracted 7072 FIRST sources in the HSC-SSP footprint.

Subaru/HSC-SSP data
The HSC-SSP is an ongoing wide and deep imaging survey covering five broadband filters (g-, r-, i-, z-, and y-band; Aihara et al. 2018b;Bosch et al. 2018;Furusawa et al. 2018;Kawanomoto et al. 2018;Komiyama et al. 2018;Huang et al. 2018), consisting of three layers (Wide, Deep, and UltraDeep). Yamashita et al. (2018) utilized the Wide-layer S16A data release, which contains observations from 2014 March to 2016 January with a field coverage of 154 deg 2 (based on six fields; XMM-LSS, GAMA09H, WIDE12H, GAMA15H, HECTOMAP, and VVDS), and forced photometry of 5σ limiting magnitude down to 26.8, 26.4, 26.4, 25.5, and 24.7 for g, r, i, z, and y-band, respectively (Aihara et al. 2018a). The average seeing in the i-band is 0.6 arcsec, and the astrometric root mean squared uncertainty is about 40 mas. After removing spurious sources flagged by the pipeline, Yamashita et al. (2018) cross-matched the FIRST sources with a search radius of 1 arcsec. Yamashita et al. (2018) also required detections with S/N > 5 in the r-, i-, and z-bands to qualify as an optical counterpart. This initial sample of Yamashita et al. (2018) comprised 3579 sources.
We then applied additional cuts to this sample. To remove radio galaxies from the initial WERGS sample whose radio emission might be dominated by the star-formation of host galaxies, and not by the AGN, we set a lower-limit to radio-luminosity of L 1.4GHz > 10 24 W Hz −1 , which is equivalent to the radio-emission from the host galaxies with a star-formation rate of SFR ≈ 250 M yr −1 (Condon 1992;Condon et al. 2013). This cut is also supported by a steep decline of radio luminosity function of starburst galaxies above L 1.4GHz = 10 23 W Hz −1 (e.g., Kimball et al. 2011;Tadhunter 2016;Padovani 2016). This criterion reduces the sample size to 3147.
Next, we limit our sample to compact sources in the radio band. This is important in order to reduce false optical identification by mismatching the optical sources with the locations of spatially extended radio-lobe emission. Yamashita et al. (2018) discussed this and categorized radio compact sources using the ratio of the total integrated radio flux density to the peak radio flux density f int /f peak . They treated the source as compact if the ratio fulfills either of two following equations, where f peak /rms is the S/N, and rms is a local rms noise in the FIRST catalog, and the above two equations are obtained from the study by Schinnerer et al. (2007) (see also the original criterion: Ivezić et al. 2002). We set an additional conservative requirement that there shall be no FIRST source in the surrounding 1 arcmin (corresponding to 500 kpc at z ∼ 1) to select the isolated radio core emission and avoid coincident matching with radio lobes of other nearby FIRST sources. This reduces the sample to 2583 sources. A further ten sources were removed in crowded regions of the HSC footprint due to multiple HSC detections within < 1 arcsec of the HSC counterpart, bringing the sample down to 2573 sources.
Finally, we restricted the sample to sources with spectroscopic redshift (spec-z; z spec ) or reliable photometric redshift (photo-z; z phot ) values. The spec-z were ob-  Figure 2. Luminosity of WERGS sample as a function of redshift. The points are the same as in Figure 1. (left) Rest-frame radio luminosity log L1.4GHz (W Hz −1 ) as a function of redshift, where L1.4GHz is obtained from VLA/FIRST (Helfand et al. 2015), and the k-correction was made by Yamashita et al. (2018) based on redshift. The horizontal dashed line is a luminosity cut we adopt to exclude galaxies with strong starbursts. The faint cyan and orange circle points are data obtained from FIRST-SDSS (Best & Heckman 2012) and VLA-COSMOS (Smolčić et al. 2017a), respectively. (Right) IR AGN luminosity LAGN,IR (erg s −1 ) as a function of redshift. For the entire WERGS population, we plot the sources only when it is included in Toba et al. (2019), in which the IR AGN luminosity was measured.
tained from SDSS DR12 (Alam et al. 2015), the GAMA project DR2 (Driver et al. 2011;Liske et al. 2015), and WiggleZ Dark Energy Survey project DR1 (Drinkwater et al. 2010). In this study, we limit the sample to the spec-z range 0.3 < z spec < 1.6 to cover the same redshift range as the photo-z sample. For the sources without spec-z, we utilized the photo-z estimated using the Mizuki SED-fitting code which is one of the standard photo-z packages for the HSC-SSP survey. The method utilizes the photometries of the five HSC-SSP bands (see Tanaka 2015;Tanaka et al. 2018, for more details) for the SED fitting. Yamashita et al. (2018) discussed that the HSC-SSP photo-z derived by MIZUKI is reliable for z phot < 1.6 based on comparison with spectroscopic redshift in the COSMOS field. In addition, the redshift of sources with z phot < 0.3 are sometimes erroneous because they lack the Balmer break tracer in the HSC optical bands. Therefore, we limit the sample to 0.3 < z phot < 1.6. We further imposed requirements based on the reliable photo-z fitting quality (reduced χ 2 < 3, σ/z phot < 0.2, and σ/(1 + z phot ) < 0.1), following Toba et al. (2019). The resulting WERGS sample contains 1770 sources. Given that our sample largely relies on photo-z results, we further discussed how possible erroneous photo-z might affect our main results in Appendix A.

WERGS IR catalog
To study the AGN and host galaxy properties in WERGS, Toba et al. (2019) compiled optical, near-IR, mid-IR, far-IR, and radio data for the WERGS sample. (2019) considered the contamination of Synchrotron radiation in the IR bands by extrapolating the power-law from the radio bands. Thus, L AGN,IR is obtained purely from the dust emission heated by AGN. The bolometric AGN luminosity is estimated by using the conversion L AGN,bol 3 × L AGN,IR (Delvecchio et al. 2014;Inayoshi et al. 2018). Because Toba et al. (2019) restricted themselves to objects in an area of ∼ 95 deg 2 , in which mutli-wavelength information from u-band to far-IR was available, our sample is reduced to 1055 sources.
Some of the selected sources have intensive star formation rates, reaching SFR ≈ 250M yr −1 , so we applied an additional cut to remove possible star-formation dominated radio galaxies using the ratio of IR and radio luminosity (q IR ; Helou et al. 1985;Ivison et al. 2010) defined by where L IR is the total IR luminosity in units of W derived from CIGALE and 3.7 × 10 12 is the frequency in Hz corresponding to 80 µm, which makes q IR a dimensionless quantity. We set q IR < 1.68 (Del Moro et al. 2013) to select radio excess (meaning jet-emission dominated) sources. As a result, our final sample reduces to 1026 sources, and the resulting final number of ERGs fulfilling log R rest > 4 is 39 sources, and they are shown in orange stars in Figure 1. We refer to the remaining 987 WERGS sources with 1 < log R rest < 4 as "normal radio galaxies (NRGs)" (shown with blue circles in Figure 1) hereafter. Table 1 shows the number of sources for the full WERGS sample and the ERGs after each selection cut. ERGs suffers the photo-z quality cut more than the full WERGS sample because ERGs contain a relatively large fraction of optically faintest sources with i AB > 25, which sometimes makes the reliable photo-z estimation difficult. The IR selection cut is another main factor in reducing the number of sources. This is mainly because the region available for the IR data is limited in the area of ∼ 95 deg 2 out of the 154 deg 2 (Toba et al. 2019). Note that this IR selection reduces the similar fraction of sources for the full WERGS sample (60%) and ERGs (55%).

RESULTS
We summarize the properties of the obtained 39 ERGs with IR detections. First, we show the basic differences between ERGs and NRGs. Then, we show the properties of ERGs on SMBHs and host galaxies.

Basic Sample Properties
3.1.1. log Rrest vs. g-band magnitude Figure 1 shows the distribution of the radio-loudness parameter R rest as a function of the rest-frame g-band magnitude (g AB ). The vertical dashed-line at g AB = 22 indicates the SDSS magnitude limit (e.g., Ivezić et al. 2002). Since most ERGs are fainter than the SDSS limit, Subaru/HSC has enabled us to investigate this unique population for the first time. Figure 1 also shows that most ERGs are very faint in the optical, with median magnitudes of g AB = 24.5. The current 8 m class telescopes are sufficiently sensitive to obtain spec-z for the bright end of the ERGs sample down to g AB ≈ 24.0, and with 30 m class telescopes, spec-z can be obtained for sources down to g AB ≈ 26.0. Figure 2 shows the radio and IR AGN luminosities of our sample as a function of redshift. The 1.4 GHz radio luminosity (L 1.4GHz W Hz −1 ) is taken from Yamashita et al. (2018) or Toba et al. (2019), in which the integrated flux density (f int ) obtained by the VLA/FIRST final catalog (Helfand et al. 2015) is used, and kcorrection is applied by assuming a power-law radio spectrum with f ν ∝ ν α , where α is estimated with α = log(f 1.4GHz /f 150MHz )/ log(ν 1.4GHz /ν 150MHz ) for the objects having TGSS 150 MHz data (Intema et al. 2017), and α = −0.7 for all others (e.g., Condon 1992).

L-z plane
As shown in the left panel of Figure 2, ERGs show relatively high radio luminosities with a median of log(L 1.4GHz /W Hz −1 ) = 26.3, which is one order of magnitude higher than that of the NRGs ( log(L 1.4GHz /W Hz −1 ) = 25.1). On the other hand,  in terms of the IR AGN luminosity shown in the right panel, there is no clear difference between ERGs and the NRGs log(L AGN,IR /erg s −1 ) = 44.9 for ERGs and log(L AGN,IR /erg s −1 ) = 44.6 for the NRGs. We therefore assume ERGs to be intrinsically radio-louder sources than NRGs, and we will discuss this in Section 4.1.

Host Galaxy Properties
The left panel of Figure 3 shows the distribution of our RGs in the SFR and M plane, where both val-ues are obtained from the optical+IR SED fitting with CIGALE (Toba et al. 2019). It is known that most starforming galaxies follow the main-sequence (MS) whose normalization moves upward with redshift, as galaxies are more gas-rich at higher z (e.g., Brinchmann et al. 2004;Noeske et al. 2007;Elbaz et al. 2011;Whitaker et al. 2012;Schreiber et al. 2015a). Above the MS, galaxies are referred to as starburst galaxies, producing stars more efficiently than regular star-forming galaxies. Pearson et al. (2018) measured the SFR and M using multi-wavelength data from UV to far-IR employing the CIGALE code, and their MS is delineated with the shaded area in Figure 3, whose bottom and top boundaries correspond to redshifts of z = 0.3 and z = 1.6 with the scatter of σ = 0.35 at z = 0.3 and σ = 0.24 at z = 1.6, respectively. In this study, we define the sources as starburst galaxies if they are located above the yellow shaded MS area, which is roughly consistent with the criteria in previous studies (e.g., Elbaz et al. 2011).
Considering that our sample spans a wide redshift range in 0.3 < z < 1.6 and their rapid redshift evolution of MS, redshift dependence is a key factor to understand the relation in the left panel of Figure 3. Therefore we also calculate the ratios between the observed SFRs and those predicted from the MS relations at a given stellar-mass, that is SFR/SFR(MS), and its redshift dependence is shown in the right panel of Figure 3. We also compile how the SFR and M relation evolves with redshift in Appendix B and Figure 10 (top panels). Figure 3 shows several sequences of sources, notably at around sSFR∼ 10 −10.5 , ∼ 10 −9 , and ∼ 10 −8 yr −1 , and the sources locate scarcely between the sequences. This does not originate from the properties of nature, but more mainly from the complications of the limited bin numbers of the SED fitting parameters of star-forming history (SFH). In our case, the most notable contribution is considered to be f burst ; mass fraction of the burst population. Toba et al. (2019) set a limited parameter binning of f burst = 0.001, 0.1, and 0.3, which is roughly corresponding to the bindings at sSFR∼ 10 −10.5 , 10 −9 , and 10 −8 yr −1 . This banding might disappear once we follow the same SFH parameter binning of Pearson et al. (2018), which was impossible for our studies because of the additional parameters of AGN and radio components which increase the computational time exponentially. Therefore instead we assume that SFRs in this study have large uncertainty of ∼ 0.5 dex. Please see the more detailed discussion on the choice of the SFH parameters and their effects on the SFR estimations (e.g., Schreiber et al. 2015b;Ciesla et al. 2017;Pearson et al. 2018).
Although our SFR estimation by Toba et al. (2019) might harbor large uncertainties as discussed above, Figure 3 shows three important suggestions. One is that most ERGs are distributed above the MS by a factor of ∼ 10 (see the right panel), suggesting that most ERGs are in starburst mode and they contain a copious amount of gas. This is indirect evidence for the availability of cold gas supply fueling both star-formation, and the central SMBHs. The specific star-formation rate defined by sSFR = SFR/M reaches sSFR ∼ 10 −8 yr −1 for ERGs, suggesting that ERGs are in a rapid stellarmass assembly phase with mass doubling times of only ∼ 100 Myr. This value is also comparable to one starburst duration (e.g., McQuinn et al. 2010) and thus most of the stellar component is expected to be dominated by a very young population.
Second, the onset of AGN feedback on the host galaxies has apparently not (yet) happened to most ERGs considering their location above the MS. This indicates a different picture from the conventional local RGs, which show a preference of massive host galaxies (M > 10 11 M ) and fall below the MS because their strong AGN feedback quenches star formation (e.g., Seymour et al. 2007). Note that our sample requires IR detections for the CIGALE SED fitting to derive stellarmass and SFR, therefore the WERGS not detected in IR do not appear in Figure 3. In addition to that, our sample requires radio compact morphologies, which removes a large population of local, radio-extended RGs. These two selection criteria may introduce a selection bias, since it is possible that these IR non-detection or radio extended sources are clustered below the SF mainsequence. It is nonetheless intriguing that ERGs are located at the top edge of the distribution within the WERGS sample with IR detections.
The third point is that, if we limit ourselves to the sources with z < 1 (red star points), most of these are clustered at the low-M end, typically at log M /M < 10. This suggests that the low-z subsample of ERGs is potentially important with regard to the study of BH seeds. Considering that all of our sources fulfill (1) L 1.4GHz > 10 24 W Hz −1 , that is, they lie above the threshold where their radio luminosity could be produced by star formation, as discussed in Section 2.1.2, and (2) the radio-excess selection with q IR < 1.68, their radio emission can only be achieved by the AGN jet, requiring the existence of a SMBH at the center. Therefore, ERGs contain massive BH candidates residing in low-mass galaxies. Figure 4 shows the distribution of ERGs and NRGs in the M and L 1.4GHz plane, and illustrates the third point above from a different aspect. ERGs populate regions of comparatively small stellar-mass and high L 1.4GHz , showing that higher R rest sources tend to have smaller M as well as higher L 1.4GHz . The combined results from above and from Figure 1 suggest that smaller M sources tend to observed as optically fainter sources, and are therefore observed as high R rest sources. We will discuss this point later in Section 4.1.

SMBH Properties: Relation between sBHAR and M
Considering that most ERGs are very faint ( g AB = 24.5) in optical, as shown in Figure 1 and Section 3.1.1, obtaining the BH virial mass from spectroscopy (e.g., McLure & Dunlop 2004) is time-consuming and difficult even if they are type-1 AGN. This means that the direct measurement of the Eddington ratio (λ Edd ≡ , is difficult at this stage. Instead, we investigate the SMBH properties through the specific black hole accretion rate (hereafter, sBHAR), which may be considered a proxy for the Eddington ratio. The sBHAR is conventionally defined as sBHAR = L AGN,bol /M erg s −1 M −1 (e.g., Mullaney et al. 2012). The left panel of Figure 5 shows the distribution of sBHAR for ERGs and NRGs. The median sBHAR of ERGs is log(sBHAR/erg s −1 M −1 ) = 35.3, which is an order of magnitude higher than that of the NRGs with log(sBHAR/erg s −1 M −1 ) = 34.0. Based on Best & Heckman (2012), the boundary of radiatively inefficient and efficient AGN is at an Ed-  dington ratio of log λ Edd ∼ −2, which corresponds to log(sBHAR/erg s −1 M −1 ) = 32-33. Thus, almost all of our sources are considered to be radiatively efficient AGN.
This difference in sBHAR between the two subgroups is more pronounced in the right panel of Figure 5 sBHAR(RV15) = 3.2 × 10 34 λ Edd M /10 10 M 0.05 .
The right panel of Figure 5 exhibits that ERGs are more clustered in the high Eddington ratio regime. The expected median Eddington ratio by using Eq. (4) is log λ Edd ≈ −0.4. Some sources appear to exceed the Eddington limit. NRGs cover a broader range in sB-HAR, and some massive galaxies with M > 10 11 M fall into the radiatively in-efficient regime with λ Edd < −2.
We caution that the above discussion does not take into account the redshift evolution of the M BH -M relation, which has been suggested in some studies (e.g., Woo et al. 2008;Merloni et al. 2010;Ding et al. 2020). In this case, the estimated M BH becomes larger at a given M , and the resulting Eddington ratio becomes smaller. Assuming the evolutional trend of Merloni et al. (2010), with M BH /M (z) ∝ (1 + z) 0.68 , and using a median redshift for the ERGs of z = 1.1, the normalization of sBHAR becomes slightly smaller by a factor of 1.6. As shown in Figure 5, this difference does not change our overall result. Although there are other significant uncertainties and possible systematics of the expected λ Edd values, under the basic assumptions of Equations (4) or (5), our results suggest that higher R rest sources have higher sBHAR, and ERGs might contain sources achieving super-Eddington accretion. We will discuss this point in Section 4.1.
In summary, our results indicate that ERGs exhibit high sBHAR with the expected Eddington ratio of log λ Edd ≈ −0.4, and some ERGs may be experiencing a super-Eddington phase. In addition, ERGs are likely starburst galaxies and their stellar-mass is relatively lower than NRGs. ERGs appear to be in a phase preceding the onset of AGN feedback. If we limit the sample to z < 1, most ERGs are low-mass galaxies with M < 10 10 M . Therefore, lower-z ERGs are also candidates of massive, rapidly growing BHs. . The relationship between Ljet/L AGN,bol = Rint and sBHAR. The yellow shaded area represents the corresponding λ Edd = 1 region assuming that MBH is based on the MBH-M relation by Kormendy & Ho (2013). The symbols are same as those in Figure 1, but other WERGS sources are shown with color gradation based on Rrest in the range of 1 < log Rrest < 4. The redshift evolution of the relation is summarized in the bottom panels of Figure 10. Figures 1 and 2 indicate that ERGs or high R rest sources tend to trace optically-faint, but radio-loud AGN. In addition, Figure 4 shows that higher R rest tends to pick up relatively lower stellar-mass population. This means that the selection of ERGs using R rest has a preference of selecting smaller M and, therefore, R rest using optical bands does not seem to trace AGN accretion disk emission in the optical band. This is not a new result, but it is a long-standing issue surrounding the use of the optical radio-loudness R rest : it is strongly affected by host galaxy contamination and/or the nuclear obscu-ration around the SMBHs (Terashima & Wilson 2003;Ho 2008). This raises the question of whether ERGs are intrinsically radio-loud or not.
To investigate this, we estimate the "intrinsic" radioloudness R int ; the energy balance between the jet and accretion-disk defined by R int = L jet /L AGN,bol , which is frequency-independent, and also free from the contamination of host galaxy component, unlike R rest (e.g., Terashima & Wilson 2003). For the accretion disk luminosity, we use the bolometric AGN luminosity L AGN,bol obtained from the IR AGN luminosity (L AGN,IR ), since L AGN,IR is derived from IR SED decomposition, which traces the re-radiation of the dust powered by AGN and is mostly insusceptible to absorption (Gandhi et al. 2009;Asmus et al. 2015;Ichikawa et al. 2012Ichikawa et al. , 2019. The total jet power of the AGN (L jet ) can be estimated from the radio luminosity of the compact core or from the total radio emission (Willott et al. 1999;Cavagnolo et al. 2010;O'Sullivan et al. 2011). We adopt the relation between L jet and L 1.4GHz of Cavagnolo et al. (2010), L jet = 7.3 × 10 43 (L 1.4GHz /10 24 W Hz −1 ) 0.70 erg s −1 .
Note that the above equation is derived with the assumption that most radio sources have a spectral index of α = 0.8. This value is almost consistent with the median value of our WERGS sample; α = 0.65 for the sources with detections at both TGSS 150 MHz and VLA/FIRST 1.4 GHz, or the fiducial value of α = 0.7 for the sources whose radio band detection is only in one band (i.e., VLA/FIRST 1.4 GHz only) (Condon 1992). Shabala & Godfrey (2013) recently investigated the effect of the radio source size in the jet power estimation, and found the dependence on the source size to be ∝ D 0.58 , suggesting that the effect of the source size is small. Since our sample is selected only for radiocompact emission in VLA/FIRST with a spatial resolution of ∼ 5 arcsec (or 20 kpc at z ∼ 1) this again mitigates the size dependence; otherwise the sources are ultra-compact sources.
The left panel of Figure 6 shows the distribution of L jet , spanning 44.0 < log(L jet /erg s −1 ) < 46.4, and the median values are log L jet /erg s −1 = 45.4 for ERGs and log L jet /erg s −1 = 44.6 for NRGs, which is equivalent to the radio-loud quasar level (e.g., Best et al. 2005;Inoue et al. 2017). This is naturally expected since we apply the simple conversion relation (Eq. 6), all of our sources are selected based on VLA/FIRST detection with the > 1 mJy flux limit, and the radio luminosity range is similar to that of the FIRST-SDSS sample, although they are located in relatively higher-z (see Figure 2). This high L jet is powerful enough to produce a radio jet with the size of 10 kpc that can disperse the interstellar medium (e.g., Nesvadba et al. 2017) and create cavities in the galactic inster-stellar medium and larger scale intergalactic environment to possibly quench starformation in host galaxies (e.g., McNamara & Nulsen 2007). Given the compact radio sizes of the sources in our sample (< 5 arcsec or 20 kpc at z ∼ 1) and given the fact that the host galaxies are still in an active star-forming phase (Figure 3), ERGs and the WERGS sources above or on the SF main-sequence may still be in the phase before the onset of huge kinetic radio feedback.
The middle panel of Figure 6 shows the distribution of R int , and the median values of the two populations are log R int = 0.0 for ERGs, and log R int = −0.5 for NRGs. For half of the ERGs, R int > 1, that is, the jet kinetic power is higher than the radiation from the accretion disk. The physical properties of higher R rest sources are more clearly illustrated in Figure 7, which shows the distribution of ERGs and other WERGS sources in the R int -sBHAR plane. The yellow shaded area is the region corresponding to λ Edd = 1 assuming Equation (4) for the range of 8 < log(M /M ) < 12. The other WERGS sources are shown with a color gradation in log R rest , with higher R rest lying toward the top right of the plot. This means that higher R rest sources tend to have both higher λ Edd and higher R int . It is well known that radio galaxies are more radio-loud as the Eddington-scaled accretion rate decreases, producing the sequence from the top left region toward the bottom right one in Figure 7 (Ho 2002;Merloni & Heinz 2007;Panessa et al. 2007;Sikora et al. 2007Sikora et al. , 2013Ho 2008), and overall NRGs also follow this trend. However, ERGs are located in the top right part of the plane, suggesting that ERGs contain rapidly growing BH in the center, and at the same time harbor powerful, compact radio-jets.
These radio galaxies are different from the conventional radio galaxies in the local universe (Seymour et al. 2007). Delvecchio et al. (2018) recently found that high-z (z > 1.5) radio galaxies discovered in the VLA/COSMOS survey tend to have high Eddington ratios with log λ Edd > −2 and that they are also starforming or starburst galaxies. This is consistent with the properties of our overall WERGS sample (see also Toba et al. 2019), and thus ERGs occupy the extreme end of highly accreting BHs in radio galaxies. Figure 7 indicates that ERGs are a very interesting population, showing both radiative (high λ Edd ) and jetefficient (high R int ) emission, which were missed in previous surveys. To investigate the jet properties in more detail, we here discuss the jet efficiency of ERGs and compare the obtained values with other radio sources. The jet production efficiency is defined as

η jet and the Origin of Radio Emission in ERGs
where η rad is the radiation efficiency of an AGN accretion disk, c is the speed of light, andṀ BH is the mass accretion rate onto the SMBH through the disk. In the following, we adopt a canonical value of η rad = 0.1 based on the Soltan-Paczynski argument (Soltan 1982). The distribution of η jet is shown in the right panel of Figure 6. The median values are log η jet = −1.0 ± 0.3 for ERGs and log η jet = −1.5 ± 0.4 for NRGs. These values are slightly higher or consistent with the nearby radio AGN population (log η jet ∼ −1.5) whose host galaxies are massive (M > 10 11 M ; Nemmen & Tchekhovskoy 2015). However, the local radio AGN population generally has much smaller Eddington ratios of log λ Edd < −2. This population would be located in the top left region in Figure 7, so they are clearly different from ERGs. The η jet of ERGs is higher by a factor of ∼ 10 compared to the SDSS radio quasar population, whose radio jet efficiency lies at log η jet = −2.0 ± 0.5 (e.g., van Velzen & Falcke 2013; Inoue et al. 2017). Considering that the SDSS radio quasars on average show log λ Edd ∼ −1, those radio quasars would be located at the bottom-center to bottom-right of Figure 7, which is again different to the distribution of ERGs. Therefore, the disk-jet connection may be fundamentally different from the standard disk model, which conventionally describes the quasar/AGN accretion disk emission.
One possible origin of this high production efficiency of radiation and jets is that the accretion disk of ERGs is actually in the "radiatively inefficient" state, but the physical origin of radiative inefficiency is different from the disks in the local radio galaxies. For example, the ERGs may be undergoing more rapid mass accretion (so called slim disk model; Abramowicz et al. 1988). Recent radiation hydrodynamical simulations suggest that when the mass accretion rate significantly exceeds the Eddington rate, radiation is effectively trapped within the accreting matter and advected to the central BH before escaping by radiative diffusion. As a result, the emergent radiation luminosity is saturated at the order of L Edd (i.e., η rad 0.1 at Ṁ Edd ) and the accretion flow turns into a radiatively inefficient state even with a super-Eddington accretion rate (e.g., Ohsuga et al. 2005Ohsuga et al. , 2009McKinney et al. 2014;Takeo et al. 2020). This phase is considered to be a key process of the BH seed growth in the high-z (z > 6) universe to describe the already known massive high-z SMBHs (e.g., Mortlock et al. 2011;Wu et al. 2015;Bañados et al. 2018;Onoue et al. 2019).
In the situation that the accretion rate is well above the Eddington rate, a relativistic jet can be launched by the Blandford-Znajek mechanism if large-scale magnetic fields exist in the innermost region (Blandford & Znajek 1977;Tchekhovskoy et al. 2011). The jet behavior of these extreme cases has been investigated by sev-  Kormendy & Ho (2013) and Reines & Volonteri (2015) are shown as dot-dashed (black; the fiducial relation in this study) and solid-line (gray), respectively. The orange shaded region represents the range of over-massive BH, which is at least one order of magnitude higher than the local relations (Case 1 in Section 4.3), and the region is delineated by 10 8 < M /M < 10 10 and 10 −2 < MBH/M < 1. The blue shaded region represents the range of under-massive BH, in which host galaxies form first, and BH grow later (Case 2 in Section 4.3). This region is delineated by 10 8 < M /M < 10 10 and 10 −2 < MBH/M and smaller than the local relation by Reines & Volonteri (2015). The red shaded region represents the forbidden region where the BH cannot exceed MBH eral authors Sadowski & Narayan 2015) using a 3D general relativistic radiation magnetohydrodynamical simulation (GRRMHD) code. They find that, in these (super-)Eddington accretion phase, BHs emit a significant amount of radiation and jet energy, and both the radiation and jet power approach around the Eddington limit, which appears to be the case for ERGs and the sources in the top right region in Figure 7 (see also Blandford et al. 2019). Thus, the study of ERGs at z ∼ 1 will be complementary to studies utilizing the next generation instruments capable of observing seed BHs in the early universe at z 7  in the investigations of the growth of seed BHs.

Alternative Scenarios: If ERGs Do Not Follow Local Scaling Relations between M BH and M
The discussion related to the Eddington ratio depends on the M BH estimation, which in turn relies on the local scaling relation of Kormendy & Ho (2013). However, it is not obvious that this relation also holds in the low-mass end in which many ERGs probably reside. Al-though the current data cannot give any constraints on this issue and a more detailed study is beyond the scope of this paper, we discuss two alternative scenarios, in which our galaxies are above (case one) or below (case two) the correlation between M BH and M as shown in Figure 8.
Case one reflects the possibility that ERGs might host over-massive BHs compared to the relation by Kormendy & Ho (2013), with the BH-to-stellar-mass ratio f b, ≡ M BH /M 10 −2 (see the orange shaded area in Figure 8). In this case, concerns surrounding super-Eddington accretion discussed in Section 3.3 and 4.1 are alleviated: the median Eddington ratio for ERGs can be parametrized by λ Edd 0.17 (f b, /0.01) −1 . In addition, the possible future growth path in the plane of M BH and M can be discussed by assuming the current BH/stellar mass assembly rate. Since the radiation efficiency is η rad = 0.1 in the sub-Eddington regime, the BH accretion rate is estimated asṀ BH = L AGN,bol /(η rad c 2 ) 1.7 × 10 −3 f −1 b, Ṁ Edd , where the median Eddington ratio above is substituted (note that this equation is valid at f b, 1.7 × 10 −3 ). Comparing the BH accretion rate to the sSFR of ERGs (sSFR= SFR/M 10 −8 yr −1 ; see Figure 3), we ob-tainṀ BH /SFR 4 × 10 −3 , which is independent of the choice of f b, . Therefore, the BH-galaxy mass ratio for ERGs is expected to approach a canonical value observed in the local universe (e.g., f b, ∼ 3 × 10 −3 ; Kormendy & Ho 2013). This evolutional path might be analogous to that of known high-z luminous quasars whose SMBHs are likely over-massive compared to the local relation with 10 −2 f b, 2 × 10 −1 (e.g., Wang et al. 2013Wang et al. , 2016Trakhtenbrot et al. 2015;Decarli et al. 2018).
Case two considers the possibility that ERGs might host very under-massive BHs with f b, 10 −3 -10 −4 as shown in the blue shaded region in Figure 8. For instance, the overall normalization for late-type galaxies may be considerably lower than for the local relation of early-type galaxies (e.g., Reines & Volonteri 2015;Läsker et al. 2016;Greene et al. 2019). This "galaxy grows first, BH come later" phase may occur because gas accretion onto the BHs is quite inefficient due to efficient stellar feedback, e.g., star formation-driven outflows, and the shallow halo potential until the stellarmass reaches a critical value of M ∼ 10 10.5 M (e.g., Bower et al. 2017;Habouzit et al. 2017). However, once the galaxy reaches the critical mass, above which star formation-driven outflows no longer prevent the gas accretion to the galactic nuclei, they abruptly switch to a rapid growth phase in the absence of feedback, and finally merge into the local relation. ERGs with M 10 10.5 M might be in such a rapidly growing phase.
The scenario above looks promising but it is worth discussing the critical mass value because the stellar mass of most ERGs actually lies below the expected critical mass. Beside, Figure 5 shows that the median sBHAR (and therefore λ Edd ) tends to be higher for less massive ERGs with the median Eddington ratio of λ Edd 1.7 (f b, /10 −3 ) −1 , suggesting that the extreme BH growth is possible (at least temporally) even in lower mass galaxies. This discrepancy may reflect that there are missing understanding on the physics of AGN feedback and gas feeding in low mass galaxies.
Another interesting point in the second scenario is that the Eddington ratio for several ERGs becomes as high as λ Edd ∼ 10 2 . Recent numerical simulations suggest that for a highly magnetized accretion disk around a rapidly spinning BH, the disk transits into a magnetically arrested disk (MAD) state and produces high radiative luminosity (e.g., McKinney et al. 2014;Sadowski & Narayan 2015). Through the emission process, the level of high luminosity could be achieved only when the BH mass accretion exceeds ∼ 10 3Ṁ Edd ; see also Figure 5 in Inayoshi et al. 2019). Therefore, if this scenario is true, future follow-up observations of those ERGs with such high Eddington ratios would reveal the properties of possibly hyper-Eddington accreting BHs (Takeo et al. 2020).

Prospects for AGN Feedback in Low-Mass Galaxies through ERGs
In recent years, the theoretical study of AGN feedback has been extended to low-mass galaxies (Silk 2017). This was motivated mainly by the increasing number of intriguing observational results showing signs of AGN feedback in low-mass galaxies (Nyland et al. 2017;Penny et al. 2018;Kaviraj et al. 2019;Dickey et al. 2019;Mezcua et al. 2019). Our results show that the jet power of L jet > 10 44 erg s −1 in ERGs is equivalent to that produced from radio quasars and local radio AGN residing in massive galaxies. This suggests the presence of a strong jet that can disperse or even expel the interstellar medium (e.g., Morganti et al. 2005;Holt et al. 2008;Nesvadba et al. 2017) or produce cavities in the host galaxies, which are often observed in the local massive counterparts (e.g., Rafferty et al. 2006;McNamara & Nulsen 2007;Bîrzan et al. 2008;Blandford et al. 2019). On the other hand, low stellar-mass ERGs are still in a starburst phase, indicative of a substantial gas reservoir, and lacking clear signatures of negative AGN feedback 2 .
To investigate whether such a negative feedback from the jet is effective in the host galaxies, it is necessary to obtain high spatial resolution radio images that resolve the host galaxy down to scales of < 10 kpc (Reines et al. 2020). Our sample of compact radio sources from VLA/FIRST with a spatial resolution of ∼ 5 arcsec only provide poor upper-bounds of the radio jet size of < 40 kpc at z ∼ 1. Future higher quality observations with sub-arcsecond resolution will provide more insight into the effect of the jet on the host galaxy.
Given that the BH mass of ERGs might reach the massive BH range below M BH ∼ 10 6 M , studying the properties of ERGs gives a window into understanding seed BH growth in the high-z universe. Some observational studies recently suggested that AGN feedback might also have positive effect on the host galaxies, and intensive in situ star formation is ignited in the outflowing gas launched by the AGN (Maiolino et al. 2017). Recent numerical simulations of jet-driven feedback also predict star-formation triggered on galaxy scales by the overpressure of the jet during the first few Myrs of strong interaction between the jet and the interstellar medium (Wagner & Bicknell 2011;Wagner et al. 2016;Gaibler et al. 2012;Mukherjee et al. 2018). If the radio jets in low-mass galaxies have a positive impact on both star formation and BH growth, then radio-selected AGN in low-mass galaxies are an intriguing population for understanding the BH and stellar mass assembly in the environment of shallow potential low-mass galaxies.

Wide Area Radio and Optical Surveys; Potential for Discovering Previously Missed Populations
The small number density of ERGs (∼ 1 source deg −2 ) and their optical faintness of i AB ∼ 25 suggest that ERGs were easily missed in previous optical surveys of radio sources. Our work shows that ERGs can be discovered with wide (> 100 deg 2 ) and deep (i AB 26) optical follow-up observations of radio sources. The number of known ERGs will increase in near future, once the Subaru/SSP survey is complete, covering an area of ∼ 10 3 deg 2 . The upcoming Subaru/Prime Focus Spectrograph (PSF; Takada et al. 2014) will conduct extensive optical and near-IR spectroscopic follow-up observations in the footprint of the HSC-SSP field, providing spec-z information of ERGs with i AB 24. The forth-coming LSST survey (Ivezić et al. 2019) will cover half of the sky, and it will also increase the number of sources by another order of magnitude, resulting in ∼ 10 4 sources. A significant fraction of these ERGs will also be expected to reside in low-mass galaxies. Therefore, the combination of wide area (either shallow or deep) radio and deep optical imaging and spectroscopic surveys will give us a statistically large number of massive BH candidates in the coming years.
Recently, Reines et al. (2020) employed the radiosurvey approach to search for massive BHs by using VLA/FIRST and conducting VLA high spatial resolution follow-up observations of local dwarf galaxies at z < 0.1, and found more than 10 candidates. One remarkable result was that most radio cores were not located at the center of the host, but off-nucleus, a possible signature of a previous merger. If the detection of an off-nucleus core is indeed the consequence of a galaxy merger, it indicates that the orbital decay of a black hole through dynamical friction is inefficient in this BH mass range, resulting in a population of BHs that fail to reach the galactic center within a Hubble time, since major mergers in dwarf galaxies becomes rare at z < 3 (Fitts et al. 2018). Currently there are many theoretical implication regarding wandering BHs (Comerford & Greene 2014;Blecha et al. 2016;Tremmel et al. 2018;De Rosa et al. 2020;Guo et al. 2020;Ricarte et al. 2021b,a). High spatial resolution radio imaging can pin down the locations of such wandering BHs, and large-scale statistical studies may provide insights into the frequency and environment of wandering BHs through the combination of LSST and VLA/FIRST or the ongoing VLASS (Lacy et al. 2020), which will achieve a sensitivity down to 0.1 mJy at 2-4 GHz, giving an order of magnitude deeper radio observations than VLA/FIRST.

Do Blazars Contaminate ERGs?
It could be argued that blazars might contaminate the sample of ERGs, and that therefore the optical emission is dominated by the jet component. Based on the known blazar sequence in the radio and optical bands (Fossati et al. 1998;Donato et al. 2001;Inoue & Totani 2009;Ghisellini et al. 2017), the observed radioloudness of blazars is in the range of log R obs ∼ 1.5 − 3. This range is obtained by using the luminosity range of L 1.4GHz = 10 41−43 erg s −1 and shifting redshift to the range of z ∼ 0-5. Therefore, blazars cannot reproduce extreme radio-loudness reaching log R rest > 4. Blazar contamination is unlikely also from the point of the observed optical magnitude. Considering that all ERGs are VLA/FIRST selected with a flux limit of > 1 mJy, the expected observed magnitude of blazars based on the blazar sequence is high with a peak around i AB = 21, and all cases are at i AB < 23.5 even when changing the luminosity range of the blazar SEDs and also shifting the redshift range to z = 0-5. Based on these arguments, the contamination of blazars to ERGs is unlikely.

CONCLUSION
The long-missing optical counterparts of radio-bright (f ν > 1 mJy at 1.4 GHz) VLA/FIRST population have been constructed by Yamashita et al. (2018) by utilizing the wide (> 100 deg 2 ) and deep (i AB < 26) Subaru/HSC SSP survey (named WERGS). The WERGS catalog contains a unique radio galaxy population with log R rest > 4, or extremely radio-loud galaxies (ERGs). The number density of ERGs indicates that they are a very rare population (∼ 1 source deg −2 ), and therefore it is not surprising that ERGs were missed in previous optical counterpart surveys of the radio sources. Combining multi-wavelength data in optical, IR, and radio bands (Toba et al. 2019), we have compiled a sample of 39 ERGs and 987 NRGs with radio-loudness is 1 < log R rest < 4.
We have investigated the properties of these unique ERGs and found the following key results: 1. Although the estimated SFRs are sensitive to assumed SFHs, and they might have large uncertainties with ∼ 0.5 dex, all ERGs are likely in the starforming or starburst phase reaching specific starformation rate of sSFR ∼ 10 −8 yr −1 , suggesting that ERGs might be in a rapid stellar-mass assembly phase with mass doubling times of ∼ 100 Myr. Besides, their stellar mass is relatively small, including the low-mass galaxies with M < 10 10 M .
2. IR detected ERGs are in a rapid BH accretion phase with high specific black hole accretion rate (sBHAR) with the expected Eddington ratio log λ Edd ≈ −0.4, and some ERGs may be experiencing a super-Eddington phase.
3. Sources with higher R rest tend to have both higher sBHAR (and thus higher λ Edd ) and higher intrinsic radio-loudness R int . This paints a different picture of radio galaxies compared to conventionally known local radio galaxies with low λ Edd . ERGs represent a population of unique radio galaxies characterized by both high λ Edd and high radio power.
4. The jet efficiencies of ERGs are typically η jet ∼ 0.1, which is similar to local radio AGN residing in massive galaxies, whereas it is an order of magnitude higher than the values estimated for SDSS radio-loud quasars. This indicates that, although the accretion disk is reaching super-Eddington accretion, it is in a radiatively inefficient phase (possibly in a slim disk configuration) due to rapid mass accretion onto the central BH.

ACKNOWLEDGMENTS
We thank the anonymous referee for helpful suggestions that strengthened the paper.
We thank Yoshiyuki Inoue for providing us the blazar sequence SED templates. This work is supported by Program for Establishing a Consortium for the Development of Human Resources in Science and Technology, Japan Science and Technology Agency (JST) and is partially supported by Japan Society for the Promotion of Science (JSPS) KAKENHI (18K13584 and 20H01939; K. Ichikawa, 18J01050 and 19K14759;Y. Toba, 16H03958, 17H01114, 19H00697;T. Nagao, 17J09016: T. Kawamuro, 19K03862: A. Y. Wagner). M. Charisi acknowledges support from the National Science Foundation (NSF) NANOGrav Physics Frontier Center, award number 1430284.
The Hyper Suprime-Cam (HSC) collaboration includes the astronomical communities of Japan and Taiwan, and Princeton University. The HSC instrumentation and software were developed by the National Astronomical Observatory of Japan (NAOJ), the Kavli Institute for the Physics and Mathematics of the Universe (Kavli IPMU), the University of Tokyo, the High Energy Accelerator Research Organization (KEK), the Academia Sinica Institute for Astronomy and Astrophysics in Taiwan (ASIAA), and Princeton University.
the Japan Society for the Promotion of Science (JSPS), Japan Science and Technology Agency (JST), the Toray Science Foundation, NAOJ, Kavli IPMU, KEK, ASIAA, and Princeton University.
This paper makes use of software developed for the Large Synoptic Survey Telescope. We thank the LSST Project for making their code available as free software at http://dm.lsst.org This paper is based on data collected at the Subaru Telescope and retrieved from the HSC data archive system, which is operated by Subaru Telescope and Astronomy Data Center (ADC) at NAOJ. Data analysis was in part carried out with the cooperation of Center for Computational Astrophysics (CfCA), NAOJ.
The Pan-STARRS1 Surveys (PS1) and the PS1 public science archive have been made possible through contributions by the Institute for Astronomy, the University of   Our results on ERGs rely on photo-z estimates, which affect physical values calculated using luminosity distance, e.g., stellar-mass and luminosities. Unfortunately, because most of our sources are optically faint, there are no spec-z confirmed ERGs in our sample. A concern is how erroneous photo-z estimates may affect our results.
If our photo-z values are severely under-estimated, it is clear that the absolute physical values such as stellar-mass, with the dependence of luminosity distance of D 2 L , have been underestimated, and therefore the results on low-mass BH mass and low-mass galaxy arguments discussed in Section 4.4 are no longer valid. One situation this might happen is that our photo-z estimation at z ∼ 0.3-0.5 is not tracing the Balmer break at ∼ 4000Å, but actually the Lyman break at z ∼ 3-4 (e.g., Tanaka et al. 2018).
On the other hand, physical quantities obtained from the ratios of two physical quantities have weaker redshift dependences. For both R int and sBHAR, the luminosity distance dependence is canceled out. The remaining zdependent factor is the k-correction between the observed flux and the rest frame flux. The physical quantities L jet , L AGN,IR (and therefore L AGN,bol ), and M , can be mainly traced by one-band flux (densities) of FIRST 1.5 GHz tracing radio Synchrotron emission (f ν,1.5GHz ), WISE MIR band(s) tracing the peak of AGN dust emission (f ν,12µm ), and i-band tracing stellar emission (f ν,i band ). Therefore, each physical quantity can be written as R int = L jet /L AGN,bol ∝ f ν,1.5GHz /f ν,12µm and sBHAR= L AGN,bol /M ∝ f ν,12µm /f ν,i band . We checked and confirmed the proportionality of these ratios in our sample.
We then investigated the k-correction by seeeing how much these observed values change as a function of redshift. In the radio-band, we assumed the same power-law f ν ∝ ν α with α = −0.7. In the optical and IR band, we used the quasar SED templates obtained by Hickox et al. (2017), which contain two templates of type-1 and type-2 quasars, whose optical band is dominated by accretion disk and stellar-component, respectively. Since the SED extends only down to 2000Å, we extrapolated the SED with α = −0.5 for type-1 QSOs and α = −2.4 for type-2 QSOs in order to smoothly connect the SED template (eg., see also Richards et al. 2006;Polletta et al. 2007, for other SED templates). Given their high radio-loudness and as we discussed in Section 3, the SED of ERGs is likely similar to type-2 QSOs, whose optical emission is dominated by the stellar-component. Figure 9 shows the k-correction of each physical value as a function of redshift. The k-correction changes by a factor of up to 4.5 for R int and factor of 3 for sBHAR even when the targets are shifted at z ∼ 4. This suggests that an erroneous photo-z does not strongly affect our overall results for R int and sBHAR.

B. REDSHIFT DEPENDENCE OF OUR RESULTS
The physical properties between ERGs and NRGs are explored in Section 3 and discussed in Section 4. As shown in Figure 2, NRGs distribute smoothly in a wide redshift range at 0.3 < z < 1.6, while ERGs are slightly clustered at z > 0.8. Thus, one might wonder whether the difference in physical parameters between ERGs and NRGs is caused by the redshift difference. We here split the sample into the three redshift bins (z = 0.3-0.8, 0.8-1.2, and 1.2-1.6) for mitigating such redshift dependence and then compare the key parameters of the two populations.
The top panels show the relations between SFR and M , which are similar to ones with Figure 3. Most of the NRGs reside below the MS in the 0.3 < z < 0.8, then the fraction of sources on and above the MS increases as a function of redshift. On the other hand, ERGs always locate the lower M range in each redshift bin and most of them are above the MS (except several sources on and below the MS). Considering that the stellar-mass is estimated from the photometries mainly in the optical or near-IR bands, the deeper Subaru/HSC photometreis enable us to find this smaller stellar-mass population as shown in Figure 4.
The middle panels show the relations between sBHAR and M , which are the redshift-divided version of Figure 5. At 0.3 < z < 0.8, most of the NRGs are located at low sBHAR and high M cluster region where sBHAR/erg s −1 M −1 ∼ 10 33.5 and M ∼ 10 11.5 M , and then they move to higher sBHAR sequence with redshift. On the other hand, ERGs always locate at the top edge of the sequence, keeping high sBHAR values and sometimes reaching the sBHAR equivalent to λ Edd ∼ 1.
Finally, the bottom panels show the relation between R int and sBHAR, which are the redshift-divided ones of Figure 7. Most of the NRGs are located at the top left region at 0.3 < z < 0.8, and then the population moves to higher sBHAR and smaller R int sequence, moving to bottom-center or bottom-right of the panel. This is a consistent view of local radio galaxies at lower-z and radio quasar population at higher-z as discussed in Section 4.2. On the other hand, ERGs reside in the top right part of the plane, and does not show a redshift evolution, because of its extreme selection cut of log R rest > 4, limiting the sample only in the top right area of the panel.