Remarks to the Author:
This paper examines a rich class of whole genome regression models for GWAS. The method is applied to a large number of real traits and compared to several existing approaches, such as GCTA, which is a special case of the authors' model. In particular, the model can scale the contribution of SNPs according to MAF, LD and imputation quality, and can also partition SNPs into classes. On the real datasets the method is shown to provide a better fit in many cases and to produce substantially different estimates of heritability. I think this is an important paper and I recommend publication, but more work is needed to improve clarity, discuss modelling assumptions and provide more realistic simulations.
I have organised my comments by section.
For the 29% average increase in heritability can you make it clearer what the baseline model is here? Also, since you are averaging over several different traits it might be better to give mean and median increase?
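To illustrate why I ask for both summaries, here is a minimal sketch using made-up per-trait increases (these are hypothetical values, not the paper's results). A single trait with a very large increase can pull the mean well above the median.

```python
import statistics

# Hypothetical per-trait percentage increases in estimated heritability.
# These numbers are invented for illustration only.
increases = [5.0, 10.0, 15.0, 20.0, 95.0]

mean_increase = statistics.mean(increases)      # sensitive to the one large value
median_increase = statistics.median(increases)  # robust to the one large value

print(f"mean = {mean_increase:.1f}%, median = {median_increase:.1f}%")
```

If the two summaries differ as much in the real data, reporting only the mean would overstate the typical gain.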
It would be nicer to mention via references that estimation of heritability goes back well before GWAS.
In para 2 (line 34), before you mention over-fitting, you might state what model you are fitting, i.e. a genome-wide regression model that has more parameters than samples, with independent errors. You could give a baseline version of the model in eqn 1 (i.e. the GCTA model) in the Introduction.
Please reference and discuss the LDscore approach early on in this section. This method makes a different assumption about how LD affects effect size priors. You MUST discuss this.
You have not justified biologically why two SNPs in perfect LD might each have half the heritability. A counter-argument would be that each SNP has an equal chance of disrupting the genetic basis of a disease a priori. Also, mutation is thought to be mostly independent of recombination, but you are essentially assuming that these are linked processes in the context of human traits. You must discuss these issues.
Can what you are doing with LD weight essentially be thought of as approximately reducing the full set of SNPs to a set of tags? You have this in Table 1 as effective number of independent SNPs.
Make it clearer here which two models you fit when calculating the LRT statistic, i.e. the partitioned versus the non-partitioned model.
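For concreteness, this is the comparison I have in mind, written as a minimal sketch. It assumes the partitioned model adds exactly one parameter to the non-partitioned model and that a chi-square reference distribution with one degree of freedom is appropriate; the authors may well use a different null (for example a mixture, since variance components sit on a boundary). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_one_df(loglik_null, loglik_alt):
    """Likelihood ratio test of a nested pair of models differing by one parameter.

    loglik_null: maximised log-likelihood of the non-partitioned (null) model.
    loglik_alt:  maximised log-likelihood of the partitioned (alternative) model.
    Returns the LRT statistic and a p-value from a chi-square(1) reference.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, chosen so the statistic sits near the 5% threshold.
stat, p_value = lrt_one_df(loglik_null=-100.0, loglik_alt=-100.0 + 3.841458820694124 / 2.0)
print(f"LRT = {stat:.3f}, p = {p_value:.4f}")
```

Stating the null model, the alternative model and the assumed reference distribution this explicitly in the text would remove the ambiguity.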
If you take all the partitioned models which value of alpha has the best log-likelihood? Can you plot the log-likelihood curve with alpha as the x-axis? I saw this in the Supplement but it would be much better as a Figure in the main text.
When including rare SNPs it seems that your choice of a best alpha would change, based on what you say in para 2 on page 5. You should produce a version of Fig 2 with rare variants included and discuss it. I find it hard to imagine that there is one "true" alpha across all traits.
In Fig 3A I don't like how the lines disappear off the y-axis scale, i.e. what is the Epilepsy LDSC value? This needs to be changed.
Why is the LDSC error bar so wide compared to the other methods? This looks very bad and should be commented on.
In Fig 3B the simulated scenarios are too basic, with just 1000 causal SNPs and h^2 = 0.8. Can you please do realistic simulations in which you simulate genome-wide datasets with LD? There should be a relatively small number of causal SNPs, but lower levels of association should be seen around the causal loci due to LD. I want to know how the methods perform in this more realistic scenario.
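As a rough indication of the kind of simulation I mean, here is a toy sketch, not a recommended design. It assumes a simple copying model in which each allele on a haplotype repeats its neighbour with probability rho (which induces LD that decays with distance), a handful of causal SNPs with normally distributed effects, and noise scaled to a target h^2. All function names, parameter names and default values are my own illustrative choices; a real study would use genome-wide data with realistic LD, for example real genotypes or a coalescent simulator.

```python
import math
import random
import statistics


def simulate_haplotype(m, maf, rho, rng):
    """One haplotype of m SNPs; each allele copies its neighbour with probability rho."""
    hap = [1 if rng.random() < maf else 0]
    for _ in range(m - 1):
        if rng.random() < rho:
            hap.append(hap[-1])  # copying the previous allele induces LD
        else:
            hap.append(1 if rng.random() < maf else 0)
    return hap


def simulate_dataset(n=400, m=200, n_causal=5, h2=0.3, maf=0.3, rho=0.9, seed=1):
    """Simulate genotypes in LD and a phenotype driven by a few causal SNPs."""
    rng = random.Random(seed)
    genotypes = []
    for _ in range(n):
        hap_a = simulate_haplotype(m, maf, rho, rng)
        hap_b = simulate_haplotype(m, maf, rho, rng)
        genotypes.append([a + b for a, b in zip(hap_a, hap_b)])  # allele counts 0/1/2

    causal = sorted(rng.sample(range(m), n_causal))
    effects = [rng.gauss(0.0, 1.0) for _ in causal]
    genetic = [sum(row[j] * b for j, b in zip(causal, effects)) for row in genotypes]

    # Scale the noise so that var(genetic) / var(phenotype) is close to h2.
    var_g = statistics.pvariance(genetic)
    noise_sd = math.sqrt(var_g * (1.0 - h2) / h2)
    phenotypes = [g + rng.gauss(0.0, noise_sd) for g in genetic]
    realised_h2 = var_g / statistics.pvariance(phenotypes)
    return genotypes, phenotypes, causal, realised_h2


genotypes, phenotypes, causal, realised_h2 = simulate_dataset()
print(f"causal SNP indices: {causal}, realised h2: {realised_h2:.2f}")
```

Because non-causal SNPs near each causal locus are correlated with it, they will show weaker association signals, which is the feature missing from the current simulations.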
In the subsection "Which LD model is best?" can you find a way to make it clearer which models you are comparing in your LRT statistic analysis? Maybe a table showing the models would help. It's not clear what you do "so that the two tranches are predicted to contribute equally under the.." models.
It was interesting to read about testing for correlation between genotyping errors and phenotype; however, it wasn't clear whether you were testing just genotyped SNPs or both genotyped and imputed SNPs. You should try to make a distinction between these two types of SNPs. The text on page 7 seems to suggest you are looking at imputed SNPs, since you have so many in the UCLEB cohort and this started off with just 200k SNPs. As such, this dataset is not typical of most GWAS datasets, with imputation quality likely to be much lower than usual. You should include a parallel analysis that uses imputed SNPs from a GWAS that has genotyped SNPs on a much denser array.
The DHS analysis is very striking in the difference between GCTA and LDAK. I am surprised that this result is not mentioned more prominently in the abstract.
Minor comments and typos
On line 40 I don't think the word Individual should have a capital I. Also maybe make it a little clearer that m is the number of SNPs and this is usually large.