Open Penguin Data Project

mission: to leverage the SEO community to build a model that reasonably approximates the likelihood that a site will be flagged by Penguin 2.0-style updates.

methodology: build a training set of URL/keyword pairs, covering both those impacted and those not impacted by Penguin 2.0-style updates. Open the training set to the community for creating features (variables), and allow the community to run various statistical methods on the cumulative data.

assumptions: as with any study, I made several assumptions. First, I assume that a URL/keyword pair was hit by Penguin if it lost 7 or more ranking positions on the 22nd, ended up on page 2 or worse after holding steady first-page rankings for the 5 previous days, and was neither a local result nor a time-sensitive post. This means the data could be missing entries that were hit by Penguin but lost fewer than 7 positions or stayed on page 1. It also means the data could wrongly include entries that were penalized for other reasons on the same day.

data:
  1. Keyword List: [csv]
  2. URL List: [csv]
  3. Ranking Data Set: [csv]
  4. Current Variable Set: [csv]
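
The labeling rule described in the assumptions can be sketched in a few lines of Python. This is only an illustration, not the script used to build the data set. The function name, the constant names, and the reading of "page 1" as positions 1 to 10 are assumptions, as is treating "holding steady" as simply staying on page 1 for each of the 5 previous days. The exclusion of local and time-sensitive results is not modeled here.

```python
PAGE_ONE_MAX = 10  # assumption: page 1 means positions 1 through 10
MIN_DROP = 7       # from the assumptions: lost 7 or more positions
STEADY_DAYS = 5    # from the assumptions: 5 previous days on page 1


def looks_penguin_hit(prior_positions, update_day_position):
    """Apply the labeling rule to one URL/keyword pair.

    prior_positions: daily rankings before the update, oldest first.
    update_day_position: ranking on the day of the update (the 22nd).
    """
    if len(prior_positions) < STEADY_DAYS:
        return False  # not enough history to judge

    recent = prior_positions[-STEADY_DAYS:]
    held_page_one = all(1 <= p <= PAGE_ONE_MAX for p in recent)
    drop = update_day_position - recent[-1]  # positive means it fell
    fell_off_page_one = update_day_position > PAGE_ONE_MAX

    return held_page_one and drop >= MIN_DROP and fell_off_page_one
```

A pair that held position 3 and then fell to position 12 would be labeled as hit, while a pair that fell from 1 to 8 would not, because it stayed on page 1. That second case is exactly the kind of entry the assumptions warn could be missed.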

studies:
  1. Current Mean Spearman Correlations
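
For anyone who wants to check a variable before uploading it, a Spearman correlation between the variable scores and Penguin status can be computed with the standard library alone. The sketch below is a generic implementation (Pearson correlation of tie-averaged ranks), not the project's own code. How the project averages correlations into a "mean" is not stated here, so that step is left out. The variable name and all values are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only: 1 = flagged, 0 = not flagged.
penguin_status = [1, 1, 0, 0, 1, 0]
exact_match_anchor_pct = [0.62, 0.48, 0.10, 0.22, 0.55, 0.05]  # hypothetical variable
rho = spearman(exact_match_anchor_pct, penguin_status)
```

Because Penguin status is binary, it produces many tied ranks, which is why the ranking step averages ties rather than using the shortcut formula based on squared rank differences.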

providers:
Everyone who provides data to this project will receive recognition here.

Upload Data:
Instructions: Download the full data set from above and create a new CSV in which the first 3 columns are still Penguin Status | Keyword | URL and the fourth and subsequent columns are your variable scores. The first row should contain the name of each variable.
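
A minimal sketch of preparing such a file with Python's csv module follows. It assumes the downloaded data set has a header row and is comma-separated. The in-memory sample rows, the status encoding (1 and 0), and the `url_length` variable are hypothetical stand-ins for the real download and for whatever variable you are contributing.

```python
import csv
import io

# Stand-in for the downloaded data set; the real file's header text
# and status encoding are assumptions here.
downloaded = io.StringIO(
    "Penguin Status,Keyword,URL\n"
    "1,blue widgets,http://example.com/widgets\n"
    "0,red widgets,http://example.org/\n"
)


def url_length(keyword, url):
    """Hypothetical variable: number of characters in the URL."""
    return len(url)


def build_upload(source, variable_name, score):
    """Keep the first 3 columns and append one variable column."""
    reader = csv.reader(source)
    header = next(reader)
    rows = [header[:3] + [variable_name]]  # first row carries the variable name
    for row in reader:
        status, keyword, url = row[:3]
        rows.append([status, keyword, url, score(keyword, url)])
    return rows


upload_rows = build_upload(downloaded, "url_length", url_length)

# Serialize to CSV text; write to a real file instead when uploading.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(upload_rows)
upload_csv = out.getvalue()
```

To contribute several variables at once, the same loop can append more than one score per row, with a matching name for each added column in the first row.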