MediaWiki talk:Spam-blacklist
The list of regex fragments on this page is used to check URLs. It doesn't match words outside of links, and it has a specialized format for some matching functions; see Extension:SpamBlacklist for details. The interesting aspect of this page is that we can opt to blacklist EVERYTHING using a regex fragment that matches all URLs, and then use MediaWiki:Spam-whitelist to specifically allow domains we know are friendly. The latter method might end up being the better, easier-to-manage option for us. --TheFae (talk) 20:04, 24 July 2017 (UTC)
- Let's see how we go with a blacklist first for 2 weeks... if we get tired, we'll go with a whitelist.
- This page itself won't confuse a search engine though, will it? They'll know we're not trying to spam SEO for these words? --Vadi (talk) 08:43, 25 July 2017 (UTC)
- Most of the content on this page is nonsense and very few words appear in links both to and from the page. Crawlers are likely to rank this page as having low importance because it lacks a rich link and content structure.
- One thing we can do is add the URI of this page to a robots.txt for the wiki domain and instruct crawlers not to index it. This might not prevent them from scanning the file, but it would tell legit crawlers that the contents aren't intended for indexing, so we aren't trying to keyword-spam or backlink-spam with all these domain-like bits on the page. --TheFae (talk) 16:56, 25 July 2017 (UTC)
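For the robots.txt idea above, here is a minimal sketch that uses Python's standard urllib.robotparser to check what such a rule would tell a well-behaved crawler. The host wiki.example.org and the page path are hypothetical placeholders, not this wiki's actual URLs, and the rule only asks crawlers to stay away; it does not enforce anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the path is an assumption and must be
# adjusted to however the wiki actually serves the blacklist page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /MediaWiki:Spam-blacklist
"""


def crawler_may_fetch(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # wiki.example.org is a placeholder host used only for illustration.
    print(crawler_may_fetch("https://wiki.example.org/MediaWiki:Spam-blacklist"))
    print(crawler_may_fetch("https://wiki.example.org/Main_Page"))
```

Crawlers that ignore robots.txt would still read the page, which matches the caveat in the comment above.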
Can we keep the discussion in GitHub issues, where we discuss everything else? I find it hard to review discussions in multiple places. I also find a whitelist rather troublesome, as it may hinder legit users from linking to domains we simply did not recognise beforehand. --Kebap (talk) 17:33, 30 July 2017 (UTC)
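To make the trade-off in this thread concrete, here is a rough sketch of the "blacklist everything, then whitelist known domains" approach. It is a simplified model written for illustration, not the SpamBlacklist extension's actual code: the URL prefix pattern, the function names, and the whitelisted domains are all assumptions.

```python
import re

# Simplified URL prefix: scheme plus any leading host characters.
# This is an assumption for the sketch, not the extension's real pattern.
URL_PREFIX = r"https?://+[a-z0-9_\-.]*"


def matches(url, fragments):
    """Return True if any regex fragment matches the URL after the prefix."""
    if not fragments:
        return False
    joined = "|".join(f"(?:{frag})" for frag in fragments)
    pattern = URL_PREFIX + "(?:" + joined + ")"
    return re.search(pattern, url, re.IGNORECASE) is not None


def is_blocked(url, blacklist, whitelist):
    """Whitelisted URLs always pass; everything else is checked against the blacklist."""
    if matches(url, whitelist):
        return False
    return matches(url, blacklist)


# A single "." fragment matches every URL, i.e. blacklist everything.
BLACKLIST = [r"."]
# Hypothetical friendly domains, chosen only for illustration.
WHITELIST = [r"example\.org", r"github\.com"]

if __name__ == "__main__":
    for url in ("https://github.com/some/repo", "http://unknown-site.test/page"):
        print(url, "blocked" if is_blocked(url, BLACKLIST, WHITELIST) else "allowed")
```

In this model any domain missing from the whitelist is rejected, including legitimate ones nobody thought to add, which is the concern raised in the comment above.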