NNTPChan

From Bibliotheca Anonoma



thoughts on above: how can mods determine intent of poster? :p
==== Spam Detection without IP Bans ====
NNTPChan faces the same problem as email: since IP bans are not practical, spam has to be caught with heuristic filters instead.
<pre>
what we have inside tor is probably the best you can get
federated inside tor
servers don't see ips, users can't find servers, servers can't see servers
the biggest drawback is no ip bans
but we've dealt with it for long enough that we're used to it
antonizoon
so how do you handle that
__uguu__ (IRC)
brute force man power
:p
no other way
antonizoon
how about data mining
such as the already prevalent spam detection systems
for email
__uguu__ (IRC)
how do you see that working?
antonizoon
well it works for email spam
with 1% false positive
it's already a very mature field at least in that narrow focus
__uguu__ (IRC)
i was legit considering just bolting on spamassassin
antonizoon
what stopped you
__uguu__ (IRC)
just having to do that
and then training it
mostly training
antonizoon
well hey
we got tons of 4chan archived data uploaded to the internet archive to form a dataset
__uguu__ (IRC)
ohay
nice
antonizoon
that can be used to establish expected behavior
__uguu__ (IRC)
that would work
for ham at least
what about for spam?
antonizoon
https://archive.org/details/archive-moe-files-a
so at least these sort of past discussions can be used to establish "expected" behavior
as for spam how applicable would existing detection sets be for your use case
they usually detect stuff like floods of random trash or nigerian scam emails
and they might be a bit tuned for emails, but there must be precedent for their application in discussions
__uguu__ (IRC)
we use the same format as email
multipart mime
antonizoon
perfect
__uguu__ (IRC)
that is why i thought of it
i'll check out what hooks SA has
found a go library
this will be simple
_
possibly
</pre>
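The "training" step discussed above can be illustrated with a toy classifier. The following is a minimal sketch in Go of a multinomial naive Bayes model with add-one smoothing, trained on labelled "ham" and "spam" text. It is not SpamAssassin's own Bayes implementation, and every name in it (<code>Classifier</code>, <code>Train</code>, <code>IsSpam</code>) is hypothetical; it only shows why a corpus of known-good posts, such as archived threads, is useful as ham.

```go
package main

import (
	"math"
	"strings"
)

// Classifier is a toy multinomial naive Bayes model over word tokens.
// All names here are hypothetical; this is an illustration only.
type Classifier struct {
	counts map[string]map[string]int // class -> token -> occurrences
	totals map[string]int            // class -> total tokens seen
	docs   map[string]int            // class -> number of training posts
	vocab  map[string]struct{}       // every token seen in any class
}

// NewClassifier returns an empty, untrained model.
func NewClassifier() *Classifier {
	return &Classifier{
		counts: make(map[string]map[string]int),
		totals: make(map[string]int),
		docs:   make(map[string]int),
		vocab:  make(map[string]struct{}),
	}
}

// tokenize lowercases the text and splits it on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// Train adds one labelled post ("ham" or "spam") to the model.
func (c *Classifier) Train(class, text string) {
	if c.counts[class] == nil {
		c.counts[class] = make(map[string]int)
	}
	c.docs[class]++
	for _, tok := range tokenize(text) {
		c.counts[class][tok]++
		c.totals[class]++
		c.vocab[tok] = struct{}{}
	}
}

// logScore returns log P(class) + sum of log P(token | class),
// using add-one (Laplace) smoothing for unseen tokens.
func (c *Classifier) logScore(class, text string) float64 {
	totalDocs := 0
	for _, n := range c.docs {
		totalDocs += n
	}
	score := math.Log(float64(c.docs[class]) / float64(totalDocs))
	denom := float64(c.totals[class] + len(c.vocab))
	for _, tok := range tokenize(text) {
		score += math.Log(float64(c.counts[class][tok]+1) / denom)
	}
	return score
}

// IsSpam reports whether the spam score beats the ham score.
func (c *Classifier) IsSpam(text string) bool {
	return c.logScore("spam", text) > c.logScore("ham", text)
}
```

To use it, call <code>Train("ham", ...)</code> on known-good posts and <code>Train("spam", ...)</code> on posts that moderators have flagged, then call <code>IsSpam</code> on new posts. As the chat points out, archives only supply the ham side; spam examples still have to come from somewhere else.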
=== SpamAssassin ===
uguu is testing SpamAssassin on their NNTPChan nodes. Email spam filtering is a good fit because NNTPChan posts use the same multipart MIME format as email.
* If a post is detected as spam, it does not go into a newsgroup and is not federated.
* SpamAssassin also checks for DKIM, but NNTPChan doesn't use DKIM, so disable this option.
* Some scripts, such as Cyrillic, are detected as spam by default, so make sure to allow non-Latin charsets.
* It might be possible to integrate SpamAssassin into the mod system so that moderators can train it on spam they find manually, or review posts it has flagged. This would likely catch most spam, cut the moderator time spent on it, and so make it easier for people to run their own nodes. Note that DMCA and CP content will still need to be discovered manually and reviewed on a case-by-case basis.
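The flow described in the list above (check a post, then either store and federate it or hold it back for moderators) could be sketched roughly as follows. This is not NNTPChan's actual code: <code>SpamChecker</code>, <code>CheckerFunc</code>, <code>Node</code> and <code>Accept</code> are hypothetical names, and the checker is left abstract so that it could be backed by SpamAssassin or anything else. Holding posts for review when the checker itself fails is an assumption made here, not something the notes above specify.

```go
package main

import "errors"

// SpamChecker is a hypothetical interface for whatever does the
// filtering (for example a SpamAssassin-backed client).
type SpamChecker interface {
	IsSpam(rawArticle []byte) (bool, error)
}

// CheckerFunc adapts a plain function to the SpamChecker interface.
type CheckerFunc func(rawArticle []byte) (bool, error)

// IsSpam calls the wrapped function.
func (f CheckerFunc) IsSpam(rawArticle []byte) (bool, error) {
	return f(rawArticle)
}

// Node is a hypothetical, in-memory stand-in for an NNTPChan node.
type Node struct {
	Checker  SpamChecker
	Stored   [][]byte // articles accepted into newsgroups
	Outbound [][]byte // articles queued for federation to peers
	Review   [][]byte // articles held back for moderator review
}

// Accept runs the spam check on a raw MIME article. Clean posts are
// stored and queued for federation; flagged posts, and posts the
// checker could not evaluate, go to the review queue instead.
func (n *Node) Accept(rawArticle []byte) (bool, error) {
	if n.Checker == nil {
		return false, errors.New("no spam checker configured")
	}
	spam, err := n.Checker.IsSpam(rawArticle)
	if err != nil {
		// Assumption: fail closed and let a moderator decide.
		n.Review = append(n.Review, rawArticle)
		return false, err
	}
	if spam {
		n.Review = append(n.Review, rawArticle)
		return false, nil
	}
	n.Stored = append(n.Stored, rawArticle)
	n.Outbound = append(n.Outbound, rawArticle)
	return true, nil
}
```

Keeping flagged posts in a review queue rather than discarding them is what would let moderators correct false positives and feed confirmed spam back into training.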


=== current federation peering policies ===