| Exported from: https://scalar.vector.im/etherpad/p/!ouONyEOxwYkSvOaljT_matrix.org_bibanon-scratchpad
|
| |
|
| https://github.com/majestrate/nntpchan/ https://2hu-ch.org/thread-6a5153ae138db794d7a0ed33224812ab803480ea.html
| = NNTPChan =
|
| |
|
| A chan designed to be fully decentralized by using the NNTP service, akin to USENET but leveraging cryptographic signing. It could be a great way to run a freeform anonymous board where hosting costs are easily shared through federation and smaller chans can function as one.
Line 68: |
|
| |
|
| thoughts on above: how can mods determine the intent of a poster? :p
|
| |
| ==== Spam Detection without IP Bans ====
| |
|
| |
| The problem on NNTPChan is similar to that of email: since IP bans are difficult, spam has to be handled with heuristic filtering instead.
| |
|
| |
| <pre>
| |
| what we have inside tor is probably the best you can get
| |
| federated inside tor
| |
| servers don't see ips, users can't find servers, servers can't see servers
| |
| the biggest drawback is no ip bans
| |
| but we've dealt with it for long enough that we're used to it
| |
| antonizoon
| |
| so how do you handle that
| |
| __uguu__ (IRC)
| |
| brute force man power
| |
| :p
| |
| no other way
| |
| antonizoon
| |
| how about data mining
| |
| such as the already prevalent spam detection systms
| |
| for email
| |
| _
| |
| __uguu__ (IRC)
| |
| how do you see that working?
| |
| antonizoon
| |
| well it works for email spam
| |
| with 1% false positive
| |
| it's already a very mature field at least in that narrow focus
| |
| _
| |
| __uguu__ (IRC)
| |
| i was legit considering just bolting on spamasssasin
| |
| antonizoon
| |
| what stopped you
| |
| _
| |
| __uguu__ (IRC)
| |
| just having to do that
| |
| and then training it
| |
| mostly training
| |
| antonizoon
| |
| well hey
| |
| we got tons of 4chan archived data uploaded to the internet archive to form a dataset
| |
| _
| |
| __uguu__ (IRC)
| |
| ohay
| |
| nice
| |
| antonizoon
| |
| that can be used to establish expected behavior
| |
| _
| |
| __uguu__ (IRC)
| |
| that would work
| |
| for ham at least
| |
| what about for spam?
| |
| antonizoon
| |
| https://archive.org/details/archive-moe-files-a
| |
| so at least these sort of past discussions can be used to establish "expected" behavior
| |
| as for spam how applicable would existing detection sets be for your use case
| |
| they usually detect stuff like floods of random trash or nigerian scam emails
| |
| and they might be a bit tuned for emails, but there must be precedent for their application in discussions
| |
| _
| |
| __uguu__ (IRC)
| |
| we use the same format as email
| |
| multipart mime
| |
| antonizoon
| |
| perfect
| |
| _
| |
| __uguu__ (IRC)
| |
| that is why i thought of it
| |
| i'll check out what hooks SA has
| |
| found a go library
| |
| this will be simple
| |
| _
| |
| possibly
| |
| </pre>
| |
|
| |
| === SpamAssassin ===
| |
|
| |
| uguu is testing SpamAssassin on their NNTPChan nodes. Email spam filtering is a good fit because NNTPChan articles use the same multipart MIME format as modern email.
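
To illustrate why email tooling carries over, here is a minimal sketch that parses an NNTP-style multipart MIME article with Python's standard `email` package. The sample article, its header values, and the `summarize_article` helper are made up for illustration and are not taken from NNTPChan itself.

```python
from email import message_from_bytes, policy

# Hypothetical article: header values and boundary are invented for this sketch.
RAW_ARTICLE = (
    b"From: anon <anon@example.invalid>\r\n"
    b"Newsgroups: overchan.test\r\n"
    b"Subject: hello\r\n"
    b"Message-ID: <abc123@example.invalid>\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="sep"\r\n'
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"first post\r\n"
    b"--sep\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b'Content-Disposition: attachment; filename="a.png"\r\n'
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--sep--\r\n"
)


def summarize_article(raw: bytes) -> dict:
    """Parse an article exactly as an email message would be parsed."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_parts = []
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container, visit only leaf parts
        if part.get_content_type() == "text/plain":
            text_parts.append(part.get_content().strip())
        else:
            attachments.append(part.get_content_type())
    return {
        "newsgroups": str(msg["Newsgroups"]),
        "text": "\n".join(text_parts),
        "attachments": attachments,
    }


if __name__ == "__main__":
    print(summarize_article(RAW_ARTICLE))
```

Because the article parses with an unmodified email parser, a filter built for email can read the same headers and body parts without a conversion step.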
| |
|
| |
| * If a post is detected as spam, it doesn't go into a newsgroup or federate.
| |
| * It also checks for DKIM, but NNTPChan doesn't use DKIM, so disable this check.
| |
| * Some scripts, such as Cyrillic, will be detected as spam by default, so make sure to allow non-Latin charsets.
| |
| * It might be possible to integrate it into the mod system so that moderators can train the filter on spam they discover manually, or review what it has flagged. This would likely catch the majority of spam, cut the moderator time spent processing it, and so make it easier for people to run their own nodes. Note that DMCA and CP content will still need to be discovered manually and reviewed on a case-by-case basis.
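
The gating and training steps in the list above could be sketched roughly as follows. This is not how NNTPChan wires SpamAssassin in; it assumes a running `spamd` with the `spamc` client on the PATH, and the function names (`spamc_says_spam`, `should_federate`, `sa_learn_command`) are hypothetical. The classifier is passed in as a parameter so the gate can be exercised without SpamAssassin installed.

```python
import subprocess
from typing import Callable, List


def spamc_says_spam(article: bytes) -> bool:
    """Ask spamd (through spamc) whether the article is spam.

    Assumption: `spamc` is installed and spamd is running. With -c,
    spamc exits with status 1 for spam and 0 otherwise, including on
    processing failures, so this check fails open.
    """
    result = subprocess.run(
        ["spamc", "-c"], input=article, capture_output=True, check=False
    )
    return result.returncode == 1


def should_federate(
    article: bytes, is_spam: Callable[[bytes], bool] = spamc_says_spam
) -> bool:
    """Return True if the article may enter a newsgroup and federate."""
    return not is_spam(article)


def sa_learn_command(path: str, verdict: str) -> List[str]:
    """Build the sa-learn command a mod tool could run on a saved article.

    `verdict` is the moderator's decision: "spam" or "ham".
    """
    if verdict not in ("spam", "ham"):
        raise ValueError("verdict must be 'spam' or 'ham'")
    return ["sa-learn", f"--{verdict}", path]


if __name__ == "__main__":
    # Stub classifier standing in for SpamAssassin in this demo.
    def looks_spammy(article: bytes) -> bool:
        return b"pills" in article

    print(should_federate(b"first post", is_spam=looks_spammy))
    print(sa_learn_command("/tmp/article.eml", "spam"))
```

Failing open means a SpamAssassin outage lets posts through rather than blocking the whole node; a stricter node could invert that choice.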
| |
|
| |
|
=== current federation peering policies ===