Editing Ayase

From Bibliotheca Anonoma



* Operating System: Any Linux system on any architecture supported by Rust and Python.
* Database: PostgreSQL-based - Tentatively, due to the end of support for TokuDB and the awful gotchas in MariaDB, we should consider PostgreSQL.
** TimescaleDB/PostgreSQL - As TimescaleDB is a superset of PostgreSQL, it is also fully PostgreSQL compatible for smaller-scale deployments.
** CockroachDB - Seems better for horizontal scaling and decentralized resilience, but as the whole dataset is only 1 TB, it might not be needed yet.
* Middleware/HTML Frontend: [https://github.com/bibanon/ayase Ayase] (Python, FastAPI, Jinja2 HTML templates) - An Ayase and Asagi schema compatible frontend system for viewing the databases created by these scrapers.
* Scraper:


Another thing is that maybe we shouldn't have separate tables for every board like Asagi currently does. If Reddit or 8chan's Infinity platform were being archived this way, a table per board would be impractical to operate. While having a single table sounds like lunacy as well, PostgreSQL allows tables to be partitioned based on a single column, so an additional `board` column can be added and used as the partition key.
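As a rough illustration of the idea above, the following sketch generates PostgreSQL declarative partitioning DDL keyed on `board`. The table name `post` and every column other than `board` are hypothetical placeholders, not the Ayase schema, and the board name check is an assumption about what board names look like.

```python
import re

# Hypothetical parent table: only the `board` column comes from the text above.
# The primary key includes `board` because PostgreSQL requires the partition
# key to be part of any primary key on a partitioned table.
PARENT_DDL = """\
CREATE TABLE post (
    board TEXT NOT NULL,
    num BIGINT NOT NULL,
    thread_num BIGINT NOT NULL,
    comment TEXT,
    PRIMARY KEY (board, num)
) PARTITION BY LIST (board);"""

# Assumed board name format (lowercase letters and digits), used to keep
# the generated identifier and literal safe.
BOARD_NAME = re.compile(r"[a-z0-9]+")


def partition_ddl(board: str) -> str:
    """Return the DDL for one per-board partition of the parent table."""
    if not BOARD_NAME.fullmatch(board):
        raise ValueError(f"unexpected board name: {board!r}")
    return (
        f"CREATE TABLE post_{board} PARTITION OF post "
        f"FOR VALUES IN ('{board}');"
    )
```

Queries would still target the single `post` table; PostgreSQL routes rows to the matching partition and can skip the other partitions when a query filters on `board`.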
=== PostgreSQL Optimizations ===
To improve performance, PostgreSQL may need a lot more tuning than MariaDB.
The basics, if you don't do anything else:
https://bun.uptrace.dev/postgres/performance-tuning.html
You also want to cluster tables and run VACUUM FULL manually every now and then, and you want to disable autovacuum when migrating data.


=== Single Table without Side Tables or Triggers ===


https://gist.github.com/oka-tan/f794a6ac464a09f3581d0ac530e37b92
== Page Views ==
The views that display data from the SQL database or search engine are the primary mechanism through which users interact with the system; the API is, for the most part, a reflection of these views.
All anonymous users are authorized only to read the database. A separate API and authentication system is required for admins and moderators to edit the content of the SQL database through approved mechanisms.
=== Pagination ===
FoolFuuka had an abysmal offset-based pagination which put more and more stress on the SQL database the further back anons went. However, on the many archives without a powerful search server, this was the only way to browse further back in the archives without knowing the thread or post number.
* Pagination Methodology: Cursor Pagination - Although 4chan uses offset pagination to provide a set of handy page numbers, page numbers don't make sense on an archive: they are a relative marker that changes every time a new thread is added, and offset pagination becomes more inefficient the further back the user goes.
* Thread Sort Order (Default: Last Modified) - The thread sort order should be last modified by default, as it is on 4chan and FoolFuuka.
** An admin may elect to have the WebUI sort by thread creation date instead. To avoid ambiguity, the API requires the sort order to be specified explicitly, whichever default is in use.
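The cursor approach described above can be sketched as follows. This is a minimal illustration, not the Ayase implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `thread` table, its columns, and the `fetch_page` and `demo_connection` helpers are all hypothetical names. The cursor is the sort value and thread number of the last row on the previous page, so each page is an indexed range scan rather than an ever-growing offset.

```python
import sqlite3

# Allowed sort orders; the caller must always name one explicitly.
SORT_COLUMNS = {"last_modified": "last_modified", "created": "created"}


def demo_connection():
    """Build a small in-memory dataset (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE thread (board TEXT, thread_num INTEGER, "
        "created INTEGER, last_modified INTEGER)"
    )
    conn.executemany(
        "INSERT INTO thread VALUES (?, ?, ?, ?)",
        [
            ("g", 1, 100, 500),
            ("g", 2, 200, 300),
            ("g", 3, 300, 300),
            ("g", 4, 400, 450),
            ("a", 5, 150, 600),
        ],
    )
    return conn


def fetch_page(conn, board, sort, cursor=None, limit=2):
    """Return (thread numbers, next cursor) for one page, newest first."""
    col = SORT_COLUMNS[sort]  # raises KeyError for an unknown sort order
    where = "board = ?"
    params = [board]
    if cursor is not None:
        sort_value, thread_num = cursor
        # Keyset condition: strictly "after" the last row of the previous
        # page, with thread_num as a tie-breaker for equal sort values.
        where += f" AND ({col} < ? OR ({col} = ? AND thread_num < ?))"
        params += [sort_value, sort_value, thread_num]
    rows = conn.execute(
        f"SELECT thread_num, {col} FROM thread WHERE {where} "
        f"ORDER BY {col} DESC, thread_num DESC LIMIT ?",
        params + [limit],
    ).fetchall()
    # A short page means there is nothing further back.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return [row[0] for row in rows], next_cursor
```

A client would call `fetch_page(conn, "g", "last_modified")` for the first page and pass the returned cursor back in to continue; a cursor of `None` signals the end of the listing.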
== Queries ==
Queries let users request arbitrary content that is outside of the page views without resorting to the search server.
=== Post Number ===
There should be an option to find a specific post number whether or not the user knows which board it was on (a frequent situation when looking up screencaps).
Once submitted:
# If a specific board was defined, it should send the user to https://archive.url/board_name/post/12345678
# If no specific board was defined (important for looking up a post number from a screencap), it should send the user to https://archive.url/post/12345678
#* The results should list the posts on all boards that match the post number; there may be one result or several.
Also, if the post number points to an OP post, it should direct the user to the thread URL instead of the post URL.
These URLs should be usable without having to enter them in the WebUI.
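The lookup rules above can be sketched as a small set of pure functions. This is only an illustration under stated assumptions: the `Post` record, the function names, the rule that an OP post has `num == thread_num`, and the `/board_name/thread/...` form of the thread URL are all hypothetical, since the text above only fixes the two post URL forms.

```python
from typing import List, NamedTuple, Optional

BASE_URL = "https://archive.url"


class Post(NamedTuple):
    board: str
    num: int
    thread_num: int  # assumed to equal num when the post is an OP


def lookup_url(num: int, board: Optional[str] = None) -> str:
    """URL the search form redirects to once a post number is submitted."""
    if board:
        return f"{BASE_URL}/{board}/post/{num}"
    return f"{BASE_URL}/post/{num}"


def find_posts(posts: List[Post], num: int, board: Optional[str] = None) -> List[Post]:
    """All posts matching the number, on one board or across every board."""
    return [p for p in posts if p.num == num and (board is None or p.board == board)]


def destination_url(post: Post) -> str:
    """Final URL for a matched post: the thread for an OP, else the post."""
    if post.num == post.thread_num:
        # Hypothetical thread URL form; the source does not specify it.
        return f"{BASE_URL}/{post.board}/thread/{post.thread_num}"
    return f"{BASE_URL}/{post.board}/post/{post.num}"
```

Because the URLs are derived only from the board and post number, they can be typed or linked directly without going through the WebUI form.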
== Experimental Concepts ==
These are experimental concepts which are not yet part of the Ayase standard.


=== PostgreSQL RBAC Row Permission System ===