Editing FoolFuuka/Asagi
From Bibliotheca Anonoma
Latest revision | Your text | ||
Line 3: | Line 3: | ||
<blockquote>Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper. | <blockquote>Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper. | ||
</blockquote> | </blockquote> | ||
https://github.com/eksopl/fuuka/wiki/Sphinx-Search-Backend#gory-details | https://github.com/eksopl/fuuka/wiki/Sphinx-Search-Backend#gory-details | ||
Line 9: | Line 10: | ||
The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe. | The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe. | ||
Three versions can be identified: | |||
* Mark I (2009) - Produced for Foolz.us. Possibly in use by Nyafuu; formerly in use by Loveisover. | |||
* Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co. | |||
* Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements. | |||
== Reference Implementation == | == Reference Implementation == | ||
Line 27: | Line 22: | ||
** Despite the fact that PostgreSQL is supported by Asagi Scraper, it is not supported by FoolFuuka. | ** Despite the fact that PostgreSQL is supported by Asagi Scraper, it is not supported by FoolFuuka. | ||
* Scraper: Asagi (Java) - https://github.com/bibanon/asagi | * Scraper: Asagi (Java) - https://github.com/bibanon/asagi | ||
* Frontend: FoolFuuka (PHP) - https://github.com/ | * Frontend: FoolFuuka (PHP) - https://github.com/pleebe/FoolFuuka/tree/experimental | ||
* PHP Engine: Historically HHVM, PHP5.x compatible. Desuarchive and 4plebs now use PHP 7. | * PHP Engine: Historically HHVM, PHP5.x compatible. Desuarchive and 4plebs now use PHP 7. | ||
* Search: Sphinxsearch | * Search: Sphinxsearch | ||
Line 46: | Line 41: | ||
https://github.com/eksopl/asagi/wiki/Running-Asagi | https://github.com/eksopl/asagi/wiki/Running-Asagi | ||
Also check [[ | Also check [[FoolFuuka/Install/Ubuntu16#Install_and_compile_Asagi_from_source.]] | ||
=== FoolFuuka === | === FoolFuuka === | ||
Line 52: | Line 47: | ||
https://blog.foolz.us/ | https://blog.foolz.us/ | ||
Also check [[ | Also check [[FoolFuuka/Install/Ubuntu16]] | ||
= How asagi does stuff = | |||
== Misc == | == Misc == | ||
Line 117: | Line 57: | ||
AbstractDumper.java ln. 95: <code>public void initDumper(BoardSettings boardSettings) {</code> | AbstractDumper.java ln. 95: <code>public void initDumper(BoardSettings boardSettings) {</code> | ||
== How | == How asagi decides to update a thread: == | ||
?dunno? | |||
== When asagi does a thread update: == | |||
== When | |||
in : YotsubaJSON.java, ln. 88: <code>public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {</code> | in : YotsubaJSON.java, ln. 88: <code>public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {</code> | ||
Line 138: | Line 67: | ||
Loads the thread JSON and decodes it. For each post in the decoded thread JSON: check if the resto value is zero, and if so create a new thread from that post, updating the last-modified time to the time from fetching the JSON. <code>t = this.makeThreadFromJson(pj);</code> If resto is not zero: add the post to the current thread. <code>t.addPost(this.makePostFromJson(pj));</code> (What if two posts had resto==0? We’d break!) | Loads the thread JSON and decodes it. For each post in the decoded thread JSON: check if the resto value is zero, and if so create a new thread from that post, updating the last-modified time to the time from fetching the JSON. <code>t = this.makeThreadFromJson(pj);</code> If resto is not zero: add the post to the current thread. <code>t.addPost(this.makePostFromJson(pj));</code> (What if two posts had resto==0? We’d break!) | ||
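The OP/reply split described above can be sketched in isolation. This is a minimal illustration, not Asagi's code: <code>SketchPost</code>, <code>SketchThread</code> and <code>ThreadAssembler</code> are hypothetical stand-ins for Asagi's own classes, and the explicit guards for a second OP or a missing OP are an assumption added here to make the "two posts with resto==0" case fail loudly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one decoded post; not Asagi's PostJson.
class SketchPost {
    final int no;     // post number
    final int resto;  // 0 for the OP, otherwise the number of the thread replied to
    SketchPost(int no, int resto) { this.no = no; this.resto = resto; }
}

// Hypothetical stand-in for a thread; not Asagi's Topic.
class SketchThread {
    final SketchPost op;
    final List<SketchPost> replies = new ArrayList<>();
    SketchThread(SketchPost op) { this.op = op; }
}

class ThreadAssembler {
    // Walks the posts in order: resto == 0 starts the thread, anything else is a reply.
    static SketchThread assemble(List<SketchPost> posts) {
        SketchThread t = null;
        for (SketchPost p : posts) {
            if (p.resto == 0) {
                // Assumed guard: refuse a second OP instead of silently replacing the thread.
                if (t != null) throw new IllegalStateException("two OP posts in one thread");
                t = new SketchThread(p);
            } else {
                // Assumed guard: a reply cannot be attached before the OP was seen.
                if (t == null) throw new IllegalStateException("reply seen before OP");
                t.replies.add(p);
            }
        }
        if (t == null) throw new IllegalStateException("thread has no OP");
        return t;
    }
}
```

Calling <code>ThreadAssembler.assemble(...)</code> with one OP followed by its replies returns the assembled thread; a list containing two posts with resto==0 raises an exception rather than overwriting the first OP.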
== What does | == What does asagi do with a post in a thread? == | ||
Relevant files: >YotsubaJSON.java >YotsubaAbstract.java | Relevant files: >YotsubaJSON.java >YotsubaAbstract.java | ||
Line 188: | Line 117: | ||
Sanitized? EXIF data YotsubaJSON.java (ln. 212): <code>p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));</code> Return the Post() object. | Sanitized? EXIF data YotsubaJSON.java (ln. 212): <code>p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));</code> Return the Post() object. | ||
== How | == How asagi handles an image in a thread? == | ||
Files of note: >Local.java - Saving image files | Files of note: >Local.java - Saving image files | ||
Line 194: | Line 123: | ||
Local.java ln.201: <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code> | Local.java ln.201: <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code> | ||
== How | == How asagi deals with post deletions? == | ||
Relevant files: >src.java >src.java >src.java | Relevant files: >src.java >src.java >src.java | ||
Line 263: | Line 192: | ||
Post ID number from 4chan. Passed through as-is. | Post ID number from 4chan. Passed through as-is. | ||
==== | ==== “subnum“ ==== | ||
Ghostpost ID number for foolfuuka. Always zero.? | Ghostpost ID number for foolfuuka. Always zero.? | ||
Line 321: | Line 250: | ||
==== N/A -> “Deleted” ==== | ==== N/A -> “Deleted” ==== | ||
Set initially as false, then later updated if the post is absent from the thread during subsequent updates. (YotsubaJSON.java, ln. 205): <code>p.setDeleted(false);</code>
==== “capcode” -> “Capcode” ====
(YotsubaJSON.java, ln. 173): <code>String capcode = pj.getCapcode(); if (capcode != null) { if (capcode.equals("manager") || capcode.equals("Manager")) { capcode = "G"; } else { capcode = capcode.substring(0, 1).toUpperCase(); } }</code>
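The capcode mapping quoted above can be tried on its own. The sketch below wraps the same logic in a hypothetical helper, <code>CapcodeSketch.normalize</code> (a name invented here, not part of Asagi). Passing <code>Locale.ROOT</code> to <code>toUpperCase</code> is an addition for locale-independent results, and what the database stores when the capcode is null is not covered by the quoted snippet, so null is simply passed through.

```java
import java.util.Locale;

class CapcodeSketch {
    // Maps a 4chan API capcode string to a one-letter code.
    // "manager"/"Manager" becomes "G"; anything else becomes its first letter, uppercased.
    static String normalize(String capcode) {
        if (capcode == null) {
            return null; // the quoted snippet leaves null untouched
        }
        if (capcode.equals("manager") || capcode.equals("Manager")) {
            return "G";
        }
        // Note: an empty string would throw here, exactly as in the quoted snippet.
        return capcode.substring(0, 1).toUpperCase(Locale.ROOT);
    }
}
```

For example, <code>normalize("mod")</code> gives <code>"M"</code> and <code>normalize("admin")</code> gives <code>"A"</code>, while both spellings of manager give <code>"G"</code>.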
==== “email” -> “Email” ==== | ==== “email” -> “Email” ==== | ||
Line 385: | Line 307: | ||
== Images table values: == | == Images table values: == | ||
These seem to be handled by triggers that run on post insert. | These seem to be handled by triggers that run on post insert. ### 4ch -> (asagi) -> DB | ||
=== N/A -> (Incremental integer) -> “media_id” === | === N/A -> (Incremental integer) -> “media_id” === | ||
Line 399: | Line 319: | ||
=== N/A -> (Local filepath to full image) -> “media” === | === N/A -> (Local filepath to full image) -> “media” === | ||
The relative path to the image on disk. (Triggers.sql ln.119-139): <code> | The relative path to the image on disk. (Triggers.sql ln.119-139): <code>todo-codeblock</code> | ||
=== N/A -> (Local filepath to OP thumbnail) -> “preview_op” === | === N/A -> (Local filepath to OP thumbnail) -> “preview_op” === | ||
The relative path to the image on disk. (Triggers.sql ln.119-139): <code> | The relative path to the image on disk. (Triggers.sql ln.119-139): <code>todo-codeblock</code> | ||
=== N/A -> (Local filepath to reply thumbnail) -> “preview_reply” === | === N/A -> (Local filepath to reply thumbnail) -> “preview_reply” === | ||
The relative path to the image on disk. (Triggers.sql ln.119-139): <code>todo-codeblock</code>
=== N/A -> (Incrementer) -> “total” ===
The number of posts that refer to this row.
(Triggers.sql ln.123): <code>INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_op, total)</code>
(Triggers.sql ln.127): <code>total = (total + 1),</code>
=== N/A -> (N/A) -> “banned” === | === N/A -> (N/A) -> “banned” === | ||
Not set by Asagi, but observed to prevent downloading banned files (Triggers.sql ln.119-139): | Not set by Asagi, but observed to prevent downloading banned files (Triggers.sql ln.119-139): | ||
<code>todo-codeblock</code>
===== Table definition =====
(Boards.sql ln.38-49):
CREATE TABLE %%BOARD%%_images ( media_id SERIAL NOT NULL, media_hash character varying(25) NOT NULL, media character varying(20), preview_op character varying(20), preview_reply character varying(20), total integer NOT NULL DEFAULT '0', banned smallint NOT NULL DEFAULT '0', PRIMARY KEY (media_id), UNIQUE (media_hash) ); | CREATE TABLE %%BOARD%%_images ( media_id SERIAL NOT NULL, media_hash character varying(25) NOT NULL, media character varying(20), preview_op character varying(20), preview_reply character varying(20), total integer NOT NULL DEFAULT '0', banned smallint NOT NULL DEFAULT '0', PRIMARY KEY (media_id), UNIQUE (media_hash) ); | ||
===== Image insert procedure =====
(Triggers.sql ln.119-139):
DROP PROCEDURE IF EXISTS "insert_image_%%BOARD%%"; CREATE PROCEDURE "insert_image_%%BOARD%%" (n_media_hash VARCHAR(25), n_media VARCHAR(20), n_preview VARCHAR(20), n_op INT) BEGIN IF n_op = 1 THEN INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_op, total) VALUES (n_media_hash, n_media, n_preview, 1) ON DUPLICATE KEY UPDATE media_id = LAST_INSERT_ID(media_id), total = (total + 1), preview_op = COALESCE(preview_op, VALUES(preview_op)), media = COALESCE(media, VALUES(media)); ELSE INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_reply, total) VALUES (n_media_hash, n_media, n_preview, 1) ON DUPLICATE KEY UPDATE media_id = LAST_INSERT_ID(media_id), total = (total + 1), preview_reply = COALESCE(preview_reply, VALUES(preview_reply)), media = COALESCE(media, VALUES(media)); END IF; END; | DROP PROCEDURE IF EXISTS "insert_image_%%BOARD%%"; CREATE PROCEDURE "insert_image_%%BOARD%%" (n_media_hash VARCHAR(25), n_media VARCHAR(20), n_preview VARCHAR(20), n_op INT) BEGIN IF n_op = 1 THEN INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_op, total) VALUES (n_media_hash, n_media, n_preview, 1) ON DUPLICATE KEY UPDATE media_id = LAST_INSERT_ID(media_id), total = (total + 1), preview_op = COALESCE(preview_op, VALUES(preview_op)), media = COALESCE(media, VALUES(media)); ELSE INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_reply, total) VALUES (n_media_hash, n_media, n_preview, 1) ON DUPLICATE KEY UPDATE media_id = LAST_INSERT_ID(media_id), total = (total + 1), preview_reply = COALESCE(preview_reply, VALUES(preview_reply)), media = COALESCE(media, VALUES(media)); END IF; END; | ||
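To make the effect of the insert procedure easier to follow, here is a small in-memory model of it. This is only a sketch of the semantics as read from the SQL above (insert on a new hash, otherwise increment <code>total</code> and fill in only the columns that are still NULL); <code>ImageRow</code> and <code>ImagesTableSketch</code> are hypothetical names, and no database is involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory copy of one row of the %%BOARD%%_images table.
class ImageRow {
    final int mediaId;
    final String mediaHash;
    String media;
    String previewOp;
    String previewReply;
    int total;
    ImageRow(int mediaId, String mediaHash) { this.mediaId = mediaId; this.mediaHash = mediaHash; }
}

class ImagesTableSketch {
    private final Map<String, ImageRow> byHash = new HashMap<>(); // models UNIQUE (media_hash)
    private int nextId = 1;                                       // models the auto-increment media_id

    // Models insert_image_%%BOARD%%(n_media_hash, n_media, n_preview, n_op).
    ImageRow insertImage(String mediaHash, String media, String preview, boolean isOp) {
        ImageRow row = byHash.get(mediaHash);
        if (row == null) {
            // No row with this hash yet: plain INSERT with total = 1.
            row = new ImageRow(nextId++, mediaHash);
            row.media = media;
            if (isOp) row.previewOp = preview; else row.previewReply = preview;
            row.total = 1;
            byHash.put(mediaHash, row);
            return row;
        }
        // Duplicate hash: ON DUPLICATE KEY UPDATE branch.
        row.total += 1;
        // COALESCE(existing, new): keep the stored value unless it is still null.
        if (isOp) {
            if (row.previewOp == null) row.previewOp = preview;
        } else {
            if (row.previewReply == null) row.previewReply = preview;
        }
        if (row.media == null) row.media = media;
        return row;
    }
}
```

Inserting the same hash twice, first as an OP image and then as a reply image, leaves one row with <code>total</code> of 2, the original <code>media</code> filename, and both preview columns filled.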
Line 455: | Line 374: | ||
(AbstractDumper.java ln. 161): - Fullsize media downloader thread <code>protected class MediaFetcher implements Runnable {</code> (AbstractDumper.java ln. 169): - Take the next item from the queue <code>mediaPost = mediaUpdates.take();</code> (AbstractDumper.java ln. 173): - Try to handle the media for one post <code>mediaLocalBoard.insertMedia(mediaPost, sourceBoard);</code> (Local.java ln. 201): - Handler for a post with media <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code> (Local.java ln. 201): - Interact with DB for this media <code>mediaRow = db.getMedia(h);</code> | (AbstractDumper.java ln. 161): - Fullsize media downloader thread <code>protected class MediaFetcher implements Runnable {</code> (AbstractDumper.java ln. 169): - Take the next item from the queue <code>mediaPost = mediaUpdates.take();</code> (AbstractDumper.java ln. 173): - Try to handle the media for one post <code>mediaLocalBoard.insertMedia(mediaPost, sourceBoard);</code> (Local.java ln. 201): - Handler for a post with media <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code> (Local.java ln. 201): - Interact with DB for this media <code>mediaRow = db.getMedia(h);</code> | ||
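The worker pattern in the call chain above (a <code>Runnable</code> that blocks on <code>take()</code> and handles one media item at a time) can be sketched as follows. This is an assumption-laden illustration, not the Asagi class: <code>MediaFetcherSketch</code> is a hypothetical name, plain strings stand in for <code>MediaPost</code>, and "handling" an item just records it in a list. The choice to log and continue after a failed item is also an assumption.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;

class MediaFetcherSketch implements Runnable {
    private final BlockingQueue<String> mediaUpdates; // stand-in for the queue of posts with media
    private final List<String> handled;               // stand-in for "media saved to disk"

    MediaFetcherSketch(BlockingQueue<String> mediaUpdates, List<String> handled) {
        this.mediaUpdates = mediaUpdates;
        this.handled = handled;
    }

    @Override
    public void run() {
        while (true) {
            String mediaPost;
            try {
                // Blocks until another thread puts a post with media on the queue.
                mediaPost = mediaUpdates.take();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop the worker.
                Thread.currentThread().interrupt();
                return;
            }
            try {
                insertMedia(mediaPost);
            } catch (RuntimeException e) {
                // Assumed policy: one failed item should not kill the worker thread.
                System.err.println("media failed for " + mediaPost + ": " + e.getMessage());
            }
        }
    }

    // Placeholder for the real download-and-store step.
    private void insertMedia(String mediaPost) {
        handled.add(mediaPost);
    }
}
```

The worker runs until its thread is interrupted, so a caller would typically start it with <code>new Thread(new MediaFetcherSketch(queue, handled)).start()</code> and interrupt that thread on shutdown.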
If there is information for this media in the DB, | If there is information for this media in the DB, retrieve it. If any new information exists about this media that is not already in the DB, add that to the DB entry. (SQL.java ln. 289): - Interact with DB for this media <code>public synchronized Media getMedia(MediaPost post) throws ContentGetException, ContentStoreException, DBConnectionException {</code> | ||
(SQL.java ln. 342-347): - Decide if media row needs an update | (SQL.java ln. 342-347): - Decide if media row needs an update | ||
Line 510: | Line 429: | ||
===== YotsubaJSON.java: ===== | ===== YotsubaJSON.java: ===== | ||
(ln.217) <code>private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {</code> | (ln.217) <code>private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {</code> | ||
===== Topic.java ===== | ===== Topic.java ===== | ||
Line 516: | Line 435: | ||
(Ln. 22) <code>public Topic(int num, int omPosts, int omImages) {</code> | (Ln. 22) <code>public Topic(int num, int omPosts, int omImages) {</code> | ||
Data flow: 4ch -> | Data flow: 4ch -> asagi -> DB | ||
?Seems to be handled by DB triggers.? | ?Seems to be handled by DB triggers.? | ||
Line 525: | Line 444: | ||
Hayden source code : https://github.com/bbepis/Hayden | Hayden source code : https://github.com/bbepis/Hayden | ||
= Ayase Imageboard Archival Standard (Ayase) = | |||
The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever growing operations of Desuarchive and RebeccaBlackTech. | |||
== Reference Implementation == | |||
* Operating System: CentOS/RHEL 8 | |||
* Database: PostgreSQL | |||
* Scraper: Ena or Hydrus (.NET C#) | |||
* Middleware: Ayase (Python PyPy) | |||
* Frontends: 4chan X, Clover, iPhone app
== Specifications == | |||
=== Files === | |||
* All files are to be named by SHA-256 sum and file extension. This was chosen for the broad availability of hardware extensions for the purpose and its use by 8chan/vichan.
* They are to be stored in double nested folders. | |||
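A minimal sketch of the naming rule follows. The hash and extension come straight from the two bullets above, but the exact folder scheme is not specified here, so using the first two and the next two hex characters of the hash as the nested folder names is purely an assumption for illustration. <code>Sha256PathSketch</code> is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha256PathSketch {
    // Returns the lowercase hex SHA-256 digest of the file contents.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds "<aa>/<bb>/<full hash>.<ext>"; the aa/bb split is an assumed layout.
    static String relativePath(byte[] data, String extension) {
        String hex = sha256Hex(data);
        return hex.substring(0, 2) + "/" + hex.substring(2, 4) + "/" + hex + "." + extension;
    }
}
```

Because the path depends only on the file contents, identical files scraped from different posts or boards map to the same location on disk.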
=== Time === | |||
* Ayase requires time to be stored in PostgreSQL datetimes, which also store timezones. | |||
* Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store in other timezones. | |||
* The timezone support is only meant for compatibility purposes with prior Asagi data, given that they store time as US time (maybe Eastern) due to their past HTML scraping. Future scrapes are strongly advised not to replicate this behavior; local time should be up to the frontend to determine.
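As a rough illustration of the compatibility case, the sketch below converts a legacy timestamp to a real UTC instant. It rests on two assumptions that should be verified against actual data: that the legacy value is a count of seconds whose calendar reading is US wall-clock time, and that the zone is America/New_York (the text above only says "maybe Eastern"). <code>AsagiTimeSketch</code> is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class AsagiTimeSketch {
    // Assumption: legacy timestamps are Eastern wall-clock time.
    static final ZoneId ASSUMED_LEGACY_ZONE = ZoneId.of("America/New_York");

    // Reinterprets a legacy "seconds" value as local wall-clock time and returns the true UTC instant.
    static Instant toUtc(long legacyTimestamp) {
        // Step 1: read the number as a calendar date and time, without attaching any zone.
        LocalDateTime wallClock = LocalDateTime.ofEpochSecond(legacyTimestamp, 0, ZoneOffset.UTC);
        // Step 2: declare that wall-clock reading to be in the assumed zone, then convert.
        return wallClock.atZone(ASSUMED_LEGACY_ZONE).toInstant();
    }
}
```

Going through a zone ID rather than a fixed offset lets daylight saving time be applied per timestamp, so winter values shift by five hours and summer values by four.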
=== PostgreSQL Schema === | |||
If we GET JSON from the 4chan API, and always serve the same JSON to the user, why deconstruct and reconstruct it into post-focused SQL records every time?
=== Elasticsearch Engine === | |||
A separate Elasticsearch engine, kept in sync with but independent from the SQL server, will replace Sphinxsearch, which queries the MySQL database.