FoolFuuka/Asagi

From Bibliotheca Anonoma

= Fuuka Imageboard Archival Standard (Fuuka) =

<blockquote>Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper.
</blockquote>
https://github.com/eksopl/fuuka/wiki/Sphinx-Search-Backend#gory-details

= Asagi Imageboard Archival Standard (Asagi) =

The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe.


== SQL Schema ==

Two versions of the SQL Schema in use can be identified:

* 1.0.0 (2013) - The final version by Eksopl for Foolz.us; the SQL schema was unchanged in Archive.moe.
** Should be used by Fireden, but this is unconfirmed.
** Might still be used by Nyafuu; was in use by Loveisover.
* 1.3.0 (2019) - (only Mysql/triggers.sql was changed; there were no other structural SQL schema changes) The final reference standard consolidated from 4plebs repos by the Bibliotheca Anonoma for use in Desuarchive, in preparation for the development of new drop-in replacements.
** Used by Desuarchive, Rbt, arch.b4k.co, maybe 4plebs?

A refined version 2.0.0 is proposed that would eliminate SQL triggers for improved performance, leaving it to the scraper engines to perform the equivalent operations.


== Reference Implementation ==

* Operating System: Ubuntu 14.04/16.04 LTS
* Database: MySQL Compatible utf8mb4
** Despite the fact that PostgreSQL is supported by Asagi Scraper, it is not supported by FoolFuuka.
* Scraper: Asagi (Java) - https://github.com/bibanon/asagi
* Frontend: FoolFuuka (PHP) - https://github.com/bibanon/FoolFuuka
* PHP Engine: Historically HHVM, PHP5.x compatible. Desuarchive and 4plebs now use PHP 7.
* Search: Sphinxsearch

== New Implementation ==

Proposed, still needs to be constructed.

* New Frontend: some Python-based, 4chan API-compatible middleware

== Compilation and Usage ==
 
=== Asagi ===
 
https://github.com/eksopl/asagi/wiki/Running-Asagi
 
Also check [[FoolFuuka/Install/Ubuntu16#Install_and_compile_Asagi_from_source.|FoolFuuka/Install/Ubuntu16#Install_and_compile_Asagi_from_source.]]
 
=== FoolFuuka ===
 
https://blog.foolz.us/
 
Also check [[FoolFuuka/Install/Ubuntu16|FoolFuuka/Install/Ubuntu16]]
 
= How Asagi does stuff =
 
== Configuration ==
 
For example, here is the config for Desuarchive:
 
<pre>{&quot;settings&quot;: {
  &quot;dumperEngine&quot;: &quot;DumperJSON&quot;,
  &quot;sourceEngine&quot;: &quot;YotsubaJSON&quot;,
 
  &quot;boardSettings&quot;: {
    &quot;default&quot;: {
      &quot;engine&quot;: &quot;Mysql&quot;,
      &quot;database&quot;: &quot;asagi&quot;,
      &quot;host&quot;: &quot;localhost&quot;,
      &quot;username&quot;: &quot;asagi&quot;,
      &quot;password&quot;: &quot;YOUR_PASSWORD_HERE&quot;,
      &quot;charset&quot;: &quot;utf8mb4&quot;,
      &quot;path&quot;: &quot;/srv/foolfuuka/boards&quot;,
      &quot;updateFileLastModified&quot;: false,
      &quot;useOldDirectoryStructure&quot;: false,
      &quot;webserverGroup&quot;: &quot;www-data&quot;,
      &quot;thumbThreads&quot;: 2,
      &quot;mediaThreads&quot;: 2,
      &quot;newThreadsThreads&quot;: 6,
      &quot;deletedThreadsThresholdPage&quot;: 8,
      &quot;refreshDelay&quot;: 60,
      &quot;throttleAPI&quot;: false,
      &quot;throttleURL&quot;: &quot;i.4cdn.org&quot;,
      &quot;throttleMillisec&quot;: 1050,
      &quot;threadRefreshRate&quot;: 50
    },
 
    &quot;mlp&quot;: {},
    &quot;qa&quot;: {},
    &quot;aco&quot;: {},
    &quot;tg&quot;: {},
    &quot;d&quot;: {},
    &quot;co&quot;: {},
    &quot;a&quot;: {},
    &quot;an&quot;: {},
    &quot;k&quot;: {},
    &quot;fit&quot;: {},
    &quot;wsg&quot;: {&quot;mediaThreads&quot;: 0},
    &quot;gif&quot;: {&quot;mediaThreads&quot;: 0},
    &quot;r9k&quot;: {},
    &quot;int&quot;: {},
    &quot;c&quot;: {},
    &quot;m&quot;: {},
    &quot;vr&quot;: {},
    &quot;his&quot;: {},
    &quot;trash&quot;: {},
    &quot;cgl&quot;: {},
    &quot;g&quot;: {},
    &quot;mu&quot;: {}
  }
}}</pre>
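
The config above appears to rely on each board entry inheriting the <code>default</code> block and overriding only the keys it lists (for example, <code>wsg</code> and <code>gif</code> setting <code>mediaThreads</code> to 0). The following is a minimal sketch of that reading, assuming a plain key-by-key override; the class and method names are hypothetical and are not taken from the Asagi source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "default plus per-board overrides"; not Asagi's code.
class BoardSettingsSketch {
    /** Returns the defaults with every key present in the board entry overriding them. */
    static Map<String, Object> effective(Map<String, Object> defaults, Map<String, Object> board) {
        Map<String, Object> merged = new HashMap<>(defaults);
        merged.putAll(board); // board-level keys win over the defaults
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("thumbThreads", 2);
        defaults.put("mediaThreads", 2);

        Map<String, Object> wsg = new HashMap<>();
        wsg.put("mediaThreads", 0); // as in the "wsg" entry above

        System.out.println(effective(defaults, wsg));
    }
}
```

Under this assumption, an empty entry such as <code>&quot;a&quot;: {}</code> would simply run with the defaults.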
== Misc ==
 
Each board gets a separate thread for each of: thumbs, media, topic-fetch, topic-insert, post-delete.
 
AbstractDumper.java ln. 95: <code>public void initDumper(BoardSettings boardSettings) {</code>
 
== How Asagi decides to update a thread: ==
 
In <code>initDumper()</code>, <code>DumperJSON</code> spawns an instance of its inner class <code>BoardPoller</code> on a thread. In its <code>run()</code> method, <code>BoardPoller</code> loops indefinitely:
 
* Wake up from sleeping (duration set by <code>refreshDelay</code> in the configuration)
* <code>threadList = sourceBoard.getAllThreads(lastMod);</code>
* If the request 304s or errors, go to sleep
* Go over the previous threads:
** If this thread is in the current threads and it’s been modified, mark its modification timestamp and page number. Then, push it to <code>newTopics</code>.
** If this thread is not in the current threads, it’s been deleted. Push it to <code>newTopics</code>.
* Put the remaining threads in newTopics
* Sleep until the delay expires
 
The queue <code>newTopics</code> is processed by <code>AbstractDumper</code>’s inner class <code>TopicFetcher</code>.
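
The polling steps above can be sketched as a comparison of two thread listings. This is a simplified model under stated assumptions: threads are reduced to a number and a last-modified value, page numbers are ignored, and <code>PollSketch</code> and <code>threadsToQueue</code> are hypothetical names, not Asagi's.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one BoardPoller pass; not Asagi's actual code.
class PollSketch {
    /**
     * Both maps go from thread number to last-modified timestamp.
     * Returns the thread numbers that should be pushed to the fetch queue.
     */
    static List<Integer> threadsToQueue(Map<Integer, Long> previous, Map<Integer, Long> current) {
        List<Integer> queue = new ArrayList<>();
        Map<Integer, Long> remaining = new LinkedHashMap<>(current);
        for (Map.Entry<Integer, Long> old : previous.entrySet()) {
            Long nowModified = remaining.remove(old.getKey());
            if (nowModified == null) {
                // No longer listed: queue it so the fetcher can detect the deletion.
                queue.add(old.getKey());
            } else if (nowModified > old.getValue()) {
                // Still listed and modified since the previous pass.
                queue.add(old.getKey());
            }
        }
        // Anything left over was not known before, so it is a new thread.
        queue.addAll(remaining.keySet());
        return queue;
    }

    public static void main(String[] args) {
        Map<Integer, Long> previous = new LinkedHashMap<>();
        previous.put(100, 10L);
        previous.put(101, 10L);
        previous.put(102, 10L);
        Map<Integer, Long> current = new LinkedHashMap<>();
        current.put(100, 10L); // unchanged
        current.put(101, 20L); // modified
        current.put(103, 20L); // new
        System.out.println(threadsToQueue(previous, current)); // prints [101, 102, 103]
    }
}
```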
 
== When Asagi does a thread update: ==
 
In YotsubaJSON.java, ln. 88: <code>public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {</code>

* Load the thread JSON.
* Decode the JSON.
* For each post in the decoded thread JSON:
** If <code>resto</code> is zero, the post is the OP: create a new thread from that post, setting the last-modified time to the value obtained when fetching the JSON. <code>t = this.makeThreadFromJson(pj);</code>
** If <code>resto</code> is not zero, the post is a reply: add it to the current thread. <code>t.addPost(this.makePostFromJson(pj));</code>

(What if two posts had resto==0? We’d break!)
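
A minimal sketch of the per-post loop described above, using hypothetical stand-in types (<code>PostStub</code>, <code>TopicStub</code>) instead of Asagi's <code>PostJson</code> and <code>Topic</code>. Unlike the description, this sketch chooses to fail loudly when a second post with <code>resto == 0</code> appears or when a reply precedes the OP; that guard is an assumption of the sketch, not something confirmed in the Asagi source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for PostJson and Topic; only the fields needed here.
class ThreadAssemblySketch {
    static final class PostStub {
        final int no;    // post number
        final int resto; // 0 for the OP, otherwise the OP's post number
        PostStub(int no, int resto) { this.no = no; this.resto = resto; }
    }

    static final class TopicStub {
        final int num; // thread number, taken from the OP
        final List<Integer> postNums = new ArrayList<>();
        TopicStub(int num) { this.num = num; }
    }

    static TopicStub assemble(List<PostStub> posts) {
        TopicStub t = null;
        for (PostStub pj : posts) {
            if (pj.resto == 0) {
                if (t != null) {
                    // Guard added by this sketch for the "two OPs" case.
                    throw new IllegalStateException("second OP in thread: " + pj.no);
                }
                t = new TopicStub(pj.no);
                t.postNums.add(pj.no); // the OP is stored like any other post
            } else {
                if (t == null) {
                    throw new IllegalStateException("reply before OP: " + pj.no);
                }
                t.postNums.add(pj.no);
            }
        }
        if (t == null) {
            throw new IllegalStateException("thread has no OP");
        }
        return t;
    }
}
```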
 
== What does Asagi do with a post in a thread? ==
 
Relevant files: YotsubaJSON.java, YotsubaAbstract.java
 
The entire thread is processed at once.
 
=== For OP: ===
 
In YotsubaJSON.java, ln. 217: <code>private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {</code>

Ensure the post number is nonzero. Create a new Topic object (ln. 222): <code>Topic t = new Topic(pj.getNo(), pj.getOmittedPosts(), pj.getOmittedImages());</code>

Add the supplied OP to the thread as any other post would be (ln. 224): <code>t.addPost(this.makePostFromJson(pj));</code>

Return the Topic object.
 
=== For reply: ===
 
In YotsubaJSON.java (ln. 157): <code>private Post makePostFromJson(PostJson pj) throws ContentParseException {</code>

* Ensure the post number is valid.
* Ensure the time is valid.
* Create a new Post object.
* If the JSON gave a filename not equal to null: generate the filename from JSON values. <code>p.setMediaFilename(pj.getFilename() + pj.getExt());</code>
* Generate the original filename from JSON values. <code>p.setMediaOrig(pj.getTim() + pj.getExt());</code>
* Generate the preview original filename from JSON values. <code>p.setPreviewOrig(pj.getTim() + &quot;s.jpg&quot;);</code>
* Find the post’s capcode, and if it is not null: “manager” or “Manager” becomes “G”; anything else becomes its first character in uppercase.
* Find the poster hash (the API’s <code>id</code> field, not the tripcode). If it is “Developer”, set it to “Dev”.
* If the poster country is “XX” or “A1”, convert it to null.

Pass through the remaining values from the JSON (YotsubaJSON.java, ln. 188 -&gt; 212), with the following conversions:

* Thread number, set to the current thread number: <code>p.setThreadNum(pj.getResto() == 0 ? pj.getNo() : pj.getResto());</code>
* OP status (ln. 197): <code>p.setOp(pj.getResto() == 0);</code>
* Sanitized title (ln. 198): <code>p.setTitle(this.cleanSimple(pj.getSub()));</code>
** See also YotsubaAbstract.java (ln. 83): <code>public String doClean(String text)</code>
* Sanitized name (ln. 200): <code>p.setName(this.cleanSimple(pj.getName()));</code>
* Date converted using NYC_TIMEZONE (ln. 202): <code>p.setDate(DateUtils.adjustTimestampEpoch(pj.getTime(), DateUtils.NYC_TIMEZONE));</code>
* Sanitized(?) EXIF data (ln. 212): <code>p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));</code>

Return the Post object.
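
The capcode, poster hash and country conversions described above can be collected into three small functions. This is a sketch that mirrors the snippets quoted elsewhere on this page; the class and method names are hypothetical, and Asagi performs these steps inline in <code>makePostFromJson</code> rather than in separate helpers.

```java
// Hypothetical helper class; the logic mirrors the inline snippets quoted on this page.
class PostFieldSketch {
    /** "manager"/"Manager" becomes "G"; any other capcode becomes its first letter in uppercase. */
    static String normalizeCapcode(String capcode) {
        if (capcode == null) {
            return null;
        }
        if (capcode.equals("manager") || capcode.equals("Manager")) {
            return "G";
        }
        // Like the original snippet, this would throw on an empty string.
        return capcode.substring(0, 1).toUpperCase();
    }

    /** The poster ID "Developer" is shortened to "Dev"; everything else passes through. */
    static String normalizePosterHash(String id) {
        return "Developer".equals(id) ? "Dev" : id;
    }

    /** The placeholder country codes "XX" and "A1" are stored as null. */
    static String normalizeCountry(String country) {
        if (country != null && (country.equals("XX") || country.equals("A1"))) {
            return null;
        }
        return country;
    }
}
```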
 
== How Asagi handles an image in a thread? ==
 
Files of note: Local.java (saving image files)
 
Local.java ln.201: <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code>
 
== How Asagi deals with post deletions? ==
 
Relevant files: SQL.java, AbstractDumper.java, Local.java
 
SQL that handles post deletion logic: SQL.java (ln. 94): <code>this.updateDeletedQuery = String.format(&quot;UPDATE \&quot;%s\&quot; SET deleted = ?, timestamp_expired = ? WHERE num = ? AND subnum = ?&quot;, this.table);</code>

This is prepared into a statement, SQL.java (ln. 62): <code>updateDeletedStmt = conn.prepareStatement(updateDeletedQuery);</code>
 
Function that actually writes deleted flag to a post: SQL.java (ln. 267): <code>public synchronized void markDeleted(DeletedPost post) throws ContentStoreException, DBConnectionException {</code>
 
Class TopicFetcher has a run() function which calls findDeleted(). See AbstractDumper.java (ln. 268): <code>protected class TopicFetcher implements Runnable {</code>
 
And AbstractDumper.java (ln. 380): <code>findDeleted(oldTopic, topic, true);</code>
 
Function findDeleted() checks Posts in a Topic to check if they have been removed in AbstractDumper.java (ln. 103): <code>protected boolean findDeleted(Topic oldTopic, Topic newTopic, boolean markDeleted) {</code>
 
Function markDeleted() in Local.java calls function markDeleted() from SQL.java: Local.java (ln. 185): <code>public void markDeleted(DeletedPost post) throws ContentStoreException {</code>
 
SQL.java (ln. 267): <code>public synchronized void markDeleted(DeletedPost post) throws ContentStoreException, DBConnectionException {</code>
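
The idea behind <code>findDeleted()</code>, comparing the previously fetched copy of a thread against the new one, can be sketched as a set difference. This is an illustration only: it works on bare post numbers, the names are hypothetical, and the real method takes <code>Topic</code> objects and a <code>markDeleted</code> flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical simplification of the old-vs-new comparison; not Asagi's code.
class DeletionDiffSketch {
    /** Returns the post numbers present in the old fetch but absent from the new one. */
    static List<Integer> missingPosts(Set<Integer> oldPostNums, Set<Integer> newPostNums) {
        List<Integer> deleted = new ArrayList<>();
        for (Integer num : oldPostNums) {
            if (!newPostNums.contains(num)) {
                deleted.add(num); // candidate for the "deleted" flag update
            }
        }
        return deleted;
    }
}
```

Each returned number would then be handed to something like <code>markDeleted()</code>, which runs the <code>UPDATE ... SET deleted = ?</code> statement shown above.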
 
== How Asagi interacts with the database? ==
 
Relevant files: &gt;SQL.java &gt;Mysql.java
 
Inserting a thread: SQL.java, ln.196: <code>public synchronized void insert(Topic topic) throws ContentStoreException, DBConnectionException {</code> Each post in the current Topic object is fed through the insert statement sequentially.
 
== How each value is processed between 4ch and the DB ==
 
The 4chan API JSON is broken down to post level (YotsubaJSON.java ln.97). PostJson objects then appear to be created and populated from the thread JSON data, with values keeping their names from the API (PostJson.java, ln.4). Note: disregard any capitalization in the field names below (an artifact of Google Docs autocorrect); every name is lowercase. DB column names referenced here (SQL.java ln.79):
 
<pre>(poster_ip,
num,
subnum,
thread_num,
op,
timestamp,
preview_orig,
preview_w,
preview_h,
media_filename,
media_w,
media_h,
media_size,
media_hash,
media_orig,
spoiler,
deleted,
capcode,
email,
name,
trip,
title,
comment,
delpass,
sticky,
locked,
poster_hash,
poster_country,
exif)</pre>
=== 4ch → (Asagi) → DB ===
 
==== “poster_ip” ====

Presumably always NULL (unverified). It does not appear to ever be initialized to a value in the Java source code.

==== “no” -&gt; “num” ====

Post ID number from 4chan. Passed through as-is.

==== “subnum” ====

Ghost post ID number for FoolFuuka. Apparently always zero for posts scraped by Asagi.

==== “thread_num” ====

Thread ID number, always the post ID number (“num”) of the thread’s OP. (YotsubaJSON.java, ln. 196): <code>p.setThreadNum(pj.getResto() == 0 ? pj.getNo() : pj.getResto());</code>

==== “time” -&gt; “timestamp” ====

Timestamp of the post. The 4ch API provides it as a UNIX timestamp (seconds since 1 JAN 1970 UTC). Asagi adjusts it using NYC_TIMEZONE before storing, so the DB value is probably shifted to US Eastern (New York) local time rather than UTC+0. (YotsubaJSON.java, ln. 202): <code>p.setDate(DateUtils.adjustTimestampEpoch(pj.getTime(), DateUtils.NYC_TIMEZONE));</code>
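
A sketch of what an adjustment like <code>adjustTimestampEpoch</code> plausibly does, assuming it shifts a UTC epoch value by New York's UTC offset at that instant. The implementation of the real method is not quoted on this page, so both that assumption and the names below are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;

// Hypothetical re-creation using java.time; Asagi's DateUtils may differ in detail.
class TimestampSketch {
    static final ZoneId NEW_YORK = ZoneId.of("America/New_York");

    /** Shifts a UTC epoch (seconds) by the New York offset in force at that instant. */
    static long toNewYorkLocalEpoch(long utcEpochSeconds) {
        int offsetSeconds = NEW_YORK.getRules()
                .getOffset(Instant.ofEpochSecond(utcEpochSeconds))
                .getTotalSeconds(); // -18000 in winter (EST), -14400 in summer (EDT)
        return utcEpochSeconds + offsetSeconds;
    }

    public static void main(String[] args) {
        // 2019-07-01 00:00:00 UTC falls in daylight saving time (UTC-4).
        System.out.println(toNewYorkLocalEpoch(1561939200L)); // prints 1561924800
    }
}
```

If this reading is right, converting stored values back to UTC requires the daylight saving rules and not a fixed offset.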
 
==== “resto” -&gt; (resto == 0) -&gt; “op” ====

Is this the OP of the thread? (YotsubaJSON.java, ln. 197): <code>p.setOp(pj.getResto() == 0);</code>

==== N/A -&gt; “preview_orig” ====

(YotsubaJSON.java, ln. 170): <code>p.setPreviewOrig(pj.getTim() + &quot;s.jpg&quot;);</code>

==== “tn_w” -&gt; “preview_w” ====

Width of media thumbnail. (YotsubaJSON.java, ln. 193): <code>p.setPreviewW(pj.getTnW());</code>

==== “tn_h” -&gt; “preview_h” ====

Height of media thumbnail. (YotsubaJSON.java, ln. 170): <code>p.setPreviewH(pj.getTnH());</code>

==== N/A -&gt; “media_filename” ====

(YotsubaJSON.java, ln. 168): <code>p.setMediaFilename(pj.getFilename() + pj.getExt());</code>

==== “w” -&gt; “media_w” ====

(YotsubaJSON.java, ln. 191): <code>p.setMediaW(pj.getW());</code>

==== “h” -&gt; “media_h” ====

(YotsubaJSON.java, ln. 192): <code>p.setMediaH(pj.getH());</code>

==== “fsize” -&gt; “media_size” ====

(YotsubaJSON.java, ln. 190): <code>p.setMediaSize(pj.getFsize());</code>

==== “md5” -&gt; “media_hash” ====

(YotsubaJSON.java, ln. 189): <code>p.setMediaHash(pj.getMd5());</code>

==== N/A -&gt; “media_orig” ====

(YotsubaJSON.java, ln. 169): <code>p.setMediaOrig(pj.getTim() + pj.getExt());</code>

==== “spoiler” -&gt; “spoiler” ====

(YotsubaJSON.java, ln. 204): <code>p.setSpoiler(pj.isSpoiler());</code>

==== N/A -&gt; “deleted” ====

Set initially to false, then updated if the post is absent from the thread during a subsequent update. (YotsubaJSON.java, ln. 205): <code>p.setDeleted(false);</code>
 
==== “capcode” -&gt; “capcode” ====

(YotsubaJSON.java, ln. 173):
 
<pre>String capcode = pj.getCapcode();
if (capcode != null) {
    if (capcode.equals(&quot;manager&quot;) || capcode.equals(&quot;Manager&quot;)) {
        capcode = &quot;G&quot;;
    } else {
        capcode = capcode.substring(0, 1).toUpperCase();
    }
}</pre>
==== “email” -&gt; “email” ====

(YotsubaJSON.java, ln. 199): <code>p.setEmail(pj.getEmail());</code>

==== “name” -&gt; (cleanSimple()) -&gt; “name” ====

(YotsubaJSON.java, ln. 200): <code>p.setName(this.cleanSimple(pj.getName()));</code>

==== “trip” -&gt; “trip” ====

(YotsubaJSON.java, ln. XX): <code>p.setTrip(pj.getTrip());</code>

==== “sub” -&gt; (cleanSimple()) -&gt; “title” ====

(YotsubaJSON.java, ln. 198): <code>p.setTitle(this.cleanSimple(pj.getSub()));</code>

==== “com” -&gt; (doClean()) -&gt; “comment” ====

(YotsubaJSON.java, ln. 203): <code>p.setComment(this.doClean(pj.getCom()));</code>

==== N/A -&gt; “delpass” ====

TODO (YotsubaJSON.java, ln. XX): <code>TODO</code>
 
==== “sticky” -&gt; “sticky” ====

(YotsubaJSON.java, ln. 206): <code>p.setSticky(pj.isSticky());</code>

==== “closed”, “archived” -&gt; (“closed” AND (NOT “archived”)) -&gt; “locked” ====

(YotsubaJSON.java, ln. 207): <code>p.setClosed(pj.isClosed() &amp;&amp; !pj.isArchived());</code> (SQL.java, ln. 78 - 86): <code>this.insertQuery = String.format(</code> (SQL.java, ln. 236): <code>insertStmt.setBoolean(c++, post.isClosed());</code>

==== “id” -&gt; “poster_hash” ====

(YotsubaJSON.java, ln. 182):

<pre>String posterHash = pj.getId();
if(posterHash != null &amp;&amp; posterHash.equals(&quot;Developer&quot;)) posterHash = &quot;Dev&quot;;</pre>
==== “country” -&gt; “poster_country” ====

(YotsubaJSON.java, ln. 185-186):

<pre>String posterCountry = pj.getCountry();
if(posterCountry != null &amp;&amp; (posterCountry.equals(&quot;XX&quot;) || posterCountry.equals(&quot;A1&quot;))) posterCountry = null;</pre>
==== lots -&gt; “exif” ====

(YotsubaJSON.java, ln. 212): <code>p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));</code> (YotsubaAbstract.java, ln. 137): <code>public String parseMeta(String text, Integer uniqueIps, Integer since4pass, String trollCountry) {</code>


== Images table values: ==

These seem to be handled by triggers that run on post insert.

=== 4ch -&gt; (Asagi) -&gt; DB ===

=== N/A -&gt; (Incremental integer) -&gt; “media_id” ===

This is simply an autoincrementing integer value. Set by the DB engine if the image is new. ''Set by trigger if the image has already been seen.'' (Triggers.sql ln.119-139): - ''If the md5 is already in the DB'' <code>media_id = LAST_INSERT_ID(media_id),</code>

=== md5 -&gt; (N/A) -&gt; “media_hash” ===

The base64-encoded md5 hash of the media file as given by 4ch. (SQL.java ln.77-87) Omitted for brevity.

=== N/A -&gt; (Local filepath to full image) -&gt; “media” ===

The relative path to the full image on disk. (Triggers.sql ln.119-139): <code>TODO</code>

=== N/A -&gt; (Local filepath to OP thumbnail) -&gt; “preview_op” ===

The relative path to the OP thumbnail on disk. (Triggers.sql ln.119-139): <code>TODO</code>

=== N/A -&gt; (Local filepath to reply thumbnail) -&gt; “preview_reply” ===

The relative path to the reply thumbnail on disk. (Triggers.sql ln.119-139): <code>TODO</code>

=== N/A -&gt; (incrementer) -&gt; “total” ===
 
The number of posts that refer to this row.
 
(Triggers.sql ln.123): <code>INSERT INTO &quot;%%BOARD%%_images&quot; (media_hash, media, preview_op, total)</code>
 
(Triggers.sql ln.127): <code>total = (total + 1)</code>
 
=== N/A -&gt; (N/A) -&gt; “banned” ===
 
Not set by Asagi, but observed to prevent downloading banned files (Triggers.sql ln.119-139): <code>TODO</code>
 
===== Table definition =====

(Boards.sql ln.38-49): - ''Table definition''

<pre>CREATE TABLE %%BOARD%%_images (
  media_id SERIAL NOT NULL,
  media_hash character varying(25) NOT NULL,
  media character varying(20),
  preview_op character varying(20),
  preview_reply character varying(20),
  total integer NOT NULL DEFAULT '0',
  banned smallint NOT NULL DEFAULT '0',

  PRIMARY KEY (media_id),
  UNIQUE (media_hash)
);</pre>
===== Image insert procedure =====

(Triggers.sql ln.119-139): - ''Image insert procedure''

<pre>DROP PROCEDURE IF EXISTS &quot;insert_image_%%BOARD%%&quot;;

CREATE PROCEDURE &quot;insert_image_%%BOARD%%&quot;
 (n_media_hash VARCHAR(25),
  n_media VARCHAR(20),
  n_preview VARCHAR(20),
  n_op INT)
BEGIN
  IF n_op = 1 THEN
    INSERT INTO &quot;%%BOARD%%_images&quot; (media_hash, media, preview_op, total)
    VALUES (n_media_hash, n_media, n_preview, 1)
    ON DUPLICATE KEY UPDATE
      media_id = LAST_INSERT_ID(media_id),
      total = (total + 1),
      preview_op = COALESCE(preview_op, VALUES(preview_op)),
      media = COALESCE(media, VALUES(media));
  ELSE
    INSERT INTO &quot;%%BOARD%%_images&quot; (media_hash, media, preview_reply, total)
    VALUES (n_media_hash, n_media, n_preview, 1)
    ON DUPLICATE KEY UPDATE
      media_id = LAST_INSERT_ID(media_id),
      total = (total + 1),
      preview_reply = COALESCE(preview_reply, VALUES(preview_reply)),
      media = COALESCE(media, VALUES(media));
  END IF;
END;</pre>
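
The upsert behaviour of the insert procedure above (first sighting creates the row, later sightings increment <code>total</code> and only fill columns that are still NULL) can be modelled in memory. This is an illustrative sketch with hypothetical names; it does not talk to a database and is not part of Asagi.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the per-board images table, keyed by media_hash.
class ImagesUpsertSketch {
    static final class ImageRow {
        final int mediaId;
        String media;
        String previewOp;
        String previewReply;
        int total;
        ImageRow(int mediaId) { this.mediaId = mediaId; }
    }

    private final Map<String, ImageRow> byHash = new HashMap<>();
    private int nextId = 1; // stands in for the auto-increment column

    ImageRow insertImage(String mediaHash, String media, String preview, boolean op) {
        ImageRow row = byHash.get(mediaHash);
        if (row == null) {
            // New hash: plain INSERT with total = 1.
            row = new ImageRow(nextId++);
            row.media = media;
            row.total = 1;
            byHash.put(mediaHash, row);
        } else {
            // Duplicate hash: keep media_id, bump total, COALESCE the media column.
            row.total += 1;
            if (row.media == null) {
                row.media = media;
            }
        }
        // COALESCE on the preview column that matches the post type.
        if (op && row.previewOp == null) {
            row.previewOp = preview;
        } else if (!op && row.previewReply == null) {
            row.previewReply = preview;
        }
        return row;
    }
}
```

In this model the first filename seen for a hash wins, and every later post with the same hash is linked to the same <code>media_id</code>.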
 
== Data path: ==
 
=== 4ch -&gt; decode JSON -&gt; make Post objects ===
 
API data is retrieved from 4ch and put into TopicJson objects, ready for handling the posts in each of those topics.

* (YotsubaJSON.java ln. 89): ''Get 4ch API JSON'' <code>String[] wgetReply = this.wgetText(this.linkThread(threadNum), lastMod);</code>
* (YotsubaJSON.java ln. 97): ''Decode JSON into TopicJson objects'' <code>topicJson = GSON.fromJson(threadText, TopicJson.class);</code>

V

For a single topic, each post is decoded into a Post object, which is then added to the Topic object.

* (YotsubaJSON.java ln. 93): ''Create Topic object'' <code>Topic t = null;</code>
* (YotsubaJSON.java ln. 109): ''OP'' <code>t = this.makeThreadFromJson(pj);</code>
* (YotsubaJSON.java ln. 116): ''Reply'' <code>t.addPost(this.makePostFromJson(pj));</code>
 
V
 
(AbstractDumper.java ln 303): <code>topic = sourceBoard.getThread(newTopic, lastMod);</code>
 
V
 
Image processed somehow?
 
Full image fetching begins
 
* (AbstractDumper.java ln. 161): ''Fullsize media downloader thread'' <code>protected class MediaFetcher implements Runnable {</code>
* (AbstractDumper.java ln. 169): ''Grab the next post from the queue'' <code>mediaPost = mediaUpdates.take();</code>
* (AbstractDumper.java ln. 173): ''Try to handle the media for one post'' <code>mediaLocalBoard.insertMedia(mediaPost, sourceBoard);</code>
* (Local.java ln. 201): ''Handler for a post with media'' <code>public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {</code>
* (Local.java ln. 201): ''Interact with DB for this media'' <code>mediaRow = db.getMedia(h);</code>
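
The <code>mediaUpdates.take()</code> call above suggests a blocking-queue worker: the fetcher thread sleeps until a post with media is queued, then handles it. The following is a minimal sketch of that pattern using strings instead of <code>MediaPost</code> objects. The <code>STOP</code> sentinel and all names are hypothetical additions so the example can terminate; nothing on this page says Asagi's fetcher uses one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical worker loop illustrating the take()-based pattern; not Asagi's code.
class MediaWorkerSketch {
    static final String STOP = "__stop__"; // sentinel that ends the worker in this sketch

    /** Runs one worker thread until STOP is taken; returns the items it handled, in order. */
    static List<String> drain(BlockingQueue<String> mediaUpdates) {
        List<String> handled = Collections.synchronizedList(new ArrayList<String>());
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String mediaPost = mediaUpdates.take(); // blocks until work arrives
                    if (mediaPost.equals(STOP)) {
                        return;
                    }
                    handled.add(mediaPost); // stand-in for insertMedia(...)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("1234567890123.jpg");
        queue.add(STOP);
        System.out.println(drain(queue));
    }
}
```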
 
If there is information for this media in the DB, retrieve it. If any new information exists about this media that is not already in the DB, add that to the DB entry. (SQL.java ln. 289): - Interact with DB for this media <code>public synchronized Media getMedia(MediaPost post) throws ContentGetException, ContentStoreException, DBConnectionException {</code>
 
(SQL.java ln. 342-347): - Decide if media row needs an update
 
<pre>boolean mediaUpdate = media.getMedia() == null;
boolean previewOpUpdate = media.getPreviewOp() == null &amp;&amp; post.isOp();
boolean previewReplyUpdate = media.getPreviewReply() == null &amp;&amp; !post.isOp();
// Update media row in _images table when any of its entries are null and we actually have it
if(mediaUpdate || previewOpUpdate || previewReplyUpdate) {</pre>
(SQL.java ln. 349-373): - Add values to DB for this media
 
<pre>if(mediaUpdate) {
    updateMediaStmt.setString(1, post.getMedia());
    updateMediaStmt.setString(2, post.getMediaHash());
    updateMediaStmt.executeUpdate();
}
if(previewOpUpdate) {
    updatePreviewOpStmt.setString(1, post.getPreview());
    updatePreviewOpStmt.setString(2, post.getMediaHash());
    updatePreviewOpStmt.executeUpdate();
}
if(previewReplyUpdate) {
    updatePreviewReplyStmt.setString(1, post.getPreview());
    updatePreviewReplyStmt.setString(2, post.getMediaHash());
    updatePreviewReplyStmt.executeUpdate();
}
conn.commit();</pre>
(SQL.java ln. 62-66): - Preparing for selecting / updating media
 
<pre>selectMediaStmt = conn.prepareStatement(selectMediaQuery);
updateMediaStmt = conn.prepareStatement(updateMediaQuery);
updatePreviewOpStmt = conn.prepareStatement(updatePreviewOpQuery);
updatePreviewReplyStmt = conn.prepareStatement(updatePreviewReplyQuery);</pre>
(SQL.java ln. 96-103): - SQL for selecting / updating media
 
<pre>this.selectMediaQuery = String.format(&quot;SELECT * FROM \&quot;%s_images\&quot; WHERE media_hash = ?&quot;,
this.table);
this.updateMediaQuery = String.format(&quot;UPDATE \&quot;%s_images\&quot; SET media = ? WHERE media_hash = ?&quot;,
this.table);
this.updatePreviewOpQuery = String.format(&quot;UPDATE \&quot;%s_images\&quot; SET preview_op = ? WHERE media_hash = ?&quot;,
this.table);
this.updatePreviewReplyQuery = String.format(&quot;UPDATE \&quot;%s_images\&quot; SET preview_reply = ? WHERE media_hash = ?&quot;,
this.table);</pre>
== Threads table values: ==
 
==== Files of note: ====
 
<blockquote>SQL.java; YotsubaJSON.java; Topic.java (class definition); boards.sql (ln.52) (table definition); triggers.sql
</blockquote>
==== Functions of note: ====

===== SQL.java: =====

===== YotsubaJSON.java: =====

(ln.217) <code>private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {</code>

===== Topic.java =====

(Ln. 22) <code>public Topic(int num, int omPosts, int omImages) {</code>

Data flow: 4ch -&gt; Asagi -&gt; DB

This seems to be handled by DB triggers (unconfirmed).

== References: ==

Asagi source code used: Bibanon repo 2019-7 retrieved from https://github.com/bibanon/asagi with commit https://github.com/bibanon/asagi/commit/dace6f01664d887f9f60dfdec341e626685b542f

Hayden source code: https://github.com/bbepis/Hayden

Latest revision as of 05:29, 7 April 2020


Compilation and Usage[edit]

Asagi[edit]

https://github.com/eksopl/asagi/wiki/Running-Asagi

Also check FoolFuuka/Install/Ubuntu16#Install_and_compile_Asagi_from_source.

FoolFuuka[edit]

https://blog.foolz.us/

Also check FoolFuuka/Install/Ubuntu16

How Asagi does stuff[edit]

Configuration[edit]

For example here is the config for Desuarchive:

{"settings": {
  "dumperEngine": "DumperJSON",
  "sourceEngine": "YotsubaJSON",

  "boardSettings": {
    "default": {
      "engine": "Mysql",
      "database": "asagi",
      "host": "localhost",
      "username": "asagi",
      "password": "YOUR_PASSWORD_HERE,
      "charset": "utf8mb4",
      "path": "/srv/foolfuuka/boards",
      "updateFileLastModified": false,
      "useOldDirectoryStructure": false,
      "webserverGroup": "www-data",
      "thumbThreads": 2,
      "mediaThreads": 2,
      "newThreadsThreads": 6,
      "deletedThreadsThresholdPage": 8,
      "refreshDelay": 60,
      "throttleAPI": false,
      "throttleURL": "i.4cdn.org",
      "throttleMillisec": 1050,
      "threadRefreshRate": 50
    },

    "mlp": {},
    "qa": {},
    "aco": {},
    "tg": {},
    "d": {},
    "co": {},
    "a": {},
    "an": {},
    "k": {},
    "fit": {},
    "wsg": {"mediaThreads": 0},
    "gif": {"mediaThreads": 0},
    "r9k": {},
    "int": {},
    "c": {},
    "m": {},
    "vr": {},
    "his": {},
    "trash": {},
    "cgl": {},
    "g": {},
    "mu": {}
  }
}}

Misc[edit]

Seperate thread for (thumbs, media, topic-fetch, topic-insert, post-delete) per board

AbstractDumper.java ln. 95: public void initDumper(BoardSettings boardSettings) {

How Asagi decides to update a thread:[edit]

In initDumper(), DumperJSON spawns an instance of its inner class BoardPoller on a thread. In its run() method, BoardPoller loops indefinitely:

  • Wake up from sleeping (duration set by refreshDelay in the configuration)
  • threadList = sourceBoard.getAllThreads(lastMod);
  • If the request 304s or errors, go to sleep
  • Go over the previous threads:
    • If this thread is in the current threads and it’s been modified, mark its modification timestamp and page number. Then, push it to newTopics.
    • If this thread is not in the current threads, it’s been deleted. Push it to newTopics.
  • Put the remaining threads in newTopics
  • Sleep until the delay expires

The queue newTopics is processed by AbstractDumper’s inner class TopicFetcher.

When Asagi does a thread update:[edit]

in : YotsubaJSON.java, ln. 88: public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {

Loads thread JSON Decodes JSON For each post in the decoded thread JSON: Check if resto value is zero, and if so create a new thread from that post, updating lastmodified time to the time from fetching the JSON. t = this.makeThreadFromJson(pj); If resto is zero: Add the post to the current thread. t.addPost(this.makePostFromJson(pj)); (What if two posts were resto==0? We’d break!)

What does Asagi do with a post in a thread?

Relevant files: >YotsubaJSON.java >YotsubaAbstract.java

The entire thread is processed at once.

For OP:

In YotsubaJSON.java, ln. 217: private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {

Ensure the post number is valid (nonzero). Create a new Topic object (ln. 222): Topic t = new Topic(pj.getNo(), pj.getOmittedPosts(), pj.getOmittedImages());

Add the supplied OP to the thread as any other post would be (ln. 224): t.addPost(this.makePostFromJson(pj));

Return the Topic object.

For reply:

In YotsubaJSON.java (ln. 157): private Post makePostFromJson(PostJson pj) throws ContentParseException {

Ensure post number is valid. Ensure time is valid. Create new Post() object.

If the JSON gave a filename not equal to null: Generate filename from JSON values. p.setMediaFilename(pj.getFilename() + pj.getExt());

Generate original filename from JSON values. p.setMediaOrig(pj.getTim() + pj.getExt());

Generate preview original filename from JSON values. p.setPreviewOrig(pj.getTim() + "s.jpg");

Find the post’s capcode and, if it is not null, normalize it: “manager” or “Manager” becomes “G”; anything else becomes its first character in uppercase.

Find the poster hash (the API “id” field, i.e. the poster ID, not the tripcode). If the poster hash is “Developer”, set it to “Dev”.

If the poster country is not null: convert the values “XX” and “A1” to null.
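The three normalizations just described can be collected into one small, runnable sketch. The class and method names (PostFieldSketch, normalizeCapcode, and so on) are made up for this example; the logic mirrors the snippets quoted from YotsubaJSON.java elsewhere on this page.

```java
/** Hypothetical helper collecting the capcode, poster hash and country rules. */
final class PostFieldSketch {
    /** "manager"/"Manager" becomes "G"; any other capcode becomes its first letter, uppercased. */
    static String normalizeCapcode(String capcode) {
        if (capcode == null) {
            return null;
        }
        if (capcode.equals("manager") || capcode.equals("Manager")) {
            return "G";
        }
        return capcode.substring(0, 1).toUpperCase();
    }

    /** The poster ID "Developer" is shortened to "Dev"; everything else passes through. */
    static String normalizePosterHash(String posterHash) {
        if (posterHash != null && posterHash.equals("Developer")) {
            return "Dev";
        }
        return posterHash;
    }

    /** The placeholder country codes "XX" and "A1" are stored as null. */
    static String normalizeCountry(String country) {
        if (country != null && (country.equals("XX") || country.equals("A1"))) {
            return null;
        }
        return country;
    }
}
```

For instance, a capcode of "mod" would be stored as "M" and "admin" as "A", while a country of "XX" ends up as NULL in the database.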

Pass through values from JSON YotsubaJSON.java (ln. 188 -> 212), doing the following conversions:

Thread num to the current thread number: .setThreadNum(pj.getResto() == 0 ? pj.getNo() : pj.getResto());

OP status YotsubaJSON.java (ln. 197): p.setOp(pj.getResto() == 0);

Sanitized title? YotsubaJSON.java (ln. 198): p.setTitle(this.cleanSimple(pj.getSub()));

See YotsubaAbstract.java (ln.83): public String doClean(String text)

Sanitized name YotsubaJSON.java (ln. 200): p.setName(this.cleanSimple(pj.getName()));

Date converted from NYC_TIMEZONE (ln. 202): p.setDate(DateUtils.adjustTimestampEpoch(pj.getTime(), DateUtils.NYC_TIMEZONE));

Sanitized(?) EXIF data, YotsubaJSON.java (ln. 212): p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));

Return the Post object.

How does Asagi handle an image in a thread?

Files of note: >Local.java - Saving image files

Local.java ln.201: public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {

How does Asagi deal with post deletions?

Relevant files: >SQL.java >AbstractDumper.java >Local.java

SQL that handles post deletion logic: SQL.java (ln. 94): this.updateDeletedQuery = String.format("UPDATE \"%s\" SET deleted = ?, timestamp_expired = ? WHERE num = ? AND subnum = ?", this.table);

This is prepared into a statement, SQL.java (ln. 62): updateDeletedStmt = conn.prepareStatement(updateDeletedQuery);

Function that actually writes deleted flag to a post: SQL.java (ln. 267): public synchronized void markDeleted(DeletedPost post) throws ContentStoreException, DBConnectionException {

Class TopicFetcher has a run() function, which calls findDeleted(). See AbstractDumper.java (ln. 268): protected class TopicFetcher implements Runnable {

And AbstractDumper.java (ln. 380): findDeleted(oldTopic, topic, true);

Function findDeleted() checks whether posts in a Topic have been removed. AbstractDumper.java (ln. 103): protected boolean findDeleted(Topic oldTopic, Topic newTopic, boolean markDeleted) {

Function markDeleted() in Local.java calls function markDeleted() from SQL.java: Local.java (ln. 185): public void markDeleted(DeletedPost post) throws ContentStoreException {

SQL.java (ln. 267): public synchronized void markDeleted(DeletedPost post) throws ContentStoreException, DBConnectionException {
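The core idea behind findDeleted() can be sketched as a set difference, assuming (this is an assumption, not read from the source) that it compares the post numbers of the previously stored copy of a thread with those of the freshly fetched copy. DeletionDiffSketch and its method are hypothetical names for this example only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: which posts from the old copy are missing in the new copy? */
final class DeletionDiffSketch {
    static List<Integer> findDeleted(List<Integer> oldPostNums, List<Integer> newPostNums) {
        Set<Integer> stillPresent = new HashSet<>(newPostNums);
        List<Integer> deleted = new ArrayList<>();
        for (Integer num : oldPostNums) {
            // A post we stored earlier that the board no longer returns counts as deleted.
            if (!stillPresent.contains(num)) {
                deleted.add(num);
            }
        }
        return deleted;
    }
}
```

Each number returned this way would correspond to one execution of the prepared UPDATE shown above, with the post number bound to the num parameter.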

How does Asagi interact with the database?

Relevant files: >SQL.java >Mysql.java

Inserting a thread: SQL.java, ln. 196: public synchronized void insert(Topic topic) throws ContentStoreException, DBConnectionException {

Each post in the current Topic object is fed through the insert statement sequentially.

How each value is processed between 4ch and the DB

The 4chan API JSON is broken down to post level (YotsubaJSON.java ln. 97); PostJson objects then appear to be created and populated from the thread JSON data, with values keeping the names from the API (PostJson.java, ln. 4). (Disregard any capitalization of field and column names; it was introduced by Google Docs. Every name is lowercase.) DB column names referenced here (SQL.java ln. 79):

(poster_ip,
num,
subnum,
thread_num,
op,
timestamp,
preview_orig,
preview_w,
preview_h,
media_filename,
media_w,
media_h,
media_size,
media_hash,
media_orig,
spoiler,
deleted,
capcode,
email,
name,
trip,
title,
comment,
delpass,
sticky,
locked,
poster_hash,
poster_country,
exif)

4ch → (Asagi) → DB

“poster_ip”

Presumably always NULL? (PROVEME) It does not appear to ever be initialized to a value anywhere in the Java source code.

“no” -> “num”

Post ID number from 4chan. Passed through as-is.

“subnum”

Ghost post ID number for FoolFuuka. Always zero?

“thread_num”

Thread ID number, always the post ID number (“num”) of the thread’s OP. (YotsubaJSON.java, ln. 196): p.setThreadNum(pj.getResto() == 0 ? pj.getNo() : pj.getResto());

“time” -> “timestamp”

Timestamp of the post. The 4chan API provides it as a UNIX timestamp (seconds since 1 Jan 1970 UTC). Asagi passes it through adjustTimestampEpoch() with NYC_TIMEZONE, so the DB value is probably the epoch shifted into New York local time rather than plain UTC. (YotsubaJSON.java, ln. 202): p.setDate(DateUtils.adjustTimestampEpoch(pj.getTime(), DateUtils.NYC_TIMEZONE));
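To make the timezone adjustment concrete, here is a rough sketch of what a call like adjustTimestampEpoch(time, NYC_TIMEZONE) might amount to. It assumes the function shifts a UTC epoch (in seconds) by the New York UTC offset in effect at that instant; that behaviour is an assumption of this example and has not been checked against DateUtils. The sketch uses java.time, which is not necessarily what Asagi uses, and TimestampSketch is a made-up name.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical sketch of shifting a UTC epoch into New York local time. */
final class TimestampSketch {
    private static final ZoneId NYC = ZoneId.of("America/New_York");

    static long toNewYorkLocalEpoch(long utcEpochSeconds) {
        // Look up the offset (EST or EDT) that applied at this exact instant.
        ZoneOffset offset = NYC.getRules().getOffset(Instant.ofEpochSecond(utcEpochSeconds));
        // EST is -18000 seconds, EDT is -14400 seconds, so the stored value is smaller than UTC.
        return utcEpochSeconds + offset.getTotalSeconds();
    }
}
```

Under this assumption, anything reading the timestamp column has to undo the shift before treating the value as a real UNIX timestamp, and the shift differs between winter and summer.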

“resto” -> (resto == 0) -> “op”

Is this the OP of the thread? (YotsubaJSON.java, ln. 197): p.setOp(pj.getResto() == 0);

“tim” -> (tim + "s.jpg") -> “preview_orig”

(YotsubaJSON.java, ln. 170): p.setPreviewOrig(pj.getTim() + "s.jpg");

“tn_w” -> “preview_w”

Width of media thumbnail. (YotsubaJSON.java, ln. 193): p.setPreviewW(pj.getTnW());

“tn_h” -> “preview_h”

Height of media thumbnail. (YotsubaJSON.java, ln. 170): p.setPreviewH(pj.getTnH());

“filename”, “ext” -> (filename + ext) -> “media_filename”

(YotsubaJSON.java, ln. 168): p.setMediaFilename(pj.getFilename() + pj.getExt());

“w” -> “media_w”

(YotsubaJSON.java, ln. 191): p.setMediaW(pj.getW());

“h” -> “media_h”

(YotsubaJSON.java, ln. 192): p.setMediaH(pj.getH());

“fsize” -> “media_size”

(YotsubaJSON.java, ln. 190): p.setMediaSize(pj.getFsize());

“md5” -> “media_hash”

(YotsubaJSON.java, ln. 189): p.setMediaHash(pj.getMd5());

“tim”, “ext” -> (tim + ext) -> “media_orig”

(YotsubaJSON.java, ln. 169): p.setMediaOrig(pj.getTim() + pj.getExt());
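The three derived filename columns (media_filename, media_orig, preview_orig) can be illustrated together. This is a small sketch with made-up sample values; treating tim as a long is an assumption, and MediaNameSketch is a hypothetical name.

```java
/** Hypothetical sketch of how the three filename columns are derived from API fields. */
final class MediaNameSketch {
    /** media_filename: the uploader's original name plus extension. */
    static String mediaFilename(String filename, String ext) {
        return filename + ext;
    }

    /** media_orig: the 4chan-assigned numeric name ("tim") plus extension. */
    static String mediaOrig(long tim, String ext) {
        return tim + ext;
    }

    /** preview_orig: the thumbnail name, always "tim" followed by "s.jpg". */
    static String previewOrig(long tim) {
        return tim + "s.jpg";
    }
}
```

With the sample values filename = "cat", ext = ".png" and tim = 1546300800123, the three columns would hold "cat.png", "1546300800123.png" and "1546300800123s.jpg".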

“spoiler” -> “spoiler”

(YotsubaJSON.java, ln. 204): p.setSpoiler(pj.isSpoiler());

N/A -> “deleted”

Set initially to false, then updated if the post is absent from the thread during a subsequent update. (YotsubaJSON.java, ln. 205): p.setDeleted(false);

“capcode” -> “capcode”

(YotsubaJSON.java, ln. 173):

String capcode = pj.getCapcode();
if (capcode != null) {
    if (capcode.equals("manager") || capcode.equals("Manager")) {
        capcode = "G";
    } else {
        capcode = capcode.substring(0, 1).toUpperCase();
    }
}

“email” -> “email”

(YotsubaJSON.java, ln. 199): p.setEmail(pj.getEmail());

“name” -> (cleanSimple()) -> “name”

(YotsubaJSON.java, ln. 200): p.setName(this.cleanSimple(pj.getName()));

“trip” -> “trip”

(YotsubaJSON.java, ln. XX): p.setTrip(pj.getTrip());

“sub” -> (cleanSimple()) -> “title”

(YotsubaJSON.java, ln. 198): p.setTitle(this.cleanSimple(pj.getSub()));

“com” -> (doClean()) -> “comment”

(YotsubaJSON.java, ln. 203): p.setComment(this.doClean(pj.getCom()));

N/A -> “delpass”

TODO (YotsubaJSON.java, ln. XX): TODO

“sticky” -> “sticky”

(YotsubaJSON.java, ln. 206): p.setSticky(pj.isSticky());

“closed”, “archived” -> (“closed” AND NOT “archived”) -> “locked”

(YotsubaJSON.java, ln. 207): p.setClosed(pj.isClosed() && !pj.isArchived());

(SQL.java, ln. 78 - 86): this.insertQuery = String.format(

(SQL.java, ln. 236): insertStmt.setBoolean(c++, post.isClosed());

“id” -> “poster_hash”

(YotsubaJSON.java, ln. 182):

String posterHash = pj.getId();
if(posterHash != null && posterHash.equals("Developer")) posterHash = "Dev";

“country” -> “poster_country”

(YotsubaJSON.java, ln. 185-186):

String posterCountry = pj.getCountry();
if(posterCountry != null && (posterCountry.equals("XX") || posterCountry.equals("A1"))) posterCountry = null;

lots -> “exif”

(YotsubaJSON.java, ln. 212): p.setExif(this.cleanSimple(this.parseMeta(pj.getCom(), pj.getUniqueIps(), pj.getSince4pass(), pj.getTrollCountry())));

(YotsubaAbstract.java, ln. 137): public String parseMeta(String text, Integer uniqueIps, Integer since4pass, String trollCountry) {

Images table values:

These seem to be handled by triggers that run on post insert.

4ch -> (Asagi) -> DB

N/A -> (Incremental integer) -> “media_id”

This is simply an autoincrementing integer value. It is set by the DB engine for a new image, and by the trigger for an already-seen image. (Triggers.sql ln. 119-139): if the md5 is already in the DB, media_id = LAST_INSERT_ID(media_id).

md5 -> (N/A) -> “media_hash”

The base64-encoded MD5 hash of the media file as given by 4ch. (SQL.java ln. 77-87), omitted for brevity.
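For reference, a base64-encoded MD5 is always 24 characters long (16 digest bytes, padded), which is why it fits in a 25-character column. The sketch below computes one locally to show the format; Asagi itself takes the value from the API rather than computing it, and MediaHashSketch is a name made up for this example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch: produce a hash in the same format as the API's md5 field. */
final class MediaHashSketch {
    static String base64Md5(byte[] data) {
        try {
            // MD5 yields 16 raw bytes; base64 turns them into 24 characters.
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A check like this could be used to verify a downloaded file against the stored media_hash.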

N/A -> (Local filepath to full image) -> “media”

The relative path to the image on disk. (Triggers.sql ln. 119-139): TODO

N/A -> (Local filepath to OP thumbnail) -> “preview_op”

The relative path to the OP thumbnail on disk. (Triggers.sql ln. 119-139): TODO

N/A -> (Local filepath to reply thumbnail) -> “preview_reply”

The relative path to the reply thumbnail on disk. (Triggers.sql ln. 119-139): TODO

N/A -> (incrementer) -> “total”

The number of posts that refer to this row.

(Triggers.sql ln. 123): INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_op, total)

(Triggers.sql ln. 127): total = (total + 1)

N/A -> (N/A) -> “banned”

Not set by Asagi, but observed to prevent downloading banned files. (Triggers.sql ln. 119-139): TODO

Table definition

(Boards.sql ln. 38-49): - Table definition

CREATE TABLE %%BOARD%%_images (
  media_id SERIAL NOT NULL,
  media_hash character varying(25) NOT NULL,
  media character varying(20),
  preview_op character varying(20),
  preview_reply character varying(20),
  total integer NOT NULL DEFAULT '0',
  banned smallint NOT NULL DEFAULT '0',
  PRIMARY KEY (media_id),
  UNIQUE (media_hash)
);

Image insert procedure

(Triggers.sql ln. 119-139): - Image insert procedure

DROP PROCEDURE IF EXISTS "insert_image_%%BOARD%%";

CREATE PROCEDURE "insert_image_%%BOARD%%"
 (n_media_hash VARCHAR(25), n_media VARCHAR(20), n_preview VARCHAR(20), n_op INT)
BEGIN
  IF n_op = 1 THEN
    INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_op, total)
    VALUES (n_media_hash, n_media, n_preview, 1)
    ON DUPLICATE KEY UPDATE
      media_id = LAST_INSERT_ID(media_id),
      total = (total + 1),
      preview_op = COALESCE(preview_op, VALUES(preview_op)),
      media = COALESCE(media, VALUES(media));
  ELSE
    INSERT INTO "%%BOARD%%_images" (media_hash, media, preview_reply, total)
    VALUES (n_media_hash, n_media, n_preview, 1)
    ON DUPLICATE KEY UPDATE
      media_id = LAST_INSERT_ID(media_id),
      total = (total + 1),
      preview_reply = COALESCE(preview_reply, VALUES(preview_reply)),
      media = COALESCE(media, VALUES(media));
  END IF;
END;
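The effect of this procedure can be imitated in memory to make the upsert semantics easier to follow. The sketch below is only a model of the SQL, with made-up names (ImageUpsertSketch, ImageRow, insertImage) and a HashMap standing in for the unique index on media_hash; it does not talk to a database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the insert_image procedure. */
final class ImageUpsertSketch {
    static final class ImageRow {
        final int mediaId;
        String media;
        String previewOp;
        String previewReply;
        int total;
        ImageRow(int mediaId) { this.mediaId = mediaId; }
    }

    private final Map<String, ImageRow> byHash = new HashMap<>(); // UNIQUE (media_hash)
    private int nextId = 1;                                       // autoincrement media_id

    ImageRow insertImage(String mediaHash, String media, String preview, boolean op) {
        ImageRow row = byHash.get(mediaHash);
        if (row == null) {
            // First sighting of this hash: plain INSERT with total = 1.
            row = new ImageRow(nextId++);
            row.media = media;
            row.total = 1;
            byHash.put(mediaHash, row);
        } else {
            // Duplicate key: keep media_id, bump total, fill media only if still null.
            row.total += 1;
            if (row.media == null) row.media = media;
        }
        // COALESCE(existing, new): never overwrite a preview name that is already set.
        if (op) {
            if (row.previewOp == null) row.previewOp = preview;
        } else {
            if (row.previewReply == null) row.previewReply = preview;
        }
        return row;
    }
}
```

Posting the same file first as a reply and then as an OP leaves a single row with total = 2, the media name from the first post, and both preview columns filled.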

Data path:

4ch -> decode JSON -> make Post objects

API data is retrieved from 4ch and put into TopicJson objects, ready for handling the posts in each of those topics.

(YotsubaJSON.java ln. 89): - Get 4ch API JSON
String[] wgetReply = this.wgetText(this.linkThread(threadNum), lastMod);

(YotsubaJSON.java ln. 97): - Decode JSON into TopicJson objects
topicJson = GSON.fromJson(threadText, TopicJson.class);

V

For a single topic, each post is decoded into a Post object. Post object -> added to Topic object.

(YotsubaJSON.java ln. 93): - Create Topic object
Topic t = null;

(YotsubaJSON.java ln. 109): - OP
t = this.makeThreadFromJson(pj);

(YotsubaJSON.java ln. 116): - Reply
t.addPost(this.makePostFromJson(pj));

V

(AbstractDumper.java ln 303): topic = sourceBoard.getThread(newTopic, lastMod);

V

Image processed somehow?

Full image fetching begins

(AbstractDumper.java ln. 161): - Fullsize media downloader thread
protected class MediaFetcher implements Runnable {

(AbstractDumper.java ln. 169): - Grab from queue
mediaPost = mediaUpdates.take();

(AbstractDumper.java ln. 173): - Try to handle the media for one post
mediaLocalBoard.insertMedia(mediaPost, sourceBoard);

(Local.java ln. 201): - Handler for a post with media
public void insertMedia(MediaPost h, Board source, boolean isPreview) throws ContentGetException, ContentStoreException, CfBicClearParseException {

(Local.java ln. 201): - Interact with DB for this media
mediaRow = db.getMedia(h);
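The take-from-queue pattern used by the media fetcher is a standard blocking producer/consumer. A minimal, self-contained sketch follows; MediaQueueSketch, the string items and the STOP sentinel are inventions of this example (the real fetcher presumably loops for the lifetime of the process and handles MediaPost objects).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of a fetcher thread draining a blocking queue. */
final class MediaQueueSketch {
    private static final String STOP = "<stop>"; // sentinel so the example can terminate

    static List<String> drain(List<String> mediaNames) {
        BlockingQueue<String> mediaUpdates = new LinkedBlockingQueue<>();
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        Thread fetcher = new Thread(() -> {
            try {
                while (true) {
                    String item = mediaUpdates.take(); // blocks until work is available
                    if (STOP.equals(item)) {
                        return;
                    }
                    handled.add(item); // stand-in for downloading and storing the file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();
        try {
            for (String name : mediaNames) {
                mediaUpdates.put(name);
            }
            mediaUpdates.put(STOP);
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for fetcher", e);
        }
        return new ArrayList<>(handled);
    }
}
```

Because take() blocks, an idle fetcher costs nothing until the topic fetcher queues a post with media.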

If there is information for this media in the DB, retrieve it. If any new information exists about this media that is not already in the DB, add that to the DB entry.

(SQL.java ln. 289): - Interact with DB for this media
public synchronized Media getMedia(MediaPost post) throws ContentGetException, ContentStoreException, DBConnectionException {

(SQL.java ln. 342-347): - Decide if media row needs an update

boolean mediaUpdate = media.getMedia() == null;
boolean previewOpUpdate = media.getPreviewOp() == null && post.isOp();
boolean previewReplyUpdate = media.getPreviewReply() == null && !post.isOp();
// Update media row in _images table when any of its entries are null and we actually have it
if(mediaUpdate || previewOpUpdate || previewReplyUpdate) {

(SQL.java ln. 349-373): - Add values to DB for this media

if(mediaUpdate) {
    updateMediaStmt.setString(1, post.getMedia());
    updateMediaStmt.setString(2, post.getMediaHash());
    updateMediaStmt.executeUpdate();
}
if(previewOpUpdate) {
    updatePreviewOpStmt.setString(1, post.getPreview());
    updatePreviewOpStmt.setString(2, post.getMediaHash());
    updatePreviewOpStmt.executeUpdate();
}
if(previewReplyUpdate) {
    updatePreviewReplyStmt.setString(1, post.getPreview());
    updatePreviewReplyStmt.setString(2, post.getMediaHash());
    updatePreviewReplyStmt.executeUpdate();
}
conn.commit();

(SQL.java ln. 62-66): - Preparing for selecting / updating media

selectMediaStmt = conn.prepareStatement(selectMediaQuery);
updateMediaStmt = conn.prepareStatement(updateMediaQuery);
updatePreviewOpStmt = conn.prepareStatement(updatePreviewOpQuery);
updatePreviewReplyStmt = conn.prepareStatement(updatePreviewReplyQuery);

(SQL.java ln. 96-103): - SQL for selecting / updating media

this.selectMediaQuery = String.format("SELECT * FROM \"%s_images\" WHERE media_hash = ?",
this.table);
this.updateMediaQuery = String.format("UPDATE \"%s_images\" SET media = ? WHERE media_hash = ?",
this.table);
this.updatePreviewOpQuery = String.format("UPDATE \"%s_images\" SET preview_op = ? WHERE media_hash = ?",
this.table);
this.updatePreviewReplyQuery = String.format("UPDATE \"%s_images\" SET preview_reply = ? WHERE media_hash = ?",
this.table);

Threads table values:

Files of note:

  • SQL.java
  • YotsubaJSON.java
  • Topic.java - Class definition
  • boards.sql (ln. 52) - Table definition
  • triggers.sql

Functions of note:

SQL.java:

YotsubaJSON.java:

(ln. 217) private Topic makeThreadFromJson(PostJson pj) throws ContentParseException {

Topic.java

(ln. 22) public Topic(int num, int omPosts, int omImages) {

Data flow: 4ch -> Asagi -> DB

Seems to be handled by DB triggers?

References:

Asagi source code used: Bibanon repo 2019-7 retrieved from https://github.com/bibanon/asagi with commit https://github.com/bibanon/asagi/commit/dace6f01664d887f9f60dfdec341e626685b542f

Hayden source code : https://github.com/bbepis/Hayden