Difference between revisions of "FoolFuuka/Asagi"

Revision as of 04:32, 20 August 2019

Fuuka Imageboard Archival Standard (Fuuka)

Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper.

https://github.com/eksopl/fuuka/wiki/Sphinx-Search-Backend#gory-details

Asagi Imageboard Archival Standard (Asagi)

The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe.

Three versions can be identified:

  • Mark I (2009) - Produced for Foolz.us. Possibly still in use by Nyafuu; formerly used by Loveisover.
  • Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co.
  • Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements.

Reference Implementation

  • Frontend: https://github.com/pleebe/FoolFuuka/tree/experimental

New Implementation

Proposed; still needs to be constructed.

  • New Frontend: a Python-based, 4chan-API-compatible middleware

Compilation and Usage

https://github.com/eksopl/asagi/wiki/Running-Asagi

Also check FoolFuuka/Install/Ubuntu16#Install_and_compile_Asagi_from_source.

How Asagi does stuff

When Asagi does a thread update, in YotsubaJSON.java, line 88:

`public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {`

it loads the thread JSON, decodes it, and then, for each post in the decoded thread JSON:

  • If the post's resto value is zero, it is the OP: a new thread is created from it, with the last-modified time set to the time the JSON was fetched. `t = this.makeThreadFromJson(pj);`
  • If resto is nonzero, the post is added to the current thread. `t.addPost(this.makePostFromJson(pj));` (What if two posts had resto == 0? We'd break!)
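The update loop can be sketched in Python (a minimal sketch: the function name and dict layout are illustrative, not Asagi's actual API):

```python
def make_thread_from_json(posts):
    # The post with resto == 0 is the OP and seeds the thread; every
    # other post is appended to it. 4chan's API lists the OP first.
    thread = None
    for pj in posts:
        if pj["resto"] == 0:
            if thread is not None:
                # Two resto == 0 posts in one thread would break things.
                raise ValueError("second OP in thread JSON")
            thread = {"op": pj, "posts": []}
        else:
            thread["posts"].append(pj)
    return thread
```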

Files

The first 4chan timestamp filename seen for an image is recorded in the SQL database, with its MD5 sum as the unique key.

All future images with that MD5 sum are then linked to that timestamp filename.

The directory format uses nested subfolders derived from the first few digits of the filename, to cut down on the number of files in a single directory (which overloads the filesystem):

1234/56/123456789000.jpg
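The dedup and sharding described above can be sketched as follows (function names are illustrative, not Asagi's):

```python
import hashlib

media_index = {}  # MD5 hex digest -> first timestamp filename seen

def media_filename(data, timestamp_name):
    # Record the first timestamp filename seen for this MD5; later
    # uploads with the same MD5 link back to the original filename.
    md5 = hashlib.md5(data).hexdigest()
    return media_index.setdefault(md5, timestamp_name)

def media_path(filename):
    # Nested subfolders from the first digits of the timestamp filename:
    # 123456789000.jpg -> 1234/56/123456789000.jpg
    return f"{filename[:4]}/{filename[4:6]}/{filename}"
```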

Time

US Eastern time is used, a holdover from the original HTML scraping.

MySQL Schema

See the Asagi source code.

FoolFuuka also adds its own tables, so don't forget them when building.

API Schema

Ayase Imageboard Archival Standard (Ayase)

The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever-growing operations of Desuarchive and RebeccaBlackTech.

Reference Implementation

  • Operating System: CentOS/RHEL 8
  • Database: PostgreSQL
  • Scraper: Ena or Hydrus (.NET C#)
  • Middleware: Ayase (Python PyPy)
  • Frontends: 4chan X, Clover, iPhone app

Specifications

Files

  • All files are to be named by SHA-256 sum and file extension. This was chosen for the broad availability of hardware extensions for the purpose and its use by 8chan/vichan.
  • They are to be stored in double nested folders.
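A minimal sketch of the naming scheme above; the exact digit counts used for the double nesting are an assumption, as the spec does not state them:

```python
import hashlib

def ayase_media_path(data, ext):
    # Name by SHA-256 digest plus extension; shard into double-nested
    # folders using the leading hex digits (digit counts assumed).
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}/{digest}.{ext}"
```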

Time

  • Ayase requires time to be stored in PostgreSQL datetimes, which also store timezones.
  • Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store in other timezones.
  • The timezone support is only meant for compatibility with prior Asagi data, which stores time as US time (likely Eastern) due to its past HTML scraping. Future scrapers are strongly advised not to replicate this behavior; conversion to local time should be left to the frontend.
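Converting a legacy timestamp on import might look like this (a sketch; it assumes the legacy data is naive US Eastern, per the note above):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Tag a naive legacy Asagi timestamp as US Eastern, then convert to UTC
# for storage in a PostgreSQL timestamptz column.
legacy_naive = datetime(2019, 8, 20, 4, 32)
legacy = legacy_naive.replace(tzinfo=ZoneInfo("America/New_York"))
utc = legacy.astimezone(timezone.utc)
print(utc.isoformat())  # 2019-08-20T08:32:00+00:00 (EDT is UTC-4)
```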

PostgreSQL Schema

If we GET JSON from the 4chan API and always serve the same JSON to the user, why deconstruct and reconstruct it into post-focused SQL records every time?
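The pass-through idea behind that question can be sketched as follows (a hypothetical in-memory store standing in for a jsonb column; names are illustrative):

```python
import json

archive = {}  # (board, thread number) -> raw API JSON

def store_thread(board, num, raw_json):
    json.loads(raw_json)  # validate only; do not deconstruct into rows
    archive[(board, num)] = raw_json

def serve_thread(board, num):
    # Served byte-for-byte as fetched, so it stays 4chan-API compatible.
    return archive[(board, num)]
```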

Elasticsearch Engine

A separate Elasticsearch engine, kept in sync with but independent from the SQL server, will replace Sphinxsearch, which queries the MySQL database.