Difference between revisions of "FoolFuuka/Asagi"

Revision as of 07:23, 19 August 2019

How Asagi does stuff

When Asagi does a thread update:

In YotsubaJSON.java, line 88:

`public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {`

Loads thread JSON

Decodes JSON

For each post in the decoded thread JSON:

Check whether the resto value is zero. If so, create a new thread from that post and set its last-modified time to the time the JSON was fetched. `t = this.makeThreadFromJson(pj);`

If resto is nonzero: add the post to the current thread.

`t.addPost(this.makePostFromJson(pj));` (What if two posts had resto==0? The second one would replace `t` with a new thread, and the posts added so far would be lost. We'd break!)
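The loop above can be sketched as follows. This is a simplified illustration, not Asagi's actual code: `AsagiThreadSketch`, its `PostJson` and `Topic` classes, and `buildThread` are hypothetical stand-ins that model only the post number and the `resto` field. Unlike the loop described, it rejects a second `resto == 0` post (and a reply that arrives before the OP) instead of silently losing posts.

```java
import java.util.ArrayList;
import java.util.List;

final class AsagiThreadSketch {
    // Hypothetical stand-in for a decoded post: only the fields the loop needs.
    static final class PostJson {
        final int no;    // post number
        final int resto; // 0 for the OP, otherwise the number of the thread replied to

        PostJson(int no, int resto) {
            this.no = no;
            this.resto = resto;
        }
    }

    // Hypothetical stand-in for Asagi's Topic: thread number, Last-Modified value,
    // and the numbers of all posts in the thread (OP included).
    static final class Topic {
        final int num;
        final String lastMod;
        final List<Integer> posts = new ArrayList<>();

        Topic(int num, String lastMod) {
            this.num = num;
            this.lastMod = lastMod;
        }
    }

    // resto == 0 starts the thread; every other post is appended to it.
    // A second OP, or a reply before the OP, is treated as an error instead of
    // replacing the thread (or dereferencing a null one).
    static Topic buildThread(List<PostJson> posts, String lastMod) {
        Topic t = null;
        for (PostJson pj : posts) {
            if (pj.resto == 0) {
                if (t != null) {
                    throw new IllegalStateException("second OP in thread: " + pj.no);
                }
                t = new Topic(pj.no, lastMod);
            } else if (t == null) {
                throw new IllegalStateException("reply before OP: " + pj.no);
            }
            t.posts.add(pj.no);
        }
        return t; // null if the JSON contained no posts at all
    }
}
```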

The timezone support is only meant for compatibility with prior Asagi data, since Asagi stores time as US time (possibly Eastern) due to its past HTML scraping. Future scrapers are strongly advised not to replicate this behavior; determining local time should be left to the frontend.

PostgreSQL Schema

If we GET JSON from the 4chan API and always serve the same JSON to the user, why deconstruct it into post-focused SQL records and reconstruct it every time?

Elasticsearch Engine

A separate Elasticsearch engine, kept in sync with but independent from the SQL server, will replace Sphinxsearch, which queries the MySQL database.
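One possible way to keep such an index in sync is to push changed posts through Elasticsearch's `_bulk` endpoint, which accepts newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline. The sketch below only builds that request body and sends nothing; the index name, the field names, the `BulkSync.Post` class, and the `<board>_<num>` document ID scheme are all hypothetical.

```java
import java.util.List;

final class BulkSync {
    // Hypothetical minimal post record; real archive rows have many more columns.
    static final class Post {
        final String board;
        final long num;
        final String comment;

        Post(String board, long num, String comment) {
            this.board = board;
            this.num = num;
            this.comment = comment;
        }
    }

    // Minimal JSON string encoding: quotes, backslashes and control characters are escaped.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // One "index" action line plus one source line per post, each ending in '\n'.
    // A deterministic _id means re-sending a post overwrites the existing document.
    static String bulkBody(String index, List<Post> posts) {
        StringBuilder body = new StringBuilder();
        for (Post p : posts) {
            body.append("{\"index\":{\"_index\":").append(jsonString(index))
                .append(",\"_id\":").append(jsonString(p.board + "_" + p.num))
                .append("}}\n");
            body.append("{\"board\":").append(jsonString(p.board))
                .append(",\"num\":").append(p.num)
                .append(",\"comment\":").append(jsonString(p.comment))
                .append("}\n");
        }
        return body.toString();
    }
}
```

Because every write goes through the same idempotent ID, a sync job can simply re-send any post that changed in SQL without first checking whether it is already indexed.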

Fuuka Imageboard Archival Standard (Fuuka)

Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper and therefore follows the Asagi standard instead.

Asagi Imageboard Archival Standard (Asagi)

The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was built to run the Foolz archiver and has been the engine behind the majority of archivers since the collapse of Archive.moe.

Three versions can be identified:

  • Mark I (2009) - Produced for Foolz.us. Possibly in use by Nyafuu; formerly used by Loveisover.
  • Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co.
  • Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements.

Reference Implementation

  • Operating System: Ubuntu 14.04/16.04 LTS
  • Database: MySQL Compatible utf8mb4
    • Although the Asagi scraper supports PostgreSQL, FoolFuuka does not.
  • Scraper: Asagi (Java)
  • Frontend: FoolFuuka (PHP)
  • PHP Engine: Historically HHVM, PHP 5.x compatible. Desuarchive and 4plebs now use PHP 7.
  • Search: Sphinxsearch

New Implementation

  • Database: MariaDB with Percona's TokuDB storage engine, which brings many unique optimizations.
  • Scraper: H (.NET C#)
  • Frontend: bibanon/foolfuuka
  • New Frontend: (ASP.NET C#)

Specifications

Files

The first 4chan timestamp filename seen for a file appears to be recorded in the SQL database, with its MD5 sum as the unique key.

All later images with that MD5 sum are then linked to that timestamp filename.
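The MD5 deduplication described above could be sketched like this, with an in-memory map standing in for the database table. The Base64 encoding of the raw digest matches the `md5` field of the 4chan API; `MediaIndex`, `md5Base64`, and `register` are hypothetical names, and a real implementation would rely on a unique index in SQL rather than a `HashMap`.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

final class MediaIndex {
    // Hypothetical in-memory stand-in for the images table: MD5 -> first filename seen.
    private final Map<String, String> firstSeen = new HashMap<>();

    // Raw 16-byte MD5 digest, Base64-encoded (24 characters), as in the 4chan API.
    static String md5Base64(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Returns the filename the media is stored under: the first filename
    // registered for a hash wins, and later duplicates are linked to it.
    String register(String md5, String timestampFilename) {
        String existing = firstSeen.putIfAbsent(md5, timestampFilename);
        return existing != null ? existing : timestampFilename;
    }
}
```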

The directory format creates subfolders based on the first few digits of the filename, to cut down on the number of files in a single directory (too many files in one directory would overload the filesystem):

1234/56/123456789000.jpg
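A minimal sketch of this layout, assuming the first subfolder is the first four characters of the filename and the second is the next two, as in the example path. `AsagiPaths.nestedPath` is a hypothetical helper, not Asagi's own code.

```java
final class AsagiPaths {
    // "123456789000.jpg" -> "1234/56/123456789000.jpg"
    // Assumes the filename starts with at least six timestamp digits.
    static String nestedPath(String filename) {
        if (filename.length() < 6) {
            throw new IllegalArgumentException("filename too short: " + filename);
        }
        return filename.substring(0, 4) + "/" + filename.substring(4, 6) + "/" + filename;
    }
}
```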

Time

US Eastern time is used, a legacy of scraping HTML.
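Converting legacy timestamps to and from UTC might look like the sketch below. It assumes that "Eastern" means the `America/New_York` zone and that a legacy value is the Eastern wall-clock time encoded as seconds since the epoch as if it were UTC; both assumptions should be checked against real Asagi data. `AsagiTime`, `toAsagi`, and `fromAsagi` are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class AsagiTime {
    // Assumption: "US Eastern time" is the America/New_York zone, DST included.
    static final ZoneId EASTERN = ZoneId.of("America/New_York");

    // Real instant -> legacy value: the Eastern wall-clock time,
    // written out as epoch seconds as if that wall clock were UTC.
    static long toAsagi(Instant instant) {
        LocalDateTime wall = LocalDateTime.ofInstant(instant, EASTERN);
        return wall.toEpochSecond(ZoneOffset.UTC);
    }

    // Legacy value -> real instant. When clocks fall back, one wall-clock hour
    // occurs twice and atZone() picks the earlier (summer) offset, so values
    // from that hour cannot be recovered unambiguously.
    static Instant fromAsagi(long asagiSeconds) {
        LocalDateTime wall = LocalDateTime.ofEpochSecond(asagiSeconds, 0, ZoneOffset.UTC);
        return wall.atZone(EASTERN).toInstant();
    }
}
```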

MySQL Schema

See the Asagi source code.

FoolFuuka also adds its own tables, so don't forget them when building the database.

API Schema

Ayase Imageboard Archival Standard (Ayase)

The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever growing operations of Desuarchive and RebeccaBlackTech.

Reference Implementation

  • Operating System: CentOS/RHEL 8
  • Database: PostgreSQL
  • Scraper: Ena or H (.NET C#)
  • Middleware: Ayase (Python PyPy)
  • Frontends: 4chan X, Clover, iPhone app

Specifications

Files

  • All files are to be named by their SHA-256 sum and file extension. SHA-256 was chosen for the broad availability of hardware extensions that accelerate it, and for its use by 8chan/vichan.
  • They are to be stored in double nested folders.
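A sketch of SHA-256 naming with double nested folders. The exact nesting scheme is not specified here, so this assumes a hypothetical layout of two-character folders taken from the start of the lowercase hex digest (`ab/cd/<hash>.<ext>`); `AyasePaths` and its methods are illustrative names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AyasePaths {
    // Lowercase hex SHA-256 of the file contents (64 characters).
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // Formatter treats a byte as unsigned here
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Hypothetical layout: <hash[0..2]>/<hash[2..4]>/<hash>.<extension>
    static String storagePath(byte[] data, String extension) {
        String hash = sha256Hex(data);
        return hash.substring(0, 2) + "/" + hash.substring(2, 4) + "/" + hash + "." + extension;
    }
}
```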

Time

  • Ayase requires time to be stored as PostgreSQL `timestamp with time zone` values, which are timezone-aware.
  • Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store in other timezones.
  • The timezone support is only meant for compatibility purposes with prior Asagi data, given that they store time as US time (m