Difference between revisions of "FoolFuuka/Asagi"


Revision as of 07:22, 19 August 2019

How Asagi does stuff

When Asagi does a thread update:

In YotsubaJSON.java, line 88:

`public Topic getThread(int threadNum, String lastMod) throws ContentGetException, ContentParseException, CfBicClearParseException {`

Loads the thread JSON.

Decodes the JSON.

For each post in the decoded thread JSON:

If the post's resto value is zero, it is the opening post: create a new thread from it, setting the last-modified time to the one obtained when fetching the JSON. `t = this.makeThreadFromJson(pj);`

If resto is not zero: add the post to the current thread. `t.addPost(this.makePostFromJson(pj));`

(What if two posts had resto == 0? We’d break!)
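
The loop described above can be sketched in isolation. This is not Asagi's code: `PostJson`, `Topic`, and `assemble` are hypothetical stand-ins, and the explicit check for a second post with resto == 0 is an addition that shows one way the open question could be guarded against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ThreadAssemblySketch {
    // Hypothetical stand-in for a decoded post; only the fields used by the loop are kept.
    static final class PostJson {
        final int no;     // post number
        final int resto;  // 0 for the opening post, otherwise the thread being replied to
        PostJson(int no, int resto) { this.no = no; this.resto = resto; }
    }

    // Hypothetical stand-in for a thread: its number plus the post numbers it holds.
    static final class Topic {
        final int num;
        final List<Integer> postNums = new ArrayList<>();
        Topic(int num) { this.num = num; }
    }

    // Walks the posts in order: resto == 0 starts the thread, anything else is appended.
    static Topic assemble(List<PostJson> posts) {
        Topic t = null;
        for (PostJson pj : posts) {
            if (pj.resto == 0) {
                // Assumption: a second opening post is treated as an error instead of silently replacing t.
                if (t != null) throw new IllegalStateException("second opening post: " + pj.no);
                t = new Topic(pj.no);
                t.postNums.add(pj.no);
            } else {
                if (t == null) throw new IllegalStateException("reply before opening post: " + pj.no);
                t.postNums.add(pj.no);
            }
        }
        if (t == null) throw new IllegalStateException("no opening post in thread JSON");
        return t;
    }

    public static void main(String[] args) {
        Topic t = assemble(Arrays.asList(
                new PostJson(100, 0), new PostJson(101, 100), new PostJson(102, 100)));
        System.out.println("thread " + t.num + " holds " + t.postNums.size() + " posts");
    }
}
```

Failing loudly on a second opening post is only one option; a scraper could equally log the anomaly and keep the first thread.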

Fuuka Imageboard Archival Standard (Fuuka)

Note: Not to be confused with the FoolFuuka frontend, which uses the Asagi scraper and therefore follows the Asagi standard instead.

Asagi Imageboard Archival Standard (Asagi)

The Asagi Imageboard Archival Standard was developed by eksopl of Easymodo and the Foolz team under the direction of woxxy. It was developed to run the Foolz archiver, and has been the engine for the majority of archivers since the collapse of Archive.moe.

Three versions can be identified:

  • Mark I (2009) - Produced for Foolz.us. Possibly still in use by Nyafuu; formerly in use by Loveisover.
  • Mark II (2015) - Produced for Archive.moe. Used by Fireden and arch.b4k.co.
  • Mark III (2019) - The final reference standard codified by the Bibliotheca Anonoma, in preparation for the development of new drop-in replacements.

Reference Implementation

  • Operating System: Ubuntu 14.04/16.04 LTS
  • Database: MySQL Compatible utf8mb4
    • Although the Asagi scraper supports PostgreSQL, FoolFuuka does not.
  • Scraper: Asagi (Java)
  • Frontend: FoolFuuka (PHP)
  • PHP Engine: Historically HHVM, PHP 5.x compatible. Desuarchive and 4plebs now use PHP 7.
  • Search: Sphinxsearch

New Implementation

  • Database: Percona MariaDB TokuDB - Special variant of MariaDB with many unique optimizations.
  • Scraper: H (.NET C#)
  • Frontend: bibanon/foolfuuka
  • New Frontend: (ASP.NET C#)

Specifications

Files

The first 4chan timestamp filename seen for a file is recorded in the SQL database, with its md5sum as the unique key(?).

All future images with that md5sum are then linked to that first timestamp filename.
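
A minimal sketch of that first-seen rule, assuming it works as described. The in-memory map stands in for the SQL table keyed by md5sum, and `MediaIndexSketch` and `resolve` are hypothetical names, not Asagi identifiers.

```java
import java.util.HashMap;
import java.util.Map;

final class MediaIndexSketch {
    // md5sum -> first timestamp filename seen; stands in for the unique-keyed SQL table.
    private final Map<String, String> firstSeenByMd5 = new HashMap<>();

    // Returns the filename every post carrying this md5sum should link to.
    String resolve(String md5, String timestampFilename) {
        // putIfAbsent returns the earlier value if the md5sum was already recorded, else null.
        String existing = firstSeenByMd5.putIfAbsent(md5, timestampFilename);
        return existing != null ? existing : timestampFilename;
    }

    public static void main(String[] args) {
        MediaIndexSketch index = new MediaIndexSketch();
        System.out.println(index.resolve("md5-a", "123456789000.jpg"));
        System.out.println(index.resolve("md5-a", "123456789999.jpg"));
    }
}
```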

The directory layout places files in subfolders named after the first few digits of the filename, which cuts down on the number of files in a single directory (too many overloads the filesystem):

1234/56/123456789000.jpg
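
The example path can be reproduced with a short helper. The 4-digit and 2-digit split is inferred from the example alone, so treat it as an assumption; `AsagiPathSketch` and `relativePath` are hypothetical names.

```java
final class AsagiPathSketch {
    // Builds "<first 4 chars>/<next 2 chars>/<filename>" from a timestamp filename.
    static String relativePath(String filename) {
        if (filename.length() < 6) {
            throw new IllegalArgumentException("filename too short to shard: " + filename);
        }
        return filename.substring(0, 4) + "/" + filename.substring(4, 6) + "/" + filename;
    }

    public static void main(String[] args) {
        System.out.println(relativePath("123456789000.jpg"));
    }
}
```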

Time

Timestamps are stored in US Eastern Time, a consequence of the original HTML scraping.
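
For anyone reading such values back, a conversion to UTC might look like the sketch below. It assumes the stored number is Eastern wall-clock time written as if it were epoch seconds; check the actual data before relying on that. `EasternTimeSketch` and `toUtc` are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class EasternTimeSketch {
    static final ZoneId EASTERN = ZoneId.of("America/New_York");

    // Reinterprets a stored value (assumed to be Eastern wall-clock time) as a true UTC instant.
    static Instant toUtc(long storedSeconds) {
        // Step 1: recover the wall-clock reading the stored digits represent.
        LocalDateTime wallClock = LocalDateTime.ofEpochSecond(storedSeconds, 0, ZoneOffset.UTC);
        // Step 2: interpret that reading in Eastern Time; daylight saving is applied by the zone rules.
        return wallClock.atZone(EASTERN).toInstant();
    }

    public static void main(String[] args) {
        long stored = LocalDateTime.of(2019, 1, 15, 12, 0).toEpochSecond(ZoneOffset.UTC);
        System.out.println(toUtc(stored));
    }
}
```

Wall-clock readings inside the autumn daylight-saving overlap are ambiguous; `atZone` resolves them to the earlier offset, so a one-hour error is possible for posts made in that window.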

MySQL Schema

See the Asagi source code.

FoolFuuka also adds tables of its own, so don't forget them when building.

API Schema

Ayase Imageboard Archival Standard (Ayase)

The Ayase Imageboard Archival Standard was produced by the Bibliotheca Anonoma to handle the ever-growing operations of Desuarchive and RebeccaBlackTech.

Reference Implementation

  • Operating System: CentOS/RHEL 8
  • Database: PostgreSQL
  • Scraper: Ena or H (.NET C#)
  • Middleware: Ayase (Python PyPy)
  • Frontends: 4chan X, Clover, iPhone app

Specifications

Files

  • All files are to be named by sha256sum and file extension. SHA-256 was chosen for the broad availability of hardware extensions that accelerate it and for its use by 8chan/vichan.
  • They are to be stored in double nested folders.
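
A sketch of the naming rule using the JDK's `MessageDigest`. The standard only says "double nested folders", so taking the first two pairs of hex characters as folder names is an assumption here, and `AyasePathSketch`, `sha256Hex`, and `relativePath` are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AyasePathSketch {
    // Lowercase hex SHA-256 of the file contents.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: "<hash[0..2]>/<hash[2..4]>/<hash>.<extension>".
    static String relativePath(byte[] data, String extension) {
        String hash = sha256Hex(data);
        return hash.substring(0, 2) + "/" + hash.substring(2, 4) + "/" + hash + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(relativePath(new byte[0], "jpg"));
    }
}
```

Because the name is derived from the contents, identical files map to the same path regardless of which post or board they came from.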

Time

  • Ayase requires time to be stored in PostgreSQL datetimes, which also store timezones.
  • Only UTC should be used as the timezone for newly scraped data. The timezone support is not an excuse to store in other timezones.
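  • The timezone support is only meant for compatibility with prior Asagi data, which stores time as US time (maybe Eastern) due to its past HTML scraping. Future scrapers are strongly advised not to replicate this behavior; local time should be up to the frontend to determine.

PostgreSQL Schema

If we GET JSON from the 4chan API and always serve the same JSON to the user, why deconstruct and reconstruct it into post-focused SQL records every time?

Elasticsearch Engine

A separate Elasticsearch engine, kept in sync with but independent from the SQL server, will replace Sphinxsearch, which queries the MySQL database.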