Self-hosted Bookmark and Archive manager
Bookmark links and edit their metadata (like title, tags, summary) via the web interface.
Archive link content in HTML, PDF or full-page PNG format.
Automatic archival of links to non-HTML content like PDF, JPG, TXT etc.
i.e. bookmarking links to PDF, JPG etc. via the web interface will automatically save those files on the server.
Supports archival of media elements of a web page using third-party download managers.
Directory-based categorization of bookmarks.
Automatic tagging of HTML links.
Automatic summarization of HTML content.
Special readability mode.
Search bookmarks by URL, title, tags or summary.
Supports multiple user accounts.
Supports public and group directories for every user.
Upload any file from the web interface for archiving.
Easy-to-use admin interface for managing multiple users.
Import bookmarks from files in Netscape Bookmark HTML format.
Supports streaming of archived media elements.
Annotation support for both HTML and its readable version.
Annotation support for both archived and uploaded PDF/EPUB files.
Remembers the last read position of HTML (and its readable version), PDF and EPUB.
Rudimentary support for adding custom notes.
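The Netscape Bookmark HTML format mentioned above is the `<DL>`/`<DT>`/`<A HREF=...>` layout that browsers export. As a rough illustration of what an importer has to pull out of such a file, here is a minimal standard-library sketch. It is not Reminiscence's importer: the class and function names are hypothetical, and mapping `<H3>` folders onto a `folder` field is an assumption about how folders could correspond to directories.

```python
from html.parser import HTMLParser


class NetscapeBookmarkParser(HTMLParser):
    """Collect bookmarks from a Netscape Bookmark HTML file (hypothetical helper).

    Each bookmark is a dict with url, title, tags and folder (the innermost
    enclosing <H3> folder name, or None at the top level).
    """

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._folders = []           # stack of currently open folder names
        self._pending_folder = None  # <H3> text waiting for its <DL>
        self._in_h3 = False
        self._current = None         # bookmark whose link text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3":
            self._in_h3 = True
            self._pending_folder = ""
        elif tag == "dl" and self._pending_folder is not None:
            # The <DL> that follows an <H3> holds that folder's entries.
            self._folders.append(self._pending_folder.strip())
            self._pending_folder = None
        elif tag == "a" and attrs.get("href"):
            raw_tags = attrs.get("tags") or ""
            tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
            folder = self._folders[-1] if self._folders else None
            self._current = {"url": attrs["href"], "title": "", "tags": tags, "folder": folder}

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
        elif tag == "dl" and self._folders:
            self._folders.pop()
        elif tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.bookmarks.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._in_h3:
            self._pending_folder += data
        elif self._current is not None:
            self._current["title"] += data


def parse_bookmarks(text):
    """Return the list of bookmark dicts found in a Netscape bookmark export."""
    parser = NetscapeBookmarkParser()
    parser.feed(text)
    parser.close()
    return parser.bookmarks
```

`html.parser` lowercases tag and attribute names, so the upper-case `HREF`/`TAGS` attributes typical of browser exports are matched without extra handling.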
First, make sure that Python 3.5.2+ (recommended version 3.6.5+) is installed on the system, then install the following packages using the native package manager:
1. virtualenv
2. wkhtmltopdf (for html to pdf/png conversion)
3. redis-server (optional)
4. chromium (optional from v0.2+)
Installing the above dependencies on Arch or Arch-based distros:
$ sudo pacman -S python-virtualenv wkhtmltopdf redis chromium
Installing the above dependencies on Debian or Ubuntu-based distros:
$ sudo apt install virtualenv wkhtmltopdf redis-server chromium-browser
Note: Package names may differ depending on the distro or OS, so install accordingly. Once the above dependencies are installed, execute the following commands, which are distro/platform independent:
$ mkdir reminiscence
$ cd reminiscence
$ virtualenv -p python3 venv
$ source venv/bin/activate
$ cd venv
$ git clone https://github.com/kanishka-linux/reminiscence.git
$ cd reminiscence
$ pip install -r requirements.txt
$ mkdir logs archive tmp
$ python manage.py generatesecretkey
$ python manage.py nltkdownload
$ python manage.py migrate
$ python manage.py createsuperuser
$ python manage.py runserver 127.0.0.1:8000
Open 127.0.0.1:8000 in any browser, log in and start adding links.
**Note:** to access the web interface from anywhere on the local network, replace the localhost address (in the runserver command and in the browser) with the local IP address of your server.
Admin interface available at: /admin/
Generating PDFs and PNGs is resource-intensive and time-consuming. These tasks can be delegated to Celery so that they execute in the background.
Edit the reminiscence/settings.py file and set USE_CELERY = True.
Now open another terminal in the same topmost project directory and execute the following commands:
$ cd venv
$ source bin/activate
$ cd reminiscence
$ celery -A reminiscence worker --loglevel=info
Launch redis-server from another terminal:
$ redis-server
Using docker is more convenient than the normal installation method described above. It takes care of configuring and setting up gunicorn, nginx and a PostgreSQL database. (Setting up and running these three things manually can be a bit cumbersome; this is described below in a separate section.) It also automatically downloads the headless version of wkhtmltopdf from the official GitHub repository (since many distros do not package wkhtmltopdf with the headless feature) and the NLTK data set, in addition to installing the Python dependencies.
Install docker and docker-compose.
Enable/start the docker service. Instructions for enabling docker may differ between distros; on a systemd-based distro, for example:
$ sudo systemctl enable docker.service
$ sudo systemctl start docker.service
Clone the GitHub repository and enter the directory:
$ git clone https://github.com/kanishka-linux/reminiscence.git
$ cd reminiscence
Build and start:
$ sudo docker-compose up --build
Note: The above step will take some time when executed for the first time.
This step also creates a default user 'admin' with the default password 'changepassword'.
If the IP address of the server is '192.168.1.2', the admin interface will be available at
192.168.1.2/admin/
Note: In this method, there is no need to
append a port number to the IP address.
Change the default admin password from the admin interface and create a new regular user. After that, log out and open '192.168.1.2'. Now log in as the regular user for regular activity.
For custom configuration, modify nginx.conf and the Dockerfiles available in the repository, then run the build and start step (sudo docker-compose up --build) again.
Note: Windows users facing problems mounting the data volume for Postgres are advised to refer to this issue.
Note: Ubuntu 16.04 users might have to modify the docker-compose.yml file and change version 3 to 2 (see the related issue).
Note: For setting up Celery inside docker, follow these instructions. Sometimes gunicorn doesn't work properly with the default background task handler inside docker; in such cases users can enable Celery.
Creating Directory
Users first have to create a directory from the web interface.
Note: Currently '/' and a few other special characters are not allowed in directory names. If users face problems accessing a directory, they are advised to rename it and remove the special characters.
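A client-side check of the kind the note above implies can be sketched as follows. The README only confirms that '/' is rejected; every other character in `FORBIDDEN_CHARS`, and both helper names, are hypothetical illustrations rather than the application's actual rule set.

```python
# Hypothetical rule set: only "/" is confirmed by the README;
# the remaining characters are illustrative assumptions.
FORBIDDEN_CHARS = frozenset("/\\?#%")


def is_valid_directory_name(name):
    """True if the name is non-blank and contains none of FORBIDDEN_CHARS."""
    stripped = name.strip()
    return bool(stripped) and not any(ch in FORBIDDEN_CHARS for ch in stripped)


def suggest_directory_name(name, replacement="-"):
    """Suggest a rename by replacing forbidden characters (hypothetical helper)."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name).strip()
```

For example, `suggest_directory_name("news/tech")` proposes `news-tech`, which is the kind of rename the note recommends.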

Adding Links
Users have to navigate to the required directory and add links to it. URLs are initially fetched asynchronously from the source to gather metadata. Users have to wait a few seconds, after which the page refreshes automatically to show the new content. If nothing shows up after the automatic refresh (e.g. due to slow URL fetching), try refreshing the page manually by clicking on the directory entry again. In the future, I may look into Django Channels and WebSockets to enable real-time duplex communication between client and server.
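The asynchronous metadata fetch described above can be illustrated with a small standard-library sketch that downloads several pages concurrently and pulls out each `<title>`. This is not how Reminiscence itself is implemented; the function names are hypothetical, and limiting the "metadata" to the page title is a simplification.

```python
import asyncio
import urllib.request
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Capture the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> Optional[str]:
    """Return the whitespace-normalized page title, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if not parser.title:
        return None
    return " ".join(parser.title.split()) or None


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Blocking download of a page; run in a worker thread by fetch_titles."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


async def fetch_titles(urls):
    """Fetch several URLs concurrently and map each one to its title (or None)."""
    loop = asyncio.get_running_loop()
    pages = await asyncio.gather(
        *(loop.run_in_executor(None, fetch_html, url) for url in urls),
        return_exceptions=True,  # a failed fetch yields an exception object, not a crash
    )
    return {
        url: extract_title(page) if isinstance(page, str) else None
        for url, page in zip(urls, pages)
    }
```

Usage: `asyncio.run(fetch_titles(["https://example.org/"]))`. A slow URL only delays its own entry, which mirrors why the page may need a manual refresh when fetching takes longer than the automatic refresh interval.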

Automatic tagging and summarization have been implemented using the NLTK library, which is used for proper tokenization and for removing stopwords from sentences. Once stopwords are removed, the top K highest-frequency words (where the value of K is decided by the user) are used as tags. To generate a summary of HTML content, each sentence is allotted a score based on the frequency of the non-stopwords it contains. The highest-scoring sentences (forming 1/3 of the total content) are then used to generate the summary. This is one of the simplest methods for automatic tagging and summarization, and hence not perfect. It can't tag groups of meaningful words, e.g. it will not consider 'data structure' as a single tag. Supporting multi-word tags is on the project's TODO list.
For summarization, there are many more advanced methods which may give better results; users can find them in this paper. Both features need to be activated from the Settings box; they are off by default.
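The frequency-based tagging and summarization described above can be sketched in a few lines. This is an illustration, not the project's code: to stay dependency-free it replaces NLTK's tokenizer and stopword list with a regex and a small hand-written stopword set, and it interprets "1/3 of the total content" as one third of the sentences, which is an assumption.

```python
import re
from collections import Counter

# Small illustrative stopword set; the application uses NLTK's list instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "in", "is", "it", "its", "of", "on", "or", "that", "the", "this", "to",
    "was", "were", "will", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def auto_tags(text, k=5):
    """Return the k most frequent non-stopwords as tags."""
    return [word for word, _ in Counter(content_words(text)).most_common(k)]


def summarize(text, ratio=1 / 3):
    """Keep the highest-scoring sentences (about `ratio` of them) in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    # A sentence's score is the sum of document frequencies of its non-stopwords.
    scores = [sum(freq[w] for w in content_words(s)) for s in sentences]
    n_keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return " ".join(sentences[i] for i in keep)
```

Because tags are single tokens, a phrase like "data structure" is counted as two unrelated words, which is exactly the multi-word limitation mentioned above.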

When a user opens a link using the inbuilt reader, the application tries to present the text content, properly formatted for mobile devices whenever possible. In reader mode, users will also find Original, PDF and PNG options in the top header. These options are available only when the user has archived the link in those formats. Options for selecting archive file formats are available in every user's Settings box. If the Original format is selected, users can see the text content along with the original stylesheet and linked images. JavaScript is removed from the Original format for security reasons. If a page can't be displayed without JavaScript, users have to depend on either the PDF or the full-page PNG format.
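Removing JavaScript from an archived page, as described above, roughly means dropping `<script>` elements, inline event handlers and `javascript:` URLs. The sketch below does that with the standard-library parser; it is not the application's sanitizer, and the names are hypothetical. It also drops HTML comments and is deliberately minimal, so a production sanitizer would need to cover more cases.

```python
from html import escape
from html.parser import HTMLParser

URL_ATTRS = {"href", "src", "action", "formaction"}


class ScriptStripper(HTMLParser):
    """Re-serialize HTML without <script> elements, on* handlers or javascript: URLs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._in_script = False
        self._in_style = False  # CSS text must be emitted unescaped

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # inline event handler such as onclick
            if name in URL_ATTRS and value and value.strip().lower().startswith("javascript:"):
                continue
            kept.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        return "".join(" " + a for a in kept)

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif not self._in_script:
            self._in_style = tag == "style"
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and not self._in_script:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        elif not self._in_script:
            if tag == "style":
                self._in_style = False
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._in_script:
            self.out.append(data if self._in_style else escape(data, quote=False))


def strip_javascript(html):
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Stylesheets and image links survive, which matches the Original format keeping the page's look while losing its scripts.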

PDFs and full-page PNG screenshots of HTML pages are generated using wkhtmltopdf. It is a headless tool, but some distros might not package it with the headless feature. In such cases, users have to run it using Xvfb: set USE_XVFB = True in the reminiscence/settings.py file and then install xvfb using the command line.
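How the USE_XVFB switch changes the conversion command can be sketched as an argv builder. This is an assumption-laden illustration, not the project's code: the function names are hypothetical, and using `wkhtmltoimage` (the image tool that is typically installed alongside `wkhtmltopdf`) for PNG output is an assumption about how the screenshot is produced.

```python
import subprocess


def wkhtml_command(url, output_path, fmt="pdf", use_xvfb=False):
    """Build argv for archiving url as PDF (wkhtmltopdf) or PNG (wkhtmltoimage)."""
    if fmt == "pdf":
        argv = ["wkhtmltopdf", url, output_path]
    elif fmt == "png":
        # Assumption: wkhtmltoimage is installed alongside wkhtmltopdf.
        argv = ["wkhtmltoimage", "--format", "png", url, output_path]
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    if use_xvfb:
        # xvfb-run provides a virtual X display for non-headless builds;
        # --auto-servernum picks a free display number.
        argv = ["xvfb-run", "--auto-servernum"] + argv
    return argv


def archive(url, output_path, fmt="pdf", use_xvfb=False, timeout=120):
    """Run the conversion; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(wkhtml_command(url, output_path, fmt, use_xvfb), check=True, timeout=timeout)
```

Producing both a PDF and a PNG this way takes two separate processes, a cost that comes up again in the comparison with hlspy.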
Note: Use Xvfb only when wkhtmltopdf is not packaged with the headless feature.
Note: Alternatively, users can download the official headless wkhtmltopdf for their respective distro/OS from here. The only drawback is that users will have to update the package manually for every new release.
Why not use Headless Chromium?
Currently headless Chromium doesn't support full-page screenshots; otherwise I would have used it without hesitation. There is another headless browser, hlspy, based on QtWebEngine, which I built for my personal use. hlspy can generate the entire HTML content, a PDF document and a full-page screenshot in one single request, using just one process. With both Chromium and wkhtmltopdf, one has to execute at least two separate processes to do the same thing. The main problem with hlspy is that it is not completely headless: it can't run without X and requires Xvfb for running in a headless environment.
In the future, I'll try to provide a way to choose between different backends (i.e. Chromium, wkhtmltopdf or hlspy) for performing these tasks.
Note: From v0.2 onwards, support for headless Chromium has been added for generating HTML and PDF content. Users can use this feature if the default archived content has some discrepancies. Chromium must be installed to use this feature.
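Generating HTML and PDF with headless Chromium can be illustrated with its standard command-line switches (`--headless`, `--dump-dom`, `--print-to-pdf`). The sketch below is a hypothetical helper, not the project's integration; it probes both binary names used in the install commands above, since the executable is `chromium` on Arch and `chromium-browser` on Ubuntu.

```python
import shutil
import subprocess


def find_chromium():
    """Return the first Chromium executable name found on PATH, or None."""
    for name in ("chromium", "chromium-browser"):
        if shutil.which(name):
            return name
    return None


def chromium_command(url, binary="chromium", pdf_path=None):
    """argv to dump the rendered DOM (pdf_path=None) or print the page to a PDF file."""
    argv = [binary, "--headless", "--disable-gpu"]
    if pdf_path is None:
        argv.append("--dump-dom")  # serialized DOM after scripts run, written to stdout
    else:
        argv.append(f"--print-to-pdf={pdf_path}")
    argv.append(url)
    return argv


def dump_html(url, binary="chromium", timeout=60):
    """Return the rendered HTML of url; raises on a non-zero exit or timeout."""
    result = subprocess.run(
        chromium_command(url, binary),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

Here too, HTML and PDF need two separate Chromium runs.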
Note: This feature is available from v0.2 onwards.
In the settings.py file, add your favourite download manager to the DOWNLOAD_MANAGERS_ALLOWED list. The defaults are curl and wget. For the docker-based method, users have to make the corresponding changes in the dockersettings.py file. For large arbitrary files with direct download links, curl and wget are good enough. For complex use cases, users will need something like youtube-dl, which they have to install and manage on their own, and which must be added to the DOWNLOAD_MANAGERS_ALLOWED list.
Open the settings box in the web interface and add a command to the Download Manager field:
ex: wget {iurl} -O {output}
iurl -> input url
output -> output path
OR
ex: youtube-dl {iurl} -o {output}
Users should not substitute anything for the {iurl} and {output} fields; they should be kept as they are. In short, users should just write the regular command with its parameters and leave the {iurl} and {output} fields untouched. (Note: do not even remove the curly brackets.)
The Reminiscence server takes care of filling in the input URL, i.e. {iurl}, and the output path, i.e. {output}.
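One way a server could fill in `{iurl}` and `{output}` safely is sketched below. It is a hypothetical helper, not Reminiscence's code: it checks the program against an allow-list modelled on `DOWNLOAD_MANAGERS_ALLOWED` (using the curl/wget defaults named above), then substitutes the placeholders per token *after* splitting. That way a URL containing spaces or quotes stays a single argument instead of injecting new ones.

```python
import shlex

# Defaults named in the README; add e.g. "youtube-dl" when you install it.
DOWNLOAD_MANAGERS_ALLOWED = ["curl", "wget"]


def build_download_command(template, iurl, output, allowed=DOWNLOAD_MANAGERS_ALLOWED):
    """Fill {iurl}/{output} in a Download Manager template and return argv (hypothetical)."""
    argv = shlex.split(template)
    if not argv or argv[0] not in allowed:
        program = argv[0] if argv else template
        raise ValueError(f"download manager not allowed: {program!r}")
    # Substitute inside each token, so the URL can never become extra arguments.
    return [tok.replace("{iurl}", iurl).replace("{output}", output) for tok in argv]
```

The resulting list can be passed directly to `subprocess.run` without `shell=True`.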
If youtube-dl is used as the download manager, it is advisable to install ffmpeg along with it. In this case, users have to take care of regularly updating youtube-dl on their own. In the docker-based installation, users have to add installation instructions for ffmpeg to the Dockerfile, and then modify requirements.txt to add youtube_dl as a dependency.
The settings box in the web interface also contains a streaming option. If this option is enabled, HTML5-compliant media files can be played inside the browser; otherwise they will be available for download.
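The streaming toggle amounts to choosing between serving a file inline (so a `<video>` or `<audio>` element can play it) and serving it as a download. A minimal sketch of that decision follows. The helper name is hypothetical, and the set of browser-playable types is an assumption to adjust to whatever formats you actually archive.

```python
import mimetypes

# Assumption: media types that common browsers play natively in <video>/<audio>.
BROWSER_PLAYABLE = {"video/mp4", "video/webm", "audio/mpeg", "audio/ogg"}


def content_disposition(filename, streaming_enabled):
    """Return 'inline' to play the file in the browser, or 'attachment' to download it."""
    mime, _ = mimetypes.guess_type(filename)
    if streaming_enabled and mime in BROWSER_PLAYABLE:
        return "inline"
    return "attachment"
```

Non-media archives such as `.zip` files are always offered as downloads, whatever the setting.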