Block access to Gitea's archive feature

Crawlers have been hitting the archive URLs in Gitea, which can result in massive cached archive files filling the disk faster than the daily cron clears them out. This feature is an attractive nuisance anyway for many projects, particularly Python-based source repositories for which users mistakenly assume that a tarball of the worktree is a suitable substitute for an sdist package, which leads to a lot of confusion if build backends like PBR or setuptools-scm are relied on.

Fortunately, Gitea now has a way to turn off this functionality. Add a test to make sure these URLs return a 404 in order to prevent any accidental future regression. Disable the archive cleanup cron as well, since it's just a no-op at this point.

Change-Id: I0912243f40f2101bf1f3133fbf306def10aa5f83

@@ -39,6 +39,7 @@ ROOT = /data/git/repositories
 DISABLED_REPO_UNITS = repo.issues,repo.pulls,repo.wiki,repo.projects,repo.actions
 DISABLE_STARS = true
 DISABLE_MIGRATIONS = true
+DISABLE_DOWNLOAD_SOURCE_ARCHIVES = true

 [git]
 ; Implemented in 1.16 but broke older git clients. Now expected to work
@@ -128,16 +129,15 @@ STORAGE_TYPE = local
 PATH = /data/git/lfs

 ; This is an undocumented gitea cron job that will delete all
-; repo archives once daily at midnight. Repo archives are
+; repo archives periodically if enabled. Repo archives are
 ; tarballs/zips/etc of repository state generate for things like
-; tags. This helps ensure we don't run out of disk.
+; tags. We used to rely on it, but some crawlers are so aggressive
+; they manage to fill up our filesystems between scheduled cleanups
+; so instead we've blocked access to the feature entirely. This
+; defaults to disabled, but keep it explicit in here as a reminder
+; in case we ever revert the change and restore archive access.
 [cron.delete_repo_archives]
-ENABLED = true
-RUN_AT_START = false
-NOTICE_ON_SUCCESS = false
-; Note we run this several hours after 0000 (midnight) to avoid conflict
-; with default cron jobs run by gitea at that time.
-SCHEDULE = 0 0 3 * * *
+ENABLED = false

 ; We don't need gitea phoning out to check versions. We stay on
 ; top of new releases using github release notifications over email.

@@ -71,6 +71,23 @@ def test_proxy_ua_blacklist(host):
                    'https://gitea99.opendev.org:3081/')
     assert '403 Forbidden' in cmd.stdout

+def test_disable_archives(host):
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.bundle')
+    assert cmd.stdout == 'Not Found\n'
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.tar.gz')
+    assert cmd.stdout == 'Not Found\n'
+    cmd = host.run('curl --insecure '
+                   '--resolve gitea99.opendev.org:3081:127.0.0.1 '
+                   'https://gitea99.opendev.org:3081/'
+                   'opendev/system-config/archive/master.zip')
+    assert cmd.stdout == 'Not Found\n'
+
 def test_ondisk_logs(host):
     mariadb_log = host.file('/var/log/containers/docker-mariadb.log')
     assert mariadb_log.exists

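
The new test repeats the same curl invocation for each of the three archive suffixes. As an illustration only, and not part of this change, the same checks could be written as a loop. This is a minimal sketch: the names `ARCHIVE_SUFFIXES`, `archive_curl_command` and `check_archives_disabled` are hypothetical, and `host` is assumed to be any object whose `run()` returns a result with a `stdout` attribute, as the testinfra fixture used in the diff does.

```python
# Hypothetical refactoring sketch; these names are not part of the commit.
ARCHIVE_SUFFIXES = ('bundle', 'tar.gz', 'zip')


def archive_curl_command(suffix, repo='opendev/system-config', ref='master'):
    """Build the same curl command line the test uses for one archive type."""
    return ('curl --insecure '
            '--resolve gitea99.opendev.org:3081:127.0.0.1 '
            'https://gitea99.opendev.org:3081/'
            '%s/archive/%s.%s' % (repo, ref, suffix))


def check_archives_disabled(host):
    """Assert that every archive URL returns the plain 'Not Found' body."""
    for suffix in ARCHIVE_SUFFIXES:
        cmd = host.run(archive_curl_command(suffix))
        # Include the suffix in the failure message to show which URL failed.
        assert cmd.stdout == 'Not Found\n', suffix
```

A test function such as `test_disable_archives(host)` could then simply call `check_archives_disabled(host)`. The explicit form in the commit has the advantage that each URL is visible verbatim when reading the test.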
	 Jeremy Stanley