41 Commits

Author SHA1 Message Date
Stepan Vladovskiy  e382cc1ea5  Merge branch 'dev' into feat/sv-searching-txtai  2025-04-15 19:20:48 -03:00
Stepan Vladovskiy  83d61ca76d  Merge branch 'dev' into feat/sv-searching-txtai  2025-04-13 05:36:18 +00:00
Stepan Vladovskiy  106222b0e0  debug: without debug logging. clean  2025-04-07 11:41:48 -03:00
Stepan Vladovskiy  c533241d1e  fix(reader): sorting by rang not by id in cash  2025-04-03 13:51:13 -03:00
Stepan Vladovskiy  78326047bf  fix(reader.py): change sorting and answer on querys  2025-04-03 13:20:18 -03:00
Stepan Vladovskiy  bc4ec79240  fix(search.py): store all results in cash not only first offset  2025-04-03 13:10:53 -03:00
Stepan Vladovskiy  a0db5707c4  feat: add cash for storing searchresalts and hold them for working pagination. Now we are have offset for use on frontend  2025-04-01 16:01:09 -03:00
Stepan Vladovskiy  ecc443c3ad  refactor(reader.py): Remove the unnecessary topic joins that cause duplicate results  2025-04-01 12:57:46 -03:00
Stepan Vladovskiy  9a02ca74ad  merged with dev  2025-03-31 13:38:32 -03:00
Stepan Vladovskiy  9ebb81cbd3  refactor(reader.py): rm debug line  2025-03-31 13:32:51 -03:00
Stepan Vladovskiy  0bc55977ac  debug(reader.py): query_with_stat(info) always  2025-03-27 15:18:08 -03:00
Stepan Vladovskiy  ff3a4debce  debug(reader.py): trying to handle main topic ids founded  2025-03-27 14:43:17 -03:00
Stepan Vladovskiy  ae85b32f69  feat(type.qraphql): SearchResult with shout id  2025-03-27 14:06:52 -03:00
Stepan Vladovskiy  34a354e9e3  debug(reader.py: trying back shout id in query call  2025-03-27 11:54:56 -03:00
Stepan Vladovskiy  e405fb527b  refactor(search.py): moved to use one table docs for embdings and docs store  2025-03-25 16:42:44 -03:00
Stepan Vladovskiy  7f36f93d92  feat(search.py): detects both missing documents and null embeddings  2025-03-25 15:18:29 -03:00
Stepan Vladovskiy  f089a32394  debug(search.py): with more logs when check sync of indexing  2025-03-25 14:44:05 -03:00
Stepan Vladovskiy  1fd623a660  feat: with index sync endpoints configs  2025-03-25 13:31:45 -03:00
Stepan Vladovskiy  88012f1b8c  debug(server.py): with 4 workers (threds). cheking reindexing  2025-03-25 12:21:59 -03:00
Stepan Vladovskiy  6e284640c0  feat: give little timeout for resource stab  2025-03-24 21:42:51 -03:00
Stepan Vladovskiy  077cb46482  debug: server.py -> threds 1 , search.py -> add 3 times reconect  2025-03-24 20:16:07 -03:00
Stepan Vladovskiy  60a13a9097  refactor(search.py): moved initialization logic in search-txtai instance  2025-03-24 19:47:02 -03:00
Stepan Vladovskiy  316375bf18  debug(search.py): encrease batch size for bulk indexing  2025-03-21 17:56:54 -03:00
Stepan Vladovskiy  fb820f67fd  debug(search.py): encrease batch size for bulk indexing  2025-03-21 17:48:26 -03:00
Stepan Vladovskiy  f1d9f4e036  feat(search.py): with db reset endpoint  2025-03-21 17:28:54 -03:00
Stepan Vladovskiy  ebb67eb311  debug: decrease chars in search.py for bulk indexing  2025-03-21 16:53:00 -03:00
Stepan Vladovskiy  50a8c24ead  feat(search.py): documnet for bulk indexing are categorized  2025-03-21 15:40:29 -03:00
Stepan Vladovskiy  eb4b9363ab  debug: change logs entris and indexing not wraps all in documents  2025-03-21 14:32:45 -03:00
Stepan Vladovskiy  19c5028a0c  debug: Limit max chars for bulk indexing  2025-03-21 14:18:32 -03:00
Stepan Vladovskiy  57e1e8e6bd  debug: more logs in indexing  2025-03-21 14:10:09 -03:00
Stepan Vladovskiy  385057ffcd  debug: with logs in indexing procedure  2025-03-21 13:45:50 -03:00
Stepan Vladovskiy  90699768ff  debug: start index  2025-03-21 13:30:23 -03:00
Stepan Vladovskiy  ad0ca75aa9  debug: no redis for indexing in nackend side  2025-03-19 14:47:31 -03:00
Stepan Vladovskiy  39242d5e6c  debug: add logs in search.py and change and input validation ... index ver too  2025-03-12 14:13:55 -03:00
Stepan Vladovskiy  24cca7f2cb  debug: something wrong one stap back with logs  2025-03-12 13:11:19 -03:00
Stepan Vladovskiy  a9c7ac49d6  feat: with logs >>>  2025-03-12 13:07:27 -03:00
Stepan Vladovskiy  f249752db5  feat: moved txtai and search procedure in different instance  2025-03-12 12:06:09 -03:00
Stepan Vladovskiy  c0b2116da2  feat(db.py): added fetch_all_shouts, to populate the search index  2025-03-05 20:32:34 +00:00
Stepan Vladovskiy  59e71c8144  debug: fixed workflows gitea  2025-03-05 20:17:34 +00:00
Stepan Vladovskiy  e6a416383d  debug: fixed workflows gitea  2025-03-05 20:16:32 +00:00
Stepan Vladovskiy  d55448398d  feat(search.py): change to txtai server, with ai model. And fix granian workers  2025-03-05 20:08:21 +00:00
40 changed files with 1329 additions and 1401 deletions

View File

@@ -29,7 +29,16 @@ jobs:
if: github.ref == 'refs/heads/dev'
uses: dokku/github-action@master
with:
branch: 'dev'
branch: 'main'
force: true
git_remote_url: 'ssh://dokku@v2.discours.io:22/core'
ssh_private_key: ${{ secrets.SSH_PRIVATE_KEY }}
- name: Push to dokku for staging branch
if: github.ref == 'refs/heads/staging'
uses: dokku/github-action@master
with:
branch: 'dev'
git_remote_url: 'ssh://dokku@staging.discours.io:22/core'
ssh_private_key: ${{ secrets.SSH_PRIVATE_KEY }}
git_push_flags: '--force'

.gitignore (5 changed lines)

@@ -128,6 +128,9 @@ dmypy.json
.idea
temp.*
# Debug
DEBUG.log
discours.key
discours.crt
discours.pem
@@ -162,4 +165,4 @@ views.json
*.crt
*cache.json
.cursor
.devcontainer/devcontainer.json
.devcontainer/

View File

@@ -1,14 +1,3 @@
#### [0.4.20] - 2025-05-03
- Исправлена ошибка в классе `CacheRevalidationManager`: добавлена инициализация атрибута `_redis`
- Улучшена обработка соединения с Redis в менеджере ревалидации кэша:
- Автоматическое восстановление соединения в случае его потери
- Проверка соединения перед выполнением операций с кэшем
- Дополнительное логирование для упрощения диагностики проблем
- Исправлен резолвер `unpublish_shout`:
- Корректное формирование синтетического поля `publication` с `published_at: null`
- Возвращение полноценного словаря с данными вместо объекта модели
- Улучшена загрузка связанных данных (авторы, темы) для правильного формирования ответа
#### [0.4.19] - 2025-04-14
- dropped `Shout.description` and `Draft.description` to be UX-generated
- use redis to init views counters after migrator
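
A minimal, runnable sketch of the reconnect-before-use behaviour described in the 0.4.20 entry above (all names here are stand-ins; the real client lives in `services.redis` and the connection check in `CacheRevalidationManager`):

```python
import asyncio


class FakeRedis:
    """Stand-in for the services.redis client; only the attributes used below."""

    def __init__(self) -> None:
        self._client = None  # None means no connection is established yet

    async def connect(self) -> None:
        self._client = object()  # a real implementation would open a Redis connection here


async def ensure_redis(redis: FakeRedis) -> bool:
    """Check the connection before a cache operation and restore it if it was lost."""
    if not redis._client:
        print("Redis connection not established. Waiting for connection...")
        try:
            await redis.connect()
            print("Redis connection established")
        except Exception as e:
            print(f"Failed to connect to Redis: {e}")
            return False
    return True


asyncio.run(ensure_redis(FakeRedis()))
```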

View File

@@ -57,7 +57,7 @@ mkcert localhost
Then, run the server:
```shell
python -m granian main:app --interface asgi --host 0.0.0.0 --port 8000
python server.py dev
```
### Useful Commands

auth/usermodel.py (new file, 119 lines)

@@ -0,0 +1,119 @@
import time
from sqlalchemy import (
JSON,
Boolean,
Column,
DateTime,
ForeignKey,
Integer,
String,
func,
)
from sqlalchemy.orm import relationship
from services.db import Base
class Permission(Base):
__tablename__ = "permission"
id = Column(String, primary_key=True, unique=True, nullable=False, default=None)
resource = Column(String, nullable=False)
operation = Column(String, nullable=False)
class Role(Base):
__tablename__ = "role"
id = Column(String, primary_key=True, unique=True, nullable=False, default=None)
name = Column(String, nullable=False)
permissions = relationship(Permission)
class AuthorizerUser(Base):
__tablename__ = "authorizer_users"
id = Column(String, primary_key=True, unique=True, nullable=False, default=None)
key = Column(String)
email = Column(String, unique=True)
email_verified_at = Column(Integer)
family_name = Column(String)
gender = Column(String)
given_name = Column(String)
is_multi_factor_auth_enabled = Column(Boolean)
middle_name = Column(String)
nickname = Column(String)
password = Column(String)
phone_number = Column(String, unique=True)
phone_number_verified_at = Column(Integer)
# preferred_username = Column(String, nullable=False)
picture = Column(String)
revoked_timestamp = Column(Integer)
roles = Column(String, default="author,reader")
signup_methods = Column(String, default="magic_link_login")
created_at = Column(Integer, default=lambda: int(time.time()))
updated_at = Column(Integer, default=lambda: int(time.time()))
class UserRating(Base):
__tablename__ = "user_rating"
id = None
rater: Column = Column(ForeignKey("user.id"), primary_key=True, index=True)
user: Column = Column(ForeignKey("user.id"), primary_key=True, index=True)
value: Column = Column(Integer)
@staticmethod
def init_table():
pass
class UserRole(Base):
__tablename__ = "user_role"
id = None
user = Column(ForeignKey("user.id"), primary_key=True, index=True)
role = Column(ForeignKey("role.id"), primary_key=True, index=True)
class User(Base):
__tablename__ = "user"
default_user = None
email = Column(String, unique=True, nullable=False, comment="Email")
username = Column(String, nullable=False, comment="Login")
password = Column(String, nullable=True, comment="Password")
bio = Column(String, nullable=True, comment="Bio") # status description
about = Column(String, nullable=True, comment="About") # long and formatted
userpic = Column(String, nullable=True, comment="Userpic")
name = Column(String, nullable=True, comment="Display name")
slug = Column(String, unique=True, comment="User's slug")
links = Column(JSON, nullable=True, comment="Links")
oauth = Column(String, nullable=True)
oid = Column(String, nullable=True)
muted = Column(Boolean, default=False)
confirmed = Column(Boolean, default=False)
created_at = Column(DateTime(timezone=True), nullable=False, server_default=func.now(), comment="Created at")
updated_at = Column(DateTime(timezone=True), nullable=False, server_default=func.now(), comment="Updated at")
last_seen = Column(DateTime(timezone=True), nullable=False, server_default=func.now(), comment="Was online at")
deleted_at = Column(DateTime(timezone=True), nullable=True, comment="Deleted at")
ratings = relationship(UserRating, foreign_keys=UserRating.user)
roles = relationship(lambda: Role, secondary=UserRole.__tablename__)
def get_permission(self):
scope = {}
for role in self.roles:
for p in role.permissions:
if p.resource not in scope:
scope[p.resource] = set()
scope[p.resource].add(p.operation)
print(scope)
return scope
# if __name__ == "__main__":
# print(User.get_permission(user_id=1))

cache/revalidator.py (15 changed lines)

@@ -8,7 +8,6 @@ from cache.cache import (
invalidate_cache_by_prefix,
)
from resolvers.stat import get_with_stat
from services.redis import redis
from utils.logger import root_logger as logger
CACHE_REVALIDATION_INTERVAL = 300 # 5 minutes
@@ -22,19 +21,9 @@ class CacheRevalidationManager:
self.lock = asyncio.Lock()
self.running = True
self.MAX_BATCH_SIZE = 10 # Максимальное количество элементов для поштучной обработки
self._redis = redis # Добавлена инициализация _redis для доступа к Redis-клиенту
async def start(self):
"""Запуск фонового воркера для ревалидации кэша."""
# Проверяем, что у нас есть соединение с Redis
if not self._redis._client:
logger.warning("Redis connection not established. Waiting for connection...")
try:
await self._redis.connect()
logger.info("Redis connection established for revalidation manager")
except Exception as e:
logger.error(f"Failed to connect to Redis: {e}")
self.task = asyncio.create_task(self.revalidate_cache())
async def revalidate_cache(self):
@@ -50,10 +39,6 @@ class CacheRevalidationManager:
async def process_revalidation(self):
"""Обновление кэша для всех сущностей, требующих ревалидации."""
# Проверяем соединение с Redis
if not self._redis._client:
return # Выходим из метода, если не удалось подключиться
async with self.lock:
# Ревалидация кэша авторов
if self.items_to_revalidate["authors"]:

cache/triggers.py (24 changed lines)

@@ -88,11 +88,7 @@ def after_reaction_handler(mapper, connection, target):
with local_session() as session:
shout = (
session.query(Shout)
.filter(
Shout.id == shout_id,
Shout.published_at.is_not(None),
Shout.deleted_at.is_(None),
)
.filter(Shout.id == shout_id, Shout.published_at.is_not(None), Shout.deleted_at.is_(None))
.first()
)
@@ -112,27 +108,15 @@ def events_register():
event.listen(AuthorFollower, "after_insert", after_follower_handler)
event.listen(AuthorFollower, "after_update", after_follower_handler)
event.listen(
AuthorFollower,
"after_delete",
lambda *args: after_follower_handler(*args, is_delete=True),
)
event.listen(AuthorFollower, "after_delete", lambda *args: after_follower_handler(*args, is_delete=True))
event.listen(TopicFollower, "after_insert", after_follower_handler)
event.listen(TopicFollower, "after_update", after_follower_handler)
event.listen(
TopicFollower,
"after_delete",
lambda *args: after_follower_handler(*args, is_delete=True),
)
event.listen(TopicFollower, "after_delete", lambda *args: after_follower_handler(*args, is_delete=True))
event.listen(ShoutReactionsFollower, "after_insert", after_follower_handler)
event.listen(ShoutReactionsFollower, "after_update", after_follower_handler)
event.listen(
ShoutReactionsFollower,
"after_delete",
lambda *args: after_follower_handler(*args, is_delete=True),
)
event.listen(ShoutReactionsFollower, "after_delete", lambda *args: after_follower_handler(*args, is_delete=True))
event.listen(Reaction, "after_update", mark_for_revalidation)
event.listen(Author, "after_update", mark_for_revalidation)

View File

@@ -147,32 +147,16 @@ await invalidate_topics_cache(456)
```python
class CacheRevalidationManager:
def __init__(self, interval=CACHE_REVALIDATION_INTERVAL):
# ...
self._redis = redis # Прямая ссылка на сервис Redis
async def start(self):
# Проверка и установка соединения с Redis
# ...
# ...
async def process_revalidation(self):
# Обработка элементов для ревалидации
# ...
def mark_for_revalidation(self, entity_id, entity_type):
# Добавляет сущность в очередь на ревалидацию
# ...
```
Менеджер ревалидации работает как асинхронный фоновый процесс, который периодически (по умолчанию каждые 5 минут) проверяет наличие сущностей для ревалидации.
**Взаимодействие с Redis:**
- CacheRevalidationManager хранит прямую ссылку на сервис Redis через атрибут `_redis`
- При запуске проверяется наличие соединения с Redis и при необходимости устанавливается новое
- Включена автоматическая проверка соединения перед каждой операцией ревалидации
- Система самостоятельно восстанавливает соединение при его потере
**Особенности реализации:**
Особенности реализации:
- Для авторов и тем используется поштучная ревалидация каждой записи
- Для шаутов и реакций используется батчевая обработка, с порогом в 10 элементов
- При достижении порога система переключается на инвалидацию коллекций вместо поштучной обработки
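
A minimal, runnable sketch of the batching rule described above (assumed names: `revalidate_shout` is a hypothetical per-item helper and `invalidate_shouts_cache` is a stand-in for the `cache.cache` helper; the real logic lives in `CacheRevalidationManager.process_revalidation`):

```python
import asyncio

CACHE_REVALIDATION_INTERVAL = 300  # seconds, matches cache/revalidator.py
MAX_BATCH_SIZE = 10  # threshold before switching from per-item work to collection invalidation


async def invalidate_shouts_cache() -> None:
    # Stand-in for the cache.cache helper of the same name.
    print("invalidating the whole shouts collection")


async def revalidate_shout(shout_id: int) -> None:
    # Hypothetical per-item helper used only for this illustration.
    print(f"revalidating shout {shout_id}")


async def process_shouts(queue: set[int]) -> None:
    # Per-item revalidation below the threshold, collection invalidation above it.
    if len(queue) > MAX_BATCH_SIZE:
        await invalidate_shouts_cache()
    else:
        for shout_id in sorted(queue):
            await revalidate_shout(shout_id)
    queue.clear()


asyncio.run(process_shouts({1, 2, 3}))
```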

main.py (48 changed lines)

@@ -17,7 +17,8 @@ from cache.revalidator import revalidation_manager
from services.exception import ExceptionHandlerMiddleware
from services.redis import redis
from services.schema import create_all_tables, resolvers
from services.search import search_service
#from services.search import search_service
from services.search import search_service, initialize_search_index
from services.viewed import ViewedStorage
from services.webhook import WebhookEndpoint, create_webhook_endpoint
from settings import DEV_SERVER_PID_FILE_NAME, MODE
@@ -34,24 +35,67 @@ async def start():
f.write(str(os.getpid()))
print(f"[main] process started in {MODE} mode")
async def check_search_service():
"""Check if search service is available and log result"""
info = await search_service.info()
if info.get("status") in ["error", "unavailable"]:
print(f"[WARNING] Search service unavailable: {info.get('message', 'unknown reason')}")
else:
print(f"[INFO] Search service is available: {info}")
# indexing DB data
# async def indexing():
# from services.db import fetch_all_shouts
# all_shouts = await fetch_all_shouts()
# await initialize_search_index(all_shouts)
async def lifespan(_app):
try:
print("[lifespan] Starting application initialization")
create_all_tables()
await asyncio.gather(
redis.connect(),
precache_data(),
ViewedStorage.init(),
create_webhook_endpoint(),
search_service.info(),
check_search_service(),
start(),
revalidation_manager.start(),
)
print("[lifespan] Basic initialization complete")
# Add a delay before starting the intensive search indexing
print("[lifespan] Waiting for system stabilization before search indexing...")
await asyncio.sleep(10) # 10-second delay to let the system stabilize
# Start search indexing as a background task with lower priority
asyncio.create_task(initialize_search_index_background())
yield
finally:
print("[lifespan] Shutting down application services")
tasks = [redis.disconnect(), ViewedStorage.stop(), revalidation_manager.stop()]
await asyncio.gather(*tasks, return_exceptions=True)
print("[lifespan] Shutdown complete")
# Initialize search index in the background
async def initialize_search_index_background():
"""Run search indexing as a background task with low priority"""
try:
print("[search] Starting background search indexing process")
from services.db import fetch_all_shouts
# Get total count first (optional)
all_shouts = await fetch_all_shouts()
total_count = len(all_shouts) if all_shouts else 0
print(f"[search] Fetched {total_count} shouts for background indexing")
# Start the indexing process with the fetched shouts
print("[search] Beginning background search index initialization...")
await initialize_search_index(all_shouts)
print("[search] Background search index initialization complete")
except Exception as e:
print(f"[search] Error in background search indexing: {str(e)}")
# Создаем экземпляр GraphQL
graphql_app = GraphQL(schema, debug=True)

View File

@@ -90,6 +90,7 @@ class Author(Base):
Модель автора в системе.
Attributes:
user (str): Идентификатор пользователя в системе авторизации
name (str): Отображаемое имя
slug (str): Уникальный строковый идентификатор
bio (str): Краткая биография/статус
@@ -104,6 +105,8 @@ class Author(Base):
__tablename__ = "author"
user = Column(String) # unbounded link with authorizer's User type
name = Column(String, nullable=True, comment="Display name")
slug = Column(String, unique=True, comment="Author's slug")
bio = Column(String, nullable=True, comment="Bio") # status description
@@ -121,14 +124,12 @@ class Author(Base):
# Определяем индексы
__table_args__ = (
# Индекс для быстрого поиска по имени
Index("idx_author_name", "name"),
# Индекс для быстрого поиска по slug
Index("idx_author_slug", "slug"),
# Индекс для быстрого поиска по идентификатору пользователя
Index("idx_author_user", "user"),
# Индекс для фильтрации неудаленных авторов
Index(
"idx_author_deleted_at", "deleted_at", postgresql_where=deleted_at.is_(None)
),
Index("idx_author_deleted_at", "deleted_at", postgresql_where=deleted_at.is_(None)),
# Индекс для сортировки по времени создания (для новых авторов)
Index("idx_author_created_at", "created_at"),
# Индекс для сортировки по времени последнего посещения

View File

@@ -26,14 +26,12 @@ class DraftAuthor(Base):
caption = Column(String, nullable=True, default="")
class Draft(Base):
__tablename__ = "draft"
# required
created_at: int = Column(Integer, nullable=False, default=lambda: int(time.time()))
# Колонки для связей с автором
created_by: int = Column("created_by", ForeignKey("author.id"), nullable=False)
community: int = Column("community", ForeignKey("community.id"), nullable=False, default=1)
created_by: int = Column(ForeignKey("author.id"), nullable=False)
community: int = Column(ForeignKey("community.id"), nullable=False, default=1)
# optional
layout: str = Column(String, nullable=True, default="article")
@@ -51,55 +49,7 @@ class Draft(Base):
# auto
updated_at: int | None = Column(Integer, nullable=True, index=True)
deleted_at: int | None = Column(Integer, nullable=True, index=True)
updated_by: int | None = Column("updated_by", ForeignKey("author.id"), nullable=True)
deleted_by: int | None = Column("deleted_by", ForeignKey("author.id"), nullable=True)
# --- Relationships ---
# Только many-to-many связи через вспомогательные таблицы
authors = relationship(Author, secondary="draft_author", lazy="select")
topics = relationship(Topic, secondary="draft_topic", lazy="select")
# Связь с Community (если нужна как объект, а не ID)
# community = relationship("Community", foreign_keys=[community_id], lazy="joined")
# Пока оставляем community_id как ID
# Связь с публикацией (один-к-одному или один-к-нулю)
# Загружается через joinedload в резолвере
publication = relationship(
"Shout",
primaryjoin="Draft.id == Shout.draft",
foreign_keys="Shout.draft",
uselist=False,
lazy="noload", # Не грузим по умолчанию, только через options
viewonly=True # Указываем, что это связь только для чтения
)
def dict(self):
"""
Сериализует объект Draft в словарь.
Гарантирует, что поля topics и authors всегда будут списками.
"""
return {
"id": self.id,
"created_at": self.created_at,
"created_by": self.created_by,
"community": self.community,
"layout": self.layout,
"slug": self.slug,
"title": self.title,
"subtitle": self.subtitle,
"lead": self.lead,
"body": self.body,
"media": self.media or [],
"cover": self.cover,
"cover_caption": self.cover_caption,
"lang": self.lang,
"seo": self.seo,
"updated_at": self.updated_at,
"deleted_at": self.deleted_at,
"updated_by": self.updated_by,
"deleted_by": self.deleted_by,
# Гарантируем, что topics и authors всегда будут списками
"topics": [topic.dict() for topic in (self.topics or [])],
"authors": [author.dict() for author in (self.authors or [])]
}
updated_by: int | None = Column(ForeignKey("author.id"), nullable=True)
deleted_by: int | None = Column(ForeignKey("author.id"), nullable=True)
authors = relationship(Author, secondary="draft_author")
topics = relationship(Topic, secondary="draft_topic")

View File

@@ -1,2 +0,0 @@
[tool.ruff]
line-length = 108

View File

@@ -13,6 +13,10 @@ starlette
gql
ariadne
granian
# NLP and search
httpx
orjson
pydantic
trafilatura

View File

@@ -16,11 +16,9 @@ from resolvers.draft import (
delete_draft,
load_drafts,
publish_draft,
unpublish_draft,
update_draft,
)
from resolvers.editor import (
unpublish_shout,
)
from resolvers.feed import (
load_shouts_coauthored,
load_shouts_discussed,

View File

@@ -77,45 +77,18 @@ async def get_authors_with_stats(limit=50, offset=0, by: Optional[str] = None):
base_query = select(Author).where(Author.deleted_at.is_(None))
# Применяем сортировку
# vars for statistics sorting
stats_sort_field = None
stats_sort_direction = "desc"
if by:
if isinstance(by, dict):
logger.debug(f"Processing dict-based sorting: {by}")
# Обработка словаря параметров сортировки
from sqlalchemy import asc, desc, func
from orm.shout import ShoutAuthor
from orm.author import AuthorFollower
from sqlalchemy import asc, desc
# Checking for order field in the dictionary
if "order" in by:
order_value = by["order"]
logger.debug(f"Found order field with value: {order_value}")
if order_value in ["shouts", "followers", "rating", "comments"]:
stats_sort_field = order_value
stats_sort_direction = "desc" # По умолчанию убывающая сортировка для статистики
logger.debug(f"Applying statistics-based sorting by: {stats_sort_field}")
elif order_value == "name":
# Sorting by name in ascending order
base_query = base_query.order_by(asc(Author.name))
logger.debug("Applying alphabetical sorting by name")
else:
# If order is not a stats field, treat it as a regular field
column = getattr(Author, order_value, None)
if column:
for field, direction in by.items():
column = getattr(Author, field, None)
if column:
if direction.lower() == "desc":
base_query = base_query.order_by(desc(column))
else:
# Regular sorting by fields
for field, direction in by.items():
column = getattr(Author, field, None)
if column:
if direction.lower() == "desc":
base_query = base_query.order_by(desc(column))
else:
base_query = base_query.order_by(column)
else:
base_query = base_query.order_by(column)
elif by == "new":
base_query = base_query.order_by(desc(Author.created_at))
elif by == "active":
@@ -126,55 +99,6 @@ async def get_authors_with_stats(limit=50, offset=0, by: Optional[str] = None):
else:
base_query = base_query.order_by(desc(Author.created_at))
# If sorting by statistics, modify the query
if stats_sort_field == "shouts":
# Sorting by the number of shouts
from sqlalchemy import func, and_
from orm.shout import Shout, ShoutAuthor
subquery = (
select(
ShoutAuthor.author,
func.count(func.distinct(Shout.id)).label("shouts_count")
)
.select_from(ShoutAuthor)
.join(Shout, ShoutAuthor.shout == Shout.id)
.where(
and_(
Shout.deleted_at.is_(None),
Shout.published_at.is_not(None)
)
)
.group_by(ShoutAuthor.author)
.subquery()
)
base_query = (
base_query
.outerjoin(subquery, Author.id == subquery.c.author)
.order_by(desc(func.coalesce(subquery.c.shouts_count, 0)))
)
elif stats_sort_field == "followers":
# Sorting by the number of followers
from sqlalchemy import func
from orm.author import AuthorFollower
subquery = (
select(
AuthorFollower.author,
func.count(func.distinct(AuthorFollower.follower)).label("followers_count")
)
.select_from(AuthorFollower)
.group_by(AuthorFollower.author)
.subquery()
)
base_query = (
base_query
.outerjoin(subquery, Author.id == subquery.c.author)
.order_by(desc(func.coalesce(subquery.c.followers_count, 0)))
)
# Применяем лимит и смещение
base_query = base_query.limit(limit).offset(offset)

View File

@@ -1,6 +1,8 @@
import time
from operator import or_
import trafilatura
from sqlalchemy.orm import joinedload
from sqlalchemy.sql import and_
from cache.cache import (
cache_author,
@@ -10,7 +12,7 @@ from cache.cache import (
invalidate_shouts_cache,
)
from orm.author import Author
from orm.draft import Draft, DraftAuthor, DraftTopic
from orm.draft import Draft
from orm.shout import Shout, ShoutAuthor, ShoutTopic
from orm.topic import Topic
from services.auth import login_required
@@ -18,70 +20,34 @@ from services.db import local_session
from services.notify import notify_shout
from services.schema import mutation, query
from services.search import search_service
from utils.html_wrapper import wrap_html_fragment
from utils.logger import root_logger as logger
def create_shout_from_draft(session, draft, author_id):
"""
Создаёт новый объект публикации (Shout) на основе черновика.
Args:
session: SQLAlchemy сессия (не используется, для совместимости)
draft (Draft): Объект черновика
author_id (int): ID автора публикации
Returns:
Shout: Новый объект публикации (не сохранённый в базе)
Пример:
>>> from orm.draft import Draft
>>> draft = Draft(id=1, title='Заголовок', body='Текст', slug='slug', created_by=1)
>>> shout = create_shout_from_draft(None, draft, 1)
>>> shout.title
'Заголовок'
>>> shout.body
'Текст'
>>> shout.created_by
1
"""
# Создаем новую публикацию
shout = Shout(
body=draft.body or "",
body=draft.body,
slug=draft.slug,
cover=draft.cover,
cover_caption=draft.cover_caption,
lead=draft.lead,
title=draft.title or "",
title=draft.title,
subtitle=draft.subtitle,
layout=draft.layout or "article",
media=draft.media or [],
lang=draft.lang or "ru",
layout=draft.layout,
media=draft.media,
lang=draft.lang,
seo=draft.seo,
created_by=author_id,
community=draft.community,
draft=draft.id,
deleted_at=None,
)
# Инициализируем пустые массивы для связей
shout.topics = []
shout.authors = []
return shout
@query.field("load_drafts")
@login_required
async def load_drafts(_, info):
"""
Загружает все черновики, доступные текущему пользователю.
Предварительно загружает связанные объекты (topics, authors, publication),
чтобы избежать ошибок с отсоединенными объектами при сериализации.
Returns:
dict: Список черновиков или сообщение об ошибке
"""
user_id = info.context.get("user_id")
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
@@ -89,44 +55,13 @@ async def load_drafts(_, info):
if not user_id or not author_id:
return {"error": "User ID and author ID are required"}
try:
with local_session() as session:
# Предзагружаем authors, topics и связанную publication
drafts_query = (
session.query(Draft)
.options(
joinedload(Draft.topics),
joinedload(Draft.authors),
joinedload(Draft.publication) # Загружаем связанную публикацию
)
.filter(Draft.authors.any(Author.id == author_id))
)
drafts = drafts_query.all()
# Преобразуем объекты в словари, пока они в контексте сессии
drafts_data = []
for draft in drafts:
draft_dict = draft.dict()
# Всегда возвращаем массив для topics, даже если он пустой
draft_dict["topics"] = [topic.dict() for topic in (draft.topics or [])]
draft_dict["authors"] = [author.dict() for author in (draft.authors or [])]
# Добавляем информацию о публикации, если она есть
if draft.publication:
draft_dict["publication"] = {
"id": draft.publication.id,
"slug": draft.publication.slug,
"published_at": draft.publication.published_at
}
else:
draft_dict["publication"] = None
drafts_data.append(draft_dict)
return {"drafts": drafts_data}
except Exception as e:
logger.error(f"Failed to load drafts: {e}", exc_info=True)
return {"error": f"Failed to load drafts: {str(e)}"}
with local_session() as session:
drafts = (
session.query(Draft)
.filter(or_(Draft.authors.any(Author.id == author_id), Draft.created_by == author_id))
.all()
)
return {"drafts": drafts}
@mutation.field("create_draft")
@@ -180,30 +115,22 @@ async def create_draft(_, info, draft_input):
# Remove id from input if present since it's auto-generated
if "id" in draft_input:
del draft_input["id"]
# Добавляем текущее время создания и ID автора
if "seo" not in draft_input and not draft_input["seo"]:
body_teaser = draft_input.get("body", "")[:300].split("\n")[:-1].join("\n")
draft_input["seo"] = draft_input.get("lead", body_teaser)
# Добавляем текущее время создания
draft_input["created_at"] = int(time.time())
draft_input["created_by"] = author_id
draft = Draft(**draft_input)
draft = Draft(created_by=author_id, **draft_input)
session.add(draft)
session.flush()
# Добавляем создателя как автора
da = DraftAuthor(shout=draft.id, author=author_id)
session.add(da)
session.commit()
return {"draft": draft}
except Exception as e:
logger.error(f"Failed to create draft: {e}", exc_info=True)
return {"error": f"Failed to create draft: {str(e)}"}
def generate_teaser(body, limit=300):
body_html = wrap_html_fragment(body)
body_text = trafilatura.extract(body_html, include_comments=False, include_tables=False)
body_teaser = ". ".join(body_text[:limit].split(". ")[:-1])
return body_teaser
@mutation.field("update_draft")
@login_required
@@ -212,21 +139,7 @@ async def update_draft(_, info, draft_id: int, draft_input):
Args:
draft_id: ID черновика для обновления
draft_input: Данные для обновления черновика согласно схеме DraftInput:
- layout: String
- author_ids: [Int!]
- topic_ids: [Int!]
- main_topic_id: Int
- media: [MediaItemInput]
- lead: String
- subtitle: String
- lang: String
- seo: String
- body: String
- title: String
- slug: String
- cover: String
- cover_caption: String
draft_input: Данные для обновления черновика
Returns:
dict: Обновленный черновик или сообщение об ошибке
@@ -238,89 +151,36 @@ async def update_draft(_, info, draft_id: int, draft_input):
if not user_id or not author_id:
return {"error": "Author ID are required"}
try:
with local_session() as session:
draft = session.query(Draft).filter(Draft.id == draft_id).first()
if not draft:
return {"error": "Draft not found"}
# Проверяем slug - он должен быть или не пустым, или не передаваться вообще
if "slug" in draft_input and (draft_input["slug"] is None or draft_input["slug"] == ""):
# Если slug пустой, либо удаляем его из входных данных, либо генерируем временный уникальный
# Вариант 1: просто удаляем ключ из входных данных, чтобы оставить старое значение
del draft_input["slug"]
# Вариант 2 (если нужно обновить): генерируем временный уникальный slug
# import uuid
# draft_input["slug"] = f"draft-{uuid.uuid4().hex[:8]}"
# Фильтруем входные данные, оставляя только разрешенные поля
allowed_fields = {
"layout", "author_ids", "topic_ids", "main_topic_id",
"media", "lead", "subtitle", "lang", "seo", "body",
"title", "slug", "cover", "cover_caption"
}
filtered_input = {k: v for k, v in draft_input.items() if k in allowed_fields}
with local_session() as session:
draft = session.query(Draft).filter(Draft.id == draft_id).first()
if not draft:
return {"error": "Draft not found"}
# Проверяем slug
if "slug" in filtered_input and not filtered_input["slug"]:
del filtered_input["slug"]
if "seo" not in draft_input and not draft.seo:
body_src = draft_input["body"] if "body" in draft_input else draft.body
body_text = trafilatura.extract(body_src)
lead_src = draft_input["lead"] if "lead" in draft_input else draft.lead
lead_text = trafilatura.extract(lead_src)
body_teaser = body_text[:300].split(". ")[:-1].join(".\n")
draft_input["seo"] = lead_text or body_teaser
# Обновляем связи с авторами если переданы
if "author_ids" in filtered_input:
author_ids = filtered_input.pop("author_ids")
if author_ids:
# Очищаем текущие связи
session.query(DraftAuthor).filter(DraftAuthor.shout == draft_id).delete()
# Добавляем новые связи
for aid in author_ids:
da = DraftAuthor(shout=draft_id, author=aid)
session.add(da)
Draft.update(draft, draft_input)
# Set updated_at and updated_by from the authenticated user
current_time = int(time.time())
draft.updated_at = current_time
draft.updated_by = author_id
# Обновляем связи с темами если переданы
if "topic_ids" in filtered_input:
topic_ids = filtered_input.pop("topic_ids")
main_topic_id = filtered_input.pop("main_topic_id", None)
if topic_ids:
# Очищаем текущие связи
session.query(DraftTopic).filter(DraftTopic.shout == draft_id).delete()
# Добавляем новые связи
for tid in topic_ids:
dt = DraftTopic(
shout=draft_id,
topic=tid,
main=(tid == main_topic_id) if main_topic_id else False
)
session.add(dt)
# Генерируем SEO если не предоставлено
if "seo" not in filtered_input and not draft.seo:
body_src = filtered_input.get("body", draft.body)
lead_src = filtered_input.get("lead", draft.lead)
body_html = wrap_html_fragment(body_src)
lead_html = wrap_html_fragment(lead_src)
try:
body_text = trafilatura.extract(body_html, include_comments=False, include_tables=False) if body_src else None
lead_text = trafilatura.extract(lead_html, include_comments=False, include_tables=False) if lead_src else None
body_teaser = generate_teaser(body_text, 300) if body_text else ""
filtered_input["seo"] = lead_text if lead_text else body_teaser
except Exception as e:
logger.warning(f"Failed to generate SEO for draft {draft_id}: {e}")
# Обновляем основные поля черновика
for key, value in filtered_input.items():
setattr(draft, key, value)
# Обновляем метаданные
draft.updated_at = int(time.time())
draft.updated_by = author_id
session.commit()
# Преобразуем объект в словарь для ответа
draft_dict = draft.dict()
draft_dict["topics"] = [topic.dict() for topic in draft.topics]
draft_dict["authors"] = [author.dict() for author in draft.authors]
# Добавляем объект автора в updated_by
draft_dict["updated_by"] = author_dict
return {"draft": draft_dict}
except Exception as e:
logger.error(f"Failed to update draft: {e}", exc_info=True)
return {"error": f"Failed to update draft: {str(e)}"}
session.commit()
return {"draft": draft}
@mutation.field("delete_draft")
@@ -340,136 +200,182 @@ async def delete_draft(_, info, draft_id: int):
return {"draft": draft}
def validate_html_content(html_content: str) -> tuple[bool, str]:
"""
Проверяет валидность HTML контента через trafilatura.
Args:
html_content: HTML строка для проверки
Returns:
tuple[bool, str]: (валидность, сообщение об ошибке)
Example:
>>> is_valid, error = validate_html_content("<p>Valid HTML</p>")
>>> is_valid
True
>>> error
''
>>> is_valid, error = validate_html_content("Invalid < HTML")
>>> is_valid
False
>>> 'Invalid HTML' in error
True
"""
if not html_content or not html_content.strip():
return False, "Content is empty"
try:
html_content = wrap_html_fragment(html_content)
extracted = trafilatura.extract(html_content)
if not extracted:
return False, "Invalid HTML structure or empty content"
return True, ""
except Exception as e:
logger.error(f"HTML validation error: {e}", exc_info=True)
return False, f"Invalid HTML content: {str(e)}"
@mutation.field("publish_draft")
@login_required
async def publish_draft(_, info, draft_id: int):
"""
Публикует черновик, создавая новый Shout или обновляя существующий.
user_id = info.context.get("user_id")
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
if not user_id or not author_id:
return {"error": "User ID and author ID are required"}
with local_session() as session:
draft = session.query(Draft).filter(Draft.id == draft_id).first()
if not draft:
return {"error": "Draft not found"}
shout = create_shout_from_draft(session, draft, author_id)
session.add(shout)
session.commit()
return {"shout": shout, "draft": draft}
@mutation.field("unpublish_draft")
@login_required
async def unpublish_draft(_, info, draft_id: int):
user_id = info.context.get("user_id")
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
if not user_id or not author_id:
return {"error": "User ID and author ID are required"}
with local_session() as session:
draft = session.query(Draft).filter(Draft.id == draft_id).first()
if not draft:
return {"error": "Draft not found"}
shout = session.query(Shout).filter(Shout.draft == draft.id).first()
if shout:
shout.published_at = None
session.commit()
return {"shout": shout, "draft": draft}
return {"error": "Failed to unpublish draft"}
@mutation.field("publish_shout")
@login_required
async def publish_shout(_, info, shout_id: int):
"""Publish draft as a shout or update existing shout.
Args:
draft_id (int): ID черновика для публикации
Returns:
dict: Результат публикации с shout или сообщением об ошибке
shout_id: ID существующей публикации или 0 для новой
draft: Объект черновика (опционально)
"""
user_id = info.context.get("user_id")
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
now = int(time.time())
if not user_id or not author_id:
return {"error": "Author ID is required"}
return {"error": "User ID and author ID are required"}
try:
with local_session() as session:
# Загружаем черновик со всеми связями
draft = (
session.query(Draft)
.options(
joinedload(Draft.topics),
joinedload(Draft.authors),
joinedload(Draft.publication)
)
.filter(Draft.id == draft_id)
.first()
)
shout = session.query(Shout).filter(Shout.id == shout_id).first()
if not shout:
return {"error": "Shout not found"}
was_published = shout.published_at is not None
draft = session.query(Draft).where(Draft.id == shout.draft).first()
if not draft:
return {"error": "Draft not found"}
# Находим черновик если не передан
# Проверка валидности HTML в body
is_valid, error = validate_html_content(draft.body)
if not is_valid:
return {"error": f"Cannot publish draft: {error}"}
# Проверяем, есть ли уже публикация для этого черновика
if draft.publication:
shout = draft.publication
# Обновляем существующую публикацию
for field in ["body", "title", "subtitle", "lead", "cover", "cover_caption", "media", "lang", "seo"]:
if hasattr(draft, field):
setattr(shout, field, getattr(draft, field))
shout.updated_at = int(time.time())
shout.updated_by = author_id
else:
# Создаем новую публикацию
if not shout:
shout = create_shout_from_draft(session, draft, author_id)
now = int(time.time())
shout.created_at = now
shout.published_at = now
session.add(shout)
session.flush() # Получаем ID нового шаута
else:
# Обновляем существующую публикацию
shout.draft = draft.id
shout.created_by = author_id
shout.title = draft.title
shout.subtitle = draft.subtitle
shout.body = draft.body
shout.cover = draft.cover
shout.cover_caption = draft.cover_caption
shout.lead = draft.lead
shout.layout = draft.layout
shout.media = draft.media
shout.lang = draft.lang
shout.seo = draft.seo
# Очищаем существующие связи
session.query(ShoutAuthor).filter(ShoutAuthor.shout == shout.id).delete()
session.query(ShoutTopic).filter(ShoutTopic.shout == shout.id).delete()
draft.updated_at = now
shout.updated_at = now
# Добавляем авторов
for author in (draft.authors or []):
sa = ShoutAuthor(shout=shout.id, author=author.id)
# Устанавливаем published_at только если была ранее снята с публикации
if not was_published:
shout.published_at = now
# Обрабатываем связи с авторами
if (
not session.query(ShoutAuthor)
.filter(and_(ShoutAuthor.shout == shout.id, ShoutAuthor.author == author_id))
.first()
):
sa = ShoutAuthor(shout=shout.id, author=author_id)
session.add(sa)
# Добавляем темы
for topic in (draft.topics or []):
st = ShoutTopic(
topic=topic.id,
shout=shout.id,
main=topic.main if hasattr(topic, "main") else False
)
session.add(st)
# Обрабатываем темы
if draft.topics:
for topic in draft.topics:
st = ShoutTopic(
topic=topic.id, shout=shout.id, main=topic.main if hasattr(topic, "main") else False
)
session.add(st)
session.add(shout)
session.add(draft)
session.flush()
# Инвалидируем кэш только если это новая публикация или была снята с публикации
if not was_published:
cache_keys = ["feed", f"author_{author_id}", "random_top", "unrated"]
# Добавляем ключи для тем
for topic in shout.topics:
cache_keys.append(f"topic_{topic.id}")
cache_keys.append(f"topic_shouts_{topic.id}")
await cache_by_id(Topic, topic.id, cache_topic)
# Инвалидируем кэш
await invalidate_shouts_cache(cache_keys)
await invalidate_shout_related_cache(shout, author_id)
# Обновляем кэш авторов
for author in shout.authors:
await cache_by_id(Author, author.id, cache_author)
# Отправляем уведомление о публикации
await notify_shout(shout.dict(), "published")
# Обновляем поисковый индекс
search_service.index(shout)
else:
# Для уже опубликованных материалов просто отправляем уведомление об обновлении
await notify_shout(shout.dict(), "update")
session.commit()
# Инвалидируем кеш
invalidate_shouts_cache()
invalidate_shout_related_cache(shout.id)
# Уведомляем о публикации
await notify_shout(shout.id)
# Обновляем поисковый индекс
search_service.index_shout(shout)
logger.info(f"Successfully published shout #{shout.id} from draft #{draft_id}")
logger.debug(f"Shout data: {shout.dict()}")
return {"shout": shout}
except Exception as e:
logger.error(f"Failed to publish draft {draft_id}: {e}", exc_info=True)
return {"error": f"Failed to publish draft: {str(e)}"}
logger.error(f"Failed to publish shout: {e}", exc_info=True)
if "session" in locals():
session.rollback()
return {"error": f"Failed to publish shout: {str(e)}"}
@mutation.field("unpublish_shout")
@login_required
async def unpublish_shout(_, info, shout_id: int):
"""Unpublish a shout.
Args:
shout_id: The ID of the shout to unpublish
Returns:
dict: The unpublished shout or an error message
"""
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
if not author_id:
return {"error": "Author ID is required"}
shout = None
with local_session() as session:
try:
shout = session.query(Shout).filter(Shout.id == shout_id).first()
shout.published_at = None
session.commit()
invalidate_shout_related_cache(shout)
invalidate_shouts_cache()
except Exception:
session.rollback()
return {"error": "Failed to unpublish shout"}
return {"shout": shout}

View File

@@ -3,7 +3,7 @@ import time
import orjson
import trafilatura
from sqlalchemy import and_, desc, select
from sqlalchemy.orm import joinedload, selectinload
from sqlalchemy.orm import joinedload
from sqlalchemy.sql.functions import coalesce
from cache.cache import (
@@ -13,7 +13,6 @@ from cache.cache import (
invalidate_shouts_cache,
)
from orm.author import Author
from orm.draft import Draft
from orm.shout import Shout, ShoutAuthor, ShoutTopic
from orm.topic import Topic
from resolvers.follower import follow, unfollow
@@ -21,9 +20,8 @@ from resolvers.stat import get_with_stat
from services.auth import login_required
from services.db import local_session
from services.notify import notify_shout
from services.schema import mutation, query
from services.schema import query
from services.search import search_service
from utils.html_wrapper import wrap_html_fragment
from utils.logger import root_logger as logger
@@ -181,11 +179,9 @@ async def create_shout(_, info, inp):
# Создаем публикацию без topics
body = inp.get("body", "")
lead = inp.get("lead", "")
body_html = wrap_html_fragment(body)
lead_html = wrap_html_fragment(lead)
body_text = trafilatura.extract(body_html)
lead_text = trafilatura.extract(lead_html)
seo = inp.get("seo", lead_text.strip() or body_text.strip()[:300].split(". ")[:-1].join(". "))
body_text = trafilatura.extract(body)
lead_text = trafilatura.extract(lead)
seo = inp.get("seo", lead_text or body_text[:300].split(". ")[:-1].join(". "))
new_shout = Shout(
slug=slug,
body=body,
@@ -649,178 +645,39 @@ def get_main_topic(topics):
"""Get the main topic from a list of ShoutTopic objects."""
logger.info(f"Starting get_main_topic with {len(topics) if topics else 0} topics")
logger.debug(
f"Topics data: {[(t.slug, getattr(t, 'main', False)) for t in topics] if topics else []}"
f"Topics data: {[(t.topic.slug if t.topic else 'no-topic', t.main) for t in topics] if topics else []}"
)
if not topics:
logger.warning("No topics provided to get_main_topic")
return {"id": 0, "title": "no topic", "slug": "notopic", "is_main": True}
# Проверяем, является ли topics списком объектов ShoutTopic или Topic
if hasattr(topics[0], 'topic') and topics[0].topic:
# Для ShoutTopic объектов (старый формат)
# Find first main topic in original order
main_topic_rel = next((st for st in topics if getattr(st, 'main', False)), None)
logger.debug(
f"Found main topic relation: {main_topic_rel.topic.slug if main_topic_rel and main_topic_rel.topic else None}"
)
# Find first main topic in original order
main_topic_rel = next((st for st in topics if st.main), None)
logger.debug(
f"Found main topic relation: {main_topic_rel.topic.slug if main_topic_rel and main_topic_rel.topic else None}"
)
if main_topic_rel and main_topic_rel.topic:
result = {
"slug": main_topic_rel.topic.slug,
"title": main_topic_rel.topic.title,
"id": main_topic_rel.topic.id,
"is_main": True,
}
logger.info(f"Returning main topic: {result}")
return result
if main_topic_rel and main_topic_rel.topic:
result = {
"slug": main_topic_rel.topic.slug,
"title": main_topic_rel.topic.title,
"id": main_topic_rel.topic.id,
"is_main": True,
}
logger.info(f"Returning main topic: {result}")
return result
# If no main found but topics exist, return first
if topics and topics[0].topic:
logger.info(f"No main topic found, using first topic: {topics[0].topic.slug}")
result = {
"slug": topics[0].topic.slug,
"title": topics[0].topic.title,
"id": topics[0].topic.id,
"is_main": True,
}
return result
else:
# Для Topic объектов (новый формат из selectinload)
# После смены на selectinload у нас просто список Topic объектов
if topics:
logger.info(f"Using first topic as main: {topics[0].slug}")
result = {
"slug": topics[0].slug,
"title": topics[0].title,
"id": topics[0].id,
"is_main": True,
}
return result
# If no main found but topics exist, return first
if topics and topics[0].topic:
logger.info(f"No main topic found, using first topic: {topics[0].topic.slug}")
result = {
"slug": topics[0].topic.slug,
"title": topics[0].topic.title,
"id": topics[0].topic.id,
"is_main": True,
}
return result
logger.warning("No valid topics found, returning default")
return {"slug": "notopic", "title": "no topic", "id": 0, "is_main": True}
@mutation.field("unpublish_shout")
@login_required
async def unpublish_shout(_, info, shout_id: int):
"""Снимает публикацию (shout) с публикации.
Предзагружает связанный черновик (draft) и его авторов/темы, чтобы избежать
ошибок при последующем доступе к ним в GraphQL.
Args:
shout_id: ID публикации для снятия с публикации
Returns:
dict: Снятая с публикации публикация или сообщение об ошибке
"""
author_dict = info.context.get("author", {})
author_id = author_dict.get("id")
if not author_id:
# В идеале нужна проверка прав, имеет ли автор право снимать публикацию
return {"error": "Author ID is required"}
shout = None
with local_session() as session:
try:
# Загружаем Shout со всеми связями для правильного формирования ответа
shout = (
session.query(Shout)
.options(
joinedload(Shout.authors),
selectinload(Shout.topics)
)
.filter(Shout.id == shout_id)
.first()
)
if not shout:
logger.warning(f"Shout not found for unpublish: ID {shout_id}")
return {"error": "Shout not found"}
# Если у публикации есть связанный черновик, загружаем его с relationships
if shout.draft:
# Отдельно загружаем черновик с его связями
draft = (
session.query(Draft)
.options(
selectinload(Draft.authors),
selectinload(Draft.topics)
)
.filter(Draft.id == shout.draft)
.first()
)
# Связываем черновик с публикацией вручную для доступа через API
if draft:
shout.draft_obj = draft
# TODO: Добавить проверку прав доступа, если необходимо
# if author_id not in [a.id for a in shout.authors]: # Требует selectinload(Shout.authors) выше
# logger.warning(f"Author {author_id} denied unpublishing shout {shout_id}")
# return {"error": "Access denied"}
# Запоминаем старый slug и id для формирования поля publication
shout_slug = shout.slug
shout_id_for_publication = shout.id
# Снимаем с публикации (устанавливаем published_at в None)
shout.published_at = None
session.commit()
# Формируем полноценный словарь для ответа
shout_dict = shout.dict()
# Добавляем связанные данные
shout_dict["topics"] = (
[
{"id": topic.id, "slug": topic.slug, "title": topic.title}
for topic in shout.topics
]
if shout.topics
else []
)
# Добавляем main_topic
shout_dict["main_topic"] = get_main_topic(shout.topics)
# Добавляем авторов
shout_dict["authors"] = (
[
{"id": author.id, "name": author.name, "slug": author.slug}
for author in shout.authors
]
if shout.authors
else []
)
# Важно! Обновляем поле publication, отражая состояние "снят с публикации"
shout_dict["publication"] = {
"id": shout_id_for_publication,
"slug": shout_slug,
"published_at": None # Ключевое изменение - устанавливаем published_at в None
}
# Инвалидация кэша
try:
cache_keys = [
"feed", # лента
f"author_{author_id}", # публикации автора
"random_top", # случайные топовые
"unrated", # неоцененные
]
await invalidate_shout_related_cache(shout, author_id)
await invalidate_shouts_cache(cache_keys)
logger.info(f"Cache invalidated after unpublishing shout {shout_id}")
except Exception as cache_err:
logger.error(f"Failed to invalidate cache for unpublish shout {shout_id}: {cache_err}")
except Exception as e:
session.rollback()
logger.error(f"Failed to unpublish shout {shout_id}: {e}", exc_info=True)
return {"error": f"Failed to unpublish shout: {str(e)}"}
# Возвращаем сформированный словарь вместо объекта
logger.info(f"Shout {shout_id} unpublished successfully by author {author_id}")
return {"shout": shout_dict}

View File

@@ -0,0 +1,25 @@
{
"include": [
"."
],
"exclude": [
"**/node_modules",
"**/__pycache__",
"**/.*"
],
"defineConstant": {
"DEBUG": true
},
"venvPath": ".",
"venv": ".venv",
"pythonVersion": "3.11",
"typeCheckingMode": "strict",
"reportMissingImports": true,
"reportMissingTypeStubs": false,
"reportUnknownMemberType": false,
"reportUnknownParameterType": false,
"reportUnknownVariableType": false,
"reportUnknownArgumentType": false,
"reportPrivateUsage": false,
"reportUntypedFunctionDecorator": false
}

View File

@@ -487,16 +487,12 @@ def apply_reaction_filters(by, q):
shout_slug = by.get("shout")
if shout_slug:
q = q.filter(Shout.slug == shout_slug)
shout_id = by.get("shout_id")
if shout_id:
q = q.filter(Shout.id == shout_id)
shouts = by.get("shouts")
if shouts:
q = q.filter(Shout.slug.in_(shouts))
created_by = by.get("created_by", by.get("author_id"))
created_by = by.get("created_by")
if created_by:
q = q.filter(Author.id == created_by)

View File

@@ -10,7 +10,7 @@ from orm.shout import Shout, ShoutAuthor, ShoutTopic
from orm.topic import Topic
from services.db import json_array_builder, json_builder, local_session
from services.schema import query
from services.search import search_text
from services.search import search_text, get_search_count
from services.viewed import ViewedStorage
from utils.logger import root_logger as logger
@@ -187,12 +187,10 @@ def get_shouts_with_links(info, q, limit=20, offset=0):
"""
shouts = []
try:
# logger.info(f"Starting get_shouts_with_links with limit={limit}, offset={offset}")
q = q.limit(limit).offset(offset)
with local_session() as session:
shouts_result = session.execute(q).all()
# logger.info(f"Got {len(shouts_result) if shouts_result else 0} shouts from query")
if not shouts_result:
logger.warning("No shouts found in query result")
@@ -203,7 +201,6 @@ def get_shouts_with_links(info, q, limit=20, offset=0):
shout = None
if hasattr(row, "Shout"):
shout = row.Shout
# logger.debug(f"Processing shout#{shout.id} at index {idx}")
if shout:
shout_id = int(f"{shout.id}")
shout_dict = shout.dict()
@@ -231,20 +228,16 @@ def get_shouts_with_links(info, q, limit=20, offset=0):
topics = None
if has_field(info, "topics") and hasattr(row, "topics"):
topics = orjson.loads(row.topics) if isinstance(row.topics, str) else row.topics
# logger.debug(f"Shout#{shout_id} topics: {topics}")
shout_dict["topics"] = topics
if has_field(info, "main_topic"):
main_topic = None
if hasattr(row, "main_topic"):
# logger.debug(f"Raw main_topic for shout#{shout_id}: {row.main_topic}")
main_topic = (
orjson.loads(row.main_topic) if isinstance(row.main_topic, str) else row.main_topic
)
# logger.debug(f"Parsed main_topic for shout#{shout_id}: {main_topic}")
if not main_topic and topics and len(topics) > 0:
# logger.info(f"No main_topic found for shout#{shout_id}, using first topic from list")
main_topic = {
"id": topics[0]["id"],
"title": topics[0]["title"],
@@ -252,10 +245,8 @@ def get_shouts_with_links(info, q, limit=20, offset=0):
"is_main": True,
}
elif not main_topic:
logger.warning(f"No main_topic and no topics found for shout#{shout_id}")
main_topic = {"id": 0, "title": "no topic", "slug": "notopic", "is_main": True}
shout_dict["main_topic"] = main_topic
# logger.debug(f"Final main_topic for shout#{shout_id}: {main_topic}")
if has_field(info, "authors") and hasattr(row, "authors"):
shout_dict["authors"] = (
@@ -282,7 +273,6 @@ def get_shouts_with_links(info, q, limit=20, offset=0):
logger.error(f"Fatal error in get_shouts_with_links: {e}", exc_info=True)
raise
finally:
logger.info(f"Returning {len(shouts)} shouts from get_shouts_with_links")
return shouts
@@ -401,8 +391,17 @@ async def load_shouts_search(_, info, text, options):
"""
limit = options.get("limit", 10)
offset = options.get("offset", 0)
if isinstance(text, str) and len(text) > 2:
# Get search results with pagination
results = await search_text(text, limit, offset)
# If no results, return empty list
if not results:
logger.info(f"No search results found for '{text}'")
return []
# Extract IDs and scores
scores = {}
hits_ids = []
for sr in results:
@@ -412,22 +411,42 @@ async def load_shouts_search(_, info, text, options):
scores[shout_id] = sr.get("score")
hits_ids.append(shout_id)
q = (
query_with_stat(info)
if has_field(info, "stat")
else select(Shout).filter(and_(Shout.published_at.is_not(None), Shout.deleted_at.is_(None)))
)
# Query DB for only the IDs in the current page
q = query_with_stat(info)
q = q.filter(Shout.id.in_(hits_ids))
q = apply_filters(q, options)
q = apply_sorting(q, options)
shouts = get_shouts_with_links(info, q, limit, offset)
q = apply_filters(q, options.get("filters", {}))
#
shouts = get_shouts_with_links(info, q, len(hits_ids), 0)
# Add scores from search results
for shout in shouts:
shout.score = scores[f"{shout.id}"]
shouts.sort(key=lambda x: x.score, reverse=True)
shout_id = str(shout['id'])
shout["score"] = scores.get(shout_id, 0)
# Re-sort by search score to maintain ranking
shouts.sort(key=lambda x: scores.get(str(x['id']), 0), reverse=True)
return shouts
return []
@query.field("get_search_results_count")
async def get_search_results_count(_, info, text):
"""
Returns the total count of search results for a search query.
:param _: Root query object (unused)
:param info: GraphQL context information
:param text: Search query text
:return: Total count of results
"""
if isinstance(text, str) and len(text) > 2:
count = await get_search_count(text)
return {"count": count}
return {"count": 0}
@query.field("load_shouts_unrated")
async def load_shouts_unrated(_, info, options):
"""

View File

@@ -10,7 +10,6 @@ from cache.cache import (
)
from orm.author import Author
from orm.topic import Topic
from orm.reaction import ReactionKind
from resolvers.stat import get_with_stat
from services.auth import login_required
from services.db import local_session
@@ -113,7 +112,7 @@ async def get_topics_with_stats(limit=100, offset=0, community_id=None, by=None)
shouts_stats_query = f"""
SELECT st.topic, COUNT(DISTINCT s.id) as shouts_count
FROM shout_topic st
JOIN shout s ON st.shout = s.id AND s.deleted_at IS NULL AND s.published_at IS NOT NULL
JOIN shout s ON st.shout = s.id AND s.deleted_at IS NULL
WHERE st.topic IN ({",".join(map(str, topic_ids))})
GROUP BY st.topic
"""
@@ -122,7 +121,7 @@ async def get_topics_with_stats(limit=100, offset=0, community_id=None, by=None)
# Query to get follower statistics for the selected topics
followers_stats_query = f"""
SELECT topic, COUNT(DISTINCT follower) as followers_count
FROM topic_followers tf
FROM topic_followers
WHERE topic IN ({",".join(map(str, topic_ids))})
GROUP BY topic
"""
@@ -144,8 +143,7 @@ async def get_topics_with_stats(limit=100, offset=0, community_id=None, by=None)
SELECT st.topic, COUNT(DISTINCT r.id) as comments_count
FROM shout_topic st
JOIN shout s ON st.shout = s.id AND s.deleted_at IS NULL AND s.published_at IS NOT NULL
JOIN reaction r ON r.shout = s.id AND r.kind = '{ReactionKind.COMMENT.value}' AND r.deleted_at IS NULL
JOIN author a ON r.created_by = a.id AND a.deleted_at IS NULL
JOIN reaction r ON r.shout = s.id
WHERE st.topic IN ({",".join(map(str, topic_ids))})
GROUP BY st.topic
"""

View File

@@ -92,14 +92,12 @@ input LoadShoutsOptions {
input ReactionBy {
shout: String
shout_id: Int
shouts: [String]
search: String
kinds: [ReactionKind]
reply_to: Int # filter
topic: String
created_by: Int
author_id: Int
author: String
after: Int
sort: ReactionSort # sort

View File

@@ -33,6 +33,7 @@ type Query {
get_shout(slug: String, shout_id: Int): Shout
load_shouts_by(options: LoadShoutsOptions): [Shout]
load_shouts_search(text: String!, options: LoadShoutsOptions): [SearchResult]
get_search_results_count(text: String!): CountResult!
load_shouts_bookmarked(options: LoadShoutsOptions): [Shout]
# rating

View File

@@ -107,12 +107,6 @@ type Shout {
score: Float
}
type PublicationInfo {
id: Int!
slug: String!
published_at: Int
}
type Draft {
id: Int!
created_at: Int!
@@ -135,9 +129,9 @@ type Draft {
deleted_at: Int
updated_by: Author
deleted_by: Author
authors: [Author]!
topics: [Topic]!
publication: PublicationInfo
authors: [Author]
topics: [Topic]
}
type Stat {
@@ -213,6 +207,7 @@ type CommonResult {
}
type SearchResult {
id: Int!
slug: String!
title: String!
cover: String
@@ -280,3 +275,7 @@ type MyRateComment {
my_rate: ReactionKind
}
type CountResult {
count: Int!
}

34
server.py Normal file
View File

@@ -0,0 +1,34 @@
import sys
from pathlib import Path
from granian.constants import Interfaces
from granian.log import LogLevels
from granian.server import Server
from settings import PORT
from utils.logger import root_logger as logger
if __name__ == "__main__":
logger.info("started")
try:
granian_instance = Server(
"main:app",
address="0.0.0.0",
port=PORT,
interface=Interfaces.ASGI,
workers=1,
websockets=False,
log_level=LogLevels.debug,
backlog=2048,
)
if "dev" in sys.argv:
logger.info("dev mode, building ssl context")
granian_instance.build_ssl_context(cert=Path("localhost.pem"), key=Path("localhost-key.pem"), password=None)
granian_instance.serve()
except Exception as error:
logger.error(error, exc_info=True)
raise
finally:
logger.info("stopped")

View File

@@ -1 +0,0 @@
# This file makes services a Python package

View File

@@ -259,3 +259,27 @@ def get_json_builder():
# Use them in the code
json_builder, json_array_builder, json_cast = get_json_builder()
async def fetch_all_shouts(session=None):
"""Fetch all published shouts for search indexing"""
from orm.shout import Shout
close_session = False
if session is None:
session = local_session()
close_session = True
try:
# Fetch only published and non-deleted shouts
query = session.query(Shout).filter(
Shout.published_at.is_not(None),
Shout.deleted_at.is_(None)
)
shouts = query.all()
return shouts
except Exception as e:
logger.error(f"Error fetching shouts for search indexing: {e}")
return []
finally:
if close_session:
session.close()
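
A minimal startup sketch, assuming the module layout used elsewhere in this diff (services.db and services.search), showing how fetch_all_shouts could feed the search index warm-up:

import asyncio

from services.db import fetch_all_shouts, local_session
from services.search import initialize_search_index

async def warm_search_index():
    # Reuse one session for the bulk read, then hand the shouts to the indexer
    # while the session is still open so ORM attributes stay accessible.
    with local_session() as session:
        shouts = await fetch_all_shouts(session=session)
        await initialize_search_index(shouts)

if __name__ == "__main__":
    asyncio.run(warm_search_index())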

View File

@@ -53,71 +53,3 @@ async def notify_follower(follower: dict, author_id: int, action: str = "follow"
except Exception as e:
# Log the error and re-raise it
logger.error(f"Failed to publish to channel {channel_name}: {e}")
async def notify_draft(draft_data, action: str = "publish"):
"""
Sends a notification about a draft being published or updated.
The function guarantees that the draft data is serialized correctly, including
related attributes (topics, authors).
Args:
draft_data (dict): Dictionary with the draft data. Must contain at least id and title
action (str, optional): Action ("publish", "update"). Defaults to "publish"
Returns:
None
Examples:
>>> draft = {"id": 1, "title": "Test draft", "slug": "test-draft"}
>>> await notify_draft(draft, "publish")
"""
channel_name = "draft"
try:
# Make sure that all the required data is present
# and the object does not need access to detached attributes
if isinstance(draft_data, dict):
draft_payload = draft_data
else:
# If this is an ORM object, convert it into a dict with the required attributes
draft_payload = {
"id": getattr(draft_data, "id", None),
"slug": getattr(draft_data, "slug", None),
"title": getattr(draft_data, "title", None),
"subtitle": getattr(draft_data, "subtitle", None),
"media": getattr(draft_data, "media", None),
"created_at": getattr(draft_data, "created_at", None),
"updated_at": getattr(draft_data, "updated_at", None),
}
# If related attributes were passed, add them as well
if hasattr(draft_data, "topics") and draft_data.topics is not None:
draft_payload["topics"] = [
{"id": t.id, "name": t.name, "slug": t.slug}
for t in draft_data.topics
]
if hasattr(draft_data, "authors") and draft_data.authors is not None:
draft_payload["authors"] = [
{
"id": a.id,
"name": a.name,
"slug": a.slug,
"pic": getattr(a, "pic", None),
}
for a in draft_data.authors
]
data = {"payload": draft_payload, "action": action}
# Save the notification
save_notification(action, channel_name, data.get("payload"))
# Publish to Redis
json_data = orjson.dumps(data)
if json_data:
await redis.publish(channel_name, json_data)
except Exception as e:
logger.error(f"Failed to publish to channel {channel_name}: {e}")

View File

@@ -1,19 +0,0 @@
{
"include": ["."],
"exclude": ["**/node_modules", "**/__pycache__", "**/.*"],
"defineConstant": {
"DEBUG": true
},
"venvPath": ".",
"venv": ".venv",
"pythonVersion": "3.11",
"typeCheckingMode": "strict",
"reportMissingImports": true,
"reportMissingTypeStubs": false,
"reportUnknownMemberType": false,
"reportUnknownParameterType": false,
"reportUnknownVariableType": false,
"reportUnknownArgumentType": false,
"reportPrivateUsage": false,
"reportUntypedFunctionDecorator": false
}

View File

@@ -1,15 +1,14 @@
from asyncio.log import logger
import httpx
from ariadne import MutationType, ObjectType, QueryType
from ariadne import MutationType, QueryType
from services.db import create_table_if_not_exists, local_session
from settings import AUTH_URL
query = QueryType()
mutation = MutationType()
type_draft = ObjectType("Draft")
resolvers = [query, mutation, type_draft]
resolvers = [query, mutation]
async def request_graphql_data(gql, url=AUTH_URL, headers=None):

View File

@@ -2,231 +2,683 @@ import asyncio
import json
import logging
import os
import httpx
import random
import time
from collections import defaultdict
from datetime import datetime, timedelta
import orjson
from opensearchpy import OpenSearch
from services.redis import redis
from utils.encoders import CustomJSONEncoder
# Set redis logging level to suppress DEBUG messages
# Set up proper logging
logger = logging.getLogger("search")
logger.setLevel(logging.WARNING)
logger.setLevel(logging.INFO) # Change to INFO to see more details
ELASTIC_HOST = os.environ.get("ELASTIC_HOST", "").replace("https://", "")
ELASTIC_USER = os.environ.get("ELASTIC_USER", "")
ELASTIC_PASSWORD = os.environ.get("ELASTIC_PASSWORD", "")
ELASTIC_PORT = os.environ.get("ELASTIC_PORT", 9200)
ELASTIC_URL = os.environ.get(
"ELASTIC_URL",
f"https://{ELASTIC_USER}:{ELASTIC_PASSWORD}@{ELASTIC_HOST}:{ELASTIC_PORT}",
)
REDIS_TTL = 86400 # 1 day in seconds
# Configuration for search service
SEARCH_ENABLED = bool(os.environ.get("SEARCH_ENABLED", "true").lower() in ["true", "1", "yes"])
TXTAI_SERVICE_URL = os.environ.get("TXTAI_SERVICE_URL", "none")
MAX_BATCH_SIZE = int(os.environ.get("SEARCH_MAX_BATCH_SIZE", "25"))
index_settings = {
"settings": {
"index": {"number_of_shards": 1, "auto_expand_replicas": "0-all"},
"analysis": {
"analyzer": {
"ru": {
"tokenizer": "standard",
"filter": ["lowercase", "ru_stop", "ru_stemmer"],
}
},
"filter": {
"ru_stemmer": {"type": "stemmer", "language": "russian"},
"ru_stop": {"type": "stop", "stopwords": "_russian_"},
},
},
},
"mappings": {
"properties": {
"body": {"type": "text", "analyzer": "ru"},
"title": {"type": "text", "analyzer": "ru"},
"subtitle": {"type": "text", "analyzer": "ru"},
"lead": {"type": "text", "analyzer": "ru"},
"media": {"type": "text", "analyzer": "ru"},
}
},
}
# Search cache configuration
SEARCH_CACHE_ENABLED = bool(os.environ.get("SEARCH_CACHE_ENABLED", "true").lower() in ["true", "1", "yes"])
SEARCH_CACHE_TTL_SECONDS = int(os.environ.get("SEARCH_CACHE_TTL_SECONDS", "900")) # Default: 15 minutes
SEARCH_MIN_SCORE = float(os.environ.get("SEARCH_MIN_SCORE", "0.1"))
SEARCH_PREFETCH_SIZE = int(os.environ.get("SEARCH_PREFETCH_SIZE", "200"))
SEARCH_USE_REDIS = bool(os.environ.get("SEARCH_USE_REDIS", "true").lower() in ["true", "1", "yes"])
expected_mapping = index_settings["mappings"]
search_offset = 0
# Create the event loop
search_loop = asyncio.get_event_loop()
# Import Redis client if Redis caching is enabled
if SEARCH_USE_REDIS:
try:
from services.redis import redis
logger.info("Redis client imported for search caching")
except ImportError:
logger.warning("Redis client import failed, falling back to memory cache")
SEARCH_USE_REDIS = False
# Add a flag at the beginning of the file
SEARCH_ENABLED = bool(os.environ.get("ELASTIC_HOST", ""))
def get_indices_stats():
indices_stats = search_service.client.cat.indices(format="json")
for index_info in indices_stats:
index_name = index_info["index"]
if not index_name.startswith("."):
index_health = index_info["health"]
index_status = index_info["status"]
pri_shards = index_info["pri"]
rep_shards = index_info["rep"]
docs_count = index_info["docs.count"]
docs_deleted = index_info["docs.deleted"]
store_size = index_info["store.size"]
pri_store_size = index_info["pri.store.size"]
logger.info(f"Index: {index_name}")
logger.info(f"Health: {index_health}")
logger.info(f"Status: {index_status}")
logger.info(f"Primary Shards: {pri_shards}")
logger.info(f"Replica Shards: {rep_shards}")
logger.info(f"Documents Count: {docs_count}")
logger.info(f"Deleted Documents: {docs_deleted}")
logger.info(f"Store Size: {store_size}")
logger.info(f"Primary Store Size: {pri_store_size}")
class SearchCache:
"""Cache for search results to enable efficient pagination"""
def __init__(self, ttl_seconds=SEARCH_CACHE_TTL_SECONDS, max_items=100):
self.cache = {} # Maps search query to list of results
self.last_accessed = {} # Maps search query to last access timestamp
self.ttl = ttl_seconds
self.max_items = max_items
self._redis_prefix = "search_cache:"
async def store(self, query, results):
"""Store search results for a query"""
normalized_query = self._normalize_query(query)
if SEARCH_USE_REDIS:
try:
serialized_results = json.dumps(results)
await redis.set(
f"{self._redis_prefix}{normalized_query}",
serialized_results,
ex=self.ttl
)
logger.info(f"Stored {len(results)} search results for query '{query}' in Redis")
return True
except Exception as e:
logger.error(f"Error storing search results in Redis: {e}")
# Fall back to memory cache if Redis fails
# First cleanup if needed for memory cache
if len(self.cache) >= self.max_items:
self._cleanup()
# Store results and update timestamp
self.cache[normalized_query] = results
self.last_accessed[normalized_query] = time.time()
logger.info(f"Cached {len(results)} search results for query '{query}' in memory")
return True
async def get(self, query, limit=10, offset=0):
"""Get paginated results for a query"""
normalized_query = self._normalize_query(query)
all_results = None
# Try to get from Redis first
if SEARCH_USE_REDIS:
try:
cached_data = await redis.get(f"{self._redis_prefix}{normalized_query}")
if cached_data:
all_results = json.loads(cached_data)
logger.info(f"Retrieved search results for '{query}' from Redis")
except Exception as e:
logger.error(f"Error retrieving search results from Redis: {e}")
# Fall back to memory cache if not in Redis
if all_results is None and normalized_query in self.cache:
all_results = self.cache[normalized_query]
self.last_accessed[normalized_query] = time.time()
logger.info(f"Retrieved search results for '{query}' from memory cache")
# If not found in any cache
if all_results is None:
logger.info(f"Cache miss for query '{query}'")
return None
# Return paginated subset
end_idx = min(offset + limit, len(all_results))
if offset >= len(all_results):
logger.warning(f"Requested offset {offset} exceeds result count {len(all_results)}")
return []
logger.info(f"Cache hit for '{query}': serving {offset}:{end_idx} of {len(all_results)} results")
return all_results[offset:end_idx]
async def has_query(self, query):
"""Check if query exists in cache"""
normalized_query = self._normalize_query(query)
# Check Redis first
if SEARCH_USE_REDIS:
try:
exists = await redis.get(f"{self._redis_prefix}{normalized_query}")
if exists:
return True
except Exception as e:
logger.error(f"Error checking Redis for query existence: {e}")
# Fall back to memory cache
return normalized_query in self.cache
async def get_total_count(self, query):
"""Get total count of results for a query"""
normalized_query = self._normalize_query(query)
# Check Redis first
if SEARCH_USE_REDIS:
try:
cached_data = await redis.get(f"{self._redis_prefix}{normalized_query}")
if cached_data:
all_results = json.loads(cached_data)
return len(all_results)
except Exception as e:
logger.error(f"Error getting result count from Redis: {e}")
# Fall back to memory cache
if normalized_query in self.cache:
return len(self.cache[normalized_query])
return 0
def _normalize_query(self, query):
"""Normalize query string for cache key"""
if not query:
return ""
# Simple normalization - lowercase and strip whitespace
return query.lower().strip()
def _cleanup(self):
"""Remove oldest entries if memory cache is full"""
now = time.time()
# First remove expired entries
expired_keys = [
key for key, last_access in self.last_accessed.items()
if now - last_access > self.ttl
]
for key in expired_keys:
if key in self.cache:
del self.cache[key]
if key in self.last_accessed:
del self.last_accessed[key]
logger.info(f"Cleaned up {len(expired_keys)} expired search cache entries")
# If still above max size, remove oldest entries
if len(self.cache) >= self.max_items:
# Sort by last access time
sorted_items = sorted(self.last_accessed.items(), key=lambda x: x[1])
# Remove oldest 20%
remove_count = max(1, int(len(sorted_items) * 0.2))
for key, _ in sorted_items[:remove_count]:
if key in self.cache:
del self.cache[key]
if key in self.last_accessed:
del self.last_accessed[key]
logger.info(f"Removed {remove_count} oldest search cache entries")
class SearchService:
def __init__(self, index_name="search_index"):
logger.info("Инициализируем поиск...")
self.index_name = index_name
self.client = None
self.lock = asyncio.Lock()
# Initialize the OpenSearch client only if search is enabled
if SEARCH_ENABLED:
try:
self.client = OpenSearch(
hosts=[{"host": ELASTIC_HOST, "port": ELASTIC_PORT}],
http_compress=True,
http_auth=(ELASTIC_USER, ELASTIC_PASSWORD),
use_ssl=True,
verify_certs=False,
ssl_assert_hostname=False,
ssl_show_warn=False,
)
logger.info("Клиент OpenSearch.org подключен")
search_loop.create_task(self.check_index())
except Exception as exc:
logger.warning(f"Поиск отключен из-за ошибки подключения: {exc}")
self.client = None
else:
logger.info("Поиск отключен (ELASTIC_HOST не установлен)")
def __init__(self):
logger.info(f"Initializing search service with URL: {TXTAI_SERVICE_URL}")
self.available = SEARCH_ENABLED
# Use different timeout settings for indexing and search requests
self.client = httpx.AsyncClient(timeout=30.0, base_url=TXTAI_SERVICE_URL)
self.index_client = httpx.AsyncClient(timeout=120.0, base_url=TXTAI_SERVICE_URL)
# Initialize search cache
self.cache = SearchCache() if SEARCH_CACHE_ENABLED else None
if not self.available:
logger.info("Search disabled (SEARCH_ENABLED = False)")
if SEARCH_CACHE_ENABLED:
cache_location = "Redis" if SEARCH_USE_REDIS else "Memory"
logger.info(f"Search caching enabled using {cache_location} cache with TTL={SEARCH_CACHE_TTL_SECONDS}s")
logger.info(f"Minimum score filter: {SEARCH_MIN_SCORE}, prefetch size: {SEARCH_PREFETCH_SIZE}")
async def info(self):
if not SEARCH_ENABLED:
"""Return information about search service"""
if not self.available:
return {"status": "disabled"}
try:
return get_indices_stats()
response = await self.client.get("/info")
response.raise_for_status()
result = response.json()
logger.info(f"Search service info: {result}")
return result
except Exception as e:
logger.error(f"Failed to get search info: {e}")
return {"status": "error", "message": str(e)}
def delete_index(self):
if self.client:
logger.warning(f"[!!!] Удаляем индекс {self.index_name}")
self.client.indices.delete(index=self.index_name, ignore_unavailable=True)
def create_index(self):
if self.client:
logger.info(f"Создается индекс: {self.index_name}")
self.client.indices.create(index=self.index_name, body=index_settings)
logger.info(f"Индекс {self.index_name} создан")
async def check_index(self):
if self.client:
logger.info(f"Проверяем индекс {self.index_name}...")
if not self.client.indices.exists(index=self.index_name):
self.create_index()
self.client.indices.put_mapping(index=self.index_name, body=expected_mapping)
else:
logger.info(f"Найден существующий индекс {self.index_name}")
# Проверка и обновление структуры индекса, если необходимо
result = self.client.indices.get_mapping(index=self.index_name)
if isinstance(result, str):
result = orjson.loads(result)
if isinstance(result, dict):
mapping = result.get(self.index_name, {}).get("mappings")
logger.info(f"Найдена структура индексации: {mapping['properties'].keys()}")
expected_keys = expected_mapping["properties"].keys()
if mapping and mapping["properties"].keys() != expected_keys:
logger.info(f"Ожидаемая структура индексации: {expected_mapping}")
logger.warning("[!!!] Требуется переиндексация всех данных")
self.delete_index()
self.client = None
else:
logger.error("клиент не инициализован, невозможно проверить индекс")
def is_ready(self):
"""Check if service is available"""
return self.available
async def verify_docs(self, doc_ids):
"""Verify which documents exist in the search index"""
if not self.available:
return {"status": "disabled"}
try:
logger.info(f"Verifying {len(doc_ids)} documents in search index")
response = await self.client.post(
"/verify-docs",
json={"doc_ids": doc_ids},
timeout=60.0 # Longer timeout for potentially large ID lists
)
response.raise_for_status()
result = response.json()
# Log summary of verification results
missing_count = len(result.get("missing", []))
logger.info(f"Document verification complete: {missing_count} missing out of {len(doc_ids)} total")
return result
except Exception as e:
logger.error(f"Document verification error: {e}")
return {"status": "error", "message": str(e)}
def index(self, shout):
if not SEARCH_ENABLED:
"""Index a single document"""
if not self.available:
return
logger.info(f"Indexing post {shout.id}")
# Start in background to not block
asyncio.create_task(self.perform_index(shout))
if self.client:
logger.info(f"Индексируем пост {shout.id}")
index_body = {
"body": shout.body,
"title": shout.title,
"subtitle": shout.subtitle,
"lead": shout.lead,
"media": shout.media,
}
asyncio.create_task(self.perform_index(shout, index_body))
async def perform_index(self, shout):
"""Actually perform the indexing operation"""
if not self.available:
return
try:
# Combine all text fields
text = " ".join(filter(None, [
shout.title or "",
shout.subtitle or "",
shout.lead or "",
shout.body or "",
shout.media or ""
]))
if not text.strip():
logger.warning(f"No text content to index for shout {shout.id}")
return
logger.info(f"Indexing document: ID={shout.id}, Text length={len(text)}")
# Send to txtai service
response = await self.client.post(
"/index",
json={"id": str(shout.id), "text": text}
)
response.raise_for_status()
result = response.json()
logger.info(f"Post {shout.id} successfully indexed: {result}")
except Exception as e:
logger.error(f"Indexing error for shout {shout.id}: {e}")
async def perform_index(self, shout, index_body):
if self.client:
async def bulk_index(self, shouts):
"""Index multiple documents at once with adaptive batch sizing"""
if not self.available or not shouts:
logger.warning(f"Bulk indexing skipped: available={self.available}, shouts_count={len(shouts) if shouts else 0}")
return
start_time = time.time()
logger.info(f"Starting bulk indexing of {len(shouts)} documents")
MAX_TEXT_LENGTH = 4000 # Maximum text length to send in a single request
max_batch_size = MAX_BATCH_SIZE
total_indexed = 0
total_skipped = 0
total_truncated = 0
total_retries = 0
# Group documents by size to process smaller documents in larger batches
small_docs = []
medium_docs = []
large_docs = []
# First pass: prepare all documents and categorize by size
for shout in shouts:
try:
await asyncio.wait_for(
self.client.index(index=self.index_name, id=str(shout.id), body=index_body), timeout=40.0
)
except asyncio.TimeoutError:
logger.error(f"Indexing timeout for shout {shout.id}")
text_fields = []
for field_name in ['title', 'subtitle', 'lead', 'body']:
field_value = getattr(shout, field_name, None)
if field_value and isinstance(field_value, str) and field_value.strip():
text_fields.append(field_value.strip())
# Media field processing remains the same
media = getattr(shout, 'media', None)
if media:
if isinstance(media, str):
try:
media_json = json.loads(media)
if isinstance(media_json, dict):
if 'title' in media_json:
text_fields.append(media_json['title'])
if 'body' in media_json:
text_fields.append(media_json['body'])
except json.JSONDecodeError:
text_fields.append(media)
elif isinstance(media, dict):
if 'title' in media:
text_fields.append(media['title'])
if 'body' in media:
text_fields.append(media['body'])
text = " ".join(text_fields)
if not text.strip():
total_skipped += 1
continue
# Truncate text if it exceeds the maximum length
original_length = len(text)
if original_length > MAX_TEXT_LENGTH:
text = text[:MAX_TEXT_LENGTH]
total_truncated += 1
document = {
"id": str(shout.id),
"text": text
}
# Categorize by size
text_len = len(text)
if text_len > 5000:
large_docs.append(document)
elif text_len > 2000:
medium_docs.append(document)
else:
small_docs.append(document)
total_indexed += 1
except Exception as e:
logger.error(f"Indexing error for shout {shout.id}: {e}")
logger.error(f"Error processing shout {getattr(shout, 'id', 'unknown')} for indexing: {e}")
total_skipped += 1
# Process each category with appropriate batch sizes
logger.info(f"Documents categorized: {len(small_docs)} small, {len(medium_docs)} medium, {len(large_docs)} large")
# Process small documents (larger batches)
if small_docs:
batch_size = min(max_batch_size, 15)
await self._process_document_batches(small_docs, batch_size, "small")
# Process medium documents (medium batches)
if medium_docs:
batch_size = min(max_batch_size, 10)
await self._process_document_batches(medium_docs, batch_size, "medium")
# Process large documents (small batches)
if large_docs:
batch_size = min(max_batch_size, 3)
await self._process_document_batches(large_docs, batch_size, "large")
elapsed = time.time() - start_time
logger.info(f"Bulk indexing completed in {elapsed:.2f}s: {total_indexed} indexed, {total_skipped} skipped, {total_truncated} truncated, {total_retries} retries")
async def _process_document_batches(self, documents, batch_size, size_category):
"""Process document batches with retry logic"""
# Check for possible database corruption before starting
db_error_count = 0
for i in range(0, len(documents), batch_size):
batch = documents[i:i+batch_size]
batch_id = f"{size_category}-{i//batch_size + 1}"
logger.info(f"Processing {size_category} batch {batch_id} of {len(batch)} documents")
retry_count = 0
max_retries = 3
success = False
# Process with retries
while not success and retry_count < max_retries:
try:
logger.info(f"Sending batch {batch_id} of {len(batch)} documents to search service (attempt {retry_count+1})")
response = await self.index_client.post(
"/bulk-index",
json=batch,
timeout=120.0 # Explicit longer timeout for large batches
)
# Handle 422 validation errors - these won't be fixed by retrying
if response.status_code == 422:
error_detail = response.json()
truncated_error = self._truncate_error_detail(error_detail)
logger.error(f"Validation error from search service for batch {batch_id}: {truncated_error}")
break
# Handle 500 server errors - these might be fixed by retrying with smaller batches
elif response.status_code == 500:
db_error_count += 1
# If we've seen multiple 500s, log a critical error
if db_error_count >= 3:
logger.critical(f"Multiple server errors detected (500). The search service may need manual intervention. Stopping batch {batch_id} processing.")
break
# Try again with exponential backoff
if retry_count < max_retries - 1:
retry_count += 1
wait_time = (2 ** retry_count) + (random.random() * 0.5) # Exponential backoff with jitter
await asyncio.sleep(wait_time)
continue
# Final retry, split the batch
elif len(batch) > 1:
mid = len(batch) // 2
await self._process_single_batch(batch[:mid], f"{batch_id}-A")
await self._process_single_batch(batch[mid:], f"{batch_id}-B")
break
else:
# Can't split a single document
break
# Normal success case
response.raise_for_status()
success = True
db_error_count = 0 # Reset error counter on success
except Exception as e:
error_str = str(e).lower()
if "duplicate key" in error_str or "unique constraint" in error_str or "nonetype" in error_str:
db_error_count += 1
if db_error_count >= 2:
logger.critical(f"Potential database corruption detected: {error_str}. The search service may need manual intervention. Stopping batch {batch_id} processing.")
break
if retry_count < max_retries - 1:
retry_count += 1
wait_time = (2 ** retry_count) + (random.random() * 0.5)
await asyncio.sleep(wait_time)
else:
if len(batch) > 1:
mid = len(batch) // 2
await self._process_single_batch(batch[:mid], f"{batch_id}-A")
await self._process_single_batch(batch[mid:], f"{batch_id}-B")
break
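
For reference, a small sketch of the wait times produced by the exponential backoff with jitter used above (roughly 2 s and then 4 s before the batch is finally split):

import random

def backoff_schedule(max_retries: int = 3) -> list:
    # Mirrors wait_time = (2 ** retry_count) + (random.random() * 0.5)
    # for retry_count = 1 .. max_retries - 1.
    return [(2 ** attempt) + (random.random() * 0.5) for attempt in range(1, max_retries)]

print(backoff_schedule())  # e.g. [2.31, 4.07]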
async def _process_single_batch(self, documents, batch_id):
"""Process a single batch with maximum reliability"""
max_retries = 3
retry_count = 0
while retry_count < max_retries:
try:
if not documents:
return
response = await self.index_client.post(
"/bulk-index",
json=documents,
timeout=90.0
)
response.raise_for_status()
return # Success, exit the retry loop
except Exception as e:
error_str = str(e).lower()
retry_count += 1
if "dictionary changed size" in error_str or "transaction error" in error_str:
wait_time = (2 ** retry_count) + (random.random() * 0.5)
await asyncio.sleep(wait_time) # Wait for txtai to recover
continue
if retry_count >= max_retries and len(documents) > 1:
for i, doc in enumerate(documents):
try:
resp = await self.index_client.post("/index", json=doc, timeout=30.0)
resp.raise_for_status()
except Exception as e2:
pass
return # Exit after individual processing attempt
def _truncate_error_detail(self, error_detail):
"""Truncate error details for logging"""
truncated_detail = error_detail.copy() if isinstance(error_detail, dict) else error_detail
if isinstance(truncated_detail, dict) and 'detail' in truncated_detail and isinstance(truncated_detail['detail'], list):
for i, item in enumerate(truncated_detail['detail']):
if isinstance(item, dict) and 'input' in item:
if isinstance(item['input'], dict) and any(k in item['input'] for k in ['documents', 'text']):
if 'documents' in item['input'] and isinstance(item['input']['documents'], list):
for j, doc in enumerate(item['input']['documents']):
if 'text' in doc and isinstance(doc['text'], str) and len(doc['text']) > 100:
item['input']['documents'][j]['text'] = f"{doc['text'][:100]}... [truncated, total {len(doc['text'])} chars]"
if 'text' in item['input'] and isinstance(item['input']['text'], str) and len(item['input']['text']) > 100:
item['input']['text'] = f"{item['input']['text'][:100]}... [truncated, total {len(item['input']['text'])} chars]"
return truncated_detail
async def search(self, text, limit, offset):
if not SEARCH_ENABLED:
"""Search documents"""
if not self.available:
return []
if not isinstance(text, str) or not text.strip():
return []
logger.info(f"Searching for: '{text}' (limit={limit}, offset={offset})")
# Check if we can serve from cache
if SEARCH_CACHE_ENABLED:
has_cache = await self.cache.has_query(text)
if has_cache:
cached_results = await self.cache.get(text, limit, offset)
if cached_results is not None:
return cached_results
# Not in cache or cache disabled, perform new search
try:
search_limit = limit
search_offset = offset
logger.info(f"Ищем: {text} {offset}+{limit}")
search_body = {
"query": {"multi_match": {"query": text, "fields": ["title", "lead", "subtitle", "body", "media"]}}
}
if self.client:
search_response = self.client.search(
index=self.index_name,
body=search_body,
size=limit,
from_=offset,
_source=False,
_source_excludes=["title", "body", "subtitle", "media", "lead", "_index"],
if SEARCH_CACHE_ENABLED:
search_limit = SEARCH_PREFETCH_SIZE
search_offset = 0
else:
search_limit = limit
search_offset = offset
response = await self.client.post(
"/search",
json={"text": text, "limit": search_limit, "offset": search_offset}
)
hits = search_response["hits"]["hits"]
results = [{"id": hit["_id"], "score": hit["_score"]} for hit in hits]
# if there are any results
if results:
# Cache them in Redis with a TTL
redis_key = f"search:{text}:{offset}+{limit}"
await redis.execute(
"SETEX",
redis_key,
REDIS_TTL,
json.dumps(results, cls=CustomJSONEncoder),
)
return results
return []
response.raise_for_status()
result = response.json()
formatted_results = result.get("results", [])
valid_results = []
for item in formatted_results:
doc_id = item.get("id")
if doc_id and doc_id.isdigit():
valid_results.append(item)
if len(valid_results) != len(formatted_results):
formatted_results = valid_results
if SEARCH_MIN_SCORE > 0:
initial_count = len(formatted_results)
formatted_results = [r for r in formatted_results if r.get("score", 0) >= SEARCH_MIN_SCORE]
if SEARCH_CACHE_ENABLED:
await self.cache.store(text, formatted_results)
end_idx = offset + limit
page_results = formatted_results[offset:end_idx]
return page_results
return formatted_results
except Exception as e:
logger.error(f"Search error for '{text}': {e}", exc_info=True)
return []
async def check_index_status(self):
"""Get detailed statistics about the search index health"""
if not self.available:
return {"status": "disabled"}
try:
response = await self.client.get("/index-status")
response.raise_for_status()
result = response.json()
if result.get("consistency", {}).get("status") != "ok":
null_count = result.get("consistency", {}).get("null_embeddings_count", 0)
if null_count > 0:
logger.warning(f"Found {null_count} documents with NULL embeddings")
return result
except Exception as e:
logger.error(f"Failed to check index status: {e}")
return {"status": "error", "message": str(e)}
# Create the search service singleton
search_service = SearchService()
# API-compatible function to perform a search
async def search_text(text: str, limit: int = 50, offset: int = 0):
payload = []
if search_service.client:
# Use the search_post method from OpenSearchService
if search_service.available:
payload = await search_service.search(text, limit, offset)
return payload
async def get_search_count(text: str):
"""Get total count of results for a query without fetching all results"""
if search_service.available and SEARCH_CACHE_ENABLED:
if await search_service.cache.has_query(text):
return await search_service.cache.get_total_count(text)
results = await search_text(text, SEARCH_PREFETCH_SIZE, 0)
return len(results)
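
A minimal sketch of how a resolver could combine get_search_count with the page size to report pagination metadata; the helper name is illustrative only:

import asyncio

from services.search import get_search_count, search_text

async def search_page_info(text: str, limit: int = 10, offset: int = 0):
    # Total hits come from the cached count, the page itself from search_text.
    total = await get_search_count(text)
    results = await search_text(text, limit, offset)
    pages = (total + limit - 1) // limit if limit else 0
    return {"total": total, "pages": pages, "page_size": limit, "results": results}

# Example: asyncio.run(search_page_info("культура"))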
# Check that the URL is correct
OPENSEARCH_URL = os.getenv("OPENSEARCH_URL", "rc1a-3n5pi3bhuj9gieel.mdb.yandexcloud.net")
async def initialize_search_index(shouts_data):
"""Initialize search index with existing data during application startup"""
if not SEARCH_ENABLED:
return
if not shouts_data:
return
info = await search_service.info()
if info.get("status") in ["error", "unavailable", "disabled"]:
return
index_stats = info.get("index_stats", {})
indexed_doc_count = index_stats.get("document_count", 0)
index_status = await search_service.check_index_status()
if index_status.get("status") == "inconsistent":
problem_ids = index_status.get("consistency", {}).get("null_embeddings_sample", [])
if problem_ids:
problem_docs = [shout for shout in shouts_data if str(shout.id) in problem_ids]
if problem_docs:
await search_service.bulk_index(problem_docs)
db_ids = [str(shout.id) for shout in shouts_data]
try:
numeric_ids = [int(sid) for sid in db_ids if sid.isdigit()]
if numeric_ids:
min_id = min(numeric_ids)
max_id = max(numeric_ids)
id_range = max_id - min_id + 1
except Exception as e:
pass
if abs(indexed_doc_count - len(shouts_data)) > 10:
doc_ids = [str(shout.id) for shout in shouts_data]
verification = await search_service.verify_docs(doc_ids)
if verification.get("status") == "error":
return
missing_ids = verification.get("missing", [])
if missing_ids:
missing_docs = [shout for shout in shouts_data if str(shout.id) in missing_ids]
await search_service.bulk_index(missing_docs)
else:
pass
try:
test_query = "test"
test_results = await search_text(test_query, 5)
if test_results:
categories = set()
for result in test_results:
result_id = result.get("id")
matching_shouts = [s for s in shouts_data if str(s.id) == result_id]
if matching_shouts and hasattr(matching_shouts[0], 'category'):
categories.add(getattr(matching_shouts[0], 'category', 'unknown'))
except Exception as e:
pass

View File

@@ -1,25 +0,0 @@
import pytest
from typing import Dict
@pytest.fixture
def oauth_settings() -> Dict[str, Dict[str, str]]:
"""Тестовые настройки OAuth"""
return {
"GOOGLE": {"id": "test_google_id", "key": "test_google_secret"},
"GITHUB": {"id": "test_github_id", "key": "test_github_secret"},
"FACEBOOK": {"id": "test_facebook_id", "key": "test_facebook_secret"},
}
@pytest.fixture
def frontend_url() -> str:
"""URL фронтенда для тестов"""
return "https://localhost:3000"
@pytest.fixture(autouse=True)
def mock_settings(monkeypatch, oauth_settings, frontend_url):
"""Подменяем настройки для тестов"""
monkeypatch.setattr("auth.oauth.OAUTH_CLIENTS", oauth_settings)
monkeypatch.setattr("auth.oauth.FRONTEND_URL", frontend_url)

View File

@@ -1,222 +0,0 @@
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
from starlette.responses import JSONResponse, RedirectResponse
from auth.oauth import get_user_profile, oauth_login, oauth_callback
# Override settings for tests
with (
patch("auth.oauth.FRONTEND_URL", "https://localhost:3000"),
patch(
"auth.oauth.OAUTH_CLIENTS",
{
"GOOGLE": {"id": "test_google_id", "key": "test_google_secret"},
"GITHUB": {"id": "test_github_id", "key": "test_github_secret"},
"FACEBOOK": {"id": "test_facebook_id", "key": "test_facebook_secret"},
},
),
):
@pytest.fixture
def mock_request():
"""Фикстура для мока запроса"""
request = MagicMock()
request.session = {}
request.path_params = {}
request.query_params = {}
return request
@pytest.fixture
def mock_oauth_client():
"""Фикстура для мока OAuth клиента"""
client = AsyncMock()
client.authorize_redirect = AsyncMock()
client.authorize_access_token = AsyncMock()
client.get = AsyncMock()
return client
@pytest.mark.asyncio
async def test_get_user_profile_google():
"""Тест получения профиля из Google"""
client = AsyncMock()
token = {
"userinfo": {
"sub": "123",
"email": "test@gmail.com",
"name": "Test User",
"picture": "https://lh3.googleusercontent.com/photo=s96",
}
}
profile = await get_user_profile("google", client, token)
assert profile["id"] == "123"
assert profile["email"] == "test@gmail.com"
assert profile["name"] == "Test User"
assert profile["picture"] == "https://lh3.googleusercontent.com/photo=s600"
@pytest.mark.asyncio
async def test_get_user_profile_github():
"""Тест получения профиля из GitHub"""
client = AsyncMock()
client.get.side_effect = [
MagicMock(
json=lambda: {
"id": 456,
"login": "testuser",
"name": "Test User",
"avatar_url": "https://github.com/avatar.jpg",
}
),
MagicMock(
json=lambda: [
{"email": "other@github.com", "primary": False},
{"email": "test@github.com", "primary": True},
]
),
]
profile = await get_user_profile("github", client, {})
assert profile["id"] == "456"
assert profile["email"] == "test@github.com"
assert profile["name"] == "Test User"
assert profile["picture"] == "https://github.com/avatar.jpg"
@pytest.mark.asyncio
async def test_get_user_profile_facebook():
"""Тест получения профиля из Facebook"""
client = AsyncMock()
client.get.return_value = MagicMock(
json=lambda: {
"id": "789",
"name": "Test User",
"email": "test@facebook.com",
"picture": {"data": {"url": "https://facebook.com/photo.jpg"}},
}
)
profile = await get_user_profile("facebook", client, {})
assert profile["id"] == "789"
assert profile["email"] == "test@facebook.com"
assert profile["name"] == "Test User"
assert profile["picture"] == "https://facebook.com/photo.jpg"
@pytest.mark.asyncio
async def test_oauth_login_success(mock_request, mock_oauth_client):
"""Тест успешного начала OAuth авторизации"""
mock_request.path_params["provider"] = "google"
# Configure the mock for authorize_redirect
redirect_response = RedirectResponse(url="http://example.com")
mock_oauth_client.authorize_redirect.return_value = redirect_response
with patch("auth.oauth.oauth.create_client", return_value=mock_oauth_client):
response = await oauth_login(mock_request)
assert isinstance(response, RedirectResponse)
assert mock_request.session["provider"] == "google"
assert "code_verifier" in mock_request.session
assert "state" in mock_request.session
mock_oauth_client.authorize_redirect.assert_called_once()
@pytest.mark.asyncio
async def test_oauth_login_invalid_provider(mock_request):
"""Тест с неправильным провайдером"""
mock_request.path_params["provider"] = "invalid"
response = await oauth_login(mock_request)
assert isinstance(response, JSONResponse)
assert response.status_code == 400
assert "Invalid provider" in response.body.decode()
@pytest.mark.asyncio
async def test_oauth_callback_success(mock_request, mock_oauth_client):
"""Тест успешного OAuth callback"""
mock_request.session = {
"provider": "google",
"code_verifier": "test_verifier",
"state": "test_state",
}
mock_request.query_params["state"] = "test_state"
mock_oauth_client.authorize_access_token.return_value = {
"userinfo": {"sub": "123", "email": "test@gmail.com", "name": "Test User"}
}
with (
patch("auth.oauth.oauth.create_client", return_value=mock_oauth_client),
patch("auth.oauth.local_session") as mock_session,
patch("auth.oauth.TokenStorage.create_session", return_value="test_token"),
):
# Mock the database session
session = MagicMock()
session.query.return_value.filter.return_value.first.return_value = None
mock_session.return_value.__enter__.return_value = session
response = await oauth_callback(mock_request)
assert isinstance(response, RedirectResponse)
assert response.status_code == 307
assert "auth/success" in response.headers["location"]
# Check the cookies
cookies = response.headers.getlist("set-cookie")
assert any("session_token=test_token" in cookie for cookie in cookies)
assert any("httponly" in cookie.lower() for cookie in cookies)
assert any("secure" in cookie.lower() for cookie in cookies)
# Check that the session is cleared
assert "code_verifier" not in mock_request.session
assert "provider" not in mock_request.session
assert "state" not in mock_request.session
@pytest.mark.asyncio
async def test_oauth_callback_invalid_state(mock_request):
"""Тест с неправильным state параметром"""
mock_request.session = {"provider": "google", "state": "correct_state"}
mock_request.query_params["state"] = "wrong_state"
response = await oauth_callback(mock_request)
assert isinstance(response, JSONResponse)
assert response.status_code == 400
assert "Invalid state" in response.body.decode()
@pytest.mark.asyncio
async def test_oauth_callback_existing_user(mock_request, mock_oauth_client):
"""Тест OAuth callback с существующим пользователем"""
mock_request.session = {
"provider": "google",
"code_verifier": "test_verifier",
"state": "test_state",
}
mock_request.query_params["state"] = "test_state"
mock_oauth_client.authorize_access_token.return_value = {
"userinfo": {"sub": "123", "email": "test@gmail.com", "name": "Test User"}
}
with (
patch("auth.oauth.oauth.create_client", return_value=mock_oauth_client),
patch("auth.oauth.local_session") as mock_session,
patch("auth.oauth.TokenStorage.create_session", return_value="test_token"),
):
# Mock an existing user
existing_user = MagicMock()
session = MagicMock()
session.query.return_value.filter.return_value.first.return_value = existing_user
mock_session.return_value.__enter__.return_value = session
response = await oauth_callback(mock_request)
assert isinstance(response, RedirectResponse)
assert response.status_code == 307
# Check that the existing user is updated
assert existing_user.name == "Test User"
assert existing_user.oauth == "google:123"
assert existing_user.email_verified is True

View File

@@ -1,9 +0,0 @@
"""Тестовые настройки для OAuth"""
FRONTEND_URL = "https://localhost:3000"
OAUTH_CLIENTS = {
"GOOGLE": {"id": "test_google_id", "key": "test_google_secret"},
"GITHUB": {"id": "test_github_id", "key": "test_github_secret"},
"FACEBOOK": {"id": "test_facebook_id", "key": "test_facebook_secret"},
}

View File

@@ -1,7 +1,17 @@
import asyncio
import os
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from starlette.testclient import TestClient
from main import app
from services.db import Base
from services.redis import redis
from tests.test_config import get_test_client
# Use SQLite for testing
TEST_DB_URL = "sqlite:///test.db"
@pytest.fixture(scope="session")
@@ -13,36 +23,38 @@ def event_loop():
@pytest.fixture(scope="session")
def test_app():
"""Create a test client and session factory."""
client, SessionLocal = get_test_client()
return client, SessionLocal
def test_engine():
"""Create a test database engine."""
engine = create_engine(TEST_DB_URL)
Base.metadata.create_all(engine)
yield engine
Base.metadata.drop_all(engine)
os.remove("test.db")
@pytest.fixture
def db_session(test_app):
def db_session(test_engine):
"""Create a new database session for a test."""
_, SessionLocal = test_app
session = SessionLocal()
connection = test_engine.connect()
transaction = connection.begin()
session = Session(bind=connection)
yield session
session.rollback()
session.close()
@pytest.fixture
def test_client(test_app):
"""Get the test client."""
client, _ = test_app
return client
transaction.rollback()
connection.close()
@pytest.fixture
async def redis_client():
"""Create a test Redis client."""
await redis.connect()
await redis.flushall() # Flush Redis before each test
yield redis
await redis.flushall() # Flush after the test
await redis.disconnect()
@pytest.fixture
def test_client():
"""Create a TestClient instance."""
return TestClient(app)
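
A minimal sketch of a test that could rely on the rebuilt db_session fixture; the Topic fields used here (slug, title) are assumptions:

from orm.topic import Topic

def test_db_session_rolls_back(db_session):
    # Data created inside a test is discarded when the fixture rolls back the transaction.
    db_session.add(Topic(slug="test-topic", title="Test Topic"))
    db_session.flush()
    assert db_session.query(Topic).filter_by(slug="test-topic").count() == 1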

View File

@@ -1,67 +0,0 @@
"""
Test configuration
"""
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import StaticPool
from starlette.applications import Starlette
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.testclient import TestClient
# Use in-memory SQLite for tests
TEST_DB_URL = "sqlite:///:memory:"
class DatabaseMiddleware(BaseHTTPMiddleware):
"""Middleware для внедрения сессии БД"""
def __init__(self, app, session_maker):
super().__init__(app)
self.session_maker = session_maker
async def dispatch(self, request, call_next):
session = self.session_maker()
request.state.db = session
try:
response = await call_next(request)
finally:
session.close()
return response
def create_test_app():
"""Create a test Starlette application."""
from services.db import Base
# Create the engine and tables
engine = create_engine(
TEST_DB_URL,
connect_args={"check_same_thread": False},
poolclass=StaticPool,
echo=False,
)
Base.metadata.drop_all(bind=engine)
Base.metadata.create_all(bind=engine)
# Create the session factory
SessionLocal = sessionmaker(bind=engine)
# Create the session middleware
middleware = [Middleware(DatabaseMiddleware, session_maker=SessionLocal)]
# Create the test application
app = Starlette(
debug=True,
middleware=middleware,
routes=[], # Test routes can be added here if needed
)
return app, SessionLocal
def get_test_client():
"""Get a test client with initialized database."""
app, SessionLocal = create_test_app()
return TestClient(app), SessionLocal

View File

@@ -53,11 +53,7 @@ async def test_create_reaction(test_client, db_session, test_setup):
}
""",
"variables": {
"reaction": {
"shout": test_setup["shout"].id,
"kind": ReactionKind.LIKE.value,
"body": "Great post!",
}
"reaction": {"shout": test_setup["shout"].id, "kind": ReactionKind.LIKE.value, "body": "Great post!"}
},
},
)

70
tests/test_validations.py Normal file
View File

@@ -0,0 +1,70 @@
from datetime import datetime, timedelta
import pytest
from pydantic import ValidationError
from auth.validations import (
AuthInput,
AuthResponse,
TokenPayload,
UserRegistrationInput,
)
class TestAuthValidations:
def test_auth_input(self):
"""Test basic auth input validation"""
# Valid case
auth = AuthInput(user_id="123", username="testuser", token="1234567890abcdef1234567890abcdef")
assert auth.user_id == "123"
assert auth.username == "testuser"
# Invalid cases
with pytest.raises(ValidationError):
AuthInput(user_id="", username="test", token="x" * 32)
with pytest.raises(ValidationError):
AuthInput(user_id="123", username="t", token="x" * 32)
def test_user_registration(self):
"""Test user registration validation"""
# Valid case
user = UserRegistrationInput(email="test@example.com", password="SecurePass123!", name="Test User")
assert user.email == "test@example.com"
assert user.name == "Test User"
# Test email validation
with pytest.raises(ValidationError) as exc:
UserRegistrationInput(email="invalid-email", password="SecurePass123!", name="Test")
assert "Invalid email format" in str(exc.value)
# Test password validation
with pytest.raises(ValidationError) as exc:
UserRegistrationInput(email="test@example.com", password="weak", name="Test")
assert "String should have at least 8 characters" in str(exc.value)
def test_token_payload(self):
"""Test token payload validation"""
now = datetime.utcnow()
exp = now + timedelta(hours=1)
payload = TokenPayload(user_id="123", username="testuser", exp=exp, iat=now)
assert payload.user_id == "123"
assert payload.username == "testuser"
assert payload.scopes == [] # Default empty list
def test_auth_response(self):
"""Test auth response validation"""
# Success case
success_resp = AuthResponse(success=True, token="valid_token", user={"id": "123", "name": "Test"})
assert success_resp.success is True
assert success_resp.token == "valid_token"
# Error case
error_resp = AuthResponse(success=False, error="Invalid credentials")
assert error_resp.success is False
assert error_resp.error == "Invalid credentials"
# Invalid case - the required token field is missing when success=True
with pytest.raises(ValidationError):
AuthResponse(success=True, user={"id": "123", "name": "Test"})

View File

@@ -1 +0,0 @@

View File

@@ -1,38 +0,0 @@
"""
Module for processing HTML fragments
"""
def wrap_html_fragment(fragment: str) -> str:
"""
Wraps an HTML fragment in a full HTML structure so it can be processed correctly.
Args:
fragment: HTML fragment to process
Returns:
str: Full HTML document
Example:
>>> wrap_html_fragment("<p>Текст параграфа</p>")
'<!DOCTYPE html><html><head><meta charset="utf-8"></head><body><p>Текст параграфа</p></body></html>'
"""
if not fragment or not fragment.strip():
return fragment
# Check whether the content is a full HTML document
is_full_html = fragment.strip().startswith('<!DOCTYPE') or fragment.strip().startswith('<html')
# If it is a fragment, wrap it in a full HTML document
if not is_full_html:
return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
</head>
<body>
{fragment}
</body>
</html>"""
return fragment